
Jie Ding, The Ohio State University
Association test using general pedigree data with missing genotypes
Family-based association testing is one way to map disease susceptibility genes: it tests for association between marker genotypes and disease phenotypes in family data. Real data sets usually contain missing data. We have proposed the Monte Carlo pedigree disequilibrium test (MCPDT) to test for association using general pedigree data with missing genotypes. MCPDT generates a Monte Carlo (MC) sample of the missing genotypes, conditional on the observed genotypes, and calculates test statistics from that sample. Since MCPDT uses population information, we are also working to account for population substructure in a Bayesian framework. An MCMC algorithm estimates population substructure from pedigree data with multiple unlinked markers, and the estimates are then used in MCPDT. Simulation studies have been conducted to evaluate the performance of these methods.
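The idea of sampling missing genotypes conditional on observed ones can be illustrated on the smallest possible pedigree. The sketch below is not MCPDT itself: it assumes a single biallelic marker, a trio in which only the father is untyped, Hardy-Weinberg proportions for the population prior, and Mendelian transmission. All function names and the allele frequency `p` are hypothetical and chosen only for illustration.

```python
import random

# Genotypes are coded as the number of copies of allele "1": 0, 1, or 2.

def hwe_prior(p):
    """Hardy-Weinberg genotype probabilities for allele-1 frequency p (assumed)."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]

def transmit_prob(parent, allele):
    """Probability that a parent with the given genotype transmits `allele`."""
    p1 = parent / 2.0
    return p1 if allele == 1 else 1.0 - p1

def child_prob(child, father, mother):
    """Mendelian probability of the child's genotype given both parents."""
    total = 0.0
    for a_f in (0, 1):
        a_m = child - a_f
        if a_m in (0, 1):
            total += transmit_prob(father, a_f) * transmit_prob(mother, a_m)
    return total

def father_posterior(mother, child, p):
    """P(father genotype | mother, child), proportional to prior x likelihood.

    Assumes the observed mother-child pair is Mendelian-consistent;
    otherwise the normalizing constant is zero.
    """
    prior = hwe_prior(p)
    weights = [prior[f] * child_prob(child, f, mother) for f in (0, 1, 2)]
    z = sum(weights)
    return [w / z for w in weights]

def sample_father(mother, child, p, n, seed=0):
    """Draw n Monte Carlo samples of the missing father genotype."""
    rng = random.Random(seed)
    post = father_posterior(mother, child, p)
    return rng.choices([0, 1, 2], weights=post, k=n)

# Example: mother has genotype 0, child has genotype 1, so the father
# must have transmitted allele 1 and cannot have genotype 0.
post = father_posterior(mother=0, child=1, p=0.3)
draws = sample_father(mother=0, child=1, p=0.3, n=1000)
```

A test statistic would then be computed on each completed data set and combined across draws. Larger pedigrees require sampling several missing genotypes jointly, which this toy case does not address.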
Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.
Yonggang Yao, The Ohio State University
Another Look at Linear Programming for Feature Selection via Methods of Regularization
We consider statistical procedures for feature selection defined by a family of regularization problems with convex piecewise linear loss functions and L1-type penalties. Many known statistical procedures (e.g., quantile regression and support vector machines with an L1-norm penalty) fall into this category. Computationally, the regularization problems form a special family of parametric linear programming (LP) problems, known in optimization theory as "parametric cost LP" or "parametric right-hand-side LP". Exploiting the connection with LP theory, we lay out general algorithms, namely the simplex algorithm and the tableau-simplex algorithm, for generating regularized solution paths for the feature selection problems. Furthermore, by utilizing the structural traits of the relevant LP problems, we simplify the tableau-simplex algorithm for fast anti-cycling computation. The significance of such algorithms is that they allow a complete exploration of the model space along the paths and provide a broad view of persistent features in the data. The implications of the general path-finding algorithms are outlined for a few statistical procedures and illustrated with numerical examples.
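To make the LP connection concrete, here is a minimal sketch of L1-penalized quantile regression written as a linear program. It uses the standard device of splitting coefficients and residuals into nonnegative parts, and it re-solves the LP on a grid of penalty values with `scipy.optimize.linprog` rather than following the solution path with a simplex or tableau-simplex algorithm as the talk describes. The function name, the simulated data, and the grid of `lam` values are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def l1_quantile_regression(X, y, tau=0.5, lam=1.0):
    """Minimize sum_i rho_tau(y_i - b0 - x_i'beta) + lam * ||beta||_1 as an LP."""
    n, p = X.shape
    # Variable order: beta+ (p), beta- (p), b0+, b0-, u (n), v (n), all >= 0,
    # where beta = beta+ - beta-, b0 = b0+ - b0-, and residual = u - v.
    c = np.concatenate([
        lam * np.ones(2 * p),      # L1 penalty on beta (intercept unpenalized)
        [0.0, 0.0],
        tau * np.ones(n),          # cost of positive residual parts
        (1.0 - tau) * np.ones(n),  # cost of negative residual parts
    ])
    ones = np.ones((n, 1))
    # Equality constraints: X beta + b0 + u - v = y.
    A_eq = np.hstack([X, -X, ones, -ones, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    beta = res.x[:p] - res.x[p:2 * p]
    intercept = res.x[2 * p] - res.x[2 * p + 1]
    return beta, intercept

# Simulated data (an assumption for the demo): only feature 0 is relevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=40)

# Coarse grid over the penalty: larger lam shrinks more coefficients to zero.
for lam in (0.0, 1.0, 5.0, 1000.0):
    beta, b0 = l1_quantile_regression(X, y, tau=0.5, lam=lam)
    print(lam, np.round(beta, 3))
```

Because `lam` enters only the cost vector `c`, this formulation is an instance of a parametric cost LP. A grid search can miss the points at which the active set changes, which is what an exact path-following algorithm avoids.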
Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.