Seminar: Lifeng Wang

Statistics Seminar
January 8, 2008
All Day
Cockins Hall (CH), Room 240

Title

Regularization and Variable Selection for Multiclass Support Vector Machine and Varying-Coefficient Models with Applications in Genomics

Speaker

Lifeng Wang, University of Pennsylvania

Abstract

Microarray technology has been widely used in biomedical research to study complex biological systems and disease processes. Because microarray data are high-dimensional, new computational and statistical methods, together with rigorous theoretical development, are required to draw valid inferences from them. In this talk, I will present two such problems and the methods and theory that we have developed to address them. The first problem concerns multiclass classification and variable selection in the presence of a very large number of genes. We have proposed a regularized multiclass support vector machine, which performs classification and variable selection simultaneously through an L1-norm penalized sparse representation. A statistical learning theory is developed to quantify the generalization error, where the number of variables is allowed to grow much faster than the sample size. The second problem concerns the identification of transcription factors involved in gene regulation during a given biological process, based on time-course gene expression data. To capture the dynamic behavior of gene expression, we propose a nonparametric varying-coefficient model for such data and present a regularized estimation procedure for variable selection that combines basis function approximations with the smoothly clipped absolute deviation (SCAD) penalty. The proposed procedure simultaneously selects significant variables with time-varying effects and estimates the nonzero smooth coefficient functions. Under suitable conditions, we have established the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation. I will illustrate these methods with simulations and real data examples.
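
For readers unfamiliar with the SCAD penalty named in the abstract, the following is a minimal sketch of its standard scalar form, not the speaker's estimation procedure. The function name `scad_penalty` and the default `a = 3.7` (a commonly used value) are assumptions for illustration and do not come from the abstract. In the talk's setting the penalty would presumably act on the basis coefficients of each coefficient function rather than on a single scalar.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard scalar SCAD penalty (illustrative sketch).

    theta: coefficient value
    lam:   regularization parameter, lam > 0
    a:     shape parameter, a > 2 (3.7 is a conventional choice, assumed here)
    """
    t = abs(theta)
    if t <= lam:
        # Near zero the penalty is the same as the L1 penalty.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that gradually relaxes the penalization rate.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients receive a constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    for value in (0.0, 0.5, 1.0, 2.0, 3.7, 10.0):
        print(value, scad_penalty(value, lam=1.0))
```

The flat region for large coefficients is what distinguishes SCAD from the L1 penalty: small coefficients are pushed toward zero, while large ones are left essentially unpenalized at the margin.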

Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.