Seminar: Sinae Kim

Statistics Seminar
January 31, 2006
All Day
209 W. Eighteenth Ave. (EA), Room 170

Title

Variable selection in clustering via Dirichlet process mixture models

Speaker

Sinae Kim, Texas A&M University

Abstract

The increased collection of high-dimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. A typical example is the analysis of DNA microarray data, where there is interest in discovering disease subtypes and isolating discriminating genes. The results could lead to a better understanding of the underlying biological processes and help develop targeted treatment strategies. 

In this talk, I introduce a model-based method that addresses the two problems simultaneously. I adopt a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure. I update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge MCMC technique. I explore the performance of the methodology on simulated data and illustrate an application with a leukemia DNA microarray study.
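
As a rough illustration of the Metropolis update of a latent binary selection vector mentioned in the abstract, the sketch below flips one randomly chosen indicator per step and accepts or rejects the change. It is not the speaker's implementation: the single-flip proposal, the function names, and the toy log-posterior (independent log-odds per variable) are all assumptions made for this example. In the actual method the posterior would depend on the data and on the Dirichlet process mixture, which is not modeled here.

```python
import math
import random


def metropolis_flip(gamma, log_post, rng):
    """One Metropolis step on a binary vector: flip one random entry.

    The proposal is symmetric, so the acceptance probability is
    min(1, posterior(proposal) / posterior(current)).
    """
    j = rng.randrange(len(gamma))
    proposal = list(gamma)
    proposal[j] = 1 - proposal[j]
    log_ratio = log_post(proposal) - log_post(gamma)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return gamma


def toy_log_post(gamma, log_odds=(4.0, 4.0, -4.0, -4.0)):
    """Hypothetical stand-in target: unnormalized log-posterior with
    independent inclusion log-odds (an assumption, not the real model)."""
    return sum(g * w for g, w in zip(gamma, log_odds))


def inclusion_frequencies(n_iter=5000, seed=0):
    """Run the chain and return how often each variable was selected."""
    rng = random.Random(seed)
    gamma = [0, 0, 0, 0]
    counts = [0] * len(gamma)
    for _ in range(n_iter):
        gamma = metropolis_flip(gamma, toy_log_post, rng)
        for j, g in enumerate(gamma):
            counts[j] += g
    return [c / n_iter for c in counts]


if __name__ == "__main__":
    print(inclusion_frequencies())
```

Under this toy target, the first two variables should be selected in most iterations and the last two rarely, which mirrors how posterior inclusion frequencies are typically read off to identify discriminating variables.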

Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.