
Title
Variable selection in clustering via Dirichlet process mixture models
Speaker
Sinae Kim, Texas A&M University
Abstract
The growing collection of high-dimensional data in many fields has generated strong interest in clustering algorithms and variable selection procedures. A typical example is the analysis of DNA microarray data, where the goals are to discover disease subtypes and to isolate the genes that discriminate among them. The results could lead to a better understanding of the underlying biological processes and help develop targeted treatment strategies.
In this talk, I introduce a model-based method that addresses the two problems simultaneously. I adopt a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure. I update the latent variable selection vector using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge MCMC technique. I explore the performance of the methodology on simulated data and illustrate an application to a leukemia DNA microarray study.
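As a rough illustration of the two ingredients named above, the following Python sketch shows (a) a draw of cluster labels from the Chinese restaurant process, which is the partition prior implied by a Dirichlet process, and (b) a single Metropolis step that proposes flipping one entry of a binary inclusion vector. This is not the speaker's implementation: the function names, the concentration parameter alpha, and the toy log-posterior are all assumptions made for illustration, and the split-merge sampler is not shown.

```python
import math
import random


def crp_partition(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process.

    alpha is the (assumed) Dirichlet process concentration parameter.
    """
    labels, counts = [], []
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def metropolis_flip(gamma, log_post, rng):
    """One Metropolis step on a binary inclusion vector gamma.

    log_post is a user-supplied (hypothetical) function returning the
    log posterior of gamma up to an additive constant.
    """
    j = rng.randrange(len(gamma))
    proposal = list(gamma)
    proposal[j] = 1 - proposal[j]
    # The flip proposal is symmetric, so the acceptance probability
    # is min(1, posterior ratio).
    log_ratio = log_post(proposal) - log_post(gamma)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return list(gamma)


def toy_log_post(gamma):
    """Toy target (an assumption): favors including variables 0 and 1 only."""
    return 3.0 * (gamma[0] + gamma[1]) - 3.0 * sum(gamma[2:])


rng = random.Random(0)
labels = crp_partition(20, alpha=1.0, rng=rng)

gamma = [0] * 6
for _ in range(2000):
    gamma = metropolis_flip(gamma, toy_log_post, rng)
```

In a full sampler, the log-posterior of the inclusion vector would depend on the current cluster allocations, and the two updates would alternate within each MCMC iteration.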
Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.