Seminar: Sinae Kim

Tue, January 31, 2006

All Day

209 W. Eighteenth Ave. (EA), Room 170

Title

Variable selection in clustering via Dirichlet process mixture models

Speaker

Sinae Kim, Texas A&M University

Abstract

The growing collection of high-dimensional data in many fields has generated strong interest in clustering algorithms and variable selection procedures. A typical example is the analysis of DNA microarray data, where the goals are to discover disease subtypes and to isolate discriminating genes. The results could lead to a better understanding of the underlying biological processes and help develop targeted treatment strategies.

In this talk, I introduce a model-based method that addresses the two problems simultaneously. I adopt a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure. I update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge MCMC technique. I explore the performance of the methodology on simulated data and illustrate an application with a leukemia DNA microarray study.

Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.