Seminar: Rebecka Jornsten

Statistics Seminar
September 26, 2008
All Day
Cockins Hall (CH), Room 240

Title

Multi-Level Mixture Models and Simultaneous Model Selection via Rate-Distortion Theory, with Applications to Clustering and Significance Analysis of Gene Expression Data

Speaker

Rebecka Jornsten, Rutgers University

Abstract

The analysis of gene expression data presents many challenges that can be formulated as model selection problems. In model-based clustering, we group genes that exhibit similar expression profiles across experimental conditions. To allow for direct and objective inference of the clustering outcome, we need to determine a sparse representation of each cluster: between which experimental conditions does the cluster expression profile truly differ? Model selection in clustering is combinatorial in the number of clusters and the number of experimental conditions, and thus presents a computationally challenging task. We introduce a simultaneous approach to subset model selection, which draws on results from rate-distortion theory. The rate-distortion formulation allows us to turn the combinatorial model selection into a fast and simple line search. Furthermore, by considering each gene as its own cluster, the simultaneous selection framework extends to significance analysis of differential expression. We can thus determine not only whether a gene is differentially expressed, but also which experimental conditions are discriminatory.

These days, data often have a complex structure, and the clustering techniques we apply should reflect this. We introduce multi-level mixture models to address this issue. The multi-level framework can incorporate multiple distance metrics into clustering simultaneously, and can be used to analyze multi-factor experiments. Multi-level mixture models extend model selection in clustering to between-cluster comparisons, and can yield substantial savings in model parameters, allowing more clusters to be detected than with standard clustering techniques.

Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.