Seminar: Ju Hee Lee

Statistics Seminar
October 11, 2012
All Day
209 W. Eighteenth Ave. (EA), Room 170

Title

A Nonparametric Bayesian Model for Local Clustering

Speaker

Ju Hee Lee, The Ohio State University

Abstract

We propose a nonparametric Bayesian local clustering (NoB-LoC) approach for heterogeneous data. The NoB-LoC model defines local clusters as blocks of a two-dimensional data matrix and produces inference about these clusters as a nested bidirectional clustering. Using protein expression data as an example, the NoB-LoC model clusters proteins (columns) into protein sets and simultaneously creates multiple partitions of samples (rows), one for each protein set. In other words, the sample partitions are nested within the protein sets. Any pair of samples might belong to the same cluster for one protein set but not for another. These local features are different from features obtained by global clustering approaches such as hierarchical clustering, which create only one partition of samples that applies to all proteins in the data set. As an added and important feature, the NoB-LoC method probabilistically excludes sets of irrelevant proteins and samples that do not meaningfully co-cluster with other proteins and samples, thus improving the inference on the clustering of the remaining proteins and samples. Inference is guided by a joint probability model for all random elements. We provide extensive examples to demonstrate the unique features of the NoB-LoC model.
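
The nested structure described in the abstract can be pictured with a small Python sketch. This is only an illustration of how a local clustering might be represented, not the NoB-LoC inference procedure: the labels are made-up toy values, the names `protein_set`, `sample_cluster`, `co_clustered`, and `blocks` are hypothetical, and using the label 0 to mark excluded proteins or samples is an assumption of this sketch.

```python
# Toy labels (hypothetical) illustrating nested bidirectional clustering.
# One label per protein (column); 0 marks a protein excluded from all sets.
protein_set = [1, 1, 2, 2, 0]

# One sample partition per active protein set (rows are samples).
# Within a set, 0 marks a sample excluded from that set's clustering.
sample_cluster = {
    1: [1, 1, 2, 2],
    2: [1, 2, 1, 0],
}


def co_clustered(i, j, s):
    """Return True if samples i and j share an active cluster for protein set s."""
    labels = sample_cluster[s]
    return labels[i] != 0 and labels[i] == labels[j]


def blocks(s):
    """Return the protein indices in set s and the sample indices grouped by cluster."""
    cols = [p for p, lab in enumerate(protein_set) if lab == s]
    groups = {}
    for i, lab in enumerate(sample_cluster[s]):
        if lab != 0:  # skip samples excluded for this protein set
            groups.setdefault(lab, []).append(i)
    return cols, groups
```

In this toy example, samples 0 and 1 fall in the same cluster for protein set 1 but in different clusters for protein set 2, which is the local behavior that a single global partition of samples cannot express.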

Ju Hee Lee is a visiting assistant professor of statistics at The Ohio State University. She received her PhD in Statistics from The Ohio State University under the direction of Dr. Steven N. MacEachern. She then worked with Dr. Peter Mueller and Dr. Yuan Ji as a post-doctoral fellow in the Department of Biostatistics at The University of Texas M.D. Anderson Cancer Center. Her research interests include the development of Bayesian models, the exploration of their theoretical properties, and the construction of algorithms to fit the models. Her current focus is the development of models and methods for analyzing high-throughput bioinformatics data using parametric and nonparametric hierarchical models.