Seminar Series: Dr. Gemma Moran

March 6, 2025
3:00 PM - 4:00 PM (America/New_York)
EA170

Speaker: Dr. Gemma Moran, Assistant Professor, Department of Statistics, Rutgers University

Speaker website: https://www.gemma-moran.com/

Title: Identifiable deep generative models via sparse decoding

Abstract: In many domains, high-dimensional data exhibits variability that can be summarized by low-dimensional latent representations, or factors. In genomics, for example, genes that make up a biological pathway may exhibit coordinated expression patterns; by learning which genes are co-expressed, we can develop new hypotheses about the underlying biology. To learn such factors, we propose the sparse variational autoencoder (sparse VAE). The underlying model is sparse in that each observed feature (i.e., each dimension of the data) depends on a small subset of the latent factors. In the genomics example, this means each gene is active in only a few biological processes. We prove that such sparse deep generative models are identifiable, and apply the sparse VAE to movie ratings, text, and genomics data.

We then introduce an extension of the sparse VAE to multi-study data. In such data, we expect some factors to be shared across studies and some factors to be specific to a single study. We prove that the shared factors can be identified. Finally, we apply the multi-study sparse VAE to blood platelet gene expression data from patients across different disease conditions.
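
For readers new to the idea, the sparsity described in the abstract (each observed feature depends on only a small subset of the latent factors) can be illustrated with a toy decoder. This is a minimal sketch and not the speaker's implementation: the function name sparse_decode, the fixed binary mask, the tanh network, and all dimensions are assumptions made for illustration. The abstract does not say how the sparsity pattern is obtained; here it is simply given.

```python
import numpy as np

def sparse_decode(z, mask, W1, W2):
    """Toy sparse decoder (illustrative assumption, not the talk's model).

    z    : (K,)   latent factors for one sample
    mask : (G, K) binary matrix; mask[j, k] = 1 if feature j may use factor k
    W1   : (K, H) shared hidden-layer weights
    W2   : (G, H) feature-specific readout weights
    Returns the (G,) vector of decoded feature means.
    """
    masked = mask * z                  # (G, K): row j is z as seen by feature j
    hidden = np.tanh(masked @ W1)      # (G, H): shared nonlinear map
    return (hidden * W2).sum(axis=1)   # (G,):   one output per feature

rng = np.random.default_rng(0)
G, K, H = 4, 3, 5                      # hypothetical sizes: features, factors, hidden units

# Hypothetical sparsity pattern: each feature uses at most two factors.
mask = np.array([[1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 1]], dtype=float)

W1 = rng.normal(size=(K, H))
W2 = rng.normal(size=(G, H))
z = rng.normal(size=K)

x = sparse_decode(z, mask, W1, W2)

# Perturb factor 2: only feature 3 is allowed to depend on it.
z_changed = z.copy()
z_changed[2] += 10.0
x_changed = sparse_decode(z_changed, mask, W1, W2)
```

In this sketch, changing the third factor leaves the first three decoded features unchanged, because their mask rows zero that factor out before it reaches the network; only the fourth feature responds.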