
Speaker: Dr. Gemma Moran, Assistant Professor, Department of Statistics, Rutgers University
Speaker website: https://www.gemma-moran.com/
Title: Identifiable deep generative models via sparse decoding
Abstract: In many domains, high-dimensional data exhibits variability that can be summarized by low-dimensional latent representations, or factors. In genomics, for example, genes that make up a biological pathway may exhibit coordinated expression patterns; by learning which genes are co-expressed, we can develop new hypotheses about the underlying biology. To learn such factors, we propose the sparse variational autoencoder (sparse VAE). The underlying model is sparse in that each observed feature (i.e., each dimension of the data) depends on only a small subset of the latent factors. In the genomics example, this means each gene is active in only a few biological processes. We prove that such sparse deep generative models are identifiable, and apply the sparse VAE to movie ratings, text, and genomics data. We then introduce an extension of the sparse VAE to multi-study data. In such data, we expect some factors to be shared across studies and others to be specific to a single study. We prove that the shared factors can be identified. Finally, we apply the multi-study sparse VAE to blood platelet gene expression data from patients across different disease conditions.