Seminar Title: Spectral methods for spatial and multi-omics sequencing data
Abstract
In this talk, I discuss two statistical challenges motivated by new genomic technologies. First, I compare two widely used methods for estimating shared structure across multiple genomic data matrices. The first method (Stack-SVD) combines the data matrices and then performs dimension reduction with a singular value decomposition (SVD). The second method (SVD-Stack) performs the same operations in reverse order: it reduces each matrix first and then combines the results. To understand the trade-offs between these approaches, I use recent advances in random matrix theory to derive the limiting behavior of each in the proportional asymptotic regime, in which the numbers of samples and features grow together at a fixed ratio. This analysis reveals interpretable settings in which Stack-SVD is expected to outperform SVD-Stack, and vice versa. I then extend both methods to allow dataset-specific weights and derive the optimal choice of weights for each.
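The difference between the two orderings can be made concrete with a minimal NumPy sketch. It assumes that the matrices share the same feature columns, that the shared structure is a top-k right singular subspace, and that SVD-Stack combines the per-dataset top-k singular vectors with a second SVD. The helper names, the uniform default weights and the simulated data are illustrative assumptions; they are not the estimators analyzed in the talk, and the weights are not the optimal ones it derives.

```python
import numpy as np

# Hypothetical helpers for illustration; not the talk's exact estimators.
def stack_svd(mats, k, weights=None):
    """Stack-SVD: combine (row-stack) the matrices first, then take one SVD."""
    weights = np.ones(len(mats)) if weights is None else weights
    stacked = np.vstack([w * x for w, x in zip(weights, mats)])
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k].T  # p x k estimate of the shared feature subspace

def svd_stack(mats, k, weights=None):
    """SVD-Stack: take an SVD of each matrix first, then combine the estimates."""
    weights = np.ones(len(mats)) if weights is None else weights
    per_dataset = []
    for w, x in zip(weights, mats):
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        per_dataset.append(w * vt[:k])  # k x p top right singular vectors
    _, _, vt = np.linalg.svd(np.vstack(per_dataset), full_matrices=False)
    return vt[:k].T

def alignment(v_true, v_hat):
    """Mean squared cosine between two k-dim subspaces, in [0, 1]."""
    return np.linalg.norm(v_true.T @ v_hat) ** 2 / v_true.shape[1]

# Simulated example: three datasets share a rank-2 feature subspace v.
rng = np.random.default_rng(0)
p, k = 50, 2
v, _ = np.linalg.qr(rng.standard_normal((p, k)))
mats = [
    rng.standard_normal((n, k)) @ np.diag([10.0, 8.0]) @ v.T
    + rng.standard_normal((n, p))
    for n in (100, 200, 150)
]
a_stack = alignment(v, stack_svd(mats, k))
a_svd_stack = alignment(v, svd_stack(mats, k))
```

With this strong simulated signal both estimates should align closely with `v`; the asymptotic analysis in the talk concerns the regimes where their performance differs.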
Second, I consider the task of identifying genes with spatially variable expression. In many cases, the spatial locations of cells lie close to a one-dimensional manifold, so standard methods based on two-dimensional coordinates are less effective because they overlook the underlying tissue structure. I introduce a spectral approach that estimates a parametric curve capturing this intrinsic structure and provides a corresponding curve-based coordinate transform. Spatially variable genes can then be identified by fitting a generalized additive model in the transformed coordinates. I present several examples in which this framework leads to new scientific discoveries.
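The idea of a curve-based coordinate can be illustrated, under strong simplifying assumptions, with a basic spectral ordering: build a Gaussian-kernel graph Laplacian on the cell locations and use its second eigenvector (the Fiedler vector) as a one-dimensional coordinate along a curve-like arrangement of cells. This is a swapped-in stand-in, not the parametric-curve estimator from the talk; the function names, the bandwidth and the simulated arc are hypothetical.

```python
import numpy as np

# Hypothetical stand-in: Laplacian-eigenmap ordering, not the talk's curve estimator.
def fiedler_coordinate(coords, bandwidth):
    """Return a 1-D coordinate for 2-D cell locations lying near a curve."""
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    affinity = np.exp(-sq_dist / (2.0 * bandwidth ** 2))  # Gaussian kernel
    np.fill_diagonal(affinity, 0.0)
    laplacian = np.diag(affinity.sum(axis=1)) - affinity  # unnormalized Laplacian
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return eigvecs[:, 1]  # Fiedler vector (second-smallest eigenvalue)

def spearman(a, b):
    """Spearman rank correlation (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

# Simulated cells scattered along a semicircular arc with small positional noise.
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 1.0, 200)  # true position along the arc
coords = np.column_stack([np.cos(np.pi * t), np.sin(np.pi * t)])
coords += 0.02 * rng.standard_normal(coords.shape)
s = fiedler_coordinate(coords, bandwidth=0.1)
rho = spearman(t, s)  # the sign of the Fiedler vector is arbitrary
```

In the framework described above, each gene's expression would then be regressed on such a coordinate with a generalized additive model; that step is omitted from this sketch.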