
Title
Principal components analysis for high dimensional data
Speaker
Dr. Debashis Paul, Stanford University
Abstract
Suppose we have i.i.d. observations from a multivariate Gaussian distribution with mean mu and covariance matrix Sigma. We consider the problem of estimating the leading eigenvectors of Sigma when the dimension p of the observation vectors increases with the sample size n. We work under the setup where the covariance matrix is a finite-rank perturbation of the identity. We show that although ordinary principal components analysis may fail to yield consistent estimators of the eigenvectors, if the data can be sparsely represented in some known basis, then a two-stage scheme gives better estimates: first select a set of significant coordinates, then apply PCA to the submatrix of the sample covariance matrix corresponding to the selected coordinates. Under suitable sparsity restrictions, we show that the risk of the proposed estimator attains the optimal rate of convergence when measured in a squared-error-type loss. We demonstrate the performance of our method through simulation studies and discuss some potential applications. We also state some new results about the behavior of the eigenvalues and eigenvectors of the sample covariance matrix when p/n converges to a positive constant.
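The two-stage scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the speaker's actual procedure: the variance-based selection rule and the threshold used here are assumptions chosen for the sketch, since the abstract does not specify how significant coordinates are identified.

```python
import numpy as np

def select_then_pca(X, threshold=None):
    """Sketch of a select-then-PCA estimate of the leading eigenvector.

    Stage 1: keep coordinates whose sample variance noticeably exceeds
    the noise level 1 (the covariance is modeled as identity plus a
    finite-rank perturbation). Stage 2: run PCA on the submatrix of the
    sample covariance matrix restricted to the selected coordinates.
    The threshold below is a hypothetical choice for illustration.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center the data
    variances = (Xc ** 2).mean(axis=0)
    if threshold is None:
        # hypothetical cutoff: noise level 1 plus a log(p)/n fluctuation term
        threshold = 1.0 + 2.0 * np.sqrt(np.log(p) / n)
    selected = np.flatnonzero(variances > threshold)
    if selected.size == 0:           # fall back to all coordinates
        selected = np.arange(p)
    # PCA on the selected submatrix of the sample covariance matrix
    S = (Xc[:, selected].T @ Xc[:, selected]) / n
    eigvals, eigvecs = np.linalg.eigh(S)
    v_hat = np.zeros(p)
    v_hat[selected] = eigvecs[:, -1] # leading eigenvector, embedded in R^p
    return v_hat
```

For example, with n = 200 samples in p = 500 dimensions and a covariance of the form identity plus a spike supported on 10 coordinates, the estimate returned above is typically nearly collinear with the true leading eigenvector, whereas ordinary PCA on all 500 coordinates is much noisier.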
Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.