Selecting the Number of Principal Components: Estimation of the True Rank of a Noisy Matrix
Yunjin Choi, Stanford University
Principal component analysis (PCA) is a well-known tool in multivariate statistics. One significant challenge in using PCA is choosing the number of principal components. To address this challenge, we propose an exact distribution-based method for hypothesis testing and for constructing confidence intervals for signals in a noisy matrix with finite samples. Assuming Gaussian noise, we derive exact results based on the conditional distribution of the singular values of a Gaussian matrix, using a post-selection inference framework. In simulation studies, we find that the proposed methods compare well to existing approaches.
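To make the setting concrete, the following is a minimal sketch of the kind of data the abstract describes: a low-rank signal matrix observed with additive Gaussian noise, together with its singular values. It is an illustration only and does not implement the proposed test or confidence intervals. The function name `simulate_noisy_low_rank`, the dimensions, the signal strengths, and the noise level are all assumptions chosen for the example.

```python
import numpy as np


def simulate_noisy_low_rank(n, p, signal_singular_values, sigma=1.0, seed=0):
    """Return X = B + E, where B has the given nonzero singular values
    and E has i.i.d. N(0, sigma^2) entries. (Hypothetical helper.)"""
    rng = np.random.default_rng(seed)
    k = len(signal_singular_values)
    # Orthonormal left and right factors from reduced QR of Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((n, k)))
    V, _ = np.linalg.qr(rng.standard_normal((p, k)))
    # Rank-k signal with exactly the requested singular values.
    B = U @ np.diag(signal_singular_values) @ V.T
    # Gaussian noise matrix.
    E = sigma * rng.standard_normal((n, p))
    return B + E


# Assumed example: true rank 2, with signal well above the noise level.
X = simulate_noisy_low_rank(n=50, p=10, signal_singular_values=[60.0, 40.0])
singular_values = np.linalg.svd(X, compute_uv=False)
print(np.round(singular_values, 2))
```

With a signal this strong, the first two singular values stand clearly apart from the rest. The harder case, where signal and noise singular values are of comparable size, is the one in which a formal test of the kind proposed here is needed.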