
Seminar Series: Karl Rohe

Karl Rohe
October 8, 2020
All Day
Virtual

Title

Vintage Factor Analysis with Varimax Performs Statistical Inference, with an Application to a Large Twitter Graph

Meeting Link

*Please be sure to mute upon entry

Speaker

Karl Rohe, University of Wisconsin-Madison, Department of Statistics

Abstract

This talk has two parts. The first is methodological and theoretical; the second is applied.

 

Part 1:

Factor analysis is nearly a century old and remains popular with practitioners today. A key step, the factor rotation, is historically controversial because it appears to be unidentifiable. This controversy goes back as far as Charles Spearman, and the unidentifiability is still reported in all modern multivariate textbooks. Part 1 of this talk will resolve this controversy and provide a positive theory for PCA with a varimax rotation. Just as sparsity helps to find a solution in p > n regression, we show that sparsity resolves the rotational invariance of factor analysis. PCA + varimax is fast to compute and provides a unified spectral estimation strategy for stochastic blockmodels, topic models (LDA), and nonnegative matrix factorization. Moreover, the estimator is consistent for an even broader class of models, and the old factor analysis diagnostics (which have been used for nearly a century) assess the identifiability.
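For readers unfamiliar with the procedure, the following is a minimal sketch of PCA followed by a varimax rotation, written with NumPy. It is an illustration under stated assumptions, not the speaker's implementation: the function names (varimax, varimax_criterion, pca_varimax) are hypothetical, the rotation uses the common SVD-based iteration for the classical varimax criterion, and the PCA step takes the top-k left singular vectors without centering or scaling.

```python
import numpy as np


def varimax_criterion(L):
    # Sum over columns of the variance of the squared loadings.
    return float(np.sum(np.var(L ** 2, axis=0)))


def varimax(loadings, max_iter=200, tol=1e-10):
    """Rotate a (p x k) loading matrix toward the varimax optimum.

    Returns the rotated loadings and the orthogonal (k x k) rotation R.
    """
    p, k = loadings.shape
    R = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion (up to a constant factor).
        G = loadings.T @ (L ** 3 - L * (np.sum(L ** 2, axis=0) / p))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt  # nearest orthogonal matrix to the gradient
        cur = s.sum()
        if cur < prev * (1 + tol):  # stop once the gain is negligible
            break
        prev = cur
    return loadings @ R, R


def pca_varimax(A, k):
    # Assumption for this sketch: use the top-k left singular vectors of A
    # as the principal components, with no centering or scaling.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return varimax(U[:, :k])
```

Because R is orthogonal, the rotation leaves the fitted subspace unchanged; it only chooses the basis within that subspace whose loadings are closest to sparse, which is the sense in which sparsity can single out one rotation.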

 

Part 2:

Part 2 will discuss our lab’s website, murmuration.wisc.edu, which tracks 24 flocks of “elite-sphere” Twitter users who discuss politics and current events. These flocks were computed with PCA + varimax applied to a ~400k x ~1M matrix that records whom “elite-sphere” Twitter users follow. We found flocks that represent Bernie Bros, White Nationalists, Academics, Media, and others. Every day, we analyze ~500k tweets to document the key news events, list the flocks that discussed each event, and describe how each flock discussed each event.