
Seminar: Michael Schweinberger

Statistics Seminar
February 5, 2013
All Day
Nineteenth Avenue 140W, Room 270

Title

Second-generation exponential-family models of networks: Scaling up

Speaker

Michael Schweinberger, Penn State University

Abstract

In this talk, I consider two important problems in the statistical analysis of networks (e.g., social networks), both of which arise from a lack of scalability.

The first problem is model degeneracy in exponential-family models with transitivity and other forms of dependence, which has obstructed the statistical analysis of networks more than anything else. Model degeneracy is rooted in the lack of scalability of exponential-family models. I introduce a novel class of second-generation exponential-family models that addresses the lack of scalability of first-generation models and reduces the problem of model degeneracy while retaining the scientific appeal of first-generation models, i.e., the simplicity and flexibility to model interesting forms of dependence, including transitivity. I discuss a Bayesian approach based on auxiliary-variable Markov chain Monte Carlo methods and demonstrate that second-generation exponential-family models make it possible to model transitivity without inducing model degeneracy.

The second problem I consider is the lack of scalability of statistical algorithms. I introduce a novel generalized variational EM algorithm that increases scalability by multiple orders of magnitude. The algorithm takes advantage of the sparsity of networks, convenient convexity properties of exponential-family models, and minorization-maximization (MM) algorithms. I apply it to a World Wide Web network with more than 131,000 nodes and 17 billion edge variables.