Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models
Vishesh Karwa, Harvard University
Differential privacy has emerged as a powerful tool to reason rigorously about privacy and confidentiality issues. In its purest form, differential privacy limits direct access to raw data, allowing interaction only through a noisy interface. This requires new approaches to statistical inference. In this talk, I will introduce the definition of differential privacy, followed by some of its key properties. I will present a framework for performing statistical inference under the constraint of differential privacy and its connections to measurement error and missing data models. The primary focus will be on sharing social network data for estimation of exponential-family random graph models. A case study using a version of the Enron email corpus dataset demonstrates the application and usefulness of the proposed techniques. They address the challenging problem of maintaining privacy while supporting open access to network data, both to ensure the reproducibility of existing studies and to enable the discovery of new scientific insights. We use a simple yet effective randomized response mechanism to generate synthetic networks under edge differential privacy, and then use likelihood-based inference for missing data and Markov chain Monte Carlo techniques to fit exponential-family random graph models to the generated synthetic networks.
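As an illustration of the randomized response step, the following minimal Python sketch flips each potential edge of a network independently. It is not taken from the talk: it assumes an undirected simple graph stored as a 0/1 adjacency matrix, and it uses the common choice of flip probability 1/(1 + e^ε), under which changing a single edge alters the probability of any output by a factor of at most e^ε. The names `flip_probability` and `randomized_response` are hypothetical.

```python
import math
import random


def flip_probability(epsilon):
    """Probability of flipping one dyad (assumed form: 1 / (1 + e^epsilon))."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(adj, epsilon, rng=None):
    """Return a synthetic adjacency matrix in which every dyad of `adj`
    is flipped independently with probability flip_probability(epsilon).

    Assumes `adj` is a symmetric 0/1 matrix (list of lists) with a zero
    diagonal, i.e. an undirected graph without self-loops.
    """
    if rng is None:
        rng = random.Random()
    n = len(adj)
    p = flip_probability(epsilon)
    out = [[0] * n for _ in range(n)]
    # Visit each unordered pair (i, j) once so the output stays symmetric.
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j]
            if rng.random() < p:
                bit = 1 - bit  # flip edge <-> non-edge
            out[i][j] = out[j][i] = bit
    return out


if __name__ == "__main__":
    # Toy 4-node path graph, used only for demonstration.
    graph = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    synthetic = randomized_response(graph, epsilon=1.0, rng=random.Random(42))
    for row in synthetic:
        print(row)
```

Because the flip probability is known and public, the released network can be treated as a noisy version of the true one, which is what makes likelihood-based inference for missing or mismeasured data applicable at the model-fitting stage.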