
Title
Estimation of Probabilities in Large Incomplete Contingency Tables Using Semi-Parametric Mixture Models
Speaker
Daniel Manrique-Vallier, Duke University
Abstract
Many discrete multivariate datasets include impossible combinations of variables, also known as structural zeros. Models based on conditional independence assumptions, such as latent class models, offer efficient estimation of complex dependencies and are often appropriate tools for prediction. However, Bayesian versions of latent variable models for categorical data typically do not handle structural zeros appropriately. Allowing non-zero probability for impossible combinations results in biased estimation of joint and conditional probabilities, even for feasible combinations. In this talk, I introduce a general estimation approach based on a semi-parametric specification and an MCMC sample-augmentation strategy. I apply this method to two common problems in social science research: the estimation of disclosure risk in public-use datasets, and the estimation of the size of elusive populations. The approach delivers excellent results in both applications.
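To make the bias claim concrete, here is a minimal sketch of a toy case that is not taken from the talk. It assumes a hypothetical 2x2 table whose cell (1, 1) is a structural zero, and it swaps in a plain independence model for the latent class mixture, with no MCMC involved. The table, the uniform margins, and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical 2x2 table (an assumption for illustration): 1 marks a feasible
# cell, 0 marks a structural zero. Here cell (1, 1) is impossible.
feasible = np.array([[1.0, 1.0],
                     [1.0, 0.0]])

# Assumed data-generating model: independent uniform margins, truncated to
# the feasible cells and renormalized.
base = np.outer([0.5, 0.5], [0.5, 0.5])
true_probs = base * feasible
true_probs /= true_probs.sum()

# Naive fit: an independence model over the full table, built from the
# population margins, which ignores the structural zero.
row_margin = true_probs.sum(axis=1)
col_margin = true_probs.sum(axis=0)
naive_probs = np.outer(row_margin, col_margin)

print("true cell probabilities:\n", true_probs)
print("naive independence fit:\n", naive_probs)
```

In this toy case each feasible cell has true probability 1/3, while the naive fit assigns 4/9, 2/9, and 2/9 to the feasible cells and 1/9 to the impossible one. The probabilities of the feasible cells are therefore distorted too, not only that of the structural zero.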