Seminar: Zheng Joyce Wang

Thu, October 25, 2018

All Day

209 W. Eighteenth Ave. (EA), Room 170

Title

Data Fusion Using Hilbert Space Multi-Dimensional Models

Speaker

Zheng Joyce Wang, School of Communication, The Ohio State University

Abstract

With the striking advancement of modern data collection methods, complex and massive data sets (e.g., “big data”) are generated from various sources and contexts that are conceptually connected. This promises a better understanding of complex social and behavioral phenomena, but it also presents unprecedented challenges for integrating and interpreting the data. When large data sets are collected from different contexts or conditions, they can often be summarized by contingency tables, also called cross-tabulation tables. Given K such tables, how can they be integrated and synthesized into a compressed, coherent, and interpretable representation? Currently, a common solution is to try to construct a p-way joint probability distribution that reproduces the frequency data observed in the K tables. Bayesian causal networks are then often used to reduce the number of estimated parameters by imposing conditional independence assumptions. Unfortunately, in many empirical data sets, no p-way joint distribution exists that can reproduce the observed tables.
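As a toy illustration of the last point, the following minimal sketch (not taken from the talk) uses three hypothetical binary variables A, B, and C whose pairwise tables say that every pair always disagrees. Any 3-way joint distribution must assign probability zero to a cell that contradicts a zero entry in one of the tables, so if every cell is ruled out, no joint distribution can exist. The table values, variable names, and the helper `admissible_cells` are assumptions made for illustration only, and the check is sufficient for non-existence rather than a general existence test.

```python
from itertools import product

# Hypothetical pairwise tables: each maps (value_1, value_2) -> probability.
# Here every pair of variables always disagrees (perfect anti-correlation).
anti = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}
tables = {("A", "B"): anti, ("B", "C"): anti, ("A", "C"): anti}
variables = ["A", "B", "C"]


def admissible_cells(tables, variables):
    """Return the joint cells that no table forces to probability zero."""
    cells = []
    for values in product([0, 1], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A joint cell can carry mass only if every pairwise table
        # gives positive probability to the pair of values it implies.
        if all(t[(assignment[x], assignment[y])] > 0
               for (x, y), t in tables.items()):
            cells.append(values)
    return cells


cells = admissible_cells(tables, variables)
# If no cell may carry mass, the probabilities cannot sum to 1,
# so no 3-way joint distribution reproduces the three tables.
joint_ruled_out = len(cells) == 0
print(cells, joint_ruled_out)
```

With three binary variables, at least two must share a value, so every cell conflicts with one of the tables. A non-empty result would not by itself prove that a joint distribution exists; that would require a fuller feasibility check.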

This talk presents general procedures for constructing, estimating, and testing Hilbert space multi-dimensional (HSM) models, which are built from quantum probability theory. HSM models can be applied to collections of K different contingency tables obtained from a set of p variables that are measured under different contexts. A context is defined by the subset of the p variables that is measured to form a table. HSM models represent the collection of K tables in a low-dimensional vector space, even when no single joint probability distribution across the p variables exists. The resulting parameter estimates provide a simple and informative interpretation of the complex collection of tables. An empirical application will be discussed to illustrate these ideas.
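To give a feel for how a vector-space representation can accommodate tables that no single joint distribution fits, here is a minimal sketch of the basic quantum probability rule, not the HSM estimation procedure itself. It assumes a two-dimensional real vector space, a hypothetical state vector `psi`, and two hypothetical yes/no measurements A and B represented by projectors at different angles; all names and angles are illustrative assumptions. The probability of answering "yes" to both depends on the order in which the questions are asked, which a single classical joint distribution cannot reproduce.

```python
import math


def projector(theta):
    """2x2 projector onto the unit vector (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]


def apply(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def sq_norm(v):
    """Squared length of a 2-vector."""
    return v[0] ** 2 + v[1] ** 2


# Hypothetical unit-length state vector and "yes" projectors (assumed values).
psi = [1.0, 0.0]
P_A = projector(math.pi / 6)
P_B = projector(math.pi / 3)

# Probability of "yes" to A and then "yes" to B: ||P_B P_A psi||^2.
p_a_then_b = sq_norm(apply(P_B, apply(P_A, psi)))
# Probability of "yes" to B and then "yes" to A: ||P_A P_B psi||^2.
p_b_then_a = sq_norm(apply(P_A, apply(P_B, psi)))

print(p_a_then_b, p_b_then_a)
```

Because the two projectors do not commute, the two sequential probabilities differ, yet both come from the same state vector in the same two-dimensional space. In this toy setting, the angles play the role of interpretable parameters relating the variables to one another.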