Data Fusion Using Hilbert Space Multi-Dimensional Models
Zheng Joyce Wang, School of Communication, The Ohio State University
With striking advances in modern data collection methods, complex and massive data sets (so-called “big data”) are being generated from various sources and contexts that are conceptually connected. This promises a better understanding of complex social and behavioral phenomena, but it also presents unprecedented challenges for integrating and interpreting the data. When large data sets are collected under different contexts or conditions, they can often be summarized by contingency tables (cross-tabulations). How can K such tables be integrated and synthesized into a compressed, coherent, and interpretable representation? Currently, a common solution is to construct a p-way joint probability distribution over all p observed variables that reproduces the frequency data in the K tables. Bayesian causal networks are then often used to reduce the number of estimated parameters by imposing conditional independence assumptions. Unfortunately, in many empirical data sets no p-way joint distribution exists that can reproduce all of the observed tables.
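The claim that no p-way joint distribution may exist can be illustrated with a small, hypothetical case: three binary variables A, B, C, each pair observed in a separate context (K = 3 tables, p = 3). The sketch below is not the author's Hilbert space method; it only checks two necessary conditions for a 3-way joint to exist. First, each variable must have the same marginal in every table that contains it. Second, for any joint over three bits the disagreement probabilities must satisfy the triangle conditions P(A≠C) ≤ P(A≠B) + P(B≠C) (and its permutations) and P(A≠B) + P(B≠C) + P(A≠C) ≤ 2. The table values and function names are invented for illustration. Passing these checks does not by itself prove a joint exists, but failing them proves it does not.

```python
from itertools import product

# Hypothetical 2x2 tables, each collected in a different context.
# Keys are (variable_1, variable_2); cells map (x, y) -> relative frequency.
observed = {
    ("A", "B"): {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5},
    ("B", "C"): {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5},
    ("A", "C"): {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0},
}


def disagreement(table):
    """P(X != Y) computed from a 2x2 table."""
    return table[(0, 1)] + table[(1, 0)]


def marginal(table, position):
    """P(variable at `position` == 1) from a 2x2 table."""
    return sum(p for cell, p in table.items() if cell[position] == 1)


def consistent_marginals(tables, tol=1e-9):
    """Each variable must have the same marginal in every table containing it."""
    seen = {}
    for pair, table in tables.items():
        for position, var in enumerate(pair):
            m = marginal(table, position)
            if var in seen and abs(seen[var] - m) > tol:
                return False
            seen[var] = m
    return True


def passes_triangle_conditions(tables, tol=1e-9):
    """Necessary conditions for a joint over three binary variables.

    For any assignment of three bits, the number of disagreeing pairs is 0 or 2,
    so the disagreement indicators obey the triangle inequalities and sum to <= 2.
    """
    d_ab = disagreement(tables[("A", "B")])
    d_bc = disagreement(tables[("B", "C")])
    d_ac = disagreement(tables[("A", "C")])
    return (
        d_ac <= d_ab + d_bc + tol
        and d_ab <= d_ac + d_bc + tol
        and d_bc <= d_ab + d_ac + tol
        and d_ab + d_bc + d_ac <= 2 + tol
    )


def pairwise_from_joint(joint):
    """Marginalize a 3-way joint {(a, b, c): p} into the three pairwise tables."""
    pairs = {("A", "B"): (0, 1), ("B", "C"): (1, 2), ("A", "C"): (0, 2)}
    out = {}
    for name, (i, j) in pairs.items():
        table = {cell: 0.0 for cell in product((0, 1), repeat=2)}
        for abc, p in joint.items():
            table[(abc[i], abc[j])] += p
        out[name] = table
    return out


if __name__ == "__main__":
    # Marginals agree across contexts, yet A=B and B=C always while A!=C always:
    # no single 3-way joint distribution can reproduce all three tables.
    print(consistent_marginals(observed), passes_triangle_conditions(observed))

    # Tables derived from a genuine joint (here uniform) always pass both checks.
    uniform = {abc: 1 / 8 for abc in product((0, 1), repeat=3)}
    derived = pairwise_from_joint(uniform)
    print(consistent_marginals(derived), passes_triangle_conditions(derived))
```

In the hypothetical observed tables every marginal is 0.5 in every context, so the inconsistency is invisible table by table and only appears when one tries to fuse them into a single joint distribution.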