Seminar: Mauricio Sadinle

Department of Statistics Seminar Series
January 26, 2017
All Day
209 W. Eighteenth Ave. (EA), Room 170

Title

A Bayesian Partitioning Approach to Duplicate Detection and Record Linkage

Speaker

Mauricio Sadinle, Duke University

Abstract

Record linkage techniques allow us to combine different sources of information from a common population in the absence of unique identifiers. Linking multiple files is an important task in a wide variety of applications, since it permits us to gather information that would not otherwise be available, or that would be too expensive to collect. In practice, an additional complication appears when the data files to be linked contain duplicates. Traditional approaches to duplicate detection and record linkage output independent decisions on the co-reference status of each pair of records, which leads to non-transitive decisions that have to be reconciled in some ad hoc fashion. The joint task of linking multiple data files and finding duplicate records within them can alternatively be posed as partitioning the data files into groups of co-referent records. We present an approach that targets this partition as the parameter of interest, thereby ensuring transitive decisions. Our Bayesian implementation allows us to incorporate prior information on the reliability of the fields in the data files, which is especially useful when no training data are available, and it also provides a proper account of the uncertainty in the duplicate detection and record linkage decisions. We show how this uncertainty can be incorporated in certain models for population size estimation using a case study on human rights violations during the civil war in El Salvador.
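
To make the transitivity point concrete, here is a minimal Python sketch. It is only an illustration and not the speaker's method or code: the record ids, the helper names `is_transitive` and `pairs_from_partition`, and the toy decisions are all assumptions introduced here. It shows that independent pairwise decisions can link records 0 and 1 and records 1 and 2 without linking 0 and 2, while any set of pairs read off a partition is transitive by construction.

```python
from itertools import combinations


def is_transitive(n, matches):
    """Return True if the pairwise match decisions over records 0..n-1 are transitive."""
    linked = {frozenset(p) for p in matches}
    for a, b, c in combinations(range(n), 3):
        # Count how many of the three pairs in this triple are declared matches.
        k = sum(frozenset(p) in linked for p in ((a, b), (b, c), (a, c)))
        if k == 2:  # two links without the third violate transitivity
            return False
    return True


def pairs_from_partition(partition):
    """Return the co-reference pairs implied by a partition (a list of groups of record ids)."""
    return {
        frozenset(p)
        for group in partition
        for p in combinations(sorted(group), 2)
    }


# Hypothetical independent pairwise decisions: 0~1 and 1~2, but not 0~2.
pairwise = [(0, 1), (1, 2)]

# Hypothetical partition of the same four records into co-referent groups.
partition = [{0, 1, 2}, {3}]

print(is_transitive(4, pairwise))
print(is_transitive(4, pairs_from_partition(partition)))
```

Treating the partition itself as the unknown, as the abstract describes, means a reconciliation step is never needed; the sketch only checks that property and does not attempt any Bayesian inference over partitions.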