
Seminar: Jessie Jeng

Statistics Seminar
January 31, 2011
All Day
Dreese Lab (DL), Room 0113

Title

Optimal Sparse Segment Identification: Theory and Applications in CNV Analysis

Speaker

Jessie Jeng, University of Pennsylvania

Abstract

Motivated by DNA copy number variation (CNV) analysis based on high-density single nucleotide polymorphism (SNP) data, we consider the problem of identifying sparse short segments in a long sequence of noisy observations, where the number, lengths and locations of the segments are unknown. We present a statistical characterization of the identifiable region of a segment, in which it is possible to reliably separate the segment from noise. An efficient likelihood ratio selection (LRS) procedure is developed and shown to be asymptotically optimal, in the sense that LRS can separate the signal segments from the noise as long as the signal segments lie in the identifiable regions. The problem is further studied in the setting where a set of aligned sequences of observations is available. Signal segments are classified into rare and common groups according to the proportion of sequences that carry them. A proportion adaptive segmentation (PAS) procedure is proposed, and its asymptotic optimality is shown for detecting both rare and common segments. Both LRS and PAS are demonstrated via simulations and CNV analysis of high-density SNP data. The results show that the proposed methods can yield greater power for detecting the true segments than some standard signal identification procedures.
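The abstract describes LRS only at a high level. As a rough illustration of the general idea, and not the speaker's implementation, the sketch below scans every interval up to a maximum length with the Gaussian likelihood ratio statistic |sum of observations| / sqrt(interval length), keeps intervals above a threshold, and then greedily retains the strongest non-overlapping ones. It assumes independent Gaussian noise with known unit variance, a user-chosen maximum segment length `max_len`, and a default threshold of sqrt(2 log(n * max_len)); the function name `lrs_segments` and its parameters are hypothetical.

```python
import math

import numpy as np


def lrs_segments(y, max_len, threshold=None):
    """Hypothetical sketch of likelihood ratio selection of short segments.

    Assumes y = sparse piecewise-constant signal + N(0, 1) noise with known
    unit variance. Returns half-open intervals (start, end), sorted by start.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n == 0 or max_len < 1:
        raise ValueError("need a non-empty sequence and max_len >= 1")
    max_len = min(max_len, n)
    if threshold is None:
        # Assumed default: roughly a Bonferroni-type bound over ~n * max_len intervals.
        threshold = math.sqrt(2.0 * math.log(n * max_len))

    # Prefix sums make every interval sum an O(1) difference.
    csum = np.concatenate(([0.0], np.cumsum(y)))
    candidates = []
    for length in range(1, max_len + 1):
        sums = csum[length:] - csum[:-length]  # sums[k] = y[k:k+length].sum()
        stats = np.abs(sums) / math.sqrt(length)  # gains and losses both count
        for start in np.nonzero(stats > threshold)[0]:
            candidates.append((float(stats[start]), int(start), int(start) + length))

    # Greedy selection: strongest interval first, drop anything overlapping it.
    candidates.sort(key=lambda c: c[0], reverse=True)
    selected = []
    for stat, start, end in candidates:
        if all(end <= s or start >= e for _, s, e in selected):
            selected.append((stat, start, end))
    return sorted((s, e) for _, s, e in selected)
```

For example, on a noiseless sequence of 2000 zeros with a gain of +3 on positions 500 to 519 and a loss of -4 on positions 1200 to 1209, `lrs_segments(y, 30)` recovers exactly `[(500, 520), (1200, 1210)]`, because the statistic of an interval overlapping a constant segment is maximized when it coincides with that segment. With added noise, the selected intervals should overlap the true segments, though their boundaries may shift slightly. The multi-sequence PAS procedure is not sketched here.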