
Seminar: Sanjoy Dasgupta

Statistics Seminar
May 13, 2008
All Day
209 W. Eighteenth Ave. (EA), Room 170

Title

Random projection trees and low dimensional manifolds

Speaker

Sanjoy Dasgupta, University of California, San Diego

Abstract

The curse of dimensionality has traditionally been the bane of nonparametric statistics (histograms, kernel density estimation, nearest neighbor search, and so on), as reflected in running times and convergence rates that are exponentially bad in the dimension. This problem is all the more pressing as data sets become increasingly high-dimensional. Recently the field has been substantially rejuvenated, in part by the realization that a lot of real-world data that appears high-dimensional in fact has low "intrinsic" dimension, in the sense of lying close to a low-dimensional manifold. In the past few years, there has been enormous interest in learning such manifolds from data, and then using the learned structure to transform the data into a lower-dimensional space where standard statistical methods generically work better.

I'll exhibit a way to benefit from intrinsic low dimensionality without going to the trouble of explicitly learning the manifold's fine structure. Specifically, I'll present a simple variant of the ubiquitous k-d tree (a spatial data structure widely used in machine learning and statistics) that is provably adaptive to low-dimensional structure. We call this a "random projection tree" (RP tree).
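
For concreteness, here is a minimal Python sketch of the basic RP tree splitting rule, assuming the median-split variant: where a k-d tree splits a cell along a coordinate axis, an RP tree projects the cell's points onto a random unit direction and splits near the median projection. The names (make_rp_tree, min_leaf_size) are illustrative rather than from the talk, and the published tree perturbs the split point randomly rather than splitting exactly at the median; that refinement is omitted here.

```python
import numpy as np

def make_rp_tree(points, min_leaf_size=10, rng=None):
    """Recursively build a simple RP tree over an (n, d) array of points.

    Sketch of the basic rule only: random unit direction, median split.
    """
    rng = np.random.default_rng() if rng is None else rng
    if len(points) <= min_leaf_size:
        return {"leaf": True, "points": points}

    # Unlike a k-d tree, which splits along a coordinate axis,
    # draw the split direction uniformly at random from the unit sphere.
    d = points.shape[1]
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)

    # Split the cell at the median of the projected values.
    proj = points @ direction
    threshold = np.median(proj)
    left = points[proj <= threshold]
    right = points[proj > threshold]

    if len(right) == 0:
        # Degenerate cell (all projections tie at the median): stop splitting.
        return {"leaf": True, "points": points}

    return {
        "leaf": False,
        "direction": direction,
        "threshold": threshold,
        "left": make_rp_tree(left, min_leaf_size, rng),
        "right": make_rp_tree(right, min_leaf_size, rng),
    }
```

The random direction is the whole point of the construction: because splits are not tied to the coordinate axes of the ambient space, the diameters of the tree's cells can shrink at a rate governed by the data's intrinsic dimension, which is the adaptivity property the abstract refers to.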

Along the way, I'll discuss different notions of intrinsic dimension—motivated by manifolds, by local statistics, and by analysis on metric spaces—and relate them. I'll then prove that RP trees require resources that depend only on these dimensions rather than the dimension of the space in which the data happens to be situated.
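
As a point of reference, one standard metric-space notion of this kind (assumed here to be what the abstract means by "analysis on metric spaces") is the doubling, or Assouad, dimension:

```latex
% Doubling (Assouad) dimension of a set S in a metric space: the smallest d
% such that every ball, restricted to S, can be covered by 2^d balls of half
% the radius. At sufficiently small scales, a d-dimensional manifold has
% doubling dimension proportional to d, which is one way the notions relate.
\[
  \operatorname{dim_A}(S) \;=\; \min\bigl\{\, d \;:\; \forall x \in S,\ \forall r > 0,\
  B(x, r) \cap S \text{ can be covered by } 2^{d} \text{ balls of radius } r/2 \,\bigr\}.
\]
```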

This is work with Yoav Freund (UC San Diego).

Sanjoy Dasgupta is an Assistant Professor in the Department of Computer Science and Engineering at UC San Diego. He received his PhD from Berkeley, and spent two years at AT&T Research Labs before joining UCSD. His area of research is algorithmic statistics, with a focus on unsupervised learning. He has recently completed a textbook, "Algorithms" (with Christos Papadimitriou and Umesh Vazirani).

Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.