
Title
Probability Machines: Consistent Estimation of Probability and Risk Using Nonparametric Learning Machines
Speaker
James Malley, National Institutes of Health
Abstract
Many statistical learning machines provide an optimal classification for binary or categorical outcomes. However, probability estimates are better suited for risk evaluation using individual patient characteristics and for solving problems in personalized medicine. It is known that any statistical learning machine that is consistent for the nonparametric regression problem is also consistent for probability estimation. Such schemes can be called probability machines. For evaluating probabilistic forecasts it is also known that strictly proper scoring rules are preferred, one example being the Brier score. It is shown that any consistent scheme also minimizes the expected Brier score, so that evaluation of any probability machine is transparent. Probability machines discussed include regression random forests, nearest neighbors, and others, all of which can use any collection of predictors, of any size and arbitrary statistical structure. For comparison, the classical parametric approach of logistic regression is considered, along with the learning machine LogitBoost. Two synthetic and two real data sets illustrate the machines. A method is demonstrated for whole-genome probability profiles to locate transcription start sites. Several extensions are discussed, including prognostic forecasts and treatment plans, probabilistic ranking of predictor importance, and network detection.
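
A minimal sketch of the core idea, not taken from the talk itself: a regression learning machine is fit to the 0/1 outcome, its predictions are read as conditional probability estimates, and the forecast is scored with the Brier score. The use of scikit-learn's RandomForestRegressor and synthetic data here is an illustrative assumption, standing in for the regression random forest discussed above.

    # Sketch: a regression random forest used as a probability machine,
    # evaluated with the Brier score (all specifics are illustrative).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    # Synthetic binary outcome with a mix of informative and noise predictors.
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Regressing on the 0/1 outcome estimates the conditional mean,
    # i.e. P(Y = 1 | X); forest predictions are averages of leaf means,
    # so they already lie in [0, 1].
    rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5,
                               random_state=0)
    rf.fit(X_train, y_train)
    p_hat = np.clip(rf.predict(X_test), 0.0, 1.0)

    # Brier score = mean((p_hat - y)^2), a strictly proper scoring rule;
    # lower is better, and a consistent probability machine minimizes
    # its expected value.
    print("Brier score:", brier_score_loss(y_test, p_hat))

The same pattern applies to any other consistent nonparametric regression machine, for example a nearest-neighbors regressor, simply by swapping the estimator.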