
Title
Probability Machines: Consistent Estimation of Probability and Risk Using Nonparametric Learning Machines
Speaker
James Malley, National Institutes of Health
Abstract
Many statistical learning machines provide an optimal classification for binary or categorical outcomes. However, probability estimates are better suited for risk evaluation using individual patient characteristics and for solving problems in personalized medicine. It is known that any statistical learning machine that is consistent for the nonparametric regression problem is also consistent for probability estimation. Such schemes can be called probability machines. For evaluating probabilistic forecasts it is also known that strictly proper scoring rules are preferred, one example being the Brier score. It is shown that any consistent scheme also minimizes the expected Brier score, so that evaluation of any probability machine is transparent. Probability machines discussed include regression random forests, nearest neighbors, and others, all of which can use any collection of predictors, of any size and arbitrary statistical structure. For comparison, the classical parametric approach of logistic regression is considered, along with the learning machine LogitBoost. Two synthetic and two real data sets illustrate the machines. A method is demonstrated for whole-genome probability profiles to locate transcription start sites. Several extensions are discussed, including prognostic forecasts and treatment plans, probabilistic ranking of predictor importance, and network detection.
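
A minimal sketch of the core idea, not taken from the talk itself: a regression learning machine is fit to the 0/1 outcome, its predictions are read as conditional probability estimates, and the forecast is scored with the Brier score. The use of scikit-learn's RandomForestRegressor and synthetic data here is an illustrative assumption, standing in for the regression random forest discussed above.

    # Sketch: a regression random forest used as a probability machine,
    # evaluated with the Brier score (all specifics are illustrative).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    # Synthetic binary outcome with a mix of informative and noise predictors.
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Regressing on the 0/1 outcome estimates the conditional mean,
    # i.e. P(Y = 1 | X); forest predictions are averages of leaf means,
    # so they already lie in [0, 1].
    rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5,
                               random_state=0)
    rf.fit(X_train, y_train)
    p_hat = np.clip(rf.predict(X_test), 0.0, 1.0)

    # Brier score = mean((p_hat - y)^2), a strictly proper scoring rule;
    # lower is better, and a consistent probability machine minimizes
    # its expected value.
    print("Brier score:", brier_score_loss(y_test, p_hat))

The same pattern applies to any other consistent nonparametric regression machine, for example a nearest-neighbors regressor, simply by swapping the estimator.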