
Title
BART: Finding Low Dimensional Structure in High Dimensional Data
Speaker
Edward I. George, The Wharton School, University of Pennsylvania
Abstract
Consider the canonical regression setup where one wants to model the relationship between Y, a variable of interest, and x1, ..., xp, p potential predictor variables. For this general problem we propose BART (Bayesian Additive Regression Trees), a new approach to discover the form of f(x1, ..., xp) ≡ E(Y | x1, ..., xp) and draw inference about it. BART approximates f by a Bayesian "sum-of-trees" model where each tree is constrained by a prior to be a weak learner, as in boosting. Fitting and inference are accomplished via an iterative backfitting MCMC algorithm. By using a large number of trees, which essentially yields an over-complete basis for f, we have found BART to be remarkably effective at finding highly nonlinear relationships hidden within a large number of irrelevant potential predictors.
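The sum-of-trees-with-backfitting structure described above can be sketched in a few lines. The sketch below is illustrative only and falls well short of BART itself: it uses single-split stumps refit by least squares rather than trees sampled from a posterior via MCMC, and all function names and data are hypothetical.

```python
# Illustrative sketch only: BART is a *Bayesian* sum-of-trees model fit by
# MCMC; here we mimic just the "sum of weak tree learners via backfitting"
# idea with one-split stumps and squared-error refits (no priors, no
# posterior sampling). All names and data are hypothetical.

def fit_stump(xs, rs):
    """Fit the best single-split stump to residuals rs over 1-D inputs xs."""
    best = None
    for t in sorted(set(xs))[:-1]:          # largest value gives an empty side
        left  = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x >  t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def backfit_sum_of_trees(xs, ys, n_trees=20, sweeps=5):
    """Backfitting: refit each tree in turn to the residual of the others."""
    trees = [lambda x: 0.0 for _ in range(n_trees)]
    for _ in range(sweeps):
        for j in range(n_trees):
            resid = [y - sum(trees[k](x) for k in range(n_trees) if k != j)
                     for x, y in zip(xs, ys)]
            trees[j] = fit_stump(xs, resid)
    return lambda x: sum(tree(x) for tree in trees)

xs = [i / 50 for i in range(50)]
ys = [x * x for x in xs]                    # a nonlinear f(x) to recover
f_hat = backfit_sum_of_trees(xs, ys)
max_err = max(abs(f_hat(x) - y) for x, y in zip(xs, ys))
```

Each stump on its own is a weak learner, but the backfitted sum recovers the nonlinear target closely; BART additionally constrains each tree through its prior and averages over posterior draws rather than committing to one fit.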
Meet the speaker in Room 212, Cockins Hall, at 4:30 p.m. Refreshments will be served.