
Title
A New Robust Partial Least Squares Regression Method: RoPLS
Speaker
Asuman Turkmen, Auburn University
Abstract
Most traditional statistical techniques are designed for low-dimensional data sets, where the number of observations (n) is greater than the number of variables (p). Applying these methods to problems such as predicting a patient's survival time or tumor class from high-dimensional data (n << p) is a difficult and challenging task. The partial least squares regression (PLSR) method is gaining importance in many scientific fields that require preprocessing and analyzing high-dimensional data. The main idea in PLSR is to summarize high-dimensional and/or collinear predictor variables into a smaller set of uncorrelated, so-called latent variables that have the best predictive power. Although PLSR handles the multicollinearity problem, it fails to deal with data containing outliers, because it is based on maximizing the sample covariance between the response(s) and a set of predictor variables, a quantity known to be sensitive to outliers. Multicollinearity and outliers are common in real data sets, which creates a need for robust PLSR methods.
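As a rough illustration of the latent-variable idea described above, the following Python sketch extracts classical (non-robust) PLS components for a single response by repeatedly taking the direction of maximal sample covariance with the response and deflating the predictors. It is a minimal textbook-style sketch, not the speaker's implementation; the function name pls1_scores and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def pls1_scores(X, y, n_components):
    """Classical PLS for one response: return scores T (n x k) and weights W (p x k)."""
    Xk = X - X.mean(axis=0)          # center predictors
    yc = y - y.mean()                # center response
    n, p = Xk.shape
    T = np.zeros((n, n_components))
    W = np.zeros((p, n_components))
    for a in range(n_components):
        w = Xk.T @ yc                # direction maximizing sample covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                   # latent variable (score vector)
        load = Xk.T @ t / (t @ t)    # loading of X on this score
        Xk = Xk - np.outer(t, load)  # deflate X so later scores are uncorrelated
        T[:, a] = t
        W[:, a] = w
    return T, W

# Hypothetical high-dimensional example with n << p
rng = np.random.default_rng(0)
n, p = 20, 200
X = rng.normal(size=(n, p))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=n)

T, W = pls1_scores(X, y, 3)
gram = T.T @ T
off_diag = gram - np.diag(np.diag(gram))
print(np.abs(off_diag).max())        # numerically zero: scores are mutually orthogonal
```

Because each weight vector is built from the ordinary sample covariance, a single extreme observation can pull it arbitrarily far, which is the weakness the abstract attributes to classical PLSR.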
The aim of this presentation is to provide a brief overview of PLSR and to introduce the proposed robust PLSR method, RoPLS, which is based on weights calculated by the BACON or PCOUT algorithms, together with robust criteria for determining the optimal number of components, a very important issue in building a PLSR model. Benchmark data sets and simulation studies are used to demonstrate the performance of the proposed method, along with diagnostic plots to visualize and classify the outliers. The non-robustness of classical PLSR is illustrated by its unbounded sensitivity curve, whereas RoPLS, which yields a bounded sensitivity curve, is shown to be a robust method.
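To make the notion of a sensitivity curve concrete, the Python sketch below computes the empirical sensitivity curve n * (T(x_1, ..., x_{n-1}, z) - T(x_1, ..., x_{n-1})) for the classical sample covariance and for a weighted covariance with 0/1 weights. The weights here come from a simple median/MAD rejection rule, which is only a stand-in chosen for brevity; it is not BACON, PCOUT, or RoPLS, and all function names and data are assumptions for illustration.

```python
import numpy as np

def mad_weights(x, y, cutoff=3.0):
    """0/1 weights: reject points with a large robust z-score in either coordinate."""
    def robust_z(v):
        med = np.median(v)
        mad = 1.4826 * np.median(np.abs(v - med))
        return np.abs(v - med) / mad
    return ((robust_z(x) <= cutoff) & (robust_z(y) <= cutoff)).astype(float)

def cov_classical(x, y):
    return np.mean((x - x.mean()) * (y - y.mean()))

def cov_weighted(x, y):
    w = mad_weights(x, y)
    mx = np.sum(w * x) / w.sum()
    my = np.sum(w * y) / w.sum()
    return np.sum(w * (x - mx) * (y - my)) / w.sum()

def sensitivity_curve(stat, x, y, z_values):
    """Empirical sensitivity curve when the point (z, z) is added to the sample."""
    n = len(x) + 1
    base = stat(x, y)
    return np.array([n * (stat(np.append(x, z), np.append(y, z)) - base)
                     for z in z_values])

# Hypothetical clean sample of n - 1 = 49 observations
rng = np.random.default_rng(1)
x = rng.normal(size=49)
y = 0.5 * x + 0.5 * rng.normal(size=49)

z_values = np.array([0.0, 5.0, 50.0, 500.0])
sc_classical = sensitivity_curve(cov_classical, x, y, z_values)
sc_weighted = sensitivity_curve(cov_weighted, x, y, z_values)
print(sc_classical)   # grows without bound as z increases
print(sc_weighted)    # levels off once the added point is rejected
```

The classical curve grows roughly quadratically in z, while the weighted version stops changing once the added point receives zero weight, which is the qualitative contrast between an unbounded and a bounded sensitivity curve.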
Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.