
Dissertation Defense: Zhenbang Jiao, Statistics PhD Candidate
Title: Case Sensitivity in Lasso Regression: Diagnostics, Model Complexity, and Robust Model Evaluation
Abstract: This dissertation studies case influence in Lasso regression using Cook's distance. Because the L1 penalty is not differentiable, the Lasso coefficient estimates have no closed form, and neither does Cook's distance. To obtain the case-deleted Lasso solution without refitting the model, we develop a case-weight approach that tracks how the fit changes as the weight on each observation is decreased continuously from 1 to 0. This yields a solution path that is piecewise linear in a simple function of the weight parameter and naturally produces Cook's distance for the Lasso. Moreover, we introduce a case influence graph to visualize how the influence of each data point changes with the penalty parameter, which also offers new insights for model evaluation.
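To make the target quantity concrete, the sketch below computes a Cook's distance analogue for the Lasso by brute force, refitting with each observation deleted; the case-weight path described above is designed to recover these case-deleted solutions without such refitting. The scaling by the active-set size and a residual variance estimate is one conventional choice, not necessarily the definition used in the dissertation.

```python
# Brute-force sketch: leave-one-out influence on Lasso fitted values.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_cooks_distance(X, y, lam):
    n, p = X.shape
    full = Lasso(alpha=lam).fit(X, y)
    yhat_full = full.predict(X)
    k = max(int(np.sum(full.coef_ != 0)) + 1, 1)            # active set size (+ intercept)
    sigma2 = np.sum((y - yhat_full) ** 2) / max(n - k, 1)   # crude error variance estimate
    D = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        loo = Lasso(alpha=lam).fit(X[keep], y[keep])        # refit without case i
        yhat_loo = loo.predict(X)                           # fitted values on all n points
        D[i] = np.sum((yhat_full - yhat_loo) ** 2) / (k * sigma2)
    return D

# Example: case influence at a fixed penalty level
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=50)
print(lasso_cooks_distance(X, y, lam=0.1))
```

Repeating this computation over a grid of penalty values gives one influence curve per observation, which is essentially the information displayed in the case influence graph.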
Building on the case-weight solution path, we present an alternative way to derive the degrees of freedom of the Lasso. Our approach examines the sensitivity of each fitted value to a change in its observed response, obtained by perturbing the corresponding case weight in the vicinity of the full data. The approach extends to other regularized regression models with squared error loss.
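The sketch below illustrates the sensitivity being measured with the familiar divergence formula df = sum_i d(yhat_i)/d(y_i), approximated by finite differences in the response rather than by the case-weight perturbation used in the dissertation. The step size and the no-intercept setup are illustrative choices; for the Lasso at a fixed penalty, the estimate should be close to the number of nonzero coefficients.

```python
# Finite-difference sketch of the divergence formula for degrees of freedom.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_df_finite_difference(X, y, lam, eps=1e-4):
    n = len(y)
    base = Lasso(alpha=lam, fit_intercept=False).fit(X, y).predict(X)
    df = 0.0
    for i in range(n):
        y_pert = y.copy()
        y_pert[i] += eps                                    # nudge the i-th response
        pert = Lasso(alpha=lam, fit_intercept=False).fit(X, y_pert).predict(X)
        df += (pert[i] - base[i]) / eps                     # d(yhat_i) / d(y_i)
    return df

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 15))
y = X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=80)
fit = Lasso(alpha=0.05, fit_intercept=False).fit(X, y)
print("finite-difference df:", lasso_df_finite_difference(X, y, lam=0.05))
print("nonzero coefficients:", int(np.sum(fit.coef_ != 0)))
```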
From the case influence graph, we observe that the average Cook's distance across all data points can serve as a measure of model robustness. Inspired by this observation, we combine the average Cook's distance with cross-validation (CV) into a single criterion for feature selection. Under mild assumptions, we show that this criterion is consistent in selecting the true feature set in linear regression, which is not the case for CV alone. The criterion can also be extended to generalized linear models.
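As a rough illustration of how such a criterion might be used, the sketch below scores each candidate penalty level (and hence each candidate feature set on the Lasso path) by adding the CV mean squared error to a weighted average Cook's distance. The additive form and the weight gamma are hypothetical choices made for illustration; the abstract does not spell out how the two quantities are combined. The influence computation mirrors the brute-force sketch above.

```python
# Hypothetical combined criterion: CV error + gamma * average Cook's distance.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

def avg_cooks_distance(X, y, lam):
    """Average leave-one-out influence on the Lasso fitted values (brute force)."""
    n = len(y)
    full = Lasso(alpha=lam).fit(X, y)
    yhat = full.predict(X)
    k = max(int(np.sum(full.coef_ != 0)) + 1, 1)
    sigma2 = np.sum((y - yhat) ** 2) / max(n - k, 1)
    D = [np.sum((yhat - Lasso(alpha=lam).fit(X[np.arange(n) != i],
                                             y[np.arange(n) != i]).predict(X)) ** 2)
         / (k * sigma2) for i in range(n)]
    return float(np.mean(D))

def combined_score(X, y, lam, gamma=1.0):
    cv_mse = -cross_val_score(Lasso(alpha=lam), X, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    return cv_mse + gamma * avg_cooks_distance(X, y, lam)

# Pick the penalty (and hence the selected feature set) minimizing the score
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))
y = 2 * X[:, 0] - X[:, 3] + rng.normal(size=100)
lams = [0.01, 0.05, 0.1, 0.2, 0.5]
best = min(lams, key=lambda lam: combined_score(X, y, lam))
print("selected penalty:", best)
print("selected features:", np.flatnonzero(Lasso(alpha=best).fit(X, y).coef_))
```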
Advisor: Yoonkyung Lee