
Seminar Series - Graduate Student Award: Bo Luan and Andrew Richards

Statistics Seminar
April 15, 2021
3:00PM - 4:00PM
TBA


Title

Predictive Model Degrees of Freedom in Linear Regression (Bo Luan)

Relaxing the Molecular Clock Assumption for Inference under the Multispecies Coalescent Model (Andrew Richards)

Meeting Link

Speaker

Bo Luan and Andrew Richards; Statistics PhD Students

Abstract

Predictive Model Degrees of Freedom in Linear Regression:
Overparametrized interpolating models have drawn increasing attention from the machine learning community. Some recent studies suggest that regularized interpolating models can generalize well. This phenomenon seemingly contradicts the conventional wisdom that interpolation tends to overfit the data and perform poorly on test data. Further, it appears to defy the bias-variance trade-off. One shortcoming of the existing theory is that the classical notion of model degrees of freedom fails to explain the intrinsic differences among interpolating models, since it focuses on estimating in-sample prediction error. This motivates an alternative measure of model complexity that can differentiate those interpolating models and take different test points into account. In particular, we propose a measure with a proper adjustment based on the squared covariance between the predictions and observations. Our analysis of the least squares method reveals some interesting properties of the measure, which can reconcile the “double descent” phenomenon with the classical theory. This opens the door to an extended definition of model degrees of freedom in modern predictive settings.
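As a rough illustration of the shortcoming described above (not the measure proposed in the talk), the sketch below computes the classical degrees of freedom of minimum-norm least squares as the trace of the hat matrix. The function name `classical_df`, the Gaussian design, and the sample sizes are assumptions chosen for the example. Once the number of predictors p reaches the sample size n, the fit interpolates and the classical value stays at n however large p becomes, so it cannot tell interpolating models apart.

```python
import numpy as np


def classical_df(X):
    """Classical degrees of freedom of minimum-norm least squares.

    For a linear smoother y_hat = H y, the classical definition
    sum_i Cov(y_hat_i, y_i) / sigma^2 reduces to trace(H).
    Here H = X X^+ is the projection onto the column space of X.
    """
    H = X @ np.linalg.pinv(X)
    return float(np.trace(H))


# Hypothetical setup: n = 10 observations, growing number of predictors p.
rng = np.random.default_rng(0)
n = 10
for p in (3, 10, 20, 50):
    X = rng.standard_normal((n, p))
    # trace(H) equals rank(X), which is min(n, p) for a generic design,
    # up to floating-point error.
    print(f"p = {p:2d}  classical df = {classical_df(X):.4f}")
```

For every p at or above n the printed value is n (up to rounding), even though these interpolating fits can behave very differently at new test points.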

Relaxing the Molecular Clock Assumption for Inference under the Multispecies Coalescent Model:
The inference of species relationships is complicated by the fact that different parts of genomes may have histories that differ from the history of the species as a whole. The multispecies coalescent process is commonly used to model one source of this divergence, incomplete lineage sorting, or ILS. Chifman and Kubatko (2015) previously developed a probability model under the multispecies coalescent and Jukes-Cantor substitution model when the molecular clock holds. Here, we present a generalization of that work to allow mutation rates to differ among species. This will enable better phylogenetic inference in cases where the molecular clock does not hold.
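To make the role of the molecular clock concrete, here is a minimal sketch of the standard Jukes-Cantor transition probabilities with a lineage-specific rate. It covers only the substitution-model ingredient and is not the speakers' probability model, which also involves the multispecies coalescent. The function name `jc69_prob`, the species labels, and the rate values are hypothetical.

```python
import math


def jc69_prob(same, branch_time, rate=1.0):
    """Jukes-Cantor probability of the end state of a branch given its start state.

    same        -- True for "same nucleotide at both ends", False for one
                   specific different nucleotide.
    branch_time -- length of the branch in time units.
    rate        -- substitutions per site per unit time on this lineage
                   (an illustrative parameter, not taken from the talk).
    """
    d = rate * branch_time  # expected substitutions per site
    decay = math.exp(-4.0 * d / 3.0)
    if same:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


# Hypothetical lineage-specific rates. Under a strict molecular clock every
# lineage would share one rate; relaxing the clock lets the rates differ.
rates = {"species_A": 1.0, "species_B": 2.5}
branch_time = 0.1
for name, rate in rates.items():
    p_same = jc69_prob(True, branch_time, rate)
    print(f"{name}: P(no net change over the branch) = {p_same:.4f}")
```

With equal branch times, the faster lineage has a lower probability of showing the same nucleotide at both ends of its branch, which is the kind of rate asymmetry a strict clock rules out.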