
Seminar Series: Ye Tian

February 4, 2025
3:00 pm - 4:00 pm
EA 170

Speaker: Ye Tian

Title: Transfer and Multi-task Learning: Statistical Insights for Modern Data Challenges

Abstract: Knowledge transfer, a core human ability, has inspired numerous data integration methods in machine learning and statistics. However, data integration faces significant challenges: (1) unknown similarity between data sources; (2) data contamination; (3) high-dimensionality; and (4) privacy constraints. This talk addresses these challenges in three parts across different contexts, presenting both innovative statistical methodologies and theoretical insights.

In Part I, I will introduce a transfer learning framework for high-dimensional generalized linear models that combines a pre-trained Lasso with a fine-tuning step. We provide theoretical guarantees for both estimation and inference, and apply the methods to predict county-level outcomes of the 2020 U.S. presidential election, uncovering valuable insights.
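
To make the two-step recipe concrete, here is a minimal sketch in the spirit of the framework (a toy linear-model version of my own, not the talk's implementation; the talk concerns generalized linear models, and all data, dimensions, and penalty levels below are invented for illustration): pre-train a Lasso on abundant source data, then fine-tune by fitting a sparse correction to the target residuals.

    # Toy sketch: pre-trained Lasso plus a sparse fine-tuning step.
    # The target coefficients are assumed close to the source coefficients,
    # so the correction learned in step 2 is sparse.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    p = 200
    beta_src = np.zeros(p); beta_src[:5] = 1.0        # sparse source signal
    beta_tgt = beta_src.copy(); beta_tgt[5] = 0.5     # target differs sparsely

    X_src = rng.normal(size=(1000, p))                # abundant source data
    y_src = X_src @ beta_src + rng.normal(size=1000)
    X_tgt = rng.normal(size=(100, p))                 # scarce target data
    y_tgt = X_tgt @ beta_tgt + rng.normal(size=100)

    # Step 1: pre-train on the source data.
    pre = Lasso(alpha=0.05).fit(X_src, y_src)

    # Step 2: fine-tune -- learn a sparse correction on the target data
    # and add it to the pre-trained coefficients.
    resid = y_tgt - X_tgt @ pre.coef_
    delta = Lasso(alpha=0.05).fit(X_tgt, resid).coef_
    beta_hat = pre.coef_ + delta

The fine-tuning step only needs to recover the sparse difference between the target and source coefficients, which is an easier problem than estimating the target coefficients from scratch on the small target sample.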

In Part II, I will explore an unsupervised learning setting where task-specific data is generated from a mixture model with heterogeneous mixture proportions. This complements the supervised learning setting discussed in Part I, addressing scenarios where labeled data is unavailable. We propose a federated gradient EM algorithm that is communication-efficient and privacy-preserving, providing estimation error bounds for the mixture model parameters.
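
A hedged toy sketch of one communication round may help fix ideas (my simplification, not the paper's algorithm: two components, unit variances, means shared across clients, client-specific proportions; the function names and step size are invented):

    # One round of a toy federated gradient-EM update for a two-component
    # Gaussian mixture: means are shared across clients, while mixture
    # proportions are client-specific and never leave the client.
    import numpy as np

    def local_e_step(x, mu0, mu1, pi):
        """Responsibilities for component 1 under unit-variance Gaussians."""
        log_r1 = np.log(pi) - 0.5 * np.sum((x - mu1) ** 2, axis=1)
        log_r0 = np.log(1.0 - pi) - 0.5 * np.sum((x - mu0) ** 2, axis=1)
        return 1.0 / (1.0 + np.exp(log_r0 - log_r1))

    def federated_round(clients, mu0, mu1, step=0.1):
        """Clients send gradient statistics only; the server averages them."""
        g0, g1, pis = [], [], []
        for x, pi in clients:                  # x: local data, pi: local weight
            r = local_e_step(x, mu0, mu1, pi)
            g1.append((r[:, None] * (x - mu1)).mean(axis=0))
            g0.append(((1.0 - r)[:, None] * (x - mu0)).mean(axis=0))
            pis.append(r.mean())               # updated locally, kept local
        mu0 = mu0 + step * np.mean(g0, axis=0)  # server-side gradient step
        mu1 = mu1 + step * np.mean(g1, axis=0)
        return mu0, mu1, pis

Only the low-dimensional gradient statistics cross the network in each round; the raw samples and the heterogeneous proportions stay on the clients, which is what makes a scheme like this communication-efficient and friendlier to privacy constraints than pooling the data.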

In Part III, I will introduce a representation-based multi-task learning framework that generalizes the distance-based similarity notion discussed in Parts I and II. This framework is closely related to modern applications of fine-tuning in image classification and natural language processing. I will discuss how this study enhances our understanding of the effectiveness of fine-tuning and the influence of data contamination on representation-based multi-task learning.
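
As a concrete (and deliberately simplified) picture of representation sharing, consider the linear version below, which is my illustration rather than the talk's method: tasks share a p-by-r matrix B, each task keeps its own r-dimensional head, and fine-tuning a new task freezes B and refits only the head.

    # Toy linear representation multi-task learning: y_t = X_t @ B @ w_t.
    # B is the shared representation; w_t is the task-specific head.
    import numpy as np

    rng = np.random.default_rng(1)
    p, r, n = 50, 3, 200
    B_true = np.linalg.qr(rng.normal(size=(p, r)))[0]   # shared subspace

    def make_task():
        w = rng.normal(size=r)
        X = rng.normal(size=(n, p))
        return X, X @ B_true @ w + 0.1 * rng.normal(size=n)

    tasks = [make_task() for _ in range(5)]

    # Learn B by alternating least squares across the source tasks.
    B = np.linalg.qr(rng.normal(size=(p, r)))[0]
    for _ in range(50):
        ws = [np.linalg.lstsq(X @ B, y, rcond=None)[0] for X, y in tasks]
        # With the heads fixed, the model is linear in vec(B): row i of
        # task t contributes kron(X_t[i], w_t) as a regressor.
        A = np.vstack([np.kron(X, w[None, :]) for (X, _), w in zip(tasks, ws)])
        b = np.concatenate([y for _, y in tasks])
        B = np.linalg.lstsq(A, b, rcond=None)[0].reshape(p, r)

    # Fine-tuning a new task: freeze the learned B, refit only the head.
    X_new, y_new = make_task()
    w_new = np.linalg.lstsq(X_new @ B, y_new, rcond=None)[0]

In this picture, contaminating even one task corrupts the shared B that every head depends on, which is one way to see why data contamination matters for representation-based multi-task learning.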

Finally, I will summarize the talk and briefly introduce my broader research interests. The three main sections of this talk are based on a series of papers [TF23, TWXF22, TWF24, TGF23] and a short course I co-taught at NESS 2024 [STL24]. More about me and my research can be found at https://yet123.com.