Seminar Series: Yuan Zhang

leaf
September 15, 2022
3:00PM - 4:00PM
EA170

Date Range
2022-09-15 15:00:00 2022-09-15 16:00:00 Seminar Series: Yuan Zhang Speaker: Yuan Zhang, Assistant Professor of Statistics, OSU Title: U-statistic reduction:  higher-order accurate inference, statistical-computational trade-off, and network moments   Abstract:  U-statistics play central roles in many statistical learning scenarios.  However, a major practical obstacle is computation: a degree-r U-statistic costs O(n^r) time to evaluate in the worst case.  To address this issue, researchers proposed to compute an incomplete U-statistic that averages over a selected set of O(n^\alpha) tuples (called ``design’’), where \alpha << r.  The past forty years of research on this topic, mostly from the perspective of experimental design, focuses on choosing designs to minimize the variance of the incomplete U-statistic.  However, little attention is paid to another important aspect of inference accuracy, namely, the accuracy of controlling inference risks (the type I error of a test and the actual coverage probability of a confidence interval) around their nominal levels.   This work fills this significant blank by proposing a higher-order accurate statistical inference framework for incomplete U-statistics.  The highlights of our contributions include the following.  Methodology-wise, our framework is widely applicable to both deterministic and randomized designs.  To analyze a newly proposed design, researchers only need to check two easy-to-verify assumptions we formulate.  This greatly simplifies the analysis task and makes it routine.  Our inference formulae are completely analytical and computable within O(n^\alpha) time, which is much faster than iterated bootstraps.  Theory-wise, our error bounds are much sharper than that in existing literature.  Our method achieves higher-order accurate risk control not only for non-degenerate kernels, but even for degenerate kernels under some designs.  Our results clearly reflect statistical-computational trade-off and provide a guidance for practitioners on choosing \alpha based on novel theoretical insights.  We also apply our approach to analyzing network U-statistics (known as network moments) and obtain the sharpest result to date.   This is a joint work with my PhD student Meijia Shao and Professor Dong Xia at HKUST.   Note: Seminars are free and open to the public. Reception to follow.   EA170 America/New_York public

Speaker: Yuan Zhang, Assistant Professor of Statistics, OSU

Title: U-statistic reduction:  higher-order accurate inference, statistical-computational trade-off, and network moments

 

Abstract: 

U-statistics play central roles in many statistical learning scenarios.  However, a major practical obstacle is computation: a degree-r U-statistic costs O(n^r) time to evaluate in the worst case.  To address this issue, researchers proposed to compute an incomplete U-statistic that averages over a selected set of O(n^\alpha) tuples (called ``design’’), where \alpha << r.  The past forty years of research on this topic, mostly from the perspective of experimental design, focuses on choosing designs to minimize the variance of the incomplete U-statistic.  However, little attention is paid to another important aspect of inference accuracy, namely, the accuracy of controlling inference risks (the type I error of a test and the actual coverage probability of a confidence interval) around their nominal levels.

 

This work fills this significant blank by proposing a higher-order accurate statistical inference framework for incomplete U-statistics.  The highlights of our contributions include the following.  Methodology-wise, our framework is widely applicable to both deterministic and randomized designs.  To analyze a newly proposed design, researchers only need to check two easy-to-verify assumptions we formulate.  This greatly simplifies the analysis task and makes it routine.  Our inference formulae are completely analytical and computable within O(n^\alpha) time, which is much faster than iterated bootstraps.  Theory-wise, our error bounds are much sharper than that in existing literature.  Our method achieves higher-order accurate risk control not only for non-degenerate kernels, but even for degenerate kernels under some designs.  Our results clearly reflect statistical-computational trade-off and provide a guidance for practitioners on choosing \alpha based on novel theoretical insights.  We also apply our approach to analyzing network U-statistics (known as network moments) and obtain the sharpest result to date.

 

This is a joint work with my PhD student Meijia Shao and Professor Dong Xia at HKUST.

 

Note: Seminars are free and open to the public. Reception to follow.