Ohio State is in the process of revising websites and program materials to accurately reflect compliance with the law. While this work occurs, language referencing protected class status or other activities prohibited by Ohio Senate Bill 1 may still appear in some places. However, all programs and activities are being administered in compliance with federal and state law.

Seminar Series: Yuan Zhang

leaf
September 15, 2022
3:00 pm - 4:00 pm
EA170

Speaker: Yuan Zhang, Assistant Professor of Statistics, OSU

Title: U-statistic reduction:  higher-order accurate inference, statistical-computational trade-off, and network moments

 

Abstract: 

U-statistics play central roles in many statistical learning scenarios.  However, a major practical obstacle is computation: a degree-r U-statistic costs O(n^r) time to evaluate in the worst case.  To address this issue, researchers proposed to compute an incomplete U-statistic that averages over a selected set of O(n^\alpha) tuples (called ``design’’), where \alpha << r.  The past forty years of research on this topic, mostly from the perspective of experimental design, focuses on choosing designs to minimize the variance of the incomplete U-statistic.  However, little attention is paid to another important aspect of inference accuracy, namely, the accuracy of controlling inference risks (the type I error of a test and the actual coverage probability of a confidence interval) around their nominal levels.

 

This work fills this significant blank by proposing a higher-order accurate statistical inference framework for incomplete U-statistics.  The highlights of our contributions include the following.  Methodology-wise, our framework is widely applicable to both deterministic and randomized designs.  To analyze a newly proposed design, researchers only need to check two easy-to-verify assumptions we formulate.  This greatly simplifies the analysis task and makes it routine.  Our inference formulae are completely analytical and computable within O(n^\alpha) time, which is much faster than iterated bootstraps.  Theory-wise, our error bounds are much sharper than that in existing literature.  Our method achieves higher-order accurate risk control not only for non-degenerate kernels, but even for degenerate kernels under some designs.  Our results clearly reflect statistical-computational trade-off and provide a guidance for practitioners on choosing \alpha based on novel theoretical insights.  We also apply our approach to analyzing network U-statistics (known as network moments) and obtain the sharpest result to date.

 

This is a joint work with my PhD student Meijia Shao and Professor Dong Xia at HKUST.

 

Note: Seminars are free and open to the public. Reception to follow.