
Speaker: Yuan Zhang, Assistant Professor of Statistics, OSU
Title: U-statistic reduction: higher-order accurate inference, statistical-computational trade-off, and network moments
Abstract:
U-statistics play central roles in many statistical learning scenarios. However, a major practical obstacle is computation: a degree-r U-statistic costs O(n^r) time to evaluate in the worst case. To address this issue, researchers proposed to compute an incomplete U-statistic that averages over a selected set of O(n^\alpha) tuples (called ``design’’), where \alpha << r. The past forty years of research on this topic, mostly from the perspective of experimental design, focuses on choosing designs to minimize the variance of the incomplete U-statistic. However, little attention is paid to another important aspect of inference accuracy, namely, the accuracy of controlling inference risks (the type I error of a test and the actual coverage probability of a confidence interval) around their nominal levels.
This work fills this significant blank by proposing a higher-order accurate statistical inference framework for incomplete U-statistics. The highlights of our contributions include the following. Methodology-wise, our framework is widely applicable to both deterministic and randomized designs. To analyze a newly proposed design, researchers only need to check two easy-to-verify assumptions we formulate. This greatly simplifies the analysis task and makes it routine. Our inference formulae are completely analytical and computable within O(n^\alpha) time, which is much faster than iterated bootstraps. Theory-wise, our error bounds are much sharper than that in existing literature. Our method achieves higher-order accurate risk control not only for non-degenerate kernels, but even for degenerate kernels under some designs. Our results clearly reflect statistical-computational trade-off and provide a guidance for practitioners on choosing \alpha based on novel theoretical insights. We also apply our approach to analyzing network U-statistics (known as network moments) and obtain the sharpest result to date.
This is a joint work with my PhD student Meijia Shao and Professor Dong Xia at HKUST.
Note: Seminars are free and open to the public. Reception to follow.