The following faculty, alumni, and students will represent The Ohio State University's Department of Statistics next week at the 2025 Joint Statistical Meetings (JSM) in Nashville, Tennessee. We invite you to join us in celebrating their achievements and to attend their sessions if you will be at the event.
Radu Herbei: Simulating the maximum and its location for constrained Brownian processes
Sunday, Aug 3: 3:05 PM - 3:20 PM
2769
Contributed Papers
Music City Center
We consider the problem of exact simulation from the joint distribution of the maximum and its location for several Brownian processes: the Brownian meander, the restricted Brownian meander, and the Brownian excursion. Such distributions have complicated probability density functions (pdfs), expressed in terms of infinite series, so a direct sampling approach is not feasible. In this work, we derive the joint pdf of the maximum and its location for the restricted Brownian meander process as an infinite series and devise exact sampling algorithms for all three processes above. We present a simulation study to assess the efficiency of our algorithms.
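For context, the naive alternative to exact sampling is to discretize the process on a fine grid and read off the maximum and its location. The sketch below does this for the Brownian excursion, using its classical representation as a three-dimensional Bessel bridge built from independent Brownian bridges; it is a baseline illustration only, not the exact algorithms of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def brownian_bridge(n, rng):
    """Standard Brownian bridge on a grid of n + 1 points over [0, 1]."""
    dt = 1.0 / n
    t = np.linspace(0.0, 1.0, n + 1)
    w = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
    return w - t * w[-1]  # pin the endpoint to zero

def excursion_max_and_argmax(n=10_000, rng=rng):
    """Discretized Brownian excursion via the 3-d Bessel bridge identity:
    e(t) = sqrt(b1(t)^2 + b2(t)^2 + b3(t)^2) for independent bridges b_i."""
    e = np.sqrt(sum(brownian_bridge(n, rng) ** 2 for _ in range(3)))
    k = int(np.argmax(e))
    return e[k], k / n  # (maximum, location of the maximum)

draws = np.array([excursion_max_and_argmax() for _ in range(1_000)])
print("mean max, mean argmax:", draws.mean(axis=0))
```

Discretization introduces a grid bias that vanishes only as n grows, which is precisely what exact simulation avoids.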
Seth Adarkwah Yiadom (Co-Author), Rebecca Aldridge (Speaker): Sensitivity Analyses for Nonignorable Selection Bias When Estimating Subgroup Parameters in Nonprobability Samples
Sunday, Aug 3: 4:00 PM - 5:50 PM
0564
Topic-Contributed Paper Session
Music City Center
Room: CC-104E
Selection bias in survey estimates is a major concern, both for nonprobability samples and for probability samples with low response rates. The proxy-pattern mixture model (PPMM) has been proposed as a method for conducting a sensitivity analysis that allows selection to depend on survey outcomes of interest, i.e., assuming a nonignorable selection mechanism. Indices based on the PPMM have been proposed and used to quantify the potential for nonignorable nonresponse or selection bias, including the SMUB for means and the MUBP for proportions. These methods require information from a reference data source, such as a large probability-based survey, with summary-level auxiliary information for the target population of interest (means, variances, and covariances of the auxiliary variables). To date, the SMUB/MUBP measures have exclusively been used to estimate bias in overall population-level estimates. Extension to domain-level estimates is straightforward if the reference data source contains the domain indicator, so that population-level margins within the domain of interest can be calculated.
However, interest may often lie in subgroups for which population-level summaries are not available. This occurs when the domain indicator is observed only on the survey (not in the reference data source), and it can also occur when the goal is estimation within intersectional subgroups for which stable, reliable population-level estimates of auxiliary variables may not be available. To address this issue, we propose creating nonignorable selection weights based on the PPMM and using these weights for domain estimation and subsequent calculation of the SMUB/MUBP within subgroups.
These PPMM selection weights rely on a single sensitivity parameter that ranges from 0 to 1 and captures a range of selection mechanisms, from ignorable to an "extreme" nonignorable mechanism in which selection depends only on the outcome of interest. The PPMM selection weights are based on the re-expression of the PPMM as a selection model, using the known equivalence between pattern-mixture models and selection models. In this talk, we briefly describe the re-expression of the PPMM as a selection model and illustrate the use of the novel nonignorable selection weights to estimate various subgroup quantities using the Census Household Pulse Survey under a range of assumptions on the selection mechanism.
Sebastian Kurtek (Speaker); Fangyi Wang, Oksana Chkrebtii (Co-Authors): Probabilistic Size-and-shape Functional Mixed Models
Sunday, Aug 3: 4:00 PM - 5:50 PM
0341
Invited Paper Session
Music City Center
Room: CC-102A
The reliable recovery and uncertainty quantification of a fixed effect function μ in a functional mixed model, for modeling population- and object-level variability in noisily observed functional data, is a notoriously challenging task: variations along the x and y axes are confounded with additive measurement error and cannot in general be disentangled. The question of which properties of μ can be reliably recovered then becomes important. We demonstrate that it is possible to recover the size-and-shape of a square-integrable μ under a Bayesian functional mixed model. The size-and-shape of μ is a geometric property invariant to a family of space-time unitary transformations, viewed as rotations of the Hilbert space, that jointly transform the x and y axes. A random object-level unitary transformation then captures size-and-shape-preserving deviations of μ from an individual function, while a random linear term and measurement error capture size-and-shape-altering deviations. The model is regularized by appropriate priors on the unitary transformations, posterior summaries of which may then be suitably interpreted as optimal data-driven rotations of a fixed orthonormal basis for the Hilbert space. Our numerical experiments demonstrate the utility of the proposed model and its superiority over the current state of the art. This is joint work with Fangyi Wang (Statistics, Ohio State University), Oksana Chkrebtii (Statistics, Ohio State University) and Karthik Bharath (Mathematical Sciences, University of Nottingham).
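In symbols, the hierarchical decomposition described above can be read schematically as (the notation here is ours, not the paper's):

\[
y_i = U_i \mu + f_i + \varepsilon_i, \qquad i = 1, \dots, n,
\]

where \(U_i\) is a random space-time unitary transformation (size-and-shape preserving), \(f_i\) is the random linear term, and \(\varepsilon_i\) is measurement error; priors on the \(U_i\) regularize the model.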
Laura Kubatko: Composite likelihood approaches to phylogenetic inference under the multispecies coalescent
Monday, Aug 4: 8:30 AM - 10:20 AM
0774
Topic-Contributed Paper Session
Music City Center
Room: CC-104E
Species-level phylogenetic inference under the multispecies coalescent model remains challenging in the typical inference frameworks (e.g., the likelihood and Bayesian frameworks) due to the dimensionality of the space of both gene trees and species trees. Algebraic approaches intended to establish identifiability of species tree parameters have suggested computationally efficient inference procedures that have been widely used by empiricists and that have good theoretical properties, such as statistical consistency. However, such approaches are less powerful than approaches based on the full likelihood. In this talk, I will describe how the use of a composite likelihood approach enables computationally tractable statistical inference of the species-level phylogenetic relationships for genome-scale data. In particular, asymptotic properties of estimators obtained in the composite-likelihood framework will be derived, and the utility of the methods developed will be demonstrated with both simulated and empirical data.
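As a reminder of the general device (a standard definition, not specific to this talk), a composite likelihood replaces the full likelihood with a product of tractable component likelihoods:

\[
\mathrm{CL}(\theta) = \prod_{k=1}^{K} L_k(\theta)^{w_k},
\]

where each \(L_k\) is the likelihood of a low-dimensional data subset (for example, a small subset of taxa or sites) and the \(w_k\) are nonnegative weights; maximum composite likelihood estimators remain consistent under standard regularity conditions while avoiding the full summation over gene trees.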
Jason Hsu (Speaker, Organizer): The Subgroup Mixable Estimation principle makes stratified conditional HRs match the marginal HR, and cures an oversight in stratified analysis in computer packages
Monday, Aug 4: 10:30 AM - 12:20 PM
0715
Topic-Contributed Paper Session
Music City Center
Room: CC-101B
I will start by simply deriving the connection between the Hazard Ratio (HR) and the Living Longer Probability (LLP) using Cox's original Mann-Whitney testing definition of HR, a connection that not only medical officers can understand but that also makes all subsequent explanations easier.
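One standard way to make that connection precise (our computation, consistent with the proportional-hazards setting): if the new treatment's survival function satisfies \(S_1 = S_0^{\mathrm{HR}}\), then

\[
\mathrm{LLP} = P(T_{\mathrm{new}} > T_{\mathrm{control}}) = \int_0^\infty S_0(t)^{\mathrm{HR}}\, dF_0(t) = \int_0^1 u^{\mathrm{HR}}\, du = \frac{1}{1 + \mathrm{HR}},
\]

so, for example, HR = 0.5 corresponds to an LLP of 2/3.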
Liu et al. (2022, Biometrical Journal 64:198–224) showed that mixing HRs across subgroups dilutes them if the biomarker has a prognostic effect, giving the illusion that patients with high and low biomarker values benefit more from the new treatment. The key message of that paper is that patients with middling biomarker values may be unfairly deprived of lifesaving new treatments.
Reversing the thought process for HR, separating patients into subgroups by prognostic factors boosts the apparent efficacy of the new treatment in each stratum. In turn, if a stratified analysis of ratio efficacy ignores the prognostic effect (as all current computer packages do), then a sponsor can game the system by stratifying on as many prognostic factors as possible to artificially lower the HR for the overall population, a danger to public health.
The Subgroup Mixable Estimation (SME) principle takes the prognostic effect into account to properly mix ratio efficacy across subgroups, removing this danger. In the case of HR, Liu et al. (2022, Biometrical Journal 64:246–255) describe how SME mixes by first converting each conditional HR to an LLP, mixing the LLPs, and then converting the mixed/unconditional LLP back to obtain the marginal HR. Using a real data set, I will show that SME mixing in fact makes the conditional HRs and the marginal HR equal in the population/parameter space, dispelling the notion that they are apples and oranges.
Xinyu Zhang (Speaker), Steven MacEachern (Co-Author), Sally Paganin (Session Chair): Bayesian Restricted Likelihood, Generalized Bayes and Model Misspecification
Monday, Aug 4: 10:30 AM - 12:20 PM
0581
Topic-Contributed Paper Session
Music City Center
Room: CC-106A
Model misspecification is problematic for Bayesians. Various methods have been proposed to modify the update from prior distribution to posterior distribution when one acknowledges that one's model is imperfect. Two current proposals are Bayesian restricted likelihood (BRL) methods and generalized Bayesian (GB) methods. The first focuses on aspects of the model that are believed to be modeled well and derives the posterior distribution by conditioning on an insufficient statistic (e.g., Huber's M-estimate) that captures those aspects of the model. The second focuses on a particular inference, making use of a loss function to define the target of inference. The usual Bayesian update is altered: the likelihood function is replaced with the exponentiated negative loss. We compare these two approaches by investigating both finite-sample and asymptotic behavior when the data come from a location family, and we offer suggestions for choosing between the two methods in different settings.
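Schematically, the GB (Gibbs posterior) update described above takes the form (the learning-rate notation \(\omega\) is ours):

\[
\pi_{\mathrm{GB}}(\theta \mid x) \;\propto\; \pi(\theta)\, \exp\{-\omega\, \ell(\theta, x)\},
\]

where \(\ell\) is the loss defining the inference target and \(\omega > 0\) calibrates how strongly the data speak; taking \(\ell\) to be the negative log-likelihood with \(\omega = 1\) recovers the usual Bayesian update.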
Patrick McHugh (Speaker); Chris Hans, Elly Kaizar (Co-Authors): Bayesian Estimation of Population Average Causal Effects from a Collection of Trials
Monday, Aug 4: 10:30 AM - 12:20 PM
4060
Contributed Speed
Music City Center
Room: CC-104B
We propose two Bayesian mixed effects models, one linear and one linear spline, to estimate the average effect of a binary treatment on a target population via one-stage meta-analysis. In an extension of previous work in a frequentist setting, we aim to combine information from a collection of randomized trials to identify the average treatment effect (ATE) on a separate, non-study target population, by allowing study-level random effects to account for variations in outcome due to differences in studies. We examine, with simulation studies, several situations in which weight-based estimators and/or nonparametric machine learning methods face challenges in estimating a population ATE, and highlight the advantages of our parametric, outcome-based estimators.
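As a rough schematic of the linear variant (our notation; the actual models are in the talk), a one-stage meta-analytic mixed model with study-level random effects might read:

\[
y_{ij} = \beta_0 + b_{0j} + (\beta_1 + b_{1j})\, T_{ij} + x_{ij}^{\top}\gamma + \varepsilon_{ij}, \qquad (b_{0j}, b_{1j})^{\top} \sim N(0, \Sigma),
\]

for subject \(i\) in trial \(j\) with treatment indicator \(T_{ij}\) and covariates \(x_{ij}\); the population ATE is then obtained by averaging the model's predicted treatment effects over the covariate distribution of the non-study target population.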
Alex Nguyen (Presenting Author), Sally Paganin (Co-Author): Marginal Likelihood Estimation in Bayesian Item Response Theory Models
Monday, Aug 4: 10:30 AM - 12:20 PM
4060
Contributed Speed
Music City Center
Room: CC-104B
Item Response Theory (IRT) models are widely used in psychometrics to measure latent traits like ability from test responses. Standard IRT models assume a fixed trait distribution, which may not capture population differences. To address this, Bayesian nonparametric (BNP) IRT models use priors such as the Chinese Restaurant Process (CRP) to allow data-driven clustering of individuals. While this increases flexibility, it also adds computational complexity, making accurate marginal likelihood estimation crucial for comparing BNP and parametric models using Bayes factors, especially in high-dimensional settings. Bridge sampling provides a more stable alternative to traditional Monte Carlo methods but must be adapted to handle the discrete clustering structure of BNP models.
This work develops a two-step method for marginal likelihood estimation in BNP IRT models. First, latent traits are integrated out using the model's structure, reducing computation. Second, bridge sampling is refined, incorporating moment-matching and variance reduction techniques to improve accuracy. Simulation results show that this method enhances efficiency and precision.
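For readers unfamiliar with the second step, here is a minimal generic implementation of the iterative bridge sampling estimator of a marginal likelihood (the optimal-bridge recursion of Meng and Wong); the refinements in this work, moment-matching, variance reduction, and the handling of CRP clustering, are layered on top of this and are not shown.

```python
import numpy as np

def bridge_sampling_estimate(log_q_post, log_g_post, log_q_prop, log_g_prop,
                             n_iter=200, tol=1e-10):
    """Iterative bridge sampling estimate of the marginal likelihood.

    log_q_post, log_q_prop: unnormalized log-posterior at posterior draws
                            and at proposal draws, respectively
    log_g_post, log_g_prop: normalized log-proposal density at the same draws
    """
    n1, n2 = len(log_q_post), len(log_q_prop)
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    l1 = np.exp(log_q_post - log_g_post)  # q/g at posterior draws
    l2 = np.exp(log_q_prop - log_g_prop)  # q/g at proposal draws
    p = 1.0  # initial guess for the marginal likelihood
    for _ in range(n_iter):
        num = np.mean(l2 / (s1 * l2 + s2 * p))
        den = np.mean(1.0 / (s1 * l1 + s2 * p))
        p_new = num / den
        if abs(p_new - p) < tol * abs(p_new):
            break
        p = p_new
    return p
```

In practice the recursion is run on the log scale to avoid overflow, and a common proposal g is a Gaussian moment-matched to the posterior draws.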
Massimiliano Russo (Co-Author): Missing data imputation via truncated Gaussian factor analysis with application to metabolomics data
Monday, Aug 4: 10:30 AM - 12:20 PM
4060
Contributed Speed
Music City Center
Room: CC-104B
In metabolomics, which involves the study of small molecules in biological samples, data are often acquired via mass spectrometry, resulting in high-dimensional, highly correlated datasets with frequent missing values. Both missing at random (MAR), due to acquisition or processing errors, and missing not at random (MNAR), often caused by values falling below detection thresholds, are common. Imputation is thus a critical component of downstream analysis. We propose a novel Truncated Gaussian Infinite Factor Analysis (TGIFA) model to address these challenges. By incorporating truncated Gaussian assumptions, TGIFA respects the physical constraints of the data, while the use of an infinite latent factor framework eliminates the need to pre-specify the number of factors. Our Bayesian inference approach jointly models MAR and MNAR mechanisms and, via a computationally efficient exchange algorithm, provides posterior uncertainty quantification for both imputed values and missingness types. We evaluate TGIFA through extensive simulation studies and apply it to a urinary metabolomics dataset, where it yields sensible and interpretable imputations with associated uncertainty estimates.
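Schematically (our notation, stated as an assumption about the model's broad shape rather than its exact specification), the approach combines a Gaussian latent factor structure with truncation:

\[
x_i = \Lambda \eta_i + \varepsilon_i, \qquad \eta_i \sim N(0, I), \quad \varepsilon_i \sim N(0, \Sigma),
\]

with the observations constrained by truncation to respect the physical range of the measurements, entries missing either at random (MAR) or, when the underlying value falls below the detection threshold, not at random (MNAR), and the number of columns of \(\Lambda\) left unbounded via the infinite factor prior.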
Chris Hans (Chair): Current Development in Nonparametric Bayes
Monday, Aug 4: 2:00 PM - 3:50 PM
4070
Contributed Papers
Music City Center
Room: CC-202B
Steven MacEachern (Discussant): Data Privacy: Frontiers and Barriers of Differential Privacy
Monday, Aug 4: 2:00 PM - 3:50 PM
0575
Topic-Contributed Paper Session
Music City Center
Room: CC-104B
Ruoyuan Qian (Presenter); Biqing Yang, Bo Lu, Xinyi Xu (Co-Authors): Matching-assisted power prior for incorporating real-world data in randomized clinical trials
Monday, Aug 4: 2:00 PM - 3:50 PM
4072
Contributed Papers
Music City Center
Room: CC-208A
Leveraging external data to supplement randomized clinical trials has become increasingly popular, particularly in medical device development and drug discovery. In rare diseases, recruiting enough patients for large-scale trials is challenging. To address this, small hybrid trials can borrow historical controls or real-world data (RWD) to increase statistical power, but the borrowing must be done in a statistically principled manner. This paper proposes a matching-assisted power prior method to mitigate bias when incorporating external data. Using template matching, a subset of comparable external subjects is grouped and assigned weights based on their similarity to the current study population. These weighted groups are then integrated into the Bayesian inference through power priors. Unlike traditional power prior methods, which apply a similar discount to all control patients, our approach pre-selects high-quality controls, improving the reliability of the borrowed data. Through simulation studies, we compare its performance with the propensity score-integrated power prior approach. Finally, we demonstrate its practical implementation using data from a real acupuncture clinical trial.
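For reference, the standard power prior discounts an external-data likelihood by a weight \(a_0\) (the matching-assisted version replaces the single \(a_0\) with group-specific weights derived from template-matching similarity):

\[
\pi(\theta \mid D_0, a_0) \;\propto\; L(\theta \mid D_0)^{a_0}\, \pi_0(\theta), \qquad 0 \le a_0 \le 1,
\]

so \(a_0 = 0\) ignores the external data entirely and \(a_0 = 1\) pools it as if it came from the current trial.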
Oksana Chkrebtii (Chair and Organizer): SPES and Q&P Student Paper Award
Monday, Aug 4: 2:00 PM - 3:50 PM
0723
Topic-Contributed Paper Session
Music City Center
Room: CC-205C
Paul Wiemann (Co-Author): Scalable non-Gaussian variational inference for spatial fields using sparse autoregressive normalizing flows
Tuesday, Aug 5: 8:30 AM - 10:20 AM
0220
Invited Paper Session
Music City Center
Room: CC-207C
We introduce a novel framework for scalable and flexible variational inference targeting the non-Gaussian posterior of a latent continuous function or field. For both the prior and variational family, we consider sparse autoregressive structures corresponding to nearest-neighbor directed acyclic graphs. Within the variational family, conditional distributions are modeled with highly flexible normalizing flows. We provide an algorithm for doubly stochastic variational optimization, achieving polylogarithmic time complexity per iteration. Empirical evaluations show that our method offers improved accuracy compared to existing techniques.
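The core structural assumption can be written compactly: both the prior and the variational family factor along a nearest-neighbor DAG, with each conditional modeled by a normalizing flow (schematic, in our notation):

\[
q(x) = \prod_{i=1}^{n} q\big(x_i \mid x_{\mathrm{ne}(i)}\big),
\]

where \(\mathrm{ne}(i)\) indexes the few nearest previously ordered locations of site \(i\); the sparsity of these conditioning sets is what makes the polylogarithmic per-iteration cost possible.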
Jae Ho Chang (Presenter); Max Russo, Subhadeep Paul (Co-Authors): Heterogeneous transfer learning for high-dimensional regression with feature mismatch
Tuesday, Aug 5: 10:30 AM - 12:20 PM
4097
Contributed Papers
Music City Center
Room: CC-103B
We study transferring knowledge from a source domain to a target domain to learn a high-dimensional regression model with possibly different features. The statistical properties of homogeneous transfer learning have recently been investigated; however, most homogeneous transfer and multi-task learning methods assume that the target and proxy domains share the same feature space. In practice, target and proxy feature spaces are often not fully matched, owing to the inability to measure some variables in data-poor target environments, and existing heterogeneous transfer learning methods do not provide statistical error guarantees. We propose a two-stage method that learns the relationship between the missing and observed features through a projection step and then solves a joint penalized optimization problem. We develop upper bounds on the method's parameter estimation and prediction risks, assuming that the proxy and target domain parameters are sparsely different. Our results elucidate how estimation and prediction error depend on the complexity of the model, the sample size, the extent of feature overlap, and the correlation between matched and mismatched features.
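A minimal caricature of the two-stage idea on synthetic data (all names and the simple linear projection are our illustration, not the authors' estimator or theory):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(1)

# Hypothetical setup: the (data-rich) source observes shared + extra features,
# while the (data-poor) target observes the shared features only.
n_src, n_tgt, p_sh, p_ex = 500, 60, 20, 5
A = rng.normal(size=(p_sh, p_ex))                  # shared -> extra feature map
Xs = rng.normal(size=(n_src, p_sh))
Xs_ex = Xs @ A + 0.3 * rng.normal(size=(n_src, p_ex))
Xt = rng.normal(size=(n_tgt, p_sh))
Xt_ex = Xt @ A + 0.3 * rng.normal(size=(n_tgt, p_ex))  # unobserved in practice

beta = np.zeros(p_sh + p_ex)
beta[:5] = 1.0                                     # sparse true signal
yt = np.column_stack([Xt, Xt_ex]) @ beta + rng.normal(size=n_tgt)

# Stage 1 (projection): learn missing-given-observed features on the source.
proj = LinearRegression().fit(Xs, Xs_ex)
Xt_ex_hat = proj.predict(Xt)                       # impute the mismatched block

# Stage 2: penalized regression on the target with the imputed features.
fit = Lasso(alpha=0.1).fit(np.column_stack([Xt, Xt_ex_hat]), yt)
print(np.round(fit.coef_, 2))
```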
Fandi Chang (Presenter), Elly Kaizar (Co-Author): A Flexible Bayesian Multivariate Ordinal Regression Model for Language Sample Scale Data
Tuesday, Aug 5: 10:30 AM - 12:20 PM
4099
Contributed Posters
Music City Center
Room: CC-Hall B
One common type of outcome in Language Sample Analysis (LSA) is a sum of ordinal variables, which can be difficult to model. Classical approaches often assume outcomes are independent and impose additional distributional assumptions. Common choices include linear regression, which assumes outcomes are continuous, and logistic regression, which assumes outcomes follow a binomial distribution. However, linear regression assumes equal intervals between outcome categories, while logistic regression ignores the dependence among the ordinal outcomes; both models may fail to reflect the inherent ordering and spacing in the data. We therefore propose a variation of the cumulative ordinal model, introducing extra flexibility by allowing the probit link function to have a covariate-specific standard deviation. Additionally, we adopt a Bayesian hierarchical framework that facilitates parameter estimation and enables direct probabilistic inference about the parameters of interest. The proposed model improved fit over logistic and linear regressions on an LSA dataset collected from a study of how cognitive and language challenges interfere with expository abilities.
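The key modeling move can be written as a cumulative probit with a covariate-dependent scale (schematic; the log-linear form for \(\sigma\) is our illustrative assumption):

\[
P(Y_i \le k \mid x_i) = \Phi\!\left(\frac{\tau_k - x_i^{\top}\beta}{\sigma(x_i)}\right), \qquad \tau_1 < \cdots < \tau_{K-1}, \quad \sigma(x_i) = \exp(x_i^{\top}\alpha),
\]

so that setting \(\alpha = 0\) recovers the ordinary cumulative probit model.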
Torey Hilbert (Presenter), Steven MacEachern (Co-Author): Symmetrization of Martingale Posterior Distributions
Tuesday, Aug 5: 10:30 AM - 12:20 PM
4099
Contributed Posters
Music City Center
Room: CC-Hall B
The martingale posterior framework, recently proposed by Fong et al., is based on a sequence of one-step-ahead predictive distributions. It leads to computationally efficient inference in parametric and nonparametric settings. The predictive distributions implicitly provide a joint model for an infinite sequence of data. The observed data, arbitrarily indexed Y1 through Yn, form the beginning of the sequence, and the tail of the sequence is missing. Filling in the missing data allows one to summarize Y1 through Y∞ (or through YN, with N large, in practice). A typical summary, such as the mean, is regarded as a parameter.
In cases where the martingale posterior model does not match a de Finetti model, the joint distribution over the Y's is not exchangeable, and so the indices { 1, ..., n } of the data affect the analysis. We investigate methods of symmetrizing inference in these models. In some settings, re-indexing the observed data to { a1 < ... < an } and sending a1 to infinity is analytically tractable, and we recover classical Bayesian models with known priors. Additionally, we investigate the effect of nesting the nonexchangeable models within exchangeable models.
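To fix ideas, here is a minimal predictive-resampling sketch of a martingale posterior for the mean, using the simplest possible one-step-ahead predictive, the empirical distribution of all data seen so far (this choice recovers the Bayesian bootstrap in the limit and is our illustration, not the models studied in the poster):

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(loc=2.0, scale=1.0, size=50)  # observed Y1..Yn

def one_martingale_draw(y, n_future=5_000, rng=rng):
    """Extend the data by repeatedly sampling the next point from the
    current empirical predictive, then summarize the completed sequence."""
    ext = list(y)
    for _ in range(n_future):
        ext.append(ext[rng.integers(len(ext))])  # Polya-urn-style draw
    return np.mean(ext)  # the 'parameter' is a summary of Y1..YN

draws = np.array([one_martingale_draw(y) for _ in range(500)])
print("posterior mean and sd of the mean:", draws.mean(), draws.std())
```

Because the empirical predictive treats indices symmetrically, this particular example is exchangeable; the poster concerns the nonexchangeable case, where the indexing matters.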
Asuman Turkmen: Hamiltonian Monte Carlo Approaches to Bayesian Empirical Likelihood in Longitudinal Data Analytics
Tuesday, Aug 5: 2:00 PM - 3:50 PM
4128
Contributed Papers
Music City Center
Room: CC-103A
There is a rich and growing body of research on empirical likelihood (EL) estimation and inference. Unlike traditional likelihood-based methods, EL techniques rely on a set of unbiased estimating equations (EEs) that summarize the parametric constraints of the model, offering greater flexibility in handling complex data while solving the restricted optimization via Lagrange multipliers. In recent years, significant progress has been made in leveraging the EL approach for longitudinal data analysis and in adapting it to the Bayesian paradigm. In this work, we combine both ideas. Standard Markov chain Monte Carlo (MCMC) procedures within the Bayesian EL (BEL) framework are particularly challenging because of the nonparametric nature of the EL and the restrictions imposed by the EEs, and they suffer several limitations that affect their efficiency, especially with dependent data. Alternatively, by imposing a simple correlation structure often useful in such studies, we develop a method that jointly estimates regression and correlation parameters via the Hamiltonian Monte Carlo (HMC) algorithm, which can efficiently sample from the BEL-based posterior distribution.
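For reference, the profile empirical likelihood underlying the BEL posterior is the constrained optimization (standard definition, in our notation):

\[
L_{\mathrm{EL}}(\theta) = \sup\Big\{ \prod_{i=1}^{n} n w_i \;:\; w_i \ge 0,\; \sum_{i=1}^{n} w_i = 1,\; \sum_{i=1}^{n} w_i\, g(X_i, \theta) = 0 \Big\},
\]

where \(g\) collects the unbiased estimating equations; the BEL posterior is then proportional to \(\pi(\theta)\, L_{\mathrm{EL}}(\theta)\), and this nonsmooth, implicitly defined surface is what makes off-the-shelf MCMC difficult.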
Arnab Auddy: Tucker Decomposition with Structured Core: Identifiability, Stability and Computability
Tuesday, Aug 5: 2:00 PM - 3:50 PM
0760
Topic-Contributed Paper Session
Music City Center
Room: CC-106A
We consider the tensor Tucker decomposition and show that it is uniquely identified up to sign and permutation of the columns of the component matrices, and is stable under small perturbations, when the core tensor satisfies certain structural support conditions. Under noise, we obtain stand-alone error bounds for each column, unaffected by the others. We show that if the core of a higher-order tensor consists of random entries, the uniqueness and stability properties hold with high probability even when the elements of the core tensor are nonzero with probability close to, but bounded away from, one. We also furnish algorithms for performing tensor decompositions in these settings. From an application perspective, our results are useful for making inferences about paired latent variable models and can be related to Kronecker-product dictionary learning.
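For reference, a third-order Tucker decomposition writes a tensor \(\mathcal{X}\) in terms of a core \(\mathcal{G}\) and factor matrices \(U_1, U_2, U_3\) (standard notation):

\[
\mathcal{X} = \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 U_3, \qquad
x_{ijk} = \sum_{a,b,c} g_{abc}\,(U_1)_{ia}\,(U_2)_{jb}\,(U_3)_{kc},
\]

and the structural support conditions on \(\mathcal{G}\) are what pin this representation down up to sign and column permutation.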
Shili Lin (Discussant): Recent Methodological Advances in Polygenic Risk Scores
Tuesday, Aug 5: 2:00 PM - 3:50 PM
0767
Topic-Contributed Paper Session
Music City Center
Room: CC-105B
Xuerong Wang (Presenter), Yoonkyung Lee (Co-Author): The Predictive Degrees of Freedom of LASSO
Wednesday, Aug 6: 8:30 AM - 10:20 AM
4141
Contributed Papers
Music City Center
Room: CC-105B
The double descent phenomenon observed in overparameterized machine learning models appears to defy classical prediction risk theory and has spurred considerable research. Recently, a notion of predictive model degrees of freedom (PDOF) has been proposed as an alternative measure of model complexity to explain the double descent phenomenon with a focus on linear modeling procedures. We extend PDOF to the nonlinear case by first studying the lasso model. The PDOF for lasso involves the covariance matrix of the lasso estimator, for which no closed-form expression exists. Furthermore, existing covariance matrix estimators only work in the under-parameterized case. To fill this gap, we explore two estimators: one based on the iterative soft-thresholding algorithm, and the other based on the infinitesimal jackknife. In a simulation study, we compare these estimators with bootstrap and other covariance matrix estimators based on approximate lasso solutions. Beyond lasso, the infinitesimal jackknife approach can be used to quantify the PDOF of other algorithmic models such as random forests and neural networks.
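The first covariance estimator mentioned above builds on the iterative soft-thresholding algorithm (ISTA); here is the base recursion it relies on (the PDOF covariance construction itself is the talk's contribution and is not shown):

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding operator S_t(z)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=2000):
    """ISTA for the lasso: minimize 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz const. of gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - step * grad, step * lam)
    return b
```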
Asuman Turkmen (Chair): Design of Experiments and Statistical Analysis for Modern Applications
Wednesday, Aug 6: 8:30 AM - 10:20 AM
4144
Contributed Papers
Music City Center
Room: CC-Davidson Ballroom A3
This session focuses on advances in design and analysis of experiments as well as statistical inference for complex physical systems.
Jianxin Chen (Presenter); Xiaoxuan Cai (Co-Author): Mixed-Effects Models for Analyzing Intensive Longitudinal Data on Suicidal Ideation in PTSD
Wednesday, Aug 6: 8:30 AM - 10:20 AM
4148
Contributed Papers
Music City Center
Room: CC-207A
Suicidal ideation is a pressing mental health concern, particularly among individuals with posttraumatic stress disorder (PTSD). Early detection and timely intervention are critical yet challenging in psychiatric care. Ecological momentary assessment (EMA) via mobile devices (e.g., smartphones, wearables) provides a novel approach for continuous monitoring of psychological, behavioral, and contextual biomarkers associated with suicide risk. However, the intensive longitudinal nature of EMA data presents statistical challenges alongside opportunities for new medical insights. This study utilized generalized linear mixed-effects models to explore the relationship between coping plan use frequency and suicidal ideation, addressing both within-person and between-person variability. Significant associations were observed at both levels, with moderation analyses revealing that the relationship varied by coping strategy (CRP versus SP).
Our findings highlight the statistical complexities of EMA data and the value of tailored modeling approaches in capturing the dynamic interplay between coping behaviors and suicide risk, offering critical insights for clinical intervention.
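A common way to separate the two levels of variability in such EMA analyses is person-mean centering within a logistic mixed model; a schematic (our notation, not the study's exact specification) is:

\[
\operatorname{logit} P(\mathrm{SI}_{it} = 1) = \beta_0 + b_{0i} + \beta_W \big(C_{it} - \bar{C}_i\big) + \beta_B\, \bar{C}_i, \qquad b_{0i} \sim N(0, \sigma_b^2),
\]

where \(C_{it}\) is person \(i\)'s coping-plan use at assessment \(t\); \(\beta_W\) captures the within-person association and \(\beta_B\) the between-person association.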
Seth Adarkwah Yiadom (Speaker): Formulating the Proxy Pattern-Mixture Model as a Selection Model to Assist with Sensitivity Analysis
Wednesday, Aug 6: 8:30 AM - 10:20 AM
0599
Topic-Contributed Paper Session
Music City Center
Room: CC-202B
Proxy pattern-mixture models (PPMM) have previously been proposed as a model-based framework for assessing the potential for nonignorable nonresponse in sample surveys and nonignorable selection in nonprobability samples. One defining feature of the PPMM is the single sensitivity parameter, φ, that ranges from 0 to 1 and governs the degree of departure from ignorability. While this sensitivity parameter is attractive in its simplicity, it may also be of interest to describe departures from ignorability in terms of how the odds of response (or selection) depend on the outcome being measured. In this paper, we re-express the PPMM as a selection model, using the known relationship between pattern-mixture models and selection models, in order to better understand the underlying assumptions of the PPMM and the implied effect of the outcome on nonresponse. The selection model that corresponds to the PPMM is a quadratic function of the survey outcome and proxy variable, and the magnitude of the effect depends on the value of the sensitivity parameter, φ (missingness/selection mechanism), the differences in the proxy means and standard deviations for the respondent and nonrespondent populations, and the strength of the proxy, ρ. Large values of φ (beyond 0.5) often result in unrealistic selection mechanisms, and the corresponding selection model can be used to establish more realistic bounds on φ. We illustrate the results using data from the U.S. Census Household Pulse Survey.
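Schematically, the implied selection model has the quadratic form described above, written here on a generic link scale (the coefficient notation is ours; the abstract specifies only that the coefficients are driven by φ, ρ, and the respondent/nonrespondent proxy means and standard deviations):

\[
h\big(P(S = 1 \mid y, x^{*})\big) = c_0 + c_1 y + c_2 x^{*} + c_3 y^2 + c_4 x^{*2} + c_5\, y x^{*},
\]

where \(y\) is the survey outcome, \(x^{*}\) the proxy, and the \(c_j\) are functions of \((\varphi, \rho)\) and the proxy moments.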
Chris Hans: A Class of Non-Separable Penalty Functions for Bayesian Lasso-like Regression
Wednesday, Aug 6: 10:30 AM - 12:20 PM
4151
Contributed Papers
Music City Center
Room: CC-103B
Non-separable penalty functions are often used in regression modeling to enforce group sparsity structure, reduce the influence of unusual features, and improve estimation and prediction by providing a more realistic match between model and data. From a Bayesian perspective, such penalty functions correspond to a lack of (conditional) prior independence among the regression coefficients. We describe a class of prior distributions for regression coefficients that generates non-separable penalty functions. The priors have connections to L1-norm penalization and the Bayesian lasso (BL) and elastic net (BEN) regression models. The regularization properties of the class of priors can be understood both by studying its tunable parameters directly and via the connections to BL and BEN regression. We discuss full Bayesian inference under these priors and variable selection via Bayes factors and posterior model probabilities. Inference and prediction under the class of priors is shown to perform competitively under a range of example data structures.
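For orientation, recall the prior-penalty correspondence \(\pi(\beta) \propto \exp\{-\mathrm{pen}(\beta)\}\) for the two reference models (standard forms):

\[
\mathrm{pen}_{\mathrm{BL}}(\beta) = \lambda \sum_{j} |\beta_j|, \qquad
\mathrm{pen}_{\mathrm{BEN}}(\beta) = \lambda_1 \sum_{j} |\beta_j| + \lambda_2 \sum_{j} \beta_j^2,
\]

both of which are separable, i.e., sums of coordinate-wise terms implying conditional prior independence of the coefficients; the class of priors in this talk breaks exactly that separability.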
Omer Ozturk (Co-Author): Order Restricted Cluster Randomized Block Design
Wednesday, Aug 6: 10:30 AM - 12:20 PM
4174
Contributed Posters
Music City Center
Room: CC-Hall B
This research introduces a novel two-stage cluster randomized design, the order restricted cluster randomized block design (ORCRBD). The ORCRBD builds upon the cluster randomized block design by incorporating a second layer of blocking, achieved through ranking cluster units that are randomly sampled from the population. This approach creates a two-way layout, with blocks and ranking groups, and employs restricted randomization to enhance the accuracy of treatment contrast estimation. We calculate the expected mean square for each source of variation in the ORCRBD under a suitable linear model, develop an approximate F-test for the treatment effect, assess ranking quality, calculate optimal sample sizes for a given cost model, formulate multiple comparison procedures, and apply the design to an educational setting.
Rui Qiang (Co-Author): Accounting for Preferential Sampling in Geostatistical Inference
Wednesday, Aug 6: 10:30 AM - 12:20 PM
0482
Invited Paper Session
Music City Center
Room: CC-104E
In geostatistical inference, preferential sampling takes place when the locations of point-referenced data are related to the latent spatial process of interest. Traditional geostatistical models can lead to biased inferences and predictions under preferential sampling. We introduce an extended Bayesian hierarchical framework that models the observation locations and the observed data jointly, using a spatial point process for the locations and a geostatistical process for the observations. We illustrate extensions beyond the classical log-Gaussian Cox process for the sampling locations combined with a Gaussian process for the observations. We also introduce simpler methods for accounting for preferential sampling that are less computationally demanding at the expense of prediction accuracy. We validate our models through simulation, demonstrating their effectiveness in correcting biases and improving prediction accuracy. We apply our models to decadal average temperature data from the Global Historical Climatology Network in the Southwestern United States and show that preferential sampling could be present in some spatial regions.
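The classical joint model referenced above (Diggle-style preferential sampling; notation ours) couples the two layers through a shared latent Gaussian field \(S\):

\[
X \mid S \sim \mathrm{PoissonProcess}\big(\lambda(s) = \exp\{\alpha + \gamma S(s)\}\big), \qquad
Y(s_i) = \mu + S(s_i) + \varepsilon_i,
\]

so the sampling locations \(X\) form a log-Gaussian Cox process, \(\gamma \ne 0\) induces preferential sampling, and \(\gamma = 0\) recovers the traditional model.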
Subhadeep Paul (Presenter), Jae Ho Chang (First Author): Embedding Network Autoregression for time series analysis and causal peer effect inference
Wednesday, Aug 6: 2:00 PM - 3:50 PM
4198
Contributed Papers
Music City Center
Room: CC-202C
We propose an Embedding Network Autoregressive Model (ENAR) for multivariate networked longitudinal data. We assume the network is generated from a latent variable model, and these unobserved variables are included as additive effects in a structural peer effect model or a time series network autoregressive model. This approach takes a unified view of two related yet fundamentally different problems: (1) modeling and predicting multivariate networked time series data, and (2) causal peer influence estimation in the presence of homophily from finite-time longitudinal data. We show that the estimated momentum and peer effect parameters are consistent and asymptotically normally distributed in asymptotic setups with a growing number of network vertices N, covering both the growing-T (time series) and finite-T (peer effect) cases. Our theoretical results encompass cases where the network is modeled with the RDPG model as well as a more general latent space model. We also develop a selection criterion for when the number of latent variables K is unknown that provably does not under-select, and we show that the theoretical guarantees hold with the selected K as well.
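A schematic of the kind of specification described (our notation; the paper's exact model may differ):

\[
y_{i,t} = \gamma\, y_{i,t-1} + \rho \sum_{j} W_{ij}\, y_{j,t-1} + u_i^{\top}\theta + \varepsilon_{i,t},
\]

where \(W\) is a (row-normalized) adjacency matrix, \(\gamma\) is the momentum parameter, \(\rho\) the peer effect, and \(u_i\) the latent position of node \(i\) from the network's latent variable model, included additively to absorb homophily.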
Sean Tomlin (Presenter): Counterpart Statistics in the Matched Difference-in-Differences Design
Thursday, Aug 7: 8:30 AM - 10:20 AM
4202
Contributed Papers
Music City Center
Room: CC-102A
Difference-in-differences (DiD) estimates intervention effects under the parallel trends assumption, but nuisance trends can bias estimates. Matching methods that balance pre-intervention trends have been used, yet we show they fail to adjust for latent confounders and introduce regression to the mean bias. Instead, we advocate for methods grounded in explicit causal assumptions about selection bias. We also propose a Bayesian approach to assess parallel trends, avoiding the challenges of specifying non-inferiority thresholds. We demonstrate our method using Medical Expenditure Panel Survey data to estimate the impact of health insurance on healthcare utilization.
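For reference, the canonical two-period DiD contrast and the parallel trends assumption it relies on are (standard notation):

\[
\hat{\tau}_{\mathrm{DiD}} = \big(\bar{Y}^{T}_{\mathrm{post}} - \bar{Y}^{T}_{\mathrm{pre}}\big) - \big(\bar{Y}^{C}_{\mathrm{post}} - \bar{Y}^{C}_{\mathrm{pre}}\big), \qquad
E\big[Y^{0}_{\mathrm{post}} - Y^{0}_{\mathrm{pre}} \mid \mathrm{treated}\big] = E\big[Y^{0}_{\mathrm{post}} - Y^{0}_{\mathrm{pre}} \mid \mathrm{control}\big],
\]

where \(Y^{0}\) denotes the untreated potential outcome; nuisance trends are violations of the right-hand equality.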
Sally Paganin (Speaker & Organizer): Bayesian Model Assessment using NIMBLE
Thursday, Aug 7: 8:30 AM - 10:20 AM
0770
Topic-Contributed Paper Session
Music City Center
Room: CC-209C
Posterior predictive p-values (ppps) have become popular tools for Bayesian model assessment, being general-purpose and easy to use. However, their interpretation can be difficult because their distribution is not uniform under the hypothesis that the model did generate the data. Calibrated ppps (cppps) can be obtained via a bootstrap-like procedure, yet they remain impractical due to their high computational cost. This work introduces methods that enable efficient approximation of cppps and their uncertainty for fast model assessment. We first investigate the computational tradeoff between the number of calibration replicates and the number of MCMC samples per replicate. Provided that the MCMC chain for the real data has converged, using short MCMC chains per calibration replicate can save significant computation time compared to naive implementations, without significant loss in accuracy. We propose different variance estimators for the cppp approximation, which can be used to quickly confirm the lack of evidence against model misspecification. Because variance estimation uses the effective sample sizes of many short MCMC chains, we show these can be approximated well from the real-data MCMC chain. The cppp procedure is implemented in NIMBLE, a flexible framework for hierarchical modeling that supports many models and discrepancy measures.
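The bootstrap-like calibration loop, stripped to its essentials, looks like the following (a generic sketch in Python, not the NIMBLE implementation; simulate_data and compute_ppp are hypothetical user-supplied functions, with compute_ppp running the short per-replicate MCMC):

```python
import numpy as np

def calibrated_ppp(ppp_obs, posterior_draws, simulate_data, compute_ppp,
                   n_cal=100, rng=np.random.default_rng(0)):
    """Bootstrap-like calibration of a posterior predictive p-value.

    ppp_obs         : ppp computed from the real data
    posterior_draws : parameter draws from the (converged) real-data MCMC
    simulate_data   : theta -> replicate dataset drawn from the model
    compute_ppp     : dataset -> ppp (a short MCMC chain suffices here)
    """
    ppp_cal = np.empty(n_cal)
    for b in range(n_cal):
        theta = posterior_draws[rng.integers(len(posterior_draws))]
        ppp_cal[b] = compute_ppp(simulate_data(theta))
    # calibrated p-value: rank of the observed ppp among the replicates
    return np.mean(ppp_cal <= ppp_obs)
```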
Massimiliano Russo (Presenter): A tree-based scan statistic for database studies with time-to-event outcomes
Thursday, Aug 7: 8:30 AM - 10:20 AM
4214
Contributed Papers
Music City Center
Room: CC-201A
Tree-based scan statistics (TBSSs) are machine learning methods for disproportionality analyses in database studies. They simultaneously scan thousands of hierarchically related outcomes to detect potential signals of harm from health products while controlling for multiplicity, and they have been used extensively in pharmacoepidemiology. Current TBSS implementations, however, do not allow for comparative safety evaluation with time-to-event outcomes, which are available in most database studies. Explicitly accounting for person-time can improve the power to detect signals compared to methods that use only the number of events. We propose three novel TBSSs for time-to-event data. The first assumes proportional hazards, with a hazard ratio (HR) for each node, and uses a permutation scheme for inference. The second builds on exponential survival models for the terminal nodes of the hierarchy, implying a constant HR for each node, and uses a parametric bootstrap for inference. The third uses robust asymptotic approximations of the HRs to build an approximate parametric bootstrap. We compare the proposed methods with standard TBSSs in various simulation scenarios and in a database study.
Steephanson Anthonymuthu (Presenter): A Bayesian approach for the estimation of the mitigated fraction for ordinal response
Thursday, Aug 7: 10:30 AM - 12:20 PM
4218
Contributed Papers
Music City Center
Room: CC-202C
The efficacy of an intervention, such as a vaccine, can be established through the estimation of several numerical measures. The mitigated fraction is one such contemporary measure; it quantifies the reduction in the severity of a specific disease rather than its complete prevention. In this paper, an efficient Bayesian approach to calculating the mitigated fraction is presented, utilizing the values of latent variables within a generalized linear mixed model (GLMM). The proposed Bayesian method works efficiently with many link functions compared to the traditional frequentist approach. The concept of the mitigated fraction was introduced in veterinary medicine to quantify the reduction in the severity of disease occurring in vaccinated animals as compared to non-vaccinated animals. The USDA's Center for Veterinary Biologics (CVB) recommends a form of the mitigated fraction when disease severity is graded by some continuous measure or by some discrete assessment resulting in unambiguous ranks. Our Bayesian approach works effectively when observations are ordinal and measured longitudinally.
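As background (a commonly cited rank-based form, stated here as an assumption rather than taken from the talk), the mitigated fraction compares severities \(X\) of vaccinates and \(Y\) of controls via

\[
\mathrm{MF} = P(X < Y) - P(X > Y),
\]

which is 0 when the two severity distributions coincide and 1 when every vaccinate is less severe than every control; it is typically estimated from the Mann-Whitney ranks.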
Subhadeep Paul (Chair): Structure Identification and Dimension Reduction Methods
Thursday, Aug 7: 10:30 AM - 12:20 PM
4231
Contributed Papers
Music City Center
Room: CC-207C