
Existing approaches to estimating individualized treatment rules (ITRs) rely on outcome regression or propensity score weighting, but they may suffer from model misspecification and from instability caused by extreme weights. To address these limitations, M-learning has been proposed for continuous outcomes. In the first part of this work, we extend M-learning to right-censored time-to-event outcomes, which are common in medical studies. We construct matched sets to compare observed event times and incorporate inverse probability of censoring weights to account for censoring. We also consider a full matching design as an alternative to matching with replacement. We show that the proposed value function is unbiased for the true value function that would be obtained in the absence of censoring. Simulation studies compare M-learning under different matching strategies with a weighted learning method. All methods perform well when confounders are fully observed, although their relative performance varies across scenarios, and performance declines in the presence of unmeasured confounders or effect modifiers. We apply these methods to patients with atrial fibrillation (AF) complications to estimate optimal anticoagulant strategies, and find that the full matching design effectively reduces the risk of composite events.
The second part of this work addresses another limitation of existing ITR methods: they often produce deterministic recommendations based solely on the sign of the estimated treatment effect, which can lead to unreliable decisions when effects are uncertain or not clinically meaningful. To address this, we propose MatchBART-ITR, a framework that integrates prognostic score matching, Bayesian Additive Regression Trees (BART) modeling, and uncertainty-based decision-making. The method incorporates a “no-recommendation” option based on a prespecified clinical threshold, which helps avoid overconfident decisions when effects are small or uncertain. Simulation studies show that MatchBART-ITR with the posterior probability rule consistently improves ITR accuracy over direct BART modeling, particularly under poor covariate overlap and nonlinear outcome settings. Finally, we apply this method to patients with severe traumatic brain injury (TBI) to guide individualized intracranial pressure (ICP) monitoring decisions, leading to a reduction in mortality. We conclude with a discussion of limitations and future directions.
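
To make the uncertainty-based decision step concrete, the following Python sketch shows one way a posterior probability rule with a “no-recommendation” option could be written. It is an illustration under stated assumptions rather than the implementation used in this work: the function name `posterior_probability_rule`, the symmetric threshold `delta`, the cutoff `prob_cutoff`, and the convention that a larger effect means greater benefit from treatment are all hypothetical, and the posterior draws are assumed to come from a previously fitted model such as BART.

```python
import numpy as np


def posterior_probability_rule(tau_draws, delta, prob_cutoff=0.9):
    """Map posterior draws of individual treatment effects to recommendations.

    tau_draws   : array of shape (n_draws, n_patients); each column holds
                  posterior draws of one patient's treatment effect
                  (assumed convention: larger values favor treatment).
    delta       : prespecified clinical threshold (assumed symmetric, >= 0).
    prob_cutoff : posterior probability required to commit to a
                  recommendation (assumed > 0.5 so both arms cannot qualify).

    Returns an integer array with 1 (recommend treatment),
    -1 (recommend control), or 0 (no recommendation).
    """
    tau_draws = np.asarray(tau_draws, dtype=float)
    if tau_draws.ndim != 2:
        raise ValueError("tau_draws must have shape (n_draws, n_patients)")
    if delta < 0 or not (0.5 < prob_cutoff <= 1.0):
        raise ValueError("need delta >= 0 and 0.5 < prob_cutoff <= 1")

    # Posterior probability that the effect exceeds the clinical threshold
    # in either direction, estimated as the fraction of draws.
    p_benefit = (tau_draws > delta).mean(axis=0)
    p_harm = (tau_draws < -delta).mean(axis=0)

    # Default to "no recommendation" and commit only when the posterior
    # evidence for a clinically meaningful effect is strong enough.
    recommendation = np.zeros(tau_draws.shape[1], dtype=int)
    recommendation[p_benefit >= prob_cutoff] = 1
    recommendation[p_harm >= prob_cutoff] = -1
    return recommendation


# Toy draws for three hypothetical patients (4 draws each).
draws = np.array([
    [0.5, -0.5, 0.05],
    [0.6, -0.4, -0.02],
    [0.4, -0.6, 0.01],
    [0.7, -0.3, 0.00],
])
print(posterior_probability_rule(draws, delta=0.1).tolist())  # [1, -1, 0]
```

In this toy example the third patient's draws all fall inside the interval from `-delta` to `delta`, so the rule withholds a recommendation instead of acting on the sign of a small, uncertain effect.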