Dissertation Defense: Yongqi Liu

Tue, July 1, 2025

1:00 pm - 3:00 pm

CH 440 Conference Room

Title: Single-Cell Hi-C Data Analyses

Abstract: The advance of single-cell high-throughput chromosome conformation capture (scHi-C) techniques provides significant insights into how DNA interacts with itself within individual nuclei, revealing structural organization and regulatory mechanisms in individual cells. However, the dynamic chromosome structures and intricate experimental procedures pose great challenges in analyzing scHi-C data. Not only are scHi-C data extremely sparse (i.e., they contain many observed zeros), but the confounding effect of sparsity and intrinsic variability also makes it difficult to distinguish structural zeros (SZs) from dropouts: SZs result from a biological lack of interaction between the locus pairs, whereas dropouts are due to low coverage or sampling variation. In this dissertation, we explored several aspects of scHi-C data analyses and tackled these challenges.

We first investigated the effects of random-walk-based methods in improving data quality. Although random-walk-based methods are increasingly prevalent, are easy to implement, and do not assume any specific data distribution, theoretical support that they can enhance the unique characteristics of scHi-C data is lacking. In scenarios where data quality improvement is possible, the resulting data are highly sensitive to the choice of parameters.

Next, we looked into cell clustering, a downstream analysis for the improved data. We proposed a structural-zero-aware Kendall’s tau (szKendall) measure, which utilizes both the spatial information of locus pairs and the SZ information to quantify the dissimilarity between single cells. The performance of szKendall was benchmarked against the traditional Euclidean and Kendall’s tau distances through extensive simulation studies and real data applications, and szKendall consistently outperformed the benchmark measures.

Driven by the scarcity of tools for identifying SZs in scHi-C data, we developed a Bayesian hierarchical model to detect SZs (both common and cell-specific) and impute dropouts simultaneously. The proposed algorithm, cell-specific HiCImpute (csHiCImpute), allows the probability of a locus pair being an SZ to differ across single cells, which leads to more accurate capture of cell-to-cell variability. Our simulations and real data analyses showed that csHiCImpute is more powerful in detecting cell-specific SZs than its predecessor, HiCImpute.

Advisor: Shili Lin