Identifying chromatin loops from single cells
Single-cell Hi-C (scHi-C) maps chromatin spatial organization at single-cell resolution, but the sparsity of the resulting data poses significant challenges for data analysis. We report a novel method, SnapHiC, which detects chromatin loops from scHi-C data with high sensitivity and specificity.
First reported in 2013, single-cell Hi-C (scHi-C) and its derivative technologies1-7 have been used to study chromatin spatial organization at single-cell resolution, facilitating the characterization of the three-dimensional (3D) genome of individual cells from complex human tissues. One active research area is exploring how 3D genome features discovered from bulk Hi-C data, such as A/B compartments, topologically associating domains (TADs) and chromatin loops, vary across cell populations. While detection of compartments and TADs in individual cells has been described in earlier scHi-C studies8,9, the development of chromatin loop identification methods tailored to scHi-C data has lagged behind, largely because of two challenges. First, chromatin looping events are highly dynamic and variable, even among homogeneous cell populations. Second, the low capture efficiency of current scHi-C technologies results in a large amount of missing data, making the analysis of sparse scHi-C data extremely challenging.
Our teams are primarily interested in chromatin biology and gene regulation, which require accurate, cell-type-specific mapping of enhancer-promoter interactions. Because previous efforts had focused on higher-order chromatin features such as A/B compartments and TADs8-10, and no loop caller was available for scHi-C data, we decided to tackle this problem in 2019. Our goal was to devise a computational method that could accurately map chromatin loops at high resolution from a small number of single cells. It took us nearly two years to develop SnapHiC, a robust computational method that identifies chromatin loops from scHi-C data with high sensitivity and accuracy.
We introduced several key advances to address the challenges above. We first adopted the random walk with restart (RWR) algorithm to impute the contact frequency between genomic fragments within each cell at sub-10-kilobase resolution. RWR was originally proposed to infer A/B compartments and TADs from scHi-C data at lower resolution (1Mb or 40Kb)4. We were the first to extend it to such high resolution, which is necessary for loop calling. After imputation, we obtained the contact probability for every pair of genomic loci within each cell, substantially alleviating the data sparsity issue. More importantly, instead of aggregating all cells into pseudo-bulk data for loop calling, we treated each cell as an independent observation, thereby retaining the cell-to-cell variability that only scHi-C data provide and enhancing the statistical power of loop calling. In addition, we combined global and local background models to identify loci pairs whose contact probability is higher than that of loci pairs at the same 1D genomic distance (the global background) and of loci pairs in the immediate neighborhood (the local background), which keeps the false positive rate low. Finally, we applied the clustering algorithm of Rodriguez and Laio11 to group nearby candidate loops into clusters and selected the cluster summits as the final output, further reducing false positives.
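The imputation and background-model steps described above can be illustrated with a short, dependency-free Python sketch. It is a simplified illustration under stated assumptions, not SnapHiC's implementation. The function names are hypothetical. For the RWR step, binarizing contacts, linking each bin to its 1D neighbors, the restart probability of 0.05 and symmetrizing the result are assumed choices. The global background is modeled as a z-score against all pixels at the same 1D distance within the same cell. The local background is a square window around the pixel, compared across cells with a paired t-statistic. The cutoffs `z_cutoff` and `t_cutoff` are placeholders, not the values used in the paper.

```python
from math import copysign, inf, sqrt
from statistics import mean, stdev


def rwr_impute(contacts, restart=0.05, tol=1e-6, max_iter=500):
    """Impute one cell's binned contact matrix by random walk with restart.

    Assumed choices: contacts are binarized, every bin is also linked to its
    1D neighbors, and the restart probability is 0.05.
    """
    n = len(contacts)
    adj = [[1.0 if i != j and (contacts[i][j] > 0 or abs(i - j) == 1) else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalize the graph into a transition matrix P.
    P = []
    for row in adj:
        total = sum(row)
        P.append([x / total for x in row] if total else row[:])
    eye = [[float(i == j) for j in range(n)] for i in range(n)]
    Q = [row[:] for row in eye]
    for _ in range(max_iter):
        # Q <- (1 - r) * Q P + r * I, iterated until it stops changing.
        new_q = [[(1 - restart) * sum(Q[i][k] * P[k][j] for k in range(n))
                  + restart * eye[i][j] for j in range(n)] for i in range(n)]
        change = max((abs(new_q[i][j] - Q[i][j])
                      for i in range(n) for j in range(n)), default=0.0)
        Q = new_q
        if change < tol:
            break
    # Contacts are undirected, so symmetrize the imputed probabilities.
    return [[(Q[i][j] + Q[j][i]) / 2 for j in range(n)] for i in range(n)]


def distance_zscores(imputed):
    """Global background: z-score each pixel against all pixels at the same 1D distance."""
    n = len(imputed)
    z = [[0.0] * n for _ in range(n)]
    for d in range(1, n - 1):  # each diagonal needs at least two pixels
        values = [imputed[i][i + d] for i in range(n - d)]
        mu, sd = mean(values), stdev(values)
        for i in range(n - d):
            score = (values[i] - mu) / sd if sd > 0 else 0.0
            z[i][i + d] = z[i + d][i] = score
    return z


def local_t(cell_z, i, j, width=2):
    """Local background: paired t-statistic across cells (needs at least two cells).

    For each cell, pixel (i, j) is compared with the mean of a square window around it.
    """
    diffs = []
    for z in cell_z:
        n = len(z)
        window = [z[a][b]
                  for a in range(max(0, i - width), min(n, i + width + 1))
                  for b in range(max(0, j - width), min(n, j + width + 1))
                  if (a, b) != (i, j)]
        diffs.append(z[i][j] - mean(window))
    m, sd = mean(diffs), stdev(diffs)
    if sd == 0:
        return copysign(inf, m) if m else 0.0
    return m / (sd / sqrt(len(diffs)))


def is_candidate(cell_z, i, j, z_cutoff=1.0, t_cutoff=3.0):
    """Keep (i, j) only if it beats both backgrounds; the cutoffs are placeholders."""
    avg_z = mean(z[i][j] for z in cell_z)
    return avg_z > z_cutoff and local_t(cell_z, i, j) > t_cutoff
```

Each cell is processed on its own (imputation, then per-distance z-scores), and only the final test pools cells, so every cell contributes one paired difference. Plain Python loops keep the logic visible but cost O(n³) per RWR iteration; a practical tool would likely use sparse or vectorized matrix operations instead.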
Our method development was far from smooth. The first prototype of SnapHiC came together quickly, but optimizing the algorithm, improving loop quality and interpreting the biological findings took much longer. We performed extensive tests at every step of the pipeline. We visually inspected more than 500 genomic loci in both human and mouse scHi-C datasets2,4 to select robust cutoff values that yield highly reliable loops. We also tried several widely used clustering methods, each with different combinations of tuning parameters, on multiple scHi-C datasets, which produced a large number of combinations to evaluate. In the end, we chose the Rodriguez and Laio algorithm, which offered both efficiency and accuracy in detecting chromatin loops.
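The clustering step can be illustrated with the sketch below, a simplified version of Rodriguez and Laio's density-peak idea11 and not SnapHiC's own code. Each candidate loop is treated as a point (row bin, column bin). Its local density ρ counts other candidates within a cutoff `dc`, and its distance δ to the nearest denser candidate decides whether it starts a new cluster (δ ≥ `delta_min`) or joins that candidate's cluster. Rodriguez and Laio select cluster centers using both ρ and δ. Thresholding δ alone, the values of `dc` and `delta_min`, the function names, and taking the highest-scoring pixel of each cluster as its summit are all assumptions made for illustration.

```python
from math import dist


def density_peak_clusters(points, dc=2.0, delta_min=3.0):
    """Simplified density-peak clustering (after Rodriguez & Laio, 2014).

    points: candidate loop pixels as (row_bin, col_bin) tuples.
    dc: distance cutoff for the local density rho (placeholder value).
    delta_min: minimum distance to any denser point for a point to start a
        new cluster (placeholder value; the original method also weighs rho).
    Returns one integer cluster label per point.
    """
    n = len(points)
    if n == 0:
        return []
    # rho: number of other candidates closer than dc.
    rho = [sum(1 for j in range(n) if j != i and dist(points[i], points[j]) < dc)
           for i in range(n)]
    # Visit points from densest to sparsest; the index breaks ties deterministically.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    labels = [-1] * n
    labels[order[0]] = 0  # the densest point always starts a cluster
    n_clusters = 1
    for rank in range(1, n):
        i = order[rank]
        # delta: distance to the nearest point visited earlier (denser or tied).
        parent = min(order[:rank], key=lambda j: dist(points[i], points[j]))
        if dist(points[i], points[parent]) >= delta_min:
            labels[i] = n_clusters  # far from every denser point: a new cluster center
            n_clusters += 1
        else:
            labels[i] = labels[parent]  # join the cluster of the nearest denser point
    return labels


def cluster_summits(points, scores, labels):
    """Keep one pixel per cluster: the highest-scoring one (an assumed summit rule)."""
    best = {}
    for point, score, label in zip(points, scores, labels):
        if label not in best or score > best[label][1]:
            best[label] = (point, score)
    return sorted(point for point, _ in best.values())
```

With the default parameters, four adjacent candidates around bins (10, 20) collapse into one cluster, a neighboring pair around (30, 50) forms a second, and an isolated candidate forms a third, so three summits are reported instead of seven candidate pixels.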
SnapHiC provides a powerful tool for mapping chromatin loops at single-cell resolution, which is useful for precious tissue samples with a limited number of cells and for rare cell types in complex human tissues. The cell-type-specific 3D genome characterized by SnapHiC can provide novel insights into gene regulatory mechanisms and disease etiology, and accelerate the prioritization of genes underlying complex human diseases. For example, as described in our paper, we identified astrocyte-specific chromatin loops linking the promoter of the APOE gene to two Alzheimer’s disease (AD)-associated genetic variants 100-150Kb downstream, suggesting that APOE is the target gene of these two AD-associated SNPs specifically in astrocytes. In the near future, we plan to perform single-cell multi-omics studies of different human cell types to map the transcriptome, epigenome and 3D genome simultaneously in the same cell, and to discover novel causal genes underlying complex human diseases and traits.
To learn more about SnapHiC, see the full article (https://www.nature.com/articles/s41592-021-01231-2). SnapHiC is freely available on GitHub under the GPL-3.0 license: https://github.com/HuMingLab/snapHiC.
1 Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59-64, doi:10.1038/nature12593 (2013).
2 Nagano, T. et al. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature 547, 61-67, doi:10.1038/nature23001 (2017).
3 Li, G. et al. Joint profiling of DNA methylation and chromatin architecture in single cells. Nature Methods 16, 991-993, doi:10.1038/s41592-019-0502-z (2019).
4 Lee, D. S. et al. Simultaneous profiling of 3D genome structure and DNA methylation in single human cells. Nature Methods 16, 999-1006, doi:10.1038/s41592-019-0547-z (2019).
5 Ramani, V. et al. Massively multiplex single-cell Hi-C. Nature Methods 14, 263-266, doi:10.1038/nmeth.4155 (2017).
6 Flyamer, I. M. et al. Single-nucleus Hi-C reveals unique chromatin reorganization at oocyte-to-zygote transition. Nature 544, 110-114, doi:10.1038/nature21711 (2017).
7 Tan, L., Xing, D., Chang, C. H., Li, H. & Xie, X. S. Three-dimensional genome structures of single diploid human cells. Science 361, 924-928, doi:10.1126/science.aat5641 (2018).
8 Zhang, R., Zhou, T. & Ma, J. Multiscale and integrative single-cell Hi-C analysis with Higashi. bioRxiv 2020.12.13.422537, doi:10.1101/2020.12.13.422537 (2021).
9 Li, X., Zeng, G., Li, A. & Zhang, Z. DeTOKI identifies and characterizes the dynamics of chromatin TAD-like domains in a single cell. Genome Biology 22, 217, doi:10.1186/s13059-021-02435-7 (2021).
10 Tan, L. et al. Changes in genome architecture and transcriptional dynamics progress independently of sensory experience during post-natal brain development. Cell 184, 741-758.e17, doi:10.1016/j.cell.2020.12.032 (2021).
11 Rodriguez, A. & Laio, A. Clustering by fast search and find of density peaks. Science 344, 1492-1496, doi:10.1126/science.1242072 (2014).