Capture Hi-C (CHi-C) is a rapidly emerging technology from the family of chromosome conformation capture (3C) methods, which are designed to detect proximity-driven chromosome interactions within the cell. CHi-C involves a standard Hi-C experiment, which finds all possible interactions between DNA fragments genome-wide (“all to all”), followed by a capture step that limits these to interactions for which at least one end (the bait) lies in a region of interest (“many to all”). There are two key applications of CHi-C: characterising interactions at GWAS risk loci using region CHi-C (rCHi-C), and characterising interactions emanating from gene promoters using promoter CHi-C (pCHi-C).
Our lab is specifically interested in post-GWAS follow-up of breast cancer risk loci, and we use rCHi-C to interrogate the mechanism by which intergenic risk SNPs contribute to the pathogenesis of the disease. However, we found that existing methods for CHi-C data analysis were geared towards pCHi-C and often resulted in a large number of interaction calls. Functional follow-up studies of interaction calls are costly in time, labour, and money, so it is imperative that we select only the calls of highest confidence before investing in them. To identify these high-confidence interactions, we developed Capture Hi-C ANalysis Engine (CHiCANE), which implements a statistical method for interaction calling and provides support for downstream data analyses. CHiCANE is freely available as an R package from CRAN.
Our method takes into account whether both of the interacting fragments are baits (bait-to-bait) or just one of the fragments is a bait (bait-to-other) as well as whether the interactions are on the same chromosome (cis) or on different chromosomes (trans) and models them differently according to these properties (Figure 1). We use a negative binomial as the default distribution as we have found this model to provide the best fit to the interactions we are interested in. Adjustments are included for distance between the fragments and “interactibility” of the baits as appropriate for the interaction type.
Figure 1: Overview of the CHiCANE method of adjustments by fragment pair type.
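To make the idea of testing an observed read count against a distance-adjusted expectation concrete, here is a minimal Python sketch. It is not CHiCANE's implementation (CHiCANE is an R package and fits its model to the data); the function names, the log-linear distance term, and the coefficient values are illustrative assumptions only.

```python
import math


def expected_count(distance_bp, intercept, slope):
    """Hypothetical log-linear expectation: log(mu) = intercept + slope * log(distance)."""
    return math.exp(intercept + slope * math.log(distance_bp))


def nb_log_pmf(k, mu, size):
    """Log-probability of count k under a negative binomial with mean mu and dispersion size."""
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )


def nb_upper_tail(observed, mu, size):
    """P(X >= observed), computed as one minus the probability mass below observed."""
    below = sum(math.exp(nb_log_pmf(k, mu, size)) for k in range(observed))
    return max(0.0, 1.0 - below)


# Made-up values: 12 reads between a bait and a fragment 50 kb away.
mu = expected_count(50_000, intercept=8.0, slope=-0.7)
p_value = nb_upper_tail(12, mu, size=2.0)
print(f"expected count {mu:.2f}, upper-tail p-value {p_value:.3g}")
```

In this sketch a small p-value means the fragment pair has more reads than the assumed distance trend would predict. A real analysis would estimate the coefficients and the dispersion from the data, separately for each interaction type.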
We recognise that the default negative binomial model and the other default settings, although optimal for our data, may not be desirable for every CHi-C experimental protocol. Therefore, we built the CHiCANE R package to be highly parameterised, with many options available to tailor interaction calling to a wide range of experimental procedures. In addition to the negative binomial model, users can specify Poisson, truncated Poisson, or truncated negative binomial models. There are also multiple options for replicate merging, whether and how to include zero counts, filtering of bait or target fragments with low and/or high “interactibility”, inclusion of additional adjustment terms, multiple testing correction, etc.
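As an illustration of the multiple testing correction step, the following Python sketch adjusts a list of p-values with the Benjamini-Hochberg procedure. The choice of Benjamini-Hochberg and the function name are assumptions made for this example; they are not a description of how CHiCANE performs its correction.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four candidate interactions.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(q_values)
```

The same function could be applied to all interactions at once or to the interactions of each bait separately, depending on how strict the correction needs to be.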
Beyond the statistical framework for interaction calling, we have included several functions in the CHiCANE package to assist with downstream analysis and visualisation of the interaction calls. Users can model their data with alternative distributions and other optional settings, then compare the outputs to determine which provides the best fit. For interoperability with other tools, CHiCANE’s outputs are R data structures that offer compatibility with BEDTools [1], BEDOPS [2], and the WashU Epigenome Browser [3], where the interactions can be viewed in the context of genome annotations as well as publicly available datasets such as ENCODE. CHiCANE also supports a CHi-C-specific interface with Gviz [4] for customised visualisation of interactions at a desired locus, in the context of a user-provided genome annotation and a region-of-interest BED file (e.g. baitmap, TAD, or CTCF coordinates), along with generic ideogram and genome axis tracks.
To facilitate assessment of the enrichment of interacting fragments for histone marks, CTCF binding sites, etc., CHiCANE has built-in support for generating a randomised background tailored to the characteristics of a given dataset, which can then be used to determine the fold-enrichment of the observed data over the background. We also discuss in detail how to evaluate library quality, model fitting, interaction peak calls, and replicate concordance using both CHiCANE functions and other freely available bioinformatics tools. Furthermore, we provide detailed instructions for annotating the interaction calls file, appraising eQTLs to measure association between genotype and mRNA abundance, and creating files for TAD boundary examination. Finally, we include an in-depth discussion of the impact of covariates and statistical distributions.
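The fold-enrichment comparison against a randomised background can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the overlap counts are made up, the function names are hypothetical, and the empirical p-value is an extra added here for context rather than something described above.

```python
def fold_enrichment(observed, background):
    """Observed overlap count divided by the mean count across background sets."""
    mean_background = sum(background) / len(background)
    if mean_background == 0:
        raise ValueError("background mean is zero; fold-enrichment is undefined")
    return observed / mean_background


def empirical_p_value(observed, background):
    """Share of background sets with a count at least as large as observed (+1 correction)."""
    at_least = sum(1 for count in background if count >= observed)
    return (at_least + 1) / (len(background) + 1)


# Made-up example: 30 interacting fragments overlap a histone mark,
# versus the overlap counts seen in four randomised background sets.
observed_overlaps = 30
background_overlaps = [10, 20, 15, 15]
print(fold_enrichment(observed_overlaps, background_overlaps))
print(empirical_p_value(observed_overlaps, background_overlaps))
```

In practice many more background sets would be drawn than the four shown here, since the number of sets limits how small the empirical p-value can be.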
Our recently published protocol details end-to-end processing and analysis of both rCHi-C and pCHi-C experiments utilising our freely available R package CHiCANE. Read all about it here: http://dx.doi.org/10.1038/s41596-021-00498-1
1. Quinlan, A.R. & Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842 (2010).
2. Neph, S. et al. BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919-1920 (2012).
3. Li, D., Hsu, S., Purushotham, D., Sears, R.L. & Wang, T. WashU Epigenome Browser update 2019. Nucleic Acids Research 47, W158-W165 (2019).
4. Hahne, F. & Ivanek, R. Visualizing Genomic Data Using Gviz and Bioconductor. Statistical Genomics: Methods and Protocols, 335-351 (2016).