The organisation of DNA in eukaryotic nuclei is not random. DNA is instead carefully wrapped around histone proteins to form chromatin, which is then highly organised by architectural proteins and other looping factors. This folding affects several basic biological processes: gene expression regulation, DNA damage repair, DNA replication, cell division and evolution. Our understanding of these relationships has increased dramatically over the past two decades, as new technologies have allowed us to measure chromatin folding in ever-greater detail.
A few years ago, the Pombo lab developed a new method of measuring chromatin folding called Genome Architecture Mapping (GAM). GAM uses DNA extracted from many ultrathin cryosections of individual nuclei (we call these "nuclear profiles") to infer DNA conformation. This approach makes GAM ideal for probing nuclear organisation in cells from complex tissues like the brain. As we went through the process of publishing the initial paper describing GAM, our peer reviewers were keen for us to conduct more in-depth comparisons between GAM and the most popular alternative technology, Hi-C. We thought that this was a great idea, and assured our reviewers that we would cover it in a separate paper, as that was the only way to give the comparison the time and space it deserved.
As our initial GAM dataset was fairly small, we started by collecting some additional data to make our comparisons as robust as possible. However, the GAM protocol we were using at the time involved sequencing each thin nuclear slice separately, which was quite time-consuming and used up a lot of reagents. We wondered if we could streamline the process by sequencing multiple slices together, so we asked Mario Nicodemi and his team, including Carlo Annunziatella, to update SLICE, their mathematical model of the GAM procedure. They found that the statistics should all work well with this new approach so, suitably reassured, we started collecting multiplex-GAM samples and were very happy with the results. Along the way we also improved several other aspects of the technique: we identified a new counterstain that made sample collection much easier, adopted a speedier library preparation method (which thankfully also allowed us to use robotic automation for some steps) and updated the statistical model to account for different nuclear shapes.
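To make the idea of sequencing several slices together more concrete, here is a minimal sketch in Python. It assumes each nuclear profile is recorded as a list of 0/1 flags, one per genomic window, and that pooling slices amounts to a window being detected if it is present in any slice of the pool. The function names `pool_profiles` and `cosegregation` are hypothetical and chosen for illustration. This is not the SLICE model, which is a far more detailed statistical treatment; the sketch only shows how co-detection of windows across samples can be counted.

```python
def pool_profiles(profiles, pool_size):
    """Merge consecutive nuclear profiles into pooled samples.

    Assumption: a window counts as detected in a pooled sample if it was
    detected in at least one of the slices that were sequenced together.
    """
    pooled = []
    for start in range(0, len(profiles), pool_size):
        group = profiles[start:start + pool_size]
        # zip(*group) walks through the windows, one column at a time
        pooled.append([int(any(column)) for column in zip(*group)])
    return pooled


def cosegregation(samples):
    """Fraction of samples in which each pair of windows is detected together."""
    n_windows = len(samples[0])
    total = len(samples)
    matrix = [[0.0] * n_windows for _ in range(n_windows)]
    for i in range(n_windows):
        for j in range(n_windows):
            together = sum(1 for s in samples if s[i] and s[j])
            matrix[i][j] = together / total
    return matrix
```

Windows that are close together in the nucleus should be captured in the same slice more often than distant ones, so higher values in the resulting matrix hint at spatial proximity. Pooling makes every sample contain more windows, which is why the statistical model had to be revisited before the multiplexed data could be interpreted.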
Having used multiplex-GAM to expand our mouse embryonic stem cell dataset from 408 nuclear profiles to 1250, we started to compare the results to published Hi-C experiments from the same cell type. We could immediately see that topologically associating domains, one of the most prominent aspects of chromatin organisation, were detected equally well by both methods. To search for areas where GAM and Hi-C gave different results, we first looked closely at our new GAM contact matrices (two-dimensional maps that encode the spatial distance in the cell nucleus between any two points on the genome). We soon found that the difference in the way measurements are made in GAM and Hi-C made it impossible to directly compare the raw results. Christoph Thieme took on this challenge and tried several methods for putting the measurements from the two techniques on a comparable scale, until he found an approach called Z-score normalisation that worked well.
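As a rough illustration of what Z-score normalisation does to a contact matrix, the sketch below rescales every value relative to other pairs of windows separated by the same genomic distance (that is, along each diagonal of the matrix). Grouping values by genomic distance is an assumption made for this example, and the function name `zscore_by_distance` is hypothetical; the text above does not spell out the exact procedure.

```python
from statistics import mean, pstdev


def zscore_by_distance(matrix):
    """Z-score each diagonal of a square, symmetric contact matrix.

    Assumption: values are compared only with other window pairs that
    lie the same number of windows apart.
    """
    n = len(matrix)
    out = [[0.0] * n for _ in range(n)]
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        mu = mean(diagonal)
        sigma = pstdev(diagonal)
        for i in range(n - d):
            # A diagonal with no variation carries no signal: report 0
            z = (matrix[i][i + d] - mu) / sigma if sigma > 0 else 0.0
            out[i][i + d] = z
            out[i + d][i] = z
    return out
```

After this transformation, a value of 2 means "two standard deviations above what is typical at this genomic distance" regardless of whether the input came from GAM or Hi-C, which is what makes the two kinds of matrix comparable.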
After applying this Z-score normalisation, Christoph was able to develop an approach to locate DNA contacts that were detected by Hi-C but missing from GAM datasets, and vice versa. Arguably the best-known protein responsible for organising the 3D genome is the architectural factor CTCF. Interactions are known to form between CTCF binding sites oriented towards one another (convergent CTCF sites) through a process called loop extrusion, and we were able to pick up strong CTCF-CTCF contacts in both GAM and Hi-C. With a lot of help from the team of Lonnie Welch, we also found a class of contacts between CTCF sites and regions of the genome that bore the hallmarks of active transcription. These contacts were frequently detected in GAM data but not by Hi-C.
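One simple way to picture the search for method-specific contacts is to subtract one Z-scored matrix from the other and keep the pairs of windows with the largest differences. The sketch below does exactly that. The function name `differential_contacts` and the fixed cutoff `threshold=2.0` are hypothetical choices for illustration, not the criteria used in the actual analysis.

```python
def differential_contacts(z_gam, z_hic, threshold=2.0):
    """Split window pairs into GAM-specific and Hi-C-specific contacts.

    Both inputs are square matrices of Z-scores with the same shape.
    Assumption: a pair is method-specific when the two Z-scores differ
    by at least `threshold` (an illustrative cutoff).
    """
    n = len(z_gam)
    gam_specific, hic_specific = [], []
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; matrices are symmetric
            delta = z_gam[i][j] - z_hic[i][j]
            if delta >= threshold:
                gam_specific.append((i, j))
            elif delta <= -threshold:
                hic_specific.append((i, j))
    return gam_specific, hic_specific
```

The resulting lists of window pairs can then be overlapped with genomic features, such as CTCF binding sites or actively transcribed regions, to ask what distinguishes the contacts that one method sees and the other misses.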
Excited by this discovery, we decided to investigate these contacts further and see if we could understand why they were preferentially detected by GAM. We were inspired by previous work by Justin O’Sullivan, which had predicted that DNA interactions involving multiple different partners coming together simultaneously (complex contacts) would be more difficult to detect by Hi-C. With this in mind, we went back to our datasets and were able to show that particular regions of the genome had a greater propensity to form complex contacts, and that these regions were also much more likely to form contacts that were detectable in GAM but not in Hi-C, providing the first direct evidence for a previously theoretical concept.
We hope that the improvements we have made to the GAM protocol will make it much easier for our colleagues around the world to adopt and use for themselves. We are also very excited to find out more about these complex contacts in our future research: how do they form? Why do they often involve super-enhancers? What functions might such interactions play in the cell? Future advances in this field will be indispensable for us to better understand gene regulation, cellular differentiation and the interplay between human DNA sequence variation and disease.
Header image: a DAPI-stained cryosection through a colony of mouse embryonic stem cells, kindly provided by Alexander Kukalev.