The story begins with a small molecule called kethoxal, which was first reported in the 1950s to react with and inactivate RNA viruses. Several years ago, our lab began experimenting with this compound to expand the toolbox for labeling RNA in vivo and for probing RNA secondary structure at the transcriptome-wide scale. Kethoxal attracted our attention because, under mild conditions, it reacts specifically with guanines in single-stranded RNA but not in double-stranded RNA. However, the lack of synthetic routes to modified kethoxal derivatives hampered its use in transcriptomic studies. Drawing on the lab's expertise in synthetic chemistry, a former postdoc, Dr. Xiaocheng Wen, developed a synthetic scheme to prepare azide-modified kethoxal (N3-kethoxal) for specifically labeling the Watson–Crick face of guanines in single-stranded RNA (ssRNA). The azido group provides a bioorthogonal handle that can be conjugated to biotin for enrichment. The high labeling efficiency of N3-kethoxal, coupled with efficient enrichment, led to a new method (keth-seq) for probing RNA secondary structure with high accuracy (https://www.nature.com/articles/s41589-019-0459-3).
Tong joined our lab as a graduate student when N3-kethoxal was first made, and he helped develop keth-seq. He reasoned that N3-kethoxal should react with guanines in ssDNA just as it does in ssRNA, and that Watson–Crick base pairing in dsDNA should block this reaction (Fig. 1a). This property could enable a method to detect ssDNA in situ. From the keth-seq work, Tong already knew that the N3-kethoxal label is reversible and can be removed by heating at 95 °C, so it would not interfere with PCR amplification. Building on his experience with keth-seq, Tong tried a modified protocol: labeling HEK293T cells, isolating genomic DNA instead of RNA, performing biotinylation and enrichment, and constructing DNA libraries for next-generation sequencing. Overall, the protocol was simple, and he was able to complete it within one day (Fig. 1b).
Fig. 1 KAS-seq probes single-stranded DNA regions. a, The molecular structure of N3-kethoxal and how N3-kethoxal labels guanines in single-stranded DNA (ssDNA) but not in double-stranded DNA (dsDNA). b, N3-kethoxal (blue star) reacts with single-stranded guanines in the genome (exposed by DNA-binding proteins such as Pol II, shown in yellow), which can then be biotinylated (red) and enriched for sequencing. HT sequencing, high-throughput sequencing.
After we got the data back from the first Illumina sequencing run, Ruitu quickly mapped the reads to the reference genome using the same pipeline we use for regular ChIP-seq analysis. We were surprised by the peaks that showed up on the UCSC Genome Browser. There were both sharp and broad peaks, and they correlated closely with active histone modifications and gene expression. A more systematic analysis showed that ssDNA signals were present at many genomic features, including promoters, gene bodies, transcription termination regions, enhancers and some non-B DNA regions. Although we had not started the project with these expectations, the data immediately suggested that our method could be used to profile transcriptionally active loci across the genome. We therefore named the method KAS-seq, for kethoxal-assisted single-stranded DNA sequencing.
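For readers curious how peaks end up tallied by genomic feature, here is a minimal, hypothetical sketch of one simple approach. It is not the analysis pipeline used in the study. The function names (`overlaps`, `annotate_peaks`), the BED-style half-open coordinates, and the priority order used when a peak overlaps several feature classes are all illustrative assumptions. The toy intervals are invented and are not KAS-seq data.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A genomic interval in BED convention: (chromosome, start, end), 0-based, end-exclusive.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def annotate_peaks(
    peaks: List[Interval],
    features: Dict[str, List[Interval]],
    priority: List[str],
) -> Counter:
    """Assign each peak to the first feature class (in priority order) it overlaps.

    Hypothetical helper: peaks overlapping none of the listed features are
    counted as "other".
    """
    counts: Counter = Counter()
    for peak in peaks:
        label = "other"
        for name in priority:
            if any(overlaps(peak, f) for f in features.get(name, [])):
                label = name
                break  # stop at the highest-priority feature class
        counts[label] += 1
    return counts


if __name__ == "__main__":
    # Toy intervals for illustration only; not real annotations or peaks.
    features = {
        "promoter": [("chr1", 900, 1100)],
        "gene_body": [("chr1", 1100, 5000)],
        "termination": [("chr1", 5000, 6000)],
        "enhancer": [("chr1", 20000, 20500)],
    }
    peaks = [
        ("chr1", 950, 1050),
        ("chr1", 3000, 3200),
        ("chr1", 5500, 5600),
        ("chr1", 20100, 20200),
        ("chr2", 100, 200),
    ]
    order = ["promoter", "gene_body", "termination", "enhancer"]
    print(annotate_peaks(peaks, features, order))
```

On genome-scale peak sets, tools such as `bedtools intersect` perform the overlap step far more efficiently. The nested loop here only makes the logic visible.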
To further increase the signal-to-noise ratio of KAS-seq, we optimized the N3-kethoxal concentration and the enrichment conditions and submitted another batch of HEK293T libraries. This time we included samples treated with two transcription inhibitors, 5,6-dichlorobenzimidazole 1-β-D-ribofuranoside (DRB) and triptolide. In the DRB- and triptolide-treated samples, the number of KAS-seq peaks decreased by 57% and 93%, respectively. DRB treatment severely diminished ssDNA signals in gene bodies and transcription termination regions while increasing signals at transcription start sites (TSSs). Triptolide treatment almost completely erased signals across entire gene-coding regions. Both patterns are consistent with how these drugs inhibit transcription. DRB inhibits the kinase CDK9 and blocks promoter-proximal pause release and elongation, so Pol II is retained near the TSS. Triptolide targets the XPB subunit of TFIIH and blocks initiation. These observations confirmed that the KAS-seq signal reflects transcription dynamics. Further analysis of the KAS-seq data helped us identify a group of single-stranded-DNA-containing enhancers (SSEs), which have distinctive sequence and protein-binding features and are associated with higher enhancer activity.
Given the high labeling efficiency and the high-affinity biotin–streptavidin pulldown used for enrichment, we reasoned that KAS-seq should remain sensitive even with low-input starting material. After several rounds of optimization, we got KAS-seq to work well with 10,000, 5,000 or even 1,000 HEK293T cells. KAS-seq libraries from these low-input samples showed enrichment efficiency and peak numbers similar to those from bulk cells, suggesting a wide range of potential applications for studying rare cell populations in the future.
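Enrichment efficiency can be quantified in several ways. One common choice for ChIP-seq-like data is the fraction of reads in peaks (FRiP), which ENCODE also uses as a quality metric. The sketch below assumes FRiP purely for illustration; this is not necessarily the metric behind the comparison above. It also assumes that each read is represented by a single genomic position. Peaks on each chromosome must be non-overlapping, for example after merging with `bedtools merge`. The function names `build_index` and `frip` are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(
    peaks: Iterable[Tuple[str, int, int]],
) -> Dict[str, Tuple[List[int], List[int]]]:
    """Group non-overlapping, half-open peaks by chromosome, sorted by start.

    Returns, per chromosome, parallel lists of peak starts and peak ends.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def frip(
    reads: Iterable[Tuple[str, int]],
    peaks: Iterable[Tuple[str, int, int]],
) -> float:
    """Fraction of reads (one position per read) that fall inside any peak."""
    index = build_index(peaks)
    total = in_peaks = 0
    for chrom, pos in reads:
        total += 1
        if chrom not in index:
            continue  # no peaks on this chromosome
        starts, ends = index[chrom]
        # Last peak whose start is at or before this position.
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and pos < ends[i]:
            in_peaks += 1
    return in_peaks / total if total else 0.0
```

Under this assumption, a low-input library and a bulk library would be compared by calling `frip` on each library's reads against its own peak set. Similar values would indicate similar enrichment.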
In conclusion, our efforts led to a new method for profiling ssDNA genome-wide, which can be used to capture global transcription dynamics, enhancer activity and other processes involving ssDNA in situ. KAS-seq is easy to use and should be readily adoptable by other labs. Indeed, we have already shared N3-kethoxal with many labs that study transcription dynamics in various biological systems. We are optimizing KAS-seq to make it more powerful and are applying it to some important biological systems. We are also expanding the use of kethoxal derivatives to create new methods that can answer other biological questions. Finally, we want to take this opportunity to thank all members of the He lab. The interdisciplinary environment they created made the development of KAS-seq possible.