About a month ago, I attended the ‘Beyond the Genome (BTG): Cancer Genomics’ meeting in Boston, my second conference as Chief Editor of Nature Protocols (my first was the 2014 ARR meeting at the University of Sussex). The BTG meeting grabbed my attention for several reasons: firstly, the topic was more-or-less within my comfort zone; secondly, there was a heavy focus on bioinformatics (a rapidly developing and important field); and thirdly, the line-up was fantastic. Fortunately, it was a small meeting, so I was able to pin down several of the key speakers, including Fred Alt, Gad Getz, Peter Park, Nuria Lopez-Bigas, Mike Schatz, Nils Gehlenborg, and Rosalie Sears, and poke my head into some of their ‘labs’ (offices).
Given the bioinformatics focus of the meeting, it is perhaps unsurprising that many of the speakers talked about the need for easy-to-follow instructions for biologists who are branching out into bioinformatics and using complex computational tools for the first time. As Peter Park pointed out to me, “biologists and bioinformaticists speak different languages, and few journals seem to bridge this gap”. As it happens, some of our most popular protocols (e.g. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources, which has been cited almost 4,000 times) fit this bill perfectly, providing clear, concise, step-by-step instructions for using popular bioinformatics tools and software packages. Written by bioinformaticists, reviewed by biologists and bioinformaticists, and edited by (ex)-biologists, these protocols help researchers without a bioinformatics background to analyse their data themselves. Given the popularity of these protocols and the growing importance of bioinformatics within the biomedical sciences, this seems like an obvious area for expansion of the journal (so watch this space!).
With more than 40 talks over three days, a comprehensive overview of the BTG meeting is out of the question, so instead I will highlight a few of my favourites. On the first day, Gad Getz opened the talks with a discussion of MutSig, software developed by his group for identifying driver genes (by analysing lists of mutations and the territory covered during sequencing) and building models of the background mutation processes that occur during tumour formation. Jan Korbel stepped up next, explaining the system he has set up for constructing maps of unbalanced structural variants (SVs) based on whole-genome DNA sequencing data. At the end of the day, Fred Alt spoke passionately about his recent work on high-throughput genomic translocation sequencing (HTGTS) strategies to identify translocations that arise from fixed DNA double-strand breaks (DSBs), as well as sites of endogenous genomic DSBs.
The next morning, Rosalie Sears gave an interesting talk on the use of 3D tissue bioprinters to generate tumour tissue, as well as normal healthy tissue. Printer ‘ink’ 1 (consisting of endothelial cells, fibroblasts, and immune cells) is added first; printer ‘ink’ 2 (cancer cells) is then added to look at tissue distribution and assess cell interactions. Unfortunately, only a few labs have access to such advanced technology, but this could change over the next 5-10 years as the set-up cost inevitably drops. Later that day, Lynda Chin talked about data she has collected using ChromHMM, a computational tool set up by Ernst and Kellis for predicting chromatin ‘states’ (based on combinations of chromatin marks) and characterizing their biological functions.
On the third and final day, Mike Schatz kicked off proceedings with a ‘bioinformatics challenge’ relating to single-cell CNV analysis. The challenge has become a regular feature of ‘Beyond the Genome’, encouraging students, postdocs, and analysts in the audience to get “down and dirty with data to solve an informatics problem as quickly as possible” (as Mike puts it). The task opened up to the floor this year (“should you choose to accept it”) was to resolve the population structure of a collection of cells, establishing which cells were in the same clone and, for each clone, which was the most highly amplified oncogene. Schatz provided simulated single-cell sequencing data for a population of 9 cells with 250k reads per cell, all for chromosome 1 (not the whole genome) at 0.5x coverage. He also provided a list of 100 candidate genes. As this coverage is too sparse to identify point mutations, the genome must be divided into “bins” (with 50-100 reads/bin) before mapping the reads and counting reads/bin. Copy number variations in a single cell can be identified as bins with significantly fewer or significantly more reads than expected, and the population structure can be examined by finding cells with the same patterns of bin counts (see our Protocol on this).
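For readers who would like to see the binning idea in concrete terms, here is a minimal Python sketch. It is not the challenge solution or the method from our Protocol; it is a toy illustration that assumes reads have already been mapped, that fixed-width bins are used, and that cells from the same clone give identical rounded copy-number profiles (real data are noisier and would need proper normalization and clustering). All function names, the `ploidy` parameter, and the gene coordinates format are made up for illustration.

```python
from statistics import median


def bin_counts(read_starts, chrom_length, bin_size):
    """Count mapped read start positions in fixed-width bins."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def copy_number_profile(counts, ploidy=2):
    """Scale counts so the median bin equals `ploidy`, then round.

    Assumption: most of the chromosome is at normal copy number,
    so the median bin is a reasonable baseline.
    """
    med = median(counts)
    if med == 0:
        raise ValueError("coverage too sparse for this bin size")
    return tuple(round(ploidy * c / med) for c in counts)


def group_into_clones(profiles):
    """Group cells whose rounded copy-number profiles are identical."""
    clones = {}
    for cell, profile in profiles.items():
        clones.setdefault(profile, []).append(cell)
    return clones


def most_amplified_gene(profile, genes, bin_size):
    """Return the candidate gene that overlaps the highest-copy bin.

    `genes` maps a (hypothetical) gene name to (start, end) coordinates.
    """
    def peak(gene):
        start, end = genes[gene]
        return max(profile[start // bin_size : end // bin_size + 1])

    return max(genes, key=peak)
```

In use, one would call `bin_counts` and `copy_number_profile` once per cell, pass the resulting dictionary of profiles to `group_into_clones`, and then run `most_amplified_gene` on one profile per clone with the list of candidate genes.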
After an initial flurry of activity as people rose to the challenge, the audience settled down to hear Nils Gehlenborg talk about the applications of StratomeX, an interactive visualization tool he developed in the Park lab to compare differences in molecular profiles across patient sets. Then, just after lunch, Schatz announced the winner of the bioinformatics challenge: René Böttcher, a PhD student in Guido Jenster’s lab at the Erasmus Medical Center in Rotterdam, was the first to solve the challenge, in just over 1 hour (congratulations, René!). As the closing remarks drew nearer, Peter Park gave a fantastic talk on the different single-cell sequencing approaches, with tips on comparing data obtained using different techniques (e.g. GC bias with MDA sequencing, better read-depth stability at large scales with MALBAC than with MDA, more consistent depth at small scales with MDA, etc.). Shortly afterwards, a sea of smiling faces left the auditorium, with many promises to return next year.