Analyzing high throughput sequencing data


Nature Methods has published popular analysis tools to make sense of the ever-increasing amount of high throughput (HTP) sequencing data. Some tools in this field have a short half life, due to pressure to always improve and innovate, others have staying power. Let’s look back over some of the highlights in our pages.

Mapping and assembling genomic reads

One of the first steps in any sequence analysis pipeline is base-calling and in 2008 Yaniv Erlich with Gregory Hannon improved the calling errors in Illumina data with their Alta-Cyclic that uses machine learning to reduce noise.

Once bases are called they most often need to be aligned to a reference, and high speed, sensitivity and accuracy are key requirements for mapping tools. In 2009 Paul Flicek and Ewan Birney discussed the basic principles behind methods for read alignment and assembly, and since then many more read mappers have been written. mrsFAST is a cache-oblivious, seed and extend,  short read mapper presented in 2010 by Cenk Sahinalp and colleagues. Bowtie2 by Ben Langmead and Steven Salzberg, a gapped read aligner,  promises exceptional speed and accuracy.  The GEM mapper by Paolo Ribeca and colleagues combines speed with an exhaustive search that returns all existing matches.

If no reference genome is available de novo assembly is the way to go. Many tools for genome assembly have been published but in 2010 Evan Eichler and colleagues demonstrated some of the limitations of popular assemblers used for the human genome. The ongoing high citation level of this paper and other work pointing out limits in current assembly programs highlight that de novo read assembly continues to be a challenge.

Finding structural variants

In 2009 Paul Medvedev and Michael Brudno looked at tools to discover structural variants  and later the same year they presented MoDIL, an insertion-deletion (indel) finder that focuses on a size range of 20-50 base pairs. Ken Chen et al.  published the aptly named BreakDancer, a tool to predict a wide variety of structural variation ranging in size from 10 base pairs to 1 megabase.  In 2011 Evan Eichler and colleagues added Splitread to find indels, de novo structural variants and copy number polymorphisms with high specificity and sensitivity. More recently in 2013, DeNovoGear from Donald Conrad and colleagues showed high validation rates in finding de novo indels in somatic tissue.  This year Scalpel, written by Michael Schatz and colleagues, came on the scene; a combination of mapping and de novo assembly allows it to detect transmitted as well as new indels in exome data.

Handling RNA-seq data

In 2008 Mortazavi et al.  and Cloonan et al. published one of the first RNA-seq papers in our pages and in 2009 Wold and Mortazavi presented and overview of tools for RNA-seq data analysis and the principles behind them. And since then the number of RNA-seq analysis tools has grown steadily throughout the literature.

To assess differential expression in RNA-seq data Malachi Griffith et al. wrote ALEXA-seq in 2010. The same year Chris Burge and colleagues published the MISO model to estimate expression of alternatively spliced exons and isoforms. Inanc Birol and colleagues presented Trans-ABySS for de novo transcriptome assembly. And a year later Manuel Garber and colleagues discussed the challenges in tracriptome mapping, reconstruction and expression quantification.

Last year Paul Bertone and colleagues from the RGASP consortium compared popular tools for spliced alignment.  And they looked at the performance of software to reconstruct transcripts.

David Haussler and colleagues showed in 2010 with FragSeq that RNA-seq data can also be used to probe the structure of a transcript.  And by combining SHAPE with HTP sequencing in their SHAPE-MaP approach Kevin Weeks and colleagues showed this year that RNA functional motifs can be discovered in their structure.

Despite the many computational tools we have published it is still not always easy to predict a priori which one will be taken up by the community. We’d love to hear from you what you think makes a top notch analysis tool.

Please sign in or register for FREE

If you are a registered user on Springer Nature Protocols and Methods Community, please sign in