Biomedical researcher Stephen Kingsmore is on the move. He has just taken up a post running the new Rady Pediatric Genomics and Systems Medicine Institute, which is part of Rady Children’s Hospital in San Diego. He is leaving Children’s Mercy Hospital (CMH) in Kansas City, where he founded the Center for Pediatric Genomic Medicine.
Together with colleagues at CMH, Kingsmore has also just published STAT-seq, a method with which the team performs whole-genome sequencing and analysis in 26 hours.
As Neil Miller, CMH’s director of informatics, explains, CMH is in the process of making most of the downstream characterization and interpretation software behind the STAT-seq pipeline freely available. The team also plans to make its warehouse of genetic variants available, and it wants to launch a software-as-a-service offering so that people without their own IT infrastructure can use these tools.
Nature Methods spoke to Kingsmore, and what follows is an edited version of the conversation.
Q: To better diagnose seriously ill babies in intensive care and make treatment decisions about them, you have found a way to sequence the whole genomes of a newborn and its parents and analyze them in 26 hours. The babies might be having unexplained seizures; the parents are deeply upset. Does this involve a lot of people doing the analysis, or is much of the analysis automated? I’m just thinking there might be a new world of jobs opening up.
Stephen Kingsmore: The analysis and interpretation are highly automated. That being said, there will be a new world of jobs opening up as folk like us scale up to meet the needs of local populations.
There are 9 million people living in the Rady catchment area, so we foresee a need for 25,000 parent or child genomes a year! That will call for a very large number of new genetic counselors.
Q: How do you validate that the genomic analysis is right, especially in these high-pressure, high-stakes circumstances? Sanger sequencing?
S.K.: Yes, Sanger sequencing or another appropriate confirmatory test, depending on the type of mutation.
There may also be a need for functional validation, since all that Sanger sequencing does is confirm that the letter code is correct; it doesn’t speak to whether the mutation actually causes disease.
Q: Structural variants are complicated to find, making for time-consuming analysis, but they play a role in many diseases. What is needed to make them part of speedy genome analysis?
S.K.: Yes, this is a key need. We need robust, fast methods for finding structural variants genome-wide. Microarrays pick up neither small structural variants nor complex variants such as inversions.
This will be a race between longer-read or longer-insert whole-genome sequencing and newer methods, such as those offered by companies like 10X Genomics and BioNano.
We then need to integrate the two types of variant information so that we get a full picture of variation. And all of that needs to happen with the ease of interpretation and the speed now possible for whole-genome sequencing.
Q: There are a number of fast computational analysis pipelines, such as Churchill, SpeedSeq and now yours. Is this officially a race of speed demons? One of them, SpeedSeq, describes genome analysis in 13 hours. How should one compare these tools and their sensitivity and specificity?
S.K.: We need a bake-off! I’m biased, but I think ours is the fastest: its genome analysis takes between one and one and a half hours. It has the highest sensitivity and specificity for nucleotide variants, and the smallest IT footprint for local implementation.
However, ours is not yet in the public domain or available on a software-as-a-service basis, and it does not yet have fully integrated structural variant calls. We hope to rectify these things by the end of the year.
Q: In your new study you use proprietary hardware from a company called Edico Genome; others are using open-source software. Do people who want to implement what you are doing need to decide whether to join the open or the closed club?
S.K.: Edico is a mix of hardware and software. The overall cost, I think, is significantly lower than traditional compute plus freeware. We are strongly focused on making freeware versions of the software described in the manuscript available by the end of the year.
That being said, there are some excellent commercial software options, and Genomics England went that route after its bake-off for the 100,000 Genomes Project. So yes, people should really step back at this juncture and think critically about their needs over the next two to three years.
Q: The data, particularly on children’s genomes, are sensitive but also of great interest to researchers. How do you work out data-sharing schemes for these data?
S.K.: This is a delicate balancing act. There is great value in researchers being able to re-analyze genomes together with structured clinical data, especially where a diagnosis was not evident.
We like the route of the NIH’s secure database of Genotypes and Phenotypes (dbGaP), which balances the need for confidentiality, even of de-identified data, with the needs of researchers and funding agencies.