Understanding and documenting variation in human genomes
To understand disease, one needs to understand the genetic variations that underlie it. Many tools exist to predict the deleteriousness of variants in the human genome: PolyPhen-2, SIFT and CADD (combined annotation-dependent depletion), to name only a few. Variants are often called deleterious by applying a single, global threshold to CADD scores. On page 109 of our March issue, Yuval Itan et al. present the mutation significance cutoff (MSC), which replaces this global threshold with a gene-level one. For MSC, as for any variant prediction tool, it was important to validate the predictions against variants known to be deleterious. Established mutation databases often serve as ground truth for testing prediction tools; MSC, for example, was validated against variants found in two large databases, HGMD and ClinVar.
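To make the difference between a global and a gene-level threshold concrete, here is a minimal Python sketch. It is not the MSC method itself: it only shows how calls can change when each gene has its own cutoff, and how such calls might be scored against a set of variants already known to be deleterious. The cutoff values, gene names (`GENE_A`, `GENE_B`, `GENE_C`) and helper names are hypothetical; the real MSC cutoffs are computed as described by Itan et al., not assigned by hand as they are here.

```python
from dataclasses import dataclass

# Illustrative global cutoff on CADD phred-scaled scores (assumed value).
GLOBAL_CUTOFF = 15.0

# Hypothetical per-gene cutoffs; invented numbers, NOT real MSC values.
GENE_CUTOFFS = {"GENE_A": 3.0, "GENE_B": 25.0}


@dataclass
class Variant:
    gene: str
    cadd_phred: float


def is_deleterious_global(v: Variant, cutoff: float = GLOBAL_CUTOFF) -> bool:
    """Call a variant deleterious using one threshold for every gene."""
    return v.cadd_phred >= cutoff


def is_deleterious_gene_level(
    v: Variant,
    gene_cutoffs: dict = GENE_CUTOFFS,
    fallback: float = GLOBAL_CUTOFF,
) -> bool:
    """Use the gene's own cutoff; genes without one fall back to the global cutoff."""
    return v.cadd_phred >= gene_cutoffs.get(v.gene, fallback)


def sensitivity(known_deleterious: list, predict) -> float:
    """Fraction of known-deleterious variants that `predict` flags.

    This only counts hits among known-deleterious variants; it says
    nothing about how many benign variants are also flagged.
    """
    hits = sum(1 for v in known_deleterious if predict(v))
    return hits / len(known_deleterious)


if __name__ == "__main__":
    variants = [Variant("GENE_A", 10.0), Variant("GENE_B", 20.0), Variant("GENE_C", 18.0)]
    for v in variants:
        print(v.gene, is_deleterious_global(v), is_deleterious_gene_level(v))
```

With these invented numbers, the variant in `GENE_A` is missed by the global cutoff but called by its lower gene-level cutoff, while the `GENE_B` variant goes the other way. Because sensitivity alone ignores false positives, a fuller evaluation would also score the predictor on variants known to be benign.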
The February editorial discusses the strengths and limitations of large human variation databases and emphasizes the importance of sharing variant data in publicly accessible databases. We encourage our readers to share their experience with these databases and to recommend their favorite ones.