Comprehensive variation discovery in single human genomes.

Nat Genet
Abstract

Complete knowledge of the genetic variation in individual human genomes is a crucial foundation for understanding the etiology of disease. Genetic variation is typically characterized by sequencing individual genomes and comparing reads to a reference. Existing methods do an excellent job of detecting variants in approximately 90% of the human genome; however, calling variants in the remaining 10% of the genome (largely low-complexity sequence and segmental duplications) is challenging. To improve variant calling, we developed a new algorithm, DISCOVAR, and examined its performance on improved, low-cost sequence data. Using a newly created reference set of variants from the finished sequence of 103 randomly chosen fosmids, we find that some standard variant call sets miss up to 25% of variants. We show that the combination of new methods and improved data increases sensitivity by several fold, with the greatest impact in challenging regions of the human genome.

Year of Publication
2014
Journal
Nat Genet
Volume
46
Issue
12
Pages
1350-5
Date Published
2014 Dec
ISSN
1546-1718
DOI
10.1038/ng.3121
PubMed ID
25326702
PubMed Central ID
PMC4244235
Grant list
HHSN272200900018C / AI / NIAID NIH HHS / United States
U54 HG003067 / HG / NHGRI NIH HHS / United States
R01 HG003474 / HG / NHGRI NIH HHS / United States
HHSN272200900018C / PHS HHS / United States