Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.
Authors | |
Keywords | |
Abstract | A major use of the 1000 Genomes Project (1000 GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First, the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000 GP haplotype reference set for use by the human genetics community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. |
Year of Publication | 2014 |
Journal | Nat Commun |
Volume | 5 |
Pages | 3934 |
Date Published | 2014 |
ISSN | 2041-1723 |
DOI | 10.1038/ncomms4934 |
PubMed ID | 25653097 |
PubMed Central ID | PMC4338501 |
Grant list | 096599 / Wellcome Trust / United Kingdom
G0801823 / Medical Research Council / United Kingdom
P20 MD006899 / MD / NIMHD NIH HHS / United States
P30 ES013508 / ES / NIEHS NIH HHS / United States
R01 CA166661 / CA / NCI NIH HHS / United States
R01 HG006849 / HG / NHGRI NIH HHS / United States
|