Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.

Nat Commun

Authors	Olivier Delaneau, Jonathan Marchini, 1000 Genomes Project Consortium
Keywords	Humans; Algorithms; Alleles; Microarray Analysis; Haplotypes; Gene Frequency; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genome, Human
Abstract	A major use of the 1000 Genomes Project (1000 GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First, the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000 GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.
Year of Publication	2014
Journal	Nat Commun
Volume	5
Pages	3934
Date Published	2014
ISSN	2041-1723
DOI	10.1038/ncomms4934
PubMed ID	25653097
PubMed Central ID	PMC4338501
Grant list	096599 / Wellcome Trust / United Kingdom; G0801823 / Medical Research Council / United Kingdom; P20 MD006899 / MD / NIMHD NIH HHS / United States; P30 ES013508 / ES / NIEHS NIH HHS / United States; R01 CA166661 / CA / NCI NIH HHS / United States; R01 HG006849 / HG / NHGRI NIH HHS / United States