Improved ancestry inference using weights from external reference panels.

Bioinformatics
Abstract

MOTIVATION: Inference of ancestry using genetic data is motivated by applications in genetic association studies, population genetics and personal genomics. Here, we provide methods and software for improved ancestry inference using genome-wide single nucleotide polymorphism (SNP) weights from external reference panels. This approach makes it possible to leverage the rich ancestry information that is available from large external reference panels, without the administrative and computational complexities of re-analyzing the raw genotype data from the reference panel in subsequent studies.
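The idea of reusing SNP weights can be illustrated with a minimal sketch. This is not the SNPweights software's implementation. It assumes that weights come from a singular value decomposition of the normalized reference genotype matrix and that target samples are normalized with the reference allele frequencies. The function names `derive_snp_weights` and `predict_pcs` and the normalization convention are hypothetical choices made for illustration.

```python
import numpy as np


def derive_snp_weights(ref_genotypes, n_pcs=2):
    """Derive per-SNP weights from a reference panel (illustrative only).

    ref_genotypes: (n_samples, n_snps) array of allele counts in {0, 1, 2}.
    Returns the reference allele frequencies and an (n_snps, n_pcs) weight matrix.
    """
    ref_genotypes = np.asarray(ref_genotypes, dtype=float)
    p = ref_genotypes.mean(axis=0) / 2.0           # reference allele frequency
    scale = np.sqrt(2.0 * p * (1.0 - p))
    scale[scale == 0] = 1.0                        # monomorphic SNPs get zero weight
    z = (ref_genotypes - 2.0 * p) / scale          # assumed normalization
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    # Dividing by the singular values makes z @ weights equal the left
    # singular vectors, i.e. the reference samples' principal components.
    weights = vt[:n_pcs].T / s[:n_pcs]
    return p, weights


def predict_pcs(target_genotypes, p, weights):
    """Project target samples using only the stored frequencies and weights."""
    target_genotypes = np.asarray(target_genotypes, dtype=float)
    scale = np.sqrt(2.0 * p * (1.0 - p))
    scale[scale == 0] = 1.0
    z = (target_genotypes - 2.0 * p) / scale
    return z @ weights
```

Under these assumptions, only the allele frequencies and the weight matrix need to be shared with later studies; the raw reference genotypes are not needed at prediction time.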

RESULTS: We extensively validate our approach in multiple African American, Latino American and European American datasets, making use of genome-wide SNP weights derived from large reference panels, including HapMap 3 populations and 6546 European Americans from the Framingham Heart Study. We show empirically that our approach provides much greater accuracy than either the prevailing ancestry-informative marker (AIM) approach or the analysis of genome-wide target genotypes without a reference panel. For example, in an independent set of 1636 European American genome-wide association study samples, we attained prediction accuracy (R²) of 1.000 and 0.994 for the first two principal components using our method, compared with 0.418 and 0.407 using 150 published AIMs or 0.955 and 0.003 by applying principal component analysis directly to the target samples. We finally show that the higher accuracy in inferring ancestry using our method leads to more effective correction for population stratification in association studies.

AVAILABILITY: The SNPweights software is available online at http://www.hsph.harvard.edu/faculty/alkes-price/software/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year of Publication
2013
Journal
Bioinformatics
Volume
29
Issue
11
Pages
1399-1406
Date Published
2013 Jun 01
ISSN
1367-4811
DOI
10.1093/bioinformatics/btt144
PubMed ID
23539302
PubMed Central ID
PMC3661048
Grant list
N01HC25195 / HL / NHLBI NIH HHS / United States
N02HL64278 / HL / NHLBI NIH HHS / United States
R01 HG006399 / HG / NHGRI NIH HHS / United States