Bioinformatics DOI:10.1093/bioinformatics/btt144

Improved ancestry inference using weights from external reference panels.

Publication Type: Journal Article
Year of Publication: 2013
Authors: Chen, C-Y, Pollack, S, Hunter, DJ, Hirschhorn, JN, Kraft, P, Price, AL
Date Published: 2013 Jun 01
Keywords: African Americans, Continental Population Groups, European Continental Ancestry Group, Genome-Wide Association Study, Genotype, HapMap Project, Humans, Polymorphism, Single Nucleotide, Principal Component Analysis, Software, United States

MOTIVATION: Inference of ancestry using genetic data is motivated by applications in genetic association studies, population genetics and personal genomics. Here, we provide methods and software for improved ancestry inference using genome-wide single nucleotide polymorphism (SNP) weights from external reference panels. This approach makes it possible to leverage the rich ancestry information that is available from large external reference panels, without the administrative and computational complexities of re-analyzing the raw genotype data from the reference panel in subsequent studies.
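
As a rough illustration of the idea described above, and not the SNPweights implementation, the sketch below applies precomputed per-SNP weights to one target sample's genotypes to obtain a predicted principal component score, so the reference panel's raw genotypes are never needed. The function names (`predict_pc`, `r_squared`), the normalization by reference allele frequency, and the choice to skip missing genotypes are all assumptions made for this example. The `r_squared` helper shows one common way to compute a squared Pearson correlation, the kind of R² accuracy measure reported in the results.

```python
from math import sqrt


def predict_pc(genotypes, ref_freqs, weights):
    """Predict one principal component score for a single target sample.

    genotypes: allele counts per SNP (0, 1 or 2), or None where missing.
    ref_freqs: reference-panel allele frequency per SNP (assumed input).
    weights:   per-SNP weight for this principal component (assumed input).
    """
    score = 0.0
    for g, p, w in zip(genotypes, ref_freqs, weights):
        # Assumption: skip missing calls and SNPs that are monomorphic
        # in the reference panel, since they cannot be normalized.
        if g is None or p <= 0.0 or p >= 1.0:
            continue
        # Center by the expected allele count and scale by sqrt(p(1-p)).
        score += w * (g - 2.0 * p) / sqrt(p * (1.0 - p))
    return score


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)
```

In this sketch, calling `predict_pc` once per sample and per component yields predicted scores that could then be compared against reference scores with `r_squared`.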

RESULTS: We extensively validate our approach in multiple African American, Latino American and European American datasets, making use of genome-wide SNP weights derived from large reference panels, including HapMap 3 populations and 6546 European Americans from the Framingham Heart Study. We show empirically that our approach provides much greater accuracy than either the prevailing ancestry-informative marker (AIM) approach or the analysis of genome-wide target genotypes without a reference panel. For example, in an independent set of 1636 European American genome-wide association study samples, we attained prediction accuracy (R²) of 1.000 and 0.994 for the first two principal components using our method, compared with 0.418 and 0.407 using 150 published AIMs or 0.955 and 0.003 by applying principal component analysis directly to the target samples. We finally show that the higher accuracy in inferring ancestry using our method leads to more effective correction for population stratification in association studies.

AVAILABILITY: The SNPweights software is available online at

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Alternate Journal: Bioinformatics
PubMed ID: 23539302
PubMed Central ID: PMC3661048
Grant List:
N01HC25195 / HL / NHLBI NIH HHS / United States
N02HL64278 / HL / NHLBI NIH HHS / United States
R01 HG006399 / HG / NHGRI NIH HHS / United States