Taking geography out of genetics

"The Farnese Atlas" sculpture, at the Museo Archeologico Nazionale in Naples, Italy, depicts Atlas, the Titan of Greek mythology, shouldering the weight of the globe.
Photo courtesy of Kathleen Cohen, WorldArt

Although humans differ in their genetic makeup by a mere 0.1%, this tiny slice of the genome really packs a punch: it houses the key genetic differences that can influence a person’s susceptibility to disease. Intensive efforts, known as "genome-wide association studies," are now underway to systematically identify these differences, but other sources of genetic variation, particularly those due to geographic ancestry, can muddle the genetic picture. In the July 23 advance online edition of Nature Genetics, a group of scientists led by David Reich, an assistant professor in the Department of Genetics at Harvard Medical School and an associate member of the Broad Institute, describes a quantitative method that can correct for the errors introduced by such ancestry differences, known collectively as "population stratification." Importantly, in a genome-wide setting this method appears to be more effective than previous corrective measures.

Genome-wide association studies aim to pinpoint the genetic differences that correlate with, and perhaps play a causative role in, a particular disease. They do so by comparing DNA samples from a group of patients who share the disease (or another biological trait) with samples from people who do not. Aside from their medical differences, though, the two groups may differ in geographic ancestry, which also leaves a detectable genetic signature. Although these ancestry differences typically account for just 5-10% of all human genetic variation (itself a wee fraction of the genome), their effects on a genome-wide association study can be striking.

For example, a majority of patients in a so-called "disease group" may have ties to northwestern Europe, while the control group may have ancestry largely from southeastern Europe. If not properly accounted for, these geographic differences can serve as genetic "red herrings" in a disease association study: scientists may flag them as systematic genetic differences that correlate with disease when, in reality, they are simply hallmarks of a particular geographic region and have no biological relevance to the disease. Methods for handling population stratification do exist, but they are limited in their application, especially to studies conducted on a genome-wide scale.
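To get a feel for how large such a red herring can be, the short calculation below works through a purely hypothetical case-control study. All numbers are invented for illustration and are not from the paper. Cases are assumed to be 70% northwestern European and controls 30%, and a SNP that has no effect on disease is assumed to have an allele frequency of 0.6 in the northwestern group and 0.3 in the southeastern group. The sketch computes the expected allele frequency in each group and a standard 2x2 allelic chi-square statistic (no continuity correction).

```python
def allele_frequency(frac_nw, freq_nw, freq_se):
    """Expected allele frequency in a group that mixes two ancestries (hypothetical inputs)."""
    return frac_nw * freq_nw + (1 - frac_nw) * freq_se


def allelic_chi_square(case_freq, control_freq, n_cases, n_controls):
    """2x2 chi-square on allele counts (two alleles per person), no continuity correction."""
    case_alleles, control_alleles = 2 * n_cases, 2 * n_controls
    observed = [
        [case_freq * case_alleles, (1 - case_freq) * case_alleles],
        [control_freq * control_alleles, (1 - control_freq) * control_alleles],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][k] + observed[1][k] for k in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for k in range(2):
            expected = row_totals[i] * col_totals[k] / total
            chi2 += (observed[i][k] - expected) ** 2 / expected
    return chi2


# Hypothetical study: the SNP has NO effect on disease; only the ancestry mix differs.
case_freq = allele_frequency(0.7, 0.6, 0.3)     # 0.51
control_freq = allele_frequency(0.3, 0.6, 0.3)  # 0.39
chi2 = allelic_chi_square(case_freq, control_freq, 1000, 1000)
print(round(chi2, 2))  # 58.18
```

A chi-square of about 58 on one degree of freedom corresponds to a p-value far smaller than typical genome-wide significance thresholds, even though, by construction, the SNP has nothing to do with the disease.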

To address this problem, Alkes Price, a postdoctoral research fellow at Harvard Medical School and at the Broad, together with Reich and Nick Patterson, a senior research scientist and statistician at the Broad, developed a new quantitative approach called "EIGENSTRAT." Based on principal components analysis of the genotype data, the technique enables researchers to identify genetic variation due to ancestry differences and to visualize it as a linear axis, or sometimes multiple axes, which can be superimposed on a map. For instance, individuals from both the disease and control groups can be represented as points along a straight line that might run from northwestern to southeastern Europe, with Great Britain at a value of +1 and Italy at -1. Once it is determined where each individual lies along this axis, the scientists can adjust each person's genotypes and disease status for the portion explained by ancestry, removing ancestry differences between cases and controls before testing for association.

The researchers rigorously tested EIGENSTRAT on various sets of simulated data, demonstrating both its effectiveness in eliminating the false positives that arise from population stratification and its sensitivity in detecting true genetic associations with disease. In collaboration with Broad researcher Robert Plenge and his colleagues, the scientists then used EIGENSTRAT to analyze real data from a study of rheumatoid arthritis (the Brigham Rheumatoid Arthritis Sequential Study, or BRASS). They performed a mock association study for an unrelated trait, lactose intolerance, using about 500 European Americans whose DNA was analyzed as part of BRASS, which is funded by Millennium Pharmaceuticals. Lactose intolerance varies with ancestry across Europe, and it is also closely tied to a known genetic variant in the lactase (LCT) gene on human chromosome 2, so it provided a strong test case for EIGENSTRAT: any association detected elsewhere in the genome would point to ancestry rather than biology. With the genetic profiles at this locus in hand, the researchers inferred who was likely to be lactose intolerant and who was not, in effect creating their own groups of cases and controls. They then restricted their analysis to the rest of the genome, excluding chromosome 2 and the LCT gene, and, without correcting for population stratification, found four genetic variants that showed a significant, albeit false, correlation with lactose intolerance. Previous methods for correcting population stratification eliminated some of these spurious associations; EIGENSTRAT removed all of them.
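The adjustment step can be sketched in the same hedged spirit. As we read the approach summarized in the cited paper, both the genotype at a candidate SNP and the case/control status are regressed on the ancestry axes, and association is then tested on what remains. In this sketch the statistic is (N - K - 1) times the squared correlation of the residuals, for N individuals and K axes, treated as roughly chi-square with one degree of freedom; that form is our assumption and should be checked against the paper's Methods. The helper names are hypothetical, and the axes are assumed to be centered and mutually orthogonal (as principal components are), so each one can be projected out separately.

```python
def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def remove_axes(values, axes):
    """Center values, then subtract their projection onto each (centered, orthogonal) axis."""
    residual = center(values)
    for axis in axes:
        scale = sum(r * a for r, a in zip(residual, axis)) / sum(a * a for a in axis)
        residual = [r - scale * a for r, a in zip(residual, axis)]
    return residual


def adjusted_chi_square(genotype, phenotype, axes):
    """(N - K - 1) * r^2 between ancestry-adjusted genotype and phenotype (1 d.f.)."""
    g = remove_axes(genotype, axes)
    y = remove_axes(phenotype, axes)
    gg = sum(v * v for v in g)
    yy = sum(v * v for v in y)
    if gg < 1e-12 or yy < 1e-12:
        return 0.0  # nothing left to correlate once ancestry is removed
    r2 = sum(a * b for a, b in zip(g, y)) ** 2 / (gg * yy)
    return (len(genotype) - len(axes) - 1) * r2


# Hypothetical toy data: a SNP that follows ancestry exactly, with no other link to status.
ancestry = [1, 1, 1, -1, -1, -1]  # hypothetical axis: +1 for one group, -1 for the other
snp = [2, 2, 2, 0, 0, 0]
status = [1, 1, 0, 1, 0, 0]       # 1 = case, 0 = control
print(adjusted_chi_square(snp, status, []))          # uncorrected: about 0.56
print(adjusted_chi_square(snp, status, [ancestry]))  # corrected: 0.0
```

The toy sample is far too small for any statistic to be significant; the point is only that the part of the signal explained by the ancestry axis disappears after adjustment, while a SNP whose variation is unrelated to the axis keeps its association with status.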

Given its demonstrated utility, the researchers are now applying EIGENSTRAT to several genome-wide association studies underway at the Broad. And though the genetic differences among humans amount to a few minuscule drops in the proverbial bucket of the human genome, with EIGENSTRAT's help, researchers can better pick out the ones with the most medical significance.

Paper(s) cited

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics advance online publication; doi:10.1038/ng1847