Clearing a path to disease genes
Image by Bang Wong, Broad Institute
In the search for disease genes, scientists now have the ability to scan the human genome for variations that occur more often in people with a disease than in people without it. The approach rests on the simple notion that DNA variations found more frequently in individuals with a disease are likely to contribute to that disease. But most genetic variations do not confer disease risk, and those that do often have modest effects, making it hard to recognize the truly causal variants. That complicates genome scanning efforts, also known as genetic association studies, whose numbers have skyrocketed in the past year.
Some of this complexity flows from genetic ancestry. In association studies, individuals who share the same self-identified ethnicity are often studied as part of a uniform population, when in fact they reflect a genetic diversity shaped by immigration and other influences on human history. When this genetic diversity is not accounted for, association studies can lead to spurious results, identifying variants that are markers of ancestry rather than markers of disease. A research team, co-led by Alkes Price, David Reich, and Joel Hirschhorn at the Broad Institute, Harvard Medical School, and Children’s Hospital Boston, has developed a method to alleviate some of these complexities in mapping disease genes in one such diverse population, European Americans. The results appear in the January 18 issue of PLoS Genetics.
Indeed, the fraction of the genome that varies among people is strikingly small. And within that tiny fraction, ancestry accounts for less than one-tenth of human genetic variation worldwide — and less than one-hundredth of the genetic variability among European Americans. Surprisingly, those minute differences can have an important effect on association studies. That is because the disease-related variants that these studies aim to identify — single-letter DNA changes known as single nucleotide polymorphisms, or SNPs — can have effects that are as small as the false associations that stem from subtle, unrecognized differences in ancestry.
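The effect described above can be illustrated with a small simulation. The sketch below is not drawn from the study: the function names, sample sizes, and allele frequencies are illustrative assumptions, and the frequency gap between the two ancestry groups is deliberately exaggerated so the effect is easy to see. The simulated SNP has no influence on disease, yet it appears associated when cases and controls are drawn unevenly from the two groups.

```python
import numpy as np

def allele_freq_z(case_geno, control_geno):
    """Z statistic for a case/control difference in allele frequency."""
    n_case = 2 * len(case_geno)        # two chromosomes per person
    n_ctrl = 2 * len(control_geno)
    p_case = case_geno.sum() / n_case
    p_ctrl = control_geno.sum() / n_ctrl
    p_pool = (case_geno.sum() + control_geno.sum()) / (n_case + n_ctrl)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_case + 1 / n_ctrl))
    return (p_case - p_ctrl) / se

def simulate(n=3000, freq=(0.6, 0.4), case_frac_a=0.7, control_frac_a=0.3, seed=1):
    """Draw cases and controls from two ancestry groups (hypothetical values).

    The SNP frequency depends only on ancestry group, never on disease status.
    """
    rng = np.random.default_rng(seed)
    freq = np.asarray(freq)

    def draw(frac_a):
        group = (rng.random(n) >= frac_a).astype(int)  # 0 = group A, 1 = group B
        geno = rng.binomial(2, freq[group])            # 0, 1 or 2 copies
        return group, geno

    return draw(case_frac_a), draw(control_frac_a)

(case_grp, case_geno), (ctrl_grp, ctrl_geno) = simulate()

# Ignoring ancestry: the SNP looks strongly associated with disease.
z_naive = allele_freq_z(case_geno, ctrl_geno)

# Comparing cases and controls within each ancestry group: the signal vanishes.
z_within = [
    allele_freq_z(case_geno[case_grp == k], ctrl_geno[ctrl_grp == k])
    for k in (0, 1)
]

print("z ignoring ancestry:", round(float(z_naive), 2))
print("z within each group:", [round(float(z), 2) for z in z_within])
```

In real European American samples the ancestry-related frequency differences are far smaller than in this toy example, which is why the resulting false signals can be mistaken for genuine modest disease effects.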
“When you’re designing a study to detect subtle effects, subtle things become important,” said Reich, co-senior author of the study, associate member of the Broad Institute, and an associate professor at the Harvard Medical School Department of Genetics. Ideally, people with disease and healthy individuals in these studies would be precisely matched in terms of geographical origin, so that disease is the only variable. But, as Reich explained, the problem is they cannot be perfectly matched.
To account for differences in ancestry, a problem known as “population stratification,” Alkes Price, a postdoctoral research fellow at Harvard Medical School and the Broad Institute and first author on the current study, developed a statistical method called EIGENSTRAT in 2006 together with Reich and Nick Patterson, a statistician at the Broad. With the technique, which has already contributed to several published genome-wide association studies, researchers can visualize and measure the genetic variation due to geographical origin, rather than relying on subjects’ self-reported ancestry.
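The general idea of inferring ancestry from the genotypes themselves can be sketched with principal components analysis. The code below is not the EIGENSTRAT software. It is a minimal illustration on simulated data, and the function names, group sizes, and the amount of frequency drift between groups are all assumptions made for the example.

```python
import numpy as np

def top_ancestry_axes(genotypes, n_axes=2):
    """Coordinates of each individual along the top principal components.

    genotypes: (individuals x SNPs) array of 0/1/2 allele counts.
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)          # drop SNPs that do not vary
    g, freq = g[:, keep], freq[keep]
    # Center each SNP and scale by its expected binomial standard deviation.
    z = (g - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_axes] * s[:n_axes]

def simulate_two_groups(n_per_group=100, n_snps=2000, drift=0.05, seed=0):
    """Two hypothetical groups whose allele frequencies differ slightly."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.1, 0.9, n_snps)
    freq_a = np.clip(base + rng.normal(0, drift, n_snps), 0.01, 0.99)
    freq_b = np.clip(base + rng.normal(0, drift, n_snps), 0.01, 0.99)
    geno_a = rng.binomial(2, freq_a, size=(n_per_group, n_snps))
    geno_b = rng.binomial(2, freq_b, size=(n_per_group, n_snps))
    labels = np.array([0] * n_per_group + [1] * n_per_group)
    return np.vstack([geno_a, geno_b]), labels

geno, labels = simulate_two_groups()
pcs = top_ancestry_axes(geno)

# The first axis should place the two simulated groups on opposite sides of
# zero, even though no ancestry label was given to the analysis.
side = (pcs[:, 0] > 0).astype(int)
agreement = max((side == labels).mean(), (side != labels).mean())
print("fraction of individuals separated by the first axis:", agreement)
```

In an association analysis, coordinates like these can then be used to adjust for ancestry in place of self-reported labels.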
EIGENSTRAT is designed specifically for large-scale studies, those that test 100,000 or more SNPs in many thousands of people. That means it is ineffective in smaller, more targeted studies involving fewer genetic variants, such as those that focus on just a few hundred of the 20,000 or so human genes, or those that follow up a small number of promising SNPs identified in whole-genome studies.
To create a new method that could be used in small-scale projects, Price and Reich teamed up with co-senior author Joel Hirschhorn, an associate member of the Broad Institute, a pediatric endocrinologist at Children’s Hospital Boston, and an associate professor of genetics at Harvard Medical School. First, the team systematically examined data from four disease association studies in European Americans to describe and characterize the ancestry differences that could lead to false connections with disease.
They discovered that they could avoid most of the spurious results by measuring how closely an individual’s genetic ancestry resembles that of three populations: northwest European, southeast European, and Ashkenazi Jewish. Although this three-way description is far from a complete picture of European Americans, it appears to be adequate for genetic studies of this population that aim to find disease genes. “It doesn’t say anything fundamental about these distinctions in the history of the United States,” explained Reich. “It simply means that in the European American studies we looked at, estimating ancestry in this way captured most of the information relevant to disease gene mapping.”
To turn their findings into a useful tool for other researchers, the team identified a panel of 583 SNPs that were the most informative about ancestry. Testing the panel in samples from seven countries, they were able to narrow it to just 300 markers. Compared to self-reported ancestry, which is not always available in disease gene mapping studies, the panel provides more accurate information. It is also cheap to test in the laboratory, making it a useful tool for targeted association studies. Another use of the panel is to make sure that cases and controls are well matched in their ancestry before beginning a large study.
Although the ability to examine ancestry genetically has value in mapping disease genes, self-described ethnicity is still an important variable in epidemiological studies. “Although these 300 markers give a reasonable estimate of the major components of genetic ancestry in European Americans, self-described ethnicity can still reflect environmental, social and cultural factors that may not be captured by estimating genetic ancestry,” said Hirschhorn. “Because the genetic differences between these populations are very small, our study is most important for helping in gene discovery efforts, which will lead to a better understanding of human biology in health and disease, and hopefully improved care for all patients over the long term.”