David Altshuler comments on the HapMap

It is my great pleasure to share with you some of the exciting results presented in the paper published today in Nature, describing the data and analysis of Phase I of the HapMap Project. I first want to share a few personal thoughts as a physician and scientist about why so many of us choose to spend our professional lives studying DNA variation and its role in human health.

Despite spectacular progress in biomedical research, it is sobering to realize that we remain largely ignorant of the underlying root causes of common diseases, responses to the environment, and variation in treatment outcomes. Until we can explain why one person gets bipolar disease and another does not — why one person given an antihypertensive medication has a satisfactory lowering of blood pressure, while another has a side effect — we will be at a great disadvantage in trying to improve methods to prevent, diagnose and treat medical problems.

One thing we do know is that perhaps half of the inter-individual variation in risk of most diseases is due to inherited differences in DNA sequence. As one of the most promising clues in all of biomedicine (and one made tractable to study by astonishing advances in genomic science), the identification of genes that contribute to human health is a remarkable opportunity for biomedical research.

The paper in Nature describes a resource and set of analyses that help make possible a new approach to this fundamental problem. This approach has deep roots in the history of human genetics and in epidemiology, and was first conceptualized in its current form nearly a decade ago. The approach is utterly simple: to systematically compare the frequency of each genetic variant in people with and without a clinical endpoint. While a complete version of this test is not yet possible, recent advances have shown that if we can sample a subset of all genetic variants, the vast majority of information about common variants can be extracted. What was missing was a catalogue of common variants with information about their frequencies and patterns in the population, and technologies to test these variants efficiently in large patient samples.
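As a purely illustrative aside (not the analysis used in the paper), the core of such a test can be sketched as a comparison of allele counts in people with and without the clinical endpoint. The function names, the choice of a Pearson chi-square statistic on a 2x2 table of allele counts, and the example counts below are all assumptions made for illustration.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table
    of allele counts: rows are cases/controls, columns are the two alleles.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [case_alt + case_ref, control_alt + control_ref]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were equal in both groups.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: the variant allele is seen on 60 of 100 case
# chromosomes and on 40 of 100 control chromosomes.
stat = allelic_chi_square(60, 40, 40, 60)
print(stat, p_value_1df(stat))
```

In a genome-wide setting this comparison would be repeated for every variant tested, so any single small p-value has to be interpreted in light of the very large number of tests performed.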

The paper in Nature documents a number of important findings: that the vast majority of all genetic variation in each individual is due to a finite set of common variants, and that the public database now contains enough information to capture essentially all of these common variants. Moreover, we find that variants are nearly always correlated with many of their neighbors, such that testing a subset is adequate to capture the information about the whole. Finally, the genome-wide HapMap has made possible the selection of so-called "tag SNPs" that efficiently and powerfully capture this information for use in disease studies.
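To make the idea of tag SNPs concrete, here is a minimal sketch of one way such a subset could be chosen: measure the squared correlation (r²) between every pair of variants and greedily pick the variant that captures the most not-yet-captured neighbors. This is an illustration under stated assumptions, not the procedure used in the paper; the function names, the toy haplotypes, and the 0.8 cutoff are all hypothetical choices.

```python
def r_squared(a, b):
    """Squared correlation between two biallelic variants, each given as a
    list of 0/1 alleles observed on the same set of chromosomes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a variant that does not vary carries no correlation
    d = pab - pa * pb
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Pick tag variants so that every variant is either a tag or has
    r^2 >= threshold with at least one tag.

    haplotypes: dict mapping a variant name to its list of 0/1 alleles.
    threshold: assumed cutoff for "captured"; 0.8 is an illustrative choice.
    """
    names = list(haplotypes)
    captures = {
        s: {t for t in names
            if s == t or r_squared(haplotypes[s], haplotypes[t]) >= threshold}
        for s in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        # Choose the variant that captures the most still-untagged variants.
        best = max(names, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Toy data: snp2 matches snp1, snp3 is its mirror image, snp4 is only
# weakly correlated with the others.
example = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp3": [1, 1, 0, 0, 1, 0, 1, 0],
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(greedy_tag_snps(example))  # ['snp1', 'snp4']
```

In this toy example, testing two variants recovers the information carried by all four, which is the sense in which testing a subset can be adequate to capture the whole.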

All this information is exciting, but would be of little practical relevance if we weren't now able to test the tag SNPs in large patient samples. Genotyping technology has been spurred on by the HapMap Project (and in turn helped make the project possible), and as a result many such studies in patient samples are now under way.

The other set of results I'd like to highlight from the paper relates to human evolution. When a mutation improves the evolutionary fitness of those who inherit it, a fossil record is left in the DNA. Until now, scientists searching for such fossil records had to start with a hunch that a particular gene might have been important to evolution, and then try to build a case. Since there are some 20,000 genes, however, and our hunches aren't that good, this was a piecemeal approach.

The data from HapMap make it possible to search for such evidence without preconceived notions about which genes might have been important — in essence, we can peer into the data and let it tell us which genes mattered to the evolution of our species.

As shown in the paper, two of the top results in such an analysis are already known to have been important to human evolution: the beta globin gene responsible for sickle cell anemia and protection from malaria, and the lactase gene that explains the ability to eat dairy products into adulthood. The positive results in our analysis for these well-validated examples provide reassurance that an analysis based on HapMap data can reveal examples of natural selection. What is truly exciting is that dozens of other genes are identified for the first time as possibly having been important in human evolution. And while much additional work is required to confirm and extend these findings, they represent the start of a new era in which we can unveil important factors that mattered in human evolution (and possibly in medicine today) directly from the DNA record.

Those two advances — that we now have the information and technology to perform more powerful studies of the root causes of common diseases, and insights into the genes that were shaped by human evolution — are why I am so honored to have been part of this project.