Extending the map of human genetic variation
The field of human genetics is in a state of transition. Less than ten years ago, sampling hundreds of sites in the genome was a costly and time-consuming undertaking that required researchers to design individual tests for every base that they wanted to identify. Today, scientists can peer at the genome at higher resolution, sampling millions of sites at a fraction of the cost, or sequencing across the whole genome instead of just a subset of sites.
The latter technique, whole genome sequencing, will be used in the 1000 Genomes Project, the full results of which are still several years away. In the interim, scientists have used newer genotyping technology to generate data and increase the resolution of the map of human genetic variation. The results of this effort, known as HapMap3, appear in the latest issue of Nature.
HapMap3 represents the next chapter of the International HapMap Project, spanning 11 global populations. (The original project looked at four populations.) The HapMap Project gets its name from the word haplotype, which refers to “blocks” of genetic variants that are inherited together. This new phase of the project was made possible by advances in genotyping technology, the tools that allow researchers to read single nucleotide polymorphisms (SNPs) and copy number variants (CNVs), DNA variants sprinkled throughout the genome. Only a few years ago, the most advanced SNP “chips,” small pieces of silicon glass, contained 500,000 pieces of DNA used to probe the genome. That number had doubled by the beginning of the HapMap3 project.
“When the one million SNP chips became available, this was a really great way to be able to expand the HapMap into these other populations so that we could start to ask more questions,” said Stacey Gabriel of the Broad Institute and a member of the HapMap Project steering committee. Such questions included how patterns of genetic variation differ among populations and how researchers can more accurately define these patterns.
The researchers collected samples from over 1,000 people. By studying more genomes, they were able to collect more precise data on patterns of lower-frequency variants. The initial project focused on SNPs that appear in more than 5 percent of the population. With the larger sample in this study, researchers could evaluate more precisely the patterns of tested variants that appear in 1 to 5 percent of the world’s population.
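The frequency classes described above can be illustrated with a minimal Python sketch. It assumes each person's genotype at a SNP is coded as the number of copies of the non-reference allele (0, 1, or 2), a common convention that the article itself does not describe. The function names and the toy data are hypothetical. The last function shows one reason a larger sample helps: under simple binomial sampling, the standard error of an estimated frequency shrinks as more chromosomes are sampled.

```python
import math


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency (MAF) for one SNP.

    genotypes: list of 0/1/2 non-reference allele counts, one per
    diploid individual (an assumed encoding, for illustration only).
    """
    n_chromosomes = 2 * len(genotypes)
    alt_freq = sum(genotypes) / n_chromosomes
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf):
    """Label a variant using the thresholds mentioned in the article."""
    if maf > 0.05:
        return "common"          # more than 5 percent
    if maf >= 0.01:
        return "low-frequency"   # 1 to 5 percent
    return "rare"                # below 1 percent


def standard_error(freq, n_individuals):
    """Binomial standard error of an allele frequency estimate."""
    return math.sqrt(freq * (1.0 - freq) / (2 * n_individuals))


# Toy data: 100 people, 3 of whom carry one copy of the variant.
toy_genotypes = [0] * 97 + [1] * 3
maf = minor_allele_frequency(toy_genotypes)
print(maf, frequency_class(maf))

# The same true frequency is estimated more tightly with more people.
print(standard_error(0.03, 100), standard_error(0.03, 1000))
```

In this toy case the variant falls in the 1 to 5 percent band, the class that the larger HapMap3 sample was better able to characterize.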
These data have already been invaluable for genome-wide association studies (GWAS), which are often conducted on thousands of individuals. With this many participants, it’s possible to evaluate the role of less common variants in disease. Additionally, researchers were able to see how much these low-frequency variants differed across populations, a finding that other groups could follow up on to learn more about recent human evolution.
“SNPs that have been studied so far give us information about broad-scale patterns, but as you get lower and lower frequency, you’re basically looking at more and more recent history about how the population developed and whether it was replaced by subsequent waves and how much mixing or migration there was,” explains Stephen Schaffner, a member of the manuscript writing group and a computational biologist for the Broad Institute’s Program in Medical and Population Genetics.
The project is considered “hypothesis-generating” research as opposed to hypothesis-driven research: the scientists approached the samples without preconceived theories about what they would find, and new avenues of investigation for follow-up studies emerged. “There’s a number of novel sites and loci where it looks like natural selection has occurred in populations that haven’t been looked at before,” said Schaffner. “There are long lists of things that we don’t know more about, and those are research opportunities.”
The data, which have been freely available to the scientific community since the project began, inform GWAS and particularly aid studies that aim to combine multiple GWAS. Mark Daly, co-director of the Broad’s Program in Medical and Population Genetics, describes the data from HapMap3 as an anchor for these analyses. “Sample expansion of the HapMap data has been instrumental in providing the backbone of GWAS meta-analyses,” he said. “We have been able to take substantial steps forward particularly in complex phenotypes like schizophrenia and diabetes by using this dataset as the connective baseline for large numbers of studies performed worldwide.”
The project also points to the path ahead. “HapMap3 was important to create a framework to let us know how well we can detect rare variation using, at that time, current SNP arrays and the answer was that we really need more sequencing, more individuals, and more depth within the population,” said Gabriel. “That was a very strong impetus then for moving into whole genome sequencing of many different samples, which is what the 1000 Genomes Project is doing.”
HapMap3’s results support the approach taken by the 1000 Genomes Project, which aims to sequence 2,000 samples four times each (4X coverage) to discover new SNPs. “Mostly the news was what [researchers involved in the 1000 Genomes Project] were hoping to see. Their sampling strategy will probably do a pretty good job for much of the world’s populations,” said Schaffner.
To accurately sequence every base in a genome without consulting previous data, researchers would have to sequence a genome at least 30 times (30X coverage). However, researchers can use data from HapMap3 as a reference for the 1000 Genomes Project samples, allowing them to sequence at relatively low coverage. “The ability to draw on the information from HapMap3 data enables us to capture almost the full value of an extremely deep whole genome sequence without having to sequence to that great a depth,” said Daly.
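A rough sketch can show why 4X and 30X coverage differ so much when a genome is read on its own. It assumes the number of reads covering any one base follows a Poisson distribution whose mean is the coverage, a textbook simplification that ignores real-world biases and is not a model described in the article. The function name and the threshold of 10 reads are hypothetical choices for illustration, not a standard used by either project.

```python
import math


def prob_depth_at_least(mean_coverage, min_reads):
    """Probability that a base is covered by at least min_reads reads.

    Assumes per-base read depth is Poisson-distributed with mean equal
    to the average coverage (a simplifying assumption).
    """
    # Sum the Poisson probabilities of seeing 0 .. min_reads - 1 reads.
    prob_below = sum(
        math.exp(-mean_coverage) * mean_coverage ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - prob_below


# Compare the two coverage levels mentioned in the article, using an
# illustrative requirement of 10 reads per base (hypothetical threshold).
for coverage in (4, 30):
    print(coverage, prob_depth_at_least(coverage, 10))
```

Under these assumptions, only a small fraction of bases reach 10 reads at 4X, while nearly all do at 30X. That gap is what reference data such as HapMap3 help to bridge when samples are sequenced at low coverage.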
HapMap3 also creates a framework that will help scientists perform imputation, a way of predicting un-genotyped SNPs. Just as the human brain can fill in letters that are missing from words on a page, imputation helps researchers predict unknown variants based on their known context. “You look at a series of SNPs to make predictions about SNPs that lie between them. You’re inferring what SNP was likely there because you’ve seen the pattern before in some reference data,” said Schaffner.
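The idea Schaffner describes can be sketched with a toy example. This is a deliberately simplified illustration and not the method used by HapMap3 or by any real imputation tool, which rely on statistical models over large reference panels. The function name, the "?" marker for un-genotyped sites, and the short haplotype strings are all hypothetical.

```python
def impute(target, reference_haplotypes, missing="?"):
    """Fill missing alleles by copying from the best-matching reference.

    target: string of alleles with `missing` marking un-genotyped sites.
    reference_haplotypes: list of fully known haplotypes, each the same
    length as target (toy stand-ins for a reference panel).
    """
    def matches(ref):
        # Count the genotyped sites where the reference agrees with the target.
        return sum(1 for t, r in zip(target, ref) if t != missing and t == r)

    # Choose the reference haplotype that agrees at the most known sites
    # (the first one wins if there is a tie).
    best = max(reference_haplotypes, key=matches)

    # Keep observed alleles; borrow from the best match where data is missing.
    return "".join(r if t == missing else t for t, r in zip(target, best))


reference_panel = ["ACGTA", "ACCTG", "TCGAA"]
observed = "A?GT?"
print(impute(observed, reference_panel))  # prints "ACGTA"
```

Here the observed alleles agree with the first reference haplotype at all three genotyped sites, so its alleles fill the two gaps. A larger and more diverse reference panel gives the target more patterns to match against, which is where a resource like HapMap3 comes in.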
Gabriel, who began working on the HapMap Project in 2003, said that advances in technology have rapidly changed the cost and time it takes to genotype or sequence samples. “I would never have considered that we’d be doing whole genome sequencing at the pace we are right now when we were starting the HapMap Project,” she said. “It’s a very exciting time.”
Other Broad researchers who contributed to this project include David Altshuler, Paul de Bakker, Joshua Korn, Steven McCarroll, James Nemesh, Samuela Pollack, Wendy Brodeur, Huy Nguyen, Melissa Parkin, Ilya Shlyakhter, and Pardis Sabeti.
The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature, 2010. doi:10.1038/nature09298. Published online 2 September 2010.