HapMix: A tool for finding genetic diversity

When researchers from the Broad Institute and the Department of Human Genetics at Harvard University set about pinpointing ancestral diversity in African Americans, the first tool they turned to was the HapMix software. HapMix helps researchers infer the ancestry of very short segments of DNA. “It is a method for reconstructing the mosaic of African and European ancestry that is present in each African-American,” explains David Reich, an associate member of the Broad Institute, assistant professor in the Department of Genetics at Harvard Medical School, and one of the co-developers of the HapMix program. “It is used for determining the ancestral parentage of an African-American person’s genome.” At any particular location in an African-American genome, the two copies inherited from the two parents may both be of European ancestry, both be of African ancestry, or be one of each.

HapMix works by identifying genetic variants whose frequencies differ markedly between Africans and Europeans. Overall, human populations are genetically very similar, and at any single site the frequency difference may be small or even zero. But when 1,000 neighboring sites are examined together, the evidence about ancestry accumulates.
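To see why many weak signals add up, here is a minimal sketch under simplifying assumptions that are not HapMix's own: a single haploid segment of uniform ancestry, independent biallelic sites, and hypothetical variant-allele frequencies of 0.55 in Africans and 0.45 in Europeans at every site. Summing per-site log-likelihood ratios is the basic arithmetic behind evidence "piling up"; the helper names are illustrative only.

```python
import math
from typing import Sequence


def site_log_lr(allele: int, p_afr: float, p_eur: float) -> float:
    """Log-likelihood ratio, log P(allele | African) - log P(allele | European),
    for one biallelic site. `allele` is 1 if the variant allele is observed,
    0 otherwise; `p_afr` and `p_eur` are hypothetical variant-allele
    frequencies in the two reference populations."""
    if allele == 1:
        return math.log(p_afr) - math.log(p_eur)
    return math.log(1.0 - p_afr) - math.log(1.0 - p_eur)


def segment_log_lr(alleles: Sequence[int], p_afr: Sequence[float],
                   p_eur: Sequence[float]) -> float:
    """Sum the per-site log-likelihood ratios. Simplifying assumptions:
    the whole segment has a single ancestry and the sites are independent
    (real neighboring sites are correlated through linkage disequilibrium)."""
    return sum(site_log_lr(a, pa, pe) for a, pa, pe in zip(alleles, p_afr, p_eur))


def posterior_african(total_log_lr: float, prior_african: float = 0.5) -> float:
    """Convert a summed log-likelihood ratio into P(African | data)."""
    log_odds = math.log(prior_african / (1.0 - prior_african)) + total_log_lr
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # One site: a 55% vs 45% frequency difference is weak evidence on its own.
    print(round(posterior_african(site_log_lr(1, 0.55, 0.45)), 3))  # 0.55

    # 1,000 such sites, with the variant seen at 550 of them (the count
    # expected under African ancestry in this toy setup).
    alleles = [1] * 550 + [0] * 450
    total = segment_log_lr(alleles, [0.55] * 1000, [0.45] * 1000)
    print(round(total, 2))                      # 20.07
    print(posterior_african(total) > 0.999999)  # True
```

A single site moves the odds only from 50:50 to 55:45, yet 1,000 of them together push the posterior above 0.999999. Treating the sites as independent overstates the evidence for real data, which is one reason a practical method needs a more careful model.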

HapMix statistically combines the information from many neighboring sites: each one by itself provides only very weak evidence about ancestry, but together they support a strong inference about whether a segment is of African or European ancestry. HapMix was created by David and Simon Myers, a former senior postdoctoral fellow at the Broad who is now at the University of Oxford, along with Broad Institute colleagues Alkes Price, Nick Patterson, and Arti Tandon. The team published the open-source tool in a 2009 PLoS Genetics paper, and it has since been used worldwide in studies of genetic variation, including this week’s Nature study on the ancestral landscape of African-Americans.

Thanks to tools like HapMix, scientists can home in on tiny differences in DNA between individuals, known as single nucleotide polymorphisms (SNPs), and identify their ancestral origin. “If you look at just one area, you can’t come up with a clear result,” David explains. “But ultimately combining the probabilities from millions of different DNA differences, one can obtain very strong evidence of ancestry.” The approach the team used to do this, a hidden Markov model (HMM), is a standard way of combining many pieces of weak evidence at neighboring sites into much stronger evidence.
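The article names the technique but not HapMix's internal model, so the sketch below is a deliberately simplified stand-in, not HapMix's own HMM (which is more detailed): a two-state hidden Markov model for a single haploid chromosome, where the hidden state at each site is African or European ancestry and the observed allele is emitted with that population's allele frequency. The forward-backward algorithm then gives, at every site, the probability of African ancestry given all the data. The function name, the constant per-site `switch_prob`, `prior_african`, and the allele frequencies are hypothetical.

```python
from typing import List, Sequence


def local_ancestry_posteriors(
    alleles: Sequence[int],
    p_afr: Sequence[float],
    p_eur: Sequence[float],
    switch_prob: float = 0.001,
    prior_african: float = 0.5,
) -> List[float]:
    """Forward-backward on a two-state HMM (state 0 = African, 1 = European)
    for one haploid chromosome. Returns P(African | all data) at each site.

    Hypothetical parameters: `switch_prob` is a constant per-site chance
    that ancestry changes between neighboring sites; `prior_african` is
    the ancestry probability at the first site."""
    n = len(alleles)
    stay = 1.0 - switch_prob

    def emit(i: int, state: int) -> float:
        # Probability of the observed allele at site i given the ancestry state.
        p = p_afr[i] if state == 0 else p_eur[i]
        return p if alleles[i] == 1 else 1.0 - p

    # Forward pass: fwd[i][s] is proportional to P(alleles[0..i], state_i = s).
    # Each row is rescaled to sum to 1 so long chromosomes do not underflow.
    fwd: List[List[float]] = []
    for i in range(n):
        if i == 0:
            row = [prior_african * emit(0, 0), (1.0 - prior_african) * emit(0, 1)]
        else:
            prev = fwd[-1]
            row = [
                (prev[s] * stay + prev[1 - s] * switch_prob) * emit(i, s)
                for s in (0, 1)
            ]
        total = row[0] + row[1]
        fwd.append([x / total for x in row])

    # Backward pass: bwd[i][s] is proportional to P(alleles[i+1..] | state_i = s).
    bwd: List[List[float]] = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        row = [
            sum(
                (stay if t == s else switch_prob) * emit(i + 1, t) * bwd[i + 1][t]
                for t in (0, 1)
            )
            for s in (0, 1)
        ]
        total = row[0] + row[1]
        bwd[i] = [x / total for x in row]

    # Posterior at each site combines evidence from both directions;
    # the per-row rescaling constants cancel in this ratio.
    posteriors = []
    for i in range(n):
        afr = fwd[i][0] * bwd[i][0]
        eur = fwd[i][1] * bwd[i][1]
        posteriors.append(afr / (afr + eur))
    return posteriors


if __name__ == "__main__":
    # Toy chromosome: 1,000 sites consistent with African ancestry followed
    # by 1,000 consistent with European ancestry (hypothetical frequencies).
    n = 2000
    alleles = [1 if i % 20 < (11 if i < 1000 else 9) else 0 for i in range(n)]
    post = local_ancestry_posteriors(alleles, [0.55] * n, [0.45] * n)
    print(round(post[500], 3), round(post[1500], 3))
```

With these toy inputs the posterior at site 500 should come out close to 1 and at site 1500 close to 0, even though each individual site only shifts the odds from 50:50 to 55:45. The HMM's transition probabilities are what let it borrow strength from neighboring sites while still allowing ancestry to switch along the chromosome; a more realistic model would tie the switch probability to the genetic distance between sites.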