Charting the genome’s landscape

Copy number variation: The DNA copy number in a given region of a chromosome varies from person to person.
Image courtesy of Bang Wong, Broad Communications

During the Age of Discovery, explorers set out to investigate uncharted lands and cultures. They took with them the best maps of the era, but discoveries made on their journeys helped produce even better ones.

Like these early explorers, modern geneticists also need good maps, especially when hunting for disease genes that are missing or duplicated in some people. A recent survey of the human genome suggested that around 12% of human DNA was subject to this type of fluctuation, known as copy number variation. Now, a new, more detailed map has sharpened that initial view, dramatically reducing the fraction of the genome estimated to vary in copy number. Moreover, just as the Haplotype Map, or HapMap, has allowed scientists to connect DNA misspellings to disease, the new survey may help reveal the disease risk due to commonly missing or extra genes. The new work, which appears in Nature Genetics, is led by senior author David Altshuler, director of the Broad’s Program in Medical and Population Genetics, and a professor of genetics and medicine at Harvard Medical School and Massachusetts General Hospital. The Broad’s Genetic Analysis Platform provided critical expertise in genotyping and informatics that made the analysis possible.

The study’s researchers served as both toolmakers and cartographers, first collaborating with industry partner Affymetrix to design a pair of new gene chips that can identify variations in both DNA sequence and gene copy number. With the surveying tool in hand, the team set out to refine the charted landscape of variation in the human genome. Previous estimates involved scanning the genome with genetic probes over 150,000 bases, or letters of DNA, long. Probes on the new chips are much more numerous and 6,000 times shorter (a mere 25 bases long), giving the team enhanced power to create a map of unprecedented detail. Older methods could only narrow down suspect regions of DNA to 100,000 bases, while the close genomic spacing of probes on the new chips can pinpoint variants a few thousand bases long, which is roughly the size of individual genes.

By scanning the genomes of the same individuals recruited for the HapMap Project, the team discovered that areas previously thought to be commonly deleted or duplicated had been overestimated in size. “The portion of the genome that’s copy number variant is probably an order of magnitude less than we once thought,” said Steven McCarroll, co-first author on the new study and a postdoctoral researcher in Altshuler’s lab.

The new analysis has changed the view of copy number variation. “Before this work, the consensus was that there were thousands of copy number variants that were 50,000 bases long or more and that collectively involved thousands of genes,” said McCarroll. On the new map, these variants appear to be 5 to 10 times smaller than originally described, meaning fewer genes are affected. For example, in one area of the genome, the original map estimated that 20 genes are deleted, while the new work shrinks that area to a mere 2 genes.

Another insight that emerged from this work was that most of the copy number differences between any two individuals arise from so-called polymorphisms: mutations that arose long ago in the ancestors of present-day humans and have since become common among people. Previous studies had assumed that most copy number variants (CNVs) were recurring mutations that had occurred recently. In light of the new findings, scientists now know that these old mutations are strongly linked to nearby genetic markers, which will enable powerful ways of studying disease risk.

Even before it was published, the new map had already begun to yield fruit. Broad senior associate member Mark Daly (also a faculty member at Harvard and MGH) and colleagues had recently identified a flurry of genomic regions related to Crohn’s disease. Scanning an early draft of the new map, McCarroll noted that one of the Crohn’s-related regions, near a gene called IRGM, also harbored a copy number change. In a separate paper published last week in Nature Genetics, McCarroll, Daly, and colleagues showed that this copy number variant is likely a causal mutation influencing inflammatory bowel disease, doing so by changing the tissues in the body in which IRGM is active.

With such maps in hand, researchers can now scan the entire human genome to assess the relationship between gene copy number and disease, similar to what can be done for DNA sequence changes known as SNPs. In the past few years, researchers have been able to use high-throughput gene chips and knowledge from the HapMap to connect over 150 SNPs with common diseases like diabetes, heart disease, and Crohn’s disease. “I see the field of copy number variation as being where SNP studies were a few years ago,” said Finny Kuruvilla, co-first author of the new study and a postdoctoral researcher in Altshuler’s lab. With the new map and gene chips developed by Kuruvilla and his colleagues, scientists worldwide can better navigate the terrain of copy number variation and explore its role in disease.

Other scientists at the Broad Institute who contributed to the work include Joshua Korn, James Nemesh, Alec Wysoker, Amanda Elliott, Melissa Parkin, Robert Handsaker, Marcia Nizzari, Mark Daly, and Stacey Gabriel. Additional members of the team were Michael Shapero, Earl Hubbell, Teresa Webster, Rui Mei, Steve Lincoln, John Blume, Keith Jones, and Rich Rava from Affymetrix, Inc., and Julian Maller and Andrew Kirby from the Center for Human Genetic Research at Massachusetts General Hospital.

Papers cited

McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, de Bakker PIW, Maller J, Kirby A, Elliott AL, Parkin M, Hubbell E, Webster T, Mei R, Handsaker R, Lincoln S, Nizzari M, Blume J, Jones K, Rava R, Daly MJ, Gabriel SB, Altshuler DM. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nature Genetics, advance online publication. September 7, 2008. DOI: 10.1038/ng.238.

McCarroll SA, Huett A, Kuballa P, Chilewski SD, Landry A, Goyette P, Zody MC, Hall JL, Brant SR, Cho JH, Duerr RH, Silverberg MS, Taylor KD, Rioux JD, Altshuler D, Daly MJ, Xavier RJ. Deletion polymorphism upstream of IRGM associated with altered IRGM expression and Crohn's disease. Nature Genetics 40: 1107-1112. DOI: 10.1038/ng.215.