A unique collaboration between Cambridge and Silicon Valley
Five years ago, scientists studying human genetic variation welcomed the arrival of a new tool: a tiny device known as a DNA microarray, or chip, that can identify thousands of single-letter changes in the human genome at once. Following a wave of technological advances, that chip quickly evolved into one that simultaneously scans over 1.8 million genetic markers. The new chips provide a detailed picture of the genome and enable powerful whole-genome studies of DNA’s contribution to disease risk.
The most advanced chips now available from Affymetrix — dubbed the Genome-Wide Human SNP Arrays 5.0 and 6.0 — are the fruits of a unique collaboration between the company and scientists at the Broad Institute of MIT and Harvard, including Medical and Population Genetics director David Altshuler, Genetic Analysis Platform director Stacey Gabriel, senior associate member Mark Daly, and many others. (Daly and Altshuler are also Harvard faculty members at Massachusetts General Hospital.) By working together over the past several years, sharing scientific insights and experimental results, the researchers engineered two new chips that have put an innovative technology into the hands of the wider scientific community.
These chips are part of a long history of collaborative work between the Broad Institute and Affymetrix. For years, Broad scientists have utilized the company’s tools in a variety of genomic studies and, in the process, offered their thoughts on everything from product manuals to packaging. “With so many people studying genomic variation, the Broad is a ripe testing ground for new technologies,” said Finny Kuruvilla, one of the collaborating scientists at the Broad.
Two years ago, Broad scientists’ feedback touched on a new topic: microarray design. Kuruvilla, then a postdoctoral fellow in Altshuler’s laboratory, had been toiling away on a challenging project to improve the analysis software for the lab’s most popular tool at the time, the Affymetrix GeneChip® Human Mapping 500K Array Set. This set of two chips together analyzes half a million single-letter changes in the genome, known as single-nucleotide polymorphisms, or SNPs.
Looking closely at the 500K array, Kuruvilla noticed that the array’s probes (short strands of DNA tethered to the chips) varied widely in how well they distinguished the two versions of a SNP. That observation led him to discover that each SNP could be genotyped not with the typical 30 probes, but with an average of only 8 probes. The upshot of this fine-tuning was that the SNPs targeted by the 500K’s two chips could now be interrogated using half of a single chip, freeing the other half for new probes.
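As a rough check on those figures, the short Python sketch below works through the arithmetic. It assumes, for simplicity, that every SNP uses the same number of probes and that each chip holds a fixed number of probes; neither holds exactly on a real array, and the constant names are illustrative only.

```python
# Back-of-the-envelope check of the probe savings described above.
# Simplifying assumptions (not Affymetrix specifications): every SNP uses
# the same number of probes, and each chip holds a fixed number of probes.

SNPS_ON_500K = 500_000    # "a half-million" SNPs across the two-chip set
PROBES_PER_SNP_OLD = 30   # typical probes per SNP on the 500K
PROBES_PER_SNP_NEW = 8    # average after Kuruvilla's pruning
CHIPS_OLD = 2             # the 500K was a two-chip set


def total_probes(snps: int, probes_per_snp: int) -> int:
    """Total SNP probes needed to cover `snps` markers."""
    return snps * probes_per_snp


def chips_needed(probes_per_snp: int) -> float:
    """Chips required, scaling the original two-chip footprint linearly."""
    return CHIPS_OLD * probes_per_snp / PROBES_PER_SNP_OLD


if __name__ == "__main__":
    print(f"Old design: {total_probes(SNPS_ON_500K, PROBES_PER_SNP_OLD):,} probes")
    print(f"New design: {total_probes(SNPS_ON_500K, PROBES_PER_SNP_NEW):,} probes")
    print(f"Chips needed: {chips_needed(PROBES_PER_SNP_NEW):.2f}")
```

Under these assumptions the pared-down design needs about 0.53 of a chip and roughly four million SNP probes, in line with the "about four million" SNP probes later placed on the 5.0 chip.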
Another postdoctoral researcher in Altshuler’s lab, Steven McCarroll, had been investigating a newly appreciated form of common human genetic variation known as copy number variation (CNV). Instead of a change in DNA sequence, like a SNP, this type of variation involves structural changes (deletions or duplications of stretches of DNA) that alter the number of copies of a gene. Such missing or extra genes can be medically relevant, because they may lead to too much or too little of critical biochemicals in the body. An extreme example is Down’s syndrome, in which an extra copy of chromosome 21 means that hundreds of genes are present in three copies rather than two.
McCarroll and others had uncovered the surprising prevalence of this kind of variation among normal individuals, and were eager to explore — on a genome-wide level — its potential role in common diseases. (Broad associate member Charles Lee, also a Harvard faculty member at Brigham and Women’s Hospital, was one of the first to recognize the extent of this form of variation in a landmark paper published in 2004.) But progress was limited by the lack of effective research tools.
Unlike tools for studying SNPs, the existing methods for analyzing deletions and duplications lacked accuracy and resolution. One method, called comparative genomic hybridization, detects copy number changes on a genomic scale but cannot precisely determine their location. In another approach, SNP chips like the 500K are used to infer differences in copy number from the characteristic “footprints” that extra or missing DNA often leaves behind in SNP data. But this measure is indirect. Moreover, SNP probes that cover DNA regions with copy number variation often yield unusual-looking data, so chip-makers had unknowingly excluded most such probes from the early chip designs.
McCarroll thought it should be possible to make “hybrid” arrays that would carry, alongside the usual SNP probes, what he called “copy-number probes”: a distinct set of probes optimized for measuring copy number across the genome, regardless of where the SNPs were located.
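To illustrate the idea behind a dedicated copy-number probe, here is a simplified Python sketch. It assumes that a probe's hybridization intensity scales linearly with the number of DNA copies present, and it compares a sample against a reference assumed to carry the usual two copies. The function name and the median-of-ratios approach are hypothetical, illustrative choices, not the method used by Affymetrix or the Broad team.

```python
from statistics import median
from typing import Sequence


def call_region_copy_number(
    sample: Sequence[float],
    reference: Sequence[float],
    reference_copies: int = 2,
) -> int:
    """Estimate an integer copy number for a region from several probe intensities.

    Hypothetical helper: assumes intensity is proportional to copy number and
    that the reference sample carries `reference_copies` copies of the region.
    """
    if len(sample) != len(reference) or not sample:
        raise ValueError("need matching, non-empty intensity lists")
    # Per-probe ratio of sample signal to reference signal.
    ratios = [s / r for s, r in zip(sample, reference)]
    # Taking the median damps the effect of a single noisy probe.
    estimate = reference_copies * median(ratios)
    # Round to the nearest whole number of copies; copy number cannot be negative.
    return max(0, round(estimate))


if __name__ == "__main__":
    ref = [1000.0, 1000.0, 1000.0]
    print(call_region_copy_number([510.0, 480.0, 530.0], ref))    # one copy: a deletion
    print(call_region_copy_number([1500.0, 1450.0, 1600.0], ref))  # three copies: a duplication
```

Using several probes per region, rather than one, is what makes a dense set of copy-number probes useful: each probe is noisy, but their combined signal is more stable.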
As they discussed these concepts, McCarroll, Kuruvilla, and Altshuler realized that they could combine Kuruvilla’s pared-down collection of SNP probes with McCarroll’s copy-number probes. They proposed a design that differed sharply from the industry’s SNP arrays of the time: a single array that could detect SNPs with Kuruvilla’s probes and copy number with McCarroll’s probes, all in the same sample. Rather than buying different arrays to study SNPs and copy number variation, scientists could use the hybrid arrays to measure both kinds of variation in a single experiment.
The plan became known at the Broad as the “one-chip” project, in the sense of both reducing a two-array system to one and combining two different types of variation on one chip.
“To ask scientific questions on a large scale, we needed a way to systematically genotype copy number variants,” said McCarroll, “and the extra room freed by Finny’s work gave us the opportunity.”
To design the new chip, Kuruvilla and McCarroll teamed up with Joshua Korn, a graduate student in the Harvard University Biophysics Program. In theory, their design would allow researchers to simultaneously test both kinds of DNA variation in one experiment. “It was like chocolate and peanut butter,” said McCarroll. “The two ideas went really well together, and it became a concept to propose to Affymetrix.”
To their colleagues in industry, the plan seemed like a good one, but it needed a testable prototype. “There was a lot of sound reasoning behind the approach, so there was a lot of hope that it would work,” said Simon Cawley, the director of the Algorithms and Data Analysis Group at Affymetrix. Yet “it just needed to pass the sanity test of working not just in theory, but in practice.” Cawley and fellow scientists at the company had been developing their own successor to the 500K array system, an upgraded two-chip system. But discussions with Altshuler and Broad Institute director Eric Lander convinced them that the Broad scientists’ approach also held potential.
Affymetrix agreed to build a prototype, but Kuruvilla, McCarroll, and Korn had to keep up with the faster pace of industry. The team was given just one week to prepare a design: to choose the SNP- and CNV-directed probes that would be placed on the new microarray. After a frenzied week of work, the team met its deadline, and Affymetrix soon built the first one-chip prototype.
With both the Broad’s microarray and its own newly developed array system in hand, Affymetrix performed a head-to-head comparison. The Broad one-chip performed better at calling SNPs. Recognizing its value, Affymetrix soon brought it to market as the Genome-Wide Human SNP Array 5.0, and what had begun as a custom array built to meet the Broad’s own scientific needs quickly became a research tool used across the field of human genetics.
About four million of the probes on the 5.0 chip targeted the same SNPs tested on the 500K, nearly half a million single-letter differences across the genome. The chip also carried half a million “copy-number” probes, short in length and distributed throughout the genome.
Importantly, with this many copy-number probes, CNVs could be mapped at a physical resolution comparable to the size of individual genes, a dramatic improvement for studies of gene copy number.
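One rough way to see what "resolution comparable to the size of individual genes" means is to compute the average spacing between copy-number probes. The sketch below assumes a haploid human genome of about 3.1 billion bases and perfectly even probe spacing, which real arrays do not have; the 50,000-base window is a hypothetical example region, not a figure from the study.

```python
GENOME_BASES = 3_100_000_000  # approximate haploid human genome size (assumption)


def mean_spacing(genome_bases: int, n_probes: int) -> float:
    """Average distance in bases between probes, assuming even spacing."""
    return genome_bases / n_probes


def probes_in_window(window_bases: int, spacing: float) -> float:
    """Expected number of probes falling in a window of the given length."""
    return window_bases / spacing


if __name__ == "__main__":
    spacing = mean_spacing(GENOME_BASES, 500_000)  # copy-number probes on the 5.0
    print(f"One copy-number probe every ~{spacing:,.0f} bases")
    # Hypothetical 50 kb region, chosen only for illustration.
    print(f"~{probes_in_window(50_000, spacing):.1f} probes per 50 kb")
```

In practice probe density varies across the genome, so local resolution can be better or worse than this average.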
While the array held the promise of accurately measuring both SNPs and copy number changes, a new bottleneck arose: no existing computer algorithms were designed to take advantage of its dual capabilities. The Broad team went back to the lab and developed a new set of data analysis tools, named Birdsuite, that can not only identify SNPs and CNVs individually but also produce an integrated genotype that incorporates both types of variation. For example, instead of simply determining whether a person has AA, AT, or TT at a given location in his or her genome, the integrated genotype provides a richer label: a single letter (e.g., A or T) signals a deletion, while three or more letters (e.g., AAAA or ATT) signal a duplication and reveal which version was duplicated.
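The labeling scheme described above can be sketched in a few lines of Python. This is an illustrative reconstruction from the description in this article, not Birdsuite's actual code or output format. The function names are hypothetical, and representing zero copies as an empty label is an assumption, since the article does not say how Birdsuite writes that case.

```python
def integrated_genotype(copies_a: int, copies_b: int,
                        allele_a: str = "A", allele_b: str = "T") -> str:
    """Build a label with one letter per copy of each allele (hypothetical helper)."""
    if copies_a < 0 or copies_b < 0:
        raise ValueError("copy counts cannot be negative")
    return allele_a * copies_a + allele_b * copies_b


def describe(label: str) -> str:
    """Interpret a label by its total copy number; two copies is the norm."""
    n = len(label)
    if n == 0:
        return "no copies"  # assumed representation, not from Birdsuite
    if n == 1:
        return "deletion"
    if n == 2:
        return "two copies"
    return "duplication"


if __name__ == "__main__":
    for a, b in [(1, 1), (0, 1), (4, 0), (1, 2)]:
        label = integrated_genotype(a, b)
        print(label, "->", describe(label))
```

In a label such as ATT, the allele written more than once (here T) shows which version of the SNP sits on the duplicated stretch of DNA.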
Continuing the collaboration, Affymetrix soon developed a follow-up to the 5.0, the Genome-Wide Human SNP Array 6.0, with a total of 1.8 million markers: some 900,000 SNP probe sets (with 6 to 8 probes each) and another 900,000 copy-number probes. Introduced in May 2007, the 6.0 array is now a best-seller for Affymetrix and a widely used tool for ongoing whole-genome studies of coronary artery disease, schizophrenia, and other complex illnesses.
“The extent to which copy number changes contribute to disease has been a matter of speculation up to this point,” said McCarroll. “But what’s exciting about this array is we can now test this hypothesis in one disease after another, in large numbers of people.”
Only through a close alliance between academic research and industry was the new technology realized. According to Cawley, the Broad-Affymetrix relationship was a productive one, both in terms of developing new products and serving as intellectual stimulation for the company’s developers, who enjoyed working with investigators on the cutting edge of science. “The Broad Institute is a great example of some of our most demanding and sophisticated customers,” said Cawley. “They’ve helped ensure that we are developing products that customers really want.”
Joining other powerful genomic tools currently in use at the Broad, the chips expand the diversity and scale of biological questions that researchers can tackle. And with several whole-genome studies of sequence and copy number variation underway at the Broad and elsewhere, this collaboration will surely continue to bear fruit.
Broad researchers contributing to this work also include James Nemesh, Alec Wysoker, Paul I W de Bakker, Amanda L Elliott, Melissa Parkin, Robert Handsaker, Marcia Nizzari, and Shaun Purcell.
Note to the reader: Neither the Broad Institute nor the researchers involved in this work have a financial relationship with Affymetrix beyond one of a vendor and customer. No Broad researchers receive any royalties or compensation based on the sale of Affymetrix products.
McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, de Bakker PIW, Maller J, Kirby A, Elliott AL, Parkin M, Hubbell E, Webster T, Mei R, Handsaker R, Lincoln S, Nizzari M, Blume J, Jones K, Rava R, Daly MJ, Gabriel SB, Altshuler DM. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nature Genetics, advance online publication. September 7, 2008. DOI: 10.1038/ng.238.
Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, Cawley S, Hubbell E, Veitch J, Collins PJ, Darvishi K, Lee C, Nizzari MM, Gabriel SB, Purcell S, Daly MJ, Altshuler DM. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics, advance online publication. September 7, 2008. DOI: 10.1038/ng.237.