A massive approach to finding what's "real" in genome-wide association data
Genome-wide association studies (GWAS) have been a boon for geneticists by revealing thousands of genetic variants associated with human disease. At the same time, GWAS are the bane of geneticists because they reveal thousands of genetic variants associated with human disease. Which variants are the drivers, the ones that truly cause or contribute to disease development and progression?
"With GWAS, you get a set of signals, which can tell you which regions of the genome are associated with a particular disease or trait," said Vijay Sankaran, a Broad associate member and a pediatric hematologist/oncologist at Dana-Farber/Boston Children's Cancer and Blood Disorders Center who studies blood cell disorders. "But it's hard to know which hits are causal hits, and which are just going along for the ride."
The picture gets particularly complicated when talking about variants in non-coding DNA, including the vast stretches of DNA containing sequences that control gene expression. By some estimates, between 85 and 90 percent of the variants picked up by GWAS lie in such regions.
Many scientists are trying to figure out how to connect the dots between non-coding GWAS variants and human biology, health, and ultimately, disease. Three Broad teams, led by Sankaran, Pardis Sabeti, and Broad alum Tarjei Mikkelsen (now with the biotechnology company 10X Genomics), respectively, have focused their efforts on scaling up a staple of the genomics toolkit — the reporter assay — to create a massively parallel reporter assay (MPRA).
"We want to move from understanding the component pieces of the genome to understanding what changes in those components do," said Sabeti, an institute member and Harvard computational geneticist and evolutionary biologist, whose lab probes the role genetic variation writ large plays in human and microbial evolution. "We need very sensitive technology to be able to identify these functional changes, particularly if they're subtle."
The reporter assay helps scientists sift through GWAS data to find variants that truly affect gene expression or function. A researcher takes a DNA fragment from what may be an enhancer, couples it within a plasmid to a "reporter" gene that provides a readout (e.g., the luciferase gene), and inserts the plasmid into cells. If the readout materializes (e.g., if the cells glow), the enhancer sequence drove expression of the reporter. By running the assay with different variations of the same fragment, researchers can look for a pattern suggesting whether certain variants affect expression.
Such classic reporter assays, however, have one major disadvantage: They don't scale to the level needed to investigate the thousands to tens of thousands of variants that might turn up in a GWAS.
Mikkelsen and Broad research scientist Alexandre Melnikov worked out the principles of one flavor of MPRA while working in the lab of Broad founding director and president Eric Lander. In a 2012 Nature Biotechnology paper, they noted that tagging each plasmid with a short, unique DNA barcode provided a second readout. By sequencing and counting the mRNAs produced from each plasmid, they could easily identify the variant(s) with the greatest influence on gene expression and quantify the magnitude of that influence.
And because each barcode was unique to each plasmid, Mikkelsen and Melnikov's team could pool and assay thousands of variants simultaneously.
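The barcode-counting idea described above can be illustrated with a minimal sketch. It is not the authors' analysis pipeline. It assumes each barcode maps to exactly one variant, and it normalizes mRNA barcode counts against plasmid DNA barcode counts, a normalization step the article does not describe. The function name, the barcode sequences, the variant labels, the counts, and the pseudocount are all hypothetical.

```python
from collections import defaultdict
from math import log2


def mpra_activity(barcode_to_variant, rna_counts, dna_counts, pseudocount=1.0):
    """Return a log2(RNA/DNA) activity score per variant.

    Counts are pooled over every barcode assigned to a variant.
    The pseudocount (an assumption) avoids division by zero.
    """
    rna_total = defaultdict(float)
    dna_total = defaultdict(float)
    for barcode, variant in barcode_to_variant.items():
        rna_total[variant] += rna_counts.get(barcode, 0)
        dna_total[variant] += dna_counts.get(barcode, 0)
    return {
        variant: log2((rna_total[variant] + pseudocount)
                      / (dna_total[variant] + pseudocount))
        for variant in dna_total
    }


# Hypothetical pooled library: two barcodes per allele of one enhancer.
barcode_to_variant = {
    "ACGTACGTAC": "enhancerX_ref",
    "TTGACCGATA": "enhancerX_ref",
    "GGATCCTTAA": "enhancerX_alt",
    "CATGCATGGT": "enhancerX_alt",
}
# Made-up sequencing counts for the plasmid pool (DNA) and its transcripts (RNA).
dna_counts = {"ACGTACGTAC": 100, "TTGACCGATA": 99,
              "GGATCCTTAA": 100, "CATGCATGGT": 99}
rna_counts = {"ACGTACGTAC": 200, "TTGACCGATA": 199,
              "GGATCCTTAA": 400, "CATGCATGGT": 399}

activity = mpra_activity(barcode_to_variant, rna_counts, dna_counts)
# Difference in activity between the two alleles, in log2 units.
allelic_skew = activity["enhancerX_alt"] - activity["enhancerX_ref"]
print(activity, allelic_skew)
```

Because every sequencing read carries its barcode, reads from thousands of pooled plasmids can be sorted back to their variants after the fact, which is what lets the assay run in a single experiment.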
Homing in on blood cell traits
Sankaran's lab is the latest to make use of Mikkelsen and Melnikov's MPRA system, harnessing it to scrutinize more than 2,750 non-coding variants in 75 GWAS hits linked to red blood cell traits. And as he, Mikkelsen, and co-first authors Jacob Ulirsch and Satish Nandakumar reported in Cell, the MPRA data pointed to 32 hits that actually had some impact on gene expression. They then used additional computational and functional assays to probe the effects of a subset of these variants on red blood cell traits, revealing that several known genes may have previously unrecognized roles in blood cell development.
"One of the unexpected lessons we learned was that many of the variants tweaked a master blood development regulator, GATA1," said Ulirsch, a staff scientist in Sankaran's lab. "There was a common pattern. Going one by one, variant by variant, we would never have been able to see this."
Building MPRA 2.0
While Mikkelsen and Melnikov's original method is quite powerful, Sabeti's lab wanted to see if they could make it even more robust.
Clockwise from top left: Pardis Sabeti, Vijay Sankaran, Satish Nandakumar, Ryan Tewhey, Jacob Ulirsch. Photo: Megan Purdum
"The original version of MPRA is limited in how many variants you can test," said Ryan Tewhey, a postdoctoral fellow in Sabeti's lab. "We wanted to know, can you expand this technology out? Can you test tens of thousands of variants at once? And can you make it more sensitive?"
Tewhey, Sabeti, and their team doubled the length of each DNA barcode and upped the number of barcodes to as many as 350 per variant. They then used their enhanced assay to study more than 32,000 possible B cell regulatory variants identified by the 1000 Genomes Project, deeply characterizing one associated with risk of ankylosing spondylitis (an autoimmune disease). They also highlighted another 842 candidate variants, including 53 particularly promising ones associated with human traits and diseases.
As they discussed in their own Cell paper, the added barcodes reduced the noise in their data and increased the assay's overall sensitivity.
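A toy simulation shows why extra barcodes can reduce noise. It is a sketch under simple assumptions, not the team's statistical model: each barcode is treated as an independent, noisy Gaussian reading of one variant's true activity, and the variant's estimate is the mean across its barcodes. The noise level, the trial count, and the comparison points of 10 and 50 barcodes are hypothetical. Only the figure of 350 barcodes comes from the study.

```python
import random
import statistics


def estimate_spread(n_barcodes, true_activity=1.0, barcode_noise_sd=0.5,
                    n_trials=500, seed=0):
    """Standard deviation of a variant's activity estimate across repeated
    simulated experiments, given n_barcodes noisy readings per experiment."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # One simulated experiment: one noisy reading per barcode.
        readings = [rng.gauss(true_activity, barcode_noise_sd)
                    for _ in range(n_barcodes)]
        estimates.append(statistics.fmean(readings))
    return statistics.stdev(estimates)


for n in (10, 50, 350):
    print(f"{n:>3} barcodes per variant: spread of estimate = "
          f"{estimate_spread(n):.3f}")
```

Under these assumptions, the spread of the estimate shrinks roughly with the square root of the number of barcodes, so small expression differences become easier to distinguish from noise.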
"With more barcodes you can start to detect more subtle changes in expression, including changes that might arise from differences between alleles," Tewhey added.
Another view into regulation
MPRA isn't the only approach for pulling causal needles out of GWAS haystacks, and Tewhey acknowledges that it won't be a panacea for studying all of the cell's mechanisms for regulating expression.
"For promoters and enhancers, we know it works well," he said. "For things related to long distance connectivity or the genome's shape, we're not as confident."
Sankaran points out that MPRA really shines in its ability to find themes in genetic variation that researchers can marry to other genetic, structural, or functional data.
"When you start to get all these independent pieces together, you get a real fine view of what's important," he said.
Melnikov A, Murugan A, et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nature Biotechnology. February 26, 2012. DOI: 10.1038/nbt.2137
Ulirsch JC, Nandakumar SK, et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell. June 2, 2016. DOI: 10.1016/j.cell.2016.04.048
Tewhey R, Kotliar D, et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell. June 2, 2016. DOI: 10.1016/j.cell.2016.04.027