Connecting the genetic dots of disease
DAPPLE is an algorithm that can help researchers examine possible networks and draw meaningful conclusions from GWAS data by looking at the physical interactions among proteins. The image here, generated using PLODA (Plot Dapple), shows connections to one gene: IL23R, which is associated with Crohn's disease.
Image courtesy of Stephan Ripke
When Liz Rossin began the PhD portion of the Harvard/MIT MD-PhD program in the lab of Mark Daly, a ubiquitous and critical problem in genetic research caught her attention. At the time, researchers running genome-wide association studies (GWAS) had identified more than 150 genetic regions scattered throughout the genome tied to various diseases. A typical study would turn up dozens of regions likely harboring genetic changes contributing to risk of disease, but would not point to specific causal mutations. In order to understand the underlying biology leading to disease, Rossin and her colleagues wanted to identify the connections among these regions.
“That’s the problem we were interested in,” says Rossin, who is now finishing medical school at Harvard. “We have all of these regions in the genome associated with disease, but we don’t necessarily know what they mean or what they have to do with one another.”
Inspired by this challenge, Rossin went on to create DAPPLE – Disease Association Protein-Protein Link Evaluator – an algorithm that can help researchers examine possible networks and draw meaningful conclusions from GWAS data by looking at the physical interactions among proteins. DAPPLE has been used to evaluate data for a variety of diseases including Alzheimer’s disease, rheumatoid arthritis, autism, and more.
Most recently, Broad researchers and collaborators used DAPPLE in a study of inflammatory bowel disease (IBD), an illness that includes Crohn’s disease (CD) and ulcerative colitis (UC), both inflammatory diseases of the gastrointestinal tract (a paper detailing the results was published in Nature this week – you can check out a Broad press release here or the original paper here). By combining raw data from studies of CD and UC and adding newly collected genetic information, the team was able to more than double the number of genetic regions tied to disease. To draw conclusions from these 160 genetic results, the researchers turned to DAPPLE.
“DAPPLE is a powerful tool for this kind of project,” says Stephan Ripke, one of the first authors of the Nature paper and a researcher at the Broad Institute and Massachusetts General Hospital. “It allows you to build networks and then determine which of those networks make the most sense.”
DAPPLE is a tool of permutation, meaning that it takes proteins in protein-protein interaction databases and rearranges them again and again so that when researchers build a network – say, from 160 regions of the genome tied to IBD – they can compare it to what would be expected by random chance. When Rossin created DAPPLE in Daly’s lab, she teamed up with Kasper Lage, now a researcher at the Broad Institute, and Chris Cotsapas, now an assistant professor at Yale, to populate the algorithm with a list of hundreds of thousands of known protein-protein interactions. Rossin describes these interactions as the “workhorses of the cell” that set off important physical and chemical reactions. DAPPLE uses these known interactions to test possible networks and even predict new associations. In the case of IBD, DAPPLE helped the researchers narrow in on a protein network that implicated genes known to influence how the body responds to pathogens that cause diseases like tuberculosis and leprosy.
“DAPPLE will make an educated guess about all of the genes in a region that could be playing a role in disease, and it will do it for each region, and then ask if the network that it has generated is more connected than would be expected by random chance,” Rossin explains. “If the answer is yes, it will tell you which genes are driving that connection.”
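The permutation idea Rossin describes can be illustrated with a toy example. The Python sketch below is not DAPPLE's code and makes several simplifying assumptions: the gene names and interactions are invented, the only statistic is the number of direct interactions among the candidate genes, and the protein labels are shuffled uniformly at random. It only shows the general shape of the question: is this network more connected than a shuffled one would be?

```python
import random

def count_direct_edges(edges, genes):
    """Count interactions whose two endpoints are both candidate genes."""
    genes = set(genes)
    return sum(1 for a, b in edges if a in genes and b in genes)

def permutation_p_value(edges, genes, n_permutations=1000, seed=0):
    """Empirical p-value for the connectivity of `genes` in the network.

    Protein labels are shuffled across the nodes of the network while the
    wiring stays fixed, and the statistic is recomputed each time.
    (Assumption: a plain uniform shuffle, chosen for simplicity.)
    """
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    observed = count_direct_edges(edges, genes)
    at_least_as_connected = 0
    for _ in range(n_permutations):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))
        permuted = [(relabel[a], relabel[b]) for a, b in edges]
        if count_direct_edges(permuted, genes) >= observed:
            at_least_as_connected += 1
    # Add-one correction so the estimate is never exactly zero.
    p_value = (at_least_as_connected + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical toy network: A-D form a tight cluster, E-L form a sparse chain.
toy_edges = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"),
    ("E", "F"), ("F", "G"), ("G", "H"), ("H", "I"),
    ("I", "J"), ("J", "K"), ("K", "L"),
]
candidate_genes = ["A", "B", "C", "D"]  # stand-ins for genes from GWAS regions

observed, p_value = permutation_p_value(toy_edges, candidate_genes)
print(f"direct interactions: {observed}, empirical p-value: {p_value:.3f}")
```

In this toy network the four candidate genes share four direct interactions, and very few shuffles match that, so the empirical p-value comes out small. A real analysis would start from a curated interaction database and would need to consider more than direct connections.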
Researchers can then follow up on these critical, driving genes. In previous studies of autism, DAPPLE turned up hits in pathways involved in chromatin remodeling.
“With DAPPLE, you can go from a big list of regions to a narrowed down list of genes that may be very important for your disease,” says Rossin.
Rossin has been impressed by the many uses researchers have found for the algorithm she helped create, several of which she could not have envisioned when DAPPLE was originally built.
“When we created DAPPLE, we wanted to make the tool as accessible as possible, so we built a website that anyone can use – you don’t even have to download it,” she says. “It’s been really exciting for me to see the way people have started using it.”
You can access DAPPLE here: www.broadinstitute.org/mpg/dapple
You can also access PLODA, a tool for creating simple, colored networks based on specific genes, here: http://www.broadinstitute.org/mpg/ricopili/ibd_dapple_1112/