GTEx: Useful expression for cancer research

The Genotype-Tissue Expression (GTEx) Project set out five years ago to create a comprehensive atlas and open database of gene expression and gene regulation across human tissues. This month, it reports on its two-year pilot phase.

Tools of the Trade: GTEx

This month, the Genotype-Tissue Expression (GTEx) Project, which set out five years ago to create a comprehensive atlas and open database of gene expression and gene regulation across human tissues, published several papers reporting on findings from its two-year pilot phase. While the publications are the first to result from the ongoing NIH Common Fund project, the extensive gene expression data GTEx has amassed so far has already been publicly available to scientists worldwide, and it is empowering research across biological disciplines, particularly in the cancer field.

“If you look at the people who have downloaded the dbGaP data – our raw data for gene expression analyses – by far the single largest group using the resource is cancer researchers,” says Kristin Ardlie, director and co-principal investigator of the GTEx Laboratory Data Analysis and Coordination Center at the Broad Institute of MIT and Harvard.

The GTEx data, which so far comes from roughly 10,000 tissue samples spanning 43 tissue types, reveals how variants in the four-letter genetic code of DNA influence gene expression in different tissues throughout the body. These variants can influence how and when a gene is turned on or off, or how strongly it is expressed in a given tissue. For protein-coding genes, those variants can also influence how much protein is produced and which isoforms (different forms of the same protein) are present.
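To make the idea of a variant influencing expression concrete, here is a minimal Python sketch of the simplest form of such a test, an expression quantitative trait locus (eQTL) association: for one gene in one tissue, it regresses expression on the number of copies (0, 1 or 2) of a variant's alternative allele carried by each donor. The function name `eqtl_slope` and all numbers are hypothetical illustrations, not GTEx data or GTEx code; real eQTL mapping also corrects for covariates such as ancestry and sequencing batch, and for the very large number of variant-gene pairs tested.

```python
# Toy cis-eQTL test: does the number of copies (0, 1 or 2) of a variant's
# alternative allele track a gene's expression in one tissue?
# Function name and data are hypothetical illustrations, not GTEx values.

def eqtl_slope(dosages, expression):
    """Return the least-squares slope and Pearson r of expression on genotype dosage."""
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 matched donors")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    # Covariance and variances, left unnormalized (the 1/n factors cancel out)
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    var_g = sum((g - mean_g) ** 2 for g in dosages)
    var_e = sum((e - mean_e) ** 2 for e in expression)
    if var_g == 0 or var_e == 0:
        raise ValueError("genotype or expression does not vary across donors")
    slope = cov / var_g  # change in expression per extra copy of the allele
    r = cov / (var_g * var_e) ** 0.5
    return slope, r


if __name__ == "__main__":
    # Six hypothetical donors: genotype dosage and expression of one gene
    dosages = [0, 0, 1, 1, 2, 2]
    expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
    slope, r = eqtl_slope(dosages, expression)
    print(f"slope={slope:.2f}, r={r:.3f}")
```

In this toy data each extra copy of the allele adds about one unit of expression (slope of about 1.0, r of about 0.99), which is the kind of genotype-expression link GTEx measures tissue by tissue.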

Since these variations affect gene activity and cell function, they could be at play when something goes wrong, leading to diseases such as cancer. That’s why researchers are mining the GTEx data to explore the possible connections between genetics and disease.

Gad Getz, Ardlie’s co-PI, is director of both the Broad’s Cancer Genome Computational Analysis group and the Bioinformatics Program at the Massachusetts General Hospital Cancer Center and Department of Pathology. He says cancer researchers are taking a variety of approaches to using GTEx data in their investigations of the disease.

“People are playing with different ideas about how to use this expression data in cancer research,” he says. “One way is to use GTEx as a resource in comparisons between normal and cancer cells. GTEx can serve as a control for what normal cell activity should look like; it can give you not just the level of gene expression, but also insights about which gene isoforms are expressed in different cells and tissue types in the body.”

That data can then be juxtaposed with gene expression data from tumor samples, with the differences between the “normal” GTEx samples and the cancer samples pointing to biological mechanisms that might underlie the disease.
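A minimal Python sketch of such a tumor-versus-normal comparison, assuming per-gene expression values (for example, in TPM) for one tumor sample and a list of values for the same genes from normal samples of the matching tissue, such as might be assembled from GTEx downloads. The function `compare_to_normal`, the gene names and all values are hypothetical. For each gene it reports a log2 fold change against the normal median and a z-score against the spread of the normal samples; a pseudocount keeps the logarithm defined when a gene reads zero.

```python
import math
import statistics


def compare_to_normal(tumor, normal_reference, pseudocount=1.0):
    """Compare one tumor sample with a panel of normal samples, gene by gene.

    tumor: dict mapping gene -> expression in the tumor sample (e.g. TPM)
    normal_reference: dict mapping gene -> list of expression values from
        normal samples of the matching tissue (a hypothetical GTEx-style baseline)
    Returns dict gene -> (log2 fold change vs. normal median, z-score vs. normals).
    """
    results = {}
    for gene, t_value in tumor.items():
        normals = normal_reference.get(gene)
        if not normals or len(normals) < 2:
            continue  # too few normal samples to define a baseline
        median_n = statistics.median(normals)
        # Pseudocount keeps the ratio defined when either value is zero
        log2fc = math.log2((t_value + pseudocount) / (median_n + pseudocount))
        sd = statistics.stdev(normals)
        z = (t_value - statistics.mean(normals)) / sd if sd > 0 else float("nan")
        results[gene] = (log2fc, z)
    return results


if __name__ == "__main__":
    normal_reference = {
        "GENE_A": [10, 12, 11, 9, 13],
        "GENE_B": [50, 55, 45, 52, 48],
    }
    tumor = {"GENE_A": 63, "GENE_B": 49}
    for gene, (fc, z) in compare_to_normal(tumor, normal_reference).items():
        print(f"{gene}: log2FC={fc:.2f}, z={z:.1f}")
```

Here GENE_A comes out roughly 2.4 log2 units above the normal median, far outside the normal range, while GENE_B stays close to its normal level. Using many normal samples, rather than a single adjacent biopsy, is what makes the z-score meaningful.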

For many types of cancer, such comparisons used to be hard to make.

“Historically, cancer studies have often tried to make these comparisons between tumor and normal cells, but getting samples from inaccessible organs such as the brain or lung is not something that can usually be done,” Ardlie explains.

In such studies, “normal” cells have traditionally been sampled from areas adjacent to tumors, but there has always been some question as to whether that tissue was pre-cancerous, and therefore not truly representative of “normal,” healthy tissue. Data from the large pool of GTEx samples provides a reliable baseline for comparison.

Another use for the GTEx data, Getz says, is to “clean up” data from cancer samples. When a tumor is biopsied, the sample is a mixture of cells: cancer cells, healthy cells, and others that may be somewhere in between.

“You can use the GTEx data as a filter to computationally remove the background noise caused by the expression data from the normal cells,” Getz explains.
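One simple way to picture this filtering, sketched here in Python under a stated assumption, is a linear mixing model: the bulk expression of each gene is treated as purity × tumor + (1 − purity) × normal, where purity is the fraction of cancer cells in the biopsy and the normal term comes from a reference profile, such as median expression in normal samples of the same tissue. The function name, the assumption that purity is already known, and the data are all hypothetical; published deconvolution methods estimate purity from the data and are considerably more sophisticated.

```python
def remove_normal_contribution(observed, normal_profile, purity):
    """Estimate tumor-cell expression from a bulk tumor sample.

    Assumes a simple linear mixing model for every gene:
        observed = purity * tumor + (1 - purity) * normal
    where normal is taken from a reference profile (hypothetical GTEx-style
    median expression for the matching tissue). Genes missing from the
    reference are treated as having zero normal expression.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    estimated = {}
    for gene, obs in observed.items():
        normal = normal_profile.get(gene, 0.0)
        tumor = (obs - (1 - purity) * normal) / purity
        estimated[gene] = max(tumor, 0.0)  # expression cannot be negative
    return estimated


if __name__ == "__main__":
    observed = {"GENE_A": 30.0, "GENE_B": 10.0}        # bulk biopsy values
    normal_profile = {"GENE_A": 10.0, "GENE_B": 20.0}  # normal reference
    print(remove_normal_contribution(observed, normal_profile, purity=0.5))
```

With a purity of 0.5, GENE_A's bulk value of 30 implies about 50 in the cancer cells, while GENE_B's bulk signal is fully explained by the normal cells and drops to 0.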

This “normal” data can also reveal when genes that are typically expressed are not expressed in cancer samples. The GTEx data can also indicate which tissues are relevant to a specific type of cancer, and can help identify possible risk alleles: genetic variants that raise or lower one’s risk of developing cancer.
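The first of these uses, spotting genes that are normally switched on but silent in a tumor, can be sketched in a few lines of Python. The hypothetical function below lists genes that reach an expression threshold in most normal samples of a tissue but fall below it in one tumor sample. The function name, the data, the 1.0 TPM threshold and the 90% cutoff are illustrative choices, not GTEx conventions.

```python
def lost_expression(tumor, normal_reference, threshold=1.0, min_fraction=0.9):
    """List genes that are expressed in most normal samples but not in the tumor.

    tumor: dict gene -> expression in one tumor sample (e.g. TPM)
    normal_reference: dict gene -> list of values from normal samples
    A gene is flagged if at least min_fraction of normal samples reach
    threshold while the tumor value (0.0 if absent) stays below it.
    """
    lost = []
    for gene, normals in normal_reference.items():
        if not normals:
            continue
        frac_expressed = sum(v >= threshold for v in normals) / len(normals)
        if frac_expressed >= min_fraction and tumor.get(gene, 0.0) < threshold:
            lost.append(gene)
    return sorted(lost)


if __name__ == "__main__":
    normal_reference = {
        "GENE_A": [5, 6, 7, 8, 5],     # consistently expressed in normal tissue
        "GENE_B": [0, 0.2, 3, 0, 0],   # rarely expressed in normal tissue
        "GENE_C": [20, 18, 22, 25, 19],
    }
    tumor = {"GENE_A": 0.1, "GENE_B": 0.0, "GENE_C": 21}
    print(lost_expression(tumor, normal_reference))
```

Only GENE_A is flagged: GENE_B is also low in the tumor, but because it is rarely expressed in normal tissue its absence carries little signal, which is exactly why a large normal reference matters.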

Currently, researchers at academic institutions and pharmaceutical companies are using GTEx data broadly to probe the links between genetics and disease and to investigate possible therapeutic avenues. Those efforts are expected to benefit from refinements to the methods researchers use to analyze GTEx data, and from the wider availability of tools such as the Broad’s Firehose, a computational system that enables fast, high-throughput analysis of large datasets by running many complex algorithms at once.

As part of the National Cancer Institute Cloud Pilot, the Broad is partnering with the University of California (UC), Berkeley, and UC Santa Cruz to move Firehose to the cloud, making the analysis platform accessible to the wider scientific community. That incarnation of the platform, FireCloud, is expected to launch in early 2016.


For more information on the GTEx pilot phase, read the NIH press release.

To use the GTEx data for your own scientific research, go to the GTEx Portal website.