Five Questions for Kristin Ardlie
The Genotype-Tissue Expression (GTEx) project started five years ago with the goal of creating a comprehensive atlas and open database of gene expression and gene regulation across human tissues. This week, the researchers spearheading the NIH-funded effort released five papers reporting on the pilot phase of the project.
For this edition of “Five Questions,” we asked Kristin Ardlie to share her perspective on the project and what it has revealed thus far. Ardlie is the director and co-principal investigator (along with Broad Institute member Gad Getz) of the GTEx Laboratory Data Analysis and Coordination Center at the Broad.
Q1: What did the GTEx Consortium set out to do?
KA: The goals of the project were to create a data resource for gene expression and provide a systematic analysis of the role of genetic variation in the regulation of gene expression across multiple reference human tissues. To do that, the Consortium had to sample and study as many tissues as possible across a large number of people. One of the original motivations was to help create a functional link between the genetic variants that researchers were finding in genome-wide association studies and the associated risk for developing a disease.
Cellular/tissue specificity is a hallmark of many diseases and cellular context is a key determinant of gene regulation. So, a question of interest biologically is: how much specificity or sharing is there in gene expression and regulation across tissues? GTEx allows us to look across a large number of people and across many different tissue types to see how gene expression and function may vary. This sort of multi-tissue analysis at scale can help us understand the link between genetics and disease and may help explain why some people with a given genetic variation develop a disease while others with the same variation might not.
Q2: What is happening biologically at the level of gene expression and gene regulation that may be contributing to disease?
KA: Scientists have long been interested in how changes in our DNA – variations in the bases within the genetic code – might affect protein structure and function downstream. But many of these changes occur well outside of protein coding regions and so they might play a role in gene regulation instead. DNA variants that can influence gene expression can affect whether a gene is turned on or off, up or down; at what stage in development a gene might be turned on or off; or how much of a protein is produced, for example. Altering any of this can certainly contribute to disease.
Q3: Your team at the Broad was part of a larger Consortium that worked on the project. What role did it play?
KA: GTEx is a large consortium effort supported by the NIH Common Fund. Within the consortium, two key functions are funded: one is the collections arm of the project, tasked with enrolling donors, collecting tissues, quality control, and so on, and the other – our Broad team here – is the laboratory, data analysis, and coordination center for the project. We receive all the samples, process them, and then generate all of the RNA and DNA sequencing data. Then we work with seven other funded analysis centers to analyze those data.
A key goal of the GTEx project has been to create a publicly available reference dataset of gene expression across tissues that researchers can use to study human biology and disease. So in addition to releasing the sequence data to dbGaP (the National Center for Biotechnology Information's database of Genotypes and Phenotypes), we host a widely used portal, the GTEx Portal, where we post most of our analyzed results and data files.
Q4: I’ve heard that GTEx was considered a somewhat daring project to undertake. Why was that?
KA: There were two reasons the project was considered a bit risky. First, to do the project we needed to sample many different types of tissues, and we could only really do that by sampling from recently deceased donors. Collecting samples for any study is notoriously difficult, and nobody was at all sure we could get the donors for this type of sampling in the numbers we needed for the study. We recruited donors by obtaining consent from next-of-kin and, thanks to the great generosity and interest of those families, we’ve been remarkably successful at meeting all the donor and sample goals we set.
The other difficulty is that to study gene expression you need to isolate and study RNA, and RNA is a tricky molecule to work with; unlike DNA, it can degrade very rapidly. Since we were sampling from deceased donor tissues, we didn’t know if we could sample quickly enough to get sufficiently high-quality RNA. We had to monitor this very carefully at the beginning of the project, and we modified the order in which tissue was collected based on how fast RNA degraded in some tissues relative to others. We also had wonderful collection sites that were able to significantly shorten the collection times. Overall, we obtained very good RNA that maintained very characteristic, tissue-specific RNA signatures.
Recognizing that the project was risky but also potentially high-reward, our funding to scale up the project was contingent on the pilot working. In the pilot phase, we were able to show that we could enroll the number of donors and get the quality of samples we needed to scale up the project.
Q5: And where does the project stand now?
KA: Like any big project, the parts of it are all at different stages. At this point, we’re actually getting close to having collected samples from the 900 donors we had set as our goal. We’re at about the halfway point in terms of producing data – that is, we have around 10,000 tissue samples sequenced from 43 different tissue types, which members of the analysis working group are all feverishly analyzing.
And, of course, we published the six papers that came out this week. Those were based on the pilot data released about a year and a half ago. I think the completion of the pilot phase showed that we could actually do this – get the samples and develop the analytical tools that would enable this multi-tissue approach. It also showed that you can use this type of information to help understand the mechanisms of action and tissue prevalence of complex diseases, and it underscored the need to better understand disease biology in all tissues across the body, instead of just the snapshot we often get from the handful of tissues we typically do these studies in.
To learn more about the findings of the GTEx pilot phase, read the press release from the National Human Genome Research Institute.
To use the GTEx data for your own scientific research, go to the GTEx Portal website.
A full list of contributors to the GTEx project can also be found on the Portal website.
The GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multi-tissue gene regulation in humans. Science. Online May 7, 2015. DOI: 10.1126/science.1262110
Baran, Y. et al. The landscape of genomic imprinting across diverse adult human tissues. Genome Research. Online May 7, 2015. DOI: 10.1101/gr.192278.115
Melé, M. et al. The human transcriptome across tissues and individuals. Science. Online May 7, 2015. DOI: 10.1126/science.aaa0355
Pierson, E. et al. Sharing and specificity of co-expression networks across 35 human tissues. PLoS Computational Biology. Online May 7, 2015. DOI: 10.1371/journal.pcbi.1004220
Pirinen, M. et al. Assessing allele-specific expression across multiple tissues from RNA-seq read data. Bioinformatics. Online March 27, 2015. DOI: 10.1093/bioinformatics/btv074
Rivas, M. et al. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science. Online May 7, 2015. DOI: 10.1126/science.1261877