New project aims to put together new pieces of the genetic variability puzzle
Since the human genome sequence became publicly available a decade ago, researchers have been developing new ways to dissect, examine, and probe it for clues to the genetic basis of human illness. This fall, a new endeavor known as the Genotype-Tissue Expression (GTEx) project became one of the latest, and largest, efforts to link specific genetic variations with changes in how much RNA, and ultimately protein, is produced in particular tissues of the body. Researchers hope this information will add another layer to our understanding of the genetic influences that can lead to disease. I spoke with Wendy Winckler, one of the co-principal investigators of the GTEx project at the Broad, about what this resource aims to provide to investigators worldwide.
Q1. Why did researchers choose to pursue this new type of genetic investigation?
Wendy: The idea behind GTEx is to follow up on the era of genome-wide association studies (GWAS), a good number of which we at the Broad have carried out for many different diseases. The goal of a GWAS is to find genetic variations called single nucleotide polymorphisms (SNPs), small inherited changes in a single letter of the DNA code, that contribute to human traits or inherited diseases.
To date, many GWAS projects have shown that SNPs associated with complex diseases, such as diabetes and prostate cancer, do not fall in the regions of DNA that encode proteins. Instead, they lie in areas that may regulate the levels of RNA, another type of genetic material in the cell, which in turn helps determine how much of a protein the cell makes. So, instead of looking for disease-related changes in a protein's sequence, we are looking for DNA variants that control the amount of the protein. Too much or too little of a protein can lead to aberrant cell behavior, which could result in disease.
Q2. What is the value in studying RNA levels?
Wendy: In short, we are looking for genetic variations that influence the number of RNA copies made in a specific tissue type. During a process called transcription, stretches of DNA are copied into RNA, which serves as the blueprint for making proteins. We know that different tissues contain different levels of RNA, and ultimately protein, and this is responsible for making all of the different cell types in the body. Differential control of transcription is why two cells with exactly the same DNA can develop very different characteristics, such as a brain cell versus a muscle cell. SNPs can influence the amount of RNA made. We want to look at each donor's genetic variation alongside the RNA measured in each tissue, and see whether we can pinpoint which SNPs influence the amount of RNA molecules, or transcripts, in a particular tissue.
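To make the idea of "correlating SNPs with RNA levels in a particular tissue" concrete, here is a minimal sketch of a single-SNP association test, assuming a simple least-squares regression of expression on genotype. It is not the GTEx analysis pipeline: the donors, expression values, and the function name `eqtl_test` are all hypothetical. Genotype is coded as the number of copies (0, 1 or 2) of one allele; because every tissue in a donor carries the same DNA, the same genotypes are tested against each tissue's RNA measurements.

```python
import math

# Hypothetical toy data: six donors genotyped at one SNP, with RNA levels
# measured in two tissues. Genotype = copies (0, 1 or 2) of one allele.
GENOTYPES = [0, 0, 1, 1, 2, 2]
EXPRESSION = {
    "liver": [10.0, 11.0, 14.0, 15.0, 18.0, 19.0],
    "brain": [12.0, 15.0, 13.0, 14.0, 12.0, 15.0],
}


def eqtl_test(genotypes, expression):
    """Least-squares regression of expression on genotype dosage.

    Returns (slope, r, t): change in expression per extra allele, the
    Pearson correlation, and the slope's t-statistic (n - 2 degrees of freedom).
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need paired measurements from at least 3 donors")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression does not vary across donors")
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    # t = r * sqrt((n - 2) / (1 - r^2)); treat a perfect fit as infinite.
    denom = 1.0 - r * r
    t = math.copysign(math.inf, r) if denom <= 0 else r * math.sqrt((n - 2) / denom)
    return slope, r, t


if __name__ == "__main__":
    # Same SNP, same donors, tested separately in each tissue.
    for tissue, levels in EXPRESSION.items():
        slope, r, t = eqtl_test(GENOTYPES, levels)
        print(f"{tissue}: slope={slope:.2f} r={r:.3f} t={t:.2f}")
```

In this invented example the SNP tracks RNA levels in liver but not in brain, the kind of tissue-specific effect described above. Real studies repeat such a test across very many SNP-gene pairs, typically adjust for covariates, and correct for multiple testing; this sketch does none of that.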
Q3. What are the sources of the tissues studied?
Wendy: In the GTEx project we aim to examine 50 to 70 different tissues from each individual. The initial pilot study will include various tissues from about 200 recently deceased, consented donors and some surgical donors, for a total of up to 1,000 different RNA transcript profiles. During the pilot project, we have a special interest in brain, muscle, liver, skin, and fat tissue, but we will evaluate samples from many other biologically informative sites.
We are working with three other institutions charged with collecting the tissue samples, plus a site that will manage collection and sampling of brain and spinal cord tissues. The brain samples are of particular interest because we want to examine a variety of brain regions to see whether we can find SNPs and RNA transcript variability linked with neurologic and psychiatric illnesses, such as schizophrenia or bipolar disorder.
Q4. Has anything like this been done on a smaller scale?
Wendy: There have been a couple of publications in which researchers looked at individual tissues and far fewer SNPs, using older technology. GTEx is a scale-up in both technology and the sheer size of the project.
Anecdotally, researchers have found examples in liver, brain regions, and adipose tissue in which individual DNA variants are correlated with the expression of a particular gene only in that tissue. But no one has yet tested most of the tissues we are interested in with the number of SNPs we will be assessing, or with the rich, comprehensive RNA transcript profiling possible with today’s sequencing technology.
Q5. What is the long-term value of GTEx?
Wendy: GTEx is the largest human transcript sequencing project undertaken to date. It is intended to be a reference data set that researchers can use to test any number of hypotheses. If the pilot study succeeds and we find a reasonable number of DNA regions that influence the level of transcription, then in about three years it may be possible to scale up to as many as 1,000 donors and 50,000 different transcript profiles.
One major factor in whether the project succeeds is whether researchers can obtain high-quality RNA from these different tissues, a particularly difficult proposition since RNA begins to degrade immediately after death. The Broad’s Kristin Ardlie, co-principal investigator of the GTEx project and director of the Broad’s Biological Samples Platform, and her team are putting a lot of effort into developing the best protocols for extracting RNA from the various tissue types. But certain tissues will prove challenging. For example, fatty tissue does not contain much RNA, and bone is very hard to grind down to extract RNA. Others, like liver, change dramatically right after death: enzymes released at death rapidly degrade the RNA, making it difficult to obtain good-quality samples.
GTEx is likely to benefit other types of projects, some of which we have not even anticipated yet. The data will be publicly accessible, and we welcome input from researchers everywhere about potential uses for them. We have already received great feedback from a couple of researchers, but we would love to broaden that interaction by building an analysis community beyond our own GTEx team.
The 2.5-year pilot study officially launched in July 2010. The first samples should start to arrive in early 2011, with a scale-up to follow. Check back this time next year, when we anticipate having some exciting data to present.