We value having options in biomedical research. But having many choices without sufficient comparative information about their benefits and limitations can unnecessarily complicate research progress. Such has been the situation over the past two years with complementary DNA (cDNA) “second-generation” sequencing technology, or “RNA-seq” as it has come to be known.
Complementary DNA is created in the laboratory starting from an RNA template using an enzyme called reverse transcriptase. The cDNA sequence is complementary to the sequence of its RNA template, hence its name. Over the past few years, researchers have developed a variety of techniques to decipher this cDNA and learn more about an organism from a cell’s RNA content. In a paper published August 15 in Nature Methods, researchers at the Broad Institute of Harvard and MIT directly compared seven of these methods, known as RNA-seq techniques. Judged against a set of criteria, a technique known as dUTP second-strand marking emerged as the leading protocol and has been adopted at the Broad for RNA-seq applications. Full details of this protocol are described in this paper and in the original dUTP paper by Parkhomchuk et al., published in Nucleic Acids Research in 2009. The authors also provide a menu of comparative analysis criteria that can be applied to assess future RNA-seq protocols.
Researchers perform RNA-seq for three general reasons. “First, you want to know how many and which RNA transcripts are in a cell or in a sample,” explains Joshua Levin, of the Genome Sequencing and Analysis Program (GSAP) at the Broad and co-first author of the paper. “With RNA-seq, you can actually count the relative number of transcripts made in each cell, which tells you something about its function.”
Second, RNA-seq provides specific information for genome annotation. “It lets you identify the elements (DNA sequences) of the genome that are copied into RNA and assign them biological function,” says Levin. In past years, genome annotation was done with expressed sequence tags (ESTs), but this approach relies on the older Sanger-based sequencing technology, which has rapidly been displaced by newer, second-generation techniques such as those provided by Illumina and others. Second-generation technologies have vastly increased access to the information-dense transcriptome.
Last, RNA-seq allows researchers to characterize RNA splicing – modifications of RNA after transcription, in which introns (nucleotide bases that are not expressed into proteins) are removed and exons (bases that are expressed) are joined. “There are programs that can predict that but you really want to have an actual experiment that tells you what sequences are present in spliced RNAs and those that are not,” says Levin. Differences in RNA splicing can lead to alterations in the proteins translated from those RNAs, thereby imparting functional consequences for cells and organisms.
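To make the splicing description concrete, here is a minimal Python sketch. The sequence, the exon coordinates, and the function name `splice` are all invented for illustration; none of them come from the paper or from any RNA-seq tool. The sketch shows why a sequenced read that spans an exon-exon junction is direct evidence of splicing: it matches the spliced RNA but not the unspliced one.

```python
# Toy illustration only: the sequence and exon coordinates are made up.

def splice(pre_rna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals (0-based, end-exclusive), dropping the introns."""
    return "".join(pre_rna[start:end] for start, end in exons)

# Hypothetical unspliced transcript: exon "AAA", intron "GTCCCCAG", exon "TTT".
pre_rna = "AAAGTCCCCAGTTT"
exons = [(0, 3), (11, 14)]

mature_rna = splice(pre_rna, exons)
print(mature_rna)  # AAATTT

# A read covering the junction between the two exons.
junction_read = "AATT"
in_spliced = junction_read in mature_rna
in_unspliced = junction_read in pre_rna
print(in_spliced, in_unspliced)
```

Real analyses align millions of reads to a genome with splice-aware software; the substring check above only stands in for that step.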
Levin’s group worked with various RNA-seq methods as they became available, including two developed internally at the Broad. “Though we’ve been working with these techniques for several years, we realized that no one had compared them to determine which would be the best to recommend,” says Levin. “A lot of people are just getting started on RNA-seq so they don’t know which method they should use.” Researchers at the Broad often field technique-related questions from other investigators. This analysis was done largely to help the broader community sift through the RNA-seq options.
In the cell, each single-stranded RNA is synthesized from one of the two strands of DNA. When RNA is copied back into cDNA for RNA-seq in the lab, the information about which of the two strands of DNA was copied into RNA can be lost unless special methods are used. The crux of this paper is a test of which of seven different “strand-specific” methods best preserves this strand information. Strand-specific RNA-seq improves on standard RNA-seq in three ways: accurately identifying antisense transcripts, determining the transcribed strand of non-coding RNAs (e.g. lincRNAs), and demarcating the boundaries of closely situated or overlapping genes.
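The loss of strand information can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any of the seven protocols compared in the paper: the locus sequence is invented, and the helper `unstranded_view` is a hypothetical stand-in for the fact that, in a standard library, either strand of the double-stranded cDNA may end up being sequenced.

```python
# Toy illustration only: the locus sequence below is made up.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def unstranded_view(read: str) -> str:
    """Collapse a read and its reverse complement into one representative.

    This models a standard (non-strand-specific) library, where the two
    orientations of a fragment cannot be told apart.
    """
    return min(read, reverse_complement(read))

# Hypothetical locus, written as the forward DNA strand.
locus = "ATGGCCTTAGACGT"

# A sense transcript reads like the forward strand (T written in place of U);
# an antisense transcript from the same locus reads like the reverse strand.
sense = locus
antisense = reverse_complement(locus)

# With orientation preserved, the two transcripts differ.
stranded_distinguishable = sense != antisense
# Once orientation is lost, they collapse to the same sequence.
unstranded_distinguishable = unstranded_view(sense) != unstranded_view(antisense)

print(stranded_distinguishable)
print(unstranded_distinguishable)
```

In this simplified picture, a strand-specific method is one that keeps the read orientation, so the sense and antisense transcripts of the same locus remain separable.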
“Nonstrand-specific RNA sequencing has been the standard method,” explains Levin. “But now strand-specific approaches provide additional valuable information and do not involve that much more work or cost.”
Along with strand specificity, the team examined other criteria using their new computational pipeline, and they assessed practical measures such as ease of use in the laboratory and in computational analysis. “Looking at all these factors, dUTP turned out to be the one we liked the most and it is our default RNA-seq method at the Broad right now,” says Levin. But he notes that technical challenges need to be addressed to make the process high-throughput. “This technique works for making 12 libraries, for example,” he says. “But if you want to automate it, for 100 or more libraries at a time, the method needs to be modified.” He comments that researchers at the Broad are addressing this point now to be ready for large sequencing requests as they become more frequent.
The team’s analysis is freely available on the Broad’s GenePattern server. “This was done so that other researchers can evaluate their own protocols using the same criteria explained in the paper,” explains Moran Yassour of the Broad Institute and the Hebrew University, and a co-first author of the paper.
Joshua Z Levin, Moran Yassour, et al. Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nature Methods. 15 August 2010. doi:10.1038/NMETH.1491