A suite of tools takes flight

Haley Bridger, May 16th, 2011
  • Logo design by Leslie Gaffney and Lauren Solomon, Broad Communications

One summer day, two Broad researchers who had never met before sat next to each other at a lunch table. Moran Yassour, a graduate student in computational biology, and Manfred Grabherr, an engineer turned computational biologist, struck up a conversation about their research interests.

“I realized, this is the Manfred people have been telling me about,” Moran recalls. “People had told us about each other but we’d never spoken before.”

Manfred and Moran began talking about ways to put together the puzzle pieces of the transcriptome: the complete set of all the RNA molecules found in cells. It was a conversation that they would continue for the better part of a year. It would lead to a partnership with a third Broad researcher, Brian Haas, and result in a suite of tools and a Nature Biotechnology paper (published online this past weekend).

Moran’s research interests center on small organisms that can be used to answer big questions. A researcher who divides her time between Israel and the U.S., Moran uses yeast as a model organism for understanding the transcriptome.

Unlike the genome, which contains instructions for creating all of the cells in an organism’s body, the transcriptome only includes genes and other genetic elements that have been copied into RNA, revealing when and in what cells genes are expressed. So if the genome is the cell’s “parts list,” then the transcriptome shows which parts are being made and in what quantity, revealing much about the state of a cell. Researchers can use techniques like RNA-Seq to easily and cheaply collect pieces of the transcriptome. Piecing those bits of transcripts back together into a full transcriptome, however, is a computational challenge.

Moran approached Manfred with a question: should she take these pieces and map them to the reference genome, the way one might use the picture on a puzzle’s box to help determine the placement of puzzle pieces, or should she assemble all of the pieces of a transcriptome using just the raw data?

This question made Manfred think of an old computational tool that he had lying around. He dusted off this piece of software, which was originally designed to reconstruct transposable elements, or small, repetitive stretches of DNA. “I thought, if it can do that, maybe it can reconstruct transcriptomes,” Manfred recalls. Surprisingly, the algorithm, with the cumbersome name “Find Repeat Families,” did extraordinarily well.

Manfred refurbished the program, making it a better fit for Moran’s work, and gave it a new name: Ananas (a word that means “pineapple” in many languages*).

That’s when Brian heard about their work. A researcher in the Genome Sequencing and Analysis Program, Brian targets the genome annotation of many species of fungi that have densely packed transcripts, which are challenging to analyze. Brian borrowed Ananas to help him with this work and saw the software’s potential.

“I love to write software, especially when it involves improving genome annotations,” says Brian. “I had a lot of ideas for things I wanted to do, but it wasn’t straightforward for me to answer all of the questions I had using the Ananas software. I decided I would try to write something that works like it, but does all of the things I wanted it to do.”

“When Brian joined the project, it felt like we had this big booster rocket on our backs,” says Manfred, and Moran completely agrees.

Brian and Manfred teamed up to reengineer the tool to answer Brian’s key questions. And then it was time to give the software’s latest incarnation a new name: Inchworm.

Brian gave the software this name based on its behavior – it walks along building transcripts the way an inchworm might move along a forked branch. “You can imagine the back of the inchworm staying put and the front of the inchworm exploring,” says Brian. “Once it figures out which way to go, it walks along and keeps going in that direction.”
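The inchworm image can be made concrete with a small sketch. The code below is not Trinity's implementation; it is a minimal illustration, assuming a greedy scheme in which reads are broken into overlapping words of length k (k-mers), and a growing transcript is extended one base at a time by whichever overlapping k-mer has been seen most often. The function names `count_kmers` and `extend_right`, the toy reads, and the choice of k = 4 are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend_right(seed, counts, k):
    """Greedily grow a contig to the right from a seed k-mer (assumes k >= 2).

    At each step the last k-1 bases stay put (the "back" of the inchworm)
    while each of the four possible next bases is explored (the "front").
    The most frequently observed k-mer wins, and the walk continues in
    that direction until no unused extension remains.
    """
    contig = seed
    used = {seed}  # never reuse a k-mer, which prevents endless loops
    while True:
        suffix = contig[-(k - 1):]
        candidates = [
            suffix + base
            for base in "ACGT"
            if counts.get(suffix + base, 0) > 0 and suffix + base not in used
        ]
        if not candidates:
            return contig
        best = max(candidates, key=lambda kmer: counts[kmer])
        used.add(best)
        contig += best[-1]


# Toy data: a "forked branch" where one path is supported by three reads
# and the other by a single read.
reads = ["ATGGCGT"] * 3 + ["ATGGCAA"]
counts = count_kmers(reads, 4)
print(extend_right("ATGG", counts, 4))  # prints ATGGCGT
```

At the fork after "ATGGC", the sketch follows the better-supported branch and ignores the weaker one. A real assembler also has to deal with sequencing errors, extension in both directions, and the alternative branches themselves, none of which this sketch attempts.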

Within the year, Manfred and Moran had created two tools to complement Inchworm and named them Chrysalis and Butterfly. The package of these three tools — dubbed Trinity — assembles a cell’s transcripts and reveals all versions of a transcriptome, even in the absence of a reference genome.

Throughout the process, Brian, Moran, and Manfred received help and support from many of their Broad colleagues, especially Chad Nusbaum, Aviv Regev, Kerstin Lindblad-Toh, and Federica Di Palma, as well as from Nir Friedman of the Hebrew University. “They coordinated things, giving us tools that we needed, finding people who could help us, and helping us prioritize,” says Moran.

The successful result of an impromptu meeting at the lunch table, Trinity is publicly available and has been put to use at the Broad and beyond. Last month’s paper on fission yeast capitalized on the suite of tools and helped drive its development. Trinity is now being used on the transcriptomes of a variety of organisms, ranging from raspberry plants to species of Saprolegnia, the white mold-like organisms that commonly infect fish.

 

*If you’re wondering where the creative names of these software programs come from, Manfred and his collaborators sometimes turn to their colleagues for help generating them. Jessica Alfoldi, a research scientist in the Genome Sequencing and Analysis Program, came up with “ananas” – part of a fruit-themed series of names. Pineapples are compound fruit, developing from several ovaries, just as the software program creates a composite transcriptome from many pieces.
 

6 Comments
 
Excellent work! I got the same idea as Inchworm. But a worm is far from the butterfly:) Transcriptomics grows up!
I have the same opinion; this is a great improvement on transcriptomics of non-model species. I will be happy for test it on my scallop samples! Is there a mailing list for be tuned on this project?
Hi Raul, Thanks for your comment. If you have questions or would just like to know more about Trinity, please visit: http://trinityrnaseq.sourceforge.net/#contact_us You can find contact information, FAQs, etc. on this page. Best of luck with your research!
Excellent work! In the FAQ, I am not clear at ~1G of RAM per 1M reads to be assembled, is that mean "if I have a reads file of 1000M, I need 1000G RAM to run the program". Thanks!
Response to Doc_Li: Processing a billion reads could certainly require a machine with 1T of RAM. Most applications thus far have involved hundreds of millions of reads, and are run on machines having 256G up to 512G RAM. We're actively working on enhancements to process larger data sets with smaller memory footprints.
Now I am processing the RNA-Seq data by Trinity.Thanks!