Cufflinks.cuffmerge Documentation, v2

Description: Merge multiple Cufflinks assemblies

Author: Cole Trapnell et al., University of Maryland Center for Bioinformatics and Computational Biology

Algorithm Version: Cufflinks 2.0.2

Contact: gp-help@broadinstitute.org

Summary

The main purpose of Cufflinks.cuffmerge is to merge together several Cufflinks assemblies, making it easier to produce an assembly GTF file suitable for use with Cufflinks.cuffdiff.  Cufflinks.cuffmerge also runs Cuffcompare in the background and automatically filters out transcribed fragments (transfrags) that are likely to be artifacts. 

Cufflinks.cuffmerge is essentially a "meta-assembler": it treats the assembled transfrags from Cufflinks the way that Cufflinks treats reads, merging them together parsimoniously to produce the smallest number of transcripts that explain the data. Furthermore, when a reference genome annotation is available, Cufflinks.cuffmerge can integrate reference transcripts into the merged assembly. It can also perform a reference annotation based transcript (RABT) assembly to merge reference transcripts with sample transfrags and produce a single annotation file for use in downstream differential analysis.

Cufflinks.cuffmerge was created at the University of Maryland Center for Bioinformatics and Computational Biology. This document is adapted from the Cufflinks documentation for release 2.0.2.

Usage

Cufflinks.cuffmerge takes one or more GTF files containing individual Cufflinks assemblies, a genome reference, and, optionally, a reference genome annotation GTF, and merges the information into a single assembly GTF file.  For more information on the GTF file format, see the Input Files section.

If you have a reference genome GTF file available, you can provide it in order to gracefully merge novel isoforms and known isoforms and maximize overall assembly quality.
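
Outside GenePattern, the same three inputs map onto the cuffmerge command line shipped with Cufflinks. The sketch below only assembles the argument list and does not run anything. The helper name build_cuffmerge_command and the file names are hypothetical; the -o, -s and -g options are the ones described in the Cufflinks manual for cuffmerge, so check them against your installed version.

```python
from typing import List, Optional


def build_cuffmerge_command(
    assembly_list: str,
    genome_fasta: str,
    reference_gtf: Optional[str] = None,
    output_dir: str = "merged_asm",
) -> List[str]:
    """Assemble a cuffmerge argument list (hypothetical helper, not part of the module)."""
    # -o: output directory, -s: genomic DNA sequences for the reference
    cmd = ["cuffmerge", "-o", output_dir, "-s", genome_fasta]
    if reference_gtf is not None:
        # -g: optional reference annotation GTF
        cmd += ["-g", reference_gtf]
    # The positional argument is a text file listing the assembly GTFs.
    cmd.append(assembly_list)
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where cuffmerge is on the PATH. Leaving reference_gtf as None corresponds to running the module without the optional reference GTF.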

For more information on using RNA-seq modules in GenePattern, see the RNA-seq Analysis page.

Important Notes:

Cufflinks.cuffmerge jobs can be very resource intensive.  If your job does not complete within a day, retry it on a server with more available memory, or, if you are running on the GenePattern public server, see this FAQ.

There are known issues that prevent Cufflinks.cuffmerge from running on the Mac Mini and possibly other Mac hardware.

Preparing to Run Cufflinks.cuffmerge

On GenePattern 3.6.0 and above, Cufflinks.cuffmerge version 2+ no longer accepts a .txt input file list.  Instead, specify multiple files using the drag-and-drop file parameter interface, which also accepts server-hosted files directly.
Legacy information: In earlier versions, more than two GTF Cufflinks assembly files had to be specified as a list in a text file passed via the input list file parameter. The files listed must be available on the same file system as the server, and each filename in the text file should include its full path.
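
For servers that still rely on the legacy input list file, the list can be generated rather than typed by hand. This is a minimal sketch, assuming the GTF files are reachable from the machine where the script runs; the helper name write_assembly_list and the example paths are hypothetical.

```python
import os
from typing import Iterable


def write_assembly_list(gtf_paths: Iterable[str], list_path: str) -> None:
    """Write one absolute GTF path per line (hypothetical helper)."""
    with open(list_path, "w") as handle:
        for path in gtf_paths:
            # The legacy parameter expects each filename with its full path.
            handle.write(os.path.abspath(path) + "\n")
```

For example, write_assembly_list(["sample1/transcripts.gtf", "sample2/transcripts.gtf"], "assemblies.txt") produces a two-line text file. The paths are resolved relative to the current working directory, so run the script on the file system that the server sees.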

References

Trapnell C, Hendrickson D, Sauvageau S, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. 2013;31:46-53.

Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols. 2012;7:562–578.

Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011 Sep 1;27(17):2325-9.

Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.  Nat Biotechnol. 2010;28:511-515.

Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105-1111.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.

Links

Cufflinks: http://cufflinks.cbcb.umd.edu/
Cufflinks documentation: http://cufflinks.cbcb.umd.edu/manual.html

Parameters

Name Description
input file  GTF Cufflinks assembly files to be merged.
reference GTF  An optional reference annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output. Cufflinks.cuffmerge will use this to attach gene names and other metadata to the merged catalog.
genome file * A file containing the genomic DNA sequences for the reference.  This should be a multi-FASTA file with all contigs present.

* - required

Input Files

  1. <input.file>
    The GTF Cufflinks assembly files to be merged.  In GenePattern 3.6.0 and above, this parameter will accept server-hosted GTF files directly through the drag-and-drop file parameter interface.
    These will usually be the transcripts.gtf files from multiple Cufflinks runs.  The first 7 columns are standard GTF, and the last column contains attributes, some of which are also standardized ("gene_id" and "transcript_id"). There is one GTF record per row, and each record represents either a transcript or an exon within a transcript.  For more information on the GTF format, see the specification.
  2. <reference.GTF>
    An optional reference annotation GTF.  The input assemblies are merged together with the reference GTF and included in the final output.
    The GenePattern FTP site hosts a number of reference annotation GTFs, available in a dropdown selection (requires GenePattern 3.7.0+).
  3. <genome.file>
    A multi-FASTA file containing the genomic DNA sequences for the reference with all contigs present.  The multi-FASTA file can be created by using the ConcatenateFiles module to assemble all the FASTA files for the reference genome sequences into a single file.  For more information on the FASTA format, see this description.
    The GenePattern FTP site also hosts a number of reference genomes, available in a dropdown selection (requires GenePattern 3.7.0+).
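
If the reference genome is stored as one FASTA file per chromosome or contig, the multi-FASTA file can also be built outside GenePattern. The sketch below is a simple stand-in for that concatenation step, not a description of how the ConcatenateFiles module works internally; the helper name concatenate_fasta and the file names are hypothetical.

```python
from typing import Iterable


def concatenate_fasta(fasta_paths: Iterable[str], output_path: str) -> None:
    """Join several FASTA files into one multi-FASTA file (hypothetical helper)."""
    with open(output_path, "wb") as out:
        for path in fasta_paths:
            ended_with_newline = True
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks so large chromosomes are not loaded whole.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    ended_with_newline = chunk.endswith(b"\n")
            if not ended_with_newline:
                # Keep the next file's ">" header on its own line.
                out.write(b"\n")
```

A call such as concatenate_fasta(["chr1.fa", "chr2.fa"], "genome.fa") writes the records in the order given. The extra newline matters because a header fused onto the end of the previous sequence would hide that contig from downstream tools.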

Output Files

  1. merged.gtf
    Cufflinks.cuffmerge produces a GTF file named merged.gtf that contains an assembly that merges together the input assemblies.  While it produces several other output files, the Cufflinks documentation refers solely to the merged assembly output file for use with Cuffdiff.
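
Because merged.gtf uses the same nine-column GTF layout described under Input Files, a quick sanity check is to count the distinct transcript_id values it contains. This is a minimal sketch: the function names are hypothetical, and the attribute parsing assumes no semicolons appear inside quoted values.

```python
from typing import Dict, Iterable, Set


def parse_gtf_attributes(field: str) -> Dict[str, str]:
    """Split the ninth GTF column into a key/value dictionary."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def transcript_ids(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect the distinct transcript_id values from GTF records."""
    ids: Set[str] = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GTF record
        attrs = parse_gtf_attributes(columns[8])
        if "transcript_id" in attrs:
            ids.add(attrs["transcript_id"])
    return ids
```

Opening the file with open("merged.gtf") and passing the handle to transcript_ids gives the set of merged transcripts; its length can be compared with the same count over each input transcripts.gtf.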

Platform Dependencies

Module Type: RNA-seq
CPU Type: x86_64
OS: Mac, Linux
Language: C++, Perl

GenePattern Module Version Notes

Version  Release Date  Description
2  2013-09-25  Added hosted GTF and genome file selectors and HTML-based docs
1  2013-05-07