Cufflinks.cuffcompare Documentation, v7

Description: Analyzes the transcribed fragments in an assembly

Author: Cole Trapnell et al, University of Maryland Center for Bioinformatics and Computational Biology

Algorithm Version: Cufflinks 2.0.2

Contact: gp-help@broadinstitute.org

Summary

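Cufflinks.cuffcompare helps analyze the transcribed fragments (transfrags) in an assembly by comparing assembled transcripts to a reference annotation and by tracking transcripts across multiple samples.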

Cufflinks was created at the University of Maryland Center for Bioinformatics and Computational Biology. This document is adapted from the Cufflinks documentation for release 2.0.2.  For more information about Cufflinks.cuffcompare, see the Cufflinks documentation.

Usage

Cufflinks.cuffcompare requires at least one GTF output file from Cufflinks as input and can optionally take a "reference" annotation GTF/GFF file, such as one from Ensembl. For more information on the GTF/GFF format, see the specification.

The Cuffcompare tool provides a number of additional options and switches that are not directly available through this module's parameters.  The additional.cuffcompare.options parameter is provided to pass these through.  To use it, specify the extra option(s), along with any arguments, in the input text field, separated by spaces.  At this time, this parameter does not easily support options that require a file argument.  Check the Cufflinks manual for details of the available options.  There may also be additional undocumented options; running the cuffcompare executable at the command line with no arguments may show more of them.

If you feel that a particular missing option would be of broad general interest, please contact the GenePattern team and we will look into adding it.  This parameter is recommended for expert use only; use it at your own discretion.  The GenePattern team does not explicitly test all of the possible options that may be passed through using this parameter and can only provide limited support.
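
As a rough illustration of how a space-separated value for additional.cuffcompare.options breaks down into individual command-line arguments, the sketch below splits an example value with Python's standard shlex module.  The helper name split_extra_options is hypothetical, and the module's own parsing is not documented here, so treat this as an approximation.  The -C and -d switches are taken from the Cuffcompare usage text reproduced in this document.

```python
import shlex


def split_extra_options(option_text):
    """Split a space-separated option string into individual arguments.

    Hypothetical helper for illustration only; the module's actual
    parsing of the parameter value may differ.
    """
    return shlex.split(option_text)


# Example value for the additional.cuffcompare.options field:
# include "contained" transcripts (-C) and set the grouping distance
# for transcript start sites to 50 (-d 50).
print(split_extra_options("-C -d 50"))
```

Each option and each of its arguments becomes a separate item, which is why an option and its value are written with a space between them.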

Important Notes:

There are known issues that prevent Cufflinks.cuffcompare from running on the Mac Mini and possibly other Mac hardware.

This module may produce some empty files. This does not mean that the algorithm has failed; it is generally a data issue.  In particular, this may occur if the transfrags are not in the reference annotation.

References

Trapnell C, Hendrickson D, Sauvageau S, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. 2013;31:46-53.

Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols. 2012;7:562-578.

Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011 Sep 1;27(17):2325-9.

Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.  Nat Biotechnol. 2010;28:511-515.

Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105-1111.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.

Links

Cufflinks website.
Cufflinks manual.  Note that this information may be based on a subsequent version of Cufflinks.
TopHat website.

Parameters

Name  Description
input file *  One or more GTF output files from Cufflinks
output prefix  A prefix for the module output
reference GTF  A reference annotation GTF
exclude transcripts  Whether to ignore reference transcripts that are not overlapped by any transcript in the input files.  This takes effect only if a reference GTF is provided.
reference genome file  Fasta file or zip of fasta files against which your reads were aligned
additional cuffcompare options  Additional options to be passed along to the Cuffcompare program at the command line. This parameter gives you a means to specify otherwise unavailable Cuffcompare options and switches not supported by the module; check the Cufflinks manual for details.  Note that the information at this link may refer to a subsequent version of Cufflinks.  Recommended for experts only; use this at your own discretion.

* - required

Cuffcompare pass-through options

The following may be useful for advanced users who wish to use the additional.cuffcompare.options parameter.  This is the 'usage' output from running cuffcompare at the command-line, which gives a list of all of the available options and switches.  Note that this was generated by Cuffcompare v2.0.2 and that the options here may differ from the documentation provided online at the Cufflinks website due to subsequent version updates.

cuffcompare v2.0.2 (3524M)
-----------------------------
Usage:
cuffcompare [-r <reference_mrna.gtf>] [-R] [-T] [-V] [-s <seq_path>] 
    [-o <outprefix>] [-p <cprefix>] 
    {-i <input_gtf_list> | <input1.gtf> [<input2.gtf> .. <inputN.gtf>]}
 
 Cuffcompare provides classification, reference annotation mapping and various
 statistics for Cufflinks transfrags.
 Cuffcompare clusters and tracks transfrags across multiple samples, writing
 matching transcripts (intron chains) into <outprefix>.tracking, and a GTF
 file <outprefix>.combined.gtf containing a nonredundant set of transcripts 
 across all input files (with a single representative transfrag chosen
 for each clique of matching transfrags across samples).
 
Options:
-i provide a text file with a list of Cufflinks GTF files to process instead
   of expecting them as command line arguments (useful when a large number
   of GTF files should be processed)
 
-r  a set of known mRNAs to use as a reference for assessing 
    the accuracy of mRNAs or gene models given in <input.gtf>
 
-R  for -r option, reduce the set of reference transcripts to 
    only those found to overlap any of the input loci
-M  discard (ignore) single-exon transfrags and reference transcripts
-N  discard (ignore) single-exon reference transcripts
 
-s  <seq_path> can be a multi-fasta file with all the genomic sequences or 
    a directory containing multiple single-fasta files (one file per contig);
    lower case bases will be used to classify input transcripts as repeats
 
-d  max distance (range) for grouping transcript start sites (100)
-p  the name prefix to use for consensus transcripts in the 
    <outprefix>.combined.gtf file (default: 'TCONS')
-C  include the "contained" transcripts in the .combined.gtf file
-G  generic GFF input file(s) (do not assume Cufflinks GTF)
-T  do not generate .tmap and .refmap files for each input file
-V  verbose processing mode (showing all GFF parsing warnings)
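
For readers who want to see how the switches in the usage text combine into a single invocation, here is a minimal sketch that assembles an argument list in Python.  The function and parameter names are hypothetical, the file names are placeholders, and the mapping from this module's parameters to the -r, -R, -s and -o switches is an assumption based on the descriptions above rather than a statement of how the module is implemented.  The sketch only builds the list; it does not run cuffcompare.

```python
def build_cuffcompare_command(input_gtfs, output_prefix=None, reference_gtf=None,
                              exclude_unmatched_reference=False, genome_path=None):
    """Assemble a cuffcompare argument list from the documented switches.

    Illustrative only: names are hypothetical and the parameter-to-flag
    mapping is an assumption.
    """
    if not input_gtfs:
        raise ValueError("at least one input GTF file is required")
    command = ["cuffcompare"]
    if reference_gtf:
        command += ["-r", reference_gtf]
        if exclude_unmatched_reference:
            # -R is documented as a modifier of -r, so it is added
            # only when a reference is given.
            command.append("-R")
    if genome_path:
        command += ["-s", genome_path]
    if output_prefix:
        command += ["-o", output_prefix]
    # Input GTF files are passed as trailing positional arguments.
    command += list(input_gtfs)
    return command


print(build_cuffcompare_command(
    ["sample1.gtf", "sample2.gtf"],   # placeholder file names
    output_prefix="comparison",
    reference_gtf="genes.gtf",
    exclude_unmatched_reference=True,
))
```

Keeping the command as a list rather than a single string avoids quoting problems when file paths contain spaces.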

Input Files

  1. <input.file>
    One or more GTF files accessible to the GenePattern server.  In GenePattern 3.6.0 and above, this parameter will accept server-hosted GTF files directly through the drag-and-drop file parameter interface.  When producing the *.tmap and *.refmap output files (see below), cuffcompare will use the <output.prefix> parameter and possibly the input GTF file/path to form the file name.  When the input is a single GTF, one of each of these output files will be produced with the names <output.prefix>.tmap and <output.prefix>.refmap.
    For more information on the GTF/GFF format, see the specification.
    Cufflinks.cuffcompare version 5+ can no longer accept a .txt input file list on GenePattern versions 3.6.0+.  Instead, you may specify multiple files using the drag-and-drop interface.
    Legacy information: To avoid file-naming collisions, when the input file is a text file of multiple GTF input files then a transformed version of the input file path is also included in naming these outputs.  This is necessary because GenePattern places all output files in a single job results directory when execution is complete.  The path is transformed by substituting an underscore character (‘_’) for any spaces and path separators and by truncating any path prefix common to all input files in order to shorten the name.  The output file names will be formed as <output.prefix>.[transformed path].tmap and <output.prefix>.[transformed path].refmap.
    Optionally, explicit identifiers can be specified for direct control over output file naming.  Such IDs can be provided after each path listing, separated by a tab character on the same line.  The output file names will be formed as <output.prefix>.[ID_filename].tmap and <output.prefix>.[ID_filename].refmap.  This is not available when using the GP 3.6.0 drag-and-drop interface.
  2. <reference.GTF> (optional)
    A reference annotation file in GTF format.  Each sample is matched against this file, and sample isoforms are tagged as overlapping, matching, or novel where appropriate.  These reference annotation files can be downloaded for many genomes from sites like UCSC Genome Browser.  The GenePattern FTP site hosts a number of reference annotation GTFs, available in a dropdown selection (requires GenePattern 3.7.0+).
    For more information on the GTF format, see the specification.

  3. <reference.genome.file> (optional)
    Fasta file or zip of fasta files against which your reads were aligned.   If supplied, cuffcompare will use this for some optional classification functions.  If a multifasta file, all contigs should be present.  If a zip, this must contain one fasta file per reference chromosome, and each file must be named after the chromosome and have a .fa or .fasta extension.  For more information on the FASTA format, see this description.
    The GenePattern FTP site hosts a number of reference genomes, available in a dropdown selection (requires GenePattern 3.7.0+).
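
The legacy naming scheme described under <input.file> (underscores substituted for spaces and path separators, with the path prefix common to all inputs removed) can be approximated as in the sketch below.  This is not the module's actual code: the function names are hypothetical, the example paths are invented, and details such as whether the .gtf extension is kept in the transformed name are assumptions.

```python
import re


def transform_paths(paths):
    """Shorten input paths into underscore-separated names.

    Hypothetical illustration of the legacy naming rule; the module's
    real transformation may differ in its details.
    """
    # Split each path into components on either kind of path separator.
    split_paths = [re.split(r"[/\\]+", p.strip("/\\")) for p in paths]
    # Count the leading directory components shared by every path
    # (file names themselves are never treated as part of the prefix).
    shared = 0
    for parts in zip(*(components[:-1] for components in split_paths)):
        if len(set(parts)) != 1:
            break
        shared += 1
    # Join what remains with underscores and replace spaces as well.
    return ["_".join(components[shared:]).replace(" ", "_")
            for components in split_paths]


def map_file_names(output_prefix, paths):
    """Form (tmap, refmap) names as <output.prefix>.[transformed path].<ext>."""
    return [(f"{output_prefix}.{name}.tmap", f"{output_prefix}.{name}.refmap")
            for name in transform_paths(paths)]


example_paths = [
    "/data/run 1/sampleA/transcripts.gtf",
    "/data/run 1/sampleB/transcripts.gtf",
]
for tmap_name, refmap_name in map_file_names("cmp", example_paths):
    print(tmap_name, refmap_name)
```

Because both example inputs are named transcripts.gtf, the retained directory component is what keeps the two output names distinct within a single job results directory.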

Output Files

For more information about Cufflinks.cuffcompare output files, see the Cufflinks documentation.

  1. <output.prefix>.stats
    Various statistics related to the accuracy of the transcripts in each sample when compared to the reference annotation data.
  2. <output.prefix>.combined.gtf
    Cufflinks.cuffcompare reports a GTF file containing the "union" of all transfrags in each sample. If a transfrag is present in both samples, it is thus reported once in the combined GTF.
  3. *.tmap
    These tab-delimited files list the most closely matching reference transcript for each Cufflinks transcript. There is one row per Cufflinks transcript.
  4. *.refmap
    These tab-delimited files list, for each reference transcript, which Cufflinks transcripts either fully or partially match it. There is one row per reference transcript.
  5. stdout.txt
    A summary of the execution of Cufflinks.cuffcompare, providing information on both the genomic sequence and datasets.
  6. <output.prefix>.tracking
    This file matches transcripts between samples. Each row contains a transcript structure that is present in one or more input GTF files. Because the transcripts will generally have different IDs (unless you assembled your RNA-seq reads against a reference transcriptome), Cufflinks.cuffcompare examines the structure of each of the transcripts, matching transcripts that agree on the coordinates and order of all of their introns, as well as strand. Matching transcripts are allowed to differ on the length of the first and last exons, since these lengths will naturally vary from sample to sample due to the random nature of sequencing.
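
The matching rule described for the .tracking file (same introns in the same order, same strand, with the outer ends of the first and last exons free to differ) can be sketched in Python as follows.  This illustrates the idea only and is not Cuffcompare's implementation: the function names, the dictionary layout and the coordinates are invented for the example, and the handling of single-exon transcripts is an assumption, since the text above does not cover that case.

```python
def intron_chain(exons):
    """Return the introns between consecutive exons as (start, end) pairs.

    `exons` is a list of (start, end) coordinates, 1-based and inclusive
    as in GTF.  The outer ends of the first and last exons do not appear
    in the result, so they cannot affect a comparison.
    """
    ordered = sorted(exons)
    return tuple((left_end + 1, right_start - 1)
                 for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]))


def same_structure(transcript_a, transcript_b):
    """Report whether two transcripts share chromosome, strand and intron chain."""
    chain_a = intron_chain(transcript_a["exons"])
    chain_b = intron_chain(transcript_b["exons"])
    if not chain_a or not chain_b:
        # Assumption: single-exon transcripts have no introns to compare,
        # so this sketch never reports them as matching.
        return False
    return (transcript_a["chrom"] == transcript_b["chrom"]
            and transcript_a["strand"] == transcript_b["strand"]
            and chain_a == chain_b)


sample1 = {"chrom": "chr1", "strand": "+",
           "exons": [(100, 200), (300, 400), (500, 650)]}
# Same introns as sample1, but different first-exon start and last-exon end.
sample2 = {"chrom": "chr1", "strand": "+",
           "exons": [(80, 200), (300, 400), (500, 600)]}
# Second exon starts at a different position, so the first intron differs.
sample3 = {"chrom": "chr1", "strand": "+",
           "exons": [(100, 200), (310, 400), (500, 650)]}

print(same_structure(sample1, sample2))
print(same_structure(sample1, sample3))
```

Comparing intron chains rather than exon lists is what lets transcripts with different transcript start and end positions still be grouped together.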

Platform Dependencies

Module Type: RNA-seq
CPU Type: x86_64
OS: Mac, Linux
Language: C++, Perl

GenePattern Module Version Notes

Version  Release Date  Description
7  2014-02-14  Added a parameter to allow the user to pass through extra Cuffcompare options
6  2013-09-25  Added dynamic GTF and genome file selectors and HTML-based documentation
5  2013-07-22  Updated to Cufflinks.cuffcompare version 2.0.2
4  2012-07-06  Fixed syntax error.
2  2012-01-13  Updated to Cufflinks.cuffcompare version 1.3.0
1  2011-04-11