# Cufflinks.cuffcompare Documentation, v7

Description: Analyzes the transcribed fragments in an assembly

Author: Cole Trapnell et al, University of Maryland Center for Bioinformatics and Computational Biology

## Summary

Cufflinks.cuffcompare helps analyze the transcribed fragments (transfrags) in an assembly by:

• Comparing assembled transcripts to a reference annotation
• Tracking Cufflinks transcripts across multiple experiments (e.g., across a time course)

## Usage

Cufflinks.cuffcompare requires at least one Cufflinks GTF output file as input and can optionally take a "reference" annotation GTF/GFF file, such as one from Ensembl. For more information on the GTF/GFF format, see the specification.

#### Important Notes:

There are known issues that prevent Cufflinks.cuffcompare from running on the Mac Mini and possibly other Mac hardware.

This module may produce some empty files. This does not mean that the algorithm has failed; it is generally a data issue.  In particular, this may occur if the transfrags are not in the reference annotation.

## References

Trapnell C, Hendrickson D, Sauvageau M, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. 2013;31:46-53.

Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols. 2012;7:562-578.

Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011 Sep 1;27(17):2325-9.

Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.  Nat Biotechnol. 2010;28:511-515.

Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105-1111.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.

Cufflinks manual.  Note that this information may be based on a subsequent version of Cufflinks.
TopHat website.

## Parameters

| Name | Description |
| --- | --- |
| input file * | One or more GTF file output(s) from Cufflinks |
| output prefix | A prefix for the module output |
| reference GTF | A reference annotation GTF |
| exclude transcripts | Whether to ignore reference transcripts that are not overlapped by any transcript in the input files. This takes effect only if a reference GTF is provided. |
| reference genome file | Fasta file or zip of fasta files against which your reads were aligned |
| additional cuffcompare options | Additional options to be passed along to the Cuffcompare program at the command line. This parameter gives you a means to specify otherwise unavailable Cuffcompare options and switches not supported by the module; check the Cufflinks manual for details. Note that the information at this link may refer to a subsequent version of Cufflinks. Recommended for experts only; use this at your own discretion. |

* - required

## Cuffcompare pass-through options

The following may be useful for advanced users who wish to use the additional.cuffcompare.options parameter.  This is the 'usage' output from running cuffcompare at the command-line, which gives a list of all of the available options and switches.  Note that this was generated by Cuffcompare v2.0.2 and that the options here may differ from the documentation provided online at the Cufflinks website due to subsequent version updates.

cuffcompare v2.0.2 (3524M)
-----------------------------
Usage:
cuffcompare [-r <reference_mrna.gtf>] [-R] [-T] [-V] [-s <seq_path>]
[-o <outprefix>] [-p <cprefix>]
{-i <input_gtf_list> | <input1.gtf> [<input2.gtf> .. <inputN.gtf>]}

Cuffcompare provides classification, reference annotation mapping and various
statistics for Cufflinks transfrags.
Cuffcompare clusters and tracks transfrags across multiple samples, writing
matching transcripts (intron chains) into <outprefix>.tracking, and a GTF
file <outprefix>.combined.gtf containing a nonredundant set of transcripts
across all input files (with a single representative transfrag chosen
for each clique of matching transfrags across samples).

Options:
-i provide a text file with a list of Cufflinks GTF files to process instead
of expecting them as command line arguments (useful when a large number
of GTF files should be processed)

-r  a set of known mRNAs to use as a reference for assessing
the accuracy of mRNAs or gene models given in <input.gtf>

-R  for -r option, reduce the set of reference transcripts to
only those found to overlap any of the input loci
-M  discard (ignore) single-exon transfrags and reference transcripts
-N  discard (ignore) single-exon reference transcripts

-s  <seq_path> can be a multi-fasta file with all the genomic sequences or
a directory containing multiple single-fasta files (one file per contig);
lower case bases will be used to classify input transcripts as repeats

-d  max distance (range) for grouping transcript start sites (100)
-p  the name prefix to use for consensus transcripts in the
<outprefix>.combined.gtf file (default: 'TCONS')
-C  include the "contained" transcripts in the .combined.gtf file
-G  generic GFF input file(s) (do not assume Cufflinks GTF)
-T  do not generate .tmap and .refmap files for each input file
-V  verbose processing mode (showing all GFF parsing warnings)
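
As a rough illustration of how the module parameters might translate into the command line above, here is a minimal Python sketch. It uses only flags that appear in the usage text (`-r`, `-R`, `-s`, `-o`). The function name `build_cuffcompare_args`, its argument names, and the mapping of "exclude transcripts" to `-R` are assumptions made for this example; the module's actual wrapper is not shown in this documentation and may differ.

```python
from typing import List, Optional, Sequence


def build_cuffcompare_args(
    input_gtfs: Sequence[str],
    output_prefix: Optional[str] = None,
    reference_gtf: Optional[str] = None,
    exclude_transcripts: bool = False,
    seq_path: Optional[str] = None,
    extra_options: Sequence[str] = (),
) -> List[str]:
    """Assemble an argument list using flags from the usage text.

    Hypothetical helper: the parameter-to-flag mapping is an assumption.
    """
    if not input_gtfs:
        raise ValueError("at least one input GTF is required")

    args = ["cuffcompare"]
    if reference_gtf:
        args += ["-r", reference_gtf]
        if exclude_transcripts:
            # -R is documented as a modifier "for -r option", so it is
            # only added when a reference GTF is present.
            args.append("-R")
    if seq_path:
        args += ["-s", seq_path]
    if output_prefix:
        args += ["-o", output_prefix]
    # Pass-through options (for example ["-C"]) go before the input files,
    # matching the order shown in the usage line.
    args += list(extra_options)
    args += list(input_gtfs)
    return args
```

For example, `build_cuffcompare_args(["s1.gtf", "s2.gtf"], output_prefix="cmp", reference_gtf="ref.gtf")` returns a list that could be handed to `subprocess.run`; the file names here are placeholders.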

## Input Files

1. <input.file>
One or more GTF files accessible to the GenePattern server.  In GenePattern 3.6.0 and above, this parameter will accept server-hosted GTF files directly through the drag-and-drop file parameter interface.  When producing the *.tmap and *.refmap output files (see below), cuffcompare will use the <output.prefix> parameter and possibly the input GTF file/path to form the file name.  When the input is a single GTF, one of each of these output files will be produced with the names <output.prefix>.tmap and <output.prefix>.refmap.
Cufflinks.cuffcompare version 5+ can no longer accept a .txt input file list on GenePattern versions 3.6.0+.  Instead, you may specify multiple files using the drag-and-drop interface.
Legacy information: To avoid file-naming collisions, when the input file is a text file of multiple GTF input files then a transformed version of the input file path is also included in naming these outputs.  This is necessary because GenePattern places all output files in a single job results directory when execution is complete.  The path is transformed by substituting an underscore character (‘_’) for any spaces and path separators and by truncating any path prefix common to all input files in order to shorten the name.  The output file names will be formed as <output.prefix>.[transformed path].tmap and <output.prefix>.[transformed path].refmap.
Optionally, explicit identifiers can be specified for direct control over output file naming.  Such IDs can be provided after each path listing, separated by a tab character on the same line.  The output file names will be formed as <output.prefix>.[ID_filename].tmap and <output.prefix>.[ID_filename].refmap.  This is not available when using the GP 3.6.0 drag-and-drop interface.
2. <reference.GTF> (optional)
A reference annotation file in GTF format.  Each sample is matched against this file, and sample isoforms are tagged as overlapping, matching, or novel where appropriate.  These reference annotation files can be downloaded for many genomes from sites like UCSC Genome Browser.  The GenePattern FTP site hosts a number of reference annotation GTFs, available in a dropdown selection (requires GenePattern 3.7.0+).

3. <reference.genome.file> (optional)
Fasta file or zip of fasta files against which your reads were aligned.   If supplied, cuffcompare will use this for some optional classification functions.  If a multifasta file, all contigs should be present.  If a zip, this must contain one fasta file per reference chromosome, and each file must be named after the chromosome and have a .fa or .fasta extension.  For more information on the FASTA format, see this description.
The GenePattern FTP site hosts a number of reference genomes, available in a dropdown selection (requires GenePattern 3.7.0+).
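
The zip layout required for <reference.genome.file> (one FASTA file per chromosome, named after the chromosome, with a .fa or .fasta extension) can be checked before upload. The following is a minimal Python sketch under stated assumptions: the function name `check_genome_zip` is hypothetical, the list of chromosome names is supplied by the caller (for instance, taken from the first column of the input GTF), and files inside subdirectories of the zip are accepted, which the module itself may or may not allow.

```python
import zipfile
from pathlib import PurePosixPath
from typing import Iterable, List

FASTA_EXTENSIONS = {".fa", ".fasta"}


def check_genome_zip(zip_path: str, chromosomes: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the layout looks right.

    Hypothetical helper, not part of the module.
    """
    problems = []
    found = set()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            path = PurePosixPath(name)
            if path.suffix not in FASTA_EXTENSIONS:
                problems.append(f"{name}: extension is not .fa or .fasta")
                continue
            # The file name without its extension should be the chromosome name.
            found.add(path.stem)
    for chrom in chromosomes:
        if chrom not in found:
            problems.append(f"{chrom}: no FASTA file named after this chromosome")
    return problems
```

An empty return value only indicates that the file names follow the convention; it does not validate the FASTA contents.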

## Output Files

1. <output.prefix>.stats
Various statistics related to the accuracy of the transcripts in each sample when compared to the reference annotation data.
2. <output.prefix>.combined.gtf
Cufflinks.cuffcompare reports a GTF file containing the "union" of all transfrags in each sample. If a transfrag is present in both samples, it is thus reported once in the combined GTF.
3. *.tmap
These tab-delimited files list the most closely matching reference transcript for each Cufflinks transcript. There is one row per Cufflinks transcript.
4. *.refmap
These tab-delimited files list, for each reference transcript, which Cufflinks transcripts either fully or partially match it. There is one row per reference transcript.
5. stdout.txt
A summary of the execution of Cufflinks.cuffcompare, providing information on both the genomic sequence and datasets.
6. <output.prefix>.tracking
This file matches transcripts between samples. Each row contains a transcript structure that is present in one or more input GTF files. Because the transcripts will generally have different IDs (unless you assembled your RNA-seq reads against a reference transcriptome), Cufflinks.cuffcompare examines the structure of each of the transcripts, matching transcripts that agree on the coordinates and order of all of their introns, as well as strand. Matching transcripts are allowed to differ on the length of the first and last exons, since these lengths will naturally vary from sample to sample due to the random nature of sequencing.
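
The matching rule described for the .tracking file (same intron coordinates in the same order, same strand, outer exon ends free to vary) can be illustrated with a short Python sketch. This is not Cuffcompare's implementation; the function names, the tuple layout of the input, and the choice to skip single-exon transcripts are assumptions made for the example, and the transcript IDs are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive as in GTF


def intron_chain_key(seqname: str, strand: str, exons: List[Exon]):
    """Build a key that ignores the outer ends of the first and last exons."""
    ordered = sorted(exons)
    # An intron runs from the base after one exon to the base before the next.
    introns = tuple(
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    )
    return (seqname, strand, introns)


def group_matching(
    transcripts: Iterable[Tuple[str, str, str, str, List[Exon]]]
) -> Dict[tuple, List[Tuple[str, str]]]:
    """Group (sample, transcript_id) pairs that share an intron chain.

    Each input item is (sample, transcript_id, seqname, strand, exons).
    """
    groups = defaultdict(list)
    for sample, tid, seqname, strand, exons in transcripts:
        key = intron_chain_key(seqname, strand, exons)
        if key[2]:  # single-exon transcripts have no introns; skipped here
            groups[key].append((sample, tid))
    return dict(groups)


example = [
    ("s1", "CUFF.1.1", "chr1", "+", [(100, 200), (300, 400), (500, 650)]),
    ("s2", "CUFF.9.1", "chr1", "+", [(150, 200), (300, 400), (500, 600)]),
    ("s2", "CUFF.9.2", "chr1", "-", [(150, 200), (300, 400), (500, 600)]),
]
matches = group_matching(example)
```

In this example the first two transcripts fall into one group even though their first and last exons differ in length, while the third is kept apart because it lies on the opposite strand.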

## Platform Dependencies

• Module Type: RNA-seq
• CPU Type: x86_64
• OS: Mac, Linux
• Language: C++, Perl

## GenePattern Module Version Notes

| Version | Release Date | Description |
| --- | --- | --- |
| 7 | 2014-02-14 | Added a parameter to allow the user to pass through extra Cuffcompare options |
| 6 | 2013-09-25 | Added dynamic GTF and genome file selectors and HTML-based documentation |