Author: Cole Trapnell et al., University of Maryland Center for Bioinformatics and Computational Biology
Algorithm Version: Cufflinks 2.0.2
The main purpose of Cufflinks.cuffmerge is to merge together several Cufflinks assemblies, making it easier to produce an assembly GTF file suitable for use with Cufflinks.cuffdiff. Cufflinks.cuffmerge also runs Cuffcompare in the background and automatically filters out transcribed fragments (transfrags) that are likely to be artifacts.
Cufflinks.cuffmerge is essentially a "meta-assembler": it treats the assembled transfrags from Cufflinks the way that Cufflinks treats reads, merging them together parsimoniously to produce the smallest number of transcripts that explain the data. Furthermore, when a reference genome annotation is available, Cufflinks.cuffmerge can integrate reference transcripts into the merged assembly. It can also perform a reference annotation based transcript (RABT) assembly to merge reference transcripts with sample transfrags and produce a single annotation file for use in downstream differential analysis.
Cufflinks.cuffmerge was created at the University of Maryland Center for Bioinformatics and Computational Biology. This document is adapted from the Cufflinks documentation for release 2.0.2.
Cufflinks.cuffmerge takes one or more GTF files containing individual Cufflinks assemblies, a genome reference, and, optionally, a reference genome annotation GTF, and merges the information into a single assembly GTF file. For more information on the GTF file format, see the Input Files section.
If you have a reference genome annotation GTF file available, you can provide it so that novel isoforms and known isoforms are merged gracefully and overall assembly quality is maximized.
For more information on using RNA-seq modules in GenePattern, see the RNA-seq Analysis page.
Cufflinks.cuffmerge jobs can be very resource intensive. If your job does not complete within a day, retry it on a server with more available memory, or, if you are running on the GenePattern public server, see this FAQ.
There are known issues that prevent Cufflinks.cuffmerge from running on the Mac Mini and possibly other Mac hardware.
Preparing to Run Cufflinks.cuffmerge
Cufflinks.cuffmerge version 2+ can no longer accept a .txt input file list on GenePattern versions 3.6.0+. Instead, you may specify multiple files using the drag-and-drop interface.
Legacy information: in earlier versions, if there were more than two GTF Cufflinks assembly files, they had to be specified as a list in a text file passed via the input list file parameter. The files listed had to be available on the same file system as the server, and each filename in the text file had to include its full path. In GenePattern 3.6.0 and above, server-hosted files are accepted directly through the drag-and-drop file parameter interface.
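For readers still working with the legacy list-file input, the following is a minimal sketch of how such a text file could be generated, assuming one full path per line as described above. The function name write_assembly_list and the example paths are hypothetical and are not part of GenePattern or Cufflinks.

```python
from pathlib import Path


def write_assembly_list(gtf_paths, list_path):
    """Write one absolute GTF path per line and return the paths written.

    Hypothetical helper: the one-path-per-line layout follows the legacy
    description of the input list file; nothing here calls GenePattern.
    """
    # resolve() turns relative paths into full paths, as the list file requires.
    absolute = [str(Path(p).resolve()) for p in gtf_paths]
    Path(list_path).write_text("\n".join(absolute) + "\n")
    return absolute


if __name__ == "__main__":
    # Example paths are placeholders for real Cufflinks output locations.
    for path in write_assembly_list(
        ["sample1/transcripts.gtf", "sample2/transcripts.gtf"],
        "assemblies.txt",
    ):
        print(path)
```

The paths must still point to files that the server can read; the sketch only normalizes them to full paths.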
An optional reference annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output. Cufflinks.cuffmerge will use the reference GTF to attach gene names and other metadata to the merged catalog.
genome file *
A file containing the genomic DNA sequences for the reference. This should be a multi-FASTA file with all contigs present.
* - required
The GTF Cufflinks assembly files to be merged. In GenePattern 3.6.0 and above, this parameter will accept server-hosted GTF files directly through the drag-and-drop file parameter interface.
These will usually be the transcripts.gtf files from multiple Cufflinks runs. GTF is a tab-delimited format with nine columns: the first eight are standard GTF fields, and the last column contains attributes, some of which are also standardized ("gene_id" and "transcript_id"). There is one GTF record per row, and each record represents either a transcript or an exon within a transcript. For more information on the GTF format, see the specification.
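To make the record layout concrete, here is a minimal sketch of parsing a single GTF row in Python. It assumes the usual nine tab-separated GTF fields with attributes written as key "value"; pairs. The function name parse_gtf_line and the sample record are illustrative only and do not come from Cufflinks.

```python
def parse_gtf_line(line):
    """Parse one GTF record into a dict; return None for comments or blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    # Attributes look like: gene_id "X"; transcript_id "Y";
    # Simplification: assumes attribute values contain no semicolons.
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,  # for example "transcript" or "exon"
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


if __name__ == "__main__":
    # Made-up record for illustration only.
    sample = (
        "chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t"
        'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1";'
    )
    record = parse_gtf_line(sample)
    print(record["feature"], record["attributes"]["transcript_id"])
```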
An optional reference annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output.
The GenePattern FTP site hosts a number of reference annotation GTFs, available in a dropdown selection (requires GenePattern 3.7.0+).
A multi-FASTA file containing the genomic DNA sequences for the reference with all contigs present. The multi-FASTA file can be created by using the ConcatenateFiles module to assemble all the FASTA files for the reference genome sequences into a single file. For more information on the FASTA format, see this description.
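Outside GenePattern, the same kind of multi-FASTA file can be assembled with a few lines of Python. The sketch below is not the ConcatenateFiles implementation; it simply appends each input file to one output and counts header lines, assuming plain-text (uncompressed) FASTA inputs. The function name concatenate_fasta and the file names are hypothetical.

```python
def concatenate_fasta(fasta_paths, output_path):
    """Concatenate FASTA files into one multi-FASTA file.

    Returns the number of sequence records (lines starting with '>') written.
    """
    record_count = 0
    with open(output_path, "w") as out:
        for path in fasta_paths:
            last_line = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        record_count += 1
                    out.write(line)
                    last_line = line
            # Guard against an input that lacks a trailing newline, which
            # would otherwise glue the next header onto the last sequence line.
            if not last_line.endswith("\n"):
                out.write("\n")
    return record_count


if __name__ == "__main__":
    # Placeholder file names; substitute the per-chromosome FASTA files.
    total = concatenate_fasta(["chr1.fa", "chr2.fa"], "genome.fa")
    print(f"wrote {total} sequence records")
```

Checking the returned count against the expected number of contigs is a quick way to confirm that all contigs are present in the combined file.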
The GenePattern FTP site also hosts a number of reference genomes, available in a dropdown selection (requires GenePattern 3.7.0+).
Cufflinks.cuffmerge produces a GTF file named merged.gtf that contains an assembly that merges together the input assemblies. While it produces several other output files, the Cufflinks documentation refers solely to the merged assembly output file for use with Cuffdiff.
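As a quick sanity check on the merged assembly before passing it to Cuffdiff, one might count the distinct genes and transcripts it contains. The following is a minimal sketch that relies only on the standardized gene_id and transcript_id attributes. The function name summarize_gtf is hypothetical, and the identifiers in the sample lines are made up for illustration.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def summarize_gtf(lines):
    """Count distinct gene_id and transcript_id values in GTF lines."""
    genes, transcripts = set(), set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # The attributes are the last tab-separated column of a GTF record.
        attributes = line.rstrip("\n").split("\t")[-1]
        gene = GENE_ID.search(attributes)
        transcript = TRANSCRIPT_ID.search(attributes)
        if gene:
            genes.add(gene.group(1))
        if transcript:
            transcripts.add(transcript.group(1))
    return {"genes": len(genes), "transcripts": len(transcripts)}


if __name__ == "__main__":
    # Made-up records; in practice, pass an open handle to merged.gtf.
    sample_lines = [
        'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
        'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
        'chr1\tCufflinks\texon\t100\t250\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
    ]
    print(summarize_gtf(sample_lines))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, which avoids loading a large GTF into memory.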
GenePattern Module Version Notes
Added hosted GTF and genome file selectors and HTML-based docs