How do I get RNA-SeQC?

RNA-SeQC v1.1.7 jar file (45 MB)

Change Log

  • v1.0.0 11/23/11
    • Initial Release
  • v1.0.1 12/06/11
    • Fixed an error when using BWA to estimate rRNA on single-end RNA-seq data
  • v1.1.0 02/13/12
    • Implemented downsampling for the rRNA estimation to increase speed
    • Removed the erroneous 'Unique' field from the Total Reads table
    • Added fields to the Mapped Reads table to distinguish duplicate and unique read rates by mapped reads and total reads
    • Tool reports its version when starting
    • Added base mismatch rate metric
    • Added pair mate report
    • Added gap length distribution plots
    • Added insert size mean and standard deviation
    • Added 'genes detected' metric
    • Improved correlation reporting by adding Pearson and an 'all against all' multi-sample correlation table.
    • Added read length
    • Added non-primary alignment counts
    • GC content thresholds are now user-configurable
    • Added vendor failed read count
  • v1.1.1 02/24/12
    • Fixed error in calculating chimeric read pairs: minimum chimeric read pair distance is now 500kb
    • Changed Picard validation stringency to lenient for library complexity calculation module
  • v1.1.2 03/06/12
    • Raised the chimeric read pair distance cutoff to 2 Mb
    • Added support for BAMs that contain a mixture of paired and unpaired reads
    • Added a new read pair metric: number of unpaired reads
    • Fixed issue calculating the end1/end2 alignment rates
  • v1.1.3 03/14/12
    • Added a chimeric read report file (outputs the location of chimeric reads and their mate pairs)
    • Fixed a position index bug that led to an underestimate of the number of exonic reads and an overestimate of the number of intronic reads
  • v1.1.4 03/29/12
    • Bug fix: error when running with single-end data
  • v1.1.5 05/08/12
    • Bug fix: error when running on a mixture of paired-end and single-end data
    • Parameter "n" is now optional, with a default of 1000
    • If no rRNA options are used and the GTF contains no rRNA annotations, the tool now continues without this information and produces no values for rRNA estimation
    • For 3' and 5' bias estimation, the default end length has been changed to 200 and the option for 10 has been removed.
  • v1.1.6 05/11/12
    • Added support for coordinate-sorted GTF files
    • Added -gatkFlags option to pass any set of flags directly to the GATK
  • v1.1.7 05/14/12
    • Removed SamToFastq step for rRNA estimation, alleviating problems with mixed pairing


Useful Reference Data
Modified GENCODE GTF file for human with contig names of the form "1", "2", etc.
Original GENCODE GTF file for human with contig names of the form "chr1", "chr2", etc.; use this if your BAMs were aligned to a reference with these contig names
GC content definitions file with IDs matching GENCODE
rRNA reference files for human and for mouse
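Because the GTF's contig names have to match those in your BAM, it can help to check which naming style a GTF uses before picking one of the files above. The following is a minimal POSIX shell sketch, assuming an uncompressed GTF; `contig_style` and `gtf_contig_style` are hypothetical helpers, not part of RNA-SeQC. They look only at column 1 of the first non-comment record.

```shell
# Classify a contig name: "chr" for names like chr1, "plain" for names like 1.
contig_style() {
  case "$1" in
    "")   echo "unknown" ;;
    chr*) echo "chr" ;;
    *)    echo "plain" ;;
  esac
}

# Report the naming style of an uncompressed GTF by reading column 1
# of its first record (lines starting with '#' are headers/comments).
gtf_contig_style() {
  local first
  first=$(grep -v '^#' "$1" | head -n 1 | cut -f1)
  contig_style "$first"
}
```

For the modified GTF described above you would expect `plain`, and for the original one `chr`. To see what your BAM uses, `samtools view -H ThousandReads.bam | grep '^@SQ' | head -n 1` prints the first reference sequence line of the header (this assumes samtools is installed).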

Example RNA-seq Data
The following files form a complete dataset for running RNA-SeQC on example data.
Example BAM
Modified GENCODE GTF file with contig names matching the BAM ("1", "2", etc.)
Reference genome with contig names matching the BAM ("1", "2", etc.)
Reference index and dictionary (extract these into the same directory as the reference genome file)
GC content definitions file
Human rRNA reference files
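Since the reference index and dictionary need to sit next to the reference genome, a quick pre-flight check can save a failed run. This is a minimal sketch that assumes the usual naming conventions: a samtools-style index named `<ref>.fasta.fai` and a Picard-style dictionary named `<ref>.dict`. If the extracted files are named differently, adjust accordingly. `check_reference_files` is a hypothetical helper.

```shell
# Check that the reference FASTA has its index and dictionary beside it.
# Assumed names: <ref>.fasta.fai (samtools faidx style) and <ref>.dict
# (Picard CreateSequenceDictionary style, FASTA extension replaced).
check_reference_files() {
  local fasta fai dict f missing
  fasta="$1"
  fai="${fasta}.fai"
  dict="${fasta%.*}.dict"
  missing=0
  for f in "$fasta" "$fai" "$dict"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_reference_files Homo_sapiens_assembly19.fasta && echo "reference files OK"` lists anything missing on stderr and returns non-zero if a file is absent.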
RNA-SeQC can be run with or without BWA-based rRNA level estimation. To run without it (faster, but less accurate), use the command:
java -jar RNASeQC.jar -n 1000 -s "TestId|ThousandReads.bam|TestDesc" -t gencode.v7.annotation_goodContig.gtf -r Homo_sapiens_assembly19.fasta -o ./testReport/ -strat gc -gc gencode.v7.gc.txt
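The `-s` value in the command above packs three pipe-delimited fields. Judging from the example, these are a sample ID, the BAM path, and a free-text description; that reading is an inference, not taken from RNA-SeQC documentation. The sketch below builds that string and wraps the non-BWA command using the example data's file names. `sample_string` and `run_rnaseqc_no_bwa` are hypothetical names.

```shell
# Build the pipe-delimited -s value: "<sample ID>|<BAM path>|<description>".
# Field meanings are inferred from the example command above.
sample_string() {
  # Reject fields containing '|', since they would shift the field boundaries.
  case "$1$2$3" in
    *"|"*) echo "sample fields must not contain '|'" >&2; return 1 ;;
  esac
  printf '%s|%s|%s\n' "$1" "$2" "$3"
}

# Run the faster, non-BWA mode on the example data.
run_rnaseqc_no_bwa() {
  local sample
  sample=$(sample_string TestId ThousandReads.bam TestDesc) || return 1
  java -jar RNASeQC.jar -n 1000 -s "$sample" \
    -t gencode.v7.annotation_goodContig.gtf \
    -r Homo_sapiens_assembly19.fasta \
    -o ./testReport/ \
    -strat gc -gc gencode.v7.gc.txt
}
```

Building the sample string first and checking its status keeps a malformed field from silently reaching RNA-SeQC as an empty `-s` argument.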
To run the slower but more accurate BWA-based method:
java -jar RNASeQC.jar -n 1000 -s "TestId|ThousandReads.bam|TestDesc" -t gencode.v7.annotation_goodContig.gtf -r Homo_sapiens_assembly19.fasta -o ./testReport/ -strat gc -gc gencode.v7.gc.txt -BWArRNA human_all_rRNA.fasta
Note: this assumes BWA is on your PATH. If it is not, use the -bwa flag to specify the path to BWA.
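The PATH note can be automated. The following minimal sketch passes `-bwa` only when `bwa` is not already on PATH. Otherwise it falls back to an explicit location you supply. The other RNA-SeQC flags are exactly those in the BWA-based command above. `bwa_path_arg` and `run_rnaseqc_bwa` are hypothetical helpers, and `/opt/bwa/bwa` in the usage line is only a placeholder.

```shell
# Print the path to pass via -bwa, or nothing if bwa is already on PATH.
# $1 is a fallback location you supply (hypothetical, e.g. /opt/bwa/bwa).
bwa_path_arg() {
  if command -v bwa >/dev/null 2>&1; then
    return 0
  fi
  if [ -x "$1" ]; then
    printf '%s\n' "$1"
    return 0
  fi
  echo "bwa is not on PATH and '$1' is not executable" >&2
  return 1
}

# Run the BWA-based rRNA mode, adding "-bwa <path>" only when needed.
run_rnaseqc_bwa() {
  local bwa_path
  bwa_path=$(bwa_path_arg "$1") || return 1
  # Reuse the positional parameters as an optional-argument list.
  set --
  if [ -n "$bwa_path" ]; then
    set -- -bwa "$bwa_path"
  fi
  java -jar RNASeQC.jar -n 1000 -s "TestId|ThousandReads.bam|TestDesc" \
    -t gencode.v7.annotation_goodContig.gtf \
    -r Homo_sapiens_assembly19.fasta \
    -o ./testReport/ \
    -strat gc -gc gencode.v7.gc.txt \
    -BWArRNA human_all_rRNA.fasta "$@"
}
```

Usage: `run_rnaseqc_bwa /opt/bwa/bwa`. Using `"$@"` as the optional-argument list keeps the script POSIX-compatible (no bash arrays) and handles paths containing spaces.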