How do I run MuTect?

Requires the Java 6 runtime.

java -Xmx2g -jar muTect-XXXX-XX-XX.jar \
--analysis_type MuTect \
--reference_sequence <reference> \
--cosmic <cosmic.vcf> \
--dbsnp <dbsnp.vcf> \
--intervals <intervals_to_process> \
--input_file:normal <normal.bam> \
--input_file:tumor <tumor.bam> \
--out <call_stats.out> \
--coverage_file <coverage.wig.txt>

These parameters depend on the genome build used for your alignments:

For HG18

<reference> - Homo_sapiens_assembly18.fasta
<dbsnp.vcf> - dbsnp_132.hg18.vcf
<cosmic.vcf> - hg18_cosmic_v54_120711.vcf

For HG19/GRCh37

<reference> - Homo_sapiens_assembly19.fasta
<dbsnp.vcf> - dbsnp_132_b37.leftAligned.vcf
<cosmic.vcf> - hg19_cosmic_v54_120711.vcf

For Mouse MM9

<reference> - Mus_musculus_assembly9.fasta
<dbsnp.vcf> - dbsnp_128_mm9.vcf
<cosmic.vcf> - no COSMIC VCF is available for mouse, so omit the --cosmic parameter entirely

These parameters are specific to the sample/BAM:

<intervals_to_process> - either a literal list of "chrom:start-end" separated by semicolons (e.g. chr1:1500-2500; chr2:2500-3500) or a file of such entries with one entry per line
<normal.bam> - BAM file for the Normal (positional: this must come before the tumor BAM file)
<tumor.bam> - BAM file for the Tumor
<call_stats.out> - filename to write detailed caller output
<coverage.wig.txt> - filename for coverage output
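
As a rough sketch of how the build-specific and sample-specific parameters fit together, the shell function below prints (but does not run) a command line for a given build. The function name mutect_cmd and the dry-run approach are illustrative assumptions, not part of MuTect. The resource file names and the muTect-XXXX-XX-XX.jar placeholder are the ones listed above.

```shell
# Hypothetical helper: print a MuTect command line for one genome build.
# Usage: mutect_cmd <hg18|hg19|mm9> <normal.bam> <tumor.bam> <intervals>
mutect_cmd() (
    build=$1 normal=$2 tumor=$3 intervals=$4
    case $build in
        hg18) ref=Homo_sapiens_assembly18.fasta
              dbsnp=dbsnp_132.hg18.vcf
              cosmic=hg18_cosmic_v54_120711.vcf ;;
        hg19) ref=Homo_sapiens_assembly19.fasta
              dbsnp=dbsnp_132_b37.leftAligned.vcf
              cosmic=hg19_cosmic_v54_120711.vcf ;;
        mm9)  ref=Mus_musculus_assembly9.fasta
              dbsnp=dbsnp_128_mm9.vcf
              cosmic= ;;                      # no COSMIC VCF for mouse
        *)    echo "unknown build: $build" >&2
              exit 1 ;;
    esac

    cmd="java -Xmx2g -jar muTect-XXXX-XX-XX.jar --analysis_type MuTect"
    cmd="$cmd --reference_sequence $ref"
    if [ -n "$cosmic" ]; then                 # drop --cosmic when unavailable
        cmd="$cmd --cosmic $cosmic"
    fi
    cmd="$cmd --dbsnp $dbsnp --intervals $intervals"
    # The normal BAM is listed before the tumor BAM, as required.
    cmd="$cmd --input_file:normal $normal --input_file:tumor $tumor"
    cmd="$cmd --out call_stats.out --coverage_file coverage.wig.txt"
    echo "$cmd"
)
```

For example, "mutect_cmd mm9 Normal.bam Tumor.bam chr1:1500-2500" prints a command with no --cosmic argument. Review the printed command before running it.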

How do I interpret the output?

The caller's output is currently very verbose, to aid development. To restrict it to the set of confident calls, filter out lines that contain the string REJECT:

grep -v REJECT <my.call_stats.txt>
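
The grep above drops any line that contains REJECT anywhere in it. A slightly stricter alternative, sketched here with awk, tests only the final field. It assumes that judgement is the last column and that the header line starts with "contig", as in the example output in this document. The function name keep_calls is hypothetical.

```shell
# Hypothetical helper: keep the header line plus rows judged KEEP.
# Reads the named call stats file(s), or standard input if none is given.
keep_calls() {
    # $1 == "contig" matches the header; $NF is the last field (judgement).
    awk '$1 == "contig" || $NF == "KEEP"' "$@"
}
```

For example: keep_calls example.call_stats.txt > example.keep.txt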

The output also has many columns. Here are some of the more prominent ones, with their definitions:

  • contig - the contig location of this candidate
  • position - the 1-based position of this candidate on the given contig
  • ref_allele - the reference allele for this candidate
  • alt_allele - the mutant (alternate) allele for this candidate
  • tumor_name - name of the tumor as given on the command line, or extracted from the BAM
  • normal_name - name of the normal as given on the command line, or extracted from the BAM
  • score - for future development
  • dbsnp_site - is this a dbsnp site as defined by the dbsnp bitmask supplied to the caller
  • covered - was the site powered to detect a mutation (80% power for a 0.3 allelic fraction mutation)
  • power - tumor_power * normal_power
  • tumor_power - given the tumor sequencing depth, what is the power to detect a mutation at 0.3 allelic fraction
  • normal_power - given the normal sequencing depth, what power did we have to detect (and reject) this as a germline variant
  • total_pairs - total tumor and normal read depth which come from paired reads
  • improper_pairs - number of reads which have abnormal pairing (orientation and distance)
  • map_Q0_reads - total number of mapping quality zero reads in the tumor and normal at this locus
  • init_t_lod - deprecated
  • t_lod_fstar - CORE STATISTIC: Log of (likelihood tumor event is real / likelihood event is sequencing error )
  • tumor_f - allelic fraction of this candidate based on read counts
  • contaminant_fraction - estimate of contamination fraction used (supplied or defaulted)
  • contaminant_lod - log likelihood of ( event is contamination / event is sequencing error )
  • t_ref_count - count of reference alleles in tumor
  • t_alt_count - count of alternate alleles in tumor
  • t_ref_sum - sum of quality scores of reference alleles in tumor
  • t_alt_sum - sum of quality scores of alternate alleles in tumor
  • t_ins_count - count of insertion events at this locus in tumor
  • t_del_count - count of deletion events at this locus in tumor
  • normal_best_gt - most likely genotype in the normal
  • init_n_lod - log likelihood of ( normal being reference / normal being altered )
  • n_ref_count - count of reference alleles in normal
  • n_alt_count - count of alternate alleles in normal
  • n_ref_sum - sum of quality scores of reference alleles in normal
  • n_alt_sum - sum of quality scores of alternate alleles in normal
  • judgement - final judgement of site KEEP or REJECT (not enough evidence or artifact)
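
With this many columns, it can help to pull out a few by name. The awk sketch below looks up each requested column in the header line. It assumes the call stats file is tab-delimited and that the header line starts with "contig". The function name select_cols is hypothetical.

```shell
# Hypothetical helper: print selected columns, chosen by header name.
# Usage: select_cols name1,name2,... [call_stats_file]
select_cols() {
    cols=$1
    shift
    awk -F'\t' -v cols="$cols" '
        BEGIN { OFS = "\t"; n = split(cols, want, ",") }
        $1 == "contig" {
            # Header line: map each column name to its position.
            for (i = 1; i <= NF; i++) idx[$i] = i
            for (j = 1; j <= n; j++)
                if (!(want[j] in idx)) {
                    print "no such column: " want[j] > "/dev/stderr"
                    exit 1
                }
            seen = 1
        }
        seen {
            # Header and data lines: emit the requested fields in order.
            line = ""
            for (j = 1; j <= n; j++)
                line = line (j > 1 ? OFS : "") $(idx[want[j]])
            print line
        }
    ' "$@"
}
```

For example: select_cols contig,position,t_lod_fstar,tumor_f,judgement example.call_stats.txt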

Example

Here is an example invocation of the caller on BAMs aligned to HG19:

java -Xmx2g -jar muTect-1.0.27783.jar \
--analysis_type MuTect \
--reference_sequence Homo_sapiens_assembly19.fasta \
--dbsnp dbsnp_132_b37.leftAligned.vcf \
--cosmic hg19_cosmic_v54_120711.vcf \
--intervals 17:7577100-7577200 \
--input_file:normal Normal.cleaned.bam \
--input_file:tumor Tumor.cleaned.bam \
--out example.call_stats.txt \
--coverage_file example.coverage.wig.txt

This produces a call stats file containing a single confident mutation in the 100 bp window (in TP53 in this case):

contig position ref_allele alt_allele tumor_name normal_name score dbsnp_site covered power tumor_power normal_power total_pairs improper_pairs map_Q0_reads init_t_lod t_lod_fstar tumor_f contaminant_fraction contaminant_lod t_ref_count t_alt_count t_ref_sum t_alt_sum t_ins_count t_del_count normal_best_gt init_n_lod n_ref_count n_alt_count n_ref_sum n_alt_sum failure_reasons judgement
17 7577106 G A TUMOR NORMAL 0 DBSNP COVERED 0.988119 0.988122 0.999996 90 0 0 13.724831 17.74093 0.162162 0 2.152275 31 6 1207 220 0 0 GG 13.545461 45 0 1760 0 KEEP
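
As a quick sanity check on the row above, tumor_f can be recomputed from the tumor read counts. This sketch assumes tumor_f is t_alt_count / (t_ref_count + t_alt_count), which is consistent with the values in this example (t_ref_count = 31, t_alt_count = 6). The function name allelic_fraction is hypothetical.

```shell
# Hypothetical helper: allelic fraction from reference and alternate counts.
# Usage: allelic_fraction <ref_count> <alt_count>
allelic_fraction() {
    awk -v ref="$1" -v alt="$2" 'BEGIN {
        total = ref + alt
        if (total == 0) { print "NA"; exit }   # no reads: fraction undefined
        printf "%.6f\n", alt / total
    }'
}

allelic_fraction 31 6   # prints 0.162162, matching tumor_f in the row above
```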