Tagged with #mutect
0 documentation articles | 4 announcements | 10 forum discussions

Created 2015-11-25 07:37:00 | Updated 2015-11-25 14:21:18 | Tags: haplotypecaller release mutect version-highlights topstory mutect2
Comments (3)

The last GATK 3.x release of the year 2015 has arrived!

The major feature in GATK 3.5 is the eagerly awaited MuTect2 (beta version), which brings somatic SNP and Indel calling to GATK. This is just the beginning of GATK’s scope expansion into the somatic variant domain, so expect some exciting news about copy number variation in the next few weeks! Meanwhile, more on MuTect2 awesomeness below.

In addition, we’ve got all sorts of variant context annotation-related treats for you in the 3.5 goodie bag -- both new annotations and new capabilities for existing annotations, listed below.

In the variant manipulation space, we enhanced or fixed functionality in several tools including LeftAlignAndTrimVariants, FastaAlternateReferenceMaker and VariantEval modules. And in the variant calling/genotyping space, we’ve made some performance improvements across the board to HaplotypeCaller and GenotypeGVCFs (mostly by cutting out crud and making the code more efficient) including a few improvements specifically for haploids. Read the detailed release notes for more on these changes. Note that GenotypeGVCFs will now emit no-calls at sites where RGQ=0 in acknowledgment of the fact that those sites are essentially uncallable.

We’ve got good news for you if you’re the type who worries about disk space (whether by temperament or by necessity): we finally have CRAM support -- and some recommendations for keeping the output of BQSR down to reasonable file sizes, detailed below.

Finally, be sure to check out the detailed release notes for the usual variety show of minor features (including a new Queue job runner that enables local parallelism), bug fixes and deprecation notices (a few tools have been removed from the codebase, in the spirit of slimming down ahead of the holiday season).

Introducing MuTect2 (beta): calling somatic SNPs and Indels natively in GATK

MuTect2 is the next-generation somatic SNP and indel caller that combines the DREAM challenge-winning somatic genotyping engine of the original MuTect with the assembly-based machinery of HaplotypeCaller.

The original MuTect (Cibulskis et al., 2013) was built on top of the GATK engine by the Cancer Genome Analysis group at the Broad Institute, and was distributed as a separate package. By all accounts it did a great job calling somatic SNPs, and was part of the winning entries for multiple DREAM challenges (including some submitted by groups outside the Broad). However it was not able to call indels; and the less said about the indel caller that accompanied it (first named SomaticIndelDetector then Indelocator) the better.

This new incarnation of MuTect leverages much of the HaplotypeCaller’s internal machinery (including the all-important graph assembly bit) to call both SNPs and indels together. Yet it retains key parts of the original MuTect’s internal genotyping engine that allow it to model somatic variation appropriately. This is a major differentiation point compared to HaplotypeCaller, which has expectations about ploidy and allele frequencies that make it unsuitable for calling somatic variants.

As a convenience add-on to MuTect2, we also integrated the cross-sample contamination estimation tool ContEst into GATK 3.5. Note that while the previous public version of this tool relied on genotyping chip data for its operation, this version of the tool has been upgraded to enable on-the-fly genotyping for the case where genotyping data is not available. Documentation of this feature will be provided in the near future. Both MuTect2 and ContEst are now featured in the Tool Documentation section of the Guide. Stay tuned for pipeline-level documentation on performing somatic variant discovery, to be added to the Best Practices docs in the near future.

Please note that this release of MuTect2 is a beta version intended for research purposes only and should not be applied in production/clinical work. MuTect2 has not yet undergone the same degree of scrutiny and validation as the original MuTect since it is so new. Early validation results suggest that MuTect2 has a tendency to generate more false positives as compared to the original MuTect; for example, it seems to overcall somatic mutations at low allele frequencies, so for now we recommend applying post-processing filters, e.g. by hard-filtering calls with low minor allele frequencies. Rest assured that data is being generated and the tools are being improved as we speak. We’re also looking forward to feedback from you, the user community, to help us make it better faster.

Finally, note also that MuTect2 is distributed under the same restricted license as the original MuTect; for-profit users are required to seek a license to use it (please email softwarelicensing@broadinstitute.org). To be clear, while MuTect2 is released as part of GATK, the commercial licensing has not been consolidated under a single license. Therefore, current holders of a GATK license will still need to contact our licensing office if they wish to use MuTect2.

Annotate this: new and improved variant context annotations

Whew that was a long wall of text on MuTect2, wasn’t it. Let’s talk about something else now. Annotations! Not functional annotations, mind you -- we’re not talking about e.g. predicting synonymous vs. non-synonymous mutations here. I mean variant context annotations, i.e. all those statistics calculated during the variant calling process which we mostly use to estimate how confident we are that the variants are real vs. artifacts (for filtering and related purposes).

So we have two new annotations, BaseCountsBySample (what it says on the can) and ExcessHet (for excess heterozygosity, i.e. the number of heterozygote calls made in excess of the Hardy-Weinberg expectations), as well as a set of new annotations that are allele-specific versions of existing annotations (with AS_ prefix standing for Allele-Specific) which you can browse here. Right now we’re simply experimenting with these allele-specific annotations to determine what would be the best way to make use of them to improve variant filtering. In the meantime, feel free to play around with them (via e.g. VariantsToTable) and let us know if you come up with any interesting observations. Crowdsourcing is all the rage, let’s see if it gets us anywhere on this one!
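
To make the idea behind ExcessHet a bit more concrete, here is a minimal sketch of the Hardy-Weinberg comparison for a single biallelic site. This is not GATK's implementation, which this post does not describe; the function name and the genotype counts are made up for illustration.

```python
def het_excess(n_hom_ref, n_het, n_hom_var):
    """Return (observed, expected) heterozygote counts for a biallelic site.

    The expected count follows Hardy-Weinberg: 2 * p * q * N, where p and q
    are the allele frequencies estimated from the genotype counts themselves.
    """
    n = n_hom_ref + n_het + n_hom_var
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternate allele frequency
    return n_het, 2 * p * q * n


# Hypothetical cohort of 100 samples with far more hets than expected.
observed, expected = het_excess(n_hom_ref=10, n_het=80, n_hom_var=10)
print(observed, expected)
```

A site where the observed count sits well above the expected count is the kind of site the annotation is meant to flag; how GATK turns that difference into a score is not covered here.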

We also made some improvements to the StrandAlleleCountsBySample annotation, to how VQSR handles MQ, and to how VariantAnnotator makes use of external resources -- and we fixed that annoying bug where default annotations were getting dropped. All of which you can read about in the detailed release notes.

These Three Awesome File Hacks Will Restore Your Faith In Humanity’s Ability To Free Up Some Disk Space

CRAM support! Long-awaited by many, lovingly implemented by Vadim Zalunin at EBI and colleagues at the Sanger Institute. We haven’t done extensive testing, and there are a few tickets for improvements that are planned at the htsjdk level -- but it works well enough that we’re comfortable releasing it under a beta designation. Meaning have fun with it, but do your own thorough testing before putting it into production or throwing out your old BAMs!

Static binning of base quality scores. In a nutshell, binning (or quantizing) the base qualities in a BAM file means that instead of recording all possible quality values separately, we group them into bins represented by a single value (by default, 10, 20, 30 or 40). By doing this we end up having to record fewer separate numbers, which through the magic of BAM compression yields substantially smaller files. The idea is that we don’t actually need to be able to differentiate between quality scores at a very high resolution -- if the binning scheme is set up appropriately, it doesn’t make any difference to the variant discovery process downstream. This is not a new concept, but now the GATK engine has an argument to enable binning quality scores during the base recalibration (BQSR) process using a static binning scheme that we have determined produces optimal results in our hands. The level of compression is of course adjustable if you’d like to set your own tradeoff between compression and base quality resolution. We have validated that this type of binning (with our chosen default parameters) does not have any noticeable adverse effect on germline variant discovery. However we are still looking into some possible effects on somatic variant discovery, so we can’t yet recommend binning for that application.
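
As a rough illustration of what binning does to a read's qualities, the sketch below maps each score to the nearest of the bin values quoted above (10, 20, 30, 40). Rounding to the nearest bin, and sending ties to the lower bin, are assumptions made for this example; the exact rule used by the GATK engine is not spelled out in this post.

```python
BINS = (10, 20, 30, 40)  # bin values quoted in the text


def bin_quality(q, bins=BINS):
    """Map one Phred score to the nearest bin value (ties go to the lower bin)."""
    return min(bins, key=lambda b: (abs(b - q), b))


def bin_qualities(quals, bins=BINS):
    """Apply bin_quality to every score in a read."""
    return [bin_quality(q, bins) for q in quals]


quals = [2, 8, 14, 15, 23, 31, 37, 40, 41]  # made-up base qualities
binned = bin_qualities(quals)
print(binned)
print(len(set(quals)), "distinct values before,", len(set(binned)), "after")
```

Fewer distinct values means longer runs of identical symbols in the quality string, which is what lets the compression do better.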

Disable indel quality scores. The Base Recalibration process produces indel quality scores in addition to the regular base qualities. They are stored in the BI and BD tags of the read records, taking up a substantial amount of space in the resulting BAM files. There has been a lot of discussion about whether these indel quals are worth the file size inflation. Well, we’ve done a lot of testing and we’ve now decided that no, for most use cases the indel quals don’t make enough of a difference to justify the extra file size. The one exception is PacBio data, where it seems that indel quals may help model the indel-related errors of that technology. But for the rest, we’re now comfortable recommending the use of the --disable_indel_quals argument when writing out the recalibrated BAM file with PrintReads.

Created 2015-11-25 07:10:45 | Updated 2015-11-25 14:27:50 | Tags: Promote haplotypecaller release-notes mutect gatk3 mutect2
Comments (3)

GATK 3.5 was released on November 25, 2015. Itemized changes are listed below. For more details, see the user-friendly version highlights.

New tools

  • MuTect2: somatic SNP and indel caller based on HaplotypeCaller and the original MuTect.
  • ContEst: estimation of cross-sample contamination (primarily for use in somatic variant discovery).
  • GatherBqsrReports: utility to gather recalibration tables from scatter-parallelized BaseRecalibrator runs.

Variant Context Annotations

  • Added allele-specific version of existing annotations: AS_BaseQualityRankSumTest, AS_FisherStrand, AS_MappingQualityRankSumTest, AS_RMSMappingQuality, AS_RankSumTest, AS_ReadPosRankSumTest, AS_StrandOddsRatio, AS_QualByDepth and AS_InbreedingCoeff.

  • Added BaseCountsBySample annotation. Intended to provide insight into the pileup of bases used by HaplotypeCaller in the calling process, which may differ from the pileup observed in the original BAM file because of the local realignment and additional filtering performed internally by HaplotypeCaller. Can only be requested from HaplotypeCaller, not VariantAnnotator.

  • Added ExcessHet annotation. Estimates excess heterozygosity in a population of samples. Related to but distinct from InbreedingCoeff, which estimates evidence for inbreeding in a population. ExcessHet scales more reliably to large cohort sizes.

  • Added FractionInformativeReads annotation. Reports the fraction of reads that were considered informative by HaplotypeCaller (over all samples).

  • Enforced calculating GenotypeAnnotations before InfoFieldAnnotations. This ensures that the AD value is available to use in the QD calculation.

  • Reorganized standard annotation groups processing to ensure that all default annotations always get annotated regardless of what is specified on the command line. This fixes a bug where default annotations were getting dropped when the command line included annotation requests.

  • Made GenotypeGVCFs subset StrandAlleleCounts intelligently, i.e. subset the SAC values to the called alleles. Previously, when the StrandAlleleCountsBySample (SAC) annotation was present in GVCFs, GenotypeGVCFs carried it over to the final VCF essentially unchanged. This was problematic because SAC includes the counts for all alleles originally present (including NON-REF) even when some are not called in the final VCF. When the full list of original alleles is no longer available, parsing SAC could become difficult if not impossible.

  • Added new MQ jittering functionality to improve how VQSR handles MQ.

  • VariantAnnotator can now annotate FILTER field from an external resource. Usage: --resource:foo resource.vcf --expression foo.FILTER

  • VariantAnnotator can now check allele concordance when annotating with an external resource. Usage: --resourceAlleleConcordance

  • Bug fix: The annotation framework was improved to allow for the collection of sufficient statistics during GVCF creation which are then used to compute the final annotation during the genotyping. This avoids the use of median as the representative annotation from the collection of values (one from each sample). TL;DR annotations will be more accurate when using the GVCF workflow for joint discovery.
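
The SAC subsetting item in the list above can be sketched as follows. The sketch assumes that SAC stores two counts per allele (forward strand, then reverse strand) in the same order as the allele list; the function name and the example values are hypothetical, and this is not the GenotypeGVCFs code.

```python
def subset_sac(original_alleles, sac, kept_alleles):
    """Keep only the SAC entries that belong to the alleles still present.

    Assumes SAC holds a (forward, reverse) pair of counts per allele, laid
    out in the same order as original_alleles.
    """
    if len(sac) != 2 * len(original_alleles):
        raise ValueError("expected two SAC values per original allele")
    index = {allele: i for i, allele in enumerate(original_alleles)}
    subset = []
    for allele in kept_alleles:
        i = index[allele]
        subset.extend(sac[2 * i:2 * i + 2])
    return subset


original = ["A", "C", "T", "<NON_REF>"]
sac = [12, 10, 7, 9, 0, 1, 0, 0]          # made-up counts
print(subset_sac(original, sac, ["A", "C"]))
```

Without the original allele list, the eight values above cannot be matched to the two alleles in the final record, which is the parsing problem the release note describes.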

Variant manipulation tools

  • Allowed overriding hard-coded cutoff for allele length in ValidateVariants and in LeftAlignAndTrimVariants. Usage: --reference_window_stop N where N is the desired cutoff.

  • Also in LeftAlignAndTrimVariants, trimming multiallelic alleles is now the default behavior.

  • Fixed ability to mask out SNPs with --snpmask in FastaAlternateReferenceMaker.

  • Also in FastaAlternateReferenceMaker, fixed the merging of contiguous intervals, and made the tool produce more informative contig names.

  • Fixed a bug in CombineVariants that occurred when one record has a spanning deletion and needs a padded reference allele.

  • Added a new VariantEval evaluation module, MetricsCollection, that summarizes metrics from several EV modules.

  • Enabled family-level stratification in MendelianViolationEvaluator of VariantEval (if a ped file is provided), making it possible to count Mendelian violations for each family in a callset with multiple families.

GVCF tools

  • Various improvements to the tools’ performance, especially HaplotypeCaller, by making the code more efficient and cutting out crud.

  • GenotypeGVCFs now emits a no-call (./.) when the evidence is too ambiguous to make a call at all (e.g. all the PLs are zero). Previously this would have led to a hom-ref call with RGQ=0.

  • Fixed a bug in GenotypeGVCFs that sometimes generated invalid VCFs for haploid callsets. The tool was carrying over the AD from alleles that had been trimmed out, causing field length mismatches.

  • Changed the genotyping implementation for haploid organisms to address performance problems reported when running GenotypeGVCFs on haploid callsets. Note that this change may lead to a slight loss of sensitivity at low-coverage sites -- let us know if you observe anything dramatic.
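
The no-call behavior described in the list above (the "all the PLs are zero" case) can be pictured with a small sketch. This only illustrates the decision rule as stated in the release note; the function name and the genotype labels are assumptions, not the GenotypeGVCFs code.

```python
def call_from_pls(pls, genotypes=("0/0", "0/1", "1/1")):
    """Pick the genotype with the lowest PL, or emit a no-call when the
    PLs carry no information (all zero)."""
    if len(pls) != len(genotypes):
        raise ValueError("need one PL per candidate genotype")
    if all(pl == 0 for pl in pls):
        return "./."  # nothing distinguishes the genotypes
    return genotypes[pls.index(min(pls))]


print(call_from_pls([0, 0, 0]))     # uninformative site
print(call_from_pls([0, 30, 300]))  # hom-ref is the best-supported genotype
print(call_from_pls([50, 0, 120]))  # het is the best-supported genotype
```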

Genotyping engine tweaks

  • Ensured inputPriors get used if they are specified to the genotyper (previously they were ignored). Also improved docs on --heterozygosity and --indel_heterozygosity priors.

  • Fixed bug that affected the --ignoreInputSamples behavior of CalculateGenotypePosteriors.

  • Limited emission of the scary warning message about max number of alleles (“this tool is set to genotype at most x alleles but we found more; only x will be used”) to a single occurrence unless DEBUG logging mode is activated. Otherwise it fills up our output logs.

Miscellaneous tool fixes

  • Added option to OverclippedReadFilter to not require soft-clips on both ends. Contributed by Jacob Silterra.

  • Fixed a bug in IndelRealigner where the tool was incorrectly "fixing" mates when supplementary alignments are present. The patch involves ignoring supplementary alignments.

  • Fixed a bug in CatVariants. Previously, VCF files were being sorted solely on the base pair position of the first record, ignoring the chromosome. This can become problematic when merging files from different chromosomes, especially if you have multiple VCFs per chromosome. Contributed by John Wallace.
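
To see why sorting on position alone goes wrong, here is a small sketch of the two orderings described in the CatVariants item above. The file names, the contig order and the helper names are hypothetical; the code only illustrates the idea and is not taken from CatVariants.

```python
CONTIG_ORDER = ["chr1", "chr2"]  # hypothetical reference order

# (file name, (contig, position) of the first record in that file)
files = [
    ("chr2_part1.vcf", ("chr2", 100)),
    ("chr1_part2.vcf", ("chr1", 5000)),
    ("chr1_part1.vcf", ("chr1", 200)),
]


def position_only_key(entry):
    """Old behavior as described: ignore the chromosome entirely."""
    _, (_, pos) = entry
    return pos


def contig_then_position_key(entry):
    """Order by contig rank in the reference first, then by position."""
    _, (contig, pos) = entry
    return (CONTIG_ORDER.index(contig), pos)


buggy = [name for name, _ in sorted(files, key=position_only_key)]
fixed = [name for name, _ in sorted(files, key=contig_then_position_key)]
print(buggy)
print(fixed)
```

With the position-only key, the chr2 file lands ahead of both chr1 files simply because its first record has the smallest coordinate.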

Engine-level behaviors and capabilities

  • Support for reading and writing CRAM files. Some improvements are still expected in htsjdk. Contributed by Vadim Zalunin at EBI and collaborators at the Sanger Institute.

  • Added the ability to enforce 4.2 version output of the VCF spec when processing older files. Use case: the 4.2 spec specifies that GQ must be an integer; by default we don’t enforce it (so if reading an older file that used decimals, we don’t change it) but the new argument --forceValidOutput converts the values on request. Not made default because of some performance slowdown -- so writing VCFs is now fast by default, compliant by choice.

  • Made interval-list output format dependent on the file extension (for RealignerTargetCreator). If the extension is .interval_list, output will be formatted as a proper Picard interval list (with sequence dictionary). Otherwise it will be a basic GATK interval list as previously.

  • Added static binning capability for base recalibration (BQSR).


  • Added a new JobRunner called ParallelShell that will run jobs locally on one node concurrently as specified by the DAG, with the option to limit the maximum number of concurrently running jobs using the flag maximumNumberOfJobsToRunConcurrently. Contributed by Johan Dahlberg.

  • Updated extension for Picard CalculateHsMetrics to include PER_TARGET_COVERAGE argument and added extension for Picard CollectWgsMetrics.

Deprecation notice


  • BeagleOutputToVCF, VariantsToBeagleUnphased, ProduceBeagleInput. These are tools for handling Beagle data. The latest versions of Beagle support VCF input and output, so there is no longer any reason for us to provide converters.
  • ReadAdaptorTrimmer and VariantValidationAssessor. These were experimental tools which we think are not useful and not operating on a sufficiently sound basis.
  • BaseCoverageDistribution and CoveredByNSamplesSites. These tools were redundant with DiagnoseTargets and/or DepthOfCoverage.
  • LiftOverVariants, FilterLiftedVariants and liftOverVCF.pl. The Picard liftover tool LiftoverVCF works better and is easier to operate.
  • sortByRef.pl. Use Picard SortVCF instead.
  • ListAnnotations. This was intended as a utility for listing annotations easily from command line, but it has not proved useful.


  • Made various documentation improvements.
  • Updated date and street address in license text.
  • Moved htsjdk & picard to version 1.141

Created 2013-10-01 00:55:10 | Updated 2013-10-01 14:24:41 | Tags: official mutect appistry webinar cancer
Comments (0)

Our partner Appistry (who distribute GATK and MuTect to commercial users) will be holding a webinar on 3 October. Registration is open to all; you can find more details on the Appistry website here:


Created 2013-07-24 20:26:27 | Updated 2013-07-26 14:41:56 | Tags: official mutect appistry gatk webinar
Comments (0)

Heads up, cancer researchers! Appistry (our commercial licensing partner for GATK and now MuTect) is announcing an upcoming webinar on best practices for somatic mutation studies using GATK and MuTect. Registration for the webinar is open to all (not just Appistry customers) so be sure to sign up for this. See the announcement on Appistry's website for more detailed information.

Created 2015-11-22 18:54:45 | Updated | Tags: mutect cosmic
Comments (5)

Hi GATK team and community !

I'm working on a pool of five pairs of tumor/normal BAM samples and I'm looking for variants (hg19 reference). I completed the pre-processing steps successfully and now want to perform the variant calling step with MuTect. I merged the tumor samples into a single BAM file, and did the same for the normal alignments.

For MuTect I used the dbSNP VCF file provided on the Broad Institute FTP (dbsnp_138.hg19.vcf) and the COSMIC VCF file from the COSMIC FTP (/cosmic/grch37/cosmic/v74/CosmicCodingMuts.vcf.gz).

First I renamed the COSMIC contigs to match the hg19 reference and reordered the file using the Picard SortVCF tool, so that the VCF file is ordered like this:

> awk -F "\t" '{ print $1 }' /home/data/src/cosmic/hg19/CosmicCodingMuts_sorted.vcf | grep "^chr" | uniq

The trouble starts when I launch MuTect:

> mutect --analysis_type MuTect --reference_sequence /home/data/src/broadinstitute/ucsc.hg19.fasta --cosmic /home/data/src/cosmic/hg19/CosmicCodingMuts_sorted.vcf --dbsnp /home/data/src/broadinstitute/dbsnp_138.hg19.vcf --input_file:normal workspace/pn_merge/pn_merge_recal_reads.bam --input_file:tumor workspace/pt_merge/pt_merge_recal_reads.bam --out call_stats.out
INFO  19:36:51,977 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  19:36:51,979 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-0-g72492bb, Compiled 2015/01/21 17:10:56 
INFO  19:36:51,979 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  19:36:51,979 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  19:36:51,982 HelpFormatter - Program Args: --analysis_type MuTect --reference_sequence /home/data/src/broadinstitute/ucsc.hg19.fasta --cosmic /home/data/src/cosmic/hg19/CosmicCodingMuts_sorted.vcf --dbsnp /home/data/src/broadinstitute/dbsnp_138.hg19.vcf --input_file:normal workspace/pn_merge/pn_merge_recal_reads.bam --input_file:tumor workspace/pt_merge/pt_merge_recal_reads.bam --out call_stats.out 
INFO  19:36:51,987 HelpFormatter - Executing as guillaume@Tibioputer on Linux 3.19.0-33-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_85-b01. 
INFO  19:36:51,987 HelpFormatter - Date/Time: 2015/11/22 19:36:51 
INFO  19:36:51,987 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  19:36:51,987 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  19:36:52,061 GenomeAnalysisEngine - Strictness is SILENT 
INFO  19:36:52,272 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  19:36:52,282 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  19:36:52,378 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.09 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.1-0-g72492bb): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in cosmic.
##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
##### ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
##### ERROR You can use the ReorderSam utility to fix this problem: http://gatkforums.broadinstitute.org/discussion/58/companion-utilities-reordersam
##### ERROR   cosmic contigs = [chr1, chr10, chr11, chr11_gl000202_random, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18, chr18_gl000207_random, chr19, chr19_gl000208_random, chr19_gl000209_random, chr1_gl000191_random, chr1_gl000192_random, chr2, chr20, chr21, chr21_gl000210_random, chr22, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr8_gl000196_random, chr8_gl000197_random, chr9, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249, chrX, chrY]
##### ERROR ------------------------------------------------------------------------------------------

If I run the same command without the COSMIC file, it works. What surprises me is that the contig names and ordering are the same as in the dbSNP VCF file. I don't understand what's going wrong...

I have always found topics covering my previous issues on the GATK forum, but I seem to be the only one to hit this problem with a COSMIC VCF file. It sounds like a minor problem but I can't pinpoint it... Any help would be much appreciated :\
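
For readers comparing contig orders, the difference between the plain string ("lexicographic") order shown in the error message and a reference-defined order can be sketched as below. The short contig lists are a hypothetical subset chosen for illustration, and the helper names are made up; this is not a GATK or Picard utility and does not by itself diagnose the problem in this post.

```python
def is_lexicographic(contigs):
    """True if the contigs are in plain string order, e.g. chr1, chr10, chr2."""
    return list(contigs) == sorted(contigs)


def in_reference_order(contigs, reference_order):
    """Reorder contig names to follow the order of the reference."""
    rank = {name: i for i, name in enumerate(reference_order)}
    return sorted(contigs, key=rank.__getitem__)


# Hypothetical subset of a reference order (M leading, then karyotypic).
reference_order = ["chrM", "chr1", "chr2", "chr10", "chrX", "chrY"]
# Order as it might be reported for a file, mirroring the error message style.
reported = ["chr1", "chr10", "chr2", "chrM", "chrX", "chrY"]

print(is_lexicographic(reported))
print(in_reference_order(reported, reference_order))
```

Note that chr10 sorts before chr2 as a string, which is the signature of the ordering the error message complains about.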



Created 2015-11-20 15:45:01 | Updated | Tags: mutect somatic-variants tumor-only
Comments (1)


I recently went to the workshop for variant calling and mentioned that I would like to perform somatic variant calling with Mutect using only tumor samples (no matched normal sample). I was told that there is a pipeline under development that is not yet fully tested that you would be able to provide. Would you be able to provide this along with any other recommendations?

Thank you!

Created 2015-11-12 22:35:00 | Updated | Tags: mutect removing-low-quality-reads perl
Comments (2)

Before variant calling, MuTect first removes low-quality reads; please see the short read preprocessing section at nature.com/nbt/journal/v31/n3/extref/nbt.2514-S1.pdf. I want to apply this short read pre-processing method to my BAM files and tried to implement it in Perl, but I have no idea how to program these sentences: (c) if there is an overlapping read pair, and both reads agree the read with the highest quality score is retained otherwise both are discarded. (b) if there is an overlapping read pair, and both reads agree the read with the highest quality score is retained otherwise the read that disagrees with the reference is retained. Can anybody help me understand them? Thanks very much for your help!

--best Jing
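
One literal reading of the two quoted sentences, written in Python rather than Perl, is sketched below. It treats "agree" as "both mates report the same base at an overlapping position", which is an assumption; the function name, the rule labels and the example values are hypothetical, and the supplementary text may intend something different.

```python
def resolve_overlap(obs1, obs2, ref_base, on_disagreement):
    """Decide which observations to keep at one position covered by both mates.

    obs1 and obs2 are (base, quality) tuples, one per mate.
    on_disagreement is "discard_both" (sentence c as quoted) or
    "keep_non_reference" (sentence b as quoted).
    """
    if obs1[0] == obs2[0]:
        # Mates agree: keep only the higher-quality observation.
        return [max(obs1, obs2, key=lambda obs: obs[1])]
    if on_disagreement == "discard_both":
        return []
    if on_disagreement == "keep_non_reference":
        # If both mates differ from the reference, both survive this filter.
        return [obs for obs in (obs1, obs2) if obs[0] != ref_base]
    raise ValueError("unknown rule: " + on_disagreement)


print(resolve_overlap(("A", 30), ("A", 35), "A", "discard_both"))
print(resolve_overlap(("A", 30), ("C", 35), "A", "discard_both"))
print(resolve_overlap(("A", 30), ("C", 35), "A", "keep_non_reference"))
```

The two sentences share the "agree" branch and differ only in what happens on disagreement, so a single function with a switch covers both readings.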

Created 2015-10-19 14:03:33 | Updated | Tags: variantfiltration mutect fa
Comments (6)

Hi Folks, I'm using Mutect to call somatic variants on some human gene panel data (~660 amplicons). After making the calls I applied VariantFiltration in order to limit the variants to a specific set of coordinates (via a bed file). VariantFiltration seems to have done the job in terms of selecting the correct regions; however, the FA can be quite different, but not always.

For example (HOT = VariantFiltration Applied): 3 samples [sample name = c2; FA = c15]

HOT ras119 12 25360224 rs61764370 A C KRAS 0/1 853 38 24 891 0.043 4.3
FULL ras119 12 25360224 rs61764370 A C KRAS 0/1 584 290 28 878 0.332 33.20
HOT ras116 12 25360224 rs61764370 A C KRAS 0/1 536 345 25 884 0.392 39.2
FULL ras116 12 25360224 rs61764370 A C KRAS 0/1 447 405 27 854 0.475 47.50
HOT ras25 12 25360224 rs61764370 A C KRAS 0/1 833 97 25 930 0.104 10.4
FULL ras25 12 25360224 rs61764370 A C KRAS 0/1 638 236 28 876 0.27 27.00

It isn't clear to me why this would change, nor is it clear which number I should believe.

Any ideas?

Thanks, Robert

Created 2015-10-01 08:24:18 | Updated | Tags: mutect rat
Comments (4)

I would like to confirm that when I use MuTect with rat data, it is not necessary to supply either a dbSNP file or a COSMIC file (as COSMIC data is for human cancer only). And if I provide only a dbSNP file, will any call that is in the dbSNP file be excluded, whether it is a germline or a somatic SNP?

Thank you.

Created 2015-09-16 09:11:43 | Updated 2015-09-16 09:13:37 | Tags: mutect gatk-protected
Comments (4)

I'm trying to install MuTect and, as directed in the README.md, I've git cloned gatk-protected and tried to run 'mvn -Ddisable.queue install', but I get the following error. I have Java 1.7 and Maven 3.3.3.

[INFO] -------------------------------------------------------------
[INFO] -------------------------------------------------------------
[WARNING] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/utils/threading/ThreadEfficiencyMonitor.java: Some input files use or override a deprecated API.
[WARNING] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/utils/threading/ThreadEfficiencyMonitor.java: Recompile with -Xlint:deprecation for details.
[WARNING] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/datasources/reads/SAMDataSource.java: Some input files use unchecked or unsafe operations.
[WARNING] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/datasources/reads/SAMDataSource.java: Recompile with -Xlint:unchecked for details.
[WARNING] Some messages have been simplified; recompile with -Xdiags:verbose to get full output
[INFO] 5 warnings
[INFO] -------------------------------------------------------------
[INFO] -------------------------------------------------------------
[INFO] -------------------------------------------------------------
[ERROR] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/walkers/annotator/interfaces/AnnotationInterfaceManager.java:[129,24] no suitable method found for add(java.lang.Object)
    method java.util.Collection.add(T) is not applicable
      (argument mismatch; java.lang.Object cannot be converted to T)
    method java.util.List.add(T) is not applicable
      (argument mismatch; java.lang.Object cannot be converted to T)
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] Sting Root ......................................... SUCCESS [  0.455 s]
[INFO] Sting Aggregator ................................... SUCCESS [  0.185 s]
[INFO] Sting GSALib ....................................... SUCCESS [  0.447 s]
[INFO] Sting Utils ........................................ SUCCESS [  0.698 s]
[INFO] GATK Framework ..................................... FAILURE [  4.181 s]
[INFO] GATK Protected ..................................... SKIPPED
[INFO] GATK Package ....................................... SKIPPED
[INFO] Sting Public ....................................... SKIPPED
[INFO] Sting Protected .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.134 s
[INFO] Finished at: 2015-09-16T14:27:14+05:30
[INFO] Final Memory: 44M/1583M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (compile-java) on project gatk-framework: Compilation failure
[ERROR] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/walkers/annotator/interfaces/AnnotationInterfaceManager.java:[129,24] no suitable method found for add(java.lang.Object)
[ERROR] method java.util.Collection.add(T) is not applicable
[ERROR] (argument mismatch; java.lang.Object cannot be converted to T)
[ERROR] method java.util.List.add(T) is not applicable
[ERROR] (argument mismatch; java.lang.Object cannot be converted to T)
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (compile-java) on project gatk-framework: Compilation failure
/home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/walkers/annotator/interfaces/AnnotationInterfaceManager.java:[129,24] no suitable method found for add(java.lang.Object)
    method java.util.Collection.add(T) is not applicable
      (argument mismatch; java.lang.Object cannot be converted to T)
    method java.util.List.add(T) is not applicable
      (argument mismatch; java.lang.Object cannot be converted to T)

        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
        at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
        at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
        at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
        at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
        at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
        at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
        at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.compiler.CompilationFailureException: Compilation failure
/home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/walkers/annotator/interfaces/AnnotationInterfaceManager.java:[129,24] no suitable method found for add(java.lang.Object)
    method java.util.Collection.add(T) is not applicable
      (argument mismatch; java.lang.Object cannot be converted to T)
    method java.util.List.add(T) is not applicable
      (argument mismatch; java.lang.Object cannot be converted to T)

        at org.apache.maven.plugin.compiler.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:858)
        at org.apache.maven.plugin.compiler.CompilerMojo.execute(CompilerMojo.java:129)
        at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
        ... 20 more
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :gatk-framework

I'm not able to understand how to resolve the issue. Could anybody please help me with it?

Created 2015-09-11 15:35:09 | Updated 2015-09-11 15:35:29 | Tags: intervals mutect b37 i reference-error
Comments (15)

I hate to post this same error on the GATK forum again, but I went through many of the threads already posted about it, and none of the answers shed light on my issue. My BAM files are aligned to GRCh37-lite, and I am using the same reference genome, downloaded from ftp://ftp.ncbi.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37/special_requests

Next, I ran the GATK Best Practices pre-processing on these BAMs with the same reference genome, and it completed without any errors. I'm currently running MuTect as:

java -Xmx56g -jar muTect-1.1.4.jar --analysis_type MuTect --reference_sequence ./resources/b37/human_g1k_v37.fasta --cosmic ./resources/Cosmic.b37.vcf --dbsnp ./resources/dbsnp_138.b37.vcf --intervals ./resources/mirna.1.5flank-interval-list.list --input_file:normal $normal.recal_reads.bam --input_file:tumor $tumor.recal_reads.bam --out $sample.call_stats.out --coverage_file $sample.coverage.wig.txt

And getting this error message:

ERROR MESSAGE: Badly formed genome loc: Contig 'chr1' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?

What other tests should I run to troubleshoot this issue? Also, I created the interval list from a .bed file. I restricted my BAM files to a limited set of BED regions using the same file with the command "samtools view -@8 -b -h -L".

This was the file I was most confused about. Is it possible that this file is causing the error? First few lines of this file are:

chr1:15869-18936
chr1:28866-32003
chr1:566205-569293
chr1:1100984-1104078
chr1:1101743-1104832
chr1:1102885-1105967
chr1:1229990-1233050
chr1:1246382-1249446
chr1:1273530-1276588
chr1:3043039-3046099
chr1:3475759-3478854
chr1:5622631-5625703
chr1:5921232-5924301
chr1:6488394-6491456
chr1:8925061-8928149
chr1:9210227-9213336
chr1:10025939-10029016
chr1:10286276-10289361

Thanks a ton for your help!
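
The error message above complains about contig 'chr1', and the interval list quoted in the post uses 'chr'-prefixed names, whereas the b37 reference (human_g1k_v37.fasta) names its contigs '1', '2', and so on. A minimal sketch of how one might check interval contig names against the reference index before running MuTect is shown here. The function names and the .fai offsets are hypothetical and purely illustrative. The renaming helper only strips the prefix: it assumes the coordinates are already on GRCh37 and it does not handle special cases such as chrM versus MT.

```python
def contigs_in_fai(fai_text):
    """Return the contig names from the text of a .fai index (first tab-separated column)."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}


def unknown_interval_contigs(intervals, known_contigs):
    """Return, in first-seen order, the interval contigs that are not in known_contigs."""
    missing = []
    for interval in intervals:
        # "chr1:15869-18936" -> "chr1"
        contig = interval.rsplit(":", 1)[0]
        if contig not in known_contigs and contig not in missing:
            missing.append(contig)
    return missing


def strip_chr_prefix(interval):
    """Drop a leading 'chr' so that 'chr1:1-10' becomes '1:1-10' (names only, no liftover)."""
    return interval[3:] if interval.startswith("chr") else interval


# Illustrative .fai rows; the byte offsets are placeholders, not copied from a real index.
fai_text = "1\t249250621\t52\t60\t61\n2\t243199373\t253404903\t60\t61\n"
intervals = ["chr1:15869-18936", "chr1:28866-32003"]

known = contigs_in_fai(fai_text)
print(unknown_interval_contigs(intervals, known))   # ['chr1']

renamed = [strip_chr_prefix(i) for i in intervals]
print(unknown_interval_contigs(renamed, known))     # []
```

In practice the .fai text would be read from the index file sitting next to the reference FASTA, and the intervals from the .list file passed to --intervals.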

Created 2015-05-15 01:23:27 | Updated | Tags: mutect
Comments (3)

I'm trying to build SomaticSpike, which is included in MuTect 1.1.4. Is there any way to actually build this from the github repository? Is there perhaps a binary floating around?

Created 2015-02-12 07:12:06 | Updated | Tags: indelrealigner baserecalibrator mutect
Comments (4)

Hello, do you recommend realigning around indels and recalibrating base quality scores before running MuTect? Thanks!

Created 2014-05-23 16:41:17 | Updated | Tags: snpeff mutect
Comments (7)

I would like to use snpEff to annotate only those somatic mutations that were flagged as 'KEEP' in the judgement call column of the *callstats file generated by MuTect. I can see that these 'KEEP' calls (in the callstats file) are flagged as 'PASS' in the 'FILTER' column of the corresponding VCF file.

I am not sure whether I should filter the VCF output files from MuTect (keeping only the 'PASS' calls) before snpEff annotation. snpEff calculates Ti/Tv ratios in addition to functional annotation, so should I give snpEff the unfiltered VCF for an accurate Ti/Tv calculation, or is it OK to provide a filtered VCF with only the passing calls?

Thanks, Parthav
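
If the filtered route is taken, keeping only the 'PASS' records comes down to testing the seventh tab-separated column (FILTER) of each VCF record. The following is a minimal sketch under that assumption. The function name and the example records are made up for illustration (including the 'REJECT' filter value), and it does not address whether Ti/Tv should be computed on filtered or unfiltered calls.

```python
def keep_pass_records(vcf_lines):
    """Yield header lines plus the records whose FILTER column (7th field) is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            # Meta-information and column header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


# Made-up records, not real MuTect output.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\tPASS\t.\n",
    "1\t200\t.\tC\tT\t.\tREJECT\t.\n",
]

filtered = list(keep_pass_records(example))
print(len(filtered))  # 3 (two header lines and one PASS record)
```

Because the function is a generator over lines, it can be fed an open file handle directly and written out line by line without loading the whole VCF into memory.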