# Tagged with #mutect 3 documentation articles | 2 announcements | 37 forum discussions

Created 2016-04-21 17:14:27 | Updated 2016-04-27 15:18:12 | Tags: tutorials tutorial mutect picard qc oncotator contest mutect2 firecloud

### Overview

The Broad Mutation Calling Workflow is a somatic mutation analysis and annotation pipeline. It is broken down into four Best Practice workspaces in FireCloud:

The Broad Mutation Calling MuTect workflow runs on a tumor and normal pair or pair set, and consists of three main steps: 1) preparation, 2) somatic mutation analysis, and 3) somatic mutation annotation. As part of this workflow, MuTect is run in force-call mode to search for somatic variants at a set of specified loci of clinical relevance (for example, the TP53 gene on chromosome 17).

The three main steps are described below:

1. Preparation: The tumor and normal BAMs are split up by chromosome, conditioned on interval overlap. For each chromosome, the BAM files are split out only if the supplied interval file overlaps that chromosome. In this workflow, the supplied intervals come from an Agilent targeted sequencing system. This step may be known as “scatter preparation.”

2. Somatic Mutation Analysis: Two somatic variant analyses are carried out in parallel, MuTect1 and MuTect2. MuTect1 is used for its somatic single-nucleotide variant detection capabilities, and MuTect2 for its indel detection capabilities. Both programs run in parallel over the per-chromosome BAM and interval shards produced by the preparation step. This step may be known as “scatter.”

3. Somatic Mutation Gather: This final step gathers and annotates the variants found in the previous step. First, the results are gathered and merged. Then the indels are separated from the MuTect2 output and merged with the single-nucleotide variants from MuTect1. Next, only variants tagged with PASS are fed to Oncotator for annotation. Finally, all variants are passed to VEP (Variant Effect Predictor) for further annotation.
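The scatter and gather logic described in these steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workflow's actual tasks. Calls are modeled as a hypothetical `Call` record instead of being parsed from the real MuTect1 call-stats and MuTect2 VCF files, and all helper names are our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical, simplified variant record (not a real MuTect output type)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    filter_status: str  # e.g. "PASS" or a rejection reason


def chromosomes_to_scatter(intervals, chromosomes):
    """Scatter preparation: keep only chromosomes overlapped by at least one interval."""
    covered = {chrom for chrom, _start, _end in intervals}
    return [c for c in chromosomes if c in covered]


def is_indel(call):
    # Treat any length difference between REF and ALT as an indel.
    return len(call.ref) != len(call.alt)


def gather(m1_calls, m2_calls, contig_order):
    """Merge MuTect1 SNVs with MuTect2 indels, then split off the PASS calls."""
    snvs = [c for c in m1_calls if not is_indel(c)]
    indels = [c for c in m2_calls if is_indel(c)]
    rank = {name: i for i, name in enumerate(contig_order)}
    merged = sorted(snvs + indels, key=lambda c: (rank[c.chrom], c.pos))
    # Only PASS calls go to Oncotator; the full merged list goes to VEP.
    passing = [c for c in merged if c.filter_status == "PASS"]
    return merged, passing
```

Sorting by an explicit contig order avoids the lexicographic trap where "chr10" would sort before "chr2".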

#### Runtime

The total expected runtime for the Broad Mutation Calling MuTect workflow depends on the size of the pair you select for analysis. Pair HCC1954 runs on "mini" BAMs and the expected runtime is roughly 45 minutes. Pair HCC1143 runs on larger BAMs and the expected runtime is roughly 1.5 hours. If you run both pairs as a pair set, each pair completes in roughly its own expected runtime, so HCC1954 finishes first.

#### MuTect1

MuTect1 identifies somatic point mutations in next-generation sequencing data of cancer genomes. It takes sequencing data for matched normal and tumor tissue samples as input, and outputs mutation calls and optional coverage results.

In a nutshell, the analysis itself consists of three steps:

• Pre-process the aligned reads in the tumor and normal sequencing data

• Identify, using statistical analysis, sites that are likely to carry somatic mutations with high confidence

• Post-process the candidate somatic mutations

For complete details, please see the 2013 publication in Nature Biotechnology.

#### MuTect2

MuTect2 is a somatic SNP and indel caller that combines the DREAM challenge-winning somatic genotyping engine of the original MuTect (Cibulskis et al., 2013) with the assembly-based machinery of HaplotypeCaller.

The basic operation of MuTect2 proceeds similarly to that of the HaplotypeCaller. However, while the HaplotypeCaller relies on a ploidy assumption (diploid by default) to inform its genotype likelihood and variant quality calculations, MuTect2 allows for a varying allelic fraction for each variant, as is often seen in tumors with purity less than 100%, multiple subclones, and/or copy number variation (either local or aneuploidy). MuTect2 also differs from the HaplotypeCaller in that it does apply some hard filters to variants before producing output.
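A toy comparison shows why a free allelic fraction matters. The sketch below scores alt-read counts under a simple binomial model, once with the diploid heterozygous fraction fixed at 0.5 and once with the fraction observed in the data. This is only an illustration. MuTect2's real model works on per-read likelihoods from its assembled haplotypes, and the function names here are hypothetical.

```python
import math


def log10_likelihood(alt_reads, depth, allele_fraction):
    """log10 P(alt_reads | depth, allele_fraction) under a toy binomial model."""
    # Clamp to avoid log10(0) when the fraction is exactly 0 or 1.
    f = min(max(allele_fraction, 1e-9), 1 - 1e-9)
    return (math.log10(math.comb(depth, alt_reads))
            + alt_reads * math.log10(f)
            + (depth - alt_reads) * math.log10(1 - f))


def compare_models(alt_reads, depth):
    """Compare a fixed diploid-het fraction (0.5) with the observed fraction."""
    diploid = log10_likelihood(alt_reads, depth, 0.5)
    observed_f = alt_reads / depth
    free = log10_likelihood(alt_reads, depth, observed_f)
    return observed_f, diploid, free
```

With 6 alt reads out of 60 (an allelic fraction of 0.1, which is plausible for a low-purity tumor), the observed-fraction model explains the data far better than the diploid model. A caller tied to 0.5 would penalize exactly the variants that matter in tumors.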

#### Oncotator

Oncotator is a tool for annotating information onto genomic point mutations (SNPs/SNVs) and indels. It is primarily used for human genome variant callsets. However, the tool can also be used to annotate any kind of information onto variant callsets from any organism.

By default, Oncotator uses a simple TSV file (e.g., MAFLITE) as input and produces a TCGA MAF as output. Oncotator also supports VCF files as both an input and an output format. In addition, it can be configured to produce HTML reports of the annotated point mutations, as it does in this workflow.
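Moving between VCF records and a MAFLITE-style TSV mostly comes down to coordinate conventions for indels. The sketch below assumes a five-column MAFLITE layout (`chr`, `start`, `end`, `ref_allele`, `alt_allele`). It also assumes MAF-style indel coordinates, where a deletion spans the deleted bases and an insertion is marked by its two flanking positions. Check both assumptions against the Oncotator documentation before relying on them. The helper names are hypothetical, and only simple indels sharing one leading padding base with REF are handled.

```python
def vcf_to_maflite(chrom, pos, ref, alt):
    """Convert one VCF-style allele pair to (chr, start, end, ref_allele, alt_allele).

    Assumes simple indels that share exactly one leading padding base,
    as is typical in VCF. Complex substitutions are not handled.
    """
    if len(ref) == len(alt):
        # SNV or MNP: the span is the same in both conventions.
        return (chrom, pos, pos + len(ref) - 1, ref, alt)
    if len(ref) > len(alt):
        # Deletion: drop the padding base; deleted bases start at pos + 1.
        deleted = ref[1:]
        return (chrom, pos + 1, pos + len(deleted), deleted, "-")
    # Insertion: drop the padding base; mark the two flanking positions.
    return (chrom, pos, pos + 1, "-", alt[1:])


def to_maflite_tsv(records):
    """Render (chrom, pos, ref, alt) records as a MAFLITE-style TSV string."""
    header = ["chr", "start", "end", "ref_allele", "alt_allele"]
    lines = ["\t".join(header)]
    for rec in records:
        lines.append("\t".join(str(x) for x in vcf_to_maflite(*rec)))
    return "\n".join(lines) + "\n"
```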

#### VEP (Variant Effect Predictor)

Ensembl’s VEP (Variant Effect Predictor) program performs further annotation of the variants. It determines the effect of each variant on relevant transcripts and proteins.

VEP accepts as input variant coordinates with their alleles, or variant identifiers. A full list of supported input formats is available in the VEP documentation. If a variant that you enter as input causes a change in the protein sequence, VEP calculates the possible amino acids at that position, and the variant is given a consequence type of missense.
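At the codon level, the missense logic amounts to translating the reference and alternate codons and comparing the amino acids. The sketch below shows only that comparison for a single-base change within a known codon. Real VEP also handles transcript models, splice regions, frameshifts and much more. The consequence names are Sequence Ontology terms that VEP reports, but `classify_snv` itself is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon, offset, alt_base):
    """Classify a single-base change at position offset (0, 1 or 2) of a codon."""
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"
```

For example, `classify_snv("GAG", 1, "T")` turns GAG (Glu) into GTG (Val) and returns `"missense_variant"`.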

#### Inputs and Outputs

Below are the tool-specific inputs and outputs for this workflow. Note that these inputs and outputs may vary from the “standard” version of the tool.

MuTect1 and MuTect2 Inputs

• normalBam

• normalBamIndex

• tumorBam

• tumorBamIndex

• ReferenceFasta

• NormalPanel

• ReferenceFastaIndex

• COSMICVCF

• DBSNPVCF

• fracContam

• MutectIntervals

MuTect1 and MuTect2 Outputs

• MuTect PowerFile

• MuTect CoverageFile

• MuTect CallStatsFile

Oncotator and VEP Inputs

• MuTect PowerFile

• MuTect CoverageFile

• MuTect CallStatsFile

• File oncoDBTarBall

• VEP_File

• ReferenceFastaIndex

• ReferenceFasta

Oncotator and VEP Outputs

• Array[File] all_outs=glob("*")

• File oncotator.log

• File MuTect.1.2.call_stats.M1All.M2IndelsOnly.filtered.vcf.annotated.vcf

• File MuTect1.call_stats.txt

• File MuTect2.call_stats.txt

• File variant_effect_output.txt

• File variant_effect_output.txt_summary.html

#### How to Run this Workflow in FireCloud

• Access the Broad_MutationCalling_MuTect_Workflow_V1_BestPractice workspace to run this workflow.
• Navigate to the Method Configurations tab and click on the method, MutationCalling_MuTect.
• Click Launch Analysis.
• In the Launch Analysis window, toggle to pair and select a pair on which to run this workflow, e.g., HCC1143. You can also run this workflow on a pair set. To do so, toggle to pair_set, select HCC_pairs and enter this.pairs in the Define Expression field.
• Finally, click the Launch button. Check back on the Monitor tab later to view results from your workflow analysis.
• When the status displays Done, select the most recent analysis run, e.g., HCC1143.
• Click on Outputs: Show, then select output files to view the results of this analysis.

#### Use of Tutorial Workspaces

Tutorial workspaces contain open access data and workflows. These Tutorial workspaces will appear in your account after your account is activated.

Tutorial workspaces may ONLY be used for running the tutorial exercises. You must not upload your own data sets to these workspaces, nor should you add tools (Method Configs). If you do not follow these guidelines, your FireCloud account may be deactivated.

#### Runtime

The total expected runtime for the Broad Mutation Calling QC workflow depends on the size of the pair or pair set you select for analysis. Pair HCC1954 runs on "mini" BAMs and the expected runtime is roughly 15 minutes. Pair HCC1143 runs on larger BAMs and the expected runtime is roughly 2.5 hours. If you run both pairs as a pair set, each pair completes in roughly its own expected runtime, so HCC1954 finishes first.

The QC task counts reads overlapping regions for tumor and normal BAM files. The task concludes with a report of the counts over the BAMs and lanes. Correlation values are included for comparison purposes.
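The sketch below illustrates the kind of computation behind this report. It counts reads overlapping each region and compares two lanes (or a tumor and a normal) with a Pearson correlation. How the actual QC task counts reads, and which correlation it reports, are not specified here, so both choices are assumptions. The function names are hypothetical.

```python
import math


def count_overlapping_reads(reads, regions):
    """Count reads overlapping each region.

    reads and regions are (chrom, start, end) tuples, 1-based and inclusive.
    """
    return [
        sum(1 for c, s, e in reads if c == chrom and s <= end and start <= e)
        for chrom, start, end in regions
    ]


def pearson(x, y):
    """Pearson correlation between two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Lanes from the same library should produce highly correlated region counts. A lane with a markedly lower correlation is a candidate for closer inspection.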

ContEst uses a Bayesian approach to calculate the posterior probability of the contamination level and determine the maximum a posteriori probability (MAP) estimate of the contamination level. ContEst supports an array-free mode, in which genotypes are called on the fly from the matched normal and used as the source of homozygous variant calls. It currently treats any site with at least 50X coverage where more than 80% of the bases support the alternate allele as a homozygous alternate site.
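The sketch below shows both ideas in a heavily simplified form. The first function applies the array-free homozygous-alternate rule. The second grid-searches a contamination fraction `c`. At a homozygous-alternate site, a reference base is assumed to come either from a sequencing error or from a contaminating read that carries the reference allele at its population frequency (for example, taken from a resource such as the HapMap VCF). With a flat prior, the MAP estimate equals the maximum-likelihood estimate. ContEst's real model uses per-base qualities and more careful genotype handling. The error rate, the grid, and the function names are all assumptions.

```python
import math


def is_hom_alt(alt_bases, depth, min_depth=50, min_alt_fraction=0.8):
    """Array-free rule: more than 80% alt bases with at least 50X coverage."""
    return depth >= min_depth and alt_bases / depth > min_alt_fraction


def map_contamination(sites, error_rate=0.001, step=0.001, max_c=0.5):
    """Grid-search MAP estimate of the contamination fraction (flat prior).

    sites: (ref_bases, alt_bases, population_ref_frequency) at hom-alt sites.
    """
    best_c, best_ll = 0.0, -math.inf
    for i in range(int(round(max_c / step)) + 1):
        c = i * step
        ll = 0.0
        for ref_bases, alt_bases, pop_ref in sites:
            # A ref base comes from a sequencing error in the hom-alt sample,
            # or from a contaminating read carrying the reference allele.
            p_ref = (1 - c) * error_rate + c * pop_ref
            ll += ref_bases * math.log(p_ref) + alt_bases * math.log(1 - p_ref)
        if ll > best_ll:
            best_c, best_ll = c, ll
    return best_c
```

Take 20 sites, each with 2 reference and 98 alternate bases and a population reference frequency of 0.5. The estimate is about 0.038, meaning roughly 4% contamination.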

Picard Metrics Tasks invoke multiple metrics reporting routines from the Picard toolkit. The following routines are invoked:

• CollectAlignmentSummaryMetrics

• CollectInsertSizeMetrics

• QualityScoreDistribution

• MeanQualityByCycle

• CollectBaseDistributionByCycle

• CollectSequencingArtifactMetrics

• CollectQualityYieldMetrics

• CollectOxoGMetrics.

SAM/BAM validation (ValidateSamFile) is run in both summary and verbose modes.

Other routines, such as CollectHsMetrics and MarkDuplicates, are also called.

#### Inputs and Outputs

Below are the tool-specific inputs and outputs for this workflow. Note that these inputs and outputs may vary from the “standard” version of the tool.

QC Inputs

• tumorBam

• tumorBamIdx

• normalBam

• normalBamIdx

• regionFile

• captureNormalsDBRCLZip

• caseIdNoSpace

• controlIdNoSpace

QC Outputs

• tumorBamLaneList

• normalBamLaneList

• tumorRCL

• normalRCL

• CopyNumQC

ContEst Inputs

• tumorBam

• tumorBamIdx

• normalBam

• normalBamIdx

• refFasta

• refFastaIdx

• exomeIntervals

• SNP6Bed

• HapMapVCF

• pairName

ContEst Outputs

• contamination_validation.array_free.txt

• contamination.af.txt

• contamination.base_report.txt

• contest_validation.output.tsv

Picard Metrics Inputs (Tumor)

• tumorBam

• tumorBamIdx

• refFasta

• HapMapVCF

• picardHapMap

• picardBaitIntervals

• picardTargetIntervals

• DB_SNP_VCF

• DB_SNP_VCF_IDX

Picard Metrics Inputs (Normal)

• normalBam

• normalBamIdx

• refFasta

• HapMapVCF

• picardHapMap

• picardBaitIntervals

• picardTargetIntervals

• DB_SNP_VCF

• DB_SNP_VCF_IDX

Picard Metrics Outputs

• HSMetrics.txt

• picard_crosscheck_report.txt

• validation_summary.txt

• validation_verbose.txt

• piccard_multiple_metrics.zip

• oxoG_metrics.txt

• duplicate_records.bam

• duplicate_metrics.txt

• sequencing_artifact_metrics.txt.bait_bias_detail_metrics

• sequencing_artifact_metrics.txt.bait_bias_summary_metrics
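The `duplicate_metrics.txt` output follows the standard Picard metrics file layout. It has comment lines starting with `#`, then a `## METRICS CLASS` line followed by a tab-separated header row and value rows. The sketch below reads the first metrics row and recomputes the duplication fraction, counting read pairs twice. Picard's own PERCENT_DUPLICATION column should already hold this value, so recomputing it only shows the formula. The function names are hypothetical.

```python
def parse_picard_metrics(text):
    """Return the first metrics row of a Picard metrics file as a dict."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no '## METRICS CLASS' section found")


def duplication_fraction(metrics):
    """Fraction of examined reads marked as duplicates (pairs count as two reads)."""
    dups = (int(metrics["UNPAIRED_READ_DUPLICATES"])
            + 2 * int(metrics["READ_PAIR_DUPLICATES"]))
    examined = (int(metrics["UNPAIRED_READS_EXAMINED"])
                + 2 * int(metrics["READ_PAIRS_EXAMINED"]))
    return dups / examined
```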

#### How to Run this Workflow in FireCloud

• Access the Broad_MutationCalling_QC_Workflow_V1_BestPractice workspace to run this workflow.
• Navigate to the Method Configurations tab and click on the method, MutationCalling_QC.
• Click Launch Analysis.
• In the Launch Analysis window, toggle to pair and select a pair on which to run this workflow, e.g., HCC1143. You can also run this workflow on a pair set. To do so, toggle to pair_set, select HCC_pairs and enter this.pairs in the Define Expression field.
• Finally, click the Launch button. Check back on the Monitor tab later to view results from your workflow analysis.
• You can view results and outputs by clicking on the most recent analysis run, e.g., HCC1143.
• Click on Outputs: Show, then select output files to view the results of this analysis.


Created 2016-04-02 04:24:40 | Updated | Tags: mutect panel-of-normals mutect2 somatic pon

The Panel of Normals (PoN) plays two important roles in somatic variant analysis:

1. Exclude germline variant sites that are found in the normals to avoid calling them as potential somatic variants in the tumor;
2. Exclude technical artifacts that arise from particular techniques (e.g., sample preservation) and technologies (e.g., library capture, sequencing chemistry).

Given these roles, the most important selection criteria are the technical properties of how the normal data was generated. It's very important to use normals that are as technically similar as possible to the tumor. Also, the samples should come from subjects who were young and healthy (to minimize the chance of using, as a normal, a sample from someone with an undiagnosed tumor).

If possible, it is better to use normals generated from the same type of tissue, because if the tissues were preserved differently, the artifact patterns may be different.
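Both roles come down to the same mechanism: a site that shows up as a call across several normals is either germline or a recurrent technical artifact, so tumor calls there are suspect. The sketch below builds a panel from per-normal call sets and removes matching tumor calls. The site key `(chrom, pos, ref, alt)`, the `min_normals=2` threshold and the function names are assumptions for illustration, not GATK's implementation.

```python
from collections import Counter


def build_panel_of_normals(normal_call_sets, min_normals=2):
    """Return sites called in at least min_normals normals (threshold is an assumption)."""
    counts = Counter()
    for calls in normal_call_sets:
        counts.update(set(calls))  # count each site at most once per normal
    return {site for site, n in counts.items() if n >= min_normals}


def apply_panel(tumor_calls, panel):
    """Drop tumor calls at sites present in the panel of normals."""
    return [site for site in tumor_calls if site not in panel]
```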

Created 2016-01-26 17:15:19 | Updated 2016-04-26 19:25:41 | Tags: mutect oncotator contest

### Overview

This "Mini" Mutation Calling Tutorial includes a subset of tools in our complete Broad Mutation Calling Workflow. It contains ContEst, MuTect, and Oncotator tools. When run on "mini" tumor and cell line BAMs (containing only 100 genes), the expected runtime is roughly 30 minutes.

ContEst

ContEst estimates contamination levels in next-generation sequencing data. It uses a Bayesian approach to calculate the posterior probability of the contamination level and determine the maximum a posteriori probability (MAP) estimate of the contamination level.

MuTect

MuTect identifies somatic point mutations in next-generation sequencing data of cancer genomes. It takes sequencing data for matched normal and tumor tissue samples as input, and outputs mutation calls and optional coverage results.

In a nutshell, the analysis itself consists of three steps:

1. Pre-process the aligned reads in the tumor and normal sequencing data

2. Identify, using statistical analysis, sites that are likely to carry somatic mutations with high confidence

3. Post-process the candidate somatic mutations

For complete details, please see the 2013 publication in Nature Biotechnology.

Oncotator

Oncotator is a tool for annotating information onto genomic point mutations (SNPs/SNVs) and indels. It is primarily used for human genome variant callsets. However, the tool can also be used to annotate any kind of information onto variant callsets from any organism.

### Method Flow

Below is an overview of the individual tools within the Broad Mutation Calling Workflow.

What does ContEst do?

ContEst uses a Bayesian approach to calculate the posterior probability of the contamination level and determine the maximum a posteriori probability (MAP) estimate of the contamination level.

ContEst supports an array-free mode, in which genotypes are called on the fly from the matched normal and used as the source of homozygous variant calls. It currently treats any site with at least 50X coverage where more than 80% of the bases support the alternate allele as a homozygous alternate site.

What does MuTect do?

Pre-process the aligned reads in the tumor and normal sequencing data.

In this step MuTect ignores reads with too many mismatches or very low quality scores, since such reads introduce more noise than signal.

Identify, using statistical analysis, sites that are likely to carry somatic mutations with high confidence.

The statistical analysis predicts a somatic mutation by using two Bayesian classifiers – the first aims to detect whether the tumor is non-reference at a given site and, for those sites that are found as non-reference, the second classifier makes sure the normal does not carry the variant allele. In practice the classification is performed by calculating a LOD score (log odds) and comparing it to a cutoff determined by the log ratio of prior probabilities of the considered events. For more information, refer to the MuTect Cancer Genome Analysis page.
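The two classifiers can be sketched as log-odds comparisons. Each read contributes a base and a Phred quality. A base matches its true allele with probability 1 − e and each other base with probability e/3, where e = 10^(−q/10). The tumor LOD compares a variant at the observed allele fraction against the reference-only model. The normal LOD compares reference-only against a heterozygous (f = 0.5) variant. The cutoffs 6.3 and 2.2 are commonly cited defaults for the original MuTect and are assumptions here. The real tool derives its thresholds from prior probabilities as described above, and the function names are hypothetical.

```python
import math

TUMOR_LOD_CUTOFF = 6.3   # assumed cutoff
NORMAL_LOD_CUTOFF = 2.2  # assumed cutoff


def p_base(base, allele, qual):
    """P(observed base | true allele) given a Phred base quality."""
    e = 10 ** (-qual / 10)
    return 1 - e if base == allele else e / 3


def log10_lik(reads, ref, alt, f):
    """log10 P(reads | mixture of alt at fraction f and ref at 1 - f)."""
    return sum(
        math.log10(f * p_base(b, alt, q) + (1 - f) * p_base(b, ref, q))
        for b, q in reads
    )


def tumor_lod(reads, ref, alt):
    f_hat = sum(1 for b, _ in reads if b == alt) / len(reads)
    return log10_lik(reads, ref, alt, f_hat) - log10_lik(reads, ref, alt, 0.0)


def normal_lod(reads, ref, alt):
    return log10_lik(reads, ref, alt, 0.0) - log10_lik(reads, ref, alt, 0.5)


def is_candidate(tumor_reads, normal_reads, ref, alt):
    """Non-reference in the tumor and confidently reference in the normal."""
    return (tumor_lod(tumor_reads, ref, alt) > TUMOR_LOD_CUTOFF
            and normal_lod(normal_reads, ref, alt) > NORMAL_LOD_CUTOFF)
```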

Post-processing of candidate somatic mutations

This step aims to eliminate artifacts of next-generation sequencing, short-read alignment and hybrid capture. For example, sequence context can cause spurious alternate alleles that often appear in reads from only a single direction (strand). Therefore, MuTect tests whether the alternate alleles supporting the mutations are observed in both directions.
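The simplest form of this check counts alt-supporting reads per strand. The sketch below is a simplified stand-in, not MuTect's actual strand-bias filter, which is statistical. It passes a site only if each strand contributes at least `min_per_strand` alt reads, a hypothetical parameter.

```python
def alt_on_both_strands(alt_read_is_reverse, min_per_strand=1):
    """alt_read_is_reverse holds one bool per alt-supporting read (True = reverse)."""
    reverse = sum(alt_read_is_reverse)
    forward = len(alt_read_is_reverse) - reverse
    return forward >= min_per_strand and reverse >= min_per_strand
```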

What does Oncotator do?

Oncotator annotates information onto genomic point mutations (SNPs/SNVs) and indels.

By default, Oncotator uses a simple TSV file (e.g., MAFLITE) as an input and produces a TCGA MAF as an output. Oncotator also supports VCF files as an input and output format.

Oncotator can also be configured to produce HTML reports for the annotated genomic data. In this BasicSomaticMutationCalling workflow, Oncotator writes an HTML report that appears in the workspace's Data tab.

### Inputs and Outputs

Below are the tool-specific inputs and outputs for this workflow.

ContEst Inputs

• normalBamHG19

• normalBamIndexHG19

• tumorBamHG19

• tumorBamIndexHG19

• ReferenceFasta

• ContESTIntervals

• HapMapVCF

• SNP6Bed

ContEst Outputs

MuTect Inputs

• normalBamHG19

• normalBamIndexHG19

• tumorBamHG19

• tumorBamIndexHG19

• ReferenceFasta

• HapMapVCF

• ReferenceFastaIndex

• COSMICVCF

• DBSNPVCF

• MutectIntervals

MuTect Outputs

Oncotator Inputs

• OncoVCF

Oncotator Outputs

• SAMPLE.vcf

• oncotator.log

• oncotator_out.html

### How to run this workflow in FireCloud

1. Access the broad-firecloud-tutorials/MiniMutationCalling_V1_Tutorial workspace to run this workflow.

2. Navigate to the Method Configurations tab and click on the method, MiniMutationCalling.

3. Click Launch Analysis.

4. In the Launch Analysis window, toggle to pair and select a pair on which to run this workflow, e.g., HCC1143_pair_100_gene_250bp_pad. You can also run this workflow on a pair set by toggling to pair_set. Note: You must then type this.pairs in the Define Expression field.

5. Finally, click the Launch button. Check back on the Monitor tab after 30 minutes or so to view results from your workflow analysis.

6. When the status displays Done, click on the most recent analysis run to view outputs and results, e.g., HCC1143_pair_100_gene_250bp_pad (pair).

7. Click on Outputs: Show, then select output files to view the results of this analysis.

8. You can also view the Oncotator HTML report as an attribute in the Data tab.

### Use of Tutorial Workspaces

Tutorial workspaces contain open access data and workflows. These Tutorial workspaces will appear in your account upon registration and will be active for 30 days.

Tutorial workspaces may ONLY be used for running the tutorial exercises. You must not upload your own data sets to these workspaces, nor should you add tools (Method Configs). If you do not follow these guidelines, your FireCloud account may be deactivated.


Created 2015-11-25 07:37:00 | Updated 2015-11-25 14:21:18 | Tags: haplotypecaller release mutect version-highlights topstory mutect2

The last GATK 3.x release of the year 2015 has arrived!

The major feature in GATK 3.5 is the eagerly awaited MuTect2 (beta version), which brings somatic SNP and Indel calling to GATK. This is just the beginning of GATK’s scope expansion into the somatic variant domain, so expect some exciting news about copy number variation in the next few weeks! Meanwhile, more on MuTect2 awesomeness below.

In addition, we’ve got all sorts of variant context annotation-related treats for you in the 3.5 goodie bag -- both new annotations and new capabilities for existing annotations, listed below.

In the variant manipulation space, we enhanced or fixed functionality in several tools including LeftAlignAndTrimVariants, FastaAlternateReferenceMaker and VariantEval modules. And in the variant calling/genotyping space, we’ve made some performance improvements across the board to HaplotypeCaller and GenotypeGVCFs (mostly by cutting out crud and making the code more efficient) including a few improvements specifically for haploids. Read the detailed release notes for more on these changes. Note that GenotypeGVCFs will now emit no-calls at sites where RGQ=0 in acknowledgment of the fact that those sites are essentially uncallable.

We’ve got good news for you if you’re the type who worries about disk space (whether by temperament or by necessity): we finally have CRAM support -- and some recommendations for keeping the output of BQSR down to reasonable file sizes, detailed below.

Finally, be sure to check out the detailed release notes for the usual variety show of minor features (including a new Queue job runner that enables local parallelism), bug fixes and deprecation notices (a few tools have been removed from the codebase, in the spirit of slimming down ahead of the holiday season).

### Introducing MuTect2 (beta): calling somatic SNPs and Indels natively in GATK

MuTect2 is the next-generation somatic SNP and indel caller that combines the DREAM challenge-winning somatic genotyping engine of the original MuTect with the assembly-based machinery of HaplotypeCaller.

The original MuTect (Cibulskis et al., 2013) was built on top of the GATK engine by the Cancer Genome Analysis group at the Broad Institute, and was distributed as a separate package. By all accounts it did a great job calling somatic SNPs, and was part of the winning entries for multiple DREAM challenges (including some submitted by groups outside the Broad). However, it was not able to call indels, and the less said about the indel caller that accompanied it (first named SomaticIndelDetector, then Indelocator) the better.

This new incarnation of MuTect leverages much of the HaplotypeCaller’s internal machinery (including the all-important graph assembly bit) to call both SNPs and indels together. Yet it retains key parts of the original MuTect’s internal genotyping engine that allow it to model somatic variation appropriately. This is a major differentiation point compared to HaplotypeCaller, which has expectations about ploidy and allele frequencies that make it unsuitable for calling somatic variants.

As a convenience add-on to MuTect2, we also integrated the cross-sample contamination estimation tool ContEst into GATK 3.5. Note that while the previous public version of this tool relied on genotyping chip data for its operation, this version of the tool has been upgraded to enable on-the-fly genotyping for the case where genotyping data is not available. Documentation of this feature will be provided in the near future. Both MuTect2 and ContEst are now featured in the Tool Documentation section of the Guide. Stay tuned for pipeline-level documentation on performing somatic variant discovery, to be added to the Best Practices docs in the near future.

Please note that this release of MuTect2 is a beta version intended for research purposes only and should not be applied in production/clinical work. MuTect2 has not yet undergone the same degree of scrutiny and validation as the original MuTect since it is so new. Early validation results suggest that MuTect2 has a tendency to generate more false positives as compared to the original MuTect; for example, it seems to overcall somatic mutations at low allele frequencies, so for now we recommend applying post-processing filters, e.g. by hard-filtering calls with low minor allele frequencies. Rest assured that data is being generated and the tools are being improved as we speak. We’re also looking forward to feedback from you, the user community, to help us make it better faster.
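A post-processing hard filter of this kind can be as simple as computing the tumor's alt allele fraction from the standard AD (allelic depths) field and dropping calls below a cutoff. The sketch below assumes an AD value in the tumor sample column. Check which sample column is the tumor in your VCF header. `min_af=0.05` is a placeholder, not a recommended value, and the function names are hypothetical.

```python
def alt_allele_fraction(ad):
    """Alt fraction from a VCF AD value such as '95,5' (reference count first)."""
    counts = [int(x) for x in ad.split(",")]
    total = sum(counts)
    return sum(counts[1:]) / total if total else 0.0


def passes_min_af(vcf_line, tumor_column, min_af=0.05):
    """Keep a call only if the tumor sample's alt fraction reaches min_af.

    tumor_column is the 0-based index of the tumor among the sample columns.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    values = fields[9 + tumor_column].split(":")
    sample = dict(zip(keys, values))
    return alt_allele_fraction(sample["AD"]) >= min_af
```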

Finally, note also that MuTect2 is distributed under the same restricted license as the original MuTect; for-profit users are required to seek a license to use it (please email softwarelicensing@broadinstitute.org). To be clear, while MuTect2 is released as part of GATK, the commercial licensing has not been consolidated under a single license. Therefore, current holders of a GATK license will still need to contact our licensing office if they wish to use MuTect2.

### Annotate this: new and improved variant context annotations

Whew, that was a long wall of text on MuTect2, wasn’t it? Let’s talk about something else now. Annotations! Not functional annotations, mind you -- we’re not talking about e.g. predicting synonymous vs. non-synonymous mutations here. I mean variant context annotations, i.e. all those statistics calculated during the variant calling process which we mostly use to estimate how confident we are that the variants are real vs. artifacts (for filtering and related purposes).

So we have two new annotations, BaseCountsBySample (what it says on the can) and ExcessHet (for excess heterozygosity, i.e. the number of heterozygote calls made in excess of the Hardy-Weinberg expectations), as well as a set of new annotations that are allele-specific versions of existing annotations (with AS_ prefix standing for Allele-Specific) which you can browse here. Right now we’re simply experimenting with these allele-specific annotations to determine what would be the best way to make use of them to improve variant filtering. In the meantime, feel free to play around with them (via e.g. VariantsToTable) and let us know if you come up with any interesting observations. Crowdsourcing is all the rage, let’s see if it gets us anywhere on this one!
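The quantity behind ExcessHet can be made concrete with genotype counts at one biallelic site. Estimate the allele frequency p, compute the Hardy-Weinberg expected heterozygote count 2p(1 − p)·n, and subtract it from the observed count. Note that the ExcessHet annotation itself reports a Phred-scaled p-value from an exact test of excess heterozygosity, not this raw difference. The sketch only illustrates the idea, and the function name is hypothetical.

```python
def excess_heterozygotes(n_hom_ref, n_het, n_hom_alt):
    """Observed minus Hardy-Weinberg-expected heterozygote count at one biallelic site."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    expected_het = 2 * p * (1 - p) * n
    return n_het - expected_het
```

A site where every sample is called heterozygous (`excess_heterozygotes(0, 100, 0)`) shows 50 more hets than expected, a classic signature of mapping artifacts such as collapsed paralogs.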

We also made some improvements to the StrandAlleleCountsBySample annotation, to how VQSR handles MQ, and to how VariantAnnotator makes use of external resources -- and we fixed that annoying bug where default annotations were getting dropped. All of which you can read about in the detailed release notes.

### These Three Awesome File Hacks Will Restore Your Faith In Humanity’s Ability To Free Up Some Disk Space

CRAM support! Long-awaited by many, lovingly implemented by Vadim Zalunin at EBI and colleagues at the Sanger Institute. We haven’t done extensive testing, and there are a few tickets for improvements that are planned at the htsjdk level -- but it works well enough that we’re comfortable releasing it under a beta designation. Meaning have fun with it, but do your own thorough testing before putting it into production or throwing out your old BAMs!

Static binning of base quality scores. In a nutshell, binning (or quantizing) the base qualities in a BAM file means that instead of recording all possible quality values separately, we group them into bins represented by a single value (by default, 10, 20, 30 or 40). By doing this we end up having to record fewer separate numbers, which through the magic of BAM compression yields substantially smaller files. The idea is that we don’t actually need to be able to differentiate between quality scores at a very high resolution -- if the binning scheme is set up appropriately, it doesn’t make any difference to the variant discovery process downstream. This is not a new concept, but now the GATK engine has an argument to enable binning quality scores during the base recalibration (BQSR) process using a static binning scheme that we have determined produces optimal results in our hands. The level of compression is of course adjustable if you’d like to set your own tradeoff between compression and base quality resolution. We have validated that this type of binning (with our chosen default parameters) does not have any noticeable adverse effect on germline variant discovery. However, we are still looking into some possible effects on somatic variant discovery, so we can’t yet recommend binning for that application.
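To see why fewer distinct values compress better, the sketch below maps qualities onto the default bins 10, 20, 30 and 40 and compares zlib-compressed sizes of raw versus binned quality strings. Nearest-bin rounding (ties going to the lower bin) and the Phred+33 encoding are assumptions about the scheme. The real binning is done by the GATK engine argument during BQSR, and the function names are hypothetical.

```python
import random
import zlib

DEFAULT_BINS = (10, 20, 30, 40)


def quantize(q, bins=DEFAULT_BINS):
    """Map a base quality to the nearest bin value (ties go to the lower bin)."""
    return min(bins, key=lambda b: (abs(b - q), b))


def compressed_sizes(quals, bins=DEFAULT_BINS):
    """zlib-compressed sizes of raw vs. binned qualities, encoded as Phred+33."""
    raw = bytes(q + 33 for q in quals)
    binned = bytes(quantize(q, bins) + 33 for q in quals)
    return len(zlib.compress(raw)), len(zlib.compress(binned))


def demo(n=10_000, seed=0):
    """Compare sizes on seeded, uniformly random qualities between 2 and 41."""
    rng = random.Random(seed)
    quals = [rng.randint(2, 41) for _ in range(n)]
    return compressed_sizes(quals)
```

Uniform random qualities are a worst case for the raw encoding. Real quality strings are more skewed, but the same effect applies: four symbols carry far less entropy than forty.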

Disable indel quality scores. The Base Recalibration process produces indel quality scores in addition to the regular base qualities. They are stored in the BI and BD tags of the read records, taking up a substantial amount of space in the resulting BAM files. There has been a lot of discussion about whether these indel quals are worth the file size inflation. Well, we’ve done a lot of testing and we’ve now decided that no, for most use cases the indel quals don’t make enough of a difference to justify the extra file size. The one exception is PacBio data, where indel quals may help model the indel-related errors of that technology. But for the rest, we’re now comfortable recommending the use of the --disable_indel_quals argument when writing out the recalibrated BAM file with PrintReads.
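To make the BI and BD tags concrete, the sketch below removes them from one SAM text record. Optional fields follow the 11 mandatory columns as `TAG:TYPE:VALUE`. This is only an illustration of what those tags are. With GATK you would use --disable_indel_quals when writing the recalibrated BAM rather than post-processing. `strip_indel_quals` is a hypothetical helper.

```python
def strip_indel_quals(sam_line):
    """Drop BI/BD optional fields from one SAM text record."""
    fields = sam_line.rstrip("\n").split("\t")
    mandatory, optional = fields[:11], fields[11:]
    kept = [f for f in optional if not f.startswith(("BI:", "BD:"))]
    return "\t".join(mandatory + kept)
```

Each tag stores one character per base, so on 100 bp reads the two tags together add roughly 200 bytes per record before compression.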

Created 2015-11-25 07:10:45 | Updated 2016-02-17 06:37:17 | Tags: Promote haplotypecaller release-notes mutect gatk3 mutect2

GATK 3.5 was released on November 25, 2015. Itemized changes are listed below. For more details, see the user-friendly version highlights.

### New tools

• MuTect2: somatic SNP and indel caller based on HaplotypeCaller and the original MuTect.
• ContEst: estimation of cross-sample contamination (primarily for use in somatic variant discovery).
• GatherBqsrReports: utility to gather recalibration tables from scatter-parallelized BaseRecalibrator runs.

### Variant Context Annotations

• Added allele-specific version of existing annotations: AS_BaseQualityRankSumTest, AS_FisherStrand, AS_MappingQualityRankSumTest, AS_RMSMappingQuality, AS_RankSumTest, AS_ReadPosRankSumTest, AS_StrandOddsRatio, AS_QualByDepth and AS_InbreedingCoeff.

• Added BaseCountsBySample annotation. Intended to provide insight into the pileup of bases used by HaplotypeCaller in the calling process, which may differ from the pileup observed in the original bam file because of the local realignment and additional filtering performed internally by HaplotypeCaller. Can only be requested from HaplotypeCaller, not VariantAnnotator.

• Added ExcessHet annotation. Estimates excess heterozygosity in a population of samples. Related to but distinct from InbreedingCoeff, which estimates evidence for inbreeding in a population. ExcessHet scales more reliably to large cohort sizes.

• Added FractionInformativeReads annotation. Reports the number of reads that were considered informative by HaplotypeCaller (over all samples).

• Enforced calculating GenotypeAnnotations before InfoFieldAnnotations. This ensures that the AD value is available to use in the QD calculation.

• Reorganized standard annotation groups processing to ensure that all default annotations always get annotated regardless of what is specified on the command line. This fixes a bug where default annotations were getting dropped when the command line included annotation requests.

• Made GenotypeGVCFs subset StrandAlleleCounts intelligently, i.e. subset the SAC values to the called alleles. Previously, when the StrandAlleleCountsBySample (SAC) annotation was present in GVCFs, GenotypeGVCFs carried it over to the final VCF essentially unchanged. This was problematic because SAC includes the counts for all alleles originally present (including NON-REF) even when some are not called in the final VCF. When the full list of original alleles is no longer available, parsing SAC could become difficult if not impossible.

• Added new MQ jittering functionality to improve how VQSR handles MQ. Note that HaplotypeCaller now calculates a new annotation called RAW_MQ per-sample, which is then integrated per-cohort by GenotypeGVCFs to produce the MQ annotation.

• VariantAnnotator can now annotate FILTER field from an external resource. Usage: --resource:foo resource.vcf --expression foo.FILTER

• VariantAnnotator can now check allele concordance when annotating with an external resource. Usage: --resourceAlleleConcordance

• Bug fix: The annotation framework was improved to allow for the collection of sufficient statistics during GVCF creation which are then used to compute the final annotation during the genotyping. This avoids the use of median as the representative annotation from the collection of values (one from each sample). TL;DR annotations will be more accurate when using the GVCF workflow for joint discovery.
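The SAC subsetting described above can be sketched in a few lines. We assume SAC lists a (forward, reverse) count pair per allele, reference first, matching the annotation's description of counts per strand for each allele including the reference. Subsetting then keeps the pairs belonging to the called alleles. `subset_sac` is a hypothetical helper, not GenotypeGVCFs code.

```python
def subset_sac(sac, called_allele_indices):
    """Keep the (forward, reverse) count pair for each called allele.

    sac is laid out allele by allele, reference first:
    [ref_fwd, ref_rev, alt1_fwd, alt1_rev, ...].
    """
    if len(sac) % 2 != 0:
        raise ValueError("SAC must contain a forward/reverse pair per allele")
    return [sac[2 * i + strand] for i in called_allele_indices for strand in (0, 1)]
```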

### Variant manipulation tools

• Allowed overriding hard-coded cutoff for allele length in ValidateVariants and in LeftAlignAndTrimVariants. Usage: --reference_window_stop N where N is the desired cutoff.

• Also in LeftAlignAndTrimVariants, trimming multiallelic alleles is now the default behavior.

• Fixed ability to mask out snps with --snpmask in FastaAlternateReferenceMaker.

• Also in FastaAlternateReferenceMaker, fixed the merging of contiguous intervals and made the tool produce more informative contig names.

• Fixed a bug in CombineVariants that occurred when one record has a spanning deletion and needs a padded reference allele.

• Added a new VariantEval evaluation module, MetricsCollection, that summarizes metrics from several EV modules.

• Enabled family-level stratification in MendelianViolationEvaluator of VariantEval (if a ped file is provided), making it possible to count Mendelian violations for each family in a callset with multiple families.

• Added the ability to SelectVariants to enforce 4.2 version output of the VCF spec when processing older files. Use case: the 4.2 spec specifies that GQ must be an integer; by default we don’t enforce it (so if reading an older file that used decimals, we don’t change it) but the new argument --forceValidOutput converts the values on request. Not made default because of some performance slowdown -- so writing VCFs is now fast by default, compliant by choice.

• Improved VCF sequence dictionary validation. Note that as a side effect of the additional checks, some users have experienced an error that starts with "ERROR MESSAGE: Lexicographically sorted human genome sequence detected in variant." that is due to unintentional activation of a check that is not necessary. This will be fixed in the next release; in the meantime -U ALLOW_SEQ_DICT_INCOMPATIBILITY can be used (with caution) to override the check.
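The --forceValidOutput change described above boils down to rewriting one FORMAT value per sample. The Python sketch below illustrates this under stated assumptions; it is not the SelectVariants code. It handles a single sample column. The half-up rounding rule is a guess, since the release note does not say how decimal GQ values are converted. The function name is hypothetical.

```python
import math


def force_integer_gq(format_field: str, sample_field: str) -> str:
    """Rewrite a decimal GQ value in one sample column as an integer.

    Assumption: values are rounded half-up; missing values ('.') and
    truncated sample columns are left untouched.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "GQ" in keys:
        i = keys.index("GQ")
        if i < len(values) and values[i] not in ("", "."):
            values[i] = str(int(math.floor(float(values[i]) + 0.5)))
    return ":".join(values)


print(force_integer_gq("GT:AD:DP:GQ:PL", "0/1:10,5:15:35.7:40,0,120"))
# 0/1:10,5:15:36:40,0,120
```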

### GVCF tools

• Various improvements to the tools’ performance, especially HaplotypeCaller, by making the code more efficient and cutting out crud.

• GenotypeGVCFs now emits a no-call (./.) when the evidence is too ambiguous to make a call at all (e.g. all the PLs are zero). Previously this would have led to a hom-ref call with RGQ=0.

• Fixed a bug in GenotypeGVCFs that sometimes generated invalid VCFs for haploid callsets. The tool was carrying over the AD from alleles that had been trimmed out, causing field length mismatches.

• Changed the genotyping implementation for haploid organisms to address performance problems reported when running GenotypeGVCFs on haploid callsets. Note that this change may lead to a slight loss of sensitivity at low-coverage sites -- let us know if you observe anything dramatic.
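The new no-call rule in GenotypeGVCFs can be illustrated with a small diploid sketch. It is a simplification that assumes VCF-spec PL ordering and simply picks the genotype with the lowest PL. The real genotyper also deals with ploidy, priors and GQ. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def diploid_genotypes(n_alleles: int) -> List[Tuple[int, int]]:
    """Genotypes in VCF PL order: (j, k) with j <= k sits at index k*(k+1)/2 + j."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def call_from_pls(pls: Sequence[int], n_alleles: int) -> str:
    """Pick the most likely diploid genotype, or a no-call if the PLs carry no information."""
    genotypes = diploid_genotypes(n_alleles)
    if len(pls) != len(genotypes):
        raise ValueError(f"expected {len(genotypes)} PLs, got {len(pls)}")
    if all(pl == 0 for pl in pls):
        return "./."  # every genotype equally likely: emit a no-call rather than 0/0
    best = min(range(len(pls)), key=lambda i: pls[i])  # ties go to the first genotype
    j, k = genotypes[best]
    return f"{j}/{k}"


print(call_from_pls([0, 0, 0], 2))     # ./.
print(call_from_pls([120, 0, 80], 2))  # 0/1
```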

### Genotyping engine tweaks

• Ensured inputPriors get used if they are specified to the genotyper (previously they were ignored). Also improved docs on --heterozygosity and --indel_heterozygosity priors.

• Fixed bug that affected the --ignoreInputSamples behavior of CalculateGenotypePosteriors.

• Limited emission of the scary warning message about max number of alleles (“this tool is set to genotype at most x alleles but we found more; only x will be used”) to a single occurrence unless DEBUG logging mode is activated. Otherwise it fills up our output logs.
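The "warn once unless DEBUG" behavior described above maps naturally onto standard logging levels. In this sketch, the first occurrence is logged as a WARNING and repeats are logged at DEBUG, so they only appear when DEBUG logging is enabled. WarnOnce is a hypothetical Python helper, not GATK's logging code.

```python
import logging
from typing import Set


class WarnOnce:
    """Log each distinct warning once at WARNING level; repeats go to DEBUG only."""

    def __init__(self, logger: logging.Logger) -> None:
        self.logger = logger
        self.seen: Set[str] = set()

    def warn(self, key: str, message: str) -> None:
        if key not in self.seen:
            self.seen.add(key)
            self.logger.warning(message)
        else:
            self.logger.debug(message)  # visible only when DEBUG logging is enabled


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    warner = WarnOnce(logging.getLogger("genotyper"))
    for _ in range(3):  # printed once at INFO level
        warner.warn("max_alleles", "this tool is set to genotype at most x alleles "
                    "but we found more; only x will be used")
```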

### Miscellaneous tool fixes

• Added option to OverclippedReadFilter to not require soft-clips on both ends. Contributed by Jacob Silterra.

• Fixed a bug in IndelRealigner where the tool was incorrectly "fixing" mates when supplementary alignments are present. The patch involves ignoring supplementary alignments.

• Fixed a bug in CatVariants. Previously, VCF files were being sorted solely on the base pair position of the first record, ignoring the chromosome. This can become problematic when merging files from different chromosomes, especially if you have multiple VCFs per chromosome. Contributed by John Wallace.
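The CatVariants fix amounts to ordering input files by contig first and position second. Sorting on position alone would put a chr2 file starting at position 100 ahead of a chr1 file starting at position 200. The Python sketch below assumes each input is coordinate-sorted and that the contig order comes from the reference sequence dictionary. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def first_record(vcf_lines: Iterable[str]) -> Tuple[str, int]:
    """Return (CHROM, POS) of the first data line, skipping '#' header lines."""
    for line in vcf_lines:
        if line and not line.startswith("#"):
            chrom, pos = line.split("\t")[:2]
            return chrom, int(pos)
    raise ValueError("VCF has no data records")


def order_vcfs(first_records: Dict[str, Tuple[str, int]], contig_order: List[str]) -> List[str]:
    """Sort file names by (contig rank in the reference dictionary, position)."""
    rank = {contig: i for i, contig in enumerate(contig_order)}
    return sorted(first_records,
                  key=lambda f: (rank[first_records[f][0]], first_records[f][1]))


firsts = {"b.vcf": ("chr2", 100), "a.vcf": ("chr1", 5000), "c.vcf": ("chr1", 200)}
print(order_vcfs(firsts, ["chr1", "chr2"]))  # ['c.vcf', 'a.vcf', 'b.vcf']
```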

### Engine-level behaviors and capabilities

• Support for reading and writing CRAM files. Some improvements are still expected in htsjdk. Contributed by Vadim Zalunin at EBI and collaborators at the Sanger Institute.

• Made interval-list output format dependent on the file extension (for RealignerTargetCreator). If the extension is .interval_list, output will be formatted as a proper Picard interval list (with sequence dictionary). Otherwise it will be a basic GATK interval list as previously.

• Added static binning capability for base recalibration (BQSR).
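The extension-based switch for RealignerTargetCreator output can be sketched in Python. The Picard layout shown is the general interval_list convention: an @SQ sequence-dictionary header, then tab-separated contig, 1-based start, inclusive end, strand and name. The '+' strand and '.' name written for every target are placeholder assumptions, and so is the omission of an @HD line. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # contig, 1-based start, 1-based inclusive end


def render_intervals(path: str, intervals: Sequence[Interval],
                     seq_dict: Sequence[Tuple[str, int]]) -> str:
    """Choose the output layout from the file extension."""
    if path.endswith(".interval_list"):
        # Picard-style: @SQ header lines, then contig, start, end, strand, name.
        lines: List[str] = [f"@SQ\tSN:{name}\tLN:{length}" for name, length in seq_dict]
        lines += [f"{c}\t{s}\t{e}\t+\t." for c, s, e in intervals]
    else:
        # Basic GATK interval list: one contig:start-end per line.
        lines = [f"{c}:{s}-{e}" for c, s, e in intervals]
    return "\n".join(lines) + "\n"


print(render_intervals("targets.intervals", [("chr1", 100, 200)], [("chr1", 1000)]), end="")
# chr1:100-200
```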

### Queue

• Added a new JobRunner called ParallelShell that will run jobs locally on one node concurrently as specified by the DAG, with the option to limit the maximum number of concurrently running jobs using the flag maximumNumberOfJobsToRunConcurrently. Contributed by Johan Dahlberg.

• Updated extension for Picard CalculateHsMetrics to include PER_TARGET_COVERAGE argument and added extension for Picard CollectWgsMetrics.
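ParallelShell runs jobs on one node as the DAG allows, capped by maximumNumberOfJobsToRunConcurrently. A small thread-pool scheduler can illustrate the idea. This is a hypothetical Python sketch, not the Queue implementation: jobs are plain callables rather than shell commands, and the cap is the pool size.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set


def run_dag(jobs: Dict[str, Callable[[], None]],
            deps: Dict[str, Set[str]],
            max_concurrent: int) -> List[str]:
    """Run each job once its dependencies finish, at most max_concurrent at a time.

    Returns job names in completion order.
    """
    waiting = {name: set(deps.get(name, set())) for name in jobs}
    running: Dict[Future, str] = {}
    finished_order: List[str] = []
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        while waiting or running:
            # Submit every job whose dependencies are all satisfied; the pool
            # size keeps no more than max_concurrent of them running at once.
            for name in [n for n, d in waiting.items() if not d]:
                del waiting[name]
                running[pool.submit(jobs[name])] = name
            if not running:
                raise ValueError(f"unsatisfiable dependencies: {sorted(waiting)}")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise a failed job's exception
                finished_order.append(name)
                for pending in waiting.values():
                    pending.discard(name)
    return finished_order
```

For example, `run_dag({"a": f, "b": f, "c": f, "d": f}, {"d": {"a", "b", "c"}}, 2)` runs a, b and c two at a time and starts d only after all three finish.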

### Deprecation notice

Removed:

• BeagleOutputToVCF, VariantsToBeagleUnphased, ProduceBeagleInput. These are tools for handling Beagle data. The latest versions of Beagle support VCF input and output, so there is no longer any reason for us to provide converters.
• ReadAdaptorTrimmer and VariantValidationAssessor. These were experimental tools which we think are not useful and not operating on a sufficiently sound basis.
• BaseCoverageDistribution and CoveredByNSamplesSites. These tools were redundant with DiagnoseTargets and/or DepthOfCoverage.
• LiftOverVariants, FilterLiftedVariants and liftOverVCF.pl. The Picard liftover tool LiftoverVCF works better and is easier to operate.
• sortByRef.pl. Use Picard SortVCF instead.
• ListAnnotations. This was intended as a utility for listing annotations easily from command line, but it has not proved useful.

### Meta

• Moved htsjdk & picard to version 1.141

Created 2016-04-22 11:36:57 | Updated | Tags: mutect

Dear all, we currently use GATK 3.5 with MuTect2 to identify somatic variants. The output lists both FoxoG and t_LOD scores. However, I see a predominance of C>A mutations (approx. 70%) and wonder whether the variants have already been filtered for oxidation-induced changes as described by Costello et al. (PMID 23303777). When I apply the threshold -10 + (100/3) * FoxoG to the C>A variants, approximately half have t_LOD values above it (i.e. the "good ones") and the other half fall below it (the putative artifacts). So I suppose that this filtering has not been done; is that right? Thanks for your help, Stefan
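The check described in this post can be reproduced in a few lines. The Python sketch below applies the cutoff exactly as quoted in the post, comparing t_LOD against -10 + (100/3) * FoxoG. It does not describe what MuTect2 itself does. Treating G>T alongside C>A is an assumption, and so is counting values exactly on the cutoff as artifacts. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Call = Tuple[str, str, float, float]  # (ref, alt, t_LOD, FoxoG)

# Substitution classes screened for oxoG artifacts. Including G>T (the reverse
# complement of C>A) is an assumption; the post only discusses C>A.
OXOG_CLASSES = {("C", "A"), ("G", "T")}


def oxog_cutoff(fox_og: float) -> float:
    """t_LOD cutoff as quoted in the post: -10 + (100/3) * FoxoG."""
    return -10.0 + (100.0 / 3.0) * fox_og


def split_candidates(calls: Iterable[Call]) -> Tuple[List[Call], List[Call]]:
    """Return (kept, flagged). A call is flagged if it belongs to an oxoG class
    and its t_LOD does not exceed the cutoff (ties count as artifacts here)."""
    kept: List[Call] = []
    flagged: List[Call] = []
    for call in calls:
        ref, alt, t_lod, fox_og = call
        if (ref, alt) in OXOG_CLASSES and t_lod <= oxog_cutoff(fox_og):
            flagged.append(call)
        else:
            kept.append(call)
    return kept, flagged


calls = [("C", "A", 25.0, 0.9), ("C", "A", 15.0, 0.9), ("C", "T", 5.0, 0.9), ("C", "A", 10.0, 0.5)]
kept, flagged = split_candidates(calls)
print(len(kept), len(flagged))  # 3 1
```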

Created 2016-03-13 02:54:27 | Updated | Tags: mutect

Hi, I've been calling somatic mutations with MuTect2 on WES data and have noticed that MuTect2 takes far too long, as shown below (sorry, I cannot upload pictures). Progress meter output (header printed at 15:10:32):

| Time | Location | Processed active regions | Elapsed time | Time per 1M active regions | Completed | Total runtime | Remaining runtime |
|---|---|---|---|---|---|---|---|
| 10:08:19,413 | chr21:47664676 | 2.60987991203E11 | 16.8 d | 5.0 s | 94.0% | 17.9 d | 25.9 h |
| 10:09:19,433 | chr21:47693250 | 2.60995921308E11 | 16.8 d | 5.0 s | 94.0% | 17.9 d | 25.9 h |
| 10:10:19,455 | chr21:47700345 | 2.60998197418E11 | 16.8 d | 5.0 s | 94.0% | 17.9 d | 25.9 h |
| 10:11:19,475 | chr21:47703946 | 2.60998767018E11 | 16.8 d | 5.0 s | 94.0% | 17.9 d | 25.9 h |
| 10:12:19,495 | chr21:47704320 | 2.60998767018E11 | 16.8 d | 5.0 s | 94.0% | 17.9 d | 25.9 h |

Is this normal? Am I missing something or misinterpreting the output?

My command: -T MuTect2 -R reference.fa -I:normal normal.bam -I:tumor tumor.bam --dbsnp dbsnp.vcf.gz --cosmic cosmic_v74.vcf.gz -L target.intervals -o result.vcf

Any comments and help greatly appreciated!

Thanks,

Lugeye

Created 2016-01-30 14:07:10 | Updated | Tags: mutect mutect2 germline

Hello all, being relatively new to NGS analysis techniques, I recently built a pipeline for calling somatic variants in matched tumor/normal cancer exome data.

I recently decided to analyze the "germline risk variants" as well. Currently I'm calling germline variants separately using HaplotypeCaller (HC), then hard-filtering them with the VariantFiltration tool (since I only have a few samples at the moment) and annotating the variants with Oncotator.

However, I realized that Mutect2-Oncotator output has a column titled "germline_risk", would this serve the same purpose? If so, would this mean the separate germline variant calling is useless?

Thanks in advance for the answers. Best, -E

Created 2016-01-18 08:20:49 | Updated | Tags: mutect short-read-preprocessing

Hi, I am using MuTect on paired tumor/normal exome data. MuTect pre-processes low-quality reads before somatic SNV discovery. How can I get the processed BAM files? I know the MuTect source code is on GitHub, but I am not sure which Java files are used in the preprocessing steps. Could you please tell me which source files handle the preprocessing of low-quality reads? Thanks very much for your time!

Best Jing

Created 2016-01-05 20:37:22 | Updated | Tags: mutect genotype vcf-file

Hello,

I have noticed that when running MuTect, the variant calls in my output always have a genotype of 0/1. I have not seen any 1/1 genotypes, even when the variant allele frequency is 100%.

Are there any instances when MuTect gives a 1/1 genotype for a variant call?

Thank you, Jeremy

Created 2015-12-11 09:52:54 | Updated | Tags: mutect

Hello, I would like to replace MuTect with MuTect2 in my analysis pipelines, but I need the information available in MuTect's extended text output. Is there an option in MuTect2 to output the same information?

thanks

Created 2015-11-25 00:07:21 | Updated | Tags: mutect

MuTect outputs only single point mutations. What about t_ins_count and t_del_count? Can I use them to identify indels in a sample?

• t_ins_count: count of insertion events at this locus in tumor

• t_del_count: count of deletion events at this locus in tumor

Created 2015-11-20 15:45:01 | Updated | Tags: mutect somatic-variants tumor-only

Hi,

I recently went to the workshop for variant calling and mentioned that I would like to perform somatic variant calling with Mutect using only tumor samples (no matched normal sample). I was told that there is a pipeline under development that is not yet fully tested that you would be able to provide. Would you be able to provide this along with any other recommendations?

Thank you!

Created 2015-11-12 22:35:00 | Updated | Tags: mutect perl

--best Jing

Created 2015-10-13 07:30:35 | Updated | Tags: mutect

Dear MuTect developers, when I use the following command to call SNPs/indels: java -Xmx1g -jar $mutect_bin/mutect-1.1.7.jar \ --analysis_type MuTect \ --reference_sequence $GATK_ref/2.8/ucsc.hg19.fasta \ --cosmic $GATK_ref/2.8/hg19_cosmic_v54_120711.vcf \ --dbsnp $GATK_ref/2.8/dbsnp_132_b37.leftAligned.vcf \ --intervals 1:1-249250621 \ --input_file:normal $normal/chr1.merged.uniqPairs.sort.dupMarked.addGR.order.realn.bam \ --input_file:tumor primary/chr1.merged.uniqPairs.sort.dupMarked.addGR.order.realn.bam \ --out chr1_n6Vp5_call_stats.txt \ --coverage_file chr1_n6Vp5_coverage.wig.txt

I get this problem: ##### ERROR ------------------------------------------------------------------------------------------ ##### ERROR stack trace java.lang.ExceptionInInitializerError at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.(GenomeAnalysisEngine.java:167) at org.broadinstitute.sting.gatk.CommandLineExecutable.(CommandLineExecutable.java:57) at org.broadinstitute.sting.gatk.CommandLineGATK.(CommandLineGATK.java:66) at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:106) Caused by: java.lang.NullPointerException at org.reflections.Reflections.scan(Reflections.java:220) at org.reflections.Reflections.scan(Reflections.java:166) at org.reflections.Reflections.(Reflections.java:94) at org.broadinstitute.sting.utils.classloader.PluginManager.(PluginManager.java:79) ... 4 more ##### ERROR ------------------------------------------------------------------------------------------ ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-0-g72492bb): ##### ERROR ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem. ##### ERROR If not, please post the error message, with stack trace, to the GATK forum. ##### ERROR Visit our website and forum for extensive documentation and answers to ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk ##### ERROR ##### ERROR MESSAGE: Code exception (see stack trace for error itself) ##### ERROR ------------------------------------------------------------------------------------------

I'm confused by this. Could you tell me how to fix it? Thank you!

Xu, ZhengZheng, Beijing Institute of Genomics, Chinese Academy of Sciences

Created 2015-10-08 09:18:10 | Updated | Tags: mutect

Hello, I read that a new version of MuTect that handles indels is coming. Would it be possible to know its release date? We are planning analyses for a new project and would like to know whether we can count on MuTect2. Thanks

Created 2015-09-16 09:11:43 | Updated 2015-09-16 09:13:37 | Tags: mutect gatk-protected

I'm trying to install MuTect. As directed in the README.md, I've git cloned gatk-protected and tried to run 'mvn -Ddisable.queue install', but I get the following issue. I have Java 1.7 and Maven 3.3.3.

[INFO] ------------------------------------------------------------- [WARNING] COMPILATION WARNING : [INFO] ------------------------------------------------------------- [WARNING] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/utils/threading/ThreadEfficiencyMonitor.java: Some input files use or override a deprecated API. [WARNING] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/utils/threading/ThreadEfficiencyMonitor.java: Recompile with -Xlint:deprecation for details. [WARNING] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/datasources/reads/SAMDataSource.java: Some input files use unchecked or unsafe operations.
[WARNING] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/datasources/reads/SAMDataSource.java: Recompile with -Xlint:unchecked for details. [WARNING] Some messages have been simplified; recompile with -Xdiags:verbose to get full output [INFO] 5 warnings [INFO] ------------------------------------------------------------- [INFO] ------------------------------------------------------------- [ERROR] COMPILATION ERROR : [INFO] ------------------------------------------------------------- [ERROR] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/walkers/annotator/interfaces/AnnotationInterfaceManager.java:[129,24] no suitable method found for add(java.lang.Object) method java.util.Collection.add(T) is not applicable (argument mismatch; java.lang.Object cannot be converted to T) method java.util.List.add(T) is not applicable (argument mismatch; java.lang.Object cannot be converted to T) [INFO] 1 error [INFO] ------------------------------------------------------------- [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] Sting Root ......................................... SUCCESS [ 0.455 s] [INFO] Sting Aggregator ................................... SUCCESS [ 0.185 s] [INFO] Sting GSALib ....................................... SUCCESS [ 0.447 s] [INFO] Sting Utils ........................................ SUCCESS [ 0.698 s] [INFO] GATK Framework ..................................... FAILURE [ 4.181 s] [INFO] GATK Protected ..................................... SKIPPED [INFO] GATK Package ....................................... SKIPPED [INFO] Sting Public ....................................... SKIPPED [INFO] Sting Protected .................................... 
SKIPPED [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 6.134 s [INFO] Finished at: 2015-09-16T14:27:14+05:30 [INFO] Final Memory: 44M/1583M [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (compile-java) on project gatk-framework: Compilation failure [ERROR] /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/walkers/annotator/interfaces/AnnotationInterfaceManager.java:[129,24] no suitable method found for add(java.lang.Object) [ERROR] method java.util.Collection.add(T) is not applicable [ERROR] (argument mismatch; java.lang.Object cannot be converted to T) [ERROR] method java.util.List.add(T) is not applicable [ERROR] (argument mismatch; java.lang.Object cannot be converted to T) [ERROR] -> [Help 1] org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (compile-java) on project gatk-framework: Compilation failure /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/walkers/annotator/interfaces/AnnotationInterfaceManager.java:[129,24] no suitable method found for add(java.lang.Object) method java.util.Collection.add(T) is not applicable (argument mismatch; java.lang.Object cannot be converted to T) method java.util.List.add(T) is not applicable (argument mismatch; java.lang.Object cannot be converted to T) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80) at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193) at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106) at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862) at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286) at org.apache.maven.cli.MavenCli.main(MavenCli.java:197) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) Caused by: org.apache.maven.plugin.compiler.CompilationFailureException: Compilation failure /home/krb/Ramani/MUTECT/gatk-protected/public/gatk-framework/src/main/java/org/broadinstitute/sting/gatk/walkers/annotator/interfaces/AnnotationInterfaceManager.java:[129,24] no suitable method found for add(java.lang.Object) method java.util.Collection.add(T) is not applicable (argument mismatch; java.lang.Object cannot be converted to T) method java.util.List.add(T) is not applicable (argument mismatch; java.lang.Object cannot be converted to T) at 
org.apache.maven.plugin.compiler.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:858) at org.apache.maven.plugin.compiler.CompilerMojo.execute(CompilerMojo.java:129) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) ... 20 more [ERROR] [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn <goals> -rf :gatk-framework

I'm not able to understand how to resolve the issue. Could anybody please help me with it?

Created 2015-09-11 15:35:09 | Updated 2015-09-11 15:35:29 | Tags: intervals mutect b37

I hate to post this same error on the GATK forum again, but I went through many of the similar errors already posted, and none of the answers shed light on my issue. My BAM files are aligned to GRCh37-lite, and I am using the same reference genome (downloaded from ftp://ftp.ncbi.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37/special_requests). Next, I performed GATK Best Practices pre-processing on these BAMs using the same reference genome, and no errors were thrown.

Currently I'm running MuTect as: java -Xmx56g -jar muTect-1.1.4.jar --analysis_type MuTect --reference_sequence ./resources/b37/human_g1k_v37.fasta --cosmic ./resources/Cosmic.b37.vcf --dbsnp ./resources/dbsnp_138.b37.vcf --intervals ./resources/mirna.1.5flank-interval-list.list --input_file:normal $normal.recal_reads.bam --input_file:tumor $tumor.recal_reads.bam --out $sample.call_stats.out --coverage_file $sample.coverage.wig.txt

And I get this error message: ##### ERROR MESSAGE: Badly formed genome loc: Contig 'chr1' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?

What other tests should I run to troubleshoot this issue? Also, I created the interval list from a .bed file, and I restricted my BAM files to the BED regions using the same file with the command "samtools view -@8 -b -h -L". This is the file I am most confused about: is it possible that it is causing the error? The first few lines of this file are: chr1:15869-18936 chr1:28866-32003 chr1:566205-569293 chr1:1100984-1104078 chr1:1101743-1104832 chr1:1102885-1105967 chr1:1229990-1233050 chr1:1246382-1249446 chr1:1273530-1276588 chr1:3043039-3046099 chr1:3475759-3478854 chr1:5622631-5625703 chr1:5921232-5924301 chr1:6488394-6491456 chr1:8925061-8928149 chr1:9210227-9213336 chr1:10025939-10029016 chr1:10286276-10289361 chr1:12087715-12090779

Thanks a ton for your help!

Created 2015-06-05 19:54:57 | Updated | Tags: mutect

Hi everyone, MuTect seems to report only somatic point mutations. Is there a way to get germline mutations as well? Thank you very much.

Created 2015-06-02 01:06:51 | Updated | Tags: mutect

Hi, do you know how MuTect handles BAM files with mixed paired and unpaired reads? I ask because I noticed that version 1.7 no longer reports "total-pairs" (it now reports "total-reads").

Created 2015-05-15 01:23:27 | Updated | Tags: mutect

I'm trying to build SomaticSpike, which is included in MuTect 1.1.4. Is there any way to actually build it from the GitHub repository? Is there perhaps a binary floating around?

Created 2015-04-30 18:56:39 | Updated | Tags: mutect

Hello, I am trying to run MuTect on a mitochondrial genome. However, mitochondrial ploidy is variable, so I was hoping to run it as haploid. Is that possible in MuTect? Any suggestions? Thanks, Ramiro

Created 2014-12-24 00:00:35 | Updated | Tags: mutect

I provided MuTect with an interval file (see below) that seems to be in the right format (@SQ headers, followed by lines with chromosome, coordinates, strand (+/-), name, etc.), and I get the following error message: ERROR MESSAGE: File associated with name target_intervals.reduced.withhead.interval_list.bed is malformed: Problem reading the interval file caused by ##### ERROR Line: @SQ SN:1 LN:249250621

The interval file has the form: @SQ SN:chr1 LN:249250621 @SQ SN:chr2 LN:243199373 @SQ SN:chr3 LN:198022430 @SQ SN:chr4 LN:191154276 @SQ SN:chr5 LN:180915260 @SQ SN:chr6 LN:171115067 @SQ SN:chr7 LN:159138663 @SQ SN:chr8 LN:146364022 @SQ SN:chr9 LN:141213431 @SQ SN:chr10 LN:135534747 @SQ SN:chr11 LN:135006516 @SQ SN:chr12 LN:133851895 @SQ SN:chr13 LN:115169878 @SQ SN:chr14 LN:107349540 @SQ SN:chr15 LN:102531392 @SQ SN:chr16 LN:90354753 @SQ SN:chr17 LN:81195210 @SQ SN:chr18 LN:78077248 @SQ SN:chr19 LN:59128983 @SQ SN:chr20 LN:63025520 @SQ SN:chr21 LN:48129895 @SQ SN:chr22 LN:51304566 @SQ SN:chrX LN:155270560 @SQ SN:chrY LN:59373566 chr1 66999814 67000061 + NM_032291_exon_0_10_chr1_66999825_f etc.

Are there header lines other than @SQ that are missing, given that the error message references the first line of the file?

Created 2014-12-18 19:57:32 | Updated 2014-12-18 19:58:07 | Tags: mutect variant-calling

Hi, we are working on a cancer genome project and use MuTect to call somatic mutations. We have a question about high-coverage depth filtering. Below is a list of nearby positions output by MuTect where the coverage in the tumor sample is very high but the coverage in the normal sample is not. Partial output from the *call_stats.out file:

| contig | position | t_q20_count | n_q20_count | failure_reasons | judgement |
|---|---|---|---|---|---|
| 2 | 89849000 | 622 | 108 |  | KEEP |
| 2 | 89850117 | 508 | 152 |  | KEEP |
| 2 | 89850498 | 713 | 732 |  | KEEP |
| 2 | 89850649 | 583 | 403 |  | KEEP |
| 2 | 89850993 | 849 | 142 |  | KEEP |
| 2 | 89872002 | 540 | 286 |  | KEEP |
| 2 | 89877959 | 607 | 259 |  | KEEP |

These positions are not filtered out by MuTect. Why is no high-depth filter applied? Would it make sense to filter these variants, and which high-depth threshold should we choose?

Regards, Abhimanyu Krishna

Created 2014-12-18 00:03:25 | Updated | Tags: mutect

I have been attempting to run MuTect on tumor/blood .bam files, and I encounter the following error message when using the Homo_sapiens.GRCh37.72.dna.fa FASTA as the reference file and b37_cosmic_v54_120711.vcf as the "cosmic" reference: ERROR MESSAGE: Input files /b37_cosmic_v54_120711.vcf and reference have incompatible contigs: No overlapping contigs found. ##### ERROR /b37_cosmic_v54_120711.vcf contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24] ##### ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]

Basically, it comes down to the contig naming (1 vs. chr1) in the COSMIC vs. FASTA file. Is there a simple way to work around this?

Created 2014-12-05 18:49:40 | Updated | Tags: mutect

Hello. I am getting a java.lang.ArrayIndexOutOfBoundsException: 83 error when running MuTect. I got this error with v1.1.4, and I built the latest version (v1.1.7) this morning, but the error is still there. None of the posts about this error apply to me. It always seems to occur at the same position in the BAM file, but I cannot determine what is wrong or how to get past it. The analysis seems to get through chromosomes 1, 2 and most of 3 before the crash. I have tried running with options to ignore the error, but that doesn't help. If you could point me in a direction to solve this, I would be very grateful. The stack trace is pasted below.

Error processing chr3:195505766 java.lang.ArrayIndexOutOfBoundsException: 83 at org.broadinstitute.cga.tools.gatk.utils.CGAAlignmentUtils.mismatchesInRefWindow(CGAAlignmentUtils.java:135) at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.filterReads(MuTect.java:787) at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:511) at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:79) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48) at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313) at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:121) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155) at
org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:107) ##### ERROR ------------------------------------------------------------------------------------------ ##### ERROR stack trace java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 83 at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:649) at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:79) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48) at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313) at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:121) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155) at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:107) Caused by: java.lang.ArrayIndexOutOfBoundsException: 83 at org.broadinstitute.cga.tools.gatk.utils.CGAAlignmentUtils.mismatchesInRefWindow(CGAAlignmentUtils.java:135) at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.filterReads(MuTect.java:787) at 
org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:511) ... 14 more ##### ERROR ------------------------------------------------------------------------------------------ ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-0-g72492bb): ##### ERROR ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem. ##### ERROR If not, please post the error message, with stack trace, to the GATK forum. ##### ERROR Visit our website and forum for extensive documentation and answers to ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk ##### ERROR ##### ERROR MESSAGE: java.lang.ArrayIndexOutOfBoundsException: 83 ##### ERROR ------------------------------------------------------------------------------------------
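Several of the posts above come down to UCSC-style (chr1) versus b37-style (1) contig names. These are the GRCh37-lite interval list, the interval file with chr-style @SQ names, and the COSMIC/FASTA mismatch. The hedged Python sketch below renames GATK-style interval strings. It assumes the coordinates are identical under both naming schemes. That holds for the primary chromosomes, but hg19 chrM and b37 MT are different sequences, so the sketch refuses anything other than chr1 to chr22, chrX and chrY. The function names are hypothetical. A VCF such as COSMIC would need the same idea applied in the other direction, plus updated ##contig header lines.

```python
import re
from typing import Iterable, Iterator

# Primary UCSC-style names whose coordinates match b37. chrM is deliberately
# not matched: hg19 chrM and b37 MT are different sequences.
_PRIMARY = re.compile(r"chr([1-9]|1\d|2[0-2]|X|Y)")


def to_b37_contig(contig: str) -> str:
    """Map chr1 -> 1, chrX -> X; names without a 'chr' prefix pass through."""
    if not contig.startswith("chr"):
        return contig
    match = _PRIMARY.fullmatch(contig)
    if match is None:
        raise ValueError(f"no safe b37 equivalent for {contig!r}")
    return match.group(1)


def convert_intervals(lines: Iterable[str]) -> Iterator[str]:
    """Rename the contig in each 'contig:start-end' line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            contig, sep, rest = line.partition(":")
            yield to_b37_contig(contig) + sep + rest


print(list(convert_intervals(["chr1:15869-18936", "chr1:28866-32003"])))
# ['1:15869-18936', '1:28866-32003']
```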

Created 2014-11-07 15:49:41 | Updated 2014-11-07 15:51:34 | Tags: mutect panel-of-normals

Hi all,

In the 2013 Nature paper, a fixed threshold of 6.3 (LOD) is chosen for all results, which corresponds to a mutation frequency of 1-10 per Mb. Table 2 clearly shows the variety in mutation rates per cancer.

Based on this information, I have two questions.

• Do I need to create a separate Panel of Normals per cancer type?

• How do I correctly change the ‘--tumor_lod’ value to match the mutation rate of the cancer type I use?

Created 2014-10-22 18:55:55 | Updated 2014-10-22 19:11:09 | Tags: mutect

Hi, looking at the call_stats output from MuTect, I am wondering whether MuTect only counts reads with base Phred quality > 5 in the t_ref_count, t_alt_count, n_ref_count and n_alt_count fields (I reached this conclusion by checking the base Phred quality of each read in IGV). If so, is there any way to change the base-quality threshold? Thanks!
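For reference, the Phred scale relates a quality score Q to the probability that a base call is wrong: Q = -10·log10(P_err), so P_err = 10^(-Q/10). A cutoff of 5 therefore still admits bases with roughly a 32% error probability. The helper `count_above` is hypothetical: it only mirrors the "quality > 5" rule hypothesized in the question and says nothing about whether MuTect applies such a cutoff or exposes an argument for it.

```python
from typing import Iterable


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability the base call is wrong."""
    return 10 ** (-q / 10)


def count_above(base_quals: Iterable[int], min_q: int) -> int:
    """Hypothetical helper: count bases whose quality is strictly above min_q."""
    return sum(1 for q in base_quals if q > min_q)


for q in (5, 10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")

# Toy per-read base qualities at one site (made-up values).
print(count_above([2, 5, 6, 30], min_q=5))
```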

Created 2014-08-29 16:42:05 | Updated | Tags: mutect

Hello, could you please tell us why the call below was rejected rather than kept:

```
contig position ref_allele alt_allele score dbsnp_site covered power tumor_power normal_power total_pairs improper_pairs map_Q0_reads t_lod_fstar tumor_f contaminant_fraction contaminant_lod t_ref_count t_alt_count t_ref_sum t_alt_sum t_ins_count t_del_count normal_best_gt init_n_lod n_ref_count n_alt_count n_ref_sum n_alt_sum judgement
chr5 112164616 C T 0 NOVEL COVERED 1 1 1 1905 168 0 670.750479 0.294294 0.02 32.630413 470 196 18315 7525 6 0 CC 218.268391 738 1 28645 37 REJECT
```

We appreciate all your help. Thank you
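A 30-column row is easier to read with each value paired to its column name. The sketch below does that for the row posted above (whitespace-split as pasted; the real file is assumed to be tab-delimited) and recomputes the tumor and normal allele fractions from the read counts. For this row, `tumor_f` equals `t_alt_count / (t_ref_count + t_alt_count)`. Note that this header has no column naming which filter caused the REJECT, so the sketch only makes the inputs visible; it does not explain the judgement.

```python
HEADER = (
    "contig position ref_allele alt_allele score dbsnp_site covered power "
    "tumor_power normal_power total_pairs improper_pairs map_Q0_reads "
    "t_lod_fstar tumor_f contaminant_fraction contaminant_lod t_ref_count "
    "t_alt_count t_ref_sum t_alt_sum t_ins_count t_del_count normal_best_gt "
    "init_n_lod n_ref_count n_alt_count n_ref_sum n_alt_sum judgement"
).split()
ROW = (
    "chr5 112164616 C T 0 NOVEL COVERED 1 1 1 1905 168 0 670.750479 0.294294 "
    "0.02 32.630413 470 196 18315 7525 6 0 CC 218.268391 738 1 28645 37 REJECT"
).split()


def parse_call_stats_row(header: list, row: list) -> dict:
    """Pair each call_stats value with its column name."""
    if len(header) != len(row):
        raise ValueError(f"{len(header)} columns but {len(row)} values")
    return dict(zip(header, row))


record = parse_call_stats_row(HEADER, ROW)
t_ref, t_alt = int(record["t_ref_count"]), int(record["t_alt_count"])
n_ref, n_alt = int(record["n_ref_count"]), int(record["n_alt_count"])
tumor_af = t_alt / (t_ref + t_alt)
normal_af = n_alt / (n_ref + n_alt)

print(f"tumor:  {t_alt}/{t_ref + t_alt} alt reads, AF {tumor_af:.4f} (tumor_f = {record['tumor_f']})")
print(f"normal: {n_alt}/{n_ref + n_alt} alt reads, AF {normal_af:.4f}")
print("judgement:", record["judgement"])
```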

Created 2014-08-22 09:14:29 | Updated | Tags: mutect cosmic tcga pancan

Hi,

I am wondering about the possibility of using TCGA variants instead of COSMIC in MuTect. Given that COSMIC is mostly used to upweight positions that are in both dbSNP AND COSMIC, I think this would be a smart move. COSMIC seems like a mess, with small single-gene experiments mixed with larger experiments, whereas the TCGA pancan MAF seems far more structured. My guess is that most somatic variants in dbSNP are common enough that they will be in TCGA pancan as well.

Has anyone tested this or have experience with it?

(My basic plan is to convert the TCGA pancan MAF to VCF and use it in MuTect.)

cheers
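The conversion plan above might start from something like the sketch below. It assumes a tab-delimited MAF with the standard TCGA column names `Chromosome`, `Start_Position`, `Reference_Allele` and `Tumor_Seq_Allele2` (some older MAFs spell these slightly differently), keeps only single-nucleotide substitutions, drops duplicate sites, and writes a sites-only VCF. It does not sort the records or check that contig names (for example `17` vs `chr17`) match the reference, and GATK tools are strict about both, so treat it as a starting point rather than a finished converter.

```python
import csv
import io
from typing import Iterable, Iterator

BASES = {"A", "C", "G", "T"}


def maf_to_sites_vcf(maf_lines: Iterable[str]) -> Iterator[str]:
    """Yield sites-only VCF lines (header first) for unique SNVs in a MAF."""
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")), delimiter="\t"
    )
    yield "##fileformat=VCFv4.1"
    yield "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
    seen = set()
    for row in rows:
        ref = row["Reference_Allele"].upper()
        alt = row["Tumor_Seq_Allele2"].upper()
        # Keep single-base substitutions only; MAF indels use '-' and are skipped.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        key = (row["Chromosome"], int(row["Start_Position"]), ref, alt)
        if key in seen:
            continue
        seen.add(key)
        chrom, pos, _, _ = key
        yield "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."])


# Toy MAF (made-up rows): one SNV listed twice plus one insertion.
toy_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tReference_Allele\tTumor_Seq_Allele2\n"
    "GENE1\t17\t1000\tC\tT\n"
    "GENE1\t17\t1000\tC\tT\n"
    "GENE2\t5\t2000\t-\tA\n"
)
vcf = list(maf_to_sites_vcf(toy_maf))
print("\n".join(vcf))
```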

Created 2014-07-23 12:33:37 | Updated | Tags: mutect strand-bias

Hi, MuTect does have a filter for strand bias, but it does not report strand information (like DP4 or similar) in its .call or .vcf output. Sometimes I want to check the strand distribution of a called SNV and apply further filtering. How can I get that information?

Thanks! Hartblue
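If per-read observations are extracted separately (for example from a pileup of the tumor BAM, where pysam exposes each read's strand as `AlignedSegment.is_reverse`), tallying them into samtools-style DP4 counts (ref-forward, ref-reverse, alt-forward, alt-reverse) is straightforward. The sketch below assumes the observations are already available as `(base, is_reverse)` pairs; unlike samtools, it applies no base- or mapping-quality filter, and the names `DP4` and `dp4_counts` are illustrative.

```python
from typing import Iterable, NamedTuple, Tuple


class DP4(NamedTuple):
    ref_fwd: int
    ref_rev: int
    alt_fwd: int
    alt_rev: int


def dp4_counts(observations: Iterable[Tuple[str, bool]], ref: str, alt: str) -> DP4:
    """Tally (base, is_reverse) observations into DP4-style strand counts.

    Bases matching neither ref nor alt (e.g. a third allele) are ignored.
    """
    counts = {"ref_fwd": 0, "ref_rev": 0, "alt_fwd": 0, "alt_rev": 0}
    for base, is_reverse in observations:
        if base == ref:
            allele = "ref"
        elif base == alt:
            allele = "alt"
        else:
            continue
        strand = "rev" if is_reverse else "fwd"
        counts[f"{allele}_{strand}"] += 1
    return DP4(**counts)


def alt_forward_fraction(d: DP4) -> float:
    """Fraction of alt-supporting reads on the forward strand (nan if none)."""
    alt_total = d.alt_fwd + d.alt_rev
    return d.alt_fwd / alt_total if alt_total else float("nan")


# Toy observations (made up) at a C>T site.
obs = [("C", False), ("C", True), ("T", False), ("T", False), ("T", False), ("A", True)]
d = dp4_counts(obs, ref="C", alt="T")
print(d, alt_forward_fraction(d))
```

In the toy data every alt read is on the forward strand, which is the kind of skew a strand-bias filter looks for; what cutoff to apply is a separate judgement.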

Created 2014-05-23 14:22:50 | Updated | Tags: mutect variant-calling radseq

Hi, I would like to analyze a dataset consisting of RADseq (Restriction-site Associated DNA) tags from tumor and normal samples. By nature of the technique, all of the reads start at restriction enzyme cut sites in the genome - therefore the assumptions that mutations will be covered by reads from both directions and staggered with respect to position in the read are violated. Is there a way to override the strand bias and clustered position filters in the MuTect pipeline?

Created 2014-02-18 19:53:06 | Updated | Tags: mutect

Hi,

I'm wondering whether it is possible to use MuTect with BAM files previously aligned to human_g1k_v37. The documentation mentions that hg19 can be used, but what about g1k_v37?

Created 2014-02-14 10:39:55 | Updated | Tags: install mutect github git bcel

Hi

I was trying to install the GitHub version of MuTect, and I have some questions. I also hope that people who had similar problems might benefit from my endeavours.

I followed the instructions posted on the github page, however when I tried to build:

```
# build
ant -Dexternal.dir=`pwd`/../mutect-src -Dexecutable=mutect package
```

It told me I didn't have the correct bcel files in my ~/.ant/lib/:

The bcel jar can be found in the lib directory of a GATK clone after compiling, and the ant-apache-bcel jar can be downloaded from here: http://repo1.maven.org/maven2/ant/ant-apache-bcel/1.6.5/ant-apache-bcel-1.6.5.jar Please copy these two jar files to ~/.ant/lib/

I had already downloaded ant-apache-bcel and put it there, so I figured the missing file must be the one from the GATK clone's lib directory. I compiled with ant dist clean, but it failed and the "lib" folder it created was empty. However, it did create a "dist" folder, and in there I found bcel-5.2.jar. I copied this into ~/.ant/lib/ and now MuTect seems to build correctly using:

```
# build
ant -Dexternal.dir=`pwd`/../mutect-src -Dexecutable=mutect package
```

So to my questions.

1. Is this an OK way to build it? (Can I trust the program despite the unorthodox installation procedure?)

2. How come the MuTect install instructions don't specifically mention where to find the Apache bcel library (I would not have found it without the error message), or explain that you need to compile gatk-protected to get the second jar file? And where to put them!?

Created 2014-01-23 20:10:26 | Updated 2014-01-23 20:24:45 | Tags: mutect error runtime-error

Hello MuTect Team,

I encountered an error when running MuTect on our server for our tumor and normal pair data. If anyone can help me with this, it will be greatly appreciated.

I have replaced the file paths with " ** " for security reasons.

The tumor and normal BAM files are aligned against ucsc.hg19.fasta and all the references are using hg19 from GATK 2.8 resource bundle.

• dbSNP: dbsnp_137.hg19.vcf
• reference: ucsc.hg19.fasta
• COSMIC:

The cosmic file is generated by myself by using the following command:

perl **/GenomeAnalysisTK-2.8-1/liftOverVCF.pl -vcf 2.8/b37_cosmic_v54_120711.vcf -chain b37tohg19.chain -out hg19_cosmic_v54.vcf -newRef ucsc.hg19 -oldRef 2.8/human_g1k_v37 -gatk **/GenomeAnalysisTK-2.8-1/
• The liftOverVCF.pl and b37tohg19.chain are from GATK github site: https://github.com/broadgsa/gatk/tree/master/public

The command I used to run MuTect is:

java -jar -Xmx16g **/muTect-1.1.4/muTect-1.1.4.jar --analysis_type MuTect --reference_sequence **/ucsc.hg19.fasta --cosmic **/hg19_cosmic_v54.vcf --dbsnp **/dbsnp_137.hg19.vcf --input_file:tumor **/ReduceReads_P1T.bam --input_file:normal **/ReduceReads_P1N.bam --out **/MuTect_P1.out --coverage_file MT_coverage_P1.txt

The log message is as follows:

```
INFO 13:04:40,315 HelpFormatter - ---------------------------------------------------------------------------------
INFO 13:04:40,317 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.2-25-g2a68eab, Compiled 2012/11/08 10:30:02
INFO 13:04:40,317 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 13:04:40,317 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 13:04:40,321 HelpFormatter - Program Args: --analysis_type MuTect --reference_sequence /ucsc.hg19.fasta --cosmic /hg19_cosmic_v54.vcf --dbsnp /dbsnp_137.hg19.vcf --input_file:tumor /ReduceReads_P1T.bam --input_file:normal /ReduceReads_P1N.bam --out /MuTect_P1.out --coverage_file MT_coverage_P1.txt
INFO 13:04:40,321 HelpFormatter - Date/Time: 2014/01/14 13:04:40
INFO 13:04:40,321 HelpFormatter - ---------------------------------------------------------------------------------
INFO 13:04:40,321 HelpFormatter - ---------------------------------------------------------------------------------
INFO 13:04:40,341 ArgumentTypeDescriptor - Dynamically determined type of /dbsnp_137.hg19.vcf to be VCF
INFO 13:04:40,346 ArgumentTypeDescriptor - Dynamically determined type of /hg19_cosmic_v54.vcf to be VCF
INFO 13:04:40,353 GenomeAnalysisEngine - Strictness is SILENT
INFO 13:04:40,414 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE Target Coverage: 1000

##### ERROR ------------------------------------------------------------------------------------------
```

Created 2013-12-16 18:42:29 | Updated 2013-12-16 19:32:00 | Tags: mutect

Hello, I've been running three different versions of MuTect on a set of tumor and matched-normal samples. Although all three versions produce comparable counts of somatic mutations, only the pre-release version (muTect-1.0.27783) returns anything with the 'KEEP' status. The other two versions (muTect-1.1.1 and muTect-1.1.4) reject ALL returned somatic variants. I'm running all three versions with default parameters. Could you tell me what differs between the three releases of MuTect and why the later versions seem to be more conservative?

Thank you,

Alina
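To compare releases quantitatively, one option is to tally the `judgement` column of each version's call_stats output. The sketch assumes a tab-delimited file whose header row follows any `##` comment lines (such as the `## muTector v1.0.47986` line seen in outputs posted on this forum); the toy input uses three rows from such an output, trimmed to three columns for brevity.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def count_column(lines: Iterable[str], column: str = "judgement") -> Counter:
    """Tally the values of one column in a MuTect call_stats table.

    Assumes tab-delimited data with a header row after any '##' comment lines.
    """
    data_lines = (line for line in lines if not line.startswith("##"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    return Counter(row[column] for row in reader)


example = io.StringIO(
    "## muTector v1.0.47986\n"
    "contig\tposition\tjudgement\n"
    "7\t55230840\tREJECT\n"
    "7\t55233109\tKEEP\n"
    "7\t55233265\tREJECT\n"
)
tally = count_column(example)
print(tally)
```

For real files, open each version's output with `open(path, newline="")` and pass the handle in; `column="contig"` tallies calls per chromosome instead.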

Created 2013-12-10 09:03:20 | Updated | Tags: mutect

I used the following command -

pps/jdk/1.6.0_25/bin/java -Xmx2g -jar muTect-1.1.4.jar -T MuTect --reference_sequence /home/exome/repository/ref_genomes/human_g1k_v37.fasta --cosmic HOME/b37_cosmic_v54_120711.vcf --dbsnp reference_files/dbsnp_132_b37.leftAligned.vcf --input_file:tumor reference_files/S0343/S0343_novoalign.bam --input_file:normal reference_files/S0345/S0345_novoalign.bam --out S0345_S0343.out --vcf S0345_S0343.out.vcf

Weirdly, no variants were found past chromosome 5. This has also happened with the other samples I used. I know this is incorrect, because VarScan found variants on other chromosomes. I am not sure why this is happening. If you have a suggestion, I would greatly appreciate it. Thank you.

Created 2013-11-26 01:16:34 | Updated | Tags: mutect trimming

Hi, GATK 2.7 no longer requires quality trimming, as the tools take base qualities into account. Is it OK to use these untrimmed reads with MuTect as well? Should I expect a big difference between the MuTect calls from trimmed and untrimmed versions of the raw reads?

Created 2013-10-29 15:38:14 | Updated | Tags: mutect

Hi, when I run MuTect I get the following error.

Command:

java -Xmx100g -jar muTect-1.1.1.jar --analysis_type MuTect --reference_sequence hs37d5.fa --input_file:tumor sample2.bam --input_file:normal sample1.bam

```
Error processing 1:897790
java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at java.util.TimSort.mergeLo(TimSort.java:747)
	at java.util.TimSort.mergeAt(TimSort.java:483)
	at java.util.TimSort.mergeCollapse(TimSort.java:410)
	at java.util.TimSort.sort(TimSort.java:214)
	at java.util.TimSort.sort(TimSort.java:173)
	at java.util.Arrays.sort(Arrays.java:659)
	at java.util.Collections.sort(Collections.java:217)
	at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:471)
	at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:32)
	at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:168)
	at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:156)
	at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:229)
	at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:200)
	at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:44)
	at org.broadinstitute.sting.gatk.traversals.TraverseLociBase.traverse(TraverseLociBase.java:61)
	at org.broadinstitute.sting.gatk.traversals.TraverseLociBase.traverse(TraverseLociBase.java:16)
	at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:73)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:277)
	at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
	at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
INFO 11:35:20,285 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.RuntimeException: java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:714)
	at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:32)
	at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:168)
	at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:156)
	at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:229)
	at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:200)
	at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:44)
	at org.broadinstitute.sting.gatk.traversals.TraverseLociBase.traverse(TraverseLociBase.java:61)
	at org.broadinstitute.sting.gatk.traversals.TraverseLociBase.traverse(TraverseLociBase.java:16)
	at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:73)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:277)
	at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
	at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at java.util.TimSort.mergeLo(TimSort.java:747)
	at java.util.TimSort.mergeAt(TimSort.java:483)
	at java.util.TimSort.mergeCollapse(TimSort.java:410)
	at java.util.TimSort.sort(TimSort.java:214)
	at java.util.TimSort.sort(TimSort.java:173)
	at java.util.Arrays.sort(Arrays.java:659)
	at java.util.Collections.sort(Collections.java:217)
	at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:471)
	... 14 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.1-202-g2fe6a31):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.IllegalArgumentException: Comparison method violates its general contract!
##### ERROR ------------------------------------------------------------------------------------------
```

Created 2013-07-25 06:16:13 | Updated | Tags: mutect

The GATK MuTect page mentions "We currently use cutoffs of at least 14 reads in the tumor and at least 8 in the normal". My question is how I can change these values. Are there any specific arguments for this? Specifically, I would like to reduce these values, because some of my exome samples contain very poorly covered regions that nevertheless have good base and mapping quality; in IGV these mutations are visible, but MuTect REJECTs them. (I can accept an increased false-positive rate, so that is not a concern.) Also, is it possible to tell MuTect not to take strand bias into consideration?

Thanks

Created 2013-04-11 22:33:51 | Updated | Tags: mutect java malformedvcf

In addition to the standard MuTect output, I'm interested in VCF output, and was happy to find a previous related question showing how to output VCF. However, I seem to be having some trouble with what I think is malformed output. Specifically, the genotype field is "0" for normal and "0/1" for tumor on every line:

```
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT normal tumor
7 55230840 rs7781264 A G . REJECT DB GT:AD:BQ:DP:FA 0:0,1:.:1:1.00 0/1:0,27:29:27:1.00
7 55233109 rs150899403 G A . PASS DB;SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0:134,0:.:132:0.00:0 0/1:380,296:24:688:0.438:2
7 55233265 . A C . REJECT . GT:AD:BQ:DP:FA 0:6,0:.:6:0.00 0/1:251,24:12:275:0.087
```

The corresponding lines in the MuTect output file are:

```
## muTector v1.0.47986
contig position context ref_allele alt_allele tumor_name normal_name score dbsnp_site covered power tumor_power normal_power total_pairs improper_pairs map_Q0_reads t_lod_fstar tumor_f contaminant_fraction contaminant_lod t_ref_count t_alt_count t_ref_sum t_alt_sum t_ref_max_mapq t_alt_max_mapq t_ins_count t_del_count normal_best_gt init_n_lod n_ref_count n_alt_count n_ref_sum n_alt_sum judgement
7 55230840 ACTxTGC A G tumor normal 0 DBSNP UNCOVERED 0 0.612407 0 30 1 0 93.647278 1 0.02 -0.236654 0 27 0 808 0 70 0 0 GG -3.882263 0 1 0 30 REJECT
7 55233109 TGTxCCA G A tumor normal 0 DBSNP+COSMIC COVERED 1 1 1 1140 3 7 665.64967 0.43787 0.02 28.941878 380 296 10901 7263 70 70 0 0 GG 40.298595 134 0 4097 0 KEEP
7 55233265 CCCxCAG A C tumor normal 0 NOVEL UNCOVERED 0 1 0 305 5 0 8.112745 0.087273 0.02 2.430961 251 24 6289 302 70 70 0 0 AA 1.803681 6 0 154 0 REJECT
```

If it matters, this was with OpenJDK 1.6:

```
$ /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/bin/java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.5) (rhel-1.50.1.11.5.el6_3-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
```

Any idea what might be causing this, and is there anything you or I can do to fix it?

Thanks, Kevin
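To check the genotype pattern across all records, the sample columns can be split by the keys in the FORMAT column. Under the VCF specification, a GT value with a single allele index (such as `0`) denotes a haploid call, while `0/1` is a diploid heterozygous call; the sketch below reports both for the three records posted above. It splits on whitespace because the lines were pasted with spaces (real VCF is tab-delimited, and neither form contains spaces inside a field), and it does not attempt to explain why MuTect writes a haploid GT for the normal.

```python
import re

VCF_RECORDS = [
    "7 55230840 rs7781264 A G . REJECT DB GT:AD:BQ:DP:FA 0:0,1:.:1:1.00 0/1:0,27:29:27:1.00",
    "7 55233109 rs150899403 G A . PASS DB;SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0:134,0:.:132:0.00:0 0/1:380,296:24:688:0.438:2",
    "7 55233265 . A C . REJECT . GT:AD:BQ:DP:FA 0:6,0:.:6:0.00 0/1:251,24:12:275:0.087",
]
SAMPLES = ("normal", "tumor")  # order taken from the #CHROM header line


def parse_sample_columns(record: str, sample_names=SAMPLES) -> dict:
    """Map each sample name to a {FORMAT key: value} dict for one VCF record."""
    fields = record.split()
    keys = fields[8].split(":")
    return {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }


def ploidy(gt: str) -> int:
    """Number of allele indices in a GT value ('/' unphased, '|' phased)."""
    return len(re.split(r"[/|]", gt))


parsed_all = [parse_sample_columns(r) for r in VCF_RECORDS]
for record, parsed in zip(VCF_RECORDS, parsed_all):
    pos = record.split()[1]
    for name in SAMPLES:
        gt = parsed[name]["GT"]
        print(f"{pos} {name}: GT={gt} ploidy={ploidy(gt)} AD={parsed[name]['AD']}")
```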

Created 2013-02-20 16:06:16 | Updated | Tags: bundle mutect dbsnp cosmic

I'm having trouble finding the recommended COSMIC and dbSNP file for hg19 to use with MuTect (hg19_cosmic_v54_120711.vcf and dbsnp_132_b37.leftAligned.vcf). I can't find these in any of the bundles on the GATK public FTP site. I see a dbSNP file called dbsnp_132_b37.vcf; is this the same? I don't see any COSMIC file at all. I'm currently using bundle 2.3 for hg19 for the dbSNP files (and the standard indels from 1000G and Mills for indel realignment). Thanks!