Note: only do this if you have been explicitly asked to do so.
You posted a question about a problem you had with GATK tools, we answered that we think it's a bug, and we asked you to submit a detailed bug report.
A snippet file is a slice of the original BAM file which contains the problematic region and is sufficient to reproduce the error. We need it in order to reproduce the problem on our end, which is the first necessary step to finding and fixing the bug. We ask you to provide this as a snippet rather than the full file so that you don't have to upload (and we don't have to process) huge giga-scale files.
-L argument and progressively narrowing down the interval-L to write the problematic region (with 500 bp padding on either side) to a new file -- this is your snippet file..zip or .tar.gz archive As reported here:
If you encounter this bug too, please don't post a new question about it. Feel free to comment in this thread to let us know you have also had the same problem. Tell us what version of the GATK you were using and post your command line.
Thank you for your patience while we work to fix this issue.
Please be aware that if you have been using BaseRecalibrator scatter-gathered with Queue (GATK versions 2.0 and 2.1), your results may be wrong. You will need to redo the base recalibration of your data WITHOUT scatter-gathering.
This issue will be fixed in the next release (version 2.2). We apologize for any inconvenience this may cause you!
We have identified a major bug in ReduceReads -- GATK versions 2.0 and 2.1. The effect of the bug is that variant regions with more than 100 reads and fewer than 250 reads get downsampled to 0 reads.
This has now been fixed in the most recent release.
To check if you are using a buggy version, run the following:
samtools view -H $BAM
This will produce the following output:
@PG ID:GATK ReduceReads VN:XXX
If XXX is 2.0 or 2.1, any results obtained with your current version are suspect, and you will need to upgrade to the most recent version then rerun your processing.
Our most sincere apologies for the inconvenience.
Hi,
I did search on the site and seems several have had a similar problem but I couldn't fix mine based on those. I've managed to get to this point succesfully with creating a realigned bam file with IndelRealigner. I'm using a non-model organism with self created masking list from a VCF run done before the realignment (list in bed format). I've had to use the -fixMisencodedQuals so far through out my runs. I checked the masking file so that the ranges do not go over the ranges that are specified in my list of regions to cover (-L option).
Is there anything else I would need to check still?
Command line view:
sulyba@hippu4:/fs/lustre/wrk/sulyba/stickleback_capture> java -jar ./GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T BaseRecalibrator -R gasAcu_combinedbac_inv7.fa -I realigned_FF1_inv7c.bam -knownSites Sites_to_Mask.bed -L Capture_Target_Regions.intervals -o recalc_FF1_inv7c.grp -plots recal_FF1_plots.grp.pdf -fixMisencodedQuals
INFO 12:34:10,502 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:34:10,510 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.3-9-ge5ebf34, Compiled 2013/01/11 22:43:14
INFO 12:34:10,510 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 12:34:10,510 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 12:34:10,516 HelpFormatter - Program Args: -T BaseRecalibrator -R gasAcu_combinedbac_inv7.fa -I realigned_FF1_inv7c.bam -knownSites Sites_to_Mask.bed -L Capture_Target_Regions.intervals -o recalc_FF1_inv7c.grp -plots recal_FF1_plots.grp.pdf -fixMisencodedQuals
INFO 12:34:10,516 HelpFormatter - Date/Time: 2013/02/01 12:34:10
INFO 12:34:10,517 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:34:10,517 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:34:10,551 ArgumentTypeDescriptor - Dynamically determined type of Sites_to_Mask.bed to be BED
INFO 12:34:10,563 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:34:11,309 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 12:34:11,319 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 12:34:11,392 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.07
INFO 12:34:11,416 RMDTrackBuilder - Loading Tribble index from disk for file Sites_to_Mask.bed
INFO 12:34:11,660 GenomeAnalysisEngine - Processing 20350670 bp from intervals
INFO 12:34:11,672 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 12:34:11,674 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 12:34:11,946 BaseRecalibrator - The covariates being used here:
INFO 12:34:11,947 BaseRecalibrator - ReadGroupCovariate
INFO 12:34:11,947 BaseRecalibrator - QualityScoreCovariate
INFO 12:34:11,947 BaseRecalibrator - ContextCovariate
INFO 12:34:11,947 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 12:34:11,947 BaseRecalibrator - CycleCovariate
INFO 12:34:11,952 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 3]
INFO 12:34:11,952 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 12:34:11,952 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 12:34:11,953 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 3]
INFO 12:34:11,953 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 12:34:11,953 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 12:34:11,953 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 1012, 3]
INFO 12:34:11,953 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 12:34:11,953 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 12:34:11,953 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 1002, 3]
INFO 12:34:11,954 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO 12:34:11,954 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO 12:34:12,109 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO 12:34:12,120 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO 12:34:12,644 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO 12:34:12,645 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO 12:34:14,901 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.ArrayIndexOutOfBoundsException: -2
at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158)
at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:246)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530)
at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateBAQArray(BaseRecalibrator.java:428)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:243)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:248)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:219)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:91)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:55)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: -2
##### ERROR ------------------------------------------------------------------------------------------
Dear GATK Team,
I am running a pipeline on several high coverage human individuals that have been mapped using bwa and processed using samtools, picard and gatk. The bam-files pass ValidateSam from picard, but when I run the bqsr step some of them fails giving a Malformed read error (using -filterMBQ does not help in this case). I tracked down the error to bamfiles that ends with a paired end read where the mate maps in the beginning of the contig (in my case human mtDNA).
Eg, this will make it crash:
readX 177 MT 16558 37 7S2M2I10M80S = 294 -16176 GACCTGTGATCC...
readY 177 MT 16558 37 7S2M2I10M80S = 238 -16232 GACCTGTGATCC...
readZ 113 MT 16558 37 7S2M2I10M80S = 273 -16197 GACCTGTGATCC...
[END]
where a file ending like this wont crash:
readX 83 MT 16469 60 101M = 16246 -324 TGGGGGTAGCTAAAGTGAAC...
readY 147 MT 16469 60 101M = 16267 -303 TGGGGGTAGCTAAAGTGA...
readZ 147 MT 16469 60 101M = 16193 -377 TGGGGGTAGCTAAAGTGAAC...
[END]
I am running GATK v2.3-9-ge5ebf34, but the same error occurs using GATK v-2.2-3 (my previous version). I can genotype the files using UnifiedGenotyper without any problem as well.
This is the error:
org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Array length mismatch detected. Malformed read? at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateFractionalErrorArray(BaseRecalibrator.java:380) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:246) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$MapReduceJob.run(NanoScheduler.java:468) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source)
Cheers,
Simon
Hi,
I'm trying to apply my recalibrated quality values with the following command:
java -jar GenomeAnalysisTK.jar -T PrintReads -R genome_noHap.fa -I realigned.bam -BQSR realigned.recal_data.grp -o realigned.recal.bam -nct 25
And get the following error. I couldn't find it elsewhere. Would be happy if I could get some advice.
Let me know if you need more info.
Best,
Karl
java.lang.ArrayIndexOutOfBoundsException: 0 at org.broadinstitute.sting.utils.recalibration.BaseRecalibration.recalibrateRead(BaseRecalibration.java:151) at org.broadinstitute.sting.utils.recalibration.BQSRReadTransformer.apply(BQSRReadTransformer.java:93) at org.broadinstitute.sting.gatk.walkers.readutils.PrintReads.map(PrintReads.java:251) at org.broadinstitute.sting.gatk.walkers.readutils.PrintReads.map(PrintReads.java:98) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:230) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:218) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722)
Dear GATK team,
I think I found a bug as shown in the attached screenshot. The highlighted line of the vcf file after using UnifiedGenotyper, VariantRecalibrator, ApplyRecalibration, ReadBackedPhasing, SelectVariants, CombineVariants, and SnpEff (following Best Practices) shows X 307897 . C CGTGAAGG 9494.14 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=8.010;DP=18;EFF=DOWNSTREAM(MODIFIER||||PPP2R3B|protein_coding|CODING|ENST00000381625|),INTRON(MODIFIER||||PPP2R3B|protein_coding|CODING|ENST00000390665|4),INTRON(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000445792|5),INTRON(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000475859|2),INTRON(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000477110|1),INTRON(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000496630|3),UPSTREAM(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000468169|),UPSTREAM(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000477636|),UPSTREAM(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000484364|);FS=0.000;InbreedingCoeff=0.3019;MQ=45.55;MQ0=0;MQRankSum=-8.092;QD=9.55;ReadPosRankSum=2.020;VQSLOD=2.27;culprit=ReadPosRankSum;set=29_BOTH GT:AD:DP:GQ:PL ./. 0/1:0,14:9:99:367,42,0
To be sure, I checked the two samples' bams at the PrintReads stage but do not see such a long indel. Is this a potentially similar bug to http://gatkforums.broadinstitute.org/discussion/2516/rare-homozygous-indel-bug-in-gatk-2-4-9 ?
Thank you for your attention.
Jamie