Tagged with #bug
1 documentation article | 3 events or announcements | 4 forum discussions


Note: only do this if you have been explicitly asked to do so.

Scenario:

You posted a question about a problem you had with GATK tools, we answered that we think it's a bug, and we asked you to submit a detailed bug report.

Here's what you need to provide:

  • The exact command line that you used when you had the problem (in a text file)
  • The full stack trace (program output in the console) from the start of the run to the end or error message (in a text file)
  • A snippet of the BAM file if applicable and the index (.bai) file associated with it
  • If a non-standard reference (i.e. not available in our resource bundle) was used, we need the .fasta, .fai, and .dict files for the reference
  • Any other relevant files such as recalibration plots

A snippet file is a slice of the original BAM file which contains the problematic region and is sufficient to reproduce the error. We need it in order to reproduce the problem on our end, which is the first necessary step to finding and fixing the bug. We ask you to provide this as a snippet rather than the full file so that you don't have to upload (and we don't have to process) huge giga-scale files.

Here's how you create a snippet file:

  • Look at the error message and see if it cites a specific position where the error occurred
  • If not, identify what region caused the problem by running with -L argument and progressively narrowing down the interval
  • Once you have the region, use PrintReads with -L to write the problematic region (with 500 bp padding on either side) to a new file -- this is your snippet file.
  • Test your command line on this snippet file to make sure you can still reproduce the error on it.

And finally, here's how you send us the files:

  • Put all those files into a .zip or .tar.gz archive
  • Upload them onto our FTP server as explained here (make sure you use the proper UPLOAD credentials)
  • Post in the original discussion thread that you have done this
  • Be sure to tell us the name of your archive file!

We will get back to you --hopefully with a bug fix!-- as soon as we can.

As reported here:

  • http://gatkforums.broadinstitute.org/discussion/2342/unifiedgenotyper-causes-arrayindexoutofboundsexception-3-how-to-fix-it
  • http://gatkforums.broadinstitute.org/discussion/2343/printreads-yet-another-arrayindexoutofboundsexception
  • http://gatkforums.broadinstitute.org/discussion/2345/arrayindexoutofboundsexception-in-haplotypecaller

If you encounter this bug too, please don't post a new question about it. Feel free to comment in this thread to let us know you have also had the same problem. Tell us what version of the GATK you were using and post your command line.

Thank you for your patience while we work to fix this issue.

Update: it seems the cases in the different tools aren't directly related. The PrintReads and HaplotypeCaller cases are now solved; still waiting on test files for the UG case.

We have discovered a bug that seriously impacts the results of BQSR/ BaseRecalibrator when it is run with the scatter-gather functionality of Queue. To be clear, the bug does NOT affect BaseRecalibrator runs performed "normally", i.e. WITHOUT Queue's scatter-gather.

Consequences/ Solution:

Please be aware that if you have been using BaseRecalibrator scatter-gathered with Queue (GATK versions 2.0 and 2.1), your results may be wrong. You will need to redo the base recalibration of your data WITHOUT scatter-gathering.

This issue will be fixed in the next release (version 2.2). We apologize for any inconvenience this may cause you!

We have identified a major bug in ReduceReads -- GATK versions 2.0 and 2.1. The effect of the bug is that variant regions with more than 100 reads and fewer than 250 reads get downsampled to 0 reads.

This has now been fixed in the most recent release.

To check if you are using a buggy version, run the following:

    samtools view -H $BAM

This will produce the following output:

    @PG ID:GATK ReduceReads VN:XXX

If XXX is 2.0 or 2.1, any results obtained with your current version are suspect, and you will need to upgrade to the most recent version then rerun your processing.

Our most sincere apologies for the inconvenience.

Hi,

I did search on the site and seems several have had a similar problem but I couldn't fix mine based on those. I've managed to get to this point succesfully with creating a realigned bam file with IndelRealigner. I'm using a non-model organism with self created masking list from a VCF run done before the realignment (list in bed format). I've had to use the -fixMisencodedQuals so far through out my runs. I checked the masking file so that the ranges do not go over the ranges that are specified in my list of regions to cover (-L option).

Is there anything else I would need to check still?

Command line view:

sulyba@hippu4:/fs/lustre/wrk/sulyba/stickleback_capture> java -jar ./GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T BaseRecalibrator -R gasAcu_combinedbac_inv7.fa -I realigned_FF1_inv7c.bam -knownSites Sites_to_Mask.bed -L Capture_Target_Regions.intervals -o recalc_FF1_inv7c.grp -plots recal_FF1_plots.grp.pdf -fixMisencodedQuals

INFO  12:34:10,502 HelpFormatter - --------------------------------------------------------------------------------
INFO  12:34:10,510 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.3-9-ge5ebf34, Compiled 2013/01/11 22:43:14
INFO  12:34:10,510 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  12:34:10,510 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  12:34:10,516 HelpFormatter - Program Args: -T BaseRecalibrator -R gasAcu_combinedbac_inv7.fa -I realigned_FF1_inv7c.bam -knownSites Sites_to_Mask.bed -L Capture_Target_Regions.intervals -o recalc_FF1_inv7c.grp -plots recal_FF1_plots.grp.pdf -fixMisencodedQuals
INFO  12:34:10,516 HelpFormatter - Date/Time: 2013/02/01 12:34:10
INFO  12:34:10,517 HelpFormatter - --------------------------------------------------------------------------------
INFO  12:34:10,517 HelpFormatter - --------------------------------------------------------------------------------
INFO  12:34:10,551 ArgumentTypeDescriptor - Dynamically determined type of Sites_to_Mask.bed to be BED
INFO  12:34:10,563 GenomeAnalysisEngine - Strictness is SILENT
INFO  12:34:11,309 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO  12:34:11,319 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:34:11,392 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.07
INFO  12:34:11,416 RMDTrackBuilder - Loading Tribble index from disk for file Sites_to_Mask.bed
INFO  12:34:11,660 GenomeAnalysisEngine - Processing 20350670 bp from intervals
INFO  12:34:11,672 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  12:34:11,674 ProgressMeter -        Location processed.reads  runtime per.1M.reads completed total.runtime remaining
INFO  12:34:11,946 BaseRecalibrator - The covariates being used here:
INFO  12:34:11,947 BaseRecalibrator -   ReadGroupCovariate
INFO  12:34:11,947 BaseRecalibrator -   QualityScoreCovariate
INFO  12:34:11,947 BaseRecalibrator -   ContextCovariate
INFO  12:34:11,947 ContextCovariate -           Context sizes: base substitution model 2, indel substitution model 3
INFO  12:34:11,947 BaseRecalibrator -   CycleCovariate
INFO  12:34:11,952 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 3]
INFO  12:34:11,952 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO  12:34:11,952 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO  12:34:11,953 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 3]
INFO  12:34:11,953 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO  12:34:11,953 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO  12:34:11,953 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 1012, 3]
INFO  12:34:11,953 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO  12:34:11,953 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO  12:34:11,953 NestedIntegerArray - Creating NestedIntegerArray with dimensions [1, 94, 1002, 3]
INFO  12:34:11,954 NestedIntegerArray - Pre-allocating first 2 dimensions
INFO  12:34:11,954 NestedIntegerArray - Done pre-allocating first 2 dimensions
INFO  12:34:12,109 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO  12:34:12,120 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO  12:34:12,644 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO  12:34:12,645 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO  12:34:14,901 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.ArrayIndexOutOfBoundsException: -2
        at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158)
        at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:246)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530)
        at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateBAQArray(BaseRecalibrator.java:428)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:243)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:248)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:219)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:91)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:55)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: -2
##### ERROR ------------------------------------------------------------------------------------------

Dear GATK Team,

I am running a pipeline on several high coverage human individuals that have been mapped using bwa and processed using samtools, picard and gatk. The bam-files pass ValidateSam from picard, but when I run the bqsr step some of them fails giving a Malformed read error (using -filterMBQ does not help in this case). I tracked down the error to bamfiles that ends with a paired end read where the mate maps in the beginning of the contig (in my case human mtDNA).

Eg, this will make it crash:

readX 177 MT 16558 37 7S2M2I10M80S = 294 -16176 GACCTGTGATCC...
readY 177 MT 16558 37 7S2M2I10M80S = 238 -16232 GACCTGTGATCC...
readZ 113 MT 16558 37 7S2M2I10M80S = 273 -16197 GACCTGTGATCC...
[END]

where a file ending like this wont crash:

readX 83 MT 16469 60 101M = 16246 -324 TGGGGGTAGCTAAAGTGAAC...
readY 147 MT 16469 60 101M = 16267 -303 TGGGGGTAGCTAAAGTGA...
readZ 147 MT 16469 60 101M = 16193 -377 TGGGGGTAGCTAAAGTGAAC...
[END]

I am running GATK v2.3-9-ge5ebf34, but the same error occurs using GATK v-2.2-3 (my previous version). I can genotype the files using UnifiedGenotyper without any problem as well.

This is the error:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Array length mismatch detected. Malformed read? at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateFractionalErrorArray(BaseRecalibrator.java:380) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:246) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$MapReduceJob.run(NanoScheduler.java:468) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Array length mismatch detected. Malformed read?
ERROR ------------------------------------------------------------------------------------------

Cheers,

Simon

Hi,

I'm trying to apply my recalibrated quality values with the following command:

java -jar GenomeAnalysisTK.jar -T PrintReads -R genome_noHap.fa -I realigned.bam -BQSR realigned.recal_data.grp -o realigned.recal.bam -nct 25

And get the following error. I couldn't find it elsewhere. Would be happy if I could get some advice.

Let me know if you need more info.

Best,

Karl

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 0 at org.broadinstitute.sting.utils.recalibration.BaseRecalibration.recalibrateRead(BaseRecalibration.java:151) at org.broadinstitute.sting.utils.recalibration.BQSRReadTransformer.apply(BQSRReadTransformer.java:93) at org.broadinstitute.sting.gatk.walkers.readutils.PrintReads.map(PrintReads.java:251) at org.broadinstitute.sting.gatk.walkers.readutils.PrintReads.map(PrintReads.java:98) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:230) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:218) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.4-3-g2a7af43):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: 0
ERROR ------------------------------------------------------------------------------------------

Dear GATK team,

I think I found a bug as shown in the attached screenshot. The highlighted line of the vcf file after using UnifiedGenotyper, VariantRecalibrator, ApplyRecalibration, ReadBackedPhasing, SelectVariants, CombineVariants, and SnpEff (following Best Practices) shows X 307897 . C CGTGAAGG 9494.14 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=8.010;DP=18;EFF=DOWNSTREAM(MODIFIER||||PPP2R3B|protein_coding|CODING|ENST00000381625|),INTRON(MODIFIER||||PPP2R3B|protein_coding|CODING|ENST00000390665|4),INTRON(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000445792|5),INTRON(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000475859|2),INTRON(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000477110|1),INTRON(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000496630|3),UPSTREAM(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000468169|),UPSTREAM(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000477636|),UPSTREAM(MODIFIER||||PPP2R3B|retained_intron|CODING|ENST00000484364|);FS=0.000;InbreedingCoeff=0.3019;MQ=45.55;MQ0=0;MQRankSum=-8.092;QD=9.55;ReadPosRankSum=2.020;VQSLOD=2.27;culprit=ReadPosRankSum;set=29_BOTH GT:AD:DP:GQ:PL ./. 0/1:0,14:9:99:367,42,0

To be sure, I checked the two samples' bams at the PrintReads stage but do not see such a long indel. Is this a potentially similar bug to http://gatkforums.broadinstitute.org/discussion/2516/rare-homozygous-indel-bug-in-gatk-2-4-9 ?

Thank you for your attention.

Jamie