Patch for BadCigar filtering on Novoalign reads containing zero length CIGAR elements
Posted in Ask the GATK team



I'm running into a HaplotypeCaller issue with the latest release (2.5-2) using Novoalign input reads. Here's a small input file that reproduces the problem:

https://s3.amazonaws.com/chapmanb/gatk_hc_problem_cigar.bam

Running:

java -Xms750m -Xmx3g -jar GenomeAnalysisTK.jar -R GRCh37.fa -I problem_cigar.bam \
  -L 4:120371315-120371586 -T HaplotypeCaller -o out.vcf \
  --read_filter BadCigar -debug

Errors out with:

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: START (0) > (-1) STOP -- this should never happen, please check read: HWI-ST1124:106:C15APACXX:1:1107:15450:87092 2/2 58b aligned read. (CIGAR: 38H4D58M)

Looking at the read, its CIGAR string appears to trick the BadCigar filter: it has a 0M element between an insertion and a deletion, so the two indels are not directly adjacent:

38M4I0M4D58M
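To make the failure mode concrete, here is a minimal standalone sketch in Java. It is not GATK's BadCigar code: the class `NaiveCigarCheck` and its methods are hypothetical, and the sketch assumes the relevant part of the filter is a check for an insertion immediately followed by a deletion (or the reverse) in the list of CIGAR elements. Under that assumption, the 0M element breaks up the I/D pair and the read slips through.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not GATK code: a tiny CIGAR parser plus a naive
// "insertion next to a deletion" check that ignores element lengths.
public class NaiveCigarCheck {

    // One CIGAR element, e.g. "4I" -> length 4, operator 'I'.
    static final class Element {
        final int length;
        final char op;

        Element(int length, char op) {
            this.length = length;
            this.op = op;
        }
    }

    // Parse a CIGAR string such as "38M4I0M4D58M" into its elements (no validation).
    static List<Element> parse(String cigar) {
        List<Element> elements = new ArrayList<>();
        int length = 0;
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) {
                length = length * 10 + (c - '0');
            } else {
                elements.add(new Element(length, c));
                length = 0;
            }
        }
        return elements;
    }

    // Naive check: only flags I and D when they are literal neighbours in the
    // element list, so a zero-length element in between hides the pair.
    static boolean hasAdjacentIndels(String cigar) {
        List<Element> elements = parse(cigar);
        for (int i = 1; i < elements.size(); i++) {
            char prev = elements.get(i - 1).op;
            char cur = elements.get(i).op;
            if ((prev == 'I' && cur == 'D') || (prev == 'D' && cur == 'I')) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasAdjacentIndels("38M4I4D58M"));   // prints true
        System.out.println(hasAdjacentIndels("38M4I0M4D58M")); // prints false: the 0M hides the I/D pair
    }
}
```

The same insertion/deletion pair is caught when written as `4I4D` but missed when written as `4I0M4D`, even though both describe the same alignment.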

This patch fixes the BadCigar filter by considering only CIGAR elements with non-zero length:

https://gist.github.com/chapmanb/5568411
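The idea behind the fix can be sketched the same way. This is a hypothetical illustration, not the code in the gist: `ZeroLengthAwareCigarCheck` is an invented name, and it models only the adjacent insertion/deletion condition. Zero-length elements are dropped before neighbouring operators are compared, so `4I0M4D` collapses to `4I4D` and gets flagged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "skip zero-length elements" idea; not the actual patch.
public class ZeroLengthAwareCigarCheck {

    // One CIGAR element, e.g. "0M" -> length 0, operator 'M'.
    static final class Element {
        final int length;
        final char op;

        Element(int length, char op) {
            this.length = length;
            this.op = op;
        }
    }

    // Parse a CIGAR string such as "38M4I0M4D58M" into its elements (no validation).
    static List<Element> parse(String cigar) {
        List<Element> elements = new ArrayList<>();
        int length = 0;
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) {
                length = length * 10 + (c - '0');
            } else {
                elements.add(new Element(length, c));
                length = 0;
            }
        }
        return elements;
    }

    // Keep only operators of non-zero-length elements, then look for I next to D.
    static boolean hasAdjacentIndels(String cigar) {
        List<Character> ops = new ArrayList<>();
        for (Element e : parse(cigar)) {
            if (e.length > 0) { // a 0M (or any 0-length element) contributes nothing
                ops.add(e.op);
            }
        }
        for (int i = 1; i < ops.size(); i++) {
            char prev = ops.get(i - 1);
            char cur = ops.get(i);
            if ((prev == 'I' && cur == 'D') || (prev == 'D' && cur == 'I')) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasAdjacentIndels("38M4I4D58M"));   // prints true
        System.out.println(hasAdjacentIndels("38M4I0M4D58M")); // prints true: 0M is skipped
        System.out.println(hasAdjacentIndels("96M"));          // prints false
    }
}
```

Skipping zero-length elements, rather than special-casing `0M` between indels, treats any `0`-length operator as absent, which matches the description of the patch as considering only non-zero-length elements.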

With the patch applied, the read is properly filtered and HaplotypeCaller continues without a problem. Hope this helps; please let me know if any other details about the problem would be useful.

