Possible LeftAlignVariants bug for multi-allelic indel variant
Posted in Ask the GATK team

We have a complex VCF record that doesn't appear to be properly treated by LeftAlignVariants, and I couldn't find evidence that this behavior has been reported anywhere else.

The record is

17      19561175        .       GGTTTGT G,GTTTGT        49      PASS    AC=1,1;AF=0.50,0.50;AN=2;DP=117;DS;MQ=60;MQ0=0;source=Locus     GT:AB:AD:DP     1/2:0.925:68,49:117

Admittedly, the GGTTTGT>GTTTGT variation is odd because it's better specified as GG>G, but there's nothing semantically wrong with this record as written. (If you're wondering where this came from, it came from simulated data.)
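To make the GG>G claim concrete, here is a minimal Python sketch that trims the bases shared at the right end of a REF/ALT pair. The helper name `trim_shared_suffix` is made up for illustration; this is not how GATK implements anything, just the arithmetic behind the statement above.

```python
def trim_shared_suffix(ref, alt):
    """Drop bases shared at the right end, keeping at least one base per allele.

    Hypothetical helper for illustration only; not part of GATK.
    """
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return ref, alt


# Second ALT allele of the record: reduces to a one-base deletion.
print(trim_shared_suffix("GGTTTGT", "GTTTGT"))  # ('GG', 'G')

# First ALT allele: nothing to trim, it is already minimal.
print(trim_shared_suffix("GGTTTGT", "G"))  # ('GGTTTGT', 'G')
```

Because the first ALT allele (G) shares no trailing base with the second, the record as a whole cannot be trimmed even though one of its alleles can.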

The challenge, however, is that the left-aligned version of that GG>G deletion is AG>A. So I could see expecting the following output from LeftAlignVariants:

17      19561174        .       AGGTTTGT AG,AGTTTGT

Rather ugly, but I think that's the right way to write the original complex variation after left-alignment. Alternatively, a separated and phased representation would achieve the same:

17      19561174        .       AG  A             .... 0|1:...
17      19561175        .       GGTTTGT G   .... 1|0:...

But I bet that would introduce all types of problems in LeftAlignVariants if you tried to make that happen.
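For anyone who wants to check the equivalence of these representations, here is a rough Python sketch. It assumes a local reference window of AGGTTTGT starting at 19561174, which is all the records above tell us about the reference, and it uses a generic trim-and-extend normalization applied to one ALT allele at a time. The function names are hypothetical and this is not the LeftAlignVariants implementation.

```python
# Local reference context taken from the records in the post:
# 19561174 is A, followed by GGTTTGT starting at 19561175.
REF_START = 19561174
REF_SEQ = "AGGTTTGT"


def ref_base(pos):
    """Return the reference base at a 1-based position inside the window."""
    offset = pos - REF_START
    if not 0 <= offset < len(REF_SEQ):
        raise ValueError("position outside the local reference window")
    return REF_SEQ[offset]


def left_align(pos, ref, alt):
    """Normalize a single REF/ALT pair (generic sketch, not GATK's code)."""
    changed = True
    while changed:
        changed = False
        # Trim a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            pos -= 1
            base = ref_base(pos)
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def apply_allele(pos, ref, alt):
    """Substitute ALT for REF in the window and return the haplotype."""
    offset = pos - REF_START
    if REF_SEQ[offset:offset + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference window")
    return REF_SEQ[:offset] + alt + REF_SEQ[offset + len(ref):]


print(left_align(19561175, "GGTTTGT", "GTTTGT"))  # (19561174, 'AG', 'A')
print(left_align(19561175, "GGTTTGT", "G"))       # (19561175, 'GGTTTGT', 'G')

# The original, padded, and split records all give the same two haplotypes.
assert apply_allele(19561175, "GGTTTGT", "GTTTGT") == apply_allele(19561174, "AG", "A")
assert apply_allele(19561175, "GGTTTGT", "G") == apply_allele(19561174, "AGGTTTGT", "AG")
```

Under these assumptions only the second ALT allele moves when treated on its own, which is why the multi-allelic record has to be either padded one base to the left or split into two phased records.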

I think this is a really hard problem to solve in general. I just wanted to post here to (1) check whether you agree that LeftAlignVariants behaves this way and (2) help anyone else who might be seeing something similar.
