We have a complex VCF record that doesn't appear to be properly treated by LeftAlignVariants, and I couldn't find evidence that this behavior has been reported anywhere else.
The record is:
17 19561175 . GGTTTGT G,GTTTGT 49 PASS AC=1,1;AF=0.50,0.50;AN=2;DP=117;DS;MQ=60;MQ0=0;source=Locus GT:AB:AD:DP 1/2:0.925:68,49:117
Admittedly, the GGTTTGT>GTTTGT variation is odd because it's better specified as GG>G, but there's nothing semantically wrong with this record as written. (If you're wondering where this came from, it came from simulated data.)
The challenge, however, is that the left-aligned version of this variant is AG>A, which starts one base earlier, at 19561174. So I could see expecting the following output from LeftAlignVariants:
17 19561174 . AGGTTTGT AG,AGTTTGT
Rather ugly, but I think that's the right way to write the original complex variation after left-alignment. Alternatively a separated and phased representation would achieve the same:
17 19561174 . AG A .... 0/1:...
17 19561175 . GGTTTGT G .... 1|0:...
But I bet that would introduce all types of problems in LeftAlignVariants if you tried to make that happen.
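To make the shift concrete, here is a minimal Python sketch of the usual trim-and-extend normalization, applied to each ALT allele on its own. This is not a description of what LeftAlignVariants does internally. The `left_align` helper and the toy reference string are assumptions for illustration: only the A at 19561174 and the GGTTTGT starting at 19561175 come from the record above, and the flanking C bases are made up.

```python
# Toy reference window (ASSUMPTION): index 0 corresponds to position 19561173.
# Only the A at 19561174 and GGTTTGT at 19561175 are taken from the post;
# the flanking C bases are hypothetical padding.
SEQ_START = 19561173
REF_SEQ = "CAGGTTTGTC"


def left_align(pos, ref, alt, ref_seq=REF_SEQ, seq_start=SEQ_START):
    """Normalize one biallelic variant: right-trim, left-extend, then left-trim."""
    changed = True
    while changed:
        changed = False
        # Drop a trailing base shared by both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, prepend the preceding reference base.
        if not ref or not alt:
            pos -= 1
            if pos < seq_start:
                raise ValueError("ran off the left edge of the toy reference")
            base = ref_seq[pos - seq_start]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# The two ALT alleles of the record, treated as separate biallelic variants.
print(left_align(19561175, "GGTTTGT", "GTTTGT"))
print(left_align(19561175, "GGTTTGT", "G"))
```

Under these assumptions the first allele moves to 19561174 as AG>A, while the second stays at 19561175 as GGTTTGT>G. Because the two alleles end up at different start positions, keeping them in a single record means padding both with the extra A, as in the AGGTTTGT record above.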
I think it's a really hard problem to solve in general. I just wanted to post here to (1) see whether you agree that it behaves this way and (2) help anyone else who might be seeing something like this.
I was browsing through some of the less-used functions in the GATK documentation, hence the following question: does the LeftAlignIndels function do anything beyond what IndelRealigner already does? In other words, do you recommend running LeftAlignIndels on top of the indel realignment?
Best regards, Sophia