I'm running HaplotypeCaller (latest nightly build) with -A StrandAlleleCountsBySample to get strand-specific read counts per allele (SAC). For variants with more than the default maximum of 6 alt alleles, there is a problem with the SAC field:

2 47641559 . TAAAAAAAAAAA T,TA,TAA,TAAA,TAAAA,TAAAAAA,<NON_REF> 1308.73 . BaseQRankSum=0.434;ClippingRankSum=0.768;DP=105;ExcessHet=3.0103;MLEAC=0,0,0,0,0,1,1;MLEAF=0.00,0.00,0.00,0.00,0.00,0.500,0.500;MQRankSum=-1.704;RAW_MQ=378000.00;ReadPosRankSum=1.971 GT:AD:DP:GQ:PL:SAC:SB 6/7:3,0,0,3,4,5,16,9:40:99:1346,1509,3479,1488,3459,3455,1204,2706,2706,2585,989,2303,2303,2215,2132,692,1723,1720,1604,1576,1507,277,1002,983,781,714,657,745,268,447,447,355,313,232,0,147:3,0,0,0,0,0,0,3,1,3,0,5,0,16,0,0:3,0,1,27

The AD field reports 9 reads for <NON_REF>, i.e. reads supporting an allele other than the listed alt alleles, but the corresponding SAC pair is 0,0, so these reads are missing from the strand counts. This is especially problematic when one of the alleles hidden behind <NON_REF> is selected as the most likely allele once this sample is combined with others in GenotypeGVCFs.

Another example: 11 108141955 . CTTTT C,CT,CTT,CTTT,ATTTT,TTTTT,<NON_REF> 1552.73 . BaseQRankSum=-0.227;DP=704;ExcessHet=3.0103;MLEAC=0,0,0,1,0,0,0;MLEAF=0.00,0.00,0.00,0.500,0.00,0.00,0.00;MQ=60.02;MQRankSum=-0.254;ReadPosRankSum=1.249 GT:AD:DP:GQ:PL:SAC:SB 0/4:431,5,4,27,127,4,3,3:604:99:1590,3247,26394,3043,24416,23841,2156,18595,18550,17063,0,11232,11190,10498,9517,3572,20205,18965,15617,10454,20237,3558,20037,18797,15420,10362,19931,19926,2484,13421,13344,12357,9074,12834,12837,11563:213,218,2,3,2,2,14,13,54,73,0,4,0,3,0,0:213,218,72,98
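The discrepancy can be checked mechanically: SAC holds one forward,reverse pair per allele in the same order as AD, so each pair should sum to the matching AD value. The short Python sketch below does that comparison for the two records above; the function name sac_vs_ad is hypothetical, and the AD/SAC values are copied from the records.

```python
# Compare per-allele SAC totals (forward + reverse) with AD for one sample.
# Assumption: SAC lists one forward,reverse pair per allele in AD order
# (REF, then each ALT including <NON_REF>), as in the records above.
def sac_vs_ad(alleles, ad_field, sac_field):
    ad = [int(x) for x in ad_field.split(",")]
    sac = [int(x) for x in sac_field.split(",")]
    if not (len(ad) == len(alleles) and len(sac) == 2 * len(ad)):
        raise ValueError("AD/SAC length does not match the number of alleles")
    rows = []
    for i, allele in enumerate(alleles):
        sac_total = sac[2 * i] + sac[2 * i + 1]  # forward + reverse
        rows.append((allele, ad[i], sac_total, ad[i] - sac_total))
    return rows


records = {
    "2:47641559": (
        ["TAAAAAAAAAAA", "T", "TA", "TAA", "TAAA", "TAAAA", "TAAAAAA", "<NON_REF>"],
        "3,0,0,3,4,5,16,9",
        "3,0,0,0,0,0,0,3,1,3,0,5,0,16,0,0",
    ),
    "11:108141955": (
        ["CTTTT", "C", "CT", "CTT", "CTTT", "ATTTT", "TTTTT", "<NON_REF>"],
        "431,5,4,27,127,4,3,3",
        "213,218,2,3,2,2,14,13,54,73,0,4,0,3,0,0",
    ),
}

for site, (alleles, ad, sac) in records.items():
    for allele, ad_count, sac_count, missing in sac_vs_ad(alleles, ad, sac):
        if missing:
            print(f"{site} {allele}: AD={ad_count} SAC={sac_count} missing={missing}")
# 2:47641559 <NON_REF>: AD=9 SAC=0 missing=9
# 11:108141955 <NON_REF>: AD=3 SAC=0 missing=3
```

For every real allele the SAC pair sums to the AD value; only <NON_REF> differs in both records.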

Is there a way to make the QD/FS fields in the VCF support multiallelic sites? I want to filter a VCF by QD/FS for RNA data.

1) The QD/FS fields are not declared with Number=A in the VCF header. The VCF specification says: "If the field has one value per alternate allele then this value should be 'A'."

2) There is no way to make GATK output the alleles of a multiallelic site as separate records. Currently, all alleles at a multiallelic site share the same QD/FS value.
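Point 2 can be illustrated with a minimal Python sketch that splits one multiallelic site into biallelic records. It is hypothetical and not GATK functionality: split_multiallelic and the NUMBER_A key set are assumptions, and the input line is a made-up site. Number=A INFO values are sliced per allele and everything else is copied, so each split record inherits the same site-level QD/FS; splitting alone does not produce per-allele QD/FS. For real files, bcftools norm -m -any performs this kind of split.

```python
# Hypothetical sketch (not a GATK feature): split one multiallelic VCF site
# into biallelic records. Only the first 8 VCF columns are handled; FORMAT and
# sample columns are dropped for brevity.
# Assumption: NUMBER_A lists the INFO keys declared Number=A in the header;
# every other key (e.g. QD, FS, which are site-level) is copied unchanged.
NUMBER_A = {"AC", "AF", "MLEAC", "MLEAF"}


def split_multiallelic(line):
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    records = []
    for i, allele in enumerate(alt.split(",")):
        entries = []
        for entry in info.split(";"):
            if "=" not in entry:
                entries.append(entry)  # INFO flags are copied as-is
                continue
            key, value = entry.split("=", 1)
            if key in NUMBER_A:
                value = value.split(",")[i]  # keep only this ALT's value
            entries.append(f"{key}={value}")
        records.append("\t".join([chrom, pos, vid, ref, allele, qual, flt, ";".join(entries)]))
    return records


# Hypothetical multiallelic site, for illustration only.
site = "1\t100\t.\tA\tC,G\t50\t.\tAC=1,1;AF=0.500,0.500;QD=12.5;FS=3.2"
for rec in split_multiallelic(site):
    print(rec)
```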

Best regards, Wang Yugui

Is the GATK UnifiedGenotyper able to call multiallelic positions in a single pooled sample? Our case is a pool of 13 samples, and we run UG with ploidy set to 26. If I understand the supplementary material of the original publications correctly, UG will never call three alleles at a single position in single-sample calling. Or does this not hold for high-ploidy analysis?

If needed, we can call multiple pools together, but this becomes computationally intensive.

In summary, we would like to be able to make a call such as 14xG, 6xA, 6xT.
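For a sense of scale, the number of possible genotypes at a site with ploidy P and A alleles is the number of multisets of size P drawn from A alleles, C(P + A - 1, A - 1). The short Python check below (the helper name n_genotypes is hypothetical) shows how that count grows at ploidy 26; the desired call 14xG, 6xA, 6xT is one of the 378 three-allele genotypes, with allele fractions of roughly 54%, 23% and 23%.

```python
from math import comb


def n_genotypes(ploidy, n_alleles):
    """Count unordered genotypes: multisets of size `ploidy` over `n_alleles` alleles."""
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


for n_alleles in (2, 3, 4):
    print(n_alleles, n_genotypes(26, n_alleles))
# 2 27
# 3 378
# 4 3654

# The desired pooled call: allele copy counts must sum to the ploidy.
call = {"G": 14, "A": 6, "T": 6}
assert sum(call.values()) == 26
print({allele: round(count / 26, 3) for allele, count in call.items()})
# {'G': 0.538, 'A': 0.231, 'T': 0.231}
```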

Also, how does UG account for noise (sequencing errors) when genotyping? For example, if 3% of the reads at a position carry an aberrant base, this could equally correspond to a single allele copy in the pool (1/26 ≈ 3.8%).
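The ambiguity between noise and a single copy can be framed as a likelihood comparison. The Python sketch below uses a deliberately simplified binomial model; it is an assumption for illustration, not a description of how UnifiedGenotyper models errors (which also uses per-base qualities). The 1% error rate, the 100-read depth, and the function names are hypothetical. Under these assumptions, 3 alt reads out of 100 modestly favor one copy over pure error (ratio of roughly 2.5), while a single alt read favors error.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def likelihood_ratio(k, n, ploidy=26, error=0.01):
    """Simplified model, NOT UnifiedGenotyper's: compare 'one alt copy out of
    `ploidy`' against 'no alt copy, alt reads are sequencing errors'.
    Assumption: a read shows the alt base with probability `error` when the
    true base is not alt, and with probability 1 - error when it is."""
    f = 1 / ploidy
    p_one_copy = f * (1 - error) + (1 - f) * error
    p_error_only = error
    return binom_pmf(k, n, p_one_copy) / binom_pmf(k, n, p_error_only)


# Hypothetical depth of 100 reads at a 1% per-read error rate.
for k in (1, 3, 6):
    print(k, round(likelihood_ratio(k, 100), 2))
```

A ratio above 1 favors the one-copy explanation; in practice the error rate varies per base, which is why real callers weight each read by its quality.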

Thanks for any guidance,


I would like to know whether GATK can call tri-allelic variants in a single sample. I am asking because I am interested in clonal mosaicism, and looking at tri-allelic variants might be one way to investigate it.
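Because a diploid GT field can list at most two alleles, a screen for a third allele in one sample has to look at the per-allele read depths (AD) instead. The Python sketch below is a hypothetical heuristic, not a GATK feature: the function names and the MIN_READS threshold are assumptions, and the AD strings are made up for illustration.

```python
# Hypothetical screen (not a GATK feature): count how many alleles in a
# sample's AD field reach a minimum read count. Three or more supported
# alleles at a site is one possible flag for a tri-allelic / mosaic pattern.
# Assumptions: AD is in REF, ALT1, ALT2, ... order; MIN_READS is arbitrary.
MIN_READS = 10


def supported_alleles(ad_field, min_reads=MIN_READS):
    """Return the indices of alleles whose AD count is at least min_reads."""
    ad = [int(x) for x in ad_field.split(",")]
    return [i for i, count in enumerate(ad) if count >= min_reads]


def looks_triallelic(ad_field, min_reads=MIN_READS):
    return len(supported_alleles(ad_field, min_reads)) >= 3


# Made-up AD values for illustration only.
print(looks_triallelic("40,22,15"))
print(looks_triallelic("40,22,2"))
```

A fixed read-count cutoff is the simplest choice; an allele-fraction cutoff would scale better with depth.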

Thanks in advance, João Fadista

I'm attempting to use PhaseByTransmission