Tagged with #gt
0 documentation articles | 0 announcements | 2 forum discussions


Comments (1)

I have a VCF file with this line (i.e. GT=0/1=G/T):

20  10120854    .   G   T,A 32175.56    .   AC=399,18;AF=0.111,5.006e-03;AN=3596;BaseQRankSum=1.03;DP=6710;FS=2.485;GQ_MEAN=15.45;GQ_STDDEV=20.21;InbreedingCoeff=0.1235;MLEAC=416,17;MLEAF=0.116,4.727e-03;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=189;QD=18.08;ReadPosRankSum=0.358 GT:AD:DP:GQ:PL 0/1:1,3,0:.:34:123,0,34,126,43,169

When I run it through version 3.2 of LeftAlignAndTrimVariants with the --splitMultiallelics flag, the genotype information is lost (i.e. GT=./.), and the output is:

20  10120854    .   G   T   32175.56    .   BaseQRankSum=1.03;DP=6710;FS=2.485;GQ_MEAN=15.45;GQ_STDDEV=20.21;InbreedingCoeff=0.1235;MLEAC=416,17;MLEAF=0.116,4.727e-03;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=189;QD=18.08;ReadPosRankSum=0.358 GT  ./.
20  10120854    .   G   A   32175.56    .   BaseQRankSum=1.03;DP=6710;FS=2.485;GQ_MEAN=15.45;GQ_STDDEV=20.21;InbreedingCoeff=0.1235;MLEAC=416,17;MLEAF=0.116,4.727e-03;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=189;QD=18.08;ReadPosRankSum=0.358 GT  ./.

I have attached the input VCF. I also got rid of the PL information, but still got the unexpected output.

Maybe I should just write some code for normalizing variants myself in order to save time :) Thank you very much as always!
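
For anyone who does want to split such a record by hand, here is a minimal Python sketch that keeps the genotype columns. It is not what LeftAlignAndTrimVariants does internally; the function names and the PER_ALT_INFO set are hypothetical, and it assumes tab-separated VCF lines, diploid samples, and the standard VCF ordering of PL values. Alleles pointing at a different ALT are recoded as "." (other tools make other choices), GQ is copied unchanged, and PL values are subset without renormalizing.

```python
# Hypothetical helper set: INFO keys assumed to carry one value per ALT allele.
PER_ALT_INFO = {"AC", "AF", "MLEAC", "MLEAF"}


def split_info(info, alt_index):
    """Keep only the value for ALT number alt_index (1-based) in per-ALT INFO keys."""
    parts = []
    for item in info.split(";"):
        key, sep, val = item.partition("=")
        if sep and key in PER_ALT_INFO:
            vals = val.split(",")
            if len(vals) >= alt_index:
                val = vals[alt_index - 1]
        parts.append(key + sep + val)
    return ";".join(parts)


def split_sample(fmt_keys, sample, alt_index):
    """Recode one sample column for the biallelic record of ALT number alt_index."""
    values = dict(zip(fmt_keys, sample.split(":")))
    out = []
    for key in fmt_keys:
        val = values.get(key, ".")
        if key == "GT":
            sep = "|" if "|" in val else "/"
            recoded = []
            for allele in val.split(sep):
                if allele == "0":
                    recoded.append("0")          # reference stays reference
                elif allele == str(alt_index):
                    recoded.append("1")          # this ALT becomes allele 1
                else:
                    recoded.append(".")          # missing or a different ALT
            val = sep.join(recoded)
        elif key == "AD" and val != ".":
            ad = val.split(",")
            val = ad[0] + "," + ad[alt_index]    # REF depth, depth of this ALT
        elif key == "PL" and val != ".":
            pl = val.split(",")
            k = alt_index
            # Diploid PL index of genotype j/k (j <= k) is k*(k+1)/2 + j.
            idx = [0, k * (k + 1) // 2, k * (k + 1) // 2 + k]
            val = ",".join(pl[i] for i in idx)
        out.append(val)
    return ":".join(out)


def split_multiallelic(line):
    """Return one biallelic VCF line per ALT allele of a tab-separated record."""
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return ["\t".join(fields)]
    fmt_keys = fields[8].split(":") if len(fields) > 8 else []
    records = []
    for k, alt in enumerate(alts, start=1):
        rec = fields[:]
        rec[4] = alt
        rec[7] = split_info(fields[7], k)
        rec[9:] = [split_sample(fmt_keys, s, k) for s in fields[9:]]
        records.append("\t".join(rec))
    return records
```

With the record above, the sample column 0/1:1,3,0:.:34:123,0,34,126,43,169 would keep GT=0/1 on the G>T line, while on the G>A line the allele that pointed at T becomes missing.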

Comments (10)

My VCF was generated with GATK 2.8-1-g932cd3a.

Although it is rare, I see genotype fields that are inconsistent with the AD values (read the following as a table):

CHROM   POS ID  REF ALT FILTER  QUAL    ABHet   ABHom   AC  AF  AN  BaseCounts  BaseQRankSum    DP  Dels    FS  GC  HRun    HaplotypeScore  LowMQ   MLEAC   MLEAF   MQ  MQ0 MQRankSum   MeanDP  MinDP   OND PercentNBaseSolid   QD  ReadPosRankSum  Samples Somatic VariantType cosmic.ID   1.AB    1.AD    1.DP    1.F 1.GQ    1.GT    1.MQ0   1.PL    1.Z 2.AB    2.AD    2.DP    2.F 2.GQ    2.GT    2.MQ0   2.PL    2.Z 3.AB    3.AD    3.DP    3.F 3.GQ    3.GT    3.MQ0   3.PL    3.Z 4.AB    4.AD    4.DP    4.F 4.GQ    4.GT    4.MQ0   4.PL    4.Z 5.AB    5.AD    5.DP    5.F 5.GQ    5.GT    5.MQ0   5.PL    5.Z
11  92616485    0   A   C   PASS    63.71   0.333   0.698   1   0.1 10  89,54,0,0   -5.631  143 0   49.552  71.29   2   4.4154  0.0000,0.0000,143   1   0.1 50.27   0   -1.645  28.6    16  0.242   0   2.36    2.125   R5_A3_1 NA  SNP COSM467570  NA  24,9    33  0.2727272727    54  A/A 0   0,54,537    -1.3055824197   0.33    9,18    27  0.6666666667    96  A/C 0   96,0,178    0.8660254038    NA  21,11   32  0.34375 21  A/A 0   0,21,466    -0.8838834765   NA  12,4    16  0.25    27  A/A 0   0,27,272    -1  NA  23,12   35  0.3428571429    42  A/A 0   0,42,537    -0.9296696802

This shows that, for example, sample 5 has an AD value of '23,12' and a GT of 'A/A', i.e. homozygous for the reference allele. I've included a screenshot which shows low base quality and complete strand bias (which I suspect causes variants to be missed). So what is the problem, and how can I recalculate the GTs based on AD? I cannot filter on genotypes when they are buggy.
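
One way to look at such calls is to compare each GT with both the AD counts and the PL values. The sketch below is only an illustration: the function names and the 0.3 cutoff are hypothetical, and it assumes a biallelic site, diploid samples, and GT written as bases, as in the table above. In the row shown, every A/A sample also has its smallest PL at the hom-ref position, so the GT agrees with the likelihoods even where the raw read counts look heterozygous. That would fit low-quality alternate bases carrying little weight in the likelihoods, though that explanation is an assumption here.

```python
def alt_fraction(ad):
    """Fraction of reads supporting any ALT allele, or None if there are no reads."""
    counts = [int(x) for x in ad.split(",")]
    total = sum(counts)
    return sum(counts[1:]) / total if total else None


def pl_genotype_index(pl):
    """Position of the smallest PL: 0 = hom-ref, 1 = het, 2 = hom-alt (biallelic, diploid)."""
    values = [int(x) for x in pl.split(",")]
    return values.index(min(values))


def flag_hom_ref_with_alt_reads(ref, samples, min_alt_fraction=0.3):
    """List hom-ref calls whose ALT read fraction reaches the (hypothetical) cutoff."""
    flagged = []
    for name, call in samples.items():
        alleles = call["GT"].replace("|", "/").split("/")
        is_hom_ref = all(a == ref for a in alleles)
        frac = alt_fraction(call["AD"])
        if is_hom_ref and frac is not None and frac >= min_alt_fraction:
            flagged.append((name, round(frac, 3), pl_genotype_index(call["PL"])))
    return flagged


# GT, AD and PL values copied from the table row for 11:92616485 (REF=A, ALT=C).
samples = {
    "1": {"GT": "A/A", "AD": "24,9", "PL": "0,54,537"},
    "2": {"GT": "A/C", "AD": "9,18", "PL": "96,0,178"},
    "3": {"GT": "A/A", "AD": "21,11", "PL": "0,21,466"},
    "4": {"GT": "A/A", "AD": "12,4", "PL": "0,27,272"},
    "5": {"GT": "A/A", "AD": "23,12", "PL": "0,42,537"},
}

for name, frac, pl_index in flag_hom_ref_with_alt_reads("A", samples):
    print(name, frac, pl_index)
```

Flagged samples can then be inspected by hand or filtered out. Rewriting GT directly from AD would discard the base-quality information that the likelihoods take into account.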