Created 2016-04-13 21:57:40 | Updated | Tags: diagnosetargets vcf gt



I am running the following command:

${java} -jar ${GATK} \
    -T DiagnoseTargets \
    -R human_g1k_v37.fasta \
    -L truseq_interval.bed \
    -ip 100 \
    -I cleaned.bam \
    -o interval_stats.vcf \
    -writeFullFormat

It works great, except that the output VCF does not contain the GT field, which causes problems later when annotating the VCF file.

Is there a workaround?
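One possible workaround applies if the downstream annotator only needs a GT key to be present: add a placeholder missing genotype (./.) to every sample. The Python sketch below shows this under stated assumptions. It assumes the VCF has a FORMAT column followed by sample columns. add_missing_gt is a hypothetical helper, not part of GATK, and it does not check whether a GT header line already exists.

```python
# Hypothetical helper, not part of GATK: add a placeholder GT to every sample.
GT_HEADER = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'


def add_missing_gt(lines):
    """Yield VCF lines (without newlines) with GT=./. prepended to each sample.

    Assumes tab-separated records with a FORMAT column (index 8) followed by
    sample columns; records whose FORMAT already starts with GT are unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
        elif line.startswith("#CHROM"):
            # Declare the new FORMAT key just before the column header line.
            yield GT_HEADER
            yield line
        else:
            fields = line.split("\t")
            if len(fields) > 9 and fields[8].split(":")[0] != "GT":
                # The VCF spec requires GT to be the first FORMAT key.
                fields[8] = "GT:" + fields[8]
                fields[9:] = ["./.:" + sample for sample in fields[9:]]
            yield "\t".join(fields)
```

To use it, iterate over the open input file, then write each yielded line plus a newline to a new file. The result contains no real genotypes, so it only satisfies tools that require the key to exist.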

Thanks!

Created 2016-03-24 12:31:15 | Updated 2016-03-24 12:32:14 | Tags: vcf-format gt


Dear all,

After running HaplotypeCaller, I get a VCF file. To understand it, I am following this document: http://gatkforums.broadinstitute.org/gatk/discussion/1268/what-is-a-vcf-and-how-should-i-interpret-it.

According to the GT section, each genotype in the output should be one of the following:

0/0 - the sample is homozygous reference
0/1 - the sample is heterozygous, carrying 1 copy of each of the REF and ALT alleles
1/1 - the sample is homozygous alternate

However, my output VCF file also contains values such as GT 1/2, and I am not sure how to interpret them.

Could you please explain what this means? I am using HaplotypeCaller to call the variants.
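GT values in VCF are indices into the list of alleles at the site. Index 0 is REF, 1 is the first allele listed in ALT, 2 is the second, and so on. A 1/2 can therefore only occur at a multiallelic site, and it means the sample is heterozygous, carrying one copy each of the first two ALT alleles. The small Python sketch below makes that mapping explicit. decode_gt is a hypothetical helper, and the alleles A and C,G in the example are illustrative only.

```python
def decode_gt(gt, ref, alt):
    """Translate a VCF GT value such as '1/2' into allele bases.

    Each index points into the allele list: 0 is REF, 1 the first ALT,
    2 the second ALT, and so on; '.' marks a missing allele.
    """
    alleles = [ref] + alt.split(",")
    sep = "|" if "|" in gt else "/"  # phased genotypes use '|'
    return sep.join(a if a == "." else alleles[int(a)] for a in gt.split(sep))


print(decode_gt("1/2", "A", "C,G"))  # C/G
```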

Created 2015-10-28 08:22:47 | Updated | Tags: haplotypecaller gt variant-calling


Hello, I have a question about the GATK pipeline. When I call my Samples A and B separately, I get results like:

Sample A
Chr01 2245 . A C,G 171.31 PASS ... GT:AD:DP:GQ:PL 1/2:0,1,6:7:1:221,202,199,19,0,1

Sample B
Chr01 2245 . A G 192.84 PASS ... GT:AD:DP:GQ:PL 1/1:0,8:8:18:221,18,0

These results are different. However, when I call the two samples together, I get this result at that chromosome position:

Chr01 2245 . A G 387.43 . .... GT:AD:DP:GQ:PL 1/1:0,6:7:18:220,18,0(A) 1/1:0,8:8:18:221,18,0(B)

So the SNP in Sample A is no longer C/G but just G. I don't understand how this works. Thanks for any help from your team.
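One way to read the single-sample call for Sample A is to label each PL value with its genotype. For diploid samples, VCF stores the PL for genotype j/k (with j <= k) at index k*(k+1)/2 + j. The Python sketch below applies that ordering. The helper names are hypothetical, and the sketch does not model how joint calling chooses alleles.

```python
def diploid_genotypes(n_alleles):
    """List diploid genotypes in the order VCF uses for PL values.

    Genotype j/k (with j <= k) is stored at index k*(k+1)/2 + j.
    """
    return [f"{j}/{k}" for k in range(n_alleles) for j in range(k + 1)]


def label_pls(pl, n_alleles):
    """Pair each PL value with the genotype it belongs to."""
    values = [int(v) for v in pl.split(",")]
    genotypes = diploid_genotypes(n_alleles)
    if len(values) != len(genotypes):
        raise ValueError(f"expected {len(genotypes)} PL values, got {len(values)}")
    return dict(zip(genotypes, values))


# Sample A from the single-sample call: REF=A, ALT=C,G, so 3 alleles.
for genotype, pl in label_pls("221,202,199,19,0,1", 3).items():
    print(genotype, pl)
```

With ALT=C,G, genotype 1/2 is C/G (PL 0) and 2/2 is G/G (PL 1). In the single-sample call, C/G was therefore only marginally preferred over G/G, which is consistent with GQ=1 and the single C read in AD=0,1,6.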


Created 2015-03-09 16:48:43 | Updated | Tags: dp gt genotypegvcfs



Sometimes a sample with no reads can still have a genotype. I used GenotypeGVCFs (without CombineGVCFs) to generate a VCF with several samples. When I extract the data for one sample, I sometimes see a missing genotype with no reads:

GT:AD:DP:GQ:PGT:PID:PL  ./.:0,0:0:.:.:.:.

which seems OK to me. Note that DP = 0.

But more rarely, I see a genotype called with no reads:

GT:AD:DP:GQ:PGT:PID:PL  0/0:0,0:.:3:.:.:0,3,45
GT:AD:DP:GQ:PGT:PID:PL  1/1:0,0:.:3:1|1:32486973_A_AG:45,3,0

This is more questionable. Note that DP = ".".

Can you explain how a genotype can be called without reads? (I suspect haplotype information is involved, but without reads I do not understand how.) Any idea about the DP value? Why does it have a value when the genotype is missing?
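To see how often the second pattern occurs, such calls can be flagged with a small script. The Python sketch below does this. called_without_reads is a hypothetical helper that takes the FORMAT keys and one sample column, and it reports a non-missing GT whose AD values sum to zero. It only finds these calls; it does not explain why GenotypeGVCFs emitted them.

```python
def called_without_reads(format_keys, sample):
    """Return True when a sample has a non-missing GT but its AD sums to 0.

    Missing AD values ('.') are counted as 0 reads.
    """
    data = dict(zip(format_keys.split(":"), sample.split(":")))
    alleles = data.get("GT", ".").replace("|", "/").split("/")
    gt_missing = all(a == "." for a in alleles)
    depth = sum(int(x) for x in data.get("AD", ".").split(",") if x != ".")
    return not gt_missing and depth == 0


KEYS = "GT:AD:DP:GQ:PGT:PID:PL"
print(called_without_reads(KEYS, "./.:0,0:0:.:.:.:."))      # False
print(called_without_reads(KEYS, "0/0:0,0:.:3:.:.:0,3,45"))  # True
```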

thank you.

Created 2014-09-25 14:01:07 | Updated | Tags: gt leftalignandtrimvariants splitmultiallelics


I have a VCF file with this line (i.e. GT=0/1=G/T):

20  10120854    .   G   T,A 32175.56    .   AC=399,18;AF=0.111,5.006e-03;AN=3596;BaseQRankSum=1.03;DP=6710;FS=2.485;GQ_MEAN=15.45;GQ_STDDEV=20.21;InbreedingCoeff=0.1235;MLEAC=416,17;MLEAF=0.116,4.727e-03;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=189;QD=18.08;ReadPosRankSum=0.358 GT:AD:DP:GQ:PL 0/1:1,3,0:.:34:123,0,34,126,43,169

When I run it through LeftAlignAndTrimVariants (GATK version 3.2) with the --splitMultiallelics flag, the genotype information is lost (i.e. GT=./.), and the output is:

20  10120854    .   G   T   32175.56    .   BaseQRankSum=1.03;DP=6710;FS=2.485;GQ_MEAN=15.45;GQ_STDDEV=20.21;InbreedingCoeff=0.1235;MLEAC=416,17;MLEAF=0.116,4.727e-03;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=189;QD=18.08;ReadPosRankSum=0.358 GT  ./.
20  10120854    .   G   A   32175.56    .   BaseQRankSum=1.03;DP=6710;FS=2.485;GQ_MEAN=15.45;GQ_STDDEV=20.21;InbreedingCoeff=0.1235;MLEAC=416,17;MLEAF=0.116,4.727e-03;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=189;QD=18.08;ReadPosRankSum=0.358 GT  ./.

I have attached the input VCF. I also tried removing the PL information, but still got the same unexpected output.

Maybe I should just write some code for normalizing variants myself in order to save time :) Thank you very much as always!
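For the do-it-yourself route, the genotype part of a split can be sketched in Python as below. This is not a reimplementation of LeftAlignAndTrimVariants. split_genotype is a hypothetical helper that handles only GT and AD: it ignores PL and INFO fields and does no left-alignment or trimming. Recoding other ALT alleles as REF is an assumption, and other conventions are possible.

```python
def split_genotype(gt, ad, n_alts):
    """Split one sample's multiallelic GT and AD into one pair per ALT allele.

    Assumed convention (hypothetical, not GATK's documented behaviour): alleles
    other than REF and the ALT being emitted are recoded as REF ('0').
    """
    sep = "|" if "|" in gt else "/"
    indices = gt.split(sep)
    depths = ad.split(",")
    records = []
    for alt in range(1, n_alts + 1):
        new_gt = sep.join(
            "." if i == "." else ("1" if int(i) == alt else "0") for i in indices
        )
        # Keep the REF depth and the depth of the ALT allele being emitted.
        new_ad = "." if ad == "." else f"{depths[0]},{depths[alt]}"
        records.append((new_gt, new_ad))
    return records


# The record above: REF=G, ALT=T,A, GT=0/1, AD=1,3,0.
print(split_genotype("0/1", "1,3,0", 2))  # [('0/1', '1,3'), ('0/0', '1,0')]
```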

Created 2014-02-27 15:25:19 | Updated 2014-02-27 22:40:50 | Tags: unifiedgenotyper genotype vcf-format gt


Generated with GATK 2.8-1-g932cd3a.

Although it is rare, I see genotype (GT) fields that are inconsistent with the AD values (read as a table):

CHROM   POS ID  REF ALT FILTER  QUAL    ABHet   ABHom   AC  AF  AN  BaseCounts  BaseQRankSum    DP  Dels    FS  GC  HRun    HaplotypeScore  LowMQ   MLEAC   MLEAF   MQ  MQ0 MQRankSum   MeanDP  MinDP   OND PercentNBaseSolid   QD  ReadPosRankSum  Samples Somatic VariantType cosmic.ID
11  92616485    0   A   C   PASS    63.71   0.333   0.698   1   0.1 10  89,54,0,0   -5.631  143 0   49.552  71.29   2   4.4154  0.0000,0.0000,143   1   0.1 50.27   0   -1.645  28.6    16  0.242   0   2.36    2.125   R5_A3_1 NA  SNP COSM467570

Per-sample fields:

Sample  AB    AD     DP  F             GQ  GT   MQ0  PL        Z
1       NA    24,9   33  0.2727272727  54  A/A  0    0,54,537  -1.3055824197
2       0.33  9,18   27  0.6666666667  96  A/C  0    96,0,178  0.8660254038
3       NA    21,11  32  0.34375       21  A/A  0    0,21,466  -0.8838834765
4       NA    12,4   16  0.25          27  A/A  0    0,27,272  -1
5       NA    23,12  35  0.3428571429  42  A/A  0    0,42,537  -0.9296696802

This shows, for example, that sample 5 has an AD value of '23,12' and a GT of 'A/A', i.e. homozygous for the reference allele. I've included a screenshot which shows low base quality and complete strand bias (which I suspect causes the missed variant calls). So what's the problem? And how can I recalculate the GTs based on AD? I cannot filter based on genotypes when they are buggy.
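One crude way to recalculate genotypes from AD alone is a binomial likelihood for each diploid genotype. The Python sketch below illustrates this under stated assumptions. genotype_from_ad is hypothetical, and every read gets the same assumed error rate (0.01 by default). It also ignores base quality, mapping quality and strand, unlike GATK's genotyper, which weighs base qualities. The PLs are ordered 0/0, 0/1, 1/1, as in VCF.

```python
import math


def genotype_from_ad(ref_count, alt_count, error_rate=0.01):
    """Naive biallelic diploid call from allele depths; returns (GT, PLs).

    Assumes a single, hypothetical per-read error rate for every read.
    """
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must be between 0 and 0.5")
    # Probability that one read shows the ALT allele under each genotype.
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    # Binomial log10-likelihoods; the binomial coefficient is shared by all
    # genotypes, so it is left out.
    log10_lik = {
        gt: ref_count * math.log10(1.0 - p) + alt_count * math.log10(p)
        for gt, p in p_alt.items()
    }
    best_gt = max(log10_lik, key=log10_lik.get)
    best = log10_lik[best_gt]
    # Phred-scale relative to the most likely genotype, which gets PL 0.
    pls = [round(-10.0 * (log10_lik[gt] - best)) for gt in ("0/0", "0/1", "1/1")]
    return best_gt, pls


print(genotype_from_ad(23, 12))  # sample 5: ('0/1', [136, 0, 355])
```

For all five samples in the table above, this sketch returns 0/1, so it disagrees with the reported A/A calls for samples 1, 3, 4 and 5. Because it ignores the quality information mentioned above, treat it as a rough check rather than a replacement for the GATK genotypes.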