PL for Genotype?
Posted in Ask the GATK team | Last updated on


Comments (2)

Hi -- I'm trying out GATK for the first time, but seeing some "confusing" results. I've just run what seems to be an essentially standard unifiedGenotyper on two sorted bam files (single samples P1003 and 03090), and gotten what appears to be heavily prior/other parameter driven results. Here is the strangeness I'm seeing at, e.g., Chr1 811659.

For the first sample: A called homozygous non ref (1/1) based on ~40ref ~10alt bases with PL favoring non ref or het call :207,21,0

For the second sample: A called het (0/1) based on ~44ref ~6alt bases with strong PL het evidence: 91,0,154

The behavior doesn't seem consistent, or at the very least, it seems quite sensitive to "small" data fluctuations.
At any rate, I don't find the calls particularly appealing.

I haven't had any luck looking through the documentation to see what exactly the PL field is telling me (model, prior(?), etc.) so that I can work out why the numbers come out the way they do (i.e., whether or not they're right/I like what they're trying to compute).
Can anyone point me in the right direction?
It's quite possible that I'm not using some of the command line parameters correctly, but I think I'm just using standard defaults and I don't see any other parameters that would seem to be crucial to these results.

Anyway, any help to expedite my learning curve here would be much appreciated.

Thanks! Scott

grep "^#" -v P1003_TruSeq_1_ATCACG_R1.gatk.raw.vcf | head -3063 | tail -1 Chr1 811659 . C G 178.80 . AC=2;AF=1.00;AN=2;BaseQRankSum=0.733;DP=50;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=11.84;MQ0=28;MQRankSum=3.665;QD=3.58;ReadPosRankSum=1.273 GT:AD:DP:GQ:PL 1/1:40,10:48:21:207,21,0

grep "^#" -v 03090_F2_TruSeq_8_ACTTGA_R1.gatk.raw.vcf | head -3063 | tail -1 Chr1 811659 . C G 62.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.672;DP=50;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.500;MQ=14.29;MQ0=29;MQRankSum=1.567;QD=1.26;ReadPosRankSum=-1.746 GT:AD:DP:GQ:PL 0/1:44,6:48:91:91,0,154

for f in P1003_TruSeq_1_ATCACG_R1.sorted.bam 03090_F2_TruSeq_8_ACTTGA_R1.sorted.bam do ff=${f%.sorted.bam} echo $ff java -jar /usr/local/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar \ --validation_strictness LENIENT \ -nt 4 \ -R all.fasta \ -T UnifiedGenotyper \ -I $ff.sorted.bam \ -o $ff.gatk.raw.vcf \ -L potential.variants.intervals \ --output_mode EMIT_ALL_SITES & done


Return to top Comment on this article in the forum