I am interested in the details of UnifiedGenotyper's statistical model. I came across GATKPaperGenotyper and was told that the statistical model it applies is not feasible for real-world data. I would still like to know how UnifiedGenotyper works in detail, since I originally assumed that it used the same model as GATKPaperGenotyper. The only documentation I can find is the code itself (the comments are unfortunately not very detailed, and the code is not very expressive either) and this slide: http://www.broadinstitute.org/gatk/guide/article?id=1237
Unfortunately, I cannot find out exactly how P(b|G) is calculated, or which factors go into the calculation of P(G) and P(D|G)...
Any explanations, recommendations, or further references would be much appreciated!
I just walked through the GATK code a bit and came across the class org.broadinstitute.sting.gatk.examples.GATKPaperGenotyper. The comments state that this genotyper is only intended to be used as an example in the GATK paper and that it uses a much simpler model for calling than UnifiedGenotyper.
At first glance, this genotyper seems to apply the methods I thought GATK's UnifiedGenotyper would use in general (heterozygosity values for the prior probability and base quality scores for the genotype likelihood). So now I am interested in how this simple genotyper differs from the strategies used in the actual tool: what other factors are considered in UnifiedGenotyper's statistical model? So far I have not been able to work out the actual differences just from browsing through the UnifiedGenotyper code...
And what is more, how applicable is the simpler model for SNP calling in practice?
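To make the question concrete, here is a minimal Python sketch of the simple model as I understand it. It is not the actual GATKPaperGenotyper or UnifiedGenotyper code. It assumes that P(b|A) is 1 - e when the observed base matches the allele and e/3 otherwise (with e = 10^(-Q/10) from the base quality), that P(b|G) averages the two alleles of a diploid genotype, and that P(D|G) is the product over all bases in the pileup. The function names and the way the heterozygosity is split across genotypes in the prior are my own assumptions for illustration.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def base_likelihood(observed, allele, qual):
    """P(b | A): 1 - e if the base matches the allele, e / 3 otherwise."""
    e = 10 ** (-qual / 10.0)  # Phred-scaled base quality -> error probability
    return 1.0 - e if observed == allele else e / 3.0


def genotype_log10_likelihoods(pileup):
    """log10 P(D | G) for the ten diploid genotypes.

    pileup is a list of (base, phred_quality) tuples.
    """
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        total = 0.0
        for base, qual in pileup:
            # P(b | G) = 1/2 P(b | A1) + 1/2 P(b | A2)
            p = 0.5 * base_likelihood(base, a1, qual) \
                + 0.5 * base_likelihood(base, a2, qual)
            total += math.log10(p)
        result[a1 + a2] = total
    return result


def genotype_log10_prior(genotype, ref, heterozygosity=1e-3):
    """Illustrative prior P(G); this split is an assumption, not GATK's."""
    a1, a2 = genotype
    if a1 == ref and a2 == ref:
        return math.log10(1.0 - 1.5 * heterozygosity)  # hom-ref
    if a1 == ref or a2 == ref:
        return math.log10(heterozygosity / 3.0)        # ref/alt het
    if a1 == a2:
        return math.log10(heterozygosity / 6.0)        # hom-alt
    return math.log10(heterozygosity ** 2)             # alt/alt het


def call_genotype(pileup, ref, heterozygosity=1e-3):
    """Return (best genotype, dict of normalized posteriors P(G | D))."""
    likelihoods = genotype_log10_likelihoods(pileup)
    log_post = {
        g: ll + genotype_log10_prior(g, ref, heterozygosity)
        for g, ll in likelihoods.items()
    }
    # Shift by the maximum before exponentiating to avoid underflow.
    top = max(log_post.values())
    unnorm = {g: 10 ** (lp - top) for g, lp in log_post.items()}
    norm = sum(unnorm.values())
    posteriors = {g: v / norm for g, v in unnorm.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


if __name__ == "__main__":
    # Ten A and ten G observations at Q30 over a reference A.
    pileup = [("A", 30)] * 10 + [("G", 30)] * 10
    best, posteriors = call_genotype(pileup, ref="A")
    print(best, posteriors[best])
```

Note that this sketch uses only base qualities and a fixed heterozygosity; it ignores mapping quality and any filtering of bases or reads, which is exactly the kind of factor I am asking about for UnifiedGenotyper.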
I'm trying to call variants from bowtie-aligned reads. I used PrintReads with the ReassignMappingQuality filter to give all reads a mapping quality of 60, replacing the default value of 255. However, I'm wondering whether this reassignment would introduce any bias into variant calling.
Thanks a lot!