I was plotting the distribution of BaseQRankSum and noticed that a large number of variants have a BaseQRankSum Z-score outside +/- 2, which suggests that many variants have significantly different base qualities between the reads supporting the REF and ALT alleles. I also plotted the distributions of ClippingRankSum, MQRankSum, and ReadPosRankSum; for those, the majority of variants had Z-scores within +/- 2.
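For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch, not the exact script used for the plots. It assumes a plain-text VCF with the rank-sum annotations stored as `KEY=value` entries in the INFO column; the helper names `info_values` and `fraction_outside` are hypothetical, and the example records are made-up toy data rather than real GATK output.

```python
def info_values(vcf_lines, key):
    """Yield the float value of INFO annotation `key` for each record that has it."""
    for line in vcf_lines:
        if line.startswith("#"):          # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:               # INFO is the 8th column of a VCF record
            continue
        for entry in fields[7].split(";"):
            name, _, value = entry.partition("=")
            if name == key and value:
                try:
                    yield float(value)
                except ValueError:
                    pass                  # ignore non-numeric values


def fraction_outside(values, threshold=2.0):
    """Return the fraction of values whose absolute Z-score exceeds `threshold`."""
    values = list(values)
    if not values:
        return 0.0
    return sum(abs(v) > threshold for v in values) / len(values)


# Toy records (made up for illustration only).
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\tBaseQRankSum=-3.1;MQRankSum=0.4;ReadPosRankSum=1.0",
    "1\t200\t.\tC\tT\t60\tPASS\tBaseQRankSum=0.5;MQRankSum=-0.2;ReadPosRankSum=0.3",
    "1\t300\t.\tG\tA\t70\tPASS\tBaseQRankSum=2.6;MQRankSum=1.1",
    "1\t400\t.\tT\tC\t80\tPASS\tDP=30",
]

for key in ("BaseQRankSum", "MQRankSum", "ReadPosRankSum"):
    print(key, fraction_outside(info_values(example, key)))
```

To run it on a real file, pass an open file handle instead of `example`, e.g. `with open("calls.vcf") as fh: ...` (the file name is a placeholder). Records that lack the annotation, such as homozygous sites where no rank-sum test can be computed, are simply skipped.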
Is this typical, and what does it suggest? I followed the Best Practices for DNA sequencing using GATK3.
I found this post (http://gatkforums.broadinstitute.org/discussion/2035/z-scores-for-baseqranksum), which is similar to what I'm asking but has a different distribution of Z-scores.
Thank you in advance.
I have run UnifiedGenotyper with the -glm options SNP and BOTH. The two runs yield identical variants and identical genotype likelihoods (at least for the first 100k variants I checked). However, a few of the annotations have different values: BaseQRankSum, MQRankSum, and ReadPosRankSum.
-glm SNP on the left and -glm BOTH on the right:
Why is that?
I noticed that another user got different variants, but I get the same variants and the same likelihoods: http://gatkforums.broadinstitute.org/discussion/1782/unifiedgenotyper-different-glm-value-result-in-different-sets-of-variants
I ran single threaded.
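The comparison described above (same sites, differing rank-sum annotations) can be checked with something like the following Python sketch. It assumes two plain-text VCFs whose records can be matched on chromosome, position, REF, and ALT; the helper names `parse_info`, `load_sites`, and `differing_annotations` are hypothetical, and the records are made-up toy data, not actual UnifiedGenotyper output. Values are compared as strings, so any difference in formatting counts as a difference.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict of annotation name -> raw string value."""
    out = {}
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        out[name] = value
    return out


def load_sites(vcf_lines):
    """Map (CHROM, POS, REF, ALT) -> INFO dict for every record in the VCF lines."""
    sites = {}
    for line in vcf_lines:
        if line.startswith("#"):          # skip meta-information and header lines
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 8:
            continue
        sites[(f[0], int(f[1]), f[3], f[4])] = parse_info(f[7])
    return sites


def differing_annotations(a, b, keys):
    """List (site, key, value_in_a, value_in_b) for shared sites where values differ."""
    diffs = []
    for site in sorted(a.keys() & b.keys()):
        for key in keys:
            va, vb = a[site].get(key), b[site].get(key)
            if va != vb:
                diffs.append((site, key, va, vb))
    return diffs


# Toy records (made up for illustration only).
glm_snp = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\tBaseQRankSum=-1.2;MQRankSum=0.5;ReadPosRankSum=0.1",
    "1\t200\t.\tC\tT\t60\tPASS\tBaseQRankSum=0.3;MQRankSum=-0.7;ReadPosRankSum=1.4",
]
glm_both = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\tBaseQRankSum=-1.4;MQRankSum=0.5;ReadPosRankSum=0.1",
    "1\t200\t.\tC\tT\t60\tPASS\tBaseQRankSum=0.3;MQRankSum=-0.7;ReadPosRankSum=1.4",
]

keys = ("BaseQRankSum", "MQRankSum", "ReadPosRankSum")
diffs = differing_annotations(load_sites(glm_snp), load_sites(glm_both), keys)
for site, key, left, right in diffs:
    print(site, key, left, right)
```

Counting the entries in `diffs` per annotation gives a quick sense of how many sites are affected and by how much, which may help judge the impact on VariantRecalibrator.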
I use MQRankSum and ReadPosRankSum for VariantRecalibrator, so if these annotations depend on the -glm setting, my downstream results are affected. That is why I am asking. I hope you can enlighten me. Thank you.