I found that VariantAnnotator sometimes does not add some of the annotations that are requested.
A) The Rank Sum Test annotations MQRankSum & BaseQRankSum: I was not able to identify the requirements that have to be met for them to be calculated for a variant.
B) InbreedingCoeff: This one seems to be connected to the total number of called alleles (AN). For me, at least 10% of the alleles needed to be called (19/186). The doc for that says "at least 10 founder samples". Maybe this has to be updated to 10%?
These are the ones I observed. Can someone tell me more about that?
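A rough way to probe the InbreedingCoeff threshold is to compare AN between records that carry the annotation and records that do not. The sketch below assumes a gzipped VCF with AN in the INFO column; the function name an_by_inbreeding and the file name annotated.vcf.gz are made up for illustration.

```shell
# Hypothetical helper (not part of GATK): report the smallest AN among records
# that have InbreedingCoeff and the largest AN among records that lack it.
# Usage (hypothetical file name): an_by_inbreeding annotated.vcf.gz
an_by_inbreeding() {
    zcat "$1" | grep -v '^#' | cut -f8 |
        awk '
            {
                an = ""; has = 0
                n = split($0, f, ";")
                for (i = 1; i <= n; i++) {
                    # Match the key at the start of the field only.
                    if (index(f[i], "AN=") == 1) an = substr(f[i], 4)
                    if (index(f[i], "InbreedingCoeff=") == 1) has = 1
                }
                if (an == "") next      # skip records without AN
                an += 0                 # force numeric comparison
                if (has) { if (min_with == "" || an < min_with) min_with = an }
                else     { if (max_without == "" || an > max_without) max_without = an }
            }
            END {
                print "min AN with InbreedingCoeff:", (min_with == "" ? "NA" : min_with)
                print "max AN without InbreedingCoeff:", (max_without == "" ? "NA" : max_without)
            }'
}
```

If the smallest AN with the annotation sits just above the largest AN without it, that points to a cutoff on called alleles. Whether the cutoff is an absolute count or a fraction can only be told apart by repeating this on a cohort of a different size.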
On 2000 samples I have run HC3.2, CGVCFs3.2, GGVCFs3.2 and VR3.2.
For the GenotypeGVCFs step I used the current default annotations:
InbreedingCoeff FisherStrand QualByDepth ChromosomeCounts GenotypeSummaries
And these non-default annotations:
When running VariantRecalibrator and plotting each of the dimensions, I noticed that all of the non-default annotations take on discrete values; see the bottom of this post. Is it no longer recommended to use ReadPosRankSum and MQRankSum for VR? Should I calculate these annotations with VariantAnnotator instead of GenotypeGVCFs? If I have to run VariantAnnotator, should I then run it separately for SNPs and INDELs? Cf. my previous question about annotations being different when applied to BOTH and SNPs: http://gatkforums.broadinstitute.org/discussion/2620
zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep ReadPosRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  41649 ReadPosRankSum=0.731
  41760 ReadPosRankSum=0.550
  46305 ReadPosRankSum=0.720
  47060 ReadPosRankSum=0.00
  87348 ReadPosRankSum=0.406
 105254 ReadPosRankSum=0.736
 116426 ReadPosRankSum=0.727
 164855 ReadPosRankSum=0.358

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep "MQ=" | sort | uniq -c | awk '$1>5000' | sort -k1n,1
   5802 MQ=57.05
   8382 MQ=29.00
   8525 MQ=56.62
  10069 MQ=51.77
  10574 MQ=53.95
  10682 MQ=47.12
  10818 MQ=56.04
  11553 MQ=55.21
 802603 MQ=60.00

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep MQRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  21511 MQRankSum=-7.360e-01
  27222 MQRankSum=0.322
  33699 MQRankSum=0.550
  34481 MQRankSum=0.731
  37603 MQRankSum=0.720
  60729 MQRankSum=0.00
  76031 MQRankSum=0.406
  85812 MQRankSum=0.736
  98519 MQRankSum=0.727
 186092 MQRankSum=0.358
I read the documentation for MappingQualityRankSumTest and ReadPosRankSumTest: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_MappingQualityRankSumTest.html http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_ReadPosRankSumTest.html
Both pages read: "The ... rank sum test can not be calculated for sites without a mixture of reads showing both the reference and alternate alleles."
I have quite a few sites for which MQRankSum and ReadPosRankSum are missing. How does VariantRecalibrator handle this missing information?
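To put a number on "quite a few", the missing sites can be counted per annotation. This is a minimal sketch in the same style as the pipelines above, assuming the same gzipped VCF layout; count_missing is a hypothetical helper name, not a GATK tool.

```shell
# Hypothetical helper: count records whose INFO column (field 8) lacks a
# given annotation key. The key is matched at the start of each INFO field,
# so asking for "MQ" does not accidentally match "MQRankSum".
# Usage: count_missing out_GenotypeGVCFs/chrom20.vcf.gz MQRankSum
count_missing() {
    vcf="$1"; key="$2"
    zcat "$vcf" | grep -v '^#' | cut -f8 |
        awk -v key="$key" '
            {
                total++
                n = split($0, fields, ";")
                found = 0
                for (i = 1; i <= n; i++) {
                    if (index(fields[i], key "=") == 1) { found = 1; break }
                }
                if (!found) missing++
            }
            END { printf "%d of %d records lack %s\n", missing, total, key }'
}
```

Running it once for MQRankSum and once for ReadPosRankSum shows how large a share of the training data is affected by the missing values.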
I'm struggling with some statistics given in the VCF file: the rank sum tests. I started googling around, but that turned out not to be helpful for understanding them (in my case). I really have no idea how to interpret the VCF statistic values that come from the rank sum tests. I have no clue whether a negative value, a positive value, or a value near zero is good or bad. Therefore I'm asking for some help here. Maybe someone knows a good tutorial page or can give me a hint to better understand the values of MQRankSum, ReadPosRankSum and BaseQRankSum. I have the same problem with the FisherStrand statistic. Many, many thanks in advance.
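As a point of reference for reading these values, a rank sum statistic of this kind is usually reported as a z-score from the Mann-Whitney U test. The following is a sketch of the standard normal approximation, not a statement of GATK's exact implementation (which may treat ties or small counts differently). The symbols n_alt, n_ref and R_alt are introduced here for illustration: the numbers of reads supporting the alternate and reference alleles, and the sum of the ranks of the alt-supporting reads when all reads are ranked together by the quantity being tested (mapping quality, base quality, or position in the read).

```latex
U = R_{\mathrm{alt}} - \frac{n_{\mathrm{alt}}\,(n_{\mathrm{alt}} + 1)}{2},
\qquad
z = \frac{U - \dfrac{n_{\mathrm{alt}}\, n_{\mathrm{ref}}}{2}}
         {\sqrt{\dfrac{n_{\mathrm{alt}}\, n_{\mathrm{ref}}\,(n_{\mathrm{alt}} + n_{\mathrm{ref}} + 1)}{12}}}

% Worked example with made-up mapping qualities:
% ref reads: 50, 55, 58, 60   alt reads: 20, 30
% The alt reads take ranks 1 and 2, so R_alt = 3 and U = 3 - 3 = 0.
% z = (0 - 4) / \sqrt{56/12} \approx -1.85
```

Under this convention a value near zero means the alt-supporting and ref-supporting reads look alike, a negative value means the alt-supporting reads tend to have lower values than the ref-supporting reads, and a positive value means they tend to have higher values.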
I have run UnifiedGenotyper with the -glm options SNP and BOTH. These two approaches yield identical variants and identical genotype likelihoods (at least the first 100k variants I checked). However, a few of the annotations have different values: BaseQRankSum MQRankSum ReadPosRankSum
-glm SNP on the left and -glm BOTH on the right:
Why is that?
I noticed another user got different variants, but I get the same variants and the same likelihoods: [http://gatkforums.broadinstitute.org/discussion/1782/unifiedgenotyper-different-glm-value-result-in-different-sets-of-variants]
I ran single threaded.
I use MQRankSum and ReadPosRankSum for VariantRecalibrator, so if the annotations are -glm dependent, it affects my downstream results. Hence my question. I hope you can enlighten me. Thank you.