On 2000 samples I have run HC3.2, CGVCFs3.2, GGVCFs3.2 and VR3.2.
For the GenotypeGVCFs step I used the current default annotations:
InbreedingCoeff FisherStrand QualByDepth ChromosomeCounts GenotypeSummaries
And these non-default annotations:
When running VariantRecalibrator and plotting each of the dimensions, I noticed that all of the non-default annotations take on discrete values; see the bottom of this post. Is it no longer recommended to use ReadPosRankSum and MQRankSum for VR? Should I calculate these annotations with VariantAnnotator instead of GenotypeGVCFs? If I have to run VariantAnnotator, should I then run it separately for SNPs and INDELs (cf. my previous question about annotations differing when applied to BOTH versus SNPs: http://gatkforums.broadinstitute.org/discussion/2620)?
zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep ReadPosRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  41649 ReadPosRankSum=0.731
  41760 ReadPosRankSum=0.550
  46305 ReadPosRankSum=0.720
  47060 ReadPosRankSum=0.00
  87348 ReadPosRankSum=0.406
 105254 ReadPosRankSum=0.736
 116426 ReadPosRankSum=0.727
 164855 ReadPosRankSum=0.358

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep "MQ=" | sort | uniq -c | awk '$1>5000' | sort -k1n,1
   5802 MQ=57.05
   8382 MQ=29.00
   8525 MQ=56.62
  10069 MQ=51.77
  10574 MQ=53.95
  10682 MQ=47.12
  10818 MQ=56.04
  11553 MQ=55.21
 802603 MQ=60.00

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep MQRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  21511 MQRankSum=-7.360e-01
  27222 MQRankSum=0.322
  33699 MQRankSum=0.550
  34481 MQRankSum=0.731
  37603 MQRankSum=0.720
  60729 MQRankSum=0.00
  76031 MQRankSum=0.406
  85812 MQRankSum=0.736
  98519 MQRankSum=0.727
 186092 MQRankSum=0.358
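To put a single number on how discrete an annotation is, the per-value counts above can be condensed into "records carrying the annotation" versus "distinct values seen". This is only a rough sketch: the function name annotation_spread is made up for illustration, and it assumes an uncompressed VCF stream on stdin with INFO in column 8 and KEY=value entries separated by semicolons.

```shell
# annotation_spread KEY (hypothetical helper, not part of GATK):
# read VCF text on stdin and print two numbers separated by a space:
#   1) records whose INFO field (column 8) carries KEY=...
#   2) how many distinct KEY=... values were seen
annotation_spread() {
  awk -F'\t' -v key="$1" '
    /^#/ { next }                          # skip header lines
    {
      n = split($8, kv, ";")               # INFO entries are ";"-separated
      for (i = 1; i <= n; i++) {
        if (index(kv[i], key "=") == 1) {  # entry starts with "KEY="
          total++
          if (!(kv[i] in seen)) { seen[kv[i]] = 1; distinct++ }
          break
        }
      }
    }
    END { print total + 0, distinct + 0 }
  '
}

# Example call (path taken from the commands above):
#   zcat out_GenotypeGVCFs/chrom20.vcf.gz | annotation_spread MQRankSum
```

A small second number relative to the first would confirm that the annotation is clustering on a few values rather than varying continuously.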
I read the documentation for MappingQualityRankSumTest and ReadPosRankSumTest:
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_MappingQualityRankSumTest.html
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_ReadPosRankSumTest.html
Both pages read: "The ... rank sum test can not be calculated for sites without a mixture of reads showing both the reference and alternate alleles."
I have quite a few sites for which MQRankSum and ReadPosRankSum are missing. How does VariantRecalibrator handle this missing information?
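For reference, here is roughly how the number of affected sites can be counted. It is a minimal sketch rather than anything GATK provides: count_missing is a made-up helper name, and it assumes VCF text on stdin with the annotation stored as KEY=value in the INFO column.

```shell
# count_missing KEY (hypothetical helper, not part of GATK):
# read VCF text on stdin and print how many records have no KEY=... entry
# in the INFO field (column 8).
count_missing() {
  awk -F'\t' -v key="$1" '
    /^#/ { next }                          # skip header lines
    {
      n = split($8, kv, ";")               # INFO entries are ";"-separated
      found = 0
      for (i = 1; i <= n; i++) {
        if (index(kv[i], key "=") == 1) { found = 1; break }
      }
      if (!found) missing++
    }
    END { print missing + 0 }
  '
}

# Example calls (file path is an assumption; substitute your own VCF):
#   zcat out_GenotypeGVCFs/chrom20.vcf.gz | count_missing MQRankSum
#   zcat out_GenotypeGVCFs/chrom20.vcf.gz | count_missing ReadPosRankSum
```

Matching on the "KEY=" prefix keeps MQ from being confused with MQRankSum.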
I'm attempting to use VariantAnnotator to annotate some VCFs produced by samtools so that I can run VQSR on them. Unfortunately I've gotten stuck, and I'm trying to figure out why VariantAnnotator isn't annotating INDELs with MappingQualityRankSumTest and ReadPosRankSumTest; it seems to annotate SNPs fine. There are both homs and hets called in the sample. Could it be that I need to left-align the indels to get enough coverage? What would you suggest is the best way to debug this? Is there a way to make GATK more verbose about why it's refusing an annotation?