Tagged with #mappingqualityranksumtest


Created 2014-09-19 10:45:49 | Updated 2014-09-19 10:52:00 | Tags: mappingqualityranksumtest readposranksumtest mqranksum readposranksum genotypegvcfs

Comments (24)

I have run HaplotypeCaller, CombineGVCFs, GenotypeGVCFs and VariantRecalibrator (all GATK 3.2) on 2000 samples.

For the GenotypeGVCFs step I used the current default annotations:

InbreedingCoeff FisherStrand QualByDepth ChromosomeCounts GenotypeSummaries

And these non-default annotations:

MappingQualityRankSumTest ReadPosRankSumTest

When I ran VariantRecalibrator and plotted each of the dimensions, I noticed that all of the non-default annotations take on discrete values; see the counts at the bottom of this post. Is it no longer recommended to use ReadPosRankSum and MQRankSum for VR? Should I calculate these annotations with VariantAnnotator instead of GenotypeGVCFs? If I have to run VariantAnnotator, should I run it separately for SNPs and INDELs? See my previous question about annotations differing when applied to BOTH and to SNPs: http://gatkforums.broadinstitute.org/discussion/2620

Thank you.

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v '^#' | cut -f8 | tr ";" "\n" | grep ReadPosRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  41649 ReadPosRankSum=0.731
  41760 ReadPosRankSum=0.550
  46305 ReadPosRankSum=0.720
  47060 ReadPosRankSum=0.00
  87348 ReadPosRankSum=0.406
 105254 ReadPosRankSum=0.736
 116426 ReadPosRankSum=0.727
 164855 ReadPosRankSum=0.358

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v '^#' | cut -f8 | tr ";" "\n" | grep "MQ=" | sort | uniq -c | awk '$1>5000' | sort -k1n,1
   5802 MQ=57.05
   8382 MQ=29.00
   8525 MQ=56.62
  10069 MQ=51.77
  10574 MQ=53.95
  10682 MQ=47.12
  10818 MQ=56.04
  11553 MQ=55.21
 802603 MQ=60.00

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v '^#' | cut -f8 | tr ";" "\n" | grep MQRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  21511 MQRankSum=-7.360e-01
  27222 MQRankSum=0.322
  33699 MQRankSum=0.550
  34481 MQRankSum=0.731
  37603 MQRankSum=0.720
  60729 MQRankSum=0.00
  76031 MQRankSum=0.406
  85812 MQRankSum=0.736
  98519 MQRankSum=0.727
 186092 MQRankSum=0.358
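
The three pipelines above differ only in the INFO key and the count threshold, so they could be folded into one helper. The following is a minimal sketch, not part of GATK: the function name info_value_counts is hypothetical, and it assumes uncompressed VCF text on stdin with INFO in column 8. Unlike a plain grep, it matches the key exactly, so asking for MQ does not also pick up MQRankSum.

```shell
# Hypothetical helper (not a GATK tool): tally the values of one INFO key.
# Reads uncompressed VCF text on stdin; prints "count value", most frequent first.
info_value_counts() {
  awk -F'\t' -v key="$1" '
    /^#/ { next }                               # skip header lines
    {
      n = split($8, fields, ";")                # column 8 is INFO
      for (i = 1; i <= n; i++) {
        eq = index(fields[i], "=")
        # compare the text before "=" with the requested key, exactly
        if (eq > 0 && substr(fields[i], 1, eq - 1) == key)
          counts[substr(fields[i], eq + 1)]++
      }
    }
    END { for (v in counts) print counts[v], v }
  ' | sort -k1,1nr -k2,2
}
```

For example, `zcat out_GenotypeGVCFs/chrom20.vcf.gz | info_value_counts MQRankSum | head` would list the most common MQRankSum values, which makes it easy to see how much of the call set sits on a handful of values.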
Comments (4)

I read the documentation for MappingQualityRankSumTest and ReadPosRankSumTest:

http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_MappingQualityRankSumTest.html
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_ReadPosRankSumTest.html

Both pages read: "The ... rank sum test can not be calculated for sites without a mixture of reads showing both the reference and alternate alleles."

I have quite a few sites for which MQRankSum and ReadPosRankSum are missing. How does VariantRecalibrator handle this missing information?
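
To put a number on "quite a few", the records with and without a given annotation can be counted directly. This is only a sketch under assumptions: info_key_presence is a hypothetical name, the input is uncompressed VCF text on stdin, and the key is assumed to appear as KEY=value in the INFO column. It says nothing about what VariantRecalibrator does with the missing values; it only measures how many sites are affected.

```shell
# Hypothetical helper (not a GATK tool): count records that carry or lack an INFO key.
# Reads uncompressed VCF text on stdin.
info_key_presence() {
  awk -F'\t' -v key="$1" '
    /^#/ { next }                               # skip header lines
    {
      found = 0
      n = split($8, fields, ";")                # column 8 is INFO
      for (i = 1; i <= n; i++)
        if (index(fields[i], key "=") == 1) { found = 1; break }
      if (found) present++; else missing++
    }
    END { printf "present=%d missing=%d\n", present, missing }
  '
}
```

For example: `zcat out_GenotypeGVCFs/chrom20.vcf.gz | info_key_presence ReadPosRankSum`.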

Created 2012-11-19 02:07:53 | Updated | Tags: mappingqualityranksumtest readposranksumtest variantannotator

Comments (11)


I'm attempting to use VariantAnnotator to annotate some VCFs produced by samtools so that I can run VQSR on them. Unfortunately I've gotten stuck: VariantAnnotator isn't annotating INDELs with MappingQualityRankSumTest and ReadPosRankSumTest, although it seems to annotate SNPs fine. Both homozygous and heterozygous calls are present in the sample. Could it be that I need to left-align the indels to get enough coverage? What would you suggest is the best way to debug this? Is there a way to make GATK report more verbosely why it is refusing an annotation?

Thanks, Martin
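
One way to check the SNP versus INDEL difference described above is to tally, per variant type, how many records carry the annotation in the annotated VCF. The sketch below is an illustration only: info_key_by_type is a hypothetical name, the input is assumed to be uncompressed VCF text on stdin, and the type rule is a simplification (a record counts as a SNP only if REF and every ALT allele are a single character; everything else, including symbolic alleles, is counted as INDEL).

```shell
# Hypothetical helper (not a GATK tool): per variant type, count records with an INFO key.
# Simplified typing: SNP if REF and all ALT alleles have length 1, otherwise INDEL.
info_key_by_type() {
  awk -F'\t' -v key="$1" '
    /^#/ { next }                               # skip header lines
    {
      type = (length($4) == 1) ? "SNP" : "INDEL"
      n = split($5, alts, ",")                  # column 5 is ALT, comma-separated
      for (i = 1; i <= n; i++)
        if (length(alts[i]) != 1) type = "INDEL"
      found = 0
      m = split($8, fields, ";")                # column 8 is INFO
      for (i = 1; i <= m; i++)
        if (index(fields[i], key "=") == 1) found = 1
      total[type]++
      if (found) has[type]++
    }
    END {
      printf "SNP with=%d total=%d\n", has["SNP"], total["SNP"]
      printf "INDEL with=%d total=%d\n", has["INDEL"], total["INDEL"]
    }
  '
}
```

Running it once with MQRankSum and once with ReadPosRankSum on the VariantAnnotator output would show whether the annotations are absent from every INDEL or only from some of them.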