Tagged with #readposranksum
0 documentation articles | 0 announcements | 4 forum discussions

Created 2014-09-19 10:45:49 | Updated 2014-09-19 10:52:00 | Tags: mappingqualityranksumtest readposranksumtest mqranksum readposranksum genotypegvcfs

Comments (24)

On 2000 samples I have run HaplotypeCaller, CombineGVCFs, GenotypeGVCFs and VariantRecalibrator (all GATK 3.2).

For the GenotypeGVCFs step I used the current default annotations:

InbreedingCoeff FisherStrand QualByDepth ChromosomeCounts GenotypeSummaries

And these non-default annotations:

MappingQualityRankSumTest ReadPosRankSumTest

When running VariantRecalibrator and plotting each of the dimensions, I noticed that all of the non-default annotations take on discrete values; see the bottom of this post. Is it no longer recommended to use ReadPosRankSum and MQRankSum for VR? Should I calculate these annotations with VariantAnnotator instead of GenotypeGVCFs? If I have to run VariantAnnotator, should I then run it separately for SNPs and INDELs, cf. my previous question about annotations differing when calculated for BOTH versus SNPs: http://gatkforums.broadinstitute.org/discussion/2620

Thank you.

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep ReadPosRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  41649 ReadPosRankSum=0.731
  41760 ReadPosRankSum=0.550
  46305 ReadPosRankSum=0.720
  47060 ReadPosRankSum=0.00
  87348 ReadPosRankSum=0.406
 105254 ReadPosRankSum=0.736
 116426 ReadPosRankSum=0.727
 164855 ReadPosRankSum=0.358

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep "MQ=" | sort | uniq -c | awk '$1>5000' | sort -k1n,1
   5802 MQ=57.05
   8382 MQ=29.00
   8525 MQ=56.62
  10069 MQ=51.77
  10574 MQ=53.95
  10682 MQ=47.12
  10818 MQ=56.04
  11553 MQ=55.21
 802603 MQ=60.00

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep MQRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  21511 MQRankSum=-7.360e-01
  27222 MQRankSum=0.322
  33699 MQRankSum=0.550
  34481 MQRankSum=0.731
  37603 MQRankSum=0.720
  60729 MQRankSum=0.00
  76031 MQRankSum=0.406
  85812 MQRankSum=0.736
  98519 MQRankSum=0.727
 186092 MQRankSum=0.358
Comments (4)

I read the documentation for MappingQualityRankSumTest and ReadPosRankSumTest:
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_MappingQualityRankSumTest.html
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_ReadPosRankSumTest.html

Both pages read: "The ... rank sum test can not be calculated for sites without a mixture of reads showing both the reference and alternate alleles."

I have quite a few sites for which MQRankSum and ReadPosRankSum are missing. How does VariantRecalibrator handle this missing information?
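To put a number on "quite a few", the missing annotations can be counted per site. The following is a minimal Python sketch, not part of GATK: `parse_info` and `count_missing` are hypothetical helper names, and the code assumes a tab-delimited VCF whose eighth column is INFO.

```python
from collections import Counter


def parse_info(info_field):
    """Turn a VCF INFO string such as 'MQ=60.00;DB;MQRankSum=0.358' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        # Flag entries such as 'DB' carry no '=' and are stored as True.
        info[key] = value if sep else True
    return info


def count_missing(vcf_lines, keys=("MQRankSum", "ReadPosRankSum")):
    """Return (number of records, {key: records whose INFO lacks that key})."""
    missing = Counter({key: 0 for key in keys})
    total = 0
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        total += 1
        info = parse_info(line.rstrip("\n").split("\t")[7])
        for key in keys:
            if key not in info:
                missing[key] += 1
    return total, dict(missing)
```

For a compressed file, the lines can be supplied with `gzip.open("out_GenotypeGVCFs/chrom20.vcf.gz", "rt")` from the standard library, since the function only needs an iterable of text lines.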

Created 2013-09-20 13:36:22 | Updated | Tags: basequalityranksumtest fisherstrand vcf mqranksum readposranksum

Comments (1)

Hey guys,

I'm struggling with some of the statistics given in the VCF file: the rank sum tests. I started googling around, but that turned out not to be helpful for understanding them (in my case). I really have no idea how to interpret the values in the VCF that come from the rank sum tests. I have no clue whether a negative value, a positive value, or a value near zero is good or bad. Therefore I'm asking for some help here. Maybe someone knows a good tutorial page or can give me a hint to better understand the values of MQRankSum, ReadPosRankSum and BaseQRankSum. I have the same problem with the FisherStrand statistic. Many, many thanks in advance.
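As a rough aid to intuition, here is a minimal Python sketch of a generic Wilcoxon rank sum (Mann-Whitney U) z-score that compares a per-read quantity, such as mapping quality, between reads supporting the alternate allele and reads supporting the reference allele. It is an illustration under stated assumptions, not GATK's implementation: `rank_sum_z` is a hypothetical name, ties count as half a win, there is no tie correction of the variance, and the sign is chosen so that a positive value means the alt reads tend to have higher values. GATK's own sign convention and tie handling should be checked against its documentation.

```python
import math


def rank_sum_z(alt_values, ref_values):
    """Normal-approximation z-score of the Mann-Whitney U statistic.

    Positive: alt values tend to be higher than ref values.
    Negative: alt values tend to be lower. Near zero: no clear difference.
    """
    n_alt, n_ref = len(alt_values), len(ref_values)
    if n_alt == 0 or n_ref == 0:
        # Without a mixture of ref and alt reads there is nothing to compare.
        return None
    # U counts the (alt, ref) pairs in which the alt value wins; a tie is half a win.
    u = 0.0
    for alt in alt_values:
        for ref in ref_values:
            if alt > ref:
                u += 1.0
            elif alt == ref:
                u += 0.5
    mean_u = n_alt * n_ref / 2.0
    sd_u = math.sqrt(n_alt * n_ref * (n_alt + n_ref + 1) / 12.0)
    return (u - mean_u) / sd_u
```

In this sketch, `rank_sum_z([20, 20, 20], [60, 60, 60])` is negative because every alt read has a lower mapping quality than every ref read, while identical distributions give 0. Because U can only take a limited number of values when there are few reads, the z-score is restricted to a limited set of values as well.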

Created 2013-05-09 16:40:31 | Updated | Tags: unifiedgenotyper mqranksum readposranksum baseqranksum glm

Comments (2)

I have run UnifiedGenotyper with the -glm options SNP and BOTH. These two approaches yield identical variants and identical genotype likelihoods (at least the first 100k variants I checked). However, a few of the annotations have different values: BaseQRankSum MQRankSum ReadPosRankSum

-glm SNP on the left and -glm BOTH on the right:

-glm SNP                 -glm BOTH
MQRankSum=-1.762         MQRankSum=-1.785
MQRankSum=-5.307         MQRankSum=-4.970
MQRankSum=0.262          MQRankSum=-0.022
MQRankSum=-0.680         MQRankSum=-0.710
MQRankSum=1.016          MQRankSum=0.231
MQRankSum=-0.693         MQRankSum=-0.681
MQRankSum=-0.839         MQRankSum=-0.830
MQRankSum=1.924          MQRankSum=1.889
MQRankSum=-0.991         MQRankSum=-0.665
MQRankSum=-0.459         MQRankSum=-0.958

BaseQRankSum=-1.803      BaseQRankSum=-1.881
BaseQRankSum=6.918       BaseQRankSum=6.894
BaseQRankSum=-2.512      BaseQRankSum=-2.524
BaseQRankSum=2.000       BaseQRankSum=2.020
BaseQRankSum=2.095       BaseQRankSum=2.006
BaseQRankSum=2.134       BaseQRankSum=2.223
BaseQRankSum=-3.622      BaseQRankSum=-3.547
BaseQRankSum=1.569       BaseQRankSum=1.586
BaseQRankSum=-3.416      BaseQRankSum=-3.733
BaseQRankSum=-1.745      BaseQRankSum=-1.769

ReadPosRankSum=-0.341    ReadPosRankSum=-0.280
ReadPosRankSum=4.207     ReadPosRankSum=4.190
ReadPosRankSum=-3.809    ReadPosRankSum=-3.832
ReadPosRankSum=-2.047    ReadPosRankSum=-2.060
ReadPosRankSum=-1.279    ReadPosRankSum=-1.232
ReadPosRankSum=-3.921    ReadPosRankSum=-3.955
ReadPosRankSum=-1.500    ReadPosRankSum=-1.486
ReadPosRankSum=-0.374    ReadPosRankSum=-0.403
ReadPosRankSum=3.209     ReadPosRankSum=3.188
ReadPosRankSum=1.889     ReadPosRankSum=1.868

Why is that?

I noticed another user got different variants, but I get the same variants and the same likelihoods: [http://gatkforums.broadinstitute.org/discussion/1782/unifiedgenotyper-different-glm-value-result-in-different-sets-of-variants]

I ran single threaded.

I use MQRankSum and ReadPosRankSum for VariantRecalibrator, so if the annotations depend on -glm, my downstream results are affected. Hence my question. I hope you can enlighten me. Thank you.
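For anyone who wants to reproduce the comparison, here is a minimal Python sketch of how the two runs can be compared annotation by annotation. The helper names `annotation_value` and `compare_annotation` are hypothetical, and the code assumes that both runs emit the same sites in the same order (as was the case here), so that the INFO strings can be paired by position.

```python
def annotation_value(info_field, key):
    """Return the numeric value of `key` in a VCF INFO string, or None if absent."""
    for entry in info_field.split(";"):
        name, sep, value = entry.partition("=")
        if sep and name == key:
            return float(value)
    return None


def compare_annotation(info_a, info_b, key):
    """Pair INFO strings by position and list (index, value_a, value_b, value_b - value_a).

    Sites where either run lacks the annotation are skipped.
    """
    rows = []
    for index, (a, b) in enumerate(zip(info_a, info_b)):
        value_a = annotation_value(a, key)
        value_b = annotation_value(b, key)
        if value_a is None or value_b is None:
            continue
        rows.append((index, value_a, value_b, value_b - value_a))
    return rows
```

Calling `compare_annotation(snp_info, both_info, "MQRankSum")` on the INFO columns of the two runs gives the per-site differences, which can then be summarised or plotted to see how large the -glm effect is.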