Tagged with #mqranksum
0 documentation articles | 0 announcements | 6 forum discussions

Created 2016-02-08 17:01:32 | Updated | Tags: variantfiltration mqranksum

Comments (3)

Hi, I am running VariantFiltration (GATK v 3.4) using the following command:

java -jar GenomeAnalysisTK.jar -T VariantFiltration -R hg19.fa -V raw_snps.vcf --filterExpression "QUAL/DP < 2.0 || DP < 10.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || SOR > 4.0" --missingValuesInExpressionsShouldEvaluateAsFailing --filterName "current" -o filtered_snps.vcf

Only half of the variants have MQRankSum values, and despite the argument '--missingValuesInExpressionsShouldEvaluateAsFailing', the output file has many variants marked as PASS that do not have MQRankSum. Ideally, the ones without the MQRankSum annotation should not be marked as PASS, right? Is this a bug?

Also, on the side, how do I get GATK to remove the variants that do not PASS, so that only the PASS variants remain in the output file?

Thanks, ~N
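
To make the two questions above concrete, here is a minimal sketch in plain Python (not a GATK feature) that keeps only PASS records and lists PASS records lacking a given INFO annotation. It assumes a standard tab-separated VCF with FILTER in column 7 and INFO in column 8; the function names and the toy records are hypothetical and only for illustration.

```python
def parse_info(info_field):
    """Turn 'DP=12;FS=1.2;DB' into {'DP': '12', 'FS': '1.2', 'DB': True}."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flags have no '=' part
    return info


def keep_pass(vcf_lines):
    """Yield header lines plus records whose FILTER column is exactly PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
        elif line.rstrip("\n").split("\t")[6] == "PASS":
            yield line


def pass_records_missing(vcf_lines, key):
    """Yield (CHROM, POS) for PASS records that lack the INFO annotation `key`."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[6] == "PASS" and key not in parse_info(fields[7]):
            yield fields[0], fields[1]


# Toy records (hypothetical values, not taken from the poster's data).
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=20;MQ=60.0;MQRankSum=0.3\n",
    "chr1\t200\t.\tC\tT\t50\tPASS\tDP=20;MQ=60.0\n",
    "chr1\t300\t.\tG\tA\t10\tcurrent\tDP=5;MQ=30.0\n",
]

pass_only = list(keep_pass(demo))
missing = list(pass_records_missing(demo, "MQRankSum"))
```

In this toy input, `pass_only` holds the header and the first two records, and `missing` reports the record at position 200, which is the kind of site the question is about.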

Created 2015-09-09 21:56:12 | Updated 2015-09-09 21:56:49 | Tags: inbreedingcoeff variantannotator mqranksum baseqranksum

Comments (7)

Hi @Team,

I found that VariantAnnotator sometimes does not annotate some annotations that are requested.

A) The Rank Sum Test annotations MQRankSum & BaseQRankSum: I was not able to identify the requirements that have to be met for them to be calculated for a variant.

B) InbreedingCoeff: This one seems to be connected to the total number of called alleles (AN). For me, at least 10% of the alleles needed to be called (19/186). The doc for that says [1] "at least 10 founder samples". Maybe this has to be updated to 10%?

These are the ones I observed. Can someone tell me more about that?

Thanks, Alexander

[1] https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_InbreedingCoeff.php

Created 2014-09-19 10:45:49 | Updated 2014-09-19 10:52:00 | Tags: mappingqualityranksumtest readposranksumtest mqranksum readposranksum genotypegvcfs

Comments (24)

On 2000 samples I have run HaplotypeCaller, CombineGVCFs, GenotypeGVCFs and VariantRecalibrator (HC, CGVCFs, GGVCFs and VR, all version 3.2).

For the GenotypeGVCFs step I used the current default annotations:

InbreedingCoeff FisherStrand QualByDepth ChromosomeCounts GenotypeSummaries

And these non-default annotations:

MappingQualityRankSumTest ReadPosRankSumTest

When running VariantRecalibrator and plotting each of the dimensions, I noticed that all of the non-default annotations take on discrete values; see the bottom of this post. Is it no longer recommended to use ReadPosRankSum and MQRankSum for VR? Should I calculate these annotations with VariantAnnotator instead of GenotypeGVCFs? If I have to run VariantAnnotator, should I then run it separately for SNPs and INDELs? Cf. my previous question about annotations being different when applied to BOTH and SNPs: http://gatkforums.broadinstitute.org/discussion/2620

Thank you.

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep ReadPosRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  41649 ReadPosRankSum=0.731
  41760 ReadPosRankSum=0.550
  46305 ReadPosRankSum=0.720
  47060 ReadPosRankSum=0.00
  87348 ReadPosRankSum=0.406
 105254 ReadPosRankSum=0.736
 116426 ReadPosRankSum=0.727
 164855 ReadPosRankSum=0.358

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep "MQ=" | sort | uniq -c | awk '$1>5000' | sort -k1n,1
   5802 MQ=57.05
   8382 MQ=29.00
   8525 MQ=56.62
  10069 MQ=51.77
  10574 MQ=53.95
  10682 MQ=47.12
  10818 MQ=56.04
  11553 MQ=55.21
 802603 MQ=60.00

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep MQRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  21511 MQRankSum=-7.360e-01
  27222 MQRankSum=0.322
  33699 MQRankSum=0.550
  34481 MQRankSum=0.731
  37603 MQRankSum=0.720
  60729 MQRankSum=0.00
  76031 MQRankSum=0.406
  85812 MQRankSum=0.736
  98519 MQRankSum=0.727
 186092 MQRankSum=0.358
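
The shell pipelines above can also be sketched in Python, which makes it easier to match an INFO key exactly (so that `MQ` does not also pick up `MQRankSum`). This is a hedged equivalent, not part of the original post; `count_info_values` and the toy records are hypothetical names and data used only for illustration.

```python
from collections import Counter


def count_info_values(vcf_lines, key):
    """Count how often each value of INFO annotation `key` occurs."""
    counts = Counter()
    prefix = key + "="
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines, like `grep -v ^#`
        info = line.rstrip("\n").split("\t")[7]  # INFO column, like `cut -f8`
        for item in info.split(";"):
            if item.startswith(prefix):
                counts[item[len(prefix):]] += 1
    return counts


# Toy records (hypothetical); a real run would read the gzipped VCF, e.g.
#   import gzip
#   with gzip.open("out_GenotypeGVCFs/chrom20.vcf.gz", "rt") as handle:
#       counts = count_info_values(handle, "MQRankSum")
demo = [
    "##fileformat=VCFv4.1\n",
    "20\t100\t.\tA\tG\t50\t.\tMQ=60.00;MQRankSum=0.358\n",
    "20\t200\t.\tC\tT\t50\t.\tMQ=60.00;MQRankSum=0.358\n",
    "20\t300\t.\tG\tA\t50\t.\tMQ=29.00;MQRankSum=0.727\n",
]

mq_counts = count_info_values(demo, "MQ")
ranksum_counts = count_info_values(demo, "MQRankSum")
```

Sorting `ranksum_counts.most_common()` gives the same kind of frequency table as the `sort | uniq -c` output shown above.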

Comments (4)

I read the documentation for MappingQualityRankSumTest and ReadPosRankSumTest:
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_MappingQualityRankSumTest.html
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_ReadPosRankSumTest.html

Both pages read: "The ... rank sum test can not be calculated for sites without a mixture of reads showing both the reference and alternate alleles."

I have quite a few sites for which MQRankSum and ReadPosRankSum are missing. How does VariantRecalibrator handle this missing information?

Created 2013-09-20 13:36:22 | Updated | Tags: basequalityranksumtest fisherstrand vcf mqranksum readposranksum

Comments (1)

Hey guys,

I'm struggling with some statistics given in the VCF file: the rank sum tests. I started googling around, but that turned out not to be helpful for understanding them (in my case). I really have no idea how to interpret the values coming from the rank sum tests. I have no clue whether a negative value, a positive value, or a value near zero is good or bad, so I'm asking for some help here. Maybe someone knows a good tutorial page or can give me a hint to better understand the values of MQRankSum, ReadPosRankSum and BaseQRankSum. I have the same problem with the FisherStrand statistic. Many, many thanks in advance.
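
For orientation, a rank sum annotation is a z-score comparing a per-read quantity (mapping quality, base quality, or position in the read) between reads supporting the alternate allele and reads supporting the reference allele. As the GATK annotation documentation describes it, a negative value means the alt-supporting reads tend to have lower values than the ref-supporting reads, a positive value means they tend to be higher, and a value near zero means little difference. The sketch below is a textbook Mann-Whitney U normal approximation, not GATK's exact implementation; the function name and the example qualities are assumptions made for illustration.

```python
import math


def rank_sum_z(alt_values, ref_values):
    """Mann-Whitney U z-score for alt vs. ref values (normal approximation).

    Returns None when either group is empty, since the test needs reads
    supporting both alleles. Ties get average ranks; no tie correction is
    applied to the variance.
    """
    n_alt, n_ref = len(alt_values), len(ref_values)
    if n_alt == 0 or n_ref == 0:
        return None

    pooled = sorted([(v, True) for v in alt_values] +
                    [(v, False) for v in ref_values], key=lambda pair: pair[0])

    # Sum the (average) ranks of the alt-supporting values.
    alt_rank_sum = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # ranks i+1 .. j share this value
        alt_rank_sum += average_rank * sum(1 for k in range(i, j) if pooled[k][1])
        i = j

    u_alt = alt_rank_sum - n_alt * (n_alt + 1) / 2.0
    mean_u = n_alt * n_ref / 2.0
    sd_u = math.sqrt(n_alt * n_ref * (n_alt + n_ref + 1) / 12.0)
    return (u_alt - mean_u) / sd_u


# Hypothetical mapping qualities: alt reads map worse than ref reads.
z_low_alt = rank_sum_z([20, 25, 30], [60, 60, 60, 60])
z_equal = rank_sum_z([60, 60], [60, 60])
z_undefined = rank_sum_z([], [60, 60])
```

Here `z_low_alt` comes out negative, `z_equal` is zero, and `z_undefined` is `None`, mirroring the case where the annotation is absent from a site.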

Created 2013-05-09 16:40:31 | Updated | Tags: unifiedgenotyper mqranksum readposranksum baseqranksum glm

Comments (2)

I have run UnifiedGenotyper with the -glm options SNP and BOTH. These two approaches yield identical variants and identical genotype likelihoods (at least for the first 100k variants I checked). However, a few of the annotations have different values: BaseQRankSum, MQRankSum and ReadPosRankSum.

-glm SNP on the left and -glm BOTH on the right:

MQRankSum=-1.762         MQRankSum=-1.785
MQRankSum=-5.307         MQRankSum=-4.970
MQRankSum=0.262          MQRankSum=-0.022
MQRankSum=-0.680         MQRankSum=-0.710
MQRankSum=1.016          MQRankSum=0.231
MQRankSum=-0.693         MQRankSum=-0.681
MQRankSum=-0.839         MQRankSum=-0.830
MQRankSum=1.924          MQRankSum=1.889
MQRankSum=-0.991         MQRankSum=-0.665
MQRankSum=-0.459         MQRankSum=-0.958

BaseQRankSum=-1.803      BaseQRankSum=-1.881
BaseQRankSum=6.918       BaseQRankSum=6.894
BaseQRankSum=-2.512      BaseQRankSum=-2.524
BaseQRankSum=2.000       BaseQRankSum=2.020
BaseQRankSum=2.095       BaseQRankSum=2.006
BaseQRankSum=2.134       BaseQRankSum=2.223
BaseQRankSum=-3.622      BaseQRankSum=-3.547
BaseQRankSum=1.569       BaseQRankSum=1.586
BaseQRankSum=-3.416      BaseQRankSum=-3.733
BaseQRankSum=-1.745      BaseQRankSum=-1.769

ReadPosRankSum=-0.341    ReadPosRankSum=-0.280
ReadPosRankSum=4.207     ReadPosRankSum=4.190
ReadPosRankSum=-3.809    ReadPosRankSum=-3.832
ReadPosRankSum=-2.047    ReadPosRankSum=-2.060
ReadPosRankSum=-1.279    ReadPosRankSum=-1.232
ReadPosRankSum=-3.921    ReadPosRankSum=-3.955
ReadPosRankSum=-1.500    ReadPosRankSum=-1.486
ReadPosRankSum=-0.374    ReadPosRankSum=-0.403
ReadPosRankSum=3.209     ReadPosRankSum=3.188
ReadPosRankSum=1.889     ReadPosRankSum=1.868

Why is that?

I noticed another user got different variants, but I get the same variants and the same likelihoods: [http://gatkforums.broadinstitute.org/discussion/1782/unifiedgenotyper-different-glm-value-result-in-different-sets-of-variants]

I ran single threaded.

I use MQRankSum and ReadPosRankSum for VariantRecalibrator, so if the annotations depend on -glm, it affects my downstream results. Hence my question. I hope you can enlighten me. Thank you.
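
A side-by-side comparison like the one above can be produced with a short script that pairs records by site and reports the per-site difference for one annotation. This is a minimal sketch, assuming both VCFs contain the same sites and that matching on (CHROM, POS) is sufficient; the function names and site coordinates are hypothetical, while the two MQRankSum pairs are taken from the listing above.

```python
def info_values(vcf_lines, key):
    """Map (CHROM, POS) to the float value of INFO annotation `key`."""
    values = {}
    prefix = key + "="
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        for item in fields[7].split(";"):
            if item.startswith(prefix):
                values[(fields[0], fields[1])] = float(item[len(prefix):])
    return values


def annotation_differences(lines_a, lines_b, key):
    """Return {site: value_b - value_a} for sites annotated in both inputs."""
    a = info_values(lines_a, key)
    b = info_values(lines_b, key)
    return {site: b[site] - a[site] for site in sorted(a.keys() & b.keys())}


# Toy records: coordinates are made up, MQRankSum values come from the post.
glm_snp = [
    "20\t1000\t.\tA\tG\t50\t.\tMQRankSum=-1.762\n",
    "20\t2000\t.\tC\tT\t50\t.\tMQRankSum=1.016\n",
]
glm_both = [
    "20\t1000\t.\tA\tG\t50\t.\tMQRankSum=-1.785\n",
    "20\t2000\t.\tC\tT\t50\t.\tMQRankSum=0.231\n",
]

diffs = annotation_differences(glm_snp, glm_both, "MQRankSum")
```

Summarising `diffs` (for example its largest absolute value) gives a quick sense of how far the two runs diverge for each annotation.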