According to http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41, the quality score (Phred score) is defined as below, so a 1% error rate corresponds to a Phred score of 20 (-10 × log10 0.01 = 20).
QUAL: phred-scaled quality score for the assertion made in ALT, i.e. -10log_10 prob(call in ALT is wrong). If ALT is "." (no variant) then this is -10log_10 prob(variant), and if ALT is not "." this is -10log_10 prob(no variant). High QUAL scores indicate high confidence calls. Although traditionally people use integer phred scores, this field is permitted to be a floating point to enable higher resolution for low confidence calls if desired. If unknown, the missing value should be specified. (Numeric)
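To make the definition concrete, here is a minimal Python sketch of the Phred transform Q = -10 log10(p) and its inverse. The function names are hypothetical helpers for illustration, not part of GATK or any VCF library.

```python
import math


def phred_from_error_prob(p: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(p), where p is the probability the call is wrong."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_prob_from_phred(q: float) -> float:
    """Inverse transform: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


# 10%, 1% and 0.1% error rates map to Q10, Q20 and Q30
for p in (0.1, 0.01, 0.001):
    print(f"error rate {p} -> Q{phred_from_error_prob(p):.0f}")
```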
I generated VCF files with GATK, and looking through their QUAL column I found that the maximum quality score is 441,453, which is an extremely large number.
I wonder whether GATK's quality score follows the Phred scale; if not, how is the quality score calculated, and what do its values represent?
I look forward to hearing from you. Thank you very much.
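Assuming GATK's QUAL is Phred-scaled as the spec above defines (which is exactly what is being asked), a value like 441,453 is easiest to read in log space. This minimal sketch, with a hypothetical helper name, shows that such a QUAL would correspond to an error probability of 10^-44145.3, far below the smallest number a double-precision float can hold, so converting it back to a plain probability yields 0.0.

```python
def log10_error_prob(qual: float) -> float:
    """Return log10 of the error probability implied by a Phred-scaled QUAL.

    Working in log space avoids floating-point underflow for large QUAL values.
    """
    return -qual / 10.0


qual = 441_453
exponent = log10_error_prob(qual)
print(f"QUAL {qual} implies p(wrong) = 10^{exponent}")

# Converting back to a plain probability underflows to 0.0 in double precision
print(10.0 ** exponent)
```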
Hi, I got errors when I ran GATK RealignerTargetCreator and IndelRealigner in v2.4.9. I've checked many related discussions and comments. First, with the option -S LENIENT, I got an error like "we encountered an extremely high quality score of 69" and the GATK program stalled. So I added "--fix_misencoded_quality_scores", and now I get a different error message: "ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '0'". I tried older versions of GATK and both Java 1.6 and 1.7. I hope you can help with this. Please let me know if I'm missing something. Thanks!
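Since the second error points at the reference rather than the reads, one way to locate the offending characters is to scan the FASTA sequence lines for anything outside the standard IUPAC nucleotide codes. This is a minimal Python sketch with a hypothetical function name; the accepted character set (A, C, G, T, U, R, Y, S, W, K, M, B, D, H, V, N, case-insensitive) is an assumption and may not match GATK's exact rules.

```python
import io

# Assumed set of acceptable reference characters (IUPAC nucleotide codes)
IUPAC_CODES = set("ACGTURYSWKMBDHVN")


def find_non_iupac(fasta_lines):
    """Yield (line_number, record_name, char) for every non-IUPAC character
    found in the sequence lines of a FASTA file. Header lines start with '>'."""
    record = None
    for lineno, line in enumerate(fasta_lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Record name is the first whitespace-separated token after '>'
            record = (line[1:].split() or [""])[0]
            continue
        for ch in line:
            if ch.upper() not in IUPAC_CODES:
                yield lineno, record, ch


# Usage on an in-memory example; a real reference would be opened with open(path)
example = io.StringIO(">chr1 test\nACGTN\nAC0GT\n")
for hit in find_non_iupac(example):
    print(hit)
```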
Hi, I use GATK for variant calling in a research unit. When I run RealignerTargetCreator, I get an error saying that I'm using the wrong encoding for quality scores. Which encoding (Sanger, Illumina, Solexa) should be used with GATK? Do you have any tool to convert to that encoding? Thank you.
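The SAM format stores base qualities as Phred scores with an ASCII offset of 33 (Sanger encoding), whereas Illumina 1.3 to 1.7 FASTQ files use an offset of 64. Below is a minimal Python sketch of that conversion under the assumption that the input really is Phred+64; the function name is hypothetical. Old Solexa (pre-1.3) scores use a different log-odds formula and would need a non-linear conversion that this sketch does not cover.

```python
def illumina13_to_sanger(qual_string: str) -> str:
    """Convert a FASTQ quality string from Phred+64 (Illumina 1.3-1.7)
    to Phred+33 (Sanger), the offset used by the SAM format."""
    out = []
    for ch in qual_string:
        q = ord(ch) - 64  # decode the Phred score using the Illumina 1.3+ offset
        if q < 0:
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        out.append(chr(q + 33))  # re-encode with the Sanger offset
    return "".join(out)
```

For example, illumina13_to_sanger("hhhB") returns "III#" (Q40, Q40, Q40, Q2).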
Hello dear GATK Team,
Since version 2.3 I get the following error with some Lifescope 2.5-mapped SOLiD exome BAM files: "[...] appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 64; please see the GATK --help documentation for options related to this error".
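One common way such messages arise is Phred+64 data being decoded with the Sanger offset of 33, which inflates every score by 31 (a Q38 base, for instance, would surface as 69). The sketch below illustrates that mechanism; the helper names are hypothetical, the cutoff of 60 is an assumed stand-in for GATK's sanity check, and nothing here establishes that this is the cause for the Lifescope BAMs.

```python
from typing import List, Optional

# Assumption: a GATK-style sanity cutoff; the exact threshold GATK uses may differ.
MAX_REASONABLE_Q = 60


def decode(qual_string: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual_string]


def first_suspicious_quality(quals: List[int], cutoff: int = MAX_REASONABLE_Q) -> Optional[int]:
    """Return the first score above `cutoff`, or None if every score looks plausible."""
    return next((q for q in quals if q > cutoff), None)


# A Q38 base written with the Phred+64 offset is the character 'f' (ASCII 102).
phred64_string = "ff"
print(decode(phred64_string, offset=64))  # true scores
print(decode(phred64_string, offset=33))  # misread: each score is 31 too high
print(first_suspicious_quality(decode(phred64_string)))
```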
After carefully searching the forum I found this discussion: gatkforums.broadinstitute.org/discussion/1592/baserecalibrator-error, where ebanks offered the "--allow_potentially_misencoded_quality_scores" argument as a solution. This seemed to work at first: none of the walkers crash any more when the argument is applied.
The problem is that UnifiedGenotyper and HaplotypeCaller seem to ignore the reads (or something else goes wrong), because in these exomes both call only about 3000 variants, although they appear to process the whole file, judging by the runtime and log files.
The exomes used to work and had normal calls prior to GATK 2.3.
(The argument "--fix_misencoded_quality_scores" / "-fixMisencodedQuals", as mentioned in this post: gatkforums.broadinstitute.org/discussion/1991/version-highlights-for-gatk-version-2-3, makes things even worse for the Lifescope BAMs.)
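Assuming, as the GATK documentation for this flag describes, that -fixMisencodedQuals subtracts 31 from every base quality, applying it to BAMs whose scores are already on the Sanger scale would push most of them to zero or below. This minimal sketch uses a hypothetical helper and hypothetical example scores to show the effect; it does not reproduce how GATK itself handles the resulting invalid values.

```python
from typing import List

# Assumed shift applied by -fixMisencodedQuals (Phred+64 -> Phred+33)
FIX_OFFSET = 31


def shift_qualities(quals: List[int], shift: int = FIX_OFFSET) -> List[int]:
    """Subtract a fixed offset from each Phred score, without any clamping."""
    return [q - shift for q in quals]


correct = [12, 20, 30, 35, 64]  # hypothetical scores that are already correctly encoded
shifted = shift_qualities(correct)
print(shifted)
print(sum(q < 0 for q in shifted), "of", len(shifted), "scores became negative (invalid)")
```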