I am looking at the distribution of quality scores (QUAL column of VCF) emitted by the Unified Genotyper v1.4 (old version, sorry :)) for SNPs called on 769 samples with 12x coverage HiSeq data. The median quality score is at ~Q750 and the mean at ~Q28000. I was just wondering if you could comment on how well calibrated these quality scores are and whether/how the number of samples and/or coverage would affect them?
Thanks in advance, Laurent
I was just wondering if anyone uses GATK's SelectVariants walker to call de novo mutations (Mendelian violations) and, if so, what -mvq cut-off do they use? My data is exome sequencing with a large range of read depths - from mean target coverage of 14X to >50X.
According to the link http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41.
quality score (phred score) is defined as below. (i.e. 1% error rate is equal to phred score of 20 (-10xlog 0.01))
QUAL phred-scaled quality score for the assertion made in ALT. i.e. -10log_10 prob(call in ALT is wrong). If ALT is ”.” (no variant) then this is -10log_10 p(variant), and if ALT is not ”.” this is -10log_10p(no variant). High QUAL scores indicate high confidence calls. Although traditionally people use integer phred scores, this field is permitted to be a floating point to enable higher resolution for low confidence calls if desired. If unknown, the missing value should be specified. (Numeric)
Using GATK to generate vcf files and looking through the quality column of those files, I found out that the max quality score is 441,453 which is extremely huge number.
I wonder if the quality score of GATK tool follows the phred score system; if not, how do you calculate the quality score and what do the numbers of quality score represent?
Look forward to hearing back from you soon and thank you very much.
Hi, I got errors when ran GATK RealignerTargetCreator and IndelRealigner in v2.4.9. I've checked many related discussions and comments. First, I got an error like "we encountered an extremely high quality score of 69" with option -S LENIENT and the GATK program stalled. So I added "--fix_misencoded_quality_scores", and then I got different error message "ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '0'" now. I tried older versions of GATK and both java 1.6 and 1.7. I'm hoping that you can help this. Please let me know if I'm missing something. Thanks!
Hi, I use GATK for Variant Call in an Investigation Unit. When I use RealignerTargetCreator, I get an error: I'm using a wrong encoding for quiality scores. Which encoding (sanger, illumina, solexa) is ok to use with GATK? Do you have any tool to convert to that encoding? Thank you
Hello dear GATK Team,
since Version 2.3 I get the following error with some Lifescope 2.5 mapped SoLID exome Bam files: "[...]appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 64; please see the GATK --help documentation for options related to this error".
After carefully seaching the forum I found this discussion: gatkforums.broadinstitute.org/discussion/1592/baserecalibrator-error where ebanks offered the "--allow_potentially_misencoded_quality_scores" argument as solution. Actually this seemed to work at first, all walkers with the argument applied don't crash any more.
The Problem is that UnifiedGenotyper and HaplotypeCaller seem to somehow ignore the reads (or something else...) because in these exomes both call only about 3000 variants, allthough they seem to process the whole file judged by the runtime and logfiles.
The exomes used to work and had normal calls prior to GATK 2.3.
(the argument "--fix_misencoded_quality_scores" / "-fixMisencodedQuals" as mentioned in this post: gatkforums.broadinstitute.org/discussion/1991/version-highlights-for-gatk-version-2-3 messes things up more for the Lifescope BAMs)
Hi :), I'am new to NGS and have a questions reagarding the filtering. I have 13 individuals, 3 of them with a coverage of 11x-15x, the rest with a coverage of 5x-7x. I did multi-sample SNPcalling and hard filtering, the latter as there are no known SNPs so far. Now I am not sure how to set the minimal SNP quality (QUAL). On the best practise it is suggested to juse at least 30 for individuals with a coverage of at least 30, but 4 if the coverage is below 10. So what is the best way to set the QUAL filter?
many thanks in advance