Does GATK BaseRecalibrator work with Bam files produces with the SOLID Lifescope mapper?
You show in the a base quality recalibration presentation that recalibration also should work on SOLID data. But you don't mention if it also works for Bam files produced with lifescope. BWA mapping quality is from 0-37 , Lifescope mapping quality is from 0 - 95.
I get an ArrayIndexOutOfBoundsException on the lifescope Bam files.
`##### ERROR ------------------------------------------------------------------------------------------
java.lang.ArrayIndexOutOfBoundsException: -92 at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158) at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:225) at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542) at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595) at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530) at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateBAQArray(BaseRecalibrator.java:428) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:243) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:248) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:219) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:91) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:55) at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281) at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147) at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
I am currently working on a Exome sequencing projekt with older single-end SOLiD exomes and new paired-end exomes. In a first attempt (GATK 1.7 and best practices v3 back then) i tried calling and recalibrating all exomes together (at that time 120) without selecting for paired/single-end. As I already had validatet many variants I could check the quality of the calls and got very bad results, especially for InDels (previously called, true positive variants missing). My idee is that the UnifiedGenotyper has Problems mixing paired-end with single-end exomes.
Is there any official recommendation for this problem? My solution right now is to group the exomes in batches (40-50 Exomes) which ran on the same technology.
Also a second Problem/Question: For some individuals exomes where sequenced twice, and for some of these the first run was single-end and the second one was paired. The best practices mentions one should se all available reads for a individual for calling. Do you have any experience on how to handle these cases?
Any help is greatly appreciated!
Hi, I am working on SOLiD 5500xl data and used SHRiMP2.2.3 for performing mapping. The library type is paired-end. I have read some discussions regarding SOLiD problem but I still have some doubts regarding some steps in best practices
So after mapping, I simply used GATK UnifiedGenotyper to call SNPs and InDels under default values. I end up getting around 40 million variants. Is there any way I can get a more refined variant calling? Do you still consider me applying the above pre-processing steps or do you recommend me applying some variant filteratiion on the called variants? If yes for the previous, then could you explain how my above concerns are taken care of? I was trying to look at some general recommended filter values on INFO fields in VCF format such as BQ, MQ, MQ0, DP, SB etc. Do you recommend some generally used values of these fields on which I can filter and hence refine my variant data?
I may have posted a subset of the above question, which I am not sure was posted successfully since at that time I just created an account. If you have already answered this question then I apologize for that. Could you then provide me the link where you answered it?
Thanks in advance
Hello dear GATK Team,
since Version 2.3 I get the following error with some Lifescope 2.5 mapped SoLID exome Bam files: "[...]appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 64; please see the GATK --help documentation for options related to this error".
After carefully seaching the forum I found this discussion: gatkforums.broadinstitute.org/discussion/1592/baserecalibrator-error where ebanks offered the "--allow_potentially_misencoded_quality_scores" argument as solution. Actually this seemed to work at first, all walkers with the argument applied don't crash any more.
The Problem is that UnifiedGenotyper and HaplotypeCaller seem to somehow ignore the reads (or something else...) because in these exomes both call only about 3000 variants, allthough they seem to process the whole file judged by the runtime and logfiles.
The exomes used to work and had normal calls prior to GATK 2.3.
(the argument "--fix_misencoded_quality_scores" / "-fixMisencodedQuals" as mentioned in this post: gatkforums.broadinstitute.org/discussion/1991/version-highlights-for-gatk-version-2-3 messes things up more for the Lifescope BAMs)