Realignment with high base qualities
Posted in Ask the GATK team | Last updated on 2013-03-13 12:40:59



Hi,

I am currently working on a project where we have sequenced a library with insert sizes of approximately 70 bp using 2x100 paired-end sequencing. While this may seem unnecessary, it can improve base qualities a lot.

I have used SeqPrep (https://github.com/jstjohn/SeqPrep), which strips adaptors and merges reads that overlap; in our case the overlap covers the entire read most of the time. This also boosts the base qualities: if a base was sequenced twice, the quality improves quite a bit. This way, base qualities can reach 70 and above (if both reads had Q40 at a base, the probability of error is 0.0001 x 0.0001 = 1e-8, so the merged qual = 80). No funny business there. :)
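To make the arithmetic above concrete, here is a minimal sketch of the Phred conversion. It assumes the two reads agree at the base and that their errors are independent, so the error probabilities simply multiply; this is a simplification for illustration and not necessarily the exact formula SeqPrep applies. The function names are made up for this example.

```python
import math


def phred_to_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


def merged_quality(q1, q2):
    """Quality of a base seen in both reads.

    Assumption: both reads call the same base and their errors are
    independent, so the combined error probability is the product.
    """
    return prob_to_phred(phred_to_prob(q1) * phred_to_prob(q2))


if __name__ == "__main__":
    # Two Q40 observations of the same base give roughly Q80.
    print(round(merged_quality(40, 40)))
    # Q35 and Q35 already give roughly Q70.
    print(round(merged_quality(35, 35)))
```

Under these assumptions the merged quality is just the sum of the two input qualities, which is why overlapping Q35 to Q40 reads end up in the 70 to 80 range.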

However, this does not seem to play nicely with GATK. The realignment crashes (see below), saying that the base quals must be erroneous. In my case, though, they are correct. Can I force GATK to work with these BQs? (--validation_strictness LENIENT didn't help, as you can see below :)
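For context on why a score of 70 can look like "the wrong encoding": qualities are stored as ASCII characters with an offset, 33 in the SAM/BAM convention and 64 in older Illumina output. The sketch below shows that the characters for Q70 and Q80 under offset 33 would be perfectly ordinary values if read with offset 64. That this ambiguity is what triggers the message is my assumption, not a description of GATK's actual check; the helper names are made up for this example.

```python
def encode_quals(quals, offset=33):
    """Encode a list of Phred scores as an ASCII quality string."""
    return "".join(chr(q + offset) for q in quals)


def decode_quals(qual_string, offset=33):
    """Decode an ASCII quality string into a list of Phred scores."""
    return [ord(c) - offset for c in qual_string]


if __name__ == "__main__":
    # Q40, Q70 and Q80 written with the SAM/BAM offset of 33.
    merged = encode_quals([40, 70, 80], offset=33)
    print(merged)

    # Read back correctly with offset 33 ...
    print(decode_quals(merged, offset=33))

    # ... and what the same characters would mean under offset 64.
    print(decode_quals(merged, offset=64))
```

The same bytes are valid under either offset, so a tool that only sees the BAM cannot tell genuinely merged Q70 bases from a file that was written with the older offset.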

Cheers,
Daniel Klevebring

   INFO  13:04:07,408 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:07,411 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.4-7-g5e89f01, Compiled 2013/03/06 01:01:28 
   INFO  13:04:07,411 HelpFormatter - Copyright (c) 2010 The Broad Institute 
   INFO  13:04:07,411 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
   INFO  13:04:07,416 HelpFormatter - Program Args: -T RealignerTargetCreator -I /scratch/3041404/P394_102.prmdup.bam -R /bubo/proj/b2010040/private/GoldenPath/hg19/GATK_resource_bundle/human_g1k_v37_clean.fasta -o /scratch/3041404/P394_102.realn.intervals --intervals /bubo/proj/b2010040/private/GoldenPath/NG_design/1000G_REF_picard_custom_design_target_regions_HG19.bed.interval_list --validation_strictness LENIENT 
   INFO  13:04:07,416 HelpFormatter - Date/Time: 2013/03/13 13:04:07 
   INFO  13:04:07,416 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:07,416 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:08,461 GenomeAnalysisEngine - Strictness is LENIENT 
   INFO  13:04:08,632 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
   INFO  13:04:08,640 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
   INFO  13:04:08,655 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
   INFO  13:04:09,782 IntervalUtils - Processing 39772003 bp from intervals 
   INFO  13:04:10,001 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files 
   INFO  13:04:10,262 GenomeAnalysisEngine - Done creating shard strategy 
   INFO  13:04:10,262 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
   INFO  13:04:10,263 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
   INFO  13:04:18,482 GATKRunReport - Uploaded run statistics report to AWS S3 
   ##### ERROR ------------------------------------------------------------------------------------------
   ##### ERROR A USER ERROR has occurred (version 2.4-7-g5e89f01): 
   ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
   ##### ERROR Please do not post this error to the GATK forum
   ##### ERROR
   ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
   ##### ERROR Visit our website and forum for extensive documentation and answers to 
   ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
   ##### ERROR
   ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/scratch/3041404/P394_102.prmdup.bam} appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 70; please see the GATK --help documentation for options related to this error
   ##### ERROR ------------------------------------------------------------------------------------------
