Tagged with #baq



1. Introduction

The GATK provides an implementation of the Per-Base Alignment Qualities (BAQ) method developed by Heng Li in late 2010. See the SAMtools documentation for more details.

2. Using BAQ

The BAQ algorithm is applied by the GATK engine itself, which means that all GATK walkers can potentially benefit from it. By default, BAQ is OFF, meaning that the engine will not use BAQ quality scores at all.

The GATK engine accepts the argument -baq with the following enum values:

public enum CalculationMode {
    OFF,                        // don't apply a BAQ at all, the default
    CALCULATE_AS_NECESSARY,     // do HMM BAQ calculation on the fly, as necessary, if there's no tag
    RECALCULATE                 // do HMM BAQ calculation on the fly, regardless of whether there's a tag present
}

If you want to enable BAQ, the usual choice is CALCULATE_AS_NECESSARY, which calculates BAQ values only for reads that do not already carry them in the BQ read tag. If your reads are already tagged with BQ values, the GATK uses those. RECALCULATE always recalculates the BAQ, regardless of the tag, which is useful if you are experimenting with the gap open penalty (see below).
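The behaviour of the three modes can be summarized in a small decision function. The following is only an illustrative sketch of the semantics described above, not the GATK's internal code: the class name BaqModeSketch, the Action enum, and the hasBqTag flag are all hypothetical names introduced for this example.

```java
// Illustrative sketch only: mirrors the mode semantics described in the text,
// not the GATK engine's actual implementation.
final class BaqModeSketch {
    // Same three values as the -baq argument.
    enum CalculationMode { OFF, CALCULATE_AS_NECESSARY, RECALCULATE }

    // Hypothetical outcomes, named for this example only.
    enum Action { IGNORE_BAQ, USE_EXISTING_TAG, RUN_HMM }

    // hasBqTag: whether the read already carries a BQ tag (hypothetical flag).
    static Action decide(CalculationMode mode, boolean hasBqTag) {
        switch (mode) {
            case OFF:
                return Action.IGNORE_BAQ;          // BAQ is not used at all
            case CALCULATE_AS_NECESSARY:
                // Reuse the tag when present, otherwise compute on the fly.
                return hasBqTag ? Action.USE_EXISTING_TAG : Action.RUN_HMM;
            case RECALCULATE:
                return Action.RUN_HMM;             // always recompute, tag or not
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // Print the outcome for every combination of mode and tag presence.
        for (CalculationMode mode : CalculationMode.values()) {
            for (boolean tagged : new boolean[] {false, true}) {
                System.out.println(mode + ", BQ tag present=" + tagged
                        + " -> " + decide(mode, tagged));
            }
        }
    }
}
```

The only case where an existing BQ tag changes the outcome is CALCULATE_AS_NECESSARY, which is why it is the cheaper choice for reads that were tagged in an earlier step.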

Expert users can specify the BAQ gap open penalty (-baqGOP) used in the HMM. The default value is 40, a good value for highly sensitive calls on whole genomes and exomes. However, if you are analyzing exome data only, you may want to use 30, which seems to result in a more specific call set. We are still experimenting with these values.

Some walkers, where BAQ would corrupt their analyses, forbid the use of BAQ and will throw an exception if -baq is provided.

3. Some example uses of the BAQ in the GATK

  • For UnifiedGenotyper to get more specific SNP calls.

  • For PrintReads to write out a BAM file with BAQ tagged reads.

  • For TableRecalibrator or IndelRealigner to write out a BAM file with BAQ tagged reads. Make sure you use -baq RECALCULATE so the engine knows to recalculate the BAQ after these tools have updated the base quality scores or the read alignments. Note that neither of these tools uses the BAQ values on input, but both write out the tags for analysis tools that will use them.

Note that some tools should not have BAQ applied to them.

This last option will be particularly useful for people who are already doing base quality score recalibration. Suppose I have a pipeline that does:

RealignerTargetCreator
IndelRealigner

BaseRecalibrator
PrintReads (with --BQSR input)

UnifiedGenotyper

A highly efficient BAQ-extended pipeline would look like:

RealignerTargetCreator
IndelRealigner // don't bother with BAQ here, since we will calculate it in the PrintReads step below

BaseRecalibrator
PrintReads (with --BQSR input) -baq RECALCULATE // now the reads will have a BAQ tag added; this slows the tool down somewhat

UnifiedGenotyper -baq CALCULATE_AS_NECESSARY // UG will use the tags written by PrintReads, keeping UG fast

4. BAQ and walker control

Walkers can control how the BAQ calculation is applied via the @BAQMode annotation. The result can be delivered in one of three ways: as a tag, by overwriting the quality scores, or by only returning the BAQ-capped quality scores. Additionally, walkers can be set up to have the BAQ applied to incoming reads (ON_INPUT, the default), to output reads (ON_OUTPUT), or HANDLED_BY_WALKER, which means that calling into the BAQ system is the responsibility of the individual walker.
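To make the annotation mechanism concrete, here is a minimal, self-contained sketch of how a walker class could declare its BAQ preferences and how an engine could read them back through reflection. It is a stand-in written for this example, not the GATK's own @BAQMode definition: the annotation name BaqModeLike, the element names, the QualityMode value names, and the default quality mode are all assumptions. Only the three application times and the ON_INPUT default come from the text above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in sketch of an annotation-driven BAQ setting; names are hypothetical.
final class BaqAnnotationSketch {
    // Assumed names for the three delivery options: tag, overwrite, return only.
    enum QualityMode { ADD_TAG, OVERWRITE_QUALS, DONT_MODIFY }

    // Application times as named in the text.
    enum ApplicationTime { ON_INPUT, ON_OUTPUT, HANDLED_BY_WALKER }

    @Retention(RetentionPolicy.RUNTIME)   // must be visible to reflection at run time
    @Target(ElementType.TYPE)             // applied to walker classes
    @interface BaqModeLike {
        // The default quality mode here is an assumption made for the sketch.
        QualityMode qualityMode() default QualityMode.OVERWRITE_QUALS;
        // ON_INPUT is the default application time according to the text.
        ApplicationTime applicationTime() default ApplicationTime.ON_INPUT;
    }

    // A hypothetical walker that wants BAQ written as a tag on output reads.
    @BaqModeLike(qualityMode = QualityMode.ADD_TAG,
                 applicationTime = ApplicationTime.ON_OUTPUT)
    static class TaggingWalker {}

    // A hypothetical walker that accepts the defaults.
    @BaqModeLike
    static class DefaultWalker {}

    // What an engine might do: read the annotation from the walker class.
    static String describe(Class<?> walker) {
        BaqModeLike mode = walker.getAnnotation(BaqModeLike.class);
        if (mode == null) {
            return "no BAQ preference declared";
        }
        return mode.qualityMode() + " / " + mode.applicationTime();
    }

    public static void main(String[] args) {
        System.out.println(describe(TaggingWalker.class));
        System.out.println(describe(DefaultWalker.class));
        System.out.println(describe(String.class));
    }
}
```

Declaring the preference on the class keeps the decision with the walker author, while the engine remains the single place that actually invokes the BAQ calculation (except in the HANDLED_BY_WALKER case).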


Hello

We are working with canine whole genome and exome sequence data that have been aligned to canFam3.1. When we run the UnifiedGenotyper tool with the following options:

-T UnifiedGenotyper -baq CALCULATE_AS_NECESSARY -glm BOTH -nt 16 -R /scratch/sswaminathan/canine_genomes_canfam3.1/canFam3.1/canFam3.1.fa -S SILENT -D /scratch/sswaminathan/canine_genomes_canfam3.1/canFam3.1/canFam3.1.dbSNP.ens72.vcf -I AF23.jir.rc.bam -l INFO -o AF23.vcf -metrics AF23.gakt.metrics

we get the error message:

ERROR MESSAGE: SAM/BAM file SAMFileReader{/scratch/sswaminathan/canine_genomes_canfam3.1/alignments/AF23/AF23.jir.rc.bam} is malformed: BAQ tag error: the BAQ value is larger than the base quality

We also tried the following options:

-T UnifiedGenotyper -baq CALCULATE_AS_NECESSARY -fixMisencodedQuals -glm BOTH -nt 16 -R /scratch/sswaminathan/canine_genomes_canfam3.1/canFam3.1/canFam3.1.fa -S SILENT -D /scratch/sswaminathan/canine_genomes_canfam3.1/canFam3.1/canFam3.1.dbSNP.ens72.vcf -I AF23.jir.rc.bam -l INFO -o AF23.vcf -metrics AF23.gakt.metrics

and

-T UnifiedGenotyper -baq RECALCULATE -fixMisencodedQuals -glm BOTH -nt 16 -R /scratch/sswaminathan/canine_genomes_canfam3.1/canFam3.1/canFam3.1.fa -S SILENT -D /scratch/sswaminathan/canine_genomes_canfam3.1/canFam3.1/canFam3.1.dbSNP.ens72.vcf -I AF23.jir.rc.bam -l INFO -o AF23.vcf -metrics AF23.gakt.metrics

but we receive the error message:

ERROR MESSAGE: Bad input: while fixing mis-encoded base qualities we encountered a read that was correctly encoded; we cannot handle such a mixture of reads so unfortunately the BAM must be fixed with some other tool

We are running The Genome Analysis Toolkit (GATK) v2.7-2-g6bda569, Compiled 2013/08/28 16:30:29.

Could you please tell us what is the cause of this error and how we can rectify it?

Thank you.

Yours sincerely,
Shanker Swaminathan