
CallableLoci

Emits a data file containing information about callable, uncallable, poorly mapped, and other parts of the genome

Category: Diagnostics and Quality Control Tools

Traversal: LocusWalker

PartitionBy: LOCUS


Overview

A very common question about an NGS read set is which areas of the genome are considered callable. CallableLoci considers the coverage at each locus and emits either a per-base state or a summary BED file of intervals that partitions the examined genomic intervals into the following callable states:

REF_N
the reference base was an N, which the GATK does not consider callable
PASS
the base had at least minDepth QC+ bases for calling but less than maxDepth, so it was not EXCESSIVE_COVERAGE
NO_COVERAGE
absolutely no reads were seen at this locus, regardless of the filtering parameters
LOW_COVERAGE
there were fewer than minDepth bases at the locus after applying filters
EXCESSIVE_COVERAGE
more than maxDepth reads were seen at the locus, indicating some sort of mapping problem
POOR_MAPPING_QUALITY
at a site with at least minDepthForLowMAPQ reads, the fraction of reads with low mapping quality exceeded maxFractionOfReadsWithLowMAPQ, indicating poor mapping of the reads
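
The state definitions above can be read as a decision over a few per-locus counts. The Python sketch below is a hypothetical illustration, not the GATK implementation: the function name classify_locus, the order in which the checks are applied, and the treatment of a negative maxDepth (the default -1) as "no upper limit" are all assumptions. The default values mirror the argument table for this tool.

```python
from typing import Literal

State = Literal[
    "REF_N", "PASS", "NO_COVERAGE", "LOW_COVERAGE",
    "EXCESSIVE_COVERAGE", "POOR_MAPPING_QUALITY",
]


def classify_locus(
    ref_base: str,
    raw_depth: int,        # all reads covering the locus, before filtering
    qc_depth: int,         # QC+ bases (passing the MAPQ and base-quality cutoffs)
    low_mapq_depth: int,   # reads whose MAPQ is at or below maxLowMAPQ
    min_depth: int = 4,
    max_depth: int = -1,
    min_depth_for_low_mapq: int = 10,
    max_fraction_low_mapq: float = 0.1,
) -> State:
    """Assign one callable state to a locus (hypothetical check order)."""
    if ref_base.upper() == "N":
        return "REF_N"
    if raw_depth == 0:
        return "NO_COVERAGE"
    # Only judge mapping quality once enough reads cover the site.
    if (raw_depth >= min_depth_for_low_mapq
            and low_mapq_depth / raw_depth > max_fraction_low_mapq):
        return "POOR_MAPPING_QUALITY"
    if qc_depth < min_depth:
        return "LOW_COVERAGE"
    # Assumption: a negative max_depth (default -1) disables this check.
    if max_depth >= 0 and qc_depth > max_depth:
        return "EXCESSIVE_COVERAGE"
    return "PASS"
```

For example, classify_locus("A", raw_depth=20, qc_depth=10, low_mapq_depth=5) returns POOR_MAPPING_QUALITY in this sketch: the site has at least 10 reads, and 5 of 20 (25%) exceeds the default fraction of 0.1.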

Input

A BAM file containing exactly one sample.

Output

  • -o: an output file, in the format chosen with --format (BED recommended), giving the callable status of each base
  • -summary: a table giving the number of examined bases in each callable state

Examples

     java -jar GenomeAnalysisTK.jar \
     -T CallableLoci \
     -R reference.fasta \
     -I my.bam \
     -summary my.summary \
     -o my.bed
 

would produce a BED file (my.bed) that looks like:

     20 10000000 10000864 PASS
     20 10000865 10000985 POOR_MAPPING_QUALITY
     20 10000986 10001138 PASS
     20 10001139 10001254 POOR_MAPPING_QUALITY
     20 10001255 10012255 PASS
     20 10012256 10012259 POOR_MAPPING_QUALITY
     20 10012260 10012263 PASS
     20 10012264 10012328 POOR_MAPPING_QUALITY
     20 10012329 10012550 PASS
     20 10012551 10012551 LOW_COVERAGE
     20 10012552 10012554 PASS
     20 10012555 10012557 LOW_COVERAGE
     20 10012558 10012558 PASS
     et cetera...
 
as well as a summary table that looks like:

                    state nBases
                    REF_N 0
                     PASS 996046
              NO_COVERAGE 121
             LOW_COVERAGE 928
       EXCESSIVE_COVERAGE 0
     POOR_MAPPING_QUALITY 2906
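
Because every BED interval covers a run of bases in a single state, per-state base counts like those in the summary table can be tallied from the BED output. A minimal sketch, assuming whitespace-separated "chrom start end state" lines and inclusive start and end coordinates as in the example output above (so an interval spans end - start + 1 bases); summarize_bed and format_summary are hypothetical names.

```python
from collections import Counter
from typing import Iterable

STATES = ["REF_N", "PASS", "NO_COVERAGE", "LOW_COVERAGE",
          "EXCESSIVE_COVERAGE", "POOR_MAPPING_QUALITY"]


def summarize_bed(lines: Iterable[str]) -> Counter:
    """Total the number of bases in each callable state."""
    counts: Counter = Counter({state: 0 for state in STATES})
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and anything that is not an interval
        _chrom, start, end, state = fields
        # Assumption: inclusive coordinates, as in the example BED output.
        counts[state] += int(end) - int(start) + 1
    return counts


def format_summary(counts: Counter) -> str:
    """Render the counts as a right-aligned 'state nBases' table."""
    rows = [f"{'state':>20} nBases"]
    rows += [f"{state:>20} {counts[state]}" for state in STATES]
    return "\n".join(rows)
```

Applied to the last three intervals of the example BED output, this sketch reports 4 LOW_COVERAGE bases (1 + 3) and 3 PASS bases.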
 

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by CallableLoci.


Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

CallableLoci specific arguments

This table summarizes the command-line arguments that are specific to this tool. For details, see the list further down below the table.

     Name                  Type          Default  Summary
     Required
     --summary             File          NA       Name of file for output summary
     Optional
     -frlmq                double        0.1      If the fraction of reads at a base with low mapping quality exceeds this value, the site may be poorly mapped
     --maxDepth            int           -1       Maximum read depth before a locus is considered poorly mapped
     --maxLowMAPQ          byte          1        Maximum value for MAPQ to be considered a problematic mapped read.
     --minBaseQuality      byte          20       Minimum quality of bases to count towards depth.
     --minMappingQuality   byte          10       Minimum mapping quality of reads to count towards depth.
     --out                 PrintStream   stdout   An output file created by the walker. Will overwrite contents if file exists
     Advanced
     --format              OutputFormat  BED      Output format
     --minDepth            int           4        Minimum QC+ read depth before a locus is considered callable
     --minDepthForLowMAPQ  int           10       Minimum read depth before a locus is considered a potential candidate for poorly mapped

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--format / -format ( OutputFormat with default value BED )

Output format. The output of this walker will be written in this format. The recommended option is BED.
The --format argument is an enumerated type (OutputFormat), which can have one of the following values:

BED
The output will be written as a BED file, with one BED element for each contiguous run of bases sharing the same callable state (e.g., PASS, REF_N). This is the recommended format.
STATE_PER_BASE
Emit a chr start stop state quad for each base. This produces a potentially disastrously large amount of output.
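
The difference between the two formats amounts to run-length merging: STATE_PER_BASE reports one record per base, while BED collapses each run of adjacent bases sharing a state into one record. The sketch below illustrates that merge. The function name merge_runs, the (chrom, position, state) input tuples sorted by coordinate, and the inclusive end coordinate (matching the example BED output earlier on this page) are assumptions, not GATK's code.

```python
from typing import Iterable, Iterator, Optional, Tuple

PerBase = Tuple[str, int, str]        # (chrom, position, state)
Interval = Tuple[str, int, int, str]  # (chrom, start, end, state); end inclusive


def merge_runs(bases: Iterable[PerBase]) -> Iterator[Interval]:
    """Collapse adjacent same-state bases into one interval per run."""
    chrom: Optional[str] = None
    start = end = 0
    state = ""
    for b_chrom, pos, b_state in bases:
        # A base continues the open run only if it is the next position
        # on the same chromosome and has the same state.
        if b_chrom == chrom and pos == end + 1 and b_state == state:
            end = pos
            continue
        if chrom is not None:
            yield (chrom, start, end, state)  # close the previous run
        chrom, start, end, state = b_chrom, pos, pos, b_state
    if chrom is not None:
        yield (chrom, start, end, state)  # close the final run
```

In this sketch a coverage gap or a chromosome change also closes a run, even when the state stays the same.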

-frlmq / --maxFractionOfReadsWithLowMAPQ ( double with default value 0.1 )

If the fraction of reads at a base with low mapping quality exceeds this value, the site may be poorly mapped. If there are at least minDepthForLowMAPQ reads at this site and the fraction of reads with low mapping quality exceeds this fraction, then the site is assigned POOR_MAPPING_QUALITY.

--maxDepth / -maxDepth ( int with default value -1 )

Maximum read depth before a locus is considered poorly mapped. If the QC+ depth exceeds this value, the site is considered to have EXCESSIVE_COVERAGE.

--maxLowMAPQ / -mlmq ( byte with default value 1 )

Maximum value for MAPQ to be considered a problematic mapped read. Reads with MAPQ between this value and mmq (--minMappingQuality) are not sufficiently well mapped for calling, but they are not indicative of mapping problems either. For example, if maxLowMAPQ = 1 and mmq = 20, then reads with MAPQ <= 1 are poorly mapped, reads with MAPQ >= 20 contribute to calling, and reads with MAPQ from 2 to 19 are not bad in and of themselves but are not good enough to contribute to calling. In effect these reads are invisible, which can drive the base to the LOW_COVERAGE state.
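
The three MAPQ tiers described above can be expressed as a small hypothetical helper. It treats maxLowMAPQ as an inclusive maximum, as the argument's one-line description suggests, and minMappingQuality as an inclusive minimum; both inequality directions are assumptions, and the tier names are illustrative.

```python
def mapq_tier(mapq: int, max_low_mapq: int = 1, min_mapping_quality: int = 10) -> str:
    """Bucket a read by mapping quality (hypothetical helper).

    'low'     -> counts toward the POOR_MAPPING_QUALITY fraction
    'usable'  -> may contribute to QC+ depth, base quality permitting
    'ignored' -> neither; invisible to the depth calculation
    """
    if mapq <= max_low_mapq:
        return "low"
    if mapq >= min_mapping_quality:
        return "usable"
    return "ignored"
```

With maxLowMAPQ = 1 and mmq = 20, this sketch places MAPQ values 0, 1, 2, 19, 20 and 60 in the tiers low, low, ignored, ignored, usable and usable.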

--minBaseQuality / -mbq ( byte with default value 20 )

Minimum quality of bases to count towards depth. Bases with quality lower than minBaseQuality are not considered high enough quality to contribute to the PASS state.

--minDepth / -minDepth ( int with default value 4 )

Minimum QC+ read depth before a locus is considered callable. If the number of QC+ bases (on reads with MAPQ >= minMappingQuality and with base quality >= minBaseQuality) is at least this value and less than maxDepth, the site is considered PASS.
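
A sketch of the QC+ depth count and the PASS depth window, assuming each pileup element is represented as a (mapq, base_quality) pair. The names qc_depth and in_pass_window are hypothetical. Both quality cutoffs are treated as inclusive, a depth equal to maxDepth is treated as passing, and a negative maxDepth (the default -1) is taken to mean no upper limit; all three are assumptions about boundary behavior.

```python
from typing import Iterable, Tuple


def qc_depth(pileup: Iterable[Tuple[int, int]],
             min_mapping_quality: int = 10,
             min_base_quality: int = 20) -> int:
    """Count QC+ bases: well-mapped reads carrying a good-quality base."""
    return sum(1 for mapq, baseq in pileup
               if mapq >= min_mapping_quality and baseq >= min_base_quality)


def in_pass_window(depth: int, min_depth: int = 4, max_depth: int = -1) -> bool:
    """True if the QC+ depth is neither too low nor too high for PASS."""
    if depth < min_depth:
        return False
    # Assumption: a negative max_depth disables the upper bound, and only
    # depths that exceed max_depth fail.
    return max_depth < 0 or depth <= max_depth
```

For the pileup [(60, 30), (60, 10), (5, 35), (20, 20), (10, 20)], this sketch counts 3 QC+ bases with the default cutoffs, which falls below the default minDepth of 4.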

--minDepthForLowMAPQ / -mdflmq ( int with default value 10 )

Minimum read depth before a locus is considered a potential candidate for POOR_MAPPING_QUALITY. We don't want to consider a site as POOR_MAPPING_QUALITY just because it has two reads and one of them has low MAPQ. We won't assign a site to the POOR_MAPPING_QUALITY state unless there are at least minDepthForLowMAPQ reads covering the site.
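
The depth gate described above, combined with the --maxFractionOfReadsWithLowMAPQ threshold, can be sketched as a single predicate. The name is_poor_mapping is hypothetical, and reading "exceeds" as a strict > is an assumption.

```python
def is_poor_mapping(raw_depth: int, low_mapq_reads: int,
                    min_depth_for_low_mapq: int = 10,
                    max_fraction_low_mapq: float = 0.1) -> bool:
    """Decide whether a site falls into POOR_MAPPING_QUALITY (sketch)."""
    if raw_depth < min_depth_for_low_mapq:
        return False  # too few reads to judge, e.g. 1 low-MAPQ read out of 2
    return low_mapq_reads / raw_depth > max_fraction_low_mapq
```

With the defaults, this sketch does not flag 1 low-MAPQ read out of 2 (too few reads), flags 5 out of 20, and does not flag 2 out of 20, since a fraction of exactly 0.1 does not exceed the threshold.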

--minMappingQuality / -mmq ( byte with default value 10 )

Minimum mapping quality of reads to count towards depth. Reads with MAPQ >= minMappingQuality are treated as usable for variation detection, contributing to the PASS state.

--out / -o ( PrintStream with default value stdout )

An output file created by the walker. Will overwrite contents if file exists.

--summary / -summary ( required File )

Name of file for output summary. Callable loci summary counts (see Output above) will be written to this file.



GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.