DepthOfCoverage

Toolbox for assessing sequence coverage by a wide array of metrics, partitioned by sample, read group, or library

Category: Diagnostics and Quality Control Tools

Traversal: LocusWalker

PartitionBy: NONE


Overview

DepthOfCoverage processes a set of BAM files to determine coverage at different levels of partitioning and aggregation. Coverage can be analyzed per locus, per interval, per gene, or in total; can be partitioned by sample, by read group, by technology, by center, or by library; and can be summarized by mean, median, quartiles, and/or percentage of bases covered to or beyond a threshold. Additionally, reads and bases can be filtered by mapping or base quality score.
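The partitioning and summarizing described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the toy depths, the read-group-to-sample mapping, and the names partition_depths and summarize are all hypothetical, and quartiles are left out because the tool's exact quartile convention is not stated here.

```python
from statistics import mean, median

# Hypothetical toy data: depth at 4 loci for each read group, plus a
# read-group -> sample mapping of the kind a BAM header would provide.
DEPTH_BY_READGROUP = {
    "rg1": [10, 12, 0, 20],
    "rg2": [5, 3, 0, 10],
    "rg3": [30, 16, 8, 2],
}
SAMPLE_OF = {"rg1": "sampleA", "rg2": "sampleA", "rg3": "sampleB"}


def partition_depths(depth_by_rg, sample_of, partition="sample"):
    """Sum per-locus depths within each partition (sample or read group)."""
    if partition not in ("sample", "readgroup"):
        raise ValueError(f"unsupported partition: {partition}")
    grouped = {}
    for rg, depths in depth_by_rg.items():
        key = rg if partition == "readgroup" else sample_of[rg]
        if key not in grouped:
            grouped[key] = list(depths)
        else:
            grouped[key] = [a + b for a, b in zip(grouped[key], depths)]
    return grouped


def summarize(depths, thresholds=(15,)):
    """Total, mean, median, and % of bases covered to >= each threshold."""
    summary = {
        "total": sum(depths),
        "mean": mean(depths),
        "median": median(depths),
    }
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)
        summary[f"pct_bases_above_{t}"] = 100.0 * covered / len(depths)
    return summary


by_sample = partition_depths(DEPTH_BY_READGROUP, SAMPLE_OF, "sample")
summary_a = summarize(by_sample["sampleA"], thresholds=(4, 15))
```

Passing partition="readgroup" instead keeps each read group separate, which mirrors the effect of choosing a finer partition type.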

Input

One or more bam files (with proper headers) to be analyzed for coverage statistics

(Optional) A REFSEQ Rod to aggregate coverage to the gene level

(for information about creating the REFSEQ Rod, please consult the RefSeqCodec documentation)

Output

Tables pertaining to different coverage summaries. The suffix of each table file indicates its contents:

- no suffix: per locus coverage

- _summary: total, mean, median, quartiles, and threshold proportions, aggregated over all bases

- _statistics: coverage histograms (# loci with X coverage), aggregated over all bases

- _interval_summary: total, mean, median, quartiles, and threshold proportions, aggregated per interval

- _interval_statistics: 2x2 table of # of intervals covered to >= X depth in >= Y samples

- _gene_summary: total, mean, median, quartiles, and threshold proportions, aggregated per gene

- _gene_statistics: 2x2 table of # of genes covered to >= X depth in >= Y samples

- _cumulative_coverage_counts: coverage histograms (# loci with >= X coverage), aggregated over all bases

- _cumulative_coverage_proportions: proportions of loci with >= X coverage, aggregated over all bases
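To make the relationship between the histogram, cumulative count, and cumulative proportion tables concrete, here is a small Python sketch. It works on a hypothetical list of per-locus depths and does not reproduce the tool's file layout or binning; the function names are illustrative only.

```python
from collections import Counter


def coverage_histogram(depths):
    """Number of loci observed at exactly each depth X."""
    return Counter(depths)


def cumulative_counts(depths):
    """Number of loci with depth >= X, for X from 0 up to the maximum depth."""
    if not depths:
        return {}
    hist = Counter(depths)
    counts = {}
    running = 0
    # Walk from the deepest value down so each step adds the loci at depth X.
    for x in range(max(depths), -1, -1):
        running += hist.get(x, 0)
        counts[x] = running
    return counts


def cumulative_proportions(depths):
    """Proportion of loci with depth >= X."""
    total = len(depths)
    return {x: c / total for x, c in cumulative_counts(depths).items()}


# Hypothetical per-locus depths for one sample.
depths = [0, 2, 2, 3, 5]
hist = coverage_histogram(depths)
cum = cumulative_counts(depths)
prop = cumulative_proportions(depths)
```

In this toy case, 4 of the 5 loci have depth of at least 2, so the cumulative proportion at X = 2 is 0.8.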

Examples

 java -Xmx2g -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T DepthOfCoverage \
   -o file_name_base \
   -I input_bams.list \
   [-geneList refSeq.sorted.txt] \
   [-pt readgroup] \
   [-ct 4 -ct 6 -ct 10] \
   [-L my_capture_genes.interval_list]
 

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by DepthOfCoverage.

Parallelism options

This tool can be run in multi-threaded mode using this option.

Downsampling settings

This tool overrides the engine's default downsampling settings.

  • Mode: NONE
  • To coverage: 2,147,483,647

Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

DepthOfCoverage specific arguments

This table summarizes the command-line arguments that are specific to this tool. For details, see the argument details below the table.

Optional

| Name | Type | Default value | Summary |
| --- | --- | --- | --- |
| --calculateCoverageOverGenes | File | NA | Calculate the coverage statistics over this list of genes. Currently accepts RefSeq. |
| --countType | CountPileupType | COUNT_READS | How should overlapping reads from the same fragment be handled? |
| --maxBaseQuality | byte | 127 | Maximum quality of bases to count towards depth. Defaults to 127 (Byte.MAX_VALUE). |
| --maxMappingQuality | int | 2147483647 | Maximum mapping quality of reads to count towards depth. Defaults to 2^31-1 (Integer.MAX_VALUE). |
| --minBaseQuality | byte | -1 | Minimum quality of bases to count towards depth. Defaults to -1. |
| --minMappingQuality | int | -1 | Minimum mapping quality of reads to count towards depth. Defaults to -1. |
| --omitDepthOutputAtEachBase | boolean | false | Will omit the output of the depth of coverage at each base, which should result in speedup. |
| --omitIntervalStatistics | boolean | false | Will omit the per-interval statistics section, which should result in speedup. |
| --omitLocusTable | boolean | false | Will not calculate the per-sample per-depth counts of loci, which should result in speedup. |
| --omitPerSampleStats | boolean | false | Omits the summary files per-sample. These statistics are still calculated, so this argument will not improve runtime. |
| --out | Map[DoCOutputType,PrintStream] | None | An output file created by the walker. Will overwrite contents if file exists. |
| --outputFormat | String | rtable | The format of the output file (e.g. csv, table, rtable); defaults to an R-readable table. |
| --partitionType | Set[Partition] | [sample] | Partition type for depth of coverage. Defaults to sample. Can be any combination of sample, readgroup, library. |
| --printBaseCounts | boolean | false | Will add base counts to per-locus output. |

Advanced

| Name | Type | Default value | Summary |
| --- | --- | --- | --- |
| --ignoreDeletionSites | boolean | false | Ignore sites consisting only of deletions. |
| --includeDeletions | boolean | false | Include information on deletions. |
| --includeRefNSites | boolean | false | If provided, sites with reference N bases but with coverage from neighboring reads will be included in DoC calculations. |
| --nBins | int | 499 | Number of bins to use for granular binning. |
| --printBinEndpointsAndExit | boolean | false | Prints the bin values and exits immediately. Use to calibrate what bins you want before running on data. |
| --start | int | 1 | Starting (left endpoint) for granular binning. |
| --stop | int | 500 | Ending (right endpoint) for granular binning. |
| --summaryCoverageThreshold | int[] | [15] | For summary file outputs, report the % of bases covered to >= this number. Defaults to 15; can take multiple arguments. |

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--calculateCoverageOverGenes / -geneList ( File )

Calculate the coverage statistics over this list of genes. Currently accepts RefSeq. Path to the RefSeq file for use in aggregating coverage statistics over genes.

--countType ( CountPileupType with default value COUNT_READS )

How should overlapping reads from the same fragment be handled?
The --countType argument is an enumerated type (CountPileupType), which can have one of the following values:

COUNT_READS
Count all reads independently (even if from the same fragment).
COUNT_FRAGMENTS
Count all fragments (even if the reads that compose the fragment are not consistent at that base).
COUNT_FRAGMENTS_REQUIRE_SAME_BASE
Count all fragments (but only if the reads that compose the fragment are consistent at that base).
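The difference between the three values can be sketched in Python for a single locus. This is an illustration of the descriptions above, not the tool's code: the pileup representation as (fragment name, base) pairs and the function count_depth are assumptions made for the example.

```python
from collections import defaultdict


def count_depth(pileup, count_type="COUNT_READS"):
    """Depth at one locus; pileup is a list of (fragment_name, base) pairs."""
    if count_type == "COUNT_READS":
        # Every read counts, even when two reads come from the same fragment.
        return len(pileup)

    # Collect the set of bases each fragment's reads report at this locus.
    bases_by_fragment = defaultdict(set)
    for fragment, base in pileup:
        bases_by_fragment[fragment].add(base)

    if count_type == "COUNT_FRAGMENTS":
        # Each fragment counts once, whether or not its reads agree.
        return len(bases_by_fragment)
    if count_type == "COUNT_FRAGMENTS_REQUIRE_SAME_BASE":
        # Each fragment counts once, but only if its reads agree on the base.
        return sum(1 for bases in bases_by_fragment.values() if len(bases) == 1)
    raise ValueError(f"unknown count type: {count_type}")


# Hypothetical pileup: f1 has two agreeing reads, f2 has two disagreeing
# reads, f3 has a single read.
PILEUP = [("f1", "A"), ("f1", "A"), ("f2", "A"), ("f2", "C"), ("f3", "G")]
reads = count_depth(PILEUP, "COUNT_READS")
fragments = count_depth(PILEUP, "COUNT_FRAGMENTS")
consistent = count_depth(PILEUP, "COUNT_FRAGMENTS_REQUIRE_SAME_BASE")
```

For this pileup the three modes give depths of 5, 3, and 2 respectively.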

--ignoreDeletionSites ( boolean with default value false )

Ignore sites consisting only of deletions.

--includeDeletions / -dels ( boolean with default value false )

Include information on deletions. Consider a spanning deletion as contributing to coverage. Also enables deletion counts in per-base output.

--includeRefNSites ( boolean with default value false )

If provided, sites with reference N bases but with coverage from neighboring reads will be included in DoC calculations.

--maxBaseQuality ( byte with default value 127 )

Maximum quality of bases to count towards depth. Defaults to 127 (Byte.MAX_VALUE).

--maxMappingQuality ( int with default value 2147483647 )

Maximum mapping quality of reads to count towards depth. Defaults to 2^31-1 (Integer.MAX_VALUE).

--minBaseQuality / -mbq ( byte with default value -1 )

Minimum quality of bases to count towards depth. Defaults to -1.

--minMappingQuality / -mmq ( int with default value -1 )

Minimum mapping quality of reads to count towards depth. Defaults to -1.
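Taken together, the four quality arguments define a window that a read and base must fall within to count towards depth. The Python sketch below shows one way to read that, using the documented defaults; it assumes the bounds are inclusive, which this page does not state, and the function names and pileup format are hypothetical.

```python
def passes_quality_filters(mapping_quality, base_quality,
                           min_mapping_quality=-1,
                           max_mapping_quality=2**31 - 1,
                           min_base_quality=-1,
                           max_base_quality=127):
    """True if a base counts towards depth (bounds assumed inclusive)."""
    return (min_mapping_quality <= mapping_quality <= max_mapping_quality
            and min_base_quality <= base_quality <= max_base_quality)


def filtered_depth(pileup, **thresholds):
    """Depth at one locus; pileup is a list of (mapping_quality, base_quality)."""
    return sum(1 for mq, bq in pileup
               if passes_quality_filters(mq, bq, **thresholds))


# Hypothetical pileup of four bases at one locus.
PILEUP = [(60, 35), (60, 10), (0, 35), (20, 2)]
depth_default = filtered_depth(PILEUP)
depth_strict = filtered_depth(PILEUP, min_mapping_quality=20, min_base_quality=10)
```

With the defaults every base passes, so filtering only takes effect once a minimum is raised or a maximum is lowered.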

--nBins ( int with default value 499 )

Number of bins to use for granular binning.

--omitDepthOutputAtEachBase / -omitBaseOutput ( boolean with default value false )

Will omit the output of the depth of coverage at each base, which should result in speedup. Do not print the total coverage at every base.

--omitIntervalStatistics / -omitIntervals ( boolean with default value false )

Will omit the per-interval statistics section, which should result in speedup. Do not tabulate interval statistics (mean, median, quartiles AND # intervals by sample by coverage).

--omitLocusTable / -omitLocusTable ( boolean with default value false )

Will not calculate the per-sample per-depth counts of loci, which should result in speedup. Do not tabulate locus statistics (# loci covered by sample by coverage).

--omitPerSampleStats / -omitSampleSummary ( boolean with default value false )

Omits the summary files per-sample. These statistics are still calculated, so this argument will not improve runtime. Do not tabulate the sample summary statistics (total, mean, median, quartile coverage per sample).

--out / -o ( Map[DoCOutputType,PrintStream] with default value None )

An output file created by the walker. Will overwrite contents if file exists.

--outputFormat ( String with default value rtable )

The format of the output file (e.g. csv, table, rtable); defaults to an R-readable table.

--partitionType / -pt ( Set[Partition] with default value [sample] )

A way of partitioning reads into groups for depth of coverage. Defaults to sample; can be any combination of sample, readgroup, and library.

--printBaseCounts / -baseCounts ( boolean with default value false )

Will add base counts to per-locus output. Instead of reporting depth, report the base pileup at each locus.

--printBinEndpointsAndExit ( boolean with default value false )

Prints the bin values and exits immediately. Use to calibrate what bins you want before running on data.

--start ( int with default value 1 )

Starting (left endpoint) for granular binning. Sets the low-coverage cutoff for granular binning. All loci with depth < START are counted in the first bin.

--stop ( int with default value 500 )

Ending (right endpoint) for granular binning. Sets the high-coverage cutoff for granular binning. All loci with depth > STOP are counted in the last bin.
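The clamping behavior of --start and --stop can be sketched as follows. The sketch takes the bin left endpoints as given rather than deriving them, because the spacing the tool uses between start and stop is not described here; the endpoint list below is hypothetical, and the real values can be inspected with --printBinEndpointsAndExit. The function name assign_bin is likewise an assumption.

```python
from bisect import bisect_right


def assign_bin(depth, left_endpoints):
    """Index of the bin for a depth, given sorted bin left endpoints.

    Depths below the first endpoint fall in the first bin and depths at or
    beyond the last endpoint fall in the last bin.
    """
    idx = bisect_right(left_endpoints, depth) - 1
    return min(max(idx, 0), len(left_endpoints) - 1)


# Hypothetical endpoints for illustration, spanning start=1 to stop=500.
ENDPOINTS = [1, 5, 10, 50, 500]
bins = [assign_bin(d, ENDPOINTS) for d in (0, 7, 10, 500, 10000)]
```

Here a depth of 0 lands in the first bin and a depth of 10000 in the last, matching the low- and high-coverage cutoffs described for --start and --stop.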

--summaryCoverageThreshold / -ct ( int[] with default value [15] )

For summary file outputs, report the % of bases covered to >= this number. Defaults to 15; can take multiple arguments. A coverage threshold for summarizing (e.g. % bases >= CT for each sample).



GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.