# DepthOfCoverage

Assess sequence coverage by a wide array of metrics, partitioned by sample, read group, or library

## Overview

This tool processes a set of BAM files to determine coverage at different levels of partitioning and aggregation. Coverage can be analyzed per locus, per interval, per gene, or in total; can be partitioned by sample, by read group, by technology, by center, or by library; and can be summarized by mean, median, quartiles, and/or percentage of bases covered to or beyond a threshold. Additionally, reads and bases can be filtered by mapping or base quality score.

### Input

One or more BAM files (with proper headers) to be analyzed for coverage statistics

(Optional) A REFSEQ Rod to aggregate coverage to the gene level

(for information about creating the REFSEQ Rod, please consult the online documentation)

### Output

Tables pertaining to different coverage summaries. The suffix of each table file indicates its contents:

- no suffix: per locus coverage

- _summary: total, mean, median, quartiles, and threshold proportions, aggregated over all bases

- _statistics: coverage histograms (# loci with X coverage), aggregated over all bases

- _interval_summary: total, mean, median, quartiles, and threshold proportions, aggregated per interval

- _interval_statistics: 2x2 table of # of intervals covered to >= X depth in >= Y samples

- _gene_summary: total, mean, median, quartiles, and threshold proportions, aggregated per gene

- _gene_statistics: 2x2 table of # of genes covered to >= X depth in >= Y samples

- _cumulative_coverage_counts: coverage histograms (# loci with >= X coverage), aggregated over all bases

- _cumulative_coverage_proportions: proportions of loci with >= X coverage, aggregated over all bases
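
To make the relationship between the _statistics, _cumulative_coverage_counts, and _cumulative_coverage_proportions tables concrete, here is a minimal Python sketch. It is not the tool's implementation; the function name `coverage_tables` and the toy depths are illustrative assumptions, and the input is assumed to be a non-empty list of per-locus depths.

```python
from collections import Counter

def coverage_tables(depths):
    """Return (histogram, cumulative_counts, cumulative_proportions).

    `depths` is a non-empty list of per-locus depths (hypothetical input).
    """
    histogram = Counter(depths)  # depth X -> number of loci with exactly X coverage
    total = len(depths)
    cumulative_counts = {}
    running = 0
    # Walk from the highest depth down so each entry counts loci with >= X coverage.
    for depth in range(max(depths), -1, -1):
        running += histogram.get(depth, 0)
        cumulative_counts[depth] = running
    cumulative_proportions = {d: c / total for d, c in cumulative_counts.items()}
    return dict(histogram), cumulative_counts, cumulative_proportions

hist, counts, props = coverage_tables([0, 2, 2, 3, 5])
print(hist)       # exact-depth histogram
print(counts[2])  # number of loci with depth >= 2
print(props[2])   # proportion of loci with depth >= 2
```

The cumulative tables are derived entirely from the histogram, so in this sketch the proportion at depth 0 is always 1.0.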

### Examples

    java -Xmx2g -jar GenomeAnalysisTK.jar \
      -R ref.fasta \
      -T DepthOfCoverage \
      -o file_name_base \
      -I input_bams.list \
      [-geneList refSeq.sorted.txt] \
      [-ct 4 -ct 6 -ct 10] \
      [-L my_capture_genes.interval_list]
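
In the command above, the bracketed arguments are optional. As a hedged illustration of how the required and optional parts fit together, the following Python sketch assembles the same argument list. The helper name `build_doc_command` and its parameters are hypothetical; only the flags shown in the example command are used, and the sketch builds the command without running it.

```python
def build_doc_command(reference, bam_list, out_base,
                      gene_list=None, thresholds=(), intervals=None):
    """Assemble a DepthOfCoverage command line as a list of arguments.

    Hypothetical helper: it mirrors the example command and does not run it.
    """
    cmd = ["java", "-Xmx2g", "-jar", "GenomeAnalysisTK.jar",
           "-R", reference,
           "-T", "DepthOfCoverage",
           "-o", out_base,
           "-I", bam_list]
    if gene_list is not None:
        cmd += ["-geneList", gene_list]   # optional gene-level aggregation
    for ct in thresholds:
        cmd += ["-ct", str(ct)]           # -ct may be given multiple times
    if intervals is not None:
        cmd += ["-L", intervals]          # optional interval restriction
    return cmd

cmd = build_doc_command("ref.fasta", "input_bams.list", "file_name_base",
                        gene_list="refSeq.sorted.txt",
                        thresholds=(4, 6, 10),
                        intervals="my_capture_genes.interval_list")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run`, assuming Java and the GATK jar are available on the machine.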


The Engine automatically applies Read Filters to the data before it is processed by DepthOfCoverage.

### Parallelism options

This tool can be run in multi-threaded mode using the -nt option.

### Downsampling settings

This tool does not apply any downsampling by default.

## Command-line Arguments

### Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

### DepthOfCoverage specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the Argument details list below the table.

| Argument name(s) | Default value | Summary |
| --- | --- | --- |
| **Optional Outputs** | | |
| --out, -o | NA | An output file created by the walker. Will overwrite contents if file exists |
| **Optional Parameters** | | |
| --calculateCoverageOverGenes, -geneList | NA | Calculate coverage statistics over this list of genes |
| --countType | NA | How should overlapping reads from the same fragment be handled? |
| --maxBaseQuality | 127 | Maximum quality of bases to count towards depth |
| --maxMappingQuality | 2147483647 | Maximum mapping quality of reads to count towards depth |
| --minBaseQuality, -mbq | -1 | Minimum quality of bases to count towards depth |
| --minMappingQuality, -mmq | -1 | Minimum mapping quality of reads to count towards depth |
| --outputFormat | NA | The format of the output file |
| --partitionType, -pt | NA | Partition type for depth of coverage |
| **Optional Flags** | | |
| --omitDepthOutputAtEachBase, -omitBaseOutput | NA | Do not output depth of coverage at each base |
| --omitIntervalStatistics, -omitIntervals | NA | Do not calculate per-interval statistics |
| --omitLocusTable | NA | Do not calculate per-sample per-depth counts of loci |
| --omitPerSampleStats, -omitSampleSummary | NA | Do not output the summary files per-sample |
| --printBaseCounts, -baseCounts | NA | Add base counts to per-locus output |
| --nBins | 499 | Number of bins to use for granular binning |
| --start | 1 | Starting (left endpoint) for granular binning |
| --stop | 500 | Ending (right endpoint) for granular binning |
| --summaryCoverageThreshold, -ct | NA | Coverage threshold (in percent) for summarizing statistics |
| --ignoreDeletionSites | NA | Ignore sites consisting only of deletions |
| --includeDeletions, -dels | NA | Include information on deletions |
| --includeRefNSites | NA | Include sites where the reference is N |
| --printBinEndpointsAndExit | NA | Print the bin values and exit immediately |

### Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

### --calculateCoverageOverGenes / -geneList

Calculate coverage statistics over this list of genes
Specify a RefSeq file for use in aggregating coverage statistics over genes.

File

### --countType / NA

How should overlapping reads from the same fragment be handled?

The --countType argument is an enumerated type (CountPileupType), which can have one of the following values:

- COUNT_READS: Count all reads independently (even if from the same fragment).

- COUNT_FRAGMENTS: Count all fragments (even if the reads that compose the fragment are not consistent at that base).

- COUNT_FRAGMENTS_REQUIRE_SAME_BASE: Count all fragments (but only if the reads that compose the fragment are consistent at that base).

CountPileupType
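
The difference between the three counting modes (COUNT_READS, COUNT_FRAGMENTS, COUNT_FRAGMENTS_REQUIRE_SAME_BASE) is easiest to see on a toy pileup. The Python sketch below is a simplified illustration, not the tool's code: the pileup is modeled as hypothetical `(fragment_id, base)` pairs at a single locus, and `depth_at_locus` is an assumed helper name.

```python
from collections import defaultdict

def depth_at_locus(pileup, count_type="COUNT_READS"):
    """Depth at one locus for a pileup of (fragment_id, base) pairs."""
    if count_type == "COUNT_READS":
        # Every read counts, even if two reads come from the same fragment.
        return len(pileup)
    bases_by_fragment = defaultdict(set)
    for fragment_id, base in pileup:
        bases_by_fragment[fragment_id].add(base)
    if count_type == "COUNT_FRAGMENTS":
        # Each fragment counts once, whether or not its reads agree.
        return len(bases_by_fragment)
    if count_type == "COUNT_FRAGMENTS_REQUIRE_SAME_BASE":
        # Each fragment counts once, but only if all its reads show the same base.
        return sum(1 for bases in bases_by_fragment.values() if len(bases) == 1)
    raise ValueError("unknown count type: " + count_type)

pileup = [("frag1", "A"), ("frag1", "A"),   # overlapping mates, consistent
          ("frag2", "A"), ("frag2", "C"),   # overlapping mates, inconsistent
          ("frag3", "A")]                   # single read
for mode in ("COUNT_READS", "COUNT_FRAGMENTS", "COUNT_FRAGMENTS_REQUIRE_SAME_BASE"):
    print(mode, depth_at_locus(pileup, mode))
```

On this toy input the three modes give depths of 5, 3, and 2 respectively.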

### --ignoreDeletionSites / NA

Ignore sites consisting only of deletions

boolean

### --includeDeletions / -dels

Include information on deletions
Consider a spanning deletion as contributing to coverage. Also enables deletion counts in per-base output.

boolean

### --includeRefNSites / NA

Include sites where the reference is N
Normally, sites where the reference is N (or another non-canonical base) are skipped. If this option is enabled, these sites will be included in DoC calculations if there is coverage from neighboring reads.

boolean

### --maxBaseQuality / NA

Maximum quality of bases to count towards depth
Bases with quality scores higher than this threshold will be skipped. The default value is the largest number that can be represented as a byte.

byte  [0, 127]

### --maxMappingQuality / NA

Maximum mapping quality of reads to count towards depth
Reads with mapping quality values higher than this threshold will be skipped. The default value is the largest number that can be represented as an integer by the program.

int  [0, 2,147,483,647]

### --minBaseQuality / -mbq

Minimum quality of bases to count towards depth
Bases with quality scores lower than this threshold will be skipped. This is set to -1 by default to disable the evaluation and ignore this threshold.

byte  [0, 127]

### --minMappingQuality / -mmq

Minimum mapping quality of reads to count towards depth
Reads with mapping quality values lower than this threshold will be skipped. This is set to -1 by default to disable the evaluation and ignore this threshold.

int  [0, 2,147,483,647]
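
The four quality arguments (--minBaseQuality, --maxBaseQuality, --minMappingQuality, --maxMappingQuality) act as inclusive bounds: values below a minimum or above a maximum are skipped. The Python sketch below illustrates this under simplifying assumptions; the function names and the `(mapping_quality, base_quality)` pairs are hypothetical, and the defaults are taken from the argument table above.

```python
def passes_quality(mapping_quality, base_quality,
                   min_mq=-1, max_mq=2**31 - 1,
                   min_bq=-1, max_bq=127):
    """True if an observation falls within all four inclusive thresholds."""
    return (min_mq <= mapping_quality <= max_mq
            and min_bq <= base_quality <= max_bq)

def filtered_depth(observations, **thresholds):
    """Count (mapping_quality, base_quality) pairs that pass the thresholds."""
    return sum(1 for mq, bq in observations
               if passes_quality(mq, bq, **thresholds))

# Four hypothetical observations at one locus: (mapping quality, base quality).
obs = [(60, 30), (0, 30), (60, 2), (20, 20)]
print(filtered_depth(obs))                        # defaults: nothing is filtered
print(filtered_depth(obs, min_mq=20, min_bq=10))  # stricter minimums
```

With the default minimum of -1, no non-negative quality score can fall below the threshold, which is why the default effectively disables the minimum filters.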

### --nBins / NA

Number of bins to use for granular binning
Sets the number of bins for granular binning

int  [0, ∞], recommended [1, ∞]

### --omitDepthOutputAtEachBase / -omitBaseOutput

Do not output depth of coverage at each base
Disabling the tabulation of total coverage at every base should speed up processing.

boolean

### --omitIntervalStatistics / -omitIntervals

Do not calculate per-interval statistics
Disabling the tabulation of interval statistics (mean, median, quartiles AND # intervals by sample by coverage) should speed up processing. This option is required in order to use -nt parallelism.

boolean

### --omitLocusTable / -omitLocusTable

Do not calculate per-sample per-depth counts of loci
Disabling the tabulation of locus statistics (# loci covered by sample by coverage) should speed up processing.

boolean

### --omitPerSampleStats / -omitSampleSummary

Do not output the summary files per-sample
This option simply disables writing separate files for per-sample summary statistics (total, mean, median, quartile coverage per sample). These statistics are still calculated internally, so enabling this option will not improve runtime.

boolean

### --out / -o

An output file created by the walker. Will overwrite contents if file exists

Map[DoCOutputType,PrintStream]

### --outputFormat / NA

The format of the output file
Output file format (e.g. csv, table, rtable); defaults to r-readable table.

String

### --partitionType / -pt

Partition type for depth of coverage
By default, coverage is partitioned by sample, but it can be any combination of sample, readgroup and/or library.

Set[Partition]
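
As a rough illustration of what partitioning means, the Python sketch below tallies depth at a single locus separately for each requested partition type. The read records, their field values, and the helper name `depth_by_partition` are hypothetical; the real tool derives these groupings from the BAM headers.

```python
from collections import Counter

def depth_by_partition(reads, partition_types=("sample",)):
    """Depth at one locus, tallied per group for each partition type.

    `reads` is a list of dicts with 'sample', 'readgroup' and 'library' keys.
    """
    return {ptype: Counter(read[ptype] for read in reads)
            for ptype in partition_types}

# Three hypothetical reads overlapping the same locus.
reads = [
    {"sample": "NA12878", "readgroup": "rg1", "library": "libA"},
    {"sample": "NA12878", "readgroup": "rg2", "library": "libA"},
    {"sample": "NA12891", "readgroup": "rg3", "library": "libB"},
]
depths = depth_by_partition(reads, partition_types=("sample", "readgroup"))
print(depths["sample"])
print(depths["readgroup"])
```

Each partition type yields its own table, so requesting several types multiplies the number of output groupings rather than combining them.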

### --printBaseCounts / -baseCounts

Add base counts to per-locus output
Instead of reporting depth, the program will report the base pileup at each locus

boolean

### --printBinEndpointsAndExit / NA

Print the bin values and exit immediately
Use this option to calibrate what bins you want before performing full calculations on your data.

boolean

### --start / NA

Starting (left endpoint) for granular binning
Sets the low-coverage cutoff for granular binning. All loci with depth < START are counted in the first bin.

int  [0, ∞]

### --stop / NA

Ending (right endpoint) for granular binning
Sets the high-coverage cutoff for granular binning. All loci with depth > STOP are counted in the last bin.

int  [1, ∞]
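
The sketch below shows one way --start, --stop, and --nBins could interact. It assumes evenly spaced integer endpoints, which is an assumption made for illustration only: this page does not specify how the tool spaces its bins, and --printBinEndpointsAndExit is the way to see the real values. The helper names `even_endpoints` and `bin_index` are hypothetical, and exact boundary handling in the tool may differ.

```python
from bisect import bisect_right

def even_endpoints(start, stop, n_bins):
    """Hypothetical helper: n_bins + 1 evenly spaced endpoints from start to stop."""
    step = (stop - start) / n_bins
    return [start + round(i * step) for i in range(n_bins + 1)]

def bin_index(depth, endpoints):
    """Index of the bin holding `depth`.

    Index 0 collects depths below the first endpoint (START); the highest
    index collects depths at or above the last endpoint (STOP).
    """
    return bisect_right(endpoints, depth)

endpoints = even_endpoints(start=1, stop=11, n_bins=5)
print(endpoints)
for depth in (0, 4, 50):
    print(depth, bin_index(depth, endpoints))
```

In this sketch very low and very high depths are clamped into the outermost bins instead of being dropped, matching the description of the low-coverage and high-coverage cutoffs.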

### --summaryCoverageThreshold / -ct

Coverage threshold (in percent) for summarizing statistics
For summary file outputs, report the percentage of bases covered to a depth equal to or greater than this number (e.g. % bases >= CT for each sample). Defaults to 15; can take multiple arguments.

int[]
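
To show what the summary reports for each threshold, here is a minimal Python sketch that computes the percentage of bases at or above each -ct value per sample. The function name `percent_bases_above` and the toy depths are illustrative assumptions; the default threshold of 15 is taken from the description above.

```python
def percent_bases_above(depths_by_sample, thresholds=(15,)):
    """Percentage of bases with depth >= each threshold, per sample."""
    result = {}
    for sample, depths in depths_by_sample.items():
        total = len(depths)
        result[sample] = {
            ct: 100.0 * sum(1 for d in depths if d >= ct) / total
            for ct in thresholds
        }
    return result

# Hypothetical per-base depths for two samples over the same four bases.
depths = {"sampleA": [3, 5, 8, 12], "sampleB": [0, 2, 6, 20]}
summary = percent_bases_above(depths, thresholds=(4, 6, 10))  # as with -ct 4 -ct 6 -ct 10
for sample, row in summary.items():
    print(sample, row)
```

Each threshold adds one column to the per-sample summary, which is why the argument can be supplied multiple times.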