Tagged with #depthofcoverage
2 documentation articles | 1 announcement | 0 forum discussions


Comments (2)

A new tool has been released!

Check out the documentation at DepthOfCoverage.

Comments (68)

See this announcement regarding our plans for support of DepthOfCoverage and DiagnoseTargets. If you find that there are functionalities missing in either tool, leave us a comment and we will consider adding them.

For a complete, detailed argument reference, refer to the GATK document page here.

Introduction

DepthOfCoverage is a coverage profiler for a (possibly multi-sample) bam file. It uses a granular histogram that can be user-specified to present useful aggregate coverage data. It reports the following metrics over the entire .bam file:

  • Total, mean, median, and quartiles for each partition type: aggregate
  • Total, mean, median, and quartiles for each partition type: for each interval
  • A series of histograms of the number of bases covered to Y depth for each partition type (granular; e.g. Y can be a range, like 16 to 22)
  • A matrix of counts of the number of intervals for which at least Y samples and/or read groups had a median coverage of at least X
  • A matrix of counts of the number of bases that were covered to at least X depth, in at least Y groups (e.g. # of loci with ≥15x coverage for ≥12 samples)
  • A matrix of proportions of the number of bases that were covered to at least X depth, in at least Y groups (e.g. proportion of loci with ≥18x coverage for ≥15 libraries)

Because the common question "What proportion of my targeted bases are well-powered to discover SNPs?" is answered by the last matrix on the above list, it is strongly recommended that this walker be run on all samples simultaneously.

For humans, DepthOfCoverage can also be configured to output these statistics aggregated over genes, by providing it with a RefSeq ROD.

DepthOfCoverage also outputs, by default, the total coverage at every locus, and the coverage per sample and/or read group. This behavior can optionally be turned off, or switched to base count mode, where base counts will be output at each locus, rather than total depth.

Coverage by Gene

To get a summary of coverage by each gene, you may supply a refseq (or alternative) gene list via the argument

-geneList /path/to/gene/list.txt

The provided gene list must be of the following format:

585     NM_001005484    chr1    +       58953   59871   58953   59871   1       58953,  59871,  0       OR4F5   cmpl    cmpl    0,
587     NM_001005224    chr1    +       357521  358460  357521  358460  1       357521, 358460, 0       OR4F3   cmpl    cmpl    0,
587     NM_001005277    chr1    +       357521  358460  357521  358460  1       357521, 358460, 0       OR4F16  cmpl    cmpl    0,
587     NM_001005221    chr1    +       357521  358460  357521  358460  1       357521, 358460, 0       OR4F29  cmpl    cmpl    0,
589     NM_001005224    chr1    -       610958  611897  610958  611897  1       610958, 611897, 0       OR4F3   cmpl    cmpl    0,
589     NM_001005277    chr1    -       610958  611897  610958  611897  1       610958, 611897, 0       OR4F16  cmpl    cmpl    0,
589     NM_001005221    chr1    -       610958  611897  610958  611897  1       610958, 611897, 0       OR4F29  cmpl    cmpl    0,

If you are on the broad network, the properly-formatted file containing refseq genes and transcripts is located at

/humgen/gsa-hpprojects/GATK/data/refGene.sorted.txt

If you supply the -geneList argument, DepthOfCoverage will output an additional summary file that looks as follows:

Gene_Name     Total_Cvg       Avg_Cvg       Sample_1_Total_Cvg    Sample_1_Avg_Cvg    Sample_1_Cvg_Q3       Sample_1_Cvg_Median      Sample_1_Cvg_Q1
SORT1    594710  238.27  594710  238.27  165     245     330
NOTCH2  3011542 357.84  3011542 357.84  222     399     >500
LMNA    563183  186.73  563183  186.73  116     187     262
NOS1AP  513031  203.50  513031  203.50  91      191     290

Note that the gene coverage will be aggregated only over samples (not read groups, libraries, or other types). The -geneList argument also requires specific intervals within genes to be given (say, the particular exons you are interested in, or the entire gene), and it functions by aggregating coverage from the interval level to the gene level, by referencing each interval to the gene in which it falls. Because by-gene aggregation looks for intervals that overlap genes, -geneList is ignored if -omitIntervals is thrown.

Comments (4)

We have decided to continue providing and supporting DepthOfCoverage and DiagnoseTargets for the foreseeable future. Going forward, we'll try to integrate them and develop their features to address the main needs of the community. To this end we welcome your continuing feedback, so please feel free to contribute comments and ideas in this thread.

To all who took the time to tell us what you find useful about DoC and DT (and what you wish it could do), a big thank you! This is always very useful to us because it helps us identify which features are most valuable to our users.

No posts found with the requested search criteria.