Tagged with #depthperallelebysample
1 documentation article | 0 announcements | 2 forum discussions

Created 2014-10-17 19:19:39 | Updated 2015-07-06 13:44:00 | Tags: coveragebysample depthofcoverage depthperallelebysample ad dp

Comments (7)


This document describes the proper use of metrics associated with depth of coverage for the purpose of evaluating variants.

The metrics involved are the following:

  • DepthPerAlleleBySample (AD): outputs the depth of coverage of each allele per sample.
  • Coverage (DP): outputs the filtered depth of coverage for each sample and the unfiltered depth of coverage across all samples.

For an overview of the tools and concepts involved in performing sequence coverage analysis, where the purpose is to answer the common question: "(Where) Do I have enough sequence data to be empowered to discover variants with reasonable confidence?", please see this document.

Coverage annotations: DP and AD

The variant callers generate two main coverage annotation metrics: the allele depth per sample (AD) and overall depth of coverage (DP, available both per sample and across all samples, with important differences), controlled by the following annotator modules:

  • DepthPerAlleleBySample (AD): outputs the depth of coverage of each allele per sample.
  • Coverage (DP): outputs the filtered depth of coverage for each sample and the unfiltered depth of coverage across all samples.

At the sample level, these annotations are highly complementary metrics that provide two important ways of thinking about the depth of the data available for a given sample at a given site. The key difference is that the AD metric is based on unfiltered read counts while the sample-level DP is based on filtered read counts (see tool documentation for a list of read filters that are applied by default for each tool). As a result, they should be interpreted differently.

The sample-level DP is in some sense reflective of the power I have to determine the genotype of the sample at this site, while the AD tells me how many times I saw each of the REF and ALT alleles in the reads, free of any bias potentially introduced by filtering the reads. If, for example, I believe there really is a an A/T polymorphism at a site, then I would like to know the counts of A and T bases in this sample, even for reads with poor mapping quality that would normally be excluded from the statistical calculations going into GQ and QUAL.

Note that because the AD includes reads and bases that were filtered by the caller (and in case of indels, is based on a statistical computation), it should not be used to make assumptions about the genotype that it is associated with. Ultimately, the phred-scaled genotype likelihoods (PLs) are what determines the genotype calls.


No articles to display.

Created 2015-06-29 17:57:31 | Updated 2015-06-29 17:58:38 | Tags: depthperallelebysample haplotypecaller asereadcounter

Comments (5)


I was running ASEReadCounter and came across a "stack trace" error after completion of 14.1%.

I am running the command like this:

java -jar ../GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar -R /results2/indexes/genomes/mm9/fasta/mm9.fa -T ASEReadCounter -o filterReadCounts.chr13.58264978.txt -I dedupFolder/D4.indel.recal.dedup.reordered.bam -sites variants/gatk/D4.allBases.chr13.vcf -minDepth 20 --minMappingQuality 30 --minBaseQuality 20

The output is as follows:

INFO 13:27:40,125 HelpFormatter - -------------------------------------------------------------------------------- INFO 13:27:40,130 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41 INFO 13:27:40,130 HelpFormatter - Copyright (c) 2010 The Broad Institute INFO 13:27:40,131 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk INFO 13:27:40,136 HelpFormatter - Program Args: -R /results2/indexes/genomes/mm9/fasta/mm9.fa -T ASEReadCounter -o filterReadCounts.chr13.58264978.txt -I dedupFolder/D4.indel.recal.dedup.reordered.bam -sites variants/gatk/D4.allBases.chr13.vcf -minDepth 20 --minMappingQuality 30 --minBaseQuality 20 -U ALLOW_N_CIGAR_READS -rf BadCigar INFO 13:27:40,143 HelpFormatter - Executing as szimmerm@al2.aecom.yu.edu on Linux 2.6.18-371.12.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_09-b05. INFO 13:27:40,144 HelpFormatter - Date/Time: 2015/06/29 13:27:40 INFO 13:27:40,145 HelpFormatter - -------------------------------------------------------------------------------- INFO 13:27:40,145 HelpFormatter - -------------------------------------------------------------------------------- INFO 13:27:40,937 GenomeAnalysisEngine - Strictness is SILENT INFO 13:27:41,075 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 10000 INFO 13:27:41,091 SAMDataSource$SAMReaders - Initializing SAMRecords in serial INFO 13:27:41,186 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.09 INFO 13:27:41,474 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files INFO 13:27:42,051 GenomeAnalysisEngine - Done preparing for traversal INFO 13:27:42,052 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] INFO 13:27:42,053 ProgressMeter - | processed | time | per 1M | | total | remaining INFO 13:27:42,054 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime INFO 13:28:12,062 ProgressMeter - chr10:61439995 7224296.0 30.0 s 4.0 s 2.3% 21.6 m 21.1 m INFO 13:28:42,066 ProgressMeter - chr10:107123365 1.4729978E7 60.0 s 4.0 s 4.0% 24.8 m 23.8 m INFO 13:29:12,078 ProgressMeter - chr11:29429929 2.1979505E7 90.0 s 4.0 s 6.0% 25.0 m 23.5 m INFO 13:29:42,085 ProgressMeter - chr11:67787354 2.8914135E7 120.0 s 4.0 s 7.4% 26.8 m 24.8 m INFO 13:30:12,090 ProgressMeter - chr11:88702416 3.4725261E7 2.5 m 4.0 s 8.2% 30.3 m 27.8 m INFO 13:30:42,095 ProgressMeter - chr11:110188026 4.0515208E7 3.0 m 4.0 s 9.0% 33.2 m 30.2 m INFO 13:31:12,100 ProgressMeter - chr12:44912778 4.8755304E7 3.5 m 4.0 s 11.2% 31.3 m 27.8 m INFO 13:31:42,107 ProgressMeter - chr12:104866935 5.6805392E7 4.0 m 4.0 s 13.4% 29.8 m 25.8 m INFO 13:32:12,113 ProgressMeter - chr12:121257414 5.9799881E7 4.5 m 4.0 s 14.1% 32.0 m 27.5 m

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 0 at org.broadinstitute.gatk.tools.walkers.rnaseq.ASEReadCounter.map(ASEReadCounter.java:216) at org.broadinstitute.gatk.tools.walkers.rnaseq.ASEReadCounter.map(ASEReadCounter.java:102) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255) at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274) at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48) at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99) at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315) at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121) at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248) at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155) at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-0-g7e26428):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR ------------------------------------------------------------------------------------------

Is there any reason for this error, or is it just a bug? My overall goal is to get the filtered count of reads that support a given allele for an individual sample. DepthPerAlleleBySample gives unfiltered counts instead of filtered ones. If there is a way to get the filtered counts besides using ASEReadCounter that would also be great. Thanks!

Created 2012-11-29 17:51:33 | Updated | Tags: unifiedgenotyper depthperallelebysample

Comments (12)


I found some strange entries for indels in my VCF file created by the Unified Genotyper. For example:

4 184513470 . TC T 4009 PASS AC=4;AF=0.250;AN=16;BaseQRankSum=1.972;DP=1315;DS;FS=3.466;HaplotypeScore=537.6937;MLEAC=4;MLEAF=0.250;MQ=52.55;MQ0=0;MQRankSum=-10.581;QD=4.55;ReadPosRankSum=-10.128;SB=-3.500e+01;set=variant2 GT:AD:DP:GQ:PL 0/1:230,0:239:99:282,0,5011 0/0:92,0:95:99:0,133,2435

The first sample has genotype 0/1 with a good GQ value. However, according the allele depth field, there is no read supporting the deletion. When I look at the reads using the IGV, I find some reads supporting the deletion for the first sample (and even some for the second one).

Moreover, when I looked at the AD values for SNPs, I noticed the the sum of all AD values is much less than the coverage shown in the IGV. I filtered duplicated reads in the IGV.

Can someone please give an explanation? This link http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_DepthPerAlleleBySample.html explains the difference between AD and DP, but does not help in my case.

Best greetings, Hans-Ulrich