I just noticed something odd about GATK read counts. Using a tiny test data set, I generated a BAM file with marked duplicates.
This is the output for samtools flagstat:
40000 + 0 in total (QC-passed reads + QC-failed reads)
63 + 0 duplicates
38615 + 0 mapped (96.54%:-nan%)
40000 + 0 paired in sequencing
20000 + 0 read1
20000 + 0 read2
37764 + 0 properly paired (94.41%:-nan%)
38284 + 0 with itself and mate mapped
331 + 0 singletons (0.83%:-nan%)
76 + 0 with mate mapped to a different chr
54 + 0 with mate mapped to a different chr (mapQ>=5)
This is what I get as part of GATK info stats when running RealignerTargetCreator:
INFO 14:42:05,815 MicroScheduler - 5175 reads were filtered out during traversal out of 276045 total (1.87%)
INFO 14:42:05,816 MicroScheduler - -> 84 reads (0.03% of total) failing BadMateFilter
INFO 14:42:05,816 MicroScheduler - -> 1014 reads (0.37% of total) failing DuplicateReadFilter
INFO 14:42:05,816 MicroScheduler - -> 4077 reads (1.48% of total) failing MappingQualityZeroFilter
This is what I get as part of GATK info stats when running DepthOfCoverage (on the original BAM, not after realignment):
INFO 15:03:17,818 MicroScheduler - 2820 reads were filtered out during traversal out of 309863 total (0.91%)
INFO 15:03:17,818 MicroScheduler - -> 1205 reads (0.39% of total) failing DuplicateReadFilter
INFO 15:03:17,818 MicroScheduler - -> 1615 reads (0.52% of total) failing UnmappedReadFilter
Why are all of these so different? Why do the GATK stats report so many more total reads and duplicate reads than samtools flagstat?
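To make the comparison concrete, here is a rough Python sketch that parses the MicroScheduler summary lines quoted above, checks that the per-filter counts add up to the reported filtered total, and prints how each GATK total compares with the 40000 reads reported by flagstat. The regular expressions and the `summarize` helper are my own and only assume the log layout shown in this post; they are not part of GATK.

```python
import re

# Total read count reported by samtools flagstat in this post.
FLAGSTAT_TOTAL = 40000

# MicroScheduler summary lines copied from the two GATK runs in this post.
LOGS = {
    "RealignerTargetCreator": """
INFO 14:42:05,815 MicroScheduler - 5175 reads were filtered out during traversal out of 276045 total (1.87%)
INFO 14:42:05,816 MicroScheduler - -> 84 reads (0.03% of total) failing BadMateFilter
INFO 14:42:05,816 MicroScheduler - -> 1014 reads (0.37% of total) failing DuplicateReadFilter
INFO 14:42:05,816 MicroScheduler - -> 4077 reads (1.48% of total) failing MappingQualityZeroFilter
""",
    "DepthOfCoverage": """
INFO 15:03:17,818 MicroScheduler - 2820 reads were filtered out during traversal out of 309863 total (0.91%)
INFO 15:03:17,818 MicroScheduler - -> 1205 reads (0.39% of total) failing DuplicateReadFilter
INFO 15:03:17,818 MicroScheduler - -> 1615 reads (0.52% of total) failing UnmappedReadFilter
""",
}

# Patterns written for the log layout above (an assumption, not a GATK API).
TOTAL_RE = re.compile(r"(\d+) reads were filtered out during traversal out of (\d+) total")
FILTER_RE = re.compile(r"-> (\d+) reads \([\d.]+% of total\) failing (\w+)")


def summarize(log_text):
    """Return (filtered, total, {filter_name: count}) for one log excerpt."""
    match = TOTAL_RE.search(log_text)
    filtered, total = int(match.group(1)), int(match.group(2))
    per_filter = {name: int(count) for count, name in FILTER_RE.findall(log_text)}
    return filtered, total, per_filter


SUMMARIES = {tool: summarize(text) for tool, text in LOGS.items()}

for tool, (filtered, total, per_filter) in SUMMARIES.items():
    consistent = sum(per_filter.values()) == filtered
    print(f"{tool}: total={total}, filtered={filtered}, "
          f"filters sum to filtered total: {consistent}, "
          f"GATK total / flagstat total = {total / FLAGSTAT_TOTAL:.2f}")
```

So the logs are internally consistent; it is only the totals that disagree with flagstat and with each other.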
I have also used GATK for multi-sample SNP calling. For a few of the variants, the total depth (DP) at the variant site does not match the sum of the individual sample depths. For example, for the variant site shown below, DP=24 in the INFO column, but the per-sample depths from the FORMAT fields sum to 3+3+8+2=16:
chr1 3015369 . C A 39.22 AC=4;AF=1.00;AN=4;DP=24;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=4;MLEAF=1.00;MQ=6.96;MQ0=21;QD=2.45;SB=-2.321e+01 GT:AD:DP:GQ:PL ./. 1/1:3,3:6:6:48,6,0 1/1:8,2:10:3:25,3,0
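The arithmetic can be reproduced with a small Python sketch that pulls the INFO DP and the per-sample DP and AD values out of the record as pasted here. The `depth_summary` helper is hypothetical and written only for this example; it finds the FORMAT field by its leading GT key rather than by column position, since I am assuming nothing about the pasted record beyond its whitespace-separated fields.

```python
# The record exactly as pasted in this post (whitespace-separated fields).
RECORD = (
    "chr1 3015369 . C A 39.22 "
    "AC=4;AF=1.00;AN=4;DP=24;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;"
    "MLEAC=4;MLEAF=1.00;MQ=6.96;MQ0=21;QD=2.45;SB=-2.321e+01 "
    "GT:AD:DP:GQ:PL ./. 1/1:3,3:6:6:48,6,0 1/1:8,2:10:3:25,3,0"
)


def depth_summary(record):
    """Return (INFO DP, per-sample DP list, per-sample summed AD list)."""
    fields = record.split()
    # Locate FORMAT by its leading "GT" key instead of a fixed column index.
    fmt_idx = next(i for i, f in enumerate(fields) if f.split(":")[0] == "GT")
    # The INFO field is the one immediately before FORMAT.
    info = dict(kv.split("=", 1) for kv in fields[fmt_idx - 1].split(";") if "=" in kv)
    keys = fields[fmt_idx].split(":")

    sample_dp, sample_ad = [], []
    for sample in fields[fmt_idx + 1:]:
        values = dict(zip(keys, sample.split(":")))
        if "DP" not in values:  # uncalled sample such as "./." carries no depth
            continue
        sample_dp.append(int(values["DP"]))
        sample_ad.append(sum(int(x) for x in values["AD"].split(",")))
    return int(info["DP"]), sample_dp, sample_ad


info_dp, sample_dp, sample_ad = depth_summary(RECORD)
print("INFO DP:", info_dp)
print("sum of per-sample DP:", sum(sample_dp))
print("sum of per-sample AD:", sum(sample_ad))
print("difference:", info_dp - sum(sample_dp))
```

Both the per-sample DP values and the summed AD values come to 16, so the gap is between the INFO DP and the sample-level fields, not between AD and DP within a sample.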
Could someone explain the difference between these depths?