Tagged with #info field
0 documentation articles | 0 announcements | 2 forum discussions


Comments (8)

I just noticed something odd about GATK read counts. Using a tiny test data set, I generated a BAM file with marked duplicates.

This is the output for samtools flagstat:

40000 + 0 in total (QC-passed reads + QC-failed reads)
63 + 0 duplicates
38615 + 0 mapped (96.54%:-nan%)
40000 + 0 paired in sequencing
20000 + 0 read1
20000 + 0 read2
37764 + 0 properly paired (94.41%:-nan%)
38284 + 0 with itself and mate mapped
331 + 0 singletons (0.83%:-nan%)
76 + 0 with mate mapped to a different chr
54 + 0 with mate mapped to a different chr (mapQ>=5)

This is what I get as part of GATK info stats when running RealignerTargetCreator:

INFO  14:42:05,815 MicroScheduler - 5175 reads were filtered out during traversal out of 276045 total (1.87%) 
INFO  14:42:05,816 MicroScheduler -   -> 84 reads (0.03% of total) failing BadMateFilter 
INFO  14:42:05,816 MicroScheduler -   -> 1014 reads (0.37% of total) failing DuplicateReadFilter 
INFO  14:42:05,816 MicroScheduler -   -> 4077 reads (1.48% of total) failing MappingQualityZeroFilter 

This is what I get as part of GATK info stats when running DepthOfCoverage (on the original BAM, not after realignment):

INFO  15:03:17,818 MicroScheduler - 2820 reads were filtered out during traversal out of 309863 total (0.91%) 
INFO  15:03:17,818 MicroScheduler -   -> 1205 reads (0.39% of total) failing DuplicateReadFilter 
INFO  15:03:17,818 MicroScheduler -   -> 1615 reads (0.52% of total) failing UnmappedReadFilter 

Why are all of these counts so different? Why do the GATK stats report so many more total reads and duplicate reads than samtools flagstat?
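
For anyone comparing these numbers, here is a minimal Python sketch that parses the MicroScheduler summary lines pasted above and checks that the per-filter counts add up to the reported filtered total. The regular expressions and the tally function are my own illustration, not part of GATK, and they assume the log format shown in this post. The sketch only confirms that each log is internally consistent; it does not explain why the totals differ from flagstat.

```python
import re

# Log lines copied from the RealignerTargetCreator run in this post.
LOG = """\
INFO  14:42:05,815 MicroScheduler - 5175 reads were filtered out during traversal out of 276045 total (1.87%)
INFO  14:42:05,816 MicroScheduler -   -> 84 reads (0.03% of total) failing BadMateFilter
INFO  14:42:05,816 MicroScheduler -   -> 1014 reads (0.37% of total) failing DuplicateReadFilter
INFO  14:42:05,816 MicroScheduler -   -> 4077 reads (1.48% of total) failing MappingQualityZeroFilter
"""

# Hypothetical patterns, written against the log format shown above.
SUMMARY = re.compile(r"(\d+) reads were filtered out during traversal out of (\d+) total")
PER_FILTER = re.compile(r"-> (\d+) reads \([\d.]+% of total\) failing (\w+)")


def tally(log_text):
    """Return (filtered, total, {filter_name: count}) from MicroScheduler lines."""
    filtered = total = None
    per_filter = {}
    for line in log_text.splitlines():
        match = SUMMARY.search(line)
        if match:
            filtered, total = int(match.group(1)), int(match.group(2))
            continue
        match = PER_FILTER.search(line)
        if match:
            per_filter[match.group(2)] = int(match.group(1))
    return filtered, total, per_filter


filtered, total, per_filter = tally(LOG)
for name, count in per_filter.items():
    print(f"{name}: {count} ({100 * count / total:.2f}% of total)")
print("per-filter counts sum to filtered total:", sum(per_filter.values()) == filtered)
```

The same function can be pointed at the DepthOfCoverage lines to run the same check on that log.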

Comments (4)

I have a set of VCFs with identical positions in them:

VCF1: 1 10097 . T . 26 . AN=196;DP=1622;MQ=20.06;MQ0=456 GT:DP

VCF2: 1 10097 . T . 21.34 . AN=198;DP=2338;MQ=19.53;MQ0=633 GT:DP

VCF3: 1 10097 . T . 11.70 . AN=240;DP=3957;MQ=19.74;MQ0=1085 GT:DP

VCF4: 1 10097 . T . 15.56 . AN=134;DP=1348;MQ=18.22;MQ0=442 GT:DP

If I use all of them as input for VariantRecalibrator, which annotations will VariantRecalibrator use? Should I instead merge the VCFs with CombineVariants and run VariantAnnotator, before I run VariantRecalibrator?
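
To make the comparison concrete, here is a small Python sketch that splits the INFO column of the four records above into key/value pairs, so the annotations can be lined up side by side. The parse_info helper and the RECORDS dictionary are only an illustration. The sketch assumes the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT) and splits on whitespace because that is how the lines are pasted here; real VCF files are tab-delimited. It says nothing about how VariantRecalibrator itself picks annotations.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Records copied from the post; the labels VCF1..VCF4 are the poster's own.
RECORDS = {
    "VCF1": "1 10097 . T . 26 . AN=196;DP=1622;MQ=20.06;MQ0=456 GT:DP",
    "VCF2": "1 10097 . T . 21.34 . AN=198;DP=2338;MQ=19.53;MQ0=633 GT:DP",
    "VCF3": "1 10097 . T . 11.70 . AN=240;DP=3957;MQ=19.74;MQ0=1085 GT:DP",
    "VCF4": "1 10097 . T . 15.56 . AN=134;DP=1348;MQ=18.22;MQ0=442 GT:DP",
}

parsed = {}
for name, line in RECORDS.items():
    cols = line.split()  # whitespace split only because of how the lines are pasted here
    parsed[name] = {"QUAL": cols[5], "INFO": parse_info(cols[7])}
    print(name, "QUAL =", cols[5], parsed[name]["INFO"])

# Annotation keys present in every file at this position.
shared_keys = set.intersection(*(set(rec["INFO"]) for rec in parsed.values()))
print("shared INFO keys:", sorted(shared_keys))
```

All four records carry the same INFO keys at this position but different values, which is exactly why it matters which file's annotations end up being used.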

I'm not sure whether the forum is only for technical questions or whether you are also allowed to ask about best practices. Feel free to delete my question if it doesn't belong here. Thank you.