Hi all, I'm in a bit of a daze after going through all the documentation, and I wanted to sanity-check my workflow with the experts. I have ~120 whole-genome sequenced (WGS) samples of a fungal pathogen with a ~24 Mb genome. The end product of my GATK workflow should be a high-quality SNP call set, restricted to sites where we are confident in the call across all samples (so any site not covered by enough high-quality reads in one or more samples will be eliminated).
Therefore my workflow (starting from a sorted, indexed BAM file of reads from a single sample, mapped to the reference with bwa mem) is this:
"QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" and we also filter out heterozygous positions using "isHet == 1".
--output_mode EMIT_ALL_SITES \ and
--emitRefConfidence GVCF \
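To spell out what I mean by the two filter expressions above, here is a minimal Python sketch of the logic. It is only an illustration, not how GATK evaluates the expressions internally. The function names `fails_hard_filter` and `is_het` are made up for this post, and I am assuming that an annotation missing from a site does not cause it to fail.

```python
# Thresholds copied from the hard-filter expression above.
HARD_FILTER_CHECKS = [
    ("QD", lambda v: v < 2.0),
    ("FS", lambda v: v > 60.0),
    ("MQ", lambda v: v < 40.0),
    ("MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum", lambda v: v < -8.0),
]


def fails_hard_filter(annotations):
    """Return True if any annotation present at the site crosses its threshold.

    `annotations` is a dict of INFO values for one site, e.g. {"QD": 1.5}.
    Assumption: an annotation that is absent is skipped rather than failed.
    """
    return any(
        name in annotations and crosses(annotations[name])
        for name, crosses in HARD_FILTER_CHECKS
    )


def is_het(genotype):
    """Return True for a called genotype with two different alleles, e.g. "0/1"."""
    alleles = genotype.replace("|", "/").split("/")
    # A no-call such as "./." is not counted as heterozygous.
    return "." not in alleles and len(set(alleles)) > 1
```

A site would then be dropped if `fails_hard_filter(...)` is true, or if `is_het(...)` is true for the sample's genotype.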
Does this sound like a reasonable thing to do? What options should I use in step 8 so that HC (HaplotypeCaller) tells me, site by site, how confident it is about its calls, including those that are homozygous reference? I notice that when using
--output_mode EMIT_ALL_CONFIDENT_SITES \ and
--emitRefConfidence GVCF \ I am missing a lot of the annotations I get when outputting only variant sites (e.g. QD).
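To make the end goal concrete, this is roughly the per-site test I have in mind once every sample has a confidence value at every position. It is only a sketch: the function name `site_is_callable` and the `min_dp` and `min_gq` thresholds are placeholders I picked for illustration, and I am assuming each sample's record exposes the standard `DP` and `GQ` FORMAT fields.

```python
def site_is_callable(samples, min_dp=10, min_gq=30):
    """Return True only if every sample is confidently called at this site.

    `samples` is a list of dicts, one per sample, holding that sample's
    FORMAT values at the site, e.g. {"DP": 25, "GQ": 99}.
    The default thresholds are placeholders, not recommended values.
    """
    return all(
        s.get("DP") is not None
        and s.get("GQ") is not None
        and s["DP"] >= min_dp
        and s["GQ"] >= min_gq
        for s in samples
    )
```

A site where even one sample lacks `DP` or `GQ`, or falls below either threshold, would be excluded from the final call set.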