I have a few questions about some of the GATK tools and practices (I hope it's okay to post them in a single thread).
Is there any additional background information available about the UnifiedGenotyper, especially for multi-sample calling? How does it work in more detail? So far I have only found the slides from the last GATK workshop. If I remember correctly, it was mentioned during the presentation that one could ask if more information was needed (unfortunately I wasn't at the workshop ;)).
What is the definition of the Base Score Accuracy (in the Base Quality Score Recalibration plots)? Am I correct that it specifies how well the observed quality scores match the expected (empirical) quality scores? I think I read this somewhere but can't find it anymore.
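To make the interpretation in the question concrete, here is a minimal sketch of that idea, assuming "accuracy" means empirical quality minus reported quality, where the empirical quality is the Phred-scaled mismatch rate of all bases that were given the same reported quality. The function names and the counts are hypothetical and illustrative only; this is not GATK's actual implementation, which may smooth the counts differently.

```python
import math
from typing import Dict, Tuple


def phred(error_rate: float) -> float:
    """Convert an error probability to a Phred-scaled quality: Q = -10 * log10(p)."""
    return -10.0 * math.log10(error_rate)


def quality_accuracy(bins: Dict[int, Tuple[int, int]]) -> Dict[int, float]:
    """Map each reported Q to (empirical Q - reported Q).

    bins: reported Q -> (observations, mismatches), e.g. counted at
    non-known sites. Bins with zero observations or zero mismatches are
    skipped, because the empirical Q is undefined without some smoothing.
    """
    result = {}
    for reported_q, (observations, mismatches) in bins.items():
        if observations == 0 or mismatches == 0:
            continue
        empirical_q = phred(mismatches / observations)
        result[reported_q] = empirical_q - reported_q
    return result


if __name__ == "__main__":
    # Hypothetical counts: Q20 bases err 1% of the time (well calibrated),
    # Q30 bases also err 1% of the time (over-confident by ~10 Phred units).
    counts = {20: (1000, 10), 30: (10000, 100)}
    for q, diff in sorted(quality_accuracy(counts).items()):
        print(f"reported Q{q}: empirical - reported = {diff:+.1f}")
```

Under this reading, a well-calibrated bin sits near 0, and a negative value means the sequencer's reported quality was too optimistic.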
I've read that the way to validate the Realign around Indels step (i.e., to check what was changed and how much) is to count or search for the OC tags in the alignment file. Is there a fast way to do this? Or do I have to convert the BAM into a SAM and count by iterating over the lines of the alignment?
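One way to avoid writing a SAM file is to stream the BAM through `samtools view`, which prints records as SAM text (without the header unless `-h` is given). A quick count could be `samtools view realigned.bam | grep -c "OC:Z:"`, where `realigned.bam` is a hypothetical file name. The minimal sketch below is a slightly stricter version that only looks at the optional TAG:TYPE:VALUE columns (column 12 onward) and also reports the total record count. It assumes, as stated above, that realigned reads carry an OC (original CIGAR) tag.

```python
import sys
from typing import Iterable, Tuple


def count_oc_reads(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (records with an OC tag, total records) for SAM text lines."""
    with_oc = 0
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        total += 1
        # Optional fields start at column 12 (index 11), formatted TAG:TYPE:VALUE.
        if any(f.startswith("OC:") for f in fields[11:]):
            with_oc += 1
    return with_oc, total


# Only read when input is piped in, e.g.:
#   samtools view realigned.bam | python count_oc.py
if __name__ == "__main__" and sys.stdin is not None and not sys.stdin.isatty():
    oc, n = count_oc_reads(sys.stdin)
    print(f"{oc} of {n} records carry an OC tag")
```

Streaming keeps memory use constant, so this works on large BAMs without an intermediate file.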
Fortunately, there are the "Recommended sets of known sites per tool", which I have used so far. But is there any explanation of why those sets are recommended?
Thanks a lot!