Hi team, thanks for a great job developing this software!
I am planning to use GATK in a class to demonstrate SNP detection and VQSR in a non-model organism, but due to time constraints I have a very small dataset (12 samples of 100K reads each).
For an initial round of SNP detection I keep SNPs with Q > 20, which I then use as a "true" training set for VQSR. I use a call set with Q > 3 as my variants of interest.
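For the class, the two-threshold idea can be illustrated with a minimal Python sketch. It assumes an uncompressed, plain-text VCF and a hypothetical helper name, `split_by_qual`. It is not a GATK tool, just a way to show that the Q > 20 training set is a strict subset of the Q > 3 call set:

```python
def split_by_qual(vcf_lines, train_min=20.0, call_min=3.0):
    """Hypothetical helper: split VCF records by QUAL into a training set and a call set.

    Assumes plain-text VCF lines. Header lines (starting with '#') are copied
    to both outputs; records with a missing QUAL ('.') are dropped from both.
    """
    header, training, calls = [], [], []
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the 6th column of a VCF record
        if qual == ".":
            continue
        q = float(qual)
        if q > call_min:
            calls.append(line)     # variants of interest (Q > 3)
        if q > train_min:
            training.append(line)  # high-confidence "true" set (Q > 20)
    return header + training, header + calls
```

Because every record above the training threshold also passes the call-set threshold, the training variants are always contained in the call set.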
I keep getting the error message:
"NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --percentBadVariants 0.05, for example) or lowering the maximum number of Gaussians to use in the model (via --maxGaussians 4, for example)"
This is not surprising given how small the dataset is, but the error persists even though I have already set --maxGaussians 2 -percentBad 0.01 -minNumBad 50.
To reiterate, this is for educational purposes: is there a way to move past this error and still get an output file?
/Pierre De Wit