I have just run base quality score recalibration (BQSR) following the GATK Best Practices. Because I'm working on a non-model organism with no existing set of known variants, I first ran a round of HaplotypeCaller and used the resulting variants (after filtering) as the known sites for base recalibration, as the GATK Best Practices recommend.
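For readers trying to reproduce this, the bootstrapped workflow described above could look roughly like the following in GATK4. This is a sketch under assumptions, not my exact commands: the file names are hypothetical, the `QD < 2.0` filter is only a placeholder for whatever hard filters you choose, and the syntax assumes GATK4 (GATK3 used different tool names and flags). The script only prints the commands; it does not run them.

```python
import shlex

# Hypothetical inputs; replace with your own reference and BAM.
REF = "ref.fasta"
BAM = "sample.bam"


def bootstrap_bqsr_commands(ref: str, bam: str) -> list[list[str]]:
    """Return one round of bootstrapped BQSR plus the convergence check,
    as GATK4-style argument lists (illustrative sketch only)."""
    return [
        # 1. Initial variant calls on the un-recalibrated reads.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", "raw.vcf.gz"],
        # 2. Flag low-confidence calls (placeholder expression; tune for your data).
        ["gatk", "VariantFiltration", "-R", ref, "-V", "raw.vcf.gz",
         "--filter-expression", "QD < 2.0", "--filter-name", "lowQD",
         "-O", "flagged.vcf.gz"],
        # 3. Keep only PASS records to serve as the known-sites mask.
        ["gatk", "SelectVariants", "-R", ref, "-V", "flagged.vcf.gz",
         "--exclude-filtered", "-O", "known.vcf.gz"],
        # 4. First recalibration table, built from the original reads.
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", bam,
         "--known-sites", "known.vcf.gz", "-O", "recal1.table"],
        # 5. Write reads with recalibrated base qualities.
        ["gatk", "ApplyBQSR", "-R", ref, "-I", bam,
         "--bqsr-recal-file", "recal1.table", "-O", "recal1.bam"],
        # 6. Second table, built from the recalibrated reads (convergence check).
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", "recal1.bam",
         "--known-sites", "known.vcf.gz", "-O", "recal2.table"],
        # 7. Before/after plots comparing the two tables.
        ["gatk", "AnalyzeCovariates", "-before", "recal1.table",
         "-after", "recal2.table", "-plots", "recal_plots.pdf"],
    ]


if __name__ == "__main__":
    for cmd in bootstrap_bqsr_commands(REF, BAM):
        # shlex.join quotes arguments containing spaces, e.g. 'QD < 2.0'.
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; each line can be pasted into a shell once the paths are adjusted.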
Everything seems fine: the pipeline ran on my data without errors. However, when I checked for convergence after recalibration (I ran a second round of BaseRecalibrator and then generated plots with AnalyzeCovariates), the reported base qualities after recalibration had become very low. Before recalibration, most of my bases had quality scores above 20; afterwards, most of them dropped below 10! The attached file shows the plots generated by AnalyzeCovariates. The reported quality score after recalibration for substitutions is especially low.
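For scale, Phred quality scores are logarithmic, Q = -10 log10(P), where P is the estimated probability that the base call is wrong. A drop from above Q20 to below Q10 therefore means the recalibrated error estimate went from under 1 in 100 to over 1 in 10, roughly a tenfold increase. A small helper to convert between the two scales (the function names are my own, not part of GATK):

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred-scaled quality: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-scaled quality for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (30, 20, 10):
        # Express each quality as "1 error in N bases".
        print(f"Q{q}: 1 error in {round(1 / phred_to_error_prob(q))} bases")
```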
How could this happen? Does it just mean that I haven't reached convergence yet and need to run additional rounds of recalibration? Could it be due to the data itself? Or were the variants I used not filtered stringently enough, and that is what is throwing off the recalibration?
I would appreciate your thoughts on this issue.