Hi, I am estimating SNP number for a genome of a speceis.
I found that the number of SNP in the fastq going through GATK is 10 times more than the first fastq. Interestingly, if I use picard to do duplicats-removomg again to the GATK bam and used samtools to convert the bam to fastq file. The SNP jumps back to 10 time fewer.
What can be the reason that the SNP number can be 10 time different between the two methods? Actually, I expect the GATK output file has fewer SNP given the effect of recaliraiton or relingement. But the result is opposite.