I've followed the suggested protocol for local realignment - first using RealignerTargetCreator and then IndelRealigner, but have unexpected results.
Let's call the two BAMs I'm realigning "normal" and "tumour" or N and T for short. Once realigned, I've split the resulting NT BAM file (using readgroup tags, although I see from the docs that it can create separate files natively) back into the original N and T BAM files and discovered something odd. I was expecting the pre-realignment N and T files to contain the same number of reads as the post-realignment files, only the coordinates that reads are mapped to would be different.
However, I notice that post-realignment files contain significantly fewer reads because unaligned reads and reads not aligned to the autosomes or sex chromosomes have been removed. However, these reads alone do not account for the difference; large numbers of reads aligned to the 24 chromosomes are now missing.
Can you tell me more about the reads that are removed? I suspect it to be an alignment quality issue, but cannot find direct reference to this behaviour in the documentation. I'm currently keeping both my pre and post-realignment bam files, but ultimately there will be space constraints and I'll have to choose and would like to make the most informed decision possible.