I'm quite new to SNP calling. I am trying to set up a pipeline that includes GATK IndelRealigner as a final step. My BAM file (before realignment) is a little over 1 GB. After running IndelRealigner, however, it shrinks to 18 MB! I'm assuming it's throwing out far too many reads, or something else has gone wrong.
I'm calling IndelRealigner with the default options as follows:
java -Xmx16g -jar $GATK_DIR/GenomeAnalysisTK.jar \
    -T IndelRealigner \
    -R /path/to/my/ref \
    -I input.bam \
    -targetIntervals input.bam.intervals \
    -o realn.bam
I am generating the read groups using AddOrReplaceReadGroups.jar (from Picard tools) and the interval file using GATK RealignerTargetCreator with default options.
My BAM file was generated from the raw reads of experiment SRA181417, fetched from SRA (after trimming adapters with cutadapt, mapping to the reference with bwa-mem, and removing duplicate reads with Picard tools).
I have tried this on other read sets and do not see the same issue. Can anyone comment on why IndelRealigner could be throwing out so many reads?
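One way to narrow this down is to compare read counts per reference contig before and after realignment. If the output simply stops partway through the genome, that may point to a truncated run rather than to reads being filtered. The sketch below assumes both BAMs have been dumped to SAM text (for example with `samtools view`); the function names `reads_per_contig` and `compare` are hypothetical helpers, not part of GATK or Picard.

```python
from collections import Counter


def reads_per_contig(sam_lines):
    """Count primary records per reference contig in SAM text.

    Header lines ('@...') are skipped, and secondary (0x100) and
    supplementary (0x800) records are ignored so each read counts once.
    Unmapped reads (flag 0x4) are grouped under '*'.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:  # 0x100 (secondary) | 0x800 (supplementary)
            continue
        contig = "*" if flag & 0x4 else fields[2]  # RNAME is column 3
        counts[contig] += 1
    return counts


def compare(before, after):
    """Yield (contig, count_before, count_after) for contigs whose counts differ."""
    for contig in sorted(set(before) | set(after)):
        b, a = before.get(contig, 0), after.get(contig, 0)
        if b != a:
            yield contig, b, a
```

For example, run `samtools view input.bam > before.sam` and `samtools view realn.bam > after.sam`, pass each open file to `reads_per_contig`, and print the rows yielded by `compare`. Contigs that are present before but missing entirely afterwards are the first place to look.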
I couldn't find any information about this in this post: http://gatkforums.broadinstitute.org/discussion/1268/how-should-i-interpret-vcf-files-produced-by-the-gatk
Is it possible to get the number of forward/reverse reads in the final VCF output by HaplotypeCaller?
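If the VCF carries GATK's per-sample SB FORMAT field (produced by the StrandBiasBySample annotation, e.g. requested with `-A StrandBiasBySample`; whether it is emitted by default depends on the GATK version and mode), the forward/reverse counts can be read straight from it. The sketch below assumes a biallelic site and that SB holds four integers in the order ref-forward, ref-reverse, alt-forward, alt-reverse, which is our understanding of the GATK convention. `strand_counts` is a hypothetical helper, and the counts reflect the reads HaplotypeCaller used, which may differ from raw depth.

```python
def strand_counts(vcf_line, sample_index=0):
    """Return per-strand read counts from a sample's SB FORMAT field, or None.

    Assumes SB = ref_fwd,ref_rev,alt_fwd,alt_rev (biallelic site).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")                   # FORMAT column, e.g. GT:AD:DP:SB
    values = fields[9 + sample_index].split(":")  # this sample's values
    sample = dict(zip(keys, values))              # trailing fields may be dropped
    sb = sample.get("SB", ".")
    if sb == ".":
        return None
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in sb.split(","))
    return {
        "ref_fwd": ref_fwd, "ref_rev": ref_rev,
        "alt_fwd": alt_fwd, "alt_rev": alt_rev,
        "forward": ref_fwd + alt_fwd,  # all forward-strand reads counted in SB
        "reverse": ref_rev + alt_rev,  # all reverse-strand reads counted in SB
    }
```

For a sample column such as `0/1:10,8:18:99:200,0,300:6,4,5,3` with FORMAT `GT:AD:DP:GQ:PL:SB`, this reports 11 forward and 7 reverse reads.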
Hi,
Does GATK2 provide a walker or option to summarize the read alignments in a given BAM file? A summary including total reads, reads mapped (count/%), reads uniquely mapped (count/%), reads uniquely mapped with 0 mismatches (count/%), reads mapped on-target (count/%), reads uniquely mapped on-target (count/%), etc. is of great use for assessing mapping quality in whole-genome or targeted analyses. Please advise me on how I can obtain this using any of the available walkers.
Thanks,
Raj
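Independent of any GATK walker, the metrics listed can be computed directly from SAM text (e.g. `samtools view` output). The sketch below is a standalone script, not a GATK feature, and it rests on simplifying definitions that are assumptions: "uniquely mapped" means MAPQ at or above a hypothetical threshold `min_unique_mapq` (default 20), "0 mismatches" means an `NM:i:0` tag (NM is edit distance, so indels count too), and "on-target" means the alignment start falls inside a 1-based inclusive, non-overlapping target interval. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def build_index(targets):
    """Group (contig, start, end) targets by contig, sorted by start.

    Coordinates are assumed 1-based inclusive and non-overlapping.
    """
    index = {}
    for contig, start, end in targets:
        index.setdefault(contig, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def on_target(index, contig, pos):
    """True if pos lies inside a target interval on contig."""
    intervals = index.get(contig, [])
    # Index of the last interval whose start <= pos
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos <= intervals[i][1]


def alignment_summary(sam_lines, targets, min_unique_mapq=20):
    """Return {metric: (count, percent_of_total)} computed from SAM text."""
    index = build_index(targets)
    c = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        flag, pos, mapq = int(f[1]), int(f[3]), int(f[4])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        c["total"] += 1
        if flag & 0x4:  # unmapped read
            continue
        c["mapped"] += 1
        unique = mapq >= min_unique_mapq
        hit = on_target(index, f[2], pos)
        c["unique"] += unique
        c["unique_0mm"] += unique and "NM:i:0" in f[11:]  # optional tags start at column 12
        c["on_target"] += hit
        c["unique_on_target"] += unique and hit
    total = c["total"]
    metrics = ("total", "mapped", "unique", "unique_0mm",
               "on_target", "unique_on_target")
    return {m: (c[m], 100.0 * c[m] / total if total else 0.0) for m in metrics}
```

Percentages are relative to all primary reads. For whole-genome data, passing an empty target list simply reports zero on-target reads.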