Compress the read data to minimize file sizes, which facilitates processing very large numbers of samples together.
Run the following GATK command:
java -jar GenomeAnalysisTK.jar \
    -T ReduceReads \
    -R reference.fa \
    -I recal_reads.bam \
    -L 20 \
    -o reduced_reads.bam
This creates a file called reduced_reads.bam containing only the sequence information that is essential for calling variants.
Note that ReduceReads is not meant to be run on multiple samples at once. If you plan to merge your sample BAM files, run ReduceReads on each sample individually before merging.
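The per-sample requirement can be sketched as a small shell loop. This is only an illustration under stated assumptions: the sample file names (sampleA.recal.bam, sampleB.recal.bam), the output naming scheme, and the helper function reduce_cmd are hypothetical, and the -L 20 interval restriction from the command above is left out. The script prints one ReduceReads command per sample rather than running it, so the commands can be inspected first.

```shell
#!/bin/sh
# Dry-run sketch: print one ReduceReads command per sample BAM.
# GATK_JAR, REF, and the sample file names below are assumed values.
GATK_JAR="GenomeAnalysisTK.jar"
REF="reference.fa"

# reduce_cmd (hypothetical helper): build the command line for one sample.
# The output name is derived by replacing the trailing ".bam" with ".reduced.bam".
reduce_cmd() {
    in_bam="$1"
    out_bam="${in_bam%.bam}.reduced.bam"
    echo "java -jar $GATK_JAR -T ReduceReads -R $REF -I $in_bam -o $out_bam"
}

# Each sample is handled separately, before any merging takes place.
for bam in sampleA.recal.bam sampleB.recal.bam; do
    reduce_cmd "$bam"
done
```

If the printed commands look right, piping the script's output to sh would execute them one sample at a time.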