BaseRecalibrator Out of Memory problem

I am using GATK v2.5-2-gf57256b and ran into an out-of-memory problem when running BaseRecalibrator.

** ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java**

I tried assigning progressively more memory to the program and reduced the downsampling coverage to 40 (-dcov 40). The last attempt used a very large heap:

Command: java -Xmx47g -jar /cc/apps/GATK/2.5.2/GenomeAnalysisTK.jar -T BaseRecalibrator -I ${line}real_calmd.bam -R ${huref} -knownSites kgp_vcf/ALL.wholeGenome_wo_wgs.phase1_integrated_calls.20101123.snps_indels_svs.genotypes.vcf -cov ReadGroupCovariate -cov CycleCovariate -cov ContextCovariate -cov QualityScoreCovariate -o ${line}_recal.csv -dcov 40 &>output${line}_qual_recal1

Each time I increased the memory and decreased the coverage, the program ran longer before failing. The last run, with 47 GB and -dcov 40, ran for 90 min (with an estimated 6 days remaining) before crashing.

My BAM files are quite large (around 150 GB each). I previously did the recalibration with an older version of GATK using CountCovariates, and it worked fine for these big BAM files. Is there anything I can do to make BaseRecalibrator work on these files as well? I would like to use the newer version of GATK for my whole pipeline.

/Thanks, casch

