We all know how HaplotypeCaller analyses can take a long time. IBM is now providing a native implementation of the PairHMM algorithm that leverages the new hardware available in their POWER8 systems. This optimization currently work on the following systems: Ubuntu14 and RHEL7 with POWER8.
To take advantage of this optimization, you need to do the following:
Here is an example for running on a P8 system with Ubuntu:
java -Xmx32g -Djava.library.path=/path/to/PairHMM_P8_Ubuntu -jar $GATK_PATH/GenomeAnalysisTK.jar \ -T HaplotypeCaller \ -R $REFERENCE -I $INPUT_BAM --dbsnp $SNP_VCF \ -stand_emit_conf 10 -stand_call_conf 50 \ --pair_hmm_implementation VECTOR_LOGLESS_CACHING \ -o $OUTPUT_VCF
You can expect a speedup in the range of 1-1.7x depending on the hardware, OS and test cases.
If you have any questions or issues (aside from downloading the file), please contact Yinhue Cheng at IBM (email@example.com).
I'm running BaseRecalibrator on a full lane of HiSeq. I set the -nct switch to 8, but I see that the average load is <1. Why is this so? Doesn't the BaseRecalibrator use threads properly?
I've seen this thread http://gatkforums.broadinstitute.org/discussion/1263/baserecalibrator-trade-off-between-runtime-and-accuracy on how to speed up the process by downsampling. Is this a better option considering I have >200M reads?
BTW I'm using version 2.2-8 - do newer versions implement -num_threads?
I would really love to reduce the runtime from 36hrs.