We all know how HaplotypeCaller analyses can take a long time. IBM is now providing a native implementation of the PairHMM algorithm that leverages the new hardware available in their POWER8 systems. This optimization currently work on the following systems: Ubuntu14 and RHEL7 with POWER8.
To take advantage of this optimization, you need to do the following:
Here is an example for running on a P8 system with Ubuntu:
java -Xmx32g -Djava.library.path=/path/to/PairHMM_P8_Ubuntu -jar $GATK_PATH/GenomeAnalysisTK.jar \ -T HaplotypeCaller \ -R $REFERENCE -I $INPUT_BAM --dbsnp $SNP_VCF \ -stand_emit_conf 10 -stand_call_conf 50 \ --pair_hmm_implementation VECTOR_LOGLESS_CACHING \ -o $OUTPUT_VCF
You can expect a speedup in the range of 1-1.7x depending on the hardware, OS and test cases.
If you have any questions or issues (aside from downloading the file), please contact Yinhue Cheng at IBM (firstname.lastname@example.org).
Disclaimer: Please note that these libraries are not an official IBM product. You use it entirely at your own risk, and neither IBM nor the author assumes any liability whatsoever, nor do they assume responsibility for maintenance. Please report comments and corrections to email@example.com.
I'm running BaseRecalibrator on a full lane of HiSeq. I set the -nct switch to 8, but I see that the average load is <1. Why is this so? Doesn't the BaseRecalibrator use threads properly?
I've seen this thread http://gatkforums.broadinstitute.org/discussion/1263/baserecalibrator-trade-off-between-runtime-and-accuracy on how to speed up the process by downsampling. Is this a better option considering I have >200M reads?
BTW I'm using version 2.2-8 - do newer versions implement -num_threads?
I would really love to reduce the runtime from 36hrs.