# Tagged with #runtime 0 documentation articles | 1 announcement | 9 forum discussions

Created 2016-03-16 15:03:06 | Updated 2016-03-17 16:09:20 | Tags: profile parallelism performance runtime speed benchmarks

When you're setting up a variant discovery pipeline, you face two problems: deciding what tools to run (with what options), and how to run them efficiently so that it doesn't take forever. Between our documentation and our support forum, we can get you most if not all the way to solving the first problem, unless you're working with something really unusual.

However, the second problem is not something we've been able to help much with. We only benchmark computational requirements/performance for the purposes of our in-house pipelines, which are very specific to our particular infrastructure, and we don't have the resources to test different configurations. As a result it's been hard for us to give satisfying answers to questions like "How long should this take?" or "How much RAM do I need?" -- and we're aware this is a big point of pain.

So I'm really pleased to announce that a team of engineers at Intel have been developing a system to profile pipelines that implement our Best Practices workflows on a range of hardware configurations. This is a project we've been supporting by providing test data and implementation advice, and it's really gratifying to see it bear fruit: the team recently published their first round of profiling, done on the per-sample segment of the germline variation pipeline (from BWA to HaplotypeCaller; FASTQ to GVCF) on a trio of whole genomes.

The white paper is available from the GATK-specific page of Intel's Health-IT initiative website and contains some very useful insights regarding key bottlenecks in the pipeline. It also details the applicability of parallelizing options for each tool, as well as the effect of using different numbers of threads on performance, when run on a single 36-core machine. Spoiler alert: more isn't always better!

Read on for a couple of highlights of what I thought was especially interesting in the Intel team's white paper.

### Finding the thread count that's right for you: not just a Bed, Bath & Beyond question

First, this figure showing how parallelization affects turnaround time for each tool in the pipeline does a great job of identifying where parallelization makes the biggest difference (hellooooo BWA), as well as where the addition of more parallel threads shows quickly diminishing returns.

Note that here the term "thread" refers both to parallelized execution that is achieved through multithreading approaches (in GATK, these are invoked using -nt and/or -nct depending on the tool) as well as through local scatter-gather, which consists of running the tools over slices of data generated using Queue, the companion program to GATK. In terms of computational resources the result is similar: a certain number of cores on the machine are dedicated to running the given component. Full details are given in the script that accompanies the white paper; in a nutshell, multithreading was used for BWA-mem and RealignerTargetCreator, whereas scatter-gather was used to parallelize IndelRealigner, BaseRecalibrator, PrintReads and HaplotypeCaller.
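As a rough illustration of the local scatter-gather idea described above, here is a minimal Python sketch. It is not how Queue implements it, and it is not taken from the white paper's script: `scatter`, `process_slice` and `scatter_gather` are hypothetical names, and `process_slice` is only a stand-in for launching one tool run over one slice of intervals.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(intervals, n_jobs):
    """Split intervals into at most n_jobs contiguous slices of near-equal size."""
    n_jobs = max(1, min(n_jobs, len(intervals)))
    size, extra = divmod(len(intervals), n_jobs)
    slices, start = [], 0
    for i in range(n_jobs):
        # The first `extra` slices each take one additional interval.
        end = start + size + (1 if i < extra else 0)
        slices.append(intervals[start:end])
        start = end
    return slices


def process_slice(intervals):
    """Hypothetical placeholder for running one tool invocation on one slice."""
    return [f"processed:{interval}" for interval in intervals]


def scatter_gather(intervals, n_jobs):
    """Run process_slice on each slice in parallel, then merge results in order."""
    slices = scatter(intervals, n_jobs)
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        # pool.map returns results in the same order as the input slices.
        partial_results = list(pool.map(process_slice, slices))
    # Gather step: concatenate the per-slice outputs.
    return [item for part in partial_results for item in part]


if __name__ == "__main__":
    chromosomes = ["1", "2", "3", "4", "5"]
    print(scatter(chromosomes, 2))
    print(scatter_gather(chromosomes, 4))
```

The slices are kept contiguous so that the gathered output stays in the original interval order. In a real pipeline each worker would launch a separate tool process, so the number of jobs maps onto the number of cores dedicated to that step.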

You can see that for the genome mapping step, done with BWA-mem, going from 1 to just 4 threads leads to a 4-fold decrease in runtime, which is yuuuge since it's the most time-consuming step in the pipeline. And beyond that you can still bring runtime down further by throwing more threads at the problem, until the relative gains bottom out somewhere between 8 and 36 -- at which point you're spending very little time on BWA mapping. Your mileage may vary, but for reference, in Broad's production pipeline we give BWA-mem 12 threads.

On the GATK end of the pipeline, HaplotypeCaller also shows a roughly 4-fold speedup when going from a single core to 4 scatter-gather jobs run on different cores, but beyond that the gains from additional parallelization tend to be progressively more modest. Multithreading with -nct is not used at all because it has proved fairly unstable in HaplotypeCaller, leading to occasional unpredictable crashes.

The remaining steps see less dramatic improvement, comparatively speaking, though BaseRecalibrator and PrintReads with -BQSR do show a decent 2-fold speedup when run on four cores instead of one. But this shows fairly clearly that there's little point to blindly throwing more parallelization at these tools.
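To put numbers on "diminishing returns", the sketch below computes speedup and per-core efficiency, and uses Amdahl's law as a deliberately simplified model. Amdahl's law is not something the white paper is said to use; it is brought in here only for illustration, and all function names are made up for this example.

```python
def speedup(serial_time, parallel_time):
    """How many times faster the parallel run is than the single-core run."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, n_cores):
    """Speedup divided by the number of cores used (1.0 means perfect scaling)."""
    return speedup(serial_time, parallel_time) / n_cores


def parallel_fraction(observed_speedup, n_cores):
    """Invert Amdahl's law: fraction of the work that parallelizes (n_cores > 1)."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / n_cores)


def amdahl_speedup(fraction, n_cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - fraction) + fraction / n_cores)


if __name__ == "__main__":
    # A 2-fold speedup on 4 cores, as reported above for BaseRecalibrator.
    p = parallel_fraction(2.0, 4)
    print(f"implied parallel fraction: {p:.3f}")
    for cores in (4, 8, 16, 36):
        print(cores, "cores ->", round(amdahl_speedup(p, cores), 2), "x")
```

Under this model, a tool that only gets 2x on four cores can never exceed 3x no matter how many cores it is given, which is one way to read the flattening curves in the figure. Real tools also pay I/O and scheduling costs that the model ignores.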

### Are we there yet? Are we there yet? Are we there yet?

In this second figure we're looking at CPU utilization throughout the pipeline, i.e. how much computing the machine is doing at any given time -- as opposed to doing boring things like reading and writing data to and from files (I/O), which is like driving from Omaha to Denver (a long flat drive where nothing exciting happens, but things get fun once you get there).

Note that this figure corresponds specifically to an "optimized run", i.e. a specific configuration of the pipeline where each tool was parallelized optimally based on earlier results.

You can see that BWA-mem is a busy beaver, spending pretty much all of its time furiously calculating mapping scores and writing out results to the output SAM file as it goes. In contrast, if we look at the tools that write out BAM files, we see a flurry of activity up front, then the line goes flat during a long period spent just writing out results to the output BAM file.

In the case of BaseRecalibrator, the tool only outputs a recalibration table, which doesn't take very long at all. Then looking at PrintReads (run with -BQSR) you see a similar activity profile to BaseRecalibrator's, which corresponds to the on-the-fly recalibration done by the engine before the recalibrated data is written to the final pre-processed BAM file that will be fed to HaplotypeCaller.

Finally you see that HaplotypeCaller itself is the most compute-intensive tool in the GATK end of the pipeline; although this is not shown in the figure, I can tell you that much of its time is spent on graph assembly and pairHMM alignment of haplotypes. Note that here HaplotypeCaller is writing out a GVCF file; if you were to run HaplotypeCaller in regular mode (not using ERC GVCF) you would see a shorter period of I/O flatline because the variants-only VCF output amounts to a much smaller file.
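To make the distinction between computing and waiting concrete, here is a small Python sketch that measures CPU utilization of a single function call as CPU time divided by wall-clock time. It is unrelated to the profiling system used for the white paper; `cpu_utilization`, `compute_heavy` and `io_like_wait` are hypothetical helpers, and `time.sleep` merely stands in for time spent waiting on disk.

```python
import time


def cpu_utilization(func, *args):
    """Return CPU time / wall time for one call of func (single process)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def compute_heavy(n):
    """Busy loop: the processor is working the whole time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_like_wait(seconds):
    """Sleeping uses almost no CPU, much like waiting for a slow write."""
    time.sleep(seconds)


if __name__ == "__main__":
    print("compute-bound:", round(cpu_utilization(compute_heavy, 2_000_000), 2))
    print("waiting      :", round(cpu_utilization(io_like_wait, 0.2), 2))
```

A compute-bound step should report a ratio close to 1 per core in use, while a step that is mostly waiting on file I/O reports a ratio near 0, which is the "flatline" pattern described above.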

### What's next?

If the above made you want to know more, head on over to Intel's Health-IT website and get the full white paper.

As I mentioned this is only the first pass in an ongoing project. The next step is going to involve implementing the joint genotyping and filtering with VQSR for the WGS trio, as well as profiling the equivalent exome pipeline on a cohort of ~30 exomes.

You may have noticed that this first pass was done on a single (albeit multi-core) machine; this was done on purpose to provide a baseline for end-to-end execution with the simplest configuration. Our friends at Intel will be looking at multi-machine setup in a future iteration, and for our part we'll have some new developments for you on the pipelining front soon -- so stay tuned to this blog or follow @gatk_dev on Twitter!

Created 2015-12-08 21:44:17 | Updated | Tags: runtime mutect2

MuTect2 took a few days to process one pair of whole exome samples, while muTect 1.17 only needs a few hours. Is this expected? Any suggestions on how to speed things up besides parallelizing by chromosome? Thanks!

Created 2015-10-08 12:13:07 | Updated 2015-10-08 12:14:25 | Tags: variantrecalibrator runtime

Trying to decide if this is a job submission/network problem, or a job script problem!

Running VariantRecalibrator in SNP mode on 95 joint-called genomes. The ProgressMeter runs nicely until it hits 100% and then appears to keep looping through at the same position without writing any outputs. Here are the last few lines of the .out file ...

INFO 15:02:16,359 ProgressMeter - chrX:118789197 3.944378E7 3.3 h 5.0 m 99.8% 3.3 h 25.0 s
INFO 15:03:32,883 ProgressMeter - chrX:122953970 3.9478259E7 3.3 h 5.0 m 100.0% 3.3 h 4.0 s
INFO 15:03:33,513 VariantDataManager - DP: mean = 2891.94 standard deviation = 1541.35
INFO 15:03:34,380 VariantDataManager - QD: mean = 21.50 standard deviation = 5.52
INFO 15:03:35,231 VariantDataManager - MQRankSum: mean = -0.03 standard deviation = 0.52
INFO 15:03:36,173 VariantDataManager - ReadPosRankSum: mean = 0.30 standard deviation = 0.39
INFO 15:04:56,292 ProgressMeter - chrX:123869054 3.9488752E7 3.3 h 5.0 m 100.0% 3.3 h 0.0 s
INFO 15:06:14,605 ProgressMeter - chrX:123869054 3.9488752E7 3.3 h 5.1 m 100.0% 3.3 h 0.0 s
INFO 15:07:46,623 ProgressMeter - chrX:123869054 3.9488752E7 3.4 h 5.1 m 100.0% 3.4 h 0.0 s
INFO 15:09:08,227 ProgressMeter - chrX:123869054 3.9488752E7 3.4 h 5.2 m 100.0% 3.4 h 0.0 s
INFO 15:10:22,074 ProgressMeter - chrX:123869054 3.9488752E7 3.4 h 5.2 m 100.0% 3.4 h 0.0 s
INFO 15:11:34,295 ProgressMeter - chrX:123869054 3.9488752E7 3.4 h 5.2 m 100.0% 3.4 h 0.0 s
INFO 15:12:48,256 ProgressMeter - chrX:123869054 3.9488752E7 3.5 h 5.2 m 100.0% 3.5 h 0.0 s
INFO 15:14:01,197 ProgressMeter - chrX:123869054 3.9488752E7 3.5 h 5.3 m 100.0% 3.5 h 0.0 s
INFO 15:15:13,241 ProgressMeter - chrX:123869054 3.9488752E7 3.5 h 5.3 m 100.0% 3.5 h 0.0 s
INFO 15:15:13,611 VariantDataManager - Annotations are now ordered by their information content: [DP, QD, ReadPosRankSum, MQRankSum]
INFO 15:15:13,874 VariantDataManager - Training with 2384941 variants after standard deviation thresholding.
INFO 15:15:13,878 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 15:16:19,785 ProgressMeter - chrX:123869054 3.9488752E7 3.5 h 5.3 m 100.0% 3.5 h 0.0 s
INFO 15:18:46,658 ProgressMeter - chrX:123869054 3.9488752E7 3.6 h 5.4 m 100.0% 3.6 h 0.0 s
INFO 15:19:58,618 ProgressMeter - chrX:123869054 3.9488752E7 3.6 h 5.4 m 100.0% 3.6 h 0.0 s
INFO 15:21:13,231 ProgressMeter - chrX:123869054 3.9488752E7 3.6 h 5.5 m 100.0% 3.6 h 0.0 s
INFO 15:22:24,708 ProgressMeter - chrX:123869054 3.9488752E7 3.6 h 5.5 m 100.0% 3.6 h 0.0 s
INFO 15:23:40,391 ProgressMeter - chrX:123869054 3.9488752E7 3.6 h 5.5 m 100.0% 3.6 h 0.0 s
INFO 15:24:55,246 ProgressMeter - chrX:123869054 3.9488752E7 3.7 h 5.6 m 100.0% 3.7 h 0.0 s
INFO 15:26:08,867 ProgressMeter - chrX:123869054 3.9488752E7 3.7 h 5.6 m 100.0% 3.7 h 0.0 s
INFO 15:27:21,513 ProgressMeter - chrX:123869054 3.9488752E7 3.7 h 5.6 m 100.0% 3.7 h 0.0 s
INFO 15:28:42,824 ProgressMeter - chrX:123869054 3.9488752E7 3.7 h 5.6 m 100.0% 3.7 h 0.0 s
INFO 15:29:53,956 ProgressMeter - chrX:123869054 3.9488752E7 3.7 h 5.7 m 100.0% 3.7 h 0.0 s
INFO 15:32:11,877 ProgressMeter - chrX:123869054 3.9488752E7 3.8 h 5.7 m 100.0% 3.8 h 0.0 s
INFO 15:33:12,010 ProgressMeter - chrX:123869054 3.9488752E7 3.8 h 5.8 m 100.0% 3.8 h 0.0 s
INFO 15:34:19,364 ProgressMeter - chrX:123869054 3.9488752E7 3.8 h 5.8 m 100.0% 3.8 h 0.0 s
INFO 15:35:18,484 ProgressMeter - chrX:123869054 3.9488752E7 3.8 h 5.8 m 100.0% 3.8 h 0.0 s
INFO 15:36:23,992 ProgressMeter - chrX:123869054 3.9488752E7 3.8 h 5.8 m 100.0% 3.8 h 0.0 s
INFO 15:36:24,003 VariantRecalibratorEngine - Finished iteration 0.
INFO 15:37:30,932 ProgressMeter - chrX:123869054 3.9488752E7 3.9 h 5.9 m 100.0% 3.9 h 0.0 s
INFO 15:38:29,846 ProgressMeter - chrX:123869054 3.9488752E7 3.9 h 5.9 m 100.0% 3.9 h 0.0 s
INFO 15:39:30,868 ProgressMeter - chrX:123869054 3.9488752E7 3.9 h 5.9 m 100.0% 3.9 h 0.0 s
INFO 15:40:29,351 ProgressMeter - chrX:123869054 3.9488752E7 3.9 h 5.9 m 100.0% 3.9 h 0.0 s
INFO 15:41:29,067 ProgressMeter - chrX:123869054 3.9488752E7 3.9 h 6.0 m 100.0% 3.9 h 0.0 s
INFO 15:42:34,585 ProgressMeter - chrX:123869054 3.9488752E7 3.9 h 6.0 m 100.0% 3.9 h 0.0 s
INFO 15:43:33,707 ProgressMeter - chrX:123869054 3.9488752E7 4.0 h 6.0 m 100.0% 4.0 h 0.0 s
INFO 15:44:35,069 ProgressMeter - chrX:123869054 3.9488752E7 4.0 h 6.1 m 100.0% 4.0 h 0.0 s
INFO 15:45:34,865 ProgressMeter - chrX:123869054 3.9488752E7 4.0 h 6.1 m 100.0% 4.0 h 0.0 s
slurmstepd: JOB 3652350 CANCELLED AT 2015-10-07T15:45:56 DUE TO TIME LIMIT on cn0066

The job eventually gets cancelled due to wall time limits (4 hours in this case). So, the question is, is this looping at 100% completion "correct", therefore my problem is related to just giving it more computational time? Or is there some error happening and it shouldn't be cycling through like this to begin with?

Thank you!!
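One way to see what is happening between the repeated 100% lines is to filter the ProgressMeter entries out of the log and look at what the other components report. The Python sketch below does that for a short excerpt of the log above. The regular expression and the helper names `split_entries` and `non_progress` are assumptions made for this example, based only on the layout of the lines shown in this post. The filtered output shows the model-training messages that appear after the traversal reaches 100%, which suggests (but does not prove) that the job is still doing work.

```python
import re

# Matches the start of one log entry, e.g. "INFO 15:15:13,878 GaussianMixtureModel - ".
# This pattern is inferred from the log excerpt in this post, not from GATK itself.
ENTRY_START = re.compile(r"(INFO|WARN|ERROR)\s+(\d{2}:\d{2}:\d{2}),\d{3}\s+(\S+)\s+-\s+")


def split_entries(text):
    """Split run-together log text into (level, time, component, message) tuples."""
    matches = list(ENTRY_START.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        # The message runs until the next entry starts (or the text ends).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = text[match.end():end].strip()
        entries.append((match.group(1), match.group(2), match.group(3), message))
    return entries


def non_progress(entries):
    """Keep only entries that were not written by ProgressMeter."""
    return [entry for entry in entries if entry[2] != "ProgressMeter"]


if __name__ == "__main__":
    excerpt = (
        "INFO 15:15:13,241 ProgressMeter - chrX:123869054 3.9488752E7 3.5 h 5.3 m 100.0% 3.5 h 0.0 s "
        "INFO 15:15:13,878 GaussianMixtureModel - Initializing model with 100 k-means iterations... "
        "INFO 15:16:19,785 ProgressMeter - chrX:123869054 3.9488752E7 3.5 h 5.3 m 100.0% 3.5 h 0.0 s "
        "INFO 15:36:24,003 VariantRecalibratorEngine - Finished iteration 0."
    )
    for level, clock, component, message in non_progress(split_entries(excerpt)):
        print(clock, component, "-", message)
```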

Created 2014-07-16 22:58:28 | Updated | Tags: runtime genotypegvcfs

I have noticed when using the GenotypeGVCFs walker (version 3.2) that the remaining runtime estimate is very poor. The estimates from CombineGVCFs and other walkers, on the other hand, are very accurate. This is not a critical bug; it is more of a feature enhancement. I also noticed that the "completed" percentage is incorrect: it does not start at 0%, and in fact it has stayed constant after walking over 2 million base pairs of 2000 samples in 2 hours. I am not using multithreading. Not that important to me, but I thought I would let you know.

Created 2013-12-17 10:21:02 | Updated | Tags: haplotypecaller runtime checkpoint runtime-error

I am using GATK 2.7 HC with Berkeley Lab Checkpoint/Restart (BLCR). After I checkpoint and restart my job on the cluster I get the stdout and stderr shown below. I think I got this error because the restart happened after midnight. Usually I just get incorrect estimates of the total remaining runtime. Would it be possible to make GATK fully compatible with the use of BLCR? Thank you.

stdout: INFO 01:15:01,258 ProgressMeter - 1:28001706 0.00e+00 -4690921.0 s -9223372036.0 s 80.0% -5862401.0 s -1171480.0 s

stderr:

##### ERROR ------------------------------------------------------------------------------------------

Any idea what the solution is?

Created 2012-12-12 15:40:37 | Updated | Tags: unifiedgenotyper runtime logs

I've noticed a change in the log files: they now show a period of "starting" before the actual genotyping begins. This can last hours... or minutes... I looked in previous logs and saw that there is no such line, but the clock jumps by a similar amount of time. I was wondering if this is a known phenomenon and what the GATK is doing during that time. Is there anything we can do to reduce that waiting period?

Here are examples:

old (calling with 4500 samples, glm BOTH):

INFO 12:21:37,515 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 1157.62
INFO 12:26:52,780 RMDTrackBuilder - Loading Tribble index from disk for file /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.dbsnp.vcf
WARN 12:26:54,098 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has UNBOUNDED but standard is A
INFO 12:26:54,457 GenomeAnalysisEngine - Processing 43138 bp from intervals
INFO 12:26:54,473 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 12:26:54,473 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 21:37:08,951 ProgressMeter - 17:26944207 0.00e+00 9.2 h 54587.4 w 0.0% Infinity w Infinity w
INFO 21:40:40,002 ProgressMeter - 17:26945957 1.82e+02 9.2 h 301.8 w 0.9% 6.2 w 6.1 w
INFO 21:41:53,697 ProgressMeter - 17:26946820 6.49e+02 9.2 h 84.8 w 1.5% 3.6 w 3.5 w

and here is a newer one (calling with 1500 samples, glm SNP):

INFO 18:42:42,025 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 24.55
INFO 18:42:47,648 RMDTrackBuilder - Loading Tribble index from disk for file /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.dbsnp.vcf
WARN 18:42:47,972 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has UNBOUNDED but standard is A
INFO 18:42:48,192 GenomeAnalysisEngine - Processing 43138 bp from intervals
INFO 18:42:48,206 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 18:42:48,206 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 18:43:18,301 ProgressMeter - starting 0.00e+00 30.1 s 49.8 w 100.0% 30.1 s 0.0 s
INFO 18:43:48,310 ProgressMeter - starting 0.00e+00 60.1 s 99.4 w 100.0% 60.1 s 0.0 s
INFO 18:44:18,344 ProgressMeter - starting 0.00e+00 90.1 s 149.0 w 100.0% 90.1 s 0.0 s
INFO 18:44:48,353 ProgressMeter - starting 0.00e+00 2.0 m 198.7 w 100.0% 2.0 m 0.0 s
INFO 18:45:18,363 ProgressMeter - starting 0.00e+00 2.5 m 248.3 w 100.0% 2.5 m 0.0 s
INFO 18:45:48,373 ProgressMeter - starting 0.00e+00 3.0 m 297.9 w 100.0% 3.0 m 0.0 s
INFO 18:46:18,382 ProgressMeter - starting 0.00e+00 3.5 m 347.5 w 100.0% 3.5 m 0.0 s
INFO 18:46:48,391 ProgressMeter - starting 0.00e+00 4.0 m 397.1 w 100.0% 4.0 m 0.0 s
INFO 18:47:18,401 ProgressMeter - starting 0.00e+00 4.5 m 446.7 w 100.0% 4.5 m 0.0 s
INFO 18:47:48,411 ProgressMeter - starting 0.00e+00 5.0 m 496.4 w 100.0% 5.0 m 0.0 s
INFO 18:48:18,426 ProgressMeter - starting 0.00e+00 5.5 m 546.0 w 100.0% 5.5 m 0.0 s
INFO 18:48:48,436 ProgressMeter - starting 0.00e+00 6.0 m 595.6 w 100.0% 6.0 m 0.0 s
INFO 18:49:18,446 ProgressMeter - starting 0.00e+00 6.5 m 645.2 w 100.0% 6.5 m 0.0 s
INFO 18:49:48,455 ProgressMeter - starting 0.00e+00 7.0 m 694.9 w 100.0% 7.0 m 0.0 s
INFO 18:50:18,465 ProgressMeter - starting 0.00e+00 7.5 m 744.5 w 100.0% 7.5 m 0.0 s
INFO 18:50:48,475 ProgressMeter - starting 0.00e+00 8.0 m 794.1 w 100.0% 8.0 m 0.0 s
INFO 18:51:18,484 ProgressMeter - starting 0.00e+00 8.5 m 843.7 w 100.0% 8.5 m 0.0 s
INFO 18:51:48,510 ProgressMeter - starting 0.00e+00 9.0 m 893.4 w 100.0% 9.0 m 0.0 s
INFO 18:52:18,520 ProgressMeter - starting 0.00e+00 9.5 m 943.0 w 100.0% 9.5 m 0.0 s
INFO 18:52:48,530 ProgressMeter - starting 0.00e+00 10.0 m 992.6 w 100.0% 10.0 m 0.0 s
INFO 18:53:18,541 ProgressMeter - starting 0.00e+00 10.5 m 1042.2 w 100.0% 10.5 m 0.0 s
INFO 18:53:48,552 ProgressMeter - starting 0.00e+00 11.0 m 1091.8 w 100.0% 11.0 m 0.0 s
INFO 18:54:18,561 ProgressMeter - starting 0.00e+00 11.5 m 1141.5 w 100.0% 11.5 m 0.0 s
INFO 18:54:48,570 ProgressMeter - starting 0.00e+00 12.0 m 1191.1 w 100.0% 12.0 m 0.0 s
INFO 18:55:18,579 ProgressMeter - starting 0.00e+00 12.5 m 1240.7 w 100.0% 12.5 m 0.0 s
INFO 18:55:48,590 ProgressMeter - 18:12725582 1.00e+01 13.0 m 129.0 w 0.5% 45.4 h 45.2 h

I've also noticed that the output seems to be quite wrong about how much work it has left to do... perhaps these two things are related?

I would be grateful for any information.
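For what it's worth, the "total.runtime" and "remaining" columns in the logs above look roughly like a simple linear extrapolation from elapsed time and fraction completed. That is only an assumption drawn from the numbers in this post, not a statement about how ProgressMeter is implemented, and the function names in the Python sketch below are made up for illustration.

```python
def estimate_total(elapsed_seconds, fraction_done):
    """Linear extrapolation: total = elapsed / fraction completed."""
    if fraction_done <= 0:
        # No measurable progress yet, so the estimate is unbounded.
        return float("inf")
    return elapsed_seconds / fraction_done


def estimate_remaining(elapsed_seconds, fraction_done):
    """Remaining time is the extrapolated total minus what has already elapsed."""
    return estimate_total(elapsed_seconds, fraction_done) - elapsed_seconds


if __name__ == "__main__":
    # Last line of the newer log: 13.0 m elapsed, displayed as 0.5% completed.
    total_hours = estimate_total(13.0 * 60, 0.005) / 3600
    print(f"extrapolated total: {total_hours:.1f} h")
    # First progress line of the older log: 0.0% completed.
    print("at 0% completed:", estimate_total(9.2 * 3600, 0.0))
```

With the rounded 0.5% figure, this gives a total in the low 40s of hours, in the same range as the 45.4 h in the log (the displayed percentage is rounded, so an exact match is not expected). It also shows why an estimate like this is unreliable early on: at 0% it is infinite, and small changes in a tiny completed fraction swing the total by weeks.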