Tagged with #runtime
0 documentation articles | 0 announcements | 6 forum discussions


No posts found with the requested search criteria.
No posts found with the requested search criteria.
Comments (4)

I am using GATK 2.7 HC with Berkeley Lab Checkpoint/Restart (BLCR). After I checkpoint and restart my job on the cluster I get the stdout and stderr shown below. I think I got it, because the restart happened after midnight. Usually I just get incorrect estimates on total remaining runtime. Would it be possible to make GATK fully compatible with the use of BLCR? Thank you.

stdout: INFO 01:15:01,258 ProgressMeter - 1:28001706 0.00e+00 -4690921.0 s -9223372036.0 s 80.0% -5862401.0 s -1171480.0 s

stderr:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: runtime must be >= 0 but got -4690951902384272 at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.exceedsRuntimeLimit(GenomeAnalysisEngine.java:1200) at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$ActiveRegionIterator.hasNext(TraverseActiveRegions.java:329) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:268) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245) at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:273) at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78) at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313) at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152) at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.7-2-g6bda569):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: runtime must be >= 0 but got -4690951902384272
ERROR ------------------------------------------------------------------------------------------
Comments (10)

Hello there,

I ran GATK's UnifiedGenotyper of version 2.7-4 (SNP calling) with multiple data sets that where growing in size (10 sets ranging from 8.5 GB to 88GB) with 36 cpu threads and 8 GB heap assigned. I noticed that the UnifiedGenotyper performed much better than linear with a runtime increase of 4.6 times for the largest data set, although this was 10 times larger than the smallest. Now I wonder how this is achieved by GATK? What part of the processing makes it scale so well to be even better than linear here? Is it the SNP calling that is so performant, or how the data is preprocessed/filtered?

Best,

Cindy

Comments (6)

Hello there,

I have run GATK's UnifiedGenotyper tool with different thread sizes (9, 18, 36 on a 40-core machine) in order to see how well it scales for an increasing thread size. I first tried out UnifiedGenotyper of version 1.7 and set -nt to 9, 18, and 36 with 8GB heap assigned for a 88GB large BAM file, respectively. When looking at the results, I noticed that runtime performance increased when using more threads, so I updated GATK to version 2.7-4 in order to check if this behavior has vanished with the newer version. I also noticed then that there exists another thread-parameter for the number of cpu threads.

So, I ran GATK's UnifiedGenotyper version 2.7-4 for a 88GB large file on a 40-core linux machine with -nt 1, -nct 9/18/36, and assigned 8GB heap. The behavior in runtime performance remained the same. What am I missing here?

Best,

Cindy

Comments (5)

Hello,

I try to run the realigner target creator with the data divided on chromosomes. It seems to run smooth but the runtime for chromosome 1 seems to never end. I have checked my bam files and as expected the one with chr 1 is a little bit bigger than chr 2 but nothing proportional to the differences in predicted runtime. The version i use is GATK 2.3.0 and this is how i run it:

java -Xmx12g -jar programs/GenomeAnalysisTK-2.3-0/GenomeAnalysisTK.jar -l INFO -T VariantRecalibrator -R Homo_sapiens.GRCh37.57_dna_concat.fa -recalFile allchr_varrecal_BOTH_comb_ref.intervals -rscriptFile 15_allchr_varrecal_BOTH_comb_ref.intervals.plots.R -tranchesFile allchr_varrecal_BOTH_comb_ref.intervals.tranches -resource:hapmap,VCF,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf -resource:omni,VCF,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf -resource:dbsnp,VCF,known=true,training=false,truth=false,prior=8.0 dbsnp_135.b37.vcf -resource:mills,VCF,known=true,training=true,truth=true,prior=12.0 Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ --mode BOTH -nt 8 -input allchr_real_recal_resrt_raw_BOTH_comb_ref.vcf -L Agilent_SureSelect.V4.GRCh37.70_targets_nochr.bed.pad100.interval_list --pedigreeValidationType SILENT --pedigree my_fam.fam

predicted runtime looks like:

INFO 15:11:33,778 ProgressMeter - 1:33587200 3.36e+07 2.3 h 4.1 m 1.1% 8.8 d 8.7 d INFO 15:11:33,778 ProgressMeter - 11:73495886 1.82e+09 2.3 h 4.5 s 61.0% 3.7 h 87.4 m INFO 15:11:33,778 ProgressMeter - 10:6623707 1.68e+09 2.3 h 4.9 s 54.5% 4.2 h 114.3 m INFO 15:11:33,778 ProgressMeter - 11:572298 1.82e+09 2.3 h 4.5 s 58.7% 3.9 h 96.4 m INFO 15:11:33,779 ProgressMeter - 11:122078003 1.82e+09 2.3 h 4.5 s 62.6% 3.6 h 81.7 m INFO 15:11:53,780 ProgressMeter - 11:10589211 1.82e+09 2.3 h 4.5 s 59.0% 3.9 h 95.3 m INFO 15:11:53,780 ProgressMeter - 11:105031869 1.82e+09 2.3 h 4.5 s 62.1% 3.7 h 83.9 m INFO 15:11:53,781 ProgressMeter - 11:46994477 1.82e+09 2.3 h 4.5 s 60.2% 3.8 h 90.8 m INFO 15:12:03,779 ProgressMeter - 11:8677862 1.82e+09 2.3 h 4.5 s 58.9% 3.9 h 95.7 m INFO 15:12:03,779 ProgressMeter - 11:130690257 1.82e+09 2.3 h 4.5 s 62.9% 3.6 h 81.1 m INFO 15:12:13,779 ProgressMeter - 11:84816026 1.82e+09 2.3 h 4.5 s 61.4% 3.7 h 86.4 m INFO 15:12:13,779 ProgressMeter - 10:17039143 1.68e+09 2.3 h 4.9 s 54.8% 4.2 h 113.3 m INFO 15:12:23,780 ProgressMeter - 11:18556991 1.82e+09 2.3 h 4.5 s 59.3% 3.9 h 94.6 m INFO 15:12:23,780 ProgressMeter - 11:113365440 1.82e+09 2.3 h 4.5 s 62.3% 3.7 h 83.2 m INFO 15:12:23,782 ProgressMeter - 11:55284794 1.82e+09 2.3 h 4.5 s 60.4% 3.8 h 90.1 m INFO 15:12:33,780 ProgressMeter - 1:33783808 3.38e+07 2.3 h 4.1 m 1.1% 8.8 d 8.7 d INFO 15:12:33,780 ProgressMeter - 12:3311608 1.95e+09 2.3 h 4.2 s 63.1% 3.6 h 80.5 m INFO 15:12:43,779 ProgressMeter - 11:93292307 1.82e+09 2.3 h 4.6 s 61.7% 3.7 h 85.8 m INFO 15:12:43,780 ProgressMeter - 10:24841443 1.68e+09 2.3 h 4.9 s 55.1% 4.2 h 112.5 m INFO 15:12:43,780 ProgressMeter - 11:19408689 1.82e+09 2.3 h 4.6 s 59.3% 3.9 h 94.8 m INFO 15:12:53,965 ProgressMeter - 11:26512308 1.82e+09 2.3 h 4.6 s 59.5% 3.9 h 94.0 m INFO 15:12:53,965 ProgressMeter - 11:121738354 1.82e+09 2.3 h 4.6 s 62.6% 3.7 h 82.6 m INFO 15:12:53,965 ProgressMeter - 11:63468191 1.82e+09 2.3 h 4.6 s 60.7% 3.8 h 89.4 m INFO 15:13:03,781 ProgressMeter - 12:11960820 1.95e+09 2.3 h 4.3 s 63.4% 3.6 h 79.8 m INFO 15:13:13,780 ProgressMeter - 11:101551879 1.82e+09 2.3 h 4.6 s 61.9% 3.7 h 85.1 m INFO 15:13:13,780 ProgressMeter - 10:32238974 1.68e+09 2.3 h 4.9 s 55.3% 4.2 h 111.8 m INFO 15:13:13,780 ProgressMeter - 11:27204534 1.82e+09 2.3 h 4.6 s 59.5% 3.9 h 94.1 m INFO 15:13:23,966 ProgressMeter - 11:71529440 1.82e+09 2.3 h 4.6 s 61.0% 3.8 h 88.8 m INFO 15:13:23,968 ProgressMeter - 11:34170083 1.82e+09 2.3 h 4.6 s 59.8% 3.9 h 93.4 m INFO 15:13:23,977 ProgressMeter - 11:129889715 1.82e+09 2.3 h 4.6 s 62.9% 3.7 h 81.9 m INFO 15:13:33,781 ProgressMeter - 1:33980416 3.40e+07 2.3 h 4.1 m 1.1% 8.8 d 8.7 d

Any suggestions?

Regards,

Måns

Comments (1)

INFO 17:40:38,765 ProgressMeter - chrX:147886444 3.21e+07 5.5 h 10.3 m 97.9% 5.6 h 7.1 m INFO 17:41:08,775 ProgressMeter - chrX:151653849 3.21e+07 5.5 h 10.3 m 98.0% 5.6 h 6.7 m INFO 17:41:38,785 ProgressMeter - chrX:153812787 3.22e+07 5.5 h 10.3 m 98.1% 5.6 h 6.5 m

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.NoClassDefFoundError: net/sf/samtools/util/CloserUtil at net.sf.picard.util.PeekableIterator.close(PeekableIterator.java:46) at net.sf.picard.sam.MergingSamRecordIterator.addIfNotEmpty(MergingSamRecordIterator.java:169) at net.sf.picard.sam.MergingSamRecordIterator.next(MergingSamRecordIterator.java:125) at net.sf.picard.sam.MergingSamRecordIterator.next(MergingSamRecordIterator.java:39) at org.broadinstitute.sting.gatk.iterators.PrivateStringSAMCloseableIterator.next(StingSAMIteratorAdapter.java:100) at org.broadinstitute.sting.gatk.iterators.PrivateStringSAMCloseableIterator.next(StingSAMIteratorAdapter.java:84) at org.broadinstitute.sting.gatk.datasources.reads.SAMDataSource$ReleasingIterator.next(SAMDataSource.java:1091) at org.broadinstitute.sting.gatk.datasources.reads.SAMDataSource$ReleasingIterator.next(SAMDataSource.java:1057) at org.broadinstitute.sting.gatk.filters.CountingFilteringIterator.getNextRecord(CountingFilteringIterator.java:105) at org.broadinstitute.sting.gatk.filters.CountingFilteringIterator.next(CountingFilteringIterator.java:81) at org.broadinstitute.sting.gatk.filters.CountingFilteringIterator.next(CountingFilteringIterator.java:41) at org.broadinstitute.sting.gatk.iterators.PrivateStringSAMCloseableIterator.next(StingSAMIteratorAdapter.java:100) at org.broadinstitute.sting.gatk.iterators.PrivateStringSAMCloseableIterator.next(StingSAMIteratorAdapter.java:84) at org.broadinstitute.sting.gatk.iterators.VerifyingSamIterator.next(VerifyingSamIterator.java:24) at org.broadinstitute.sting.gatk.iterators.VerifyingSamIterator.next(VerifyingSamIterator.java:12) at org.broadinstitute.sting.utils.baq.ReadTransformingIterator.next(ReadTransformingIterator.java:36) at org.broadinstitute.sting.utils.baq.ReadTransformingIterator.next(ReadTransformingIterator.java:15) at net.sf.picard.util.PeekableIterator.advance(PeekableIterator.java:71) at net.sf.picard.util.PeekableIterator.next(PeekableIterator.java:57) at org.broadinstitute.sting.gatk.datasources.reads.ReadShard.fill(ReadShard.java:135) at org.broadinstitute.sting.gatk.datasources.reads.ReadShardBalancer$1.advance(ReadShardBalancer.java:153) at org.broadinstitute.sting.gatk.datasources.reads.ReadShardBalancer$1.next(ReadShardBalancer.java:116) at org.broadinstitute.sting.gatk.datasources.reads.ReadShardBalancer$1.next(ReadShardBalancer.java:75) at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:65) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281) at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147) at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91) Caused by: java.lang.ClassNotFoundException: net.sf.samtools.util.CloserUtil at java.net.URLClassLoader$1.run(URLClassLoader.java:199) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) ... 29 more Caused by: java.util.zip.ZipException: error reading zip file at java.util.zip.ZipFile.read(Native Method) at java.util.zip.ZipFile.access$1200(ZipFile.java:29) at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:447) at java.util.zip.ZipFile$1.fill(ZipFile.java:230) at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141) at java.util.jar.Manifest$FastInputStream.fill(Manifest.java:422) at java.util.jar.Manifest$FastInputStream.readLine(Manifest.java:358) at java.util.jar.Manifest$FastInputStream.readLine(Manifest.java:390) at java.util.jar.Attributes.read(Attributes.java:359) at java.util.jar.Manifest.read(Manifest.java:182) at java.util.jar.Manifest.(Manifest.java:52) at java.util.jar.JarFile.getManifestFromReference(JarFile.java:167) at java.util.jar.JarFile.getManifest(JarFile.java:148) at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:696) at java.net.URLClassLoader.defineClass(URLClassLoader.java:228) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) ... 34 more

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: net/sf/samtools/util/CloserUtil
ERROR ------------------------------------------------------------------------------------------

Any Idea what is the solution??

Comments (1)

I've noticed a change in the log files that now shows a period of "starting" before actually genotyping. This can be hours...or minutes... I looked in previous logs and saw that there aren't such line in the log, but the clock jumps a similar amount of time. I was wondering if this is a known phenomenon and what it the GATK doing during that time. Is there anything we can do to reduce that waiting period?

Here are examples:

old (calling with 4500 samples, glm BOTH):

INFO 12:21:37,515 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 1157.62 INFO 12:26:52,780 RMDTrackBuilder - Loading Tribble index from disk for file /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.dbsnp.vcf WARN 12:26:54,098 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has UNBOUNDED but standard is A INFO 12:26:54,457 GenomeAnalysisEngine - Processing 43138 bp from intervals INFO 12:26:54,473 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] INFO 12:26:54,473 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining INFO 21:37:08,951 ProgressMeter - 17:26944207 0.00e+00 9.2 h 54587.4 w 0.0% Infinity w Infinity w INFO 21:40:40,002 ProgressMeter - 17:26945957 1.82e+02 9.2 h 301.8 w 0.9% 6.2 w 6.1 w INFO 21:41:53,697 ProgressMeter - 17:26946820 6.49e+02 9.2 h 84.8 w 1.5% 3.6 w 3.5 w

and here newer (calling 1500 sample, glm SNP):

INFO 18:42:42,025 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 24.55 INFO 18:42:47,648 RMDTrackBuilder - Loading Tribble index from disk for file /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.dbsnp.vcf WARN 18:42:47,972 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has UNBOUNDED but standard is A INFO 18:42:48,192 GenomeAnalysisEngine - Processing 43138 bp from intervals INFO 18:42:48,206 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] INFO 18:42:48,206 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining INFO 18:43:18,301 ProgressMeter - starting 0.00e+00 30.1 s 49.8 w 100.0% 30.1 s 0.0 s INFO 18:43:48,310 ProgressMeter - starting 0.00e+00 60.1 s 99.4 w 100.0% 60.1 s 0.0 s INFO 18:44:18,344 ProgressMeter - starting 0.00e+00 90.1 s 149.0 w 100.0% 90.1 s 0.0 s INFO 18:44:48,353 ProgressMeter - starting 0.00e+00 2.0 m 198.7 w 100.0% 2.0 m 0.0 s INFO 18:45:18,363 ProgressMeter - starting 0.00e+00 2.5 m 248.3 w 100.0% 2.5 m 0.0 s INFO 18:45:48,373 ProgressMeter - starting 0.00e+00 3.0 m 297.9 w 100.0% 3.0 m 0.0 s INFO 18:46:18,382 ProgressMeter - starting 0.00e+00 3.5 m 347.5 w 100.0% 3.5 m 0.0 s INFO 18:46:48,391 ProgressMeter - starting 0.00e+00 4.0 m 397.1 w 100.0% 4.0 m 0.0 s INFO 18:47:18,401 ProgressMeter - starting 0.00e+00 4.5 m 446.7 w 100.0% 4.5 m 0.0 s INFO 18:47:48,411 ProgressMeter - starting 0.00e+00 5.0 m 496.4 w 100.0% 5.0 m 0.0 s INFO 18:48:18,426 ProgressMeter - starting 0.00e+00 5.5 m 546.0 w 100.0% 5.5 m 0.0 s INFO 18:48:48,436 ProgressMeter - starting 0.00e+00 6.0 m 595.6 w 100.0% 6.0 m 0.0 s INFO 18:49:18,446 ProgressMeter - starting 0.00e+00 6.5 m 645.2 w 100.0% 6.5 m 0.0 s INFO 18:49:48,455 ProgressMeter - starting 0.00e+00 7.0 m 694.9 w 100.0% 7.0 m 0.0 s INFO 18:50:18,465 ProgressMeter - starting 0.00e+00 7.5 m 744.5 w 100.0% 7.5 m 0.0 s INFO 18:50:48,475 ProgressMeter - starting 0.00e+00 8.0 m 794.1 w 100.0% 8.0 m 0.0 s INFO 18:51:18,484 ProgressMeter - starting 0.00e+00 8.5 m 843.7 w 100.0% 8.5 m 0.0 s INFO 18:51:48,510 ProgressMeter - starting 0.00e+00 9.0 m 893.4 w 100.0% 9.0 m 0.0 s INFO 18:52:18,520 ProgressMeter - starting 0.00e+00 9.5 m 943.0 w 100.0% 9.5 m 0.0 s INFO 18:52:48,530 ProgressMeter - starting 0.00e+00 10.0 m 992.6 w 100.0% 10.0 m 0.0 s INFO 18:53:18,541 ProgressMeter - starting 0.00e+00 10.5 m 1042.2 w 100.0% 10.5 m 0.0 s INFO 18:53:48,552 ProgressMeter - starting 0.00e+00 11.0 m 1091.8 w 100.0% 11.0 m 0.0 s INFO 18:54:18,561 ProgressMeter - starting 0.00e+00 11.5 m 1141.5 w 100.0% 11.5 m 0.0 s INFO 18:54:48,570 ProgressMeter - starting 0.00e+00 12.0 m 1191.1 w 100.0% 12.0 m 0.0 s INFO 18:55:18,579 ProgressMeter - starting 0.00e+00 12.5 m 1240.7 w 100.0% 12.5 m 0.0 s INFO 18:55:48,590 ProgressMeter - 18:12725582 1.00e+01 13.0 m 129.0 w 0.5% 45.4 h 45.2 h

I've also noticed that output seems to be quite wrong about how much work it has left to do...perhaps these two things are related?

I would be grateful for any information.