Test that the GATK is correctly installed, and that the supporting tools like Java are in your path.


  • Basic familiarity with the command-line environment
  • Understanding of what a PATH variable is
  • GATK downloaded and placed on your PATH


  1. Invoke the GATK usage/help message
  2. Troubleshooting

1. Invoke the GATK usage/help message

The command we're going to run is a very simple command that asks the GATK to print out a list of available command-line arguments and options. It is so simple that it will ALWAYS work if your GATK package is installed correctly.

Note that this command is also helpful when you're trying to remember something like the right spelling or short name for an argument and for whatever reason you don't have access to the web-based documentation.


Type the following command:

java -jar <path to GenomeAnalysisTK.jar> --help

replacing the <path to GenomeAnalysisTK.jar> bit with the path you have set up in your command-line environment.

Expected Result

You should see usage output similar to the following:

usage: java -jar GenomeAnalysisTK.jar -T <analysis_type> [-I <input_file>] [-L 
        <intervals>] [-R <reference_sequence>] [-B <rodBind>] [-D <DBSNP>] [-H 
        <hapmap>] [-hc <hapmap_chip>] [-o <out>] [-e <err>] [-oe <outerr>] [-A] [-M 
        <maximum_reads>] [-sort <sort_on_the_fly>] [-compress <bam_compression>] [-fmq0] [-dfrac 
        <downsample_to_fraction>] [-dcov <downsample_to_coverage>] [-S 
        <validation_strictness>] [-U] [-P] [-dt] [-tblw] [-nt <numthreads>] [-l 
        <logging_level>] [-log <log_to_file>] [-quiet] [-debug] [-h]
-T,--analysis_type <analysis_type>                     Type of analysis to run
-I,--input_file <input_file>                           SAM or BAM file(s)
-L,--intervals <intervals>                             A list of genomic intervals over which 
                                                       to operate. Can be explicitly specified 
                                                       on the command line or in a file.
-R,--reference_sequence <reference_sequence>           Reference sequence file
-B,--rodBind <rodBind>                                 Bindings for reference-ordered data, in 
                                                       the form <name>,<type>,<file>
-D,--DBSNP <DBSNP>                                     DBSNP file
-H,--hapmap <hapmap>                                   Hapmap file
-hc,--hapmap_chip <hapmap_chip>                        Hapmap chip file
-o,--out <out>                                         An output file presented to the walker. 
                                                       Will overwrite contents if file exists.
-e,--err <err>                                         An error output file presented to the 
                                                       walker. Will overwrite contents if file
                                                       exists.
-oe,--outerr <outerr>                                  A joint file for 'normal' and error 
                                                       output presented to the walker. Will 
                                                       overwrite contents if file exists.


If you see this message, your GATK installation is OK. You're good to go! If you get an error message instead, proceed to the next section on troubleshooting.
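The check in this step can be wrapped in a small shell function that tests the two usual failure points before running the help command. This is only a sketch: the function name `check_gatk_install` and the return codes are illustrative choices, not part of the GATK distribution, and the jar path in the usage line is a placeholder.

```shell
#!/bin/sh
# Sketch only: check_gatk_install and its return codes are illustrative,
# not something shipped with GATK.

check_gatk_install() {
    jar="$1"

    # The jar path must point at a readable file.
    if [ ! -r "$jar" ]; then
        echo "Cannot read jar: $jar" >&2
        return 2
    fi

    # Java must be findable through PATH.
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH" >&2
        return 3
    fi

    # Same command as in the tutorial step above.
    java -jar "$jar" --help
}

# Usage (replace the placeholder with your own path):
# check_gatk_install /path/to/GenomeAnalysisTK.jar
```

A return code of 2 or 3 points to a path problem rather than a GATK problem, which is what the troubleshooting section covers.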

2. Troubleshooting

Let's try to figure out what's not working.


First, make sure that your Java version is at least 1.7, by typing the following command:

java -version

Expected Result

You should see something similar to the following text:

java version "1.7.0_12"
Java(TM) SE Runtime Environment (build 1.7.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)  

Remedial actions

If the version is less than 1.7, install the newest version of Java onto the system. If you instead see something like

java: Command not found  

make sure that java is installed on your machine, and that your PATH variable contains the path to the java executables.
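The version check can also be scripted. The sketch below assumes version strings of the form shown above (for example "1.7.0_12") as well as the newer single-number scheme (for example "11.0.2"). The function names are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch only: both function names are illustrative helpers.

# Print the version string reported by the java found on PATH.
# Note that java -version writes to standard error, hence the 2>&1.
installed_java_version() {
    java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Return 0 when a version string such as "1.7.0_12" is at least 1.7.
java_version_at_least_1_7() {
    version="$1"
    major=$(printf '%s' "$version" | cut -d. -f1)
    minor=$(printf '%s' "$version" | cut -d. -f2)

    # Drop any non-numeric suffix (for example "9-ea" becomes "9").
    major=${major%%[!0-9]*}
    minor=${minor%%[!0-9]*}
    minor=${minor:-0}

    # An unparseable version counts as a failure.
    [ -n "$major" ] || return 1

    # Versions numbered 2 and up (9, 11, 17, ...) are newer than 1.7.
    if [ "$major" -gt 1 ]; then
        return 0
    fi
    [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]
}

# Usage:
# if java_version_at_least_1_7 "$(installed_java_version)"; then
#     echo "Java version is new enough"
# fi
```

This only tests the lower bound from this tutorial; it does not check whether a much newer Java is compatible with a given GATK release.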

First of all, yes, I am aware of this post ( http://gatkforums.broadinstitute.org/discussion/comment/20469/#Comment_20469 ) and its recommendation of using -XX:ParallelGCThreads.

I have been testing my pipeline to run HaplotypeCaller and, through testing, I have found java/GATK opening too many (unnecessary?) threads. This leads to increased CPU metrics on our cluster and a not-so-happy sysadmin.

An example of my command is as follows:

java -Xmx20G -XX:ParallelGCThreads=${gct} -Djava.io.tmpdir=${TMPDIR} -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R ${ref} \
    -I ${input} \
    -L ${reg} \
    -ERC GVCF \
    -nct ${nct} \
    --genotyping_mode DISCOVERY \
    -stand_emit_conf 10 \
    -stand_call_conf 30 \
    -o ${name}.raw_variants.annotated.g.vcf \
    -A QualByDepth -A RMSMappingQuality -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A StrandOddsRatio -A Coverage -A InbreedingCoeff

For example, when running this in the cluster with nct=1 and ParallelGCThreads=4, top shows the following:

$ top -c -u theredia -H -b -n1
top - 20:25:40 up 15 days, 7:00, 1 user, load average: 5.95, 5.41, 5.15
Tasks: 520 total, 7 running, 513 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.3%us, 0.1%sy, 0.1%ni, 88.4%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 99192688k total, 10184220k used, 89008468k free, 886776k buffers
Swap: 10239992k total, 0k used, 10239992k free, 800664k cached

29376 theredia 20 0 21.9g 657m 12m R 97.0 0.7 2:13.99 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29377 theredia 20 0 21.9g 657m 12m S 1.9 0.7 0:00.79 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29378 theredia 20 0 21.9g 657m 12m S 1.9 0.7 0:00.79 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29379 theredia 20 0 21.9g 657m 12m S 1.9 0.7 0:00.79 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
30210 theredia 20 0 17424 1504 868 R 1.9 0.0 0:00.01 top -c -u theredia -H -b -n1
29375 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:00.00 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29380 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:00.77 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29381 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:00.35 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29382 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:00.03 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29383 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:00.05 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29384 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:00.00 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29385 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:12.40 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29386 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:10.34 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29387 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:00.00 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29388 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:00.06 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29429 theredia 20 0 21.9g 657m 12m S 0.0 0.7 0:00.00 java -Xmx20G -XX:ParallelGCThreads=4 -Djava.io.tmpdir=/tmp/16535.1.lab_dcexs.q -jar /aplic/noarch/GATK/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller
29747 theredia 20 0 112m 1968 888 S 0.0 0.0 0:00.02 sshd: theredia@pts/0
29748 theredia 20 0 114m 6060 1460 S 0.0 0.0 0:00.05 -bash

All in all, a count of the number of threads tested is:

12-core node:
  nct=1 GC=NO : 21
  nct=1 GC=4  : 15
  nct=1 GC=8  : 19
  nct=1 GC=12 : 23
  nct=4 GC=NO : 26
  nct=4 GC=4  : 20

20-core node:
  nct=1 GC=NO : 26
  nct=1 GC=4  : 15
  nct=1 GC=8  : 19
  nct=1 GC=12 : 23
  nct=1 GC=15 : 26
  nct=1 GC=20 : 31
  nct=4 GC=NO : 31
  nct=4 GC=4  : 20
  nct=4 GC=8  : 24
  nct=8 GC=4  : 24
  nct=8 GC=8  : 28
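For anyone wanting to reproduce this kind of count without reading through top output, here is a minimal sketch. It assumes a Linux system where each thread of a process appears as an entry under /proc/<pid>/task. The helper name `count_threads` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: count_threads is an illustrative helper and assumes Linux,
# where every thread of a process has an entry under /proc/<pid>/task.

count_threads() {
    pid="$1"
    if [ ! -d "/proc/$pid/task" ]; then
        echo "No such process (or /proc not available): $pid" >&2
        return 1
    fi
    # One directory entry per thread; tr strips any padding from wc.
    ls "/proc/$pid/task" | wc -l | tr -d ' '
}

# Usage, with the PID of the main java process:
# count_threads 29376
```

The number includes the main thread, so a process that spawned no extra threads reports 1.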

In order to run the calling pipeline for chromosome 1 of a single 40x sample, I have to request -Xmx20G and ask Sun Grid Engine to reserve 4 threads for me, even when requesting -nct 1 and ParallelGCThreads=4. Any request lower than this ends with the job stalling at some point, with the progress reporter stuck at the same position for 3 days before I have to kill the job.

Our cluster has core binding enabled. Thus, if I request only 1 thread/core, but GATK opens 15/20 threads, all these 19 extra threads have to fight the main thread for CPU time and the main process slows down a lot.

Does GATK really need this amount of resources? Is there a way I can further reduce the number of threads java/GATK is spawning? Our sysadmins are not happy because this raises the CPU load levels of the involved nodes and triggers several alarms needlessly.

Thanks in advance,


Dear GATK,

I used the HaplotypeCaller with "-dcov 500 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000" to produce 60 gVCF files, and that worked fine. However, GenotypeGVCFs gets stuck on a position and runs out of memory after about 24 hours, even when I allocate 240 GB. Testing a short region of 60 kb does not help. Here was my command line:

software/jre1.7.0_25/bin/java -Xmx240g -jar GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T GenotypeGVCFs -R Reference.fasta -L chrom14:2240000-2300000 --variant 60samples_gvcf.list -o output.vcf

If I split my list of 60 gVCF files into two lists of 30 samples each, GenotypeGVCFs works fine for both batches within 15 minutes (~10 GB of memory).
I tested with 47 samples; it took 8 hours (31 GB of memory) for a 60 kb region. Once I use more than ~55 samples, it takes forever and crashes.

Any help will be much appreciated! Thanks,


After running 5 exomes with GATK v3.3 and HaplotypeCaller, I found a very low Ti/Tv ratio in my samples (~2.1), as the VariantEval report indicated. I tried running variant filtration on these samples, but I saw neither an improvement in the Ti/Tv ratio nor any filtering done. I therefore filtered them with bcftools, after which the Ti/Tv ratio improved to 2.5. Then, when I tried running GenotypeGVCFs on the samples filtered with bcftools, I encountered the following error:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double
        at java.lang.Double.compareTo(Double.java:49)
        at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:290)
        at java.util.ComparableTimSort.sort(ComparableTimSort.java:157)
        at java.util.ComparableTimSort.sort(ComparableTimSort.java:146)
        at java.util.Arrays.sort(Arrays.java:472)
        at java.util.Collections.sort(Collections.java:155)
        at org.broadinstitute.gatk.utils.MathUtils.median(MathUtils.java:999)
        at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.combineAnnotationValues(ReferenceConfidenceVariantContextMerger.java:73)
        at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:158)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:202)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:121)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:310)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version nightly-2014-11-17-g58cfab1):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: java.lang.Integer cannot be cast to java.lang.Double
ERROR ------------------------------------------------------------------------------------------

Any advice on solving this issue will be much appreciated.


Does the GATK read the reference, such as hg19, from disk each time it runs a tool? Since a variant detection pipeline runs many GATK tools, such as RealignerTargetCreator, IndelRealigner and BaseRecalibrator, reading hg19 from disk every time would be time-consuming.

I'm running the HaplotypeCaller on a series of samples using a while loop in a bash script and for some samples the HaplotypeCaller is stopping part way through the file. My command was:

java -Xmx18g -jar $Gpath/GenomeAnalysisTK.jar \
    -nct 8 \
    -l INFO \
    -R $ref \
    -log $log/$plate.$prefix.HaplotypeCaller.log \
    -T HaplotypeCaller \
    -I $bam/$prefix.realign.bam \
    --emitRefConfidence GVCF \
    -variant_index_type LINEAR \
    -variant_index_parameter 128000 \
    -o $gvcf/$prefix.GATK.gvcf.vcf

Most of the samples completed and the output looks good, but for some I only have a truncated gvcf file with no index. When I look at the log it looks like this:

INFO  17:25:15,289 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:25:15,291 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:21
INFO  17:25:15,291 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  17:25:15,291 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  17:25:15,294 HelpFormatter - Program Args: -nct 8 -l INFO -R /home/owens/ref/Gasterosteus_aculeatus.BROADS1.73.dna.toplevel.fa -log /home/owens/SB/C31KCACXX05.log/C31KCACXX05.sb1Pax102L-S2013.Hap
INFO  17:25:15,296 HelpFormatter - Executing as owens@GObox on Linux 3.2.0-63-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_17-b02.
INFO  17:25:15,296 HelpFormatter - Date/Time: 2014/06/10 17:25:15
INFO  17:25:15,296 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:25:15,296 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:25:15,722 GenomeAnalysisEngine - Strictness is SILENT
INFO  17:25:15,892 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
INFO  17:25:15,898 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  17:25:15,942 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
INFO  17:25:15,948 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO  17:25:15,993 MicroScheduler - Running the GATK in parallel mode with 8 total threads, 8 CPU thread(s) for each of 1 data thread(s), of 12 processors available on this machine  
INFO  17:25:16,097 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO  17:25:16,114 GenomeAnalysisEngine - Done preparing for traversal
INFO  17:25:16,114 ProgressMeter -        Location processed.active regions  runtime per.1M.active regions completed total.runtime remaining
INFO  17:25:16,114 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
INFO  17:25:16,116 HaplotypeCaller - All sites annotated with PLs force to true for reference-model confidence output
INFO  17:25:16,278 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO  17:25:46,116 ProgressMeter - scaffold_1722:1180        1.49e+05   30.0 s        3.3 m      0.0%        25.6 h    25.6 h
INFO  17:26:46,117 ProgressMeter - scaffold_279:39930        1.37e+07   90.0 s        6.0 s      3.0%        50.5 m    49.0 m
INFO  17:27:16,118 ProgressMeter - scaffold_139:222911        2.89e+07  120.0 s        4.0 s      6.3%        31.7 m    29.7 m
INFO  17:27:46,119 ProgressMeter - scaffold_94:517387        3.89e+07    2.5 m        3.0 s      8.5%        29.2 m    26.7 m
INFO  17:28:16,121 ProgressMeter - scaffold_80:591236        4.06e+07    3.0 m        4.0 s      8.9%        33.6 m    30.6 m
INFO  17:28:46,123 ProgressMeter - groupXXI:447665        6.07e+07    3.5 m        3.0 s     13.3%        26.4 m    22.9 m
INFO  17:29:16,395 ProgressMeter -  groupV:8824013        7.25e+07    4.0 m        3.0 s     17.6%        22.7 m    18.7 m
INFO  17:29:46,396 ProgressMeter - groupXIV:11551262        9.93e+07    4.5 m        2.0 s     24.0%        18.7 m    14.2 m
WARN  17:29:52,732 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at groupX:1516679 has 8 alternate alleles so only the top alleles
INFO  17:30:19,324 ProgressMeter - groupX:14278234        1.15e+08    5.1 m        2.0 s     27.9%        18.1 m    13.0 m
INFO  17:30:49,414 ProgressMeter - groupXVIII:5967453        1.46e+08    5.6 m        2.0 s     33.0%        16.8 m    11.3 m
INFO  17:31:19,821 ProgressMeter - groupXI:15030145        1.63e+08    6.1 m        2.0 s     38.5%        15.7 m     9.7 m
INFO  17:31:50,192 ProgressMeter - groupVI:5779653        1.96e+08    6.6 m        2.0 s     43.8%        15.0 m     8.4 m
INFO  17:32:20,334 ProgressMeter - groupXVI:18115788        2.13e+08    7.1 m        1.0 s     50.1%        14.1 m     7.0 m
INFO  17:32:50,335 ProgressMeter - groupVIII:4300439        2.50e+08    7.6 m        1.0 s     55.1%        13.7 m     6.2 m
INFO  17:33:30,336 ProgressMeter - groupXIII:2378126        2.89e+08    8.2 m        1.0 s     63.1%        13.0 m     4.8 m
INFO  17:34:02,099 GATKRunReport - Uploaded run statistics report to AWS S3

It seems like it got halfway through and stopped. I think it's a memory issue because when I increased the RAM available to Java, the problem happened less often, although I can't figure out why some samples work and others don't (there isn't anything else running on the machine using RAM, and the biggest BAM files aren't failing). It's also strange to me that there doesn't seem to be an error message. Any insight into why this is happening and how to avoid it would be appreciated.

I was wondering if there is a nice way to apply multiple processing steps to each variant (or a group of variants) as they are read so that the variant file is not read again and again. My understanding is that even if I use Queue, each script would read the vcf again. Is that correct?

I got this error and wonder what could be the reason for it:

ERROR stack trace

java.lang.NullPointerException
        at org.broadinstitute.sting.utils.recalibration.covariates.RepeatCovariate.keyForRepeat(RepeatCovariate.java:225)
        at org.broadinstitute.sting.utils.recalibration.covariates.RepeatCovariate.recordValues(RepeatCovariate.java:100)
        at org.broadinstitute.sting.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:921)
        at org.broadinstitute.sting.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:901)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:263)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:132)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:228)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:216)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version nightly-2014-01-31-gc6765ad):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: Code exception (see stack trace for error itself)

Hi (once more), I am attempting to run Queue with a Scala script and schedule it with jobRunner. The script works nicely, but when I run it with jobRunner I get the error:

"Exception in thread "main" java.lang.UnsatisfiedLinkError: Unable to load library 'drmaa':libdrmaa.so: cannot open shared object file: No such file or directory."

When I try to pass the location of the libdrmaa.so file (-Djava.library.path=/opt/sge625/sge/lib/lx24-amd64/) the result is the same.

How would I point jobRunner to the correct path for the libdrmaa.so library?

Which JRE version is needed for GATK 2.8? Which version of GATK will work with JRE 1.6.0_45?

Thank you, Maya

Hi all, we are running into some random weirdness when running jobs using SGE with GATK version 2.7-2-g6bda569. With pretty much all GATK tools, but mostly IndelRealigner and UnifiedGenotyper, we often get the following error:

ERROR MESSAGE: Couldn't read file /scratch/project/pipelines/novorecal.bam because java.io.FileNotFoundException: /scratch/project/pipelines/novorecal.bam (No such file or directory)

This also happens for supplied reference genomes and VCF files. The GATK tool can't find them.

These "missing" files do exist, and have often even been created by the previous tool/step in the pipeline.

When we re-run the pipeline on a failed sample, it works. So we end up having to re-run our pipeline on the same set of samples multiple times, and we are beginning to find this very frustrating. These errors seem to be random; I can't find any pattern, and as I mentioned, when we re-run the pipeline on a failed run, it works without a hitch.

Has anyone experienced this? And if so, any recommendations?

Please help


I am working on a Queue script that uses the SelectVariants walker. Two of the arguments that I am trying to use take an enumerated type: restrictAllelesTo and selectTypeToInclude. I have tried passing these as strings; however, I get Java type mismatch errors. What is the simplest way to pass these parameters to the SelectVariants walker in the QScript?

Hi team,
I have Java 1.6 installed on my system. I know that GATK now works with Java 1.7, but I work on a shared system and cannot change the default Java version, so I downloaded Java 1.7 and specify the Java binary explicitly in the call:

If I run:
/jre1.7.0/bin/java -Djava.io.tmpdir=tmp -jar Queue.jar --help
works fine, but if I try:
/jre1.7.0/bin/java -Djava.io.tmpdir=tmp -jar Queue.jar -S myScala.file ....
I get the following error:

DEBUG 13:42:50,953 FunctionEdge - Starting: /Qscripts > 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/Qscripts/tmp' '-cp' 'Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'HaplotypeCaller' '-I' '/resources/exampleBAM.bam' '-R' '/resources/exampleFASTA.fasta' '-nct' '1' '-o' '/raw1.vcf' '-hets' '0.005'
INFO 13:42:50,954 FunctionEdge - Output written to /raw1.vcf.out
DEBUG 13:42:50,954 IOUtils - Deleted /raw1.vcf.out
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/broadinstitute/sting/gatk/CommandLineGATK : Unsupported major.minor version 51.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.broadinstitute.sting.gatk.CommandLineGATK. Program will exit.

It seems as if HaplotypeCaller calls java again and uses the system default java, which is Java 1.6.

Can I change the "java" version parameter?

Thanks in advance,

I'm getting repeated Java-related errors when using GATK in a shell script on my university cluster. The message is 'NoClassDefFoundError'. It has worked previously using GATK version 2.1.9 (on a different node). My memory allocation for GATK on the cluster is set to 16GB. As I'm using GATK as part of a shell script which invokes java, the 'java', '-2xmg' and '.jar' parts of the command are not required (at least they weren't previously). Attempting to invoke GATK from the command line on the cluster returns:

-bash: /cm/shared/apps/GenomeAnalysisTK/2.1.9/GenomeAnalysisTK.jar: Permission denied

The first question is 'am I coding it correctly?', the second is 'could GATK not be happy with java?' and the third is 'do you have any tips to resolve this?'

I have read several of the answers relating to java-GATK problems but am none the wiser. Any help would be most appreciated.



The shell script commands are:

#$ -N ReadMapper_H001
#$ -S /bin/sh 
# Reference file IO (.e and .o files) to the current working directory
#$ -cwd
# Merge the standard out and standard error to one file
#$ -j y
# Use the bioinf.q
#$ -q bioinf.q 
. /etc/profile.d/modules.sh 

# Load modules 
module load bio/1.15 gatk/2.4.9 jdk/1.6.0_24 picard-tools/1.77 R/2.14.0

# All the other stuff (scythe, sickle, bwa, picard etc for cleaning, read mapping etc)

# Identify indels from reference genome
GenomeAnalysisTK \
    -I /mnt/lustre/scratch/bioenv/wg39/tmp/lhm_read_mapping/H001/H001_CRSN.bam \
    -R /home/w/wg/wg39/accessory_data_130613/Dmel_ref_seqs_051112/dmel-all-chromosome-r5.9.fasta \
    -T RealignerTargetCreator \
    -o /mnt/lustre/scratch/bioenv/wg39/tmp/lhm_read_mapping/H001/H001_realignment.intervals \

The error message is:

Exception in thread "main" java.lang.NoClassDefFoundError: java
Caused by: java.lang.ClassNotFoundException: java
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class: java.  Program will exit.

While I have been able to work around this issue, I am still curious as to what is causing it in the first place.

I am currently engaged in variant calling on a rather large set of exome sequenced samples (prepared according to GATK best practices 4), but I have run into some issues with UnifiedGenotyper.

Put simply, I am able to run any of our input intervals through UnifiedGenotyper with a single data thread, and either 1 or 24 CPU threads, with no trouble. Upon looking at the actual memory usage of the process while running, I decided to try increasing the number of data threads, as it appeared that I could easily run 4 threads in the heap I have available. However, regardless of the thread or memory settings, when I raise the number of data threads beyond 1, I immediately get a Java core dump once the initialization phase has completed. The GATK invocation I use is as follows:

java -Djava.io.tmpdir=$TMP -Xmx48G -jar $GATK -T UnifiedGenotyper \
            -R $REF \
            --dbsnp $ROD \
            ${GATKArg} \
            -o ${TMP}/${BASENAME}/${BASENAME}.${INTERVAL}.raw.variants.vcf \
            -L ${INTERVAL} \
            -nt 4 \
            -nct 8 \
            -glm BOTH \
            -A DepthOfCoverage \
            -A AlleleBalance \
            -A HaplotypeScore \
            -A HomopolymerRun \
            -A MappingQualityZero \
            -A QualByDepth \
            -A RMSMappingQuality \
            -A SpanningDeletions \
            -A MappingQualityRankSumTest \
            -A ReadPosRankSumTest \
            -A FisherStrand \
            -A InbreedingCoeff

For reference, this is GATK version 2.3-9, JRE 1.6.

The error I receive is shown below. Any insight into this issue would be greatly appreciated.

Sincerely, Jason Kost

# A fatal error has been detected by the Java Runtime Environment:
#  Internal Error (exceptions.cpp:364), pid=14830, tid=1113614656
#  Error: ExceptionMark destructor expects no pending exceptions
# JRE version: 6.0_16-b01
# Java VM: Java HotSpot(TM) 64-Bit Server VM (14.2-b01 mixed mode linux-amd64 )
# Can not save log file, dump to screen..
# A fatal error has been detected by the Java Runtime Environment:
#  Internal Error (exceptions.cpp:364), pid=14830, tid=1113614656
#  Error: ExceptionMark destructor expects no pending exceptions
# JRE version: 6.0_16-b01
# Java VM: Java HotSpot(TM) 64-Bit Server VM (14.2-b01 mixed mode linux-amd64 )
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp

---------------  T H R E A D  ---------------

Current thread (0x00000000490c3800):  JavaThread "S3Put-Thread" daemon [_thread_in_vm, id=14864, stack(0x0000000042506000,0x0000000042607000)]

Stack: [0x0000000042506000,0x0000000042607000],  sp=0x0000000042602bd0,  free space=1010k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x6bd1ef]
V  [libjvm.so+0x2be556]
V  [libjvm.so+0x3082dd]
V  [libjvm.so+0x2592d0]
V  [libjvm.so+0x258922]
V  [libjvm.so+0x2589a6]
V  [libjvm.so+0x25a37e]
V  [libjvm.so+0x650eab]
V  [libjvm.so+0x64f222]
V  [libjvm.so+0x64e101]
V  [libjvm.so+0x64dd63]
V  [libjvm.so+0x42c8e9]
V  [libjvm.so+0x43310e]
V  [libjvm.so+0x411d68]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  java.lang.ClassLoader.findBootstrapClass(Ljava/lang/String;)Ljava/lang/Class;+0
j  java.lang.ClassLoader.findBootstrapClass0(Ljava/lang/String;)Ljava/lang/Class;+23
j  java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+32
j  java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+23
j  sun.misc.Launcher$AppClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+41
j  java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3
j  java.lang.ClassLoader.loadClassInternal(Ljava/lang/String;)Ljava/lang/Class;+2
v  ~StubRoutines::call_stub
j  org.broadinstitute.sting.gatk.phonehome.GATKRunReport$S3PutRunnable.run()V+6
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub

---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
=>0x00000000490c3800 JavaThread "S3Put-Thread" daemon [_thread_in_vm, id=14864, stack(0x0000000042506000,0x0000000042607000)]
  0x0000000049207000 JavaThread "Thread-3" [_thread_in_native, id=14863, stack(0x0000000042405000,0x0000000042506000)]
  0x0000000049384800 JavaThread "Thread-2" [_thread_blocked, id=14862, stack(0x0000000042304000,0x0000000042405000)]
  0x00000000493fe000 JavaThread "ProgressMeterDaemon" daemon [_thread_blocked, id=14858, stack(0x0000000042203000,0x0000000042304000)]
  0x00002ab6c0026000 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=14856, stack(0x0000000042001000,0x0000000042102000)]
  0x00002ab6c0023000 JavaThread "CompilerThread1" daemon [_thread_blocked, id=14855, stack(0x0000000041bd6000,0x0000000041cd7000)]
  0x00002ab6c0020800 JavaThread "CompilerThread0" daemon [_thread_blocked, id=14854, stack(0x0000000041ad5000,0x0000000041bd6000)]
  0x00002ab6c001e800 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=14853, stack(0x00000000419d4000,0x0000000041ad5000)]
  0x00000000484b3000 JavaThread "Finalizer" daemon [_thread_blocked, id=14852, stack(0x0000000041f00000,0x0000000042001000)]
  0x00000000484ab800 JavaThread "Reference Handler" daemon [_thread_blocked, id=14851, stack(0x0000000041dff000,0x0000000041f00000)]
  0x00000000481a8800 JavaThread "main" [_thread_blocked, id=14831, stack(0x0000000040450000,0x0000000040551000)]

Other Threads:
  0x00000000484a6800 VMThread [stack: 0x0000000041cfe000,0x0000000041dff000] [id=14850]
  0x00002ab6c0029000 WatcherThread [stack: 0x0000000042102000,0x0000000042203000] [id=14857]

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

 PSYoungGen      total 63360K, used 3359K [0x00002ab2b3440000, 0x00002ab2ba990000, 0x00002ab6b3440000)
  eden space 16384K, 20% used [0x00002ab2b3440000,0x00002ab2b3787fb8,0x00002ab2b4440000)
  from space 46976K, 0% used [0x00002ab2b7550000,0x00002ab2b7550000,0x00002ab2ba330000)
  to   space 50240K, 0% used [0x00002ab2b4440000,0x00002ab2b4440000,0x00002ab2b7550000)
 PSOldGen        total 353536K, used 271234K [0x00002aaab3440000, 0x00002aaac8d80000, 0x00002ab2b3440000)
  object space 353536K, 76% used [0x00002aaab3440000,0x00002aaac3d20bd8,0x00002aaac8d80000)
 PSPermGen       total 21504K, used 16865K [0x00002aaaae040000, 0x00002aaaaf540000, 0x00002aaab3440000)
  object space 21504K, 78% used [0x00002aaaae040000,0x00002aaaaf0b8538,0x00002aaaaf540000)

Dynamic libraries:
Can not get library information for pid = 14864

VM Arguments:
jvm_args: -Djava.io.tmpdir=/home/kostj/scratch/temp/ -Xmx48G 
java_command: /home/kostj/nearline/gatk2/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /home/kostj/nearline/hg19/hg19.fasta --dbsnp /home/kostj/nearline/hg19/gatk2/dbsnp_137.hg19.vcf -I /home/kostj/scratch/temp//SPr0059/SPr0059.reduced.bam -I /home/kostj/scratch/temp//SP3608/SP3608.reduced.bam -I /home/kostj/scratch/temp//SP3528/SP3528.reduced.bam -I /home/kostj/scratch/temp//SP3508/SP3508.reduced.bam -I /home/kostj/scratch/temp//SP3474/SP3474.reduced.bam -I /home/kostj/scratch/temp//SP3277/SP3277.reduced.bam -I /home/kostj/scratch/temp//SP3255/SP3255.reduced.bam -I /home/kostj/scratch/temp//SP3148/SP3148.reduced.bam -I /home/kostj/scratch/temp//sp3127/sp3127.reduced.bam -I /home/kostj/scratch/temp//SP3070/SP3070.reduced.bam -I /home/kostj/scratch/temp//sp3035/sp3035.reduced.bam -I /home/kostj/scratch/temp//SP0618/SP0618.reduced.bam -I /home/kostj/scratch/temp//SNc0063/SNc0063.reduced.bam -I /home/kostj/scratch/temp//SNc0036/SNc0036.reduced.bam -I /home/kostj/scratch/temp//SMa0078/SMa0078.reduced.bam -I /home/kostj/scratch/temp//SMa0020/SMa0020.reduced.bam -I /home/kostj/scratch/temp//SLA969/SLA969.reduced.bam -I /home/kostj/scratch/temp//SLA966/SLA966.reduced.bam -I /home/kostj/scratch/temp//SLA956/SLA956.reduced.bam -I /home/kostj/scratch/temp//SLA929/SLA929.reduced.bam -I /home/kostj/scratch/temp//SLA922/SLA922.reduced.bam -I /home/kostj/scratch/temp//SLA885/SLA885.reduced.bam -I /home/kostj/scratch/temp//SLA877/SLA877.reduced.bam -I /home/kostj/scratch/temp//SLA863/SLA863.reduced.bam -I /home/kostj/scratch/temp//SLA861/SLA861.reduced.bam -I /home/kostj/scratch/temp//SLA833/SLA833.reduced.bam -I /home/kostj/scratch/temp//SLA809/SLA809.reduced.bam -I /home/kostj/scratch/temp//SLA808/SLA808.reduced.bam -I /home/kostj/scratch/temp//SLA807/SLA807.reduced.bam -I /home/kostj/scratch/temp//SLA798/SLA798.reduced.bam -I /home/kostj/scratch/temp//SLA791/SLA791.reduced.bam -I /home/kostj/scratch/temp//SLA783/SLA783.reduced.bam -I /home/kostj/scratch/temp//SLA767/SLA767.redu
Launcher Type: SUN_STANDARD

Environment Variables:

Signal Handlers:
SIGSEGV: [libjvm.so+0x6bddc0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGBUS: [libjvm.so+0x6bddc0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGFPE: [libjvm.so+0x594f90], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGPIPE: [libjvm.so+0x594f90], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGXFSZ: [libjvm.so+0x594f90], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGILL: [libjvm.so+0x594f90], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGUSR1: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGUSR2: [libjvm.so+0x597750], sa_mask[0]=0x00000000, sa_flags=0x10000004
SIGHUP: [libjvm.so+0x5974a0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGINT: [libjvm.so+0x5974a0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGTERM: [libjvm.so+0x5974a0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGQUIT: [libjvm.so+0x5974a0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004

---------------  S Y S T E M  ---------------

uname:Linux 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64
libc:glibc 2.5 NPTL 2.5 
rlimit: STACK infinity, CORE infinity, NPROC 399360, NOFILE 1024, AS infinity
load average:0.00 0.00 0.00

CPU:total 24 (16 cores per cpu, 2 threads per core) family 6 model 44 stepping 2, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, ht

Memory: 4k page, physical 49451764k(23657140k free), swap 16779884k(16697572k free)

vm_info: Java HotSpot(TM) 64-Bit Server VM (14.2-b01) for linux-amd64 JRE (1.6.0_16-b01), built on Jul 31 2009 05:52:33 by "java_re" with gcc 3.2.2 (SuSE Linux)

time: Mon Mar 11 16:00:00 2013
elapsed time: 923 seconds

# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
/sge/default/spool/pem610-040/job_scripts/8474173: line 83: 14830 Aborted                 (core dumped) java -Djava.io.tmpdir=$TMP -Xmx48G -jar $GATK -T UnifiedGenotyper -R $REF --dbsnp $ROD ${GATKArg} -o ${TMP}/${BASENAME}/${BASENAME}.${INTERVAL}.raw.variants.vcf -L ${INTERVAL} -nt 2 -nct 12 -glm BOTH -A DepthOfCoverage -A AlleleBalance -A HaplotypeScore -A HomopolymerRun -A MappingQualityZero -A QualByDepth -A RMSMappingQuality -A SpanningDeletions -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A InbreedingCoeff
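
When reading a HotSpot crash report like the one above, three lines usually narrow things down: the JRE build, the thread that was running when the VM aborted, and the first GATK frame on the Java stack. The following is a minimal shell sketch of pulling those out. The file name `crash_excerpt.txt` is hypothetical, and the sample lines are copied from the report above so that the snippet runs on its own; in practice you would point `LOG` at the saved job output instead.

```shell
# Hypothetical file holding the crash report; the three lines below are
# copied from the report above so this sketch is self-contained.
LOG="crash_excerpt.txt"
cat > "$LOG" <<'EOF'
# JRE version: 6.0_16-b01
Current thread (0x00000000490c3800):  JavaThread "S3Put-Thread" daemon [_thread_in_vm, id=14864, stack(0x0000000042506000,0x0000000042607000)]
j  org.broadinstitute.sting.gatk.phonehome.GATKRunReport$S3PutRunnable.run()V+6
EOF

# JRE build that produced the crash
JRE_VERSION=$(grep '^# JRE version:' "$LOG" | head -n 1 | sed 's/^# JRE version: *//')

# Name of the thread that was running when the VM aborted
CRASH_THREAD=$(grep '^Current thread' "$LOG" | head -n 1 | sed 's/.*JavaThread "\([^"]*\)".*/\1/')

# First GATK frame on the Java stack (leading frame-type letter stripped)
GATK_FRAME=$(grep 'org\.broadinstitute' "$LOG" | head -n 1 | sed 's/^[jJvV]  *//')

echo "JRE version:    $JRE_VERSION"
echo "Crashed thread: $CRASH_THREAD"
echo "GATK frame:     $GATK_FRAME"
```

For this report the crashed thread is `S3Put-Thread` and the GATK frame belongs to `GATKRunReport`, which suggests the abort happened while the run-report thread was loading a class under JRE 1.6.0_16, not in the UnifiedGenotyper traversal threads.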

I am facing a "fatal error by Java Runtime Environment" after running the GATK DataProcessingPipeline. My Java version is:

> java version "1.6.0_35"
> Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
> Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)

I am using GATK 2.3-6-gebbba25. The pipeline printed the following error:

> lelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/medpop/mpg-psrl/Parabase/tmp'  '-cp' '/medpop/mpg-psrl/Parabase/Tools/Queue-2.3-5-g49ed93c/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'RealignerTargetCreator'  '-I' '/medpop/mpg-psrl/Parabase/MyRuns/EX_c1004CARa_1ln_hg19._new.1.realigned.rg.bam'  '-R' '/medpop/mpg-psrl/Parabase/hg19/ucsc.hg19.fasta'  '-o' '/medpop/mpg-psrl/Parabase/MyRuns/EX_c1004CARa_1ln_hg19.EX_c1004CARa_1ln_hg19.intervals'  '-known' '/medpop/mpg-psrl/Parabase/hg19/dbsnp_137.hg19.vcf'  '-mismatch' '0.0'  
> INFO  11:32:30,896 FunctionEdge - Output written to /medpop/mpg-psrl/Parabase/MyRuns/EX_c1004CARa_1ln_hg19.EX_c1004CARa_1ln_hg19.intervals.out 
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00002ae7a5a50380, pid=562, tid=47174397446800
> #
> # JRE version: 6.0_35-b10
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.10-b01 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x712380]  SR_handler(int, siginfo*, ucontext*)+0x30
> #
> # An error report file with more information is saved as:
> # /medpop/mpg-psrl/Parabase/MyRuns/hs_err_pid562.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #

The log file is lengthy, but if you would like to have a look I can paste it here later. Thanks for your support.

While running the DepthOfCoverage tool, I get the error: /tmp/RsQHCt1W: No space left on device

I have tried changing the TMPDIR environment variable (and exporting it), but I eventually get the same error. Is there a way to change the temporary directory that GATK uses?

I'm running GATK v2.1-8-g5efb575 on a Linux system.

Thanks, Rick
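
One thing that may explain why changing `TMPDIR` had no effect: on Linux the JVM takes its temporary directory from the `java.io.tmpdir` system property (default `/tmp`), not from the `TMPDIR` environment variable. Other commands in this thread pass `-Djava.io.tmpdir=...` on the `java` command line. Below is a minimal sketch, assuming a hypothetical scratch directory `gatk_tmp` under the current directory. The GATK arguments after `-T DepthOfCoverage` are omitted, and the command is only printed, not run.

```shell
# Hypothetical scratch directory; choose one on a volume with enough free space.
GATK_TMP="${GATK_TMP:-$PWD/gatk_tmp}"
mkdir -p "$GATK_TMP"

# Show how much space is available on the volume holding that directory.
df -P "$GATK_TMP"

# java.io.tmpdir is a standard Java system property. It must come before -jar
# so that it is read by the JVM and not passed to GATK as a tool argument.
JAVA_TMP_OPT="-Djava.io.tmpdir=$GATK_TMP"

# Print the command for inspection; add your own -R, -I and -o arguments before running it.
echo "java $JAVA_TMP_OPT -jar GenomeAnalysisTK.jar -T DepthOfCoverage"
```

If the same "No space left on device" message still names a path under `/tmp` after this change, the file is probably being written by something other than the JVM's temporary-file mechanism, and it is worth checking which process creates it.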