Tagged with #commandlinegatk
2 documentation articles | 1 announcement | 24 forum discussions

Created 2014-10-02 17:29:07 | Updated 2014-10-02 18:22:46 | Tags: commandlinegatk commandline argument syntax
Comments (0)


This document describes how GATK commands are structured and how to add arguments to basic command examples.

Basic java syntax

Commands for GATK always follow the same basic syntax:

java [Java arguments] -jar GenomeAnalysisTK.jar [GATK arguments]

The core of the command is java -jar GenomeAnalysisTK.jar, which starts up the GATK program in a Java Virtual Machine (JVM). Any additional Java-specific arguments (such as -Xmx to increase memory allocation) should be inserted between java and -jar, like this:

java -Xmx4G -jar GenomeAnalysisTK.jar [GATK arguments]

The order of arguments between java and -jar is not important.
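As a rough illustration of the structure described above, the following Python sketch assembles a command from its two argument groups. The function name build_gatk_command is hypothetical and not part of GATK; it only shows where each group of arguments belongs.

```python
def build_gatk_command(java_args, gatk_args, jar="GenomeAnalysisTK.jar"):
    """Return the command as a list: java [Java arguments] -jar <jar> [GATK arguments]."""
    # Java-specific arguments (e.g. -Xmx4G) go between "java" and "-jar";
    # everything after the jar name is passed to GATK itself.
    return ["java", *java_args, "-jar", jar, *gatk_args]


command = build_gatk_command(
    ["-Xmx4G"],
    ["-R", "human_b37.fasta", "-T", "HaplotypeCaller"],
)
print(" ".join(command))
```

Keeping the command as a list of tokens rather than one string makes it easy to hand to something like subprocess.run without worrying about shell quoting.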

GATK arguments

There are two universal arguments that are required for every GATK command (with very few exceptions, namely the clp-type utilities): -R for Reference (e.g. -R human_b37.fasta) and -T for Tool name (e.g. -T HaplotypeCaller).

Additional arguments fall in two categories:

  • Engine arguments like -L (for specifying a list of intervals) which can be given to all tools and are technically optional but may be effectively required at certain steps for specific analytical designs (e.g. the -L argument for calling variants on exomes);

  • Tool-specific arguments which may be required, like -I (to provide an input file containing sequence reads to tools that process BAM files) or optional, like -alleles (to provide a list of known alleles for genotyping).

The ordering of GATK arguments is not important, but we recommend always passing the tool name (-T) and reference (-R) first for consistency. It is also a good idea to consistently order arguments by some kind of logic in order to make it easy to compare different commands over the course of a project. It’s up to you to choose what that logic should be.

All available engine and tool-specific arguments are listed in the tool documentation section. Arguments typically have both a long name (prefixed by --) and a short name (prefixed by -). The GATK command line parser recognizes both equally, so you can use whichever you prefer, depending on whether you prefer commands to be more verbose or more succinct.

Finally, a note about flags. Flags are arguments that have boolean values, i.e. TRUE or FALSE. They are typically used to enable or disable specific features; for example, --keep_program_records will make certain GATK tools output additional information in the BAM header that would be omitted otherwise. In GATK, all flags are set to FALSE by default, so if you want to set one to TRUE, all you need to do is add the flag name to the command. You don't need to specify an actual value.
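The difference between flags and arguments that take a value can be sketched as follows. This is only an illustration of the convention described above: render_gatk_arguments is a hypothetical helper, not something GATK provides.

```python
def render_gatk_arguments(arguments):
    """Turn a dict of argument names and values into a flat list of tokens.

    A value of True marks a flag: only its name is emitted.
    A value of False or None means the argument is left out entirely.
    """
    tokens = []
    for name, value in arguments.items():
        if value is True:
            tokens.append(name)  # flag: the name alone switches it on
        elif value is False or value is None:
            continue  # flags default to FALSE, so nothing is emitted
        else:
            tokens.extend([name, str(value)])  # name and value as two tokens
    return tokens


tokens = render_gatk_arguments({
    "-R": "human_b37.fasta",
    "--keep_program_records": True,
    "-L": None,
})
print(" ".join(tokens))
```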

Examples of complete GATK command lines

This is a very simple command that runs HaplotypeCaller in default mode on a single input BAM file containing sequence data and outputs a VCF file containing raw variants.

java -Xmx4G -jar GenomeAnalysisTK.jar -R human_b37.fasta -T HaplotypeCaller -I sample1.bam -o raw_variants.vcf

If the data is from exome sequencing, we should additionally provide the exome targets using the -L argument:

java -Xmx4G -jar GenomeAnalysisTK.jar -R human_b37.fasta -T HaplotypeCaller -I sample1.bam -o raw_variants.vcf -L exome_intervals.list

If we just want to genotype specific sites of interest using known alleles based on results from a previous study, we can change the HaplotypeCaller’s genotyping mode using -gt_mode, provide those alleles using -alleles, and restrict the analysis to just those sites using -L:

java -Xmx4G -jar GenomeAnalysisTK.jar -R human_b37.fasta -T HaplotypeCaller -I sample1.bam -o raw_variants.vcf -L known_alleles.vcf -alleles known_alleles.vcf -gt_mode GENOTYPE_GIVEN_ALLELES

For more examples of commands and for specific tool commands, see the tool documentation section.

Created 2012-07-23 23:55:11 | Updated 2012-07-23 23:55:11 | Tags: commandlinegatk gatkdocs
Comments (0)

A new tool has been released!

Check out the documentation at CommandLineGATK.

Created 2014-10-02 18:27:29 | Updated | Tags: commandlinegatk commandline syntax
Comments (0)

I'm not sure why it hadn't occurred to us to do this before, but we've finally done it: an FAQ article that formally explains how GATK commands are structured, what the basic types of arguments are, and how to string them all together.

We realized that command structure requirements can be confusing if you are new to command line programs, if only because so many toolkits use fairly different ones. For example, Picard tools (which are also developed at the Broad!) have separate jar files for each tool in the toolkit, while GATK has one jar file containing all the tools. The Picard syntax for passing argument values is also different: Picard uses = to join the argument name and value, while GATK commands just take a space.
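To make the contrast concrete, here is a small Python sketch that formats name/value pairs in each of the two styles. The helper names picard_style and gatk_style are made up for this illustration.

```python
def picard_style(pairs):
    """Join each name and value with '=', as in INPUT=sample1.bam."""
    return [f"{name}={value}" for name, value in pairs]


def gatk_style(pairs):
    """Emit each name and value as two separate tokens, as in -I sample1.bam."""
    tokens = []
    for name, value in pairs:
        tokens.extend([name, value])
    return tokens


print(" ".join(picard_style([("INPUT", "sample1.bam"), ("OUTPUT", "marked.bam")])))
print(" ".join(gatk_style([("-I", "sample1.bam"), ("-o", "raw_variants.vcf")])))
```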

So if that's something you need help with, check out the doc! We'd love to hear from people who are new to GATK about whether this is helpful and how we can improve it further.

Created 2015-09-14 15:21:42 | Updated | Tags: commandlinegatk selectvariants runtime-error gatk-runtime-error
Comments (3)

Hi, I'm trying to extract a few samples from a large VCF file with many samples using SelectVariants and I keep running into this error.

Command: java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T SelectVariants -R /Volumes/odin/reference/16484/mafa5/mafa5. -V 16557.all.exons.mafa5.ann.vcf.gz -o 16580.mafa5.M1M1.vcf -sn CY0320 -sn CY0321 -sn CY0322 -sn CY0323 -sn CY0324 -sn CY0325

ERROR stack trace

java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:195)
    at org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:329)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.initializeReferenceSequence(LocusReferenceView.java:150)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.(LocusReferenceView.java:126)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:90)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

Any thoughts as to why this might be happening?

Created 2015-07-31 22:11:45 | Updated | Tags: commandlinegatk haplotypecaller gatk error
Comments (4)


I am receiving the following error. I am working with SAM files that were exported from CLC, then edited with Picard-tools to addReadGroups. I am not sure if I need to add an additional step to solve this problem; I cannot find any documentation regarding this error.

Please let me know what I need to do to correct this issue.

Thank you!

gatk -T HaplotypeCaller -R spinach_assembly-repeatdetect_PACBIO_V1.3_formated_60.fa -I .sam.list -drf DuplicateRead --alleles Unfiltered_Spinach_PacBio_Reseq_12_Geno_Assay_SNP.fixed.noblanks.vcf --genotyping_mode GENOTYPE_GIVEN_ALLELES --output_mode EMIT_ALL_SITES -o output_raw_unfiltered_spinach_snps_gbs.vcf

INFO 14:48:44,450 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:48:44,453 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 14:48:44,454 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 14:48:44,454 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 14:48:44,458 HelpFormatter - Program Args: -T HaplotypeCaller -R spinach_assembly-repeatdetect_PACBIO_V1.3_formated_60.fa -I .sam.list -drf DuplicateRead --alleles Unfiltered_Spinach_PacBio_Reseq_12_Geno_Assay_SNP.fixed.noblanks.vcf --genotyping_mode GENOTYPE_GIVEN_ALLELES --output_mode EMIT_ALL_SITES -o output_raw_unfiltered_spinach_snps_gbs.vcf
INFO 14:48:44,468 HelpFormatter - Executing as ahulse@jalapeno.genomecenter.ucdavis.edu on Linux 2.6.18-348.12.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_05-b13.
INFO 14:48:44,469 HelpFormatter - Date/Time: 2015/07/31 14:48:44
INFO 14:48:44,469 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:48:44,470 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:48:45,102 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:48:45,385 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 14:48:45,394 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 14:48:48,432 SAMDataSource$SAMReaders - Init 50 BAMs in last 3.04 s, 50 of 80 in 3.04 s / 0.05 m (16.46 tasks/s). 30 remaining with est. completion in 1.82 s / 0.03 m
INFO 14:48:50,052 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 4.66
INFO 14:48:50,164 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 14:48:54,742 RMDTrackBuilder - Writing Tribble index to disk for file /local/scratch/scratch/Amanda/Spinach_GBS/Unfiltered_Spinach_PacBio_Reseq_12_Geno_Assay_SNP.fixed.noblanks.vcf.idx
INFO 14:48:58,784 GenomeAnalysisEngine - Preparing for traversal over 80 BAM files
INFO 14:49:00,054 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A BAM ERROR has occurred (version 3.4-46-gbc02625):
ERROR This means that there is something wrong with the BAM file(s) you provided.
ERROR The error message below tells you what is the problem.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR Please do NOT post this error to the GATK forum until you have followed these instructions:
ERROR - Make sure that your BAM file is well-formed by running Picard's validator on it
ERROR (see http://picard.sourceforge.net/command-line-overview.shtml#ValidateSamFile for details)
ERROR - Ensure that your BAM index is not corrupted: delete the current one and regenerate it with 'samtools index'
ERROR MESSAGE: Cannot retrieve file pointers within SAM text files.
ERROR ------------------------------------------------------------------------------------------
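The message suggests that at least one input is a plain-text SAM file rather than a BAM file. A quick way to check which entries of the list are text files is sketched below in Python. It relies on the fact that BAM files are BGZF-compressed and therefore begin with the gzip magic bytes 0x1f 0x8b; the function names are hypothetical, and the check only inspects the first two bytes, so it does not validate the file.

```python
GZIP_MAGIC = b"\x1f\x8b"


def starts_with_gzip_magic(data):
    """Return True if the given bytes begin with the gzip/BGZF magic number."""
    return data[:2] == GZIP_MAGIC


def looks_like_bam(path):
    """Read the first two bytes of a file and test them for the gzip magic number."""
    with open(path, "rb") as handle:
        return starts_with_gzip_magic(handle.read(2))


def text_files_in_list(list_path):
    """Return the entries of a .list file that do not look BGZF-compressed."""
    with open(list_path) as handle:
        paths = [line.strip() for line in handle if line.strip()]
    return [path for path in paths if not looks_like_bam(path)]
```

Any path returned by text_files_in_list would be a candidate for conversion to BAM before being passed to GATK.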

Created 2015-03-04 17:41:22 | Updated | Tags: unifiedgenotyper commandlinegatk
Comments (2)

I'm encountering a problem similar to what I experienced with a mismatch between reference and cosmic files. Specifically, I created .bam files using human_g1k_v3.fasta, which names its contigs 1...22, X, Y, etc. When I try to run UnifiedGenotyper with this reference file, the .bam file I created, and dbsnp_137.b37.vcf, I get an error because the SNP coordinates are listed as chr1...chr22, chrX, chrY, etc. Other than writing a script to remove all occurrences of "chr", is there another way to get around this problem, i.e. a dbSNP reference file that has the desired coordinates without the "chr"?

I resolved the problem with reference/cosmic by finding a cosmic file with consistent notation, but can't find a similar fix for this one. I'd appreciate any suggestions.
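For reference, the script-based workaround mentioned above could look roughly like this Python sketch. The function names are hypothetical. It only renames contigs, so it assumes both files describe the same assembly with identical coordinates; that is an assumption to verify, particularly for the mitochondrial contig, which is often named chrM in one convention and MT in the other.

```python
def strip_chr_prefix(line):
    """Remove a leading 'chr' from the contig name of one VCF line."""
    if line.startswith("##contig=<ID=chr"):
        # Contig declarations in the header carry the name as well.
        return line.replace("##contig=<ID=chr", "##contig=<ID=", 1)
    if line.startswith("#"):
        return line  # all other header lines are left untouched
    if line.startswith("chr"):
        return line[3:]  # data line: the contig name is the first column
    return line


def convert_vcf(source_path, destination_path):
    """Write a copy of a VCF file with 'chr' stripped from contig names."""
    with open(source_path) as source, open(destination_path, "w") as destination:
        for line in source:
            destination.write(strip_chr_prefix(line))
```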

Created 2015-02-24 23:21:25 | Updated | Tags: commandlinegatk intervals
Comments (3)

Hi, I hope this is a quick question: does using the 'include intervals' command line option '-L' only include the region specified?

For instance, if I have a file that includes reads for chromosomes 1, 2, 6, X, and Y and I specify "-L 6", will the walker only process chromosome 6, or will it include the rest of my data as well?

Thank you for the clarification!

Created 2014-12-31 10:00:00 | Updated | Tags: commandlinegatk haplotypecaller-and-gpu
Comments (3)

Hi everyone,

I'm using GATK HaplotypeCaller, and I recently read a document about GATK optimization that mentions GPUs, IBM Power8, etc. Its conclusion is that GATK could run faster than the current implementation. In this document you suggest a C or C++ development in parallel with the current Java implementation. I wonder whether you have made any progress so far and whether you have planned a release. I'm also interested to know which technologies you're considering for GATK HC: C, C++, MPI, OpenMP, GPU...?



Created 2014-11-13 20:22:25 | Updated 2014-11-13 20:27:19 | Tags: commandlinegatk depthofcoverage
Comments (2)


I have used the following commands using DepthOfCoverage tool with two different bed files:

  java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R ucsc_hg19.fa -I WT_recalibrated.bam -L coverage_summary.bed -ct 1 -ct 10 -ct 20 -ct 30 -ct 50 -ct 100 -o WT_cov

The line count for the input and output:

$ wc -l WT_cov.sample_interval_summary
4988 WT_cov.sample_interval_summary
$ wc -l coverage_summary.bed 
10585 coverage_summary.bed

In the other case:

  java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R ucsc_hg19.fa -I WT_recalibrated.bam -L exon.bed -ct 1 -ct 10 -ct 20 -ct 30 -ct 50 -ct 100 -o WT_exon

Line count for the input and output:

 $ wc -l WT_exon.sample_interval_summary 
 5065 WT_exon.sample_interval_summary
 $ wc -l exon.bed 
 5065 exon.bed

The input in both the cases is of the standard format as shown below:

 chr1    6529578 6529755  
 chr1    6530273 6530442
 chr1    6530543 6530721
 chr1    6530773 6530980 
 chr1    6531028 6531730
 chr1    6531768 6531914
 chr1    6532563 6532713
 chr1    6533023 6533273

Could anyone help to interpret the discrepancy between number of target regions in bed file and _interval_summary file in the above two cases?
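One possible explanation, which is an assumption and not something confirmed in this post, is that overlapping or adjacent intervals in the first BED file are merged before processing, so the summary has fewer rows than the BED file has lines. The Python sketch below counts how many intervals remain after such a merge, so the number can be compared with the line count of the summary file. The function names and the exact merging rule (overlapping or abutting intervals on the same contig) are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting (contig, start, end) intervals."""
    merged = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([contig, start, end])
    return [tuple(item) for item in merged]


def read_bed(path):
    """Read the first three columns of a BED file, skipping non-data lines."""
    intervals = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(("#", "track", "browser")):
                continue
            fields = line.split()
            if len(fields) >= 3:
                intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals
```

For example, len(merge_intervals(read_bed("coverage_summary.bed"))) gives the number of distinct regions left after merging.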

Created 2014-10-29 10:11:15 | Updated | Tags: commandlinegatk variantfiltration filterexpression
Comments (2)

Hi all, I tried to filter my raw VCF file with the following command:

java -Xmx30g -jar ../GATK/GenomeAnalysisTK.jar -R ../ref.fa -T VariantFiltration --filterExpression " QD < 20.0 || ReadPosRankSum < -8.0 || FS > 10.0 || QUAL < $MEANQUAL || MQ <30.0 || DP< 10.0 " --filterName LowQualFilter --missingValuesInExpressionsShouldEvaluateAsFailing --variant ../s1.raw.vcf --logging_level ERROR -o ../s1.makered.raw.vcf

grep -v "Filter" s1.makered.raw.vcf >s1.flt.vcf

After that, I checked the result file s1.flt.vcf and found the following records marked "PASS". Obviously, the command doesn't work, as records with DP=8 should be marked "LowQualFilter".

Chr01 231575 . A G 241.78 PASS AC=2;AF=1.00;AN=2;DP=8;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=29.00;MQ0=0;QD=30.22 GT:AD:DP:GQ:PL 1/1:0,8:8:24:270,24,0
Chr01 237476 . T C 238.78 PASS AC=2;AF=1.00;AN=2;DP=8;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=29.00;MQ0=0;QD=29.85 GT:AD:DP:GQ:PL 1/1:0,8:8:24:267,24,0

No error is reported. Any suggestions will be appreciated.
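To double-check which thresholds a given record meets, independently of GATK, the INFO field can be evaluated by hand. The Python sketch below does this for the first record above. It is not how VariantFiltration evaluates expressions: the helper names are hypothetical, annotations missing from the record are simply skipped, and the value used for $MEANQUAL is a made-up stand-in because the real value is not given.

```python
# Thresholds copied from the --filterExpression in the command above.
CHECKS = [
    ("QD", lambda v: v < 20.0),
    ("ReadPosRankSum", lambda v: v < -8.0),
    ("FS", lambda v: v > 10.0),
    ("MQ", lambda v: v < 30.0),
    ("DP", lambda v: v < 10.0),
]


def parse_info(info_field):
    """Parse a VCF INFO string such as 'DP=8;MQ=29.00' into a dict of floats."""
    values = {}
    for entry in info_field.split(";"):
        if "=" in entry:
            key, raw = entry.split("=", 1)
            try:
                values[key] = float(raw)
            except ValueError:
                pass  # non-numeric annotations are ignored
    return values


def matching_conditions(info_field, qual, mean_qual):
    """Return the names of the thresholds that the record meets."""
    info = parse_info(info_field)
    matched = [name for name, check in CHECKS if name in info and check(info[name])]
    if qual < mean_qual:
        matched.append("QUAL")
    return matched


record_info = "AC=2;AF=1.00;AN=2;DP=8;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=29.00;MQ0=0;QD=30.22"
# mean_qual=100.0 is a placeholder for $MEANQUAL, whose value is not shown in the post.
print(matching_conditions(record_info, qual=241.78, mean_qual=100.0))
```

For this record the MQ and DP conditions are met, which is consistent with the expectation that it should have been filtered.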

Created 2014-09-15 08:35:02 | Updated | Tags: commandlinegatk haplotypecaller runtime-error
Comments (1)


I'm trying to call variants on WGS data using the following command (on a high-performance cluster using 4 cores per job) :

java -Xmx6G -jar $CLASSPATH -T HaplotypeCaller --dbsnp GRCh37-lite.vcf -nct 4 -R GRCh37-lite.fa -I /user/data/gent/gvo000/gvo00027/vsc40035/StJude/001/001_D.bam -maxAltAlleles 10 --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o 001_D.vcf

$CLASSPATH contains the location of the .jar file.

For 20 out of 70 samples the script ended without a problem; for the other 50 samples the same error message was returned, given below. Can you tell what's going wrong?

Kind regards, Steve

INFO 07:28:05,514 ProgressMeter - 2:92269030 2.49e+08 3.1 h 44.0 s 11.0% 27.9 h 24.8 h
INFO 07:29:05,515 ProgressMeter - 2:92309304 2.49e+08 3.1 h 44.0 s 11.0% 28.1 h 25.0 h
INFO 07:30:05,516 ProgressMeter - 2:92320228 2.49e+08 3.1 h 44.0 s 11.0% 28.2 h 25.1 h
INFO 07:30:06,052 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:443)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:417)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.calculateGLsForThisEvent(GenotypingEngine.java:385)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:222)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:880)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

Created 2014-09-09 11:11:57 | Updated 2014-09-09 17:13:58 | Tags: combinevariants commandlinegatk
Comments (19)


I have used CombineVariants to combine variants from GATK and samtools as shown below:

java -jar GenomeAnalysisTK.jar -T CombineVariants -R ref.fa --variant:GatkSNP GATKsnp.vcf --variant:GatkINDEL GATKind.vcf --variant:SamSNP Samsnp.vcf --variant:SamINDEL Samind.vcf -o allvar.vcf -genotypeMergeOptions PRIORITIZE -priority GatkSNP,GatkINDEL,SamSNP,SamINDEL --filteredrecordsmergetype KEEP_UNCONDITIONAL

This merges all the variants. However, with the above command, variants that are present in both GATK and samtools are emitted from the samtools files.

I would like to get all the variants such that:

  • variants present in both GATK and samtools emitted from GATK vcf files
  • variants in only GATK
  • variants in only samtools

Could someone suggest any ideas, or tell me whether there is something to be fixed in the command?
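The desired outcome listed above can be stated precisely with a small Python sketch. This only describes the intended result, not how CombineVariants works internally; the function name and the choice of (contig, position, ref, alt) as the site key are hypothetical.

```python
def choose_records(gatk_records, samtools_records):
    """Combine two {site_key: record} dicts, preferring GATK on shared sites."""
    combined = dict(samtools_records)  # start with every samtools site
    combined.update(gatk_records)      # GATK replaces shared sites and adds its own
    return combined


gatk = {("1", 100, "A", "G"): "from GATK", ("1", 200, "C", "T"): "from GATK"}
samtools = {("1", 100, "A", "G"): "from samtools", ("1", 300, "G", "A"): "from samtools"}
for site, origin in sorted(choose_records(gatk, samtools).items()):
    print(site, origin)
```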


Created 2014-08-09 01:34:03 | Updated | Tags: indelrealigner commandlinegatk
Comments (3)

Dear GATK help team,

I have a file cut down to a single chromosome (chr 17), which I have processed through sorting, alignment, adding headers, and even RealignerTargetCreator. Yet when I try to realign indels with IndelRealigner, I receive an error:

ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The contig index 0 is bad, doesn't equal the contig index 17 of the contig from a string chr17

I cut out these chromosomes and processed chr17 first since that is my region of interest, and because I thought it might avoid memory issues.

I am a newbie and have checked the forum for help, yet found only one similar post, with no solution. Please help, I am stuck at this stage. My command for IndelRealigner is the following:

java -jar /Users/yotsukurasohiya/build/softwares/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -T IndelRealigner -R /Volumes/Pegasus/broadref/ucsc.hg19.fasta -I /Volumes/Pegasus/tmp/mardup.pregatk.bam -targetIntervals 2_target_intervals.list -known /Volumes/Pegasus/broadref/Mills_and_1000G_gold_standard.indels.hg19.vcf -known /Volumes/Pegasus/broadref/dbsnp_138.hg19.vcf -known /Volumes/Pegasus/broadref/1000G_phase1.snps.high_confidence.hg19.vcf -o 2_realigned_reads.bam

The headers I added were created with Picard's AddOrReplaceReadGroups: SO=coordinate CREATE_INDEX=true SM=temp PL=Illumina PU=barcode LB=bar ID=id
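The error message reports that chr17 sits at index 0 in one sequence dictionary and at index 17 in another. A quick way to see such a mismatch is to compare the contig order in the BAM header (for example from samtools view -H) with the order in the reference index (the first column of the .fai file). The Python sketch below does that comparison. The function names are hypothetical, and whether a header listing only chr17 is the actual cause here is an assumption.

```python
def contig_names_from_sam_header(header_lines):
    """Extract contig names, in order, from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def contig_names_from_fai(fai_lines):
    """Extract contig names, in order, from the lines of a FASTA .fai index."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def index_mismatches(bam_contigs, reference_contigs):
    """Return (name, bam_index, reference_index) for contigs whose positions differ."""
    reference_index = {name: i for i, name in enumerate(reference_contigs)}
    return [
        (name, i, reference_index[name])
        for i, name in enumerate(bam_contigs)
        if name in reference_index and reference_index[name] != i
    ]
```

With a header that lists only chr17 and a reference that starts with chrM followed by chr1 to chr22, index_mismatches reports chr17 at index 0 versus index 17, matching the numbers in the error message.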

Created 2014-04-11 04:25:31 | Updated | Tags: commandlinegatk catvariants
Comments (2)

Using GATK on command-line the CatVariants command fails.

Program version: GATK 3.1-1-g07a4bf8.

ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: CatVariants

Code to invoke:

java -jar GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T CatVariant -R file.fasta

Note that in the current documentation for CatVariants the example lists the name as 'org.broadinstitute.sting.tools.CatVariants' rather than just CatVariants. Trying the listed string fails with the same error.

Created 2013-10-30 17:03:49 | Updated | Tags: commandlinegatk catvariants
Comments (1)

Below is the command:

java -cp $CLASSPATH/GenomeAnalysisTK.jar org.broadinstitute.sting.tools.CatVariants \
-R GATK_ref/hg19.fasta \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-1.vcf \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-2.vcf \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-3.vcf \
-out ../GATK/VQSR/parallel_batch/combined_raw.snps_indels.vcf \
-log ../GATK/VQSR/parallel_batch/log/combined.log \

After this, the combined_raw.snps_indels.vcf file only contains the header from raw.snps_indels-1.vcf. What might be wrong?

Created 2013-10-24 17:59:58 | Updated | Tags: commandlinegatk
Comments (9)

I'm running the latest GATK nightly build to process human exome-seq data (12 samples). It seemed to be faster than the older version until I ran HaplotypeCaller. The run summary shows it will take 14 days to finish. Is there anything in my command below that I could change to make it faster without losing data in the output?

java -Xmx10g -Djava.io.tmpdir=/temp/GATK_temp \
-jar $CLASSPATH/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R ../GATK_ref/hg19.fasta \
-I ./compressedbam.list \
-L ../GATK_ref/hg19knownGene_UCSC_sorted.bed \
-log ../GATK/VQSR/log/HaplotypeCaller_20131018.log \
-o ../GATK/VQSR/raw.snps_indels.vcf

Created 2013-07-31 16:25:00 | Updated | Tags: commandlinegatk
Comments (2)

If I put my input files as a list in the file named "input.list", how do I set the output names? Or do I just need to set the output folder, and the output files will be named automatically?
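If one output per input is wanted, one option is to run the tool once per listed file and derive each output name from the input name. The Python sketch below shows that derivation only. The function names and the naming scheme are hypothetical, and whether a given tool combines all listed inputs into a single output depends on the tool.

```python
import os


def read_input_list(list_path):
    """Return the non-empty lines of a .list file as input paths."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def output_name_for(input_path, suffix=".vcf", output_dir="."):
    """Derive an output name from an input path: data/sample1.bam -> <output_dir>/sample1.vcf."""
    base = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(output_dir, base + suffix)


for path in ["data/sample1.bam", "data/sample2.bam"]:
    print(path, "->", output_name_for(path, output_dir="results"))
```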

Created 2013-07-10 22:31:37 | Updated 2013-07-10 22:34:34 | Tags: realignertargetcreator commandlinegatk
Comments (5)

I started with BWA-MEM to do alignment, then used Picard to process the .SAM files (converted to bam, reorder, addorreplacegroup, etc). The GATK version I'm using is 2.5-2-gf57256b; I cannot run 2.6 because the server only has Java 6 and I cannot upgrade it to Java 7.

I get a huge stack of error messages when I run this command line (RealignerTargetCreator):

java -Xmx2g -jar $CLASSPATH/GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-R /Volumes/files/Users/user1/GATK_ref/hg19.fasta \
-I sorted_Deduped_reorder_grp.bam \
-o ./GATK/forIndelRealigner.intervals>

The error messages are below (sorry, there are a lot). I don't understand why GATK needs to connect to the window server, or what the permission problem is. I am using a remote server built on Mac OS X. Thank you

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.InternalError: Can't connect to window server - not enough permissions.
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1827)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1724)
    at java.lang.Runtime.loadLibrary0(Runtime.java:823)
    at java.lang.System.loadLibrary(System.java:1045)
    at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:50)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.awt.Toolkit.loadLibraries(Toolkit.java:1605)
    at java.awt.Toolkit.(Toolkit.java:1627)
    at sun.awt.AppContext$2.run(AppContext.java:240)
    at sun.awt.AppContext$2.run(AppContext.java:226)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.awt.AppContext.initMainAppContext(AppContext.java:226)
    at sun.awt.AppContext.access$200(AppContext.java:112)
    at sun.awt.AppContext$3.run(AppContext.java:306)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.awt.AppContext.getAppContext(AppContext.java:287)
    at com.sun.jmx.trace.Trace.out(Trace.java:180)
    at com.sun.jmx.trace.Trace.isSelected(Trace.java:88)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.isTraceOn(DefaultMBeanServerInterceptor.java:1830)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:929)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:916)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312)
    at com.sun.jmx.mbeanserver.JmxMBeanServer$2.run(JmxMBeanServer.java:1195)
    at java.security.AccessController.doPrivileged(Native Method)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.initialize(JmxMBeanServer.java:1193)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.(JmxMBeanServer.java:225)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.(JmxMBeanServer.java:170)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.newMBeanServer(JmxMBeanServer.java:1401)
    at javax.management.MBeanServerBuilder.newMBeanServer(MBeanServerBuilder.java:93)
    at javax.management.MBeanServerFactory.newMBeanServer(MBeanServerFactory.java:311)
    at javax.management.MBeanServerFactory.createMBeanServer(MBeanServerFactory.java:214)
    at javax.management.MBeanServerFactory.createMBeanServer(MBeanServerFactory.java:175)
    at sun.management.ManagementFactory.createPlatformMBeanServer(ManagementFactory.java:302)
    at java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:504)
    at org.broadinstitute.sting.gatk.executive.MicroScheduler.(MicroScheduler.java:222)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.(LinearMicroScheduler.java:70)
    at org.broadinstitute.sting.gatk.executive.MicroScheduler.create(MicroScheduler.java:169)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.createMicroscheduler(GenomeAnalysisEngine.java:443)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:272)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-gf57256b):

Created 2013-06-21 07:05:03 | Updated | Tags: unifiedgenotyper commandlinegatk
Comments (6)

Dear GATK Users,

Could anybody tell me how to identify deletions from a BAM file using a GATK module? I used UnifiedGenotyper and I am getting a list like this:


gi|262 48155 . G A 80.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.103;DP=10;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.500;MQ=28.61;MQ0=0;MQRankSum=-1.453;QD=8.08;ReadPosRankSum=-0.336 GT:AD:DP:GQ:PL 0/1:5,5:10:99:109,0,146

Thanks Sridhar

Created 2013-06-12 16:09:40 | Updated | Tags: commandlinegatk workflow rnaseq
Comments (15)

Hi all: I find that among all the work flows of GATK http://www.broadinstitute.org/gatk/guide/topic?name=methods-and-workflows there are no workflows for RNA-seq analysis. I understand that GATK mainly focuses on variant calling, can anyone tell me how to use GATK for RNA-seq analysis?

thanks daniel

Created 2013-06-07 18:45:10 | Updated 2013-06-07 18:45:59 | Tags: commandlinegatk phasebytransmission mkvcf
Comments (8)

Hello Team,

I am attempting to run GATK's PhaseByTransmission command to phase a vcf file containing a father, mother, son trio generated from the Complete Genomics mkvcf command.

After creating the ped file and running the command I generate the error: "MESSAGE: BUG: Attempted to get likelihoods as strings and neither the vector nor the string is set!". I am not exactly sure what this means.

When I check my file and the documentation I am able to see that the 'GL' field is contained in the file, but could this not be the case? I have attached a few lines from the vcf I am using.

Any help with resolving this issue would be greatly appreciated.

Thank you


Created 2013-06-06 21:08:30 | Updated | Tags: commandlinegatk queue hadoop mapreduce google
Comments (8)

Hello, I'm new to GATK and Queue. I understand that we can write a QScript in Queue to generate separate GATK jobs and run them on a cluster of several nodes. Can we implement GATK or Queue on Google Hadoop?

Created 2013-04-22 13:27:13 | Updated 2013-04-22 13:29:47 | Tags: commandlinegatk intervals
Comments (3)

I got this error message when trying to use a file to specify at which positions to emit variants:

ERROR MESSAGE: Couldn't read file /lustre/scratch109/sanger/tc9/agv/wgs/pipeline/union4x.positions because The interval file /lustre/scratch109/sanger/tc9/agv/wgs/pipeline/union4x.positions does not have one of the supported extensions (.bed, .list, .picard, .interval_list, or .intervals). Please rename your file with the appropriate extension.

Is there a GATK page describing those 5 file formats? Some of them are unknown to me, e.g. .list.

I asked my question here, but please ignore it: http://gatkforums.broadinstitute.org/discussion/2219/l-option

Thanks a lot.

Also, the error message does not mention support for vcf files, but the documentation does. Are vcf files supported?
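For convenience, the extension check implied by the error message can be reproduced before launching a run. The Python sketch below uses only the list of extensions quoted in the message above; the function name is hypothetical, and the sketch does not settle whether VCF files are also accepted.

```python
# Extensions copied from the error message quoted above.
SUPPORTED_EXTENSIONS = (".bed", ".list", ".picard", ".interval_list", ".intervals")


def has_supported_interval_extension(path):
    """Return True if the file name ends with one of the listed extensions."""
    return path.endswith(SUPPORTED_EXTENSIONS)


for name in ["union4x.positions", "union4x.intervals"]:
    print(name, has_supported_interval_extension(name))
```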

Created 2013-03-25 10:37:06 | Updated 2013-03-25 10:42:06 | Tags: commandlinegatk printreadswalker
Comments (5)

Hi all! I'm trying to complete my first GATK run, following the steps listed in the "EXECUTION STEP" section below.

Please tell me whether the steps are globally correct.


Step 4.1 does not run without -maxCycle 1500.

When I try to execute step 4.2, I get the following error:

ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Key 1036 is too large for dimension 2 (max is 1001)
    at org.broadinstitute.sting.utils.collections.NestedIntegerArray.put(NestedIntegerArray.java:128)
    at org.broadinstitute.sting.utils.recalibration.RecalibrationReport.parseAllCovariatesTable(RecalibrationReport.java:157)
    at org.broadinstitute.sting.utils.recalibration.RecalibrationReport.(RecalibrationReport.java:68)
    at org.broadinstitute.sting.utils.recalibration.BaseRecalibration.(BaseRecalibration.java:74)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.setBaseRecalibration(GenomeAnalysisEngine.java:217)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:253)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: Key 1036 is too large for dimension 2 (max is 1001)
ERROR ------------------------------------------------------------------------------------------

---------------------------------------------------------------EXECUTION STEP---------------------------------------------------------


java -Xmx4g -Djava.io.tmpdir=/tmp -jar MarkDuplicates.jar INPUT=M9.bam OUTPUT=m9.marked.bam METRICS_FILE=metrics CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT



java -Xmx4g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta -known dbsnp_137.hg19.vcf -o m9.list -I m9.marked.bam


java -Xmx4g -Djava.io.tmpdir=/tmp -jar GenomeAnalysisTK.jar -I m9.marked.bam -R ucsc.hg19.fasta -T IndelRealigner -targetIntervals m9.list -known dbsnp_137.hg19.vcf -o m9.marked.realigned.bam


java -Xmx4g -jar GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T ReduceReads -I m9.marked.realigned.bam -o m9.marked.realigned.reduce.bam


java -Djava.io.tmpdir=/tmp/flx-auswerter -Xmx4g -jar FixMateInformation.jar INPUT=m9.marked.realigned.reduce.bam OUTPUT=m9.marked.realigned.reduce.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true



java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R ucsc.hg19.fasta -knownSites dbsnp_137.hg19.vcf -I m9.marked.realigned.reduce.fixed.bam -T BaseRecalibrator -maxCycle 1500 -cov ReadGroupCovariate -cov QualityScoreCovariate -o m9.recal_data.grp

4.2 ***********************************

java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R ucsc.hg19.fasta -I m9.marked.realigned.reduce.fixed.bam -BQSR m9.recal_data.grp -o m9.marked.realigned.reduce.fixed.recal.bam



java -Xmx4g -jar GenomeAnalysisTK.jar -nct 4 --num_threads 4 -glm BOTH -R ucsc.hg19.fasta -T UnifiedGenotyper --sample_ploidy 5 -I m9.marked.realigned.reduce.fixed.bam -D dbsnp_137.hg19.vcf -o m9.vcf -stand_call_conf 20.0 -stand_emit_conf 20.0 -A DepthOfCoverage -A AlleleBalance
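One plausible reading of the "Key 1036 is too large" error, not confirmed anywhere in this post, is that it relates to reads longer than the cycle limit, since the recalibration step only ran with -maxCycle 1500. Under that assumption, it can help to know the longest read in the BAM file. The sketch below reads SAM-formatted text from standard input and reports the longest sequence; the function name is hypothetical.

```shell
# Print the longest read length found in SAM-formatted text on stdin.
# Column 10 of a SAM record holds the read sequence; header lines start
# with "@" and are skipped.
max_read_length() {
    awk -F '\t' '!/^@/ { n = length($10); if (n > max) max = n }
                 END   { print max + 0 }'
}

# Typical use, assuming samtools is installed (not executed here):
#   samtools view m9.marked.bam | max_read_length
```

The reported value only describes the data; whether and where a matching -maxCycle value has to be supplied is something to confirm against the tool documentation for the GATK version in use.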

Created 2012-10-31 01:51:31 | Updated 2012-10-31 22:20:05 | Tags: baserecalibrator commandlinegatk phone-home
Comments (1)

Hi, when I run BaseRecalibrator with the following command:

java -Xmx4g -jar /usr/bin/GenomeAnalysisTK.jar -T BaseRecalibrator -I realignedBam.bam  -R /data1/human_g1k_v37.fasta --knownSites /data1/snp132.vcf -o recalibration_report.grp

I get the following error:

INFO  07:15:53,380 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,380 HttpMethodDirector - Retrying request 
INFO  07:15:53,386 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,387 HttpMethodDirector - Retrying request 
INFO  07:15:53,393 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,393 HttpMethodDirector - Retrying request 
INFO  07:15:53,398 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,398 HttpMethodDirector - Retrying request 
INFO  07:15:53,405 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,405 HttpMethodDirector - Retrying request 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.0-34-g07bda93): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
##### ERROR          Name        FeatureType   Documentation
##### ERROR          BCF2     VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_bcf2_BCF2Codec.html
##### ERROR        BEAGLE      BeagleFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html
##### ERROR           BED         BEDFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_bed_BEDCodec.html
##### ERROR      BEDTABLE       TableFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html
##### ERROR EXAMPLEBINARY            Feature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_example_ExampleBinaryCodec.html
##### ERROR      GELITEXT    GeliTextFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_gelitext_GeliTextCodec.html
##### ERROR      OLDDBSNP    OldDbSNPFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_dbsnp_OldDbSNPCodec.html
##### ERROR     RAWHAPMAP   RawHapMapFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html
##### ERROR        REFSEQ      RefSeqFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html
##### ERROR     SAMPILEUP   SAMPileupFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
##### ERROR       SAMREAD     SAMReadFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html
##### ERROR         TABLE       TableFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_table_TableCodec.html
##### ERROR           VCF     VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html
##### ERROR          VCF3     VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------
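The error message asks for an explicit type tag of the form :NAME. A minimal sketch of what that could look like for the command above is shown below; it only prints the command rather than running it, the function name is hypothetical, and it assumes that /data1/snp132.vcf really is a valid VCF file. If GATK could not detect the type because the file is malformed, adding a tag will probably not be enough.

```shell
# Print (not run) the BaseRecalibrator command from the post with an
# explicit type tag on --knownSites. The tag name comes from the "Name"
# column of the table in the error message; the function is hypothetical.
print_bqsr_cmd() {
    known_sites="$1"   # path to the known-sites file
    type_tag="$2"      # e.g. VCF
    echo "java -Xmx4g -jar /usr/bin/GenomeAnalysisTK.jar -T BaseRecalibrator" \
         "-I realignedBam.bam -R /data1/human_g1k_v37.fasta" \
         "--knownSites:${type_tag} ${known_sites} -o recalibration_report.grp"
}

print_bqsr_cmd /data1/snp132.vcf VCF
```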

Created 2012-10-30 14:27:46 | Updated 2012-10-30 16:36:58 | Tags: commandlinegatk user-error
Comments (1)

Hi, I have a strange problem with the GATK. Every time I try to run it, my console shows the following error message.

30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR ------------------------------------------------------------------------------------------
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR A USER ERROR has occurred (version 2.1-13-g0f021e6): 
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR Please do not post this error to the GATK forum
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR Visit our website and forum for extensive documentation and answers to 
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR MESSAGE: Argument with name '--analysis_type' (-T) is missing.
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR ------------------------------------------------------------------------------------------

Can you show me my mistake, please? Regards, Oliver
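The message says that the required -T (--analysis_type) argument was not supplied. The com.apple.JarLauncher prefix in the log suggests, as an assumption, that the jar was started by double-clicking it, in which case no arguments are passed at all. A minimal sketch of a wrapper that checks for the tool name before launching from a terminal follows; both function names are hypothetical.

```shell
# Return 0 if the argument list contains -T or --analysis_type, 1 otherwise.
has_analysis_type() {
    for arg in "$@"; do
        case "$arg" in
            -T|--analysis_type) return 0 ;;
        esac
    done
    return 1
}

# Hypothetical wrapper: refuse to launch when no tool name is given.
run_gatk() {
    if ! has_analysis_type "$@"; then
        echo "missing -T <ToolName>" >&2
        return 1
    fi
    # Dry run: prints the command. Drop the echo to actually start GATK.
    echo java -jar GenomeAnalysisTK.jar "$@"
}
```

For example, `run_gatk -R ref.fasta` is rejected, while `run_gatk -T PrintReads -R ref.fasta` prints the full java command line.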

Created 2012-10-19 02:07:55 | Updated 2012-10-19 02:32:55 | Tags: commandlinegatk java
Comments (2)


While running the DepthOfCoverage tool, I get the error: /tmp/RsQHCt1W: No space left on device

I have tried changing the TMPDIR environment variable (and exporting it), but I eventually get the same error. Is there a way to change the temporary directory that GATK uses?

I'm running GATK v2.1-8-g5efb575 on a Linux system.

Thanks, Rick
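If the temporary file is being created by the JVM, the location is normally controlled by the java.io.tmpdir system property, which is how other commands on this page set it (for example -Djava.io.tmpdir=/tmp). The sketch below builds such a command line from a chosen directory; it only prints the command, and the helper name, file names, and fallback path are hypothetical.

```shell
# Print (not run) a GATK command line that points the JVM at a chosen
# temporary directory. Java arguments go between "java" and "-jar".
gatk_cmd_with_tmpdir() {
    tmp="$1"; shift
    echo java "-Djava.io.tmpdir=${tmp}" -jar GenomeAnalysisTK.jar "$@"
}

# Reuse TMPDIR if it is set, otherwise fall back to /tmp.
gatk_cmd_with_tmpdir "${TMPDIR:-/tmp}" -T DepthOfCoverage -R ref.fasta -I input.bam -o coverage
```

The directory has to exist and have enough free space before the real command is run.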