Built-in command-line arguments

The GATK has many options for controlling the analyses' input, processing, and output. In addition to the core arguments supplied by the GATK itself, each analysis can request its own suite of command-line arguments. To get information on the GATK's arguments, run the GATK with the --help parameter:

java -jar $GATK_HOME/GenomeAnalysisTK.jar --help

To discover the available command-line arguments for a given analysis, select the analysis with the --analysis_type (-T) argument and add --help:

java -jar $GATK_HOME/GenomeAnalysisTK.jar -T PrintReads --help

Several command-line arguments can be multivalued. Pass multiple values to the GATK by specifying the same argument multiple times as follows:

java -jar $GATK_HOME/GenomeAnalysisTK.jar -T PrintReads -I <first>.bam -I <second>.bam ...

Required Options

--analysis_type (-T) 
Type of analysis to run. Built-in analyses include CountLoci, CountReads, DepthOfCoverage, and Pileup.
--input_file (-I)
The source of reads for the analysis, in BAM file format. BAM files must be sorted and accompanied by an index file (.bai). The -I argument must specify either a .bam file or a .list file containing a newline-separated list of BAM files. The -I argument can accept multiple values, and those values can be a mix of plain .bam files and .list files.
--reference_sequence (-R)  
Reference sequence file, in fasta format. Reference sequence files must be accompanied by a dictionary (.dict) and a fasta index file (.fai). See Input files for the GATK for a description of how to generate these supporting files from a fasta.
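
As a minimal sketch of the -I forms described above, the following snippet writes a .list file and assembles an invocation that mixes it with a plain .bam file. All file names (first.bam, second.bam, third.bam, inputs.list, reference.fasta) are hypothetical placeholders, and the command is only printed, not run.

```shell
# Hypothetical file names; substitute your own sorted, indexed BAM files.
# A .list file is a newline-separated list of BAM files.
printf '%s\n' first.bam second.bam > inputs.list

# Repeat -I to mix a .list file with a plain .bam file.
# The single quotes keep $GATK_HOME unexpanded until the command is run.
GATK_CMD='java -jar $GATK_HOME/GenomeAnalysisTK.jar -T CountReads -R reference.fasta -I inputs.list -I third.bam'

# Print the command for review; run it with: eval "$GATK_CMD"
echo "$GATK_CMD"
```

Printing the command first makes it easy to check the argument list before starting a long run.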

Controlling Input

--intervals (-L)  
A list of genomic intervals over which to operate. Each interval is specified in samtools format (contig:start-stop). Sets of intervals can be provided to the GATK in three formats: directly on the command line, separated by semicolons (-L "chr1:1-100000;chr2:1-50000"); in a flat file with one interval per line; or in a Picard-formatted interval file with a header. An example intervals file can be obtained here: thousand_genomes_alpha_redesign.targets.interval_list
--rodToIntervalTrackName (-BTI) 
Generates an interval list for the GATK to run over, using the specified ROD as the input. The parameter is the name of the associated -B binding you provided. For example, -B myVCF,vcf,myVCFFile.vcf -BTI myVCF would take the call locations in myVCFFile.vcf, parse them out, and run the GATK over those locations. If you also provide a -L option, the GATK runs over the union of the two sets.
--interval_merging (-im) 
Sets the rule for merging intervals in the GATK. The default is ALL, which merges all overlapping or abutting intervals (chr1:1-10 and chr1:5-15 would be merged into chr1:1-15, and chr1:1-5 and chr1:6-10 into chr1:1-10). The other option is OVERLAPPING_ONLY, which merges overlapping intervals (the first example above) but not abutting ones (the second example). In the future we hope to support NONE, which would allow a raw interval list to be used.
--maximum_reads (-M) 
The maximum number of iterations to process before exiting. One iteration is defined as one read in a reads traversal, one locus in a locus or reference traversal, or one interval in a locus window traversal.
--validation_strictness (-S) 
Controls how aggressively the GATK validates input data.
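
The two --interval_merging rules can be illustrated with a small sketch. This is not the GATK's own implementation, only a shell/awk rendering of the rule as described above. The function name merge_intervals and the file names are hypothetical, and the sketch assumes the input is sorted by contig and start position.

```shell
# merge_intervals RULE FILE
# RULE is ALL or OVERLAPPING_ONLY; FILE holds one contig:start-stop interval
# per line, sorted by contig and start. Contig names containing ':' or '-'
# are not handled by this sketch.
merge_intervals() {
  awk -v rule="$1" -F'[-:]' '
    BEGIN { gap = (rule == "ALL") ? 1 : 0 }  # ALL also joins abutting intervals
    {
      start = $2 + 0; stop = $3 + 0
      if ($1 == contig && start <= cur_stop + gap) {
        if (stop > cur_stop) cur_stop = stop   # extend the current interval
      } else {
        if (contig != "") print contig ":" cur_start "-" cur_stop
        contig = $1; cur_start = start; cur_stop = stop
      }
    }
    END { if (contig != "") print contig ":" cur_start "-" cur_stop }
  ' "$2"
}

# The two examples from the text, as flat files with one interval per line.
printf '%s\n' chr1:1-10 chr1:5-15 > overlapping.intervals
printf '%s\n' chr1:1-5 chr1:6-10 > abutting.intervals

merge_intervals ALL overlapping.intervals              # chr1:1-15
merge_intervals ALL abutting.intervals                 # chr1:1-10
merge_intervals OVERLAPPING_ONLY overlapping.intervals # chr1:1-15
merge_intervals OVERLAPPING_ONLY abutting.intervals    # chr1:1-5 and chr1:6-10
```

The only difference between the two rules in this sketch is the allowed gap: ALL treats an interval that starts one base after the previous one ends as mergeable, while OVERLAPPING_ONLY requires the intervals to share at least one base.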

Reference-Ordered Data

--DBSNP (-D) 
Historical syntactic sugar for the frequently used forms of reference-ordered data. Please use -B:dbsnp,VCF dbsnp.vcf instead; see GATK resource bundle.
--rodBind (-B) 
Bindings for reference-ordered data, in the form <name>,<type>,<file>
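
A short sketch of how a <name>,<type>,<file> binding pairs with -BTI (described under Controlling Input), reusing the myVCF example from this page. The variable names, reference.fasta, and the choice of CountLoci as the analysis are assumptions made for illustration. The command is printed rather than run.

```shell
# Binding name, type, and file from the -BTI example; the -B triple is <name>,<type>,<file>.
ROD_NAME=myVCF
ROD_TYPE=vcf
ROD_FILE=myVCFFile.vcf

# -BTI takes the binding name, so reuse the same variable for both arguments.
ROD_ARGS="-B $ROD_NAME,$ROD_TYPE,$ROD_FILE -BTI $ROD_NAME"

# Print the assembled command for review (reference.fasta is a placeholder).
echo "java -jar \$GATK_HOME/GenomeAnalysisTK.jar -T CountLoci -R reference.fasta $ROD_ARGS"
```

Keeping the binding name in one variable avoids a mismatch between the name given to -B and the name passed to -BTI.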

Controlling Processing

--numthreads (-nt)  
Enables shared-memory parallelism with the specified number of parallel threads. See Parallelism and the GATK for more information regarding whether your walker is parallel-ready. If supported, GATK performance usually scales nearly linearly with the number of threads, at least up to 32 cores, which is the most we have access to.
--nonDeterministicRandomSeed (-ndrs) 
Makes the GATK behave nondeterministically; that is, the random numbers generated will be different in every run.

Controlling Output

--out (-o), --err (-e), --outerr (-oe) 
Output files for the walker. When specified, all output written to the out or err streams from within the walker will end up in these files.
--bam_compression (-compress) 
Many walkers generate BAM files as output. This argument allows users to adjust the compression level with which the BAM file is written.
--sites_only (-sites_only) 
Appropriate for walkers that output VCF files, this argument instructs the GATK to produce sites-only files (i.e. with no genotype information).
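
As a hedged sketch of the output arguments above, the following snippet sends a walker's out and err streams to separate files. The file names (counts.out, counts.err, first.bam, reference.fasta) are hypothetical, and the command is printed rather than run.

```shell
# Hypothetical output paths for the walker's out and err streams.
OUT_FILE=counts.out
ERR_FILE=counts.err

# -o and -e redirect the out and err streams; the escaped \$GATK_HOME is
# left unexpanded so the printed command matches the examples on this page.
GATK_CMD="java -jar \$GATK_HOME/GenomeAnalysisTK.jar -T CountReads -R reference.fasta -I first.bam -o $OUT_FILE -e $ERR_FILE"

# Print the command for review; run it with: eval "$GATK_CMD"
echo "$GATK_CMD"
```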

Diagnostics

--log_to_file (-log), --logging_level (-l) 
Controls the amount of logging information presented to the user.
--help (-h)  
Get help and more specifics about command-line arguments.