CatVariants

Concatenates VCF files of non-overlapping genome intervals, all with the same set of samples

Category Variant Evaluation and Manipulation Tools


Overview

The main purpose of this tool is to speed up the gather function when using scatter-gather parallelization. This tool concatenates the scattered output VCF files. It assumes that:
- All the input VCFs (or BCFs) contain the same samples in the same order.
- The variants in each input file are from non-overlapping (scattered) intervals.

When the input files are already sorted based on the intervals' start positions, use -assumeSorted.

Note: Currently the tool is more efficient when working with VCFs; we will work to make it as efficient for BCFs.
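
The first assumption (same samples, same order) can be checked before running the tool. The following is a minimal sketch, not part of GATK: the helper names `read_samples` and `samples_match` are hypothetical, and it assumes uncompressed, plain-text VCF inputs whose `#CHROM` header line is tab-separated as the VCF format requires. It does not handle BCF files.

```python
def read_samples(vcf_path):
    """Return the sample names from the #CHROM header line of a plain-text VCF."""
    with open(vcf_path, "r", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # Columns 0-8 are the fixed VCF fields (CHROM through FORMAT);
                # every column after that is a sample name.
                return columns[9:]
            if not line.startswith("#"):
                break  # reached variant records without seeing a header line
    raise ValueError(f"no #CHROM header line found in {vcf_path}")


def samples_match(vcf_paths):
    """Return True if every file lists the same samples in the same order."""
    sample_lists = [read_samples(path) for path in vcf_paths]
    return all(samples == sample_lists[0] for samples in sample_lists[1:])
```

List comparison is order-sensitive, so two files with the same samples in a different order are reported as a mismatch, which is what the assumption above requires.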

Input

One or more variant sets to combine. They should cover non-overlapping genome intervals and contain the same samples (in the same order). The input files should be named 'name.vcf', 'name.VCF', 'name.bcf' or 'name.BCF'. If the files are ordered according to the appearance of intervals in the reference genome, you can use the -assumeSorted flag.
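
If the scattered files are not already listed in reference order, one way to arrange them before passing -assumeSorted is to sort them by the position of their first variant record. The sketch below is an illustration under stated assumptions, not the tool's own logic: the helper names are hypothetical, `contig_order` is a list of contig names you supply yourself (for example, taken from the reference's sequence dictionary), inputs are uncompressed plain-text VCFs, and files without any records are placed last as an arbitrary choice.

```python
def first_record_position(vcf_path):
    """Return (contig, position) of the first variant record, or None if there is none."""
    with open(vcf_path, "r", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header lines and blank lines
            fields = line.split("\t")
            return fields[0], int(fields[1])
    return None


def order_by_first_record(vcf_paths, contig_order):
    """Sort files by the reference position of their first record."""
    rank = {contig: index for index, contig in enumerate(contig_order)}
    keyed = []
    for path in vcf_paths:
        first = first_record_position(path)
        if first is None:
            # No records: nothing to sort on, so place the file after all others.
            keyed.append((len(rank), 0, path))
        else:
            contig, position = first
            # Raises KeyError if the contig is missing from contig_order.
            keyed.append((rank[contig], position, path))
    return [path for _, _, path in sorted(keyed)]
```

Because the intervals are assumed to be non-overlapping, ordering by the first record of each file is enough to order the files as a whole.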

Output

A combined VCF. The output file should be 'name.vcf' or 'name.VCF'.

Important note

This is a command-line utility that bypasses the GATK engine. As a result, the command line you must use to invoke it is a little different from that of other GATK tools (see example below), and it does not accept any of the classic "CommandLineGATK" arguments.

Example

 java -cp GenomeAnalysisTK.jar org.broadinstitute.sting.tools.CatVariants \
    -R ref.fasta \
    -V input1.vcf \
    -V input2.vcf \
    -out output.vcf \
    -assumeSorted
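
With many scattered files, typing one -V argument per file by hand is error-prone. The sketch below assembles the same command line as the example above from a list of input paths. It uses only the class name and arguments shown in this document; the function name `build_catvariants_command` and its parameters are hypothetical helpers, not part of GATK.

```python
def build_catvariants_command(jar_path, reference, inputs, output, assume_sorted=False):
    """Return the CatVariants invocation as an argument list."""
    command = [
        "java", "-cp", jar_path,
        "org.broadinstitute.sting.tools.CatVariants",
        "-R", reference,
    ]
    for vcf in inputs:
        command += ["-V", vcf]  # one -V argument per input file
    command += ["-out", output]
    if assume_sorted:
        command.append("-assumeSorted")
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` from the Python standard library; building a list rather than a single string avoids shell quoting problems with unusual file names.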
 

Command-line Arguments

CatVariants specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.

Argument name(s)            Default value   Summary

Required Inputs
--reference, -R             NA              genome reference file .fasta
--variant, -V               NA              Input VCF file/s named .vcf or .bcf

Required Outputs
--outputFile, -out          NA              output file name .vcf or .bcf

Optional Outputs
--log_to_file, -log         NA              Set the logging location

Optional Parameters
--logging_level, -l         INFO            Set the minimum level of logging
--variant_index_parameter   -1              the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator
--variant_index_type        DYNAMIC_SEEK    which type of IndexCreator to use for VCF/BCF indices

Optional Flags
--assumeSorted              false           assumeSorted should be true if the input files are already sorted (based on the position of the variants)
--help, -h                  false           Generate the help message
--version                   false           Output version information

Argument details

Arguments in this list are specific to this tool. Because CatVariants bypasses the GATK engine, it does not accept the command-line GATK arguments that are shared with other tools.


--assumeSorted / -assumeSorted

assumeSorted should be true if the input files are already sorted (based on the position of the variants)

Boolean  false


--help / -h

Generate the help message
This will produce a help message in the terminal with general usage information, listing available arguments as well as tool-specific information if applicable.

Boolean  false


--log_to_file / -log

Set the logging location
File to save the logging output.

String


--logging_level / -l

Set the minimum level of logging
Setting INFO gets you INFO up to FATAL, setting ERROR gets you ERROR and FATAL level logging, and so on.

String  INFO
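
The threshold behaviour described above can be illustrated with a short sketch. Only INFO, ERROR and FATAL are named in this document; the remaining level names and their ordering below follow the common log4j-style convention and are an assumption, as is the helper name `levels_emitted`.

```python
# Assumed ordering from least to most severe (log4j-style convention).
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def levels_emitted(minimum_level):
    """Return the levels that are logged when the minimum level is set to minimum_level."""
    start = LEVELS.index(minimum_level.upper())
    return LEVELS[start:]
```

For example, `levels_emitted("ERROR")` returns only the ERROR and FATAL levels, matching the description above.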


--outputFile / -out

output file name .vcf or .bcf

R File


--reference / -R

genome reference file .fasta

R File


--variant / -V

Input VCF file/s named .vcf or .bcf
The VCF or BCF files to merge together. CatVariants can take any number of -V arguments on the command line. Each -V argument will be included in the final merged output VCF. The order of arguments does not matter, but the tool runs more efficiently if they are sorted based on the intervals and the -assumeSorted argument is used.

R List[File]


--variant_index_parameter

the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator

Integer  -1  [ [ -∞  ∞ ] ]


--variant_index_type

which type of IndexCreator to use for VCF/BCF indices

The --variant_index_type argument is an enumerated type (GATKVCFIndexType), which can have one of the following values:

DYNAMIC_SEEK
DYNAMIC_SIZE
LINEAR
INTERVAL

GATKVCFIndexType  DYNAMIC_SEEK


--version / -version

Output version information
Use this to check the version number of the GATK executable you are invoking. Note that the version number is always included in the output at the start of every run as well as any error message.

Boolean  false



GATK version 3.1-1-g07a4bf8 built at 2014/03/18 07:00:36. GTD: NA