Concatenates VCF files of non-overlapping genome intervals, all with the same set of samples
The main purpose of this tool is to speed up the gather function when using scatter-gather parallelization. This tool concatenates the scattered output VCF files. It assumes that:
- All the input VCFs (or BCFs) contain the same samples in the same order.
- The variants in each input file are from non-overlapping (scattered) intervals.
When the input files are already sorted based on the intervals' start positions, use -assumeSorted.
Note: Currently the tool is more efficient when working with VCFs; we will work to make it as efficient for BCFs.
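The first assumption above (same samples, same order) can be checked before running the tool. The following is a minimal sketch, not part of CatVariants itself; the helper names `sample_names` and `same_samples` are hypothetical, and it assumes plain-text VCF input whose `#CHROM` header line is tab-separated with nine fixed columns before the sample names.

```python
from typing import Iterable, List


def sample_names(vcf_lines: Iterable[str]) -> List[str]:
    """Return the sample columns from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 0-8 are the fixed VCF fields (CHROM ... FORMAT);
            # everything after them is a sample name.
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def same_samples(vcfs: List[Iterable[str]]) -> bool:
    """True if every input lists the same samples in the same order."""
    names = [sample_names(v) for v in vcfs]
    return all(n == names[0] for n in names)
```

Each element passed to `same_samples` can be an open text file handle or a list of lines, so the check only needs to read as far as the header of each scattered output.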
One or more variant sets to combine. They should cover non-overlapping genome intervals and contain the same samples (in the same order). The input files should be 'name.vcf' or 'name.VCF' or 'name.bcf' or 'name.BCF'. If the files are ordered according to the appearance of intervals in the reference genome, then one can use the -assumeSorted flag.
A combined VCF. The output file should be 'name.vcf' or 'name.VCF'.
This is a command-line utility that bypasses the GATK engine. As a result, the command-line you must use to invoke it is a little different from other GATK tools (see example below), and it does not accept any of the classic "CommandLineGATK" arguments.
java -cp GenomeAnalysisTK.jar org.broadinstitute.sting.tools.CatVariants \
    -R ref.fasta \
    -V input1.vcf \
    -V input2.vcf \
    -out output.vcf \
    -assumeSorted
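In a scatter-gather pipeline the number of -V inputs usually varies from run to run, so it can help to assemble the invocation above programmatically. This is a sketch only: the function name `cat_variants_command` and its parameters are hypothetical, and it uses nothing beyond the class name and flags shown in the example command.

```python
from typing import List, Sequence


def cat_variants_command(reference: str,
                         inputs: Sequence[str],
                         output: str,
                         assume_sorted: bool = False,
                         jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Build the CatVariants argument list shown in the example command."""
    if not inputs:
        raise ValueError("at least one input VCF is required")
    cmd = ["java", "-cp", jar,
           "org.broadinstitute.sting.tools.CatVariants",
           "-R", reference]
    # One -V flag per scattered input file.
    for path in inputs:
        cmd += ["-V", path]
    cmd += ["-out", output]
    if assume_sorted:
        cmd.append("-assumeSorted")
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with unusual file names.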
This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.
|Argument name(s)||Default value||Summary|
|-R||NA||genome reference file|
|-V||NA||Input VCF file(s)|
|-out||NA||output file name|
| ||NA||Set the logging location|
| ||INFO||Set the minimum level of logging|
| ||-1||the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator|
|--variant_index_type||DYNAMIC_SEEK||which type of IndexCreator to use for VCF/BCF indices|
|-assumeSorted||false||assumeSorted should be true if the input files are already sorted (based on the position of the variants)|
| ||false||Generate the help message|
| ||false||Output version information|
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
assumeSorted should be true if the input files are already sorted (based on the position of the variants)
Generate the help message
This will produce a help message in the terminal with general usage information, listing available arguments as well as tool-specific information if applicable.
Set the logging location
File to save the logging output.
Set the minimum level of logging
Setting INFO gets you INFO up to FATAL, setting ERROR gets you ERROR and FATAL level logging, and so on.
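The threshold behaviour described above can be illustrated as follows. This is a sketch under an assumption: the text names only INFO, ERROR and FATAL, so the full ordering used here (DEBUG, INFO, WARN, ERROR, FATAL) follows the usual log4j convention and is not confirmed by this page. The name `enabled_levels` is hypothetical.

```python
from typing import List

# Assumed ordering, least to most severe (log4j convention).
LEVELS = ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def enabled_levels(minimum: str) -> List[str]:
    """Return every level that is logged when `minimum` is the threshold."""
    return LEVELS[LEVELS.index(minimum):]
```

With this ordering, a threshold of INFO keeps everything from INFO up to FATAL, while a threshold of ERROR keeps only ERROR and FATAL.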
output file name
genome reference file
Input VCF file(s)
The VCF or BCF files to merge together. CatVariants can take any number of -V arguments on the command line. Each -V argument will be included in the final merged output VCF. The order of arguments does not matter, but the tool runs more efficiently if they are sorted based on the intervals and the -assumeSorted argument is used.
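To take advantage of -assumeSorted, the -V arguments need to be listed in interval order. The sketch below shows one way to prepare that order and to catch overlapping inputs at the same time. It does not describe how CatVariants sorts internally; the name `order_inputs` is hypothetical, and it assumes the contig and the first and last variant positions of each file are already known, along with the contig order of the reference.

```python
from typing import Dict, List, Tuple

# (path, contig, first_position, last_position) for one input file.
Interval = Tuple[str, str, int, int]


def order_inputs(intervals: List[Interval],
                 contig_order: List[str]) -> List[str]:
    """Return input paths sorted by contig order, then start position."""
    rank: Dict[str, int] = {name: i for i, name in enumerate(contig_order)}
    ordered = sorted(intervals, key=lambda iv: (rank[iv[1]], iv[2]))
    # After sorting by start, any overlap shows up between neighbours.
    for prev, cur in zip(ordered, ordered[1:]):
        if prev[1] == cur[1] and cur[2] <= prev[3]:
            raise ValueError(f"{prev[0]} and {cur[0]} overlap on {cur[1]}")
    return [iv[0] for iv in ordered]
```

The resulting list can be passed as the inputs, one -V flag per path, together with -assumeSorted.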
the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator
Integer -1 [ [ -∞ ∞ ] ]
which type of IndexCreator to use for VCF/BCF indices
The --variant_index_type argument is an enumerated type (GATKVCFIndexType), which can have one of the following values:
Output version information
Use this to check the version number of the GATK executable you are invoking. Note that the version number is always included in the output at the start of every run as well as any error message.
GATK version 3.1-1-g07a4bf8 built at 2014/03/18 07:00:36. GTD: NA