Concatenates VCF files of non-overlapping genome intervals, all with the same set of samples
The main purpose of this tool is to speed up the gather step of scatter-gather parallelization by concatenating the scattered output VCF files. It assumes that:
- All the input VCFs (or BCFs) contain the same samples in the same order.
- The variants in each input file are from non-overlapping (scattered) intervals.
When the input files are already sorted by the start positions of their intervals, use -assumeSorted.
Note: Currently the tool is more efficient when working with VCFs; we will work to make it as efficient for BCFs.
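The behaviour described above can be illustrated with a minimal Python sketch. This is not the tool's actual implementation: the function names, the in-memory string inputs, and the `contig_order` parameter (standing in for the contig order the tool would take from the reference) are assumptions made for illustration. It keeps the header of the first input, checks that all inputs share the same sample columns, and orders the files by their first record unless `assume_sorted` is set.

```python
from typing import Dict, List, Tuple


def split_vcf(text: str) -> Tuple[List[str], List[str]]:
    """Split VCF text into header lines (starting with '#') and record lines."""
    lines = [ln for ln in text.splitlines() if ln]
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if not ln.startswith("#")]
    return header, records


def first_position(records: List[str], contig_rank: Dict[str, int]) -> Tuple[int, int]:
    """Sort key: (rank of the contig in the reference, position) of the first record."""
    chrom, pos = records[0].split("\t")[:2]
    return contig_rank[chrom], int(pos)


def cat_variants(vcf_texts: List[str], contig_order: List[str],
                 assume_sorted: bool = False) -> str:
    """Hypothetical helper: concatenate VCF texts from non-overlapping intervals."""
    if not vcf_texts:
        raise ValueError("at least one input is required")
    contig_rank = {name: i for i, name in enumerate(contig_order)}
    parsed = [split_vcf(t) for t in vcf_texts]

    # Stated assumption: same samples in the same order, so the last header
    # line (the #CHROM column line) must be identical across inputs.
    if len({header[-1] for header, _ in parsed}) != 1:
        raise ValueError("inputs do not share the same samples in the same order")

    # Inputs without records contribute nothing to the output body.
    non_empty = [(h, r) for h, r in parsed if r]
    if not assume_sorted:
        # Order whole files by where their first variant falls in the genome.
        non_empty.sort(key=lambda hr: first_position(hr[1], contig_rank))

    out = list(parsed[0][0])  # header of the first input
    for _, records in non_empty:
        out.extend(records)
    return "\n".join(out) + "\n"
```

Because the intervals do not overlap, ordering whole files by their first record is enough; no record-level merge is needed, which is what makes the gather step cheap.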
One or more variant sets to combine. They should cover non-overlapping genome intervals and contain the same samples (in the same order). The input files should be named 'name.vcf', 'name.VCF', 'name.bcf' or 'name.BCF'. If the files are ordered according to the appearance of their intervals in the reference genome, you can use the -assumeSorted flag.
A combined VCF. The output file should be 'name.vcf' or 'name.VCF'.
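The naming rules for input and output files can be checked before launching the tool. The following is a small sketch, assuming a hypothetical helper `check_file_names` that is not part of GATK; it only encodes the extensions listed in the text above.

```python
import os
from typing import List

# Extensions taken from the text: inputs may be VCF or BCF, output must be VCF.
INPUT_EXTENSIONS = {".vcf", ".VCF", ".bcf", ".BCF"}
OUTPUT_EXTENSIONS = {".vcf", ".VCF"}


def check_file_names(inputs: List[str], output: str) -> List[str]:
    """Return a list of problems; an empty list means the names look acceptable."""
    problems = []
    for path in inputs:
        if os.path.splitext(path)[1] not in INPUT_EXTENSIONS:
            problems.append(f"unexpected input extension: {path}")
    if os.path.splitext(output)[1] not in OUTPUT_EXTENSIONS:
        problems.append(f"unexpected output extension: {output}")
    return problems
```

For example, `check_file_names(["a.vcf", "b.BCF"], "out.vcf")` reports no problems, whereas a compressed input such as `a.vcf.gz` would be flagged.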
This is a command-line utility that bypasses the GATK engine. As a result, the command line you must use to invoke it is a little different from that of other GATK tools (see example below), and it does not accept any of the classic "CommandLineGATK" arguments.
java -cp GenomeAnalysisTK.jar org.broadinstitute.sting.tools.CatVariants \
    -R ref.fasta \
    -V input1.vcf \
    -V input2.vcf \
    -out output.vcf \
    -assumeSorted
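In a scatter-gather pipeline the number of -V inputs usually varies, so the command line is often assembled programmatically. Below is a minimal sketch, assuming a hypothetical helper `build_cat_variants_command`; it uses only the class name and flags shown in the example above and returns an argument list suitable for `subprocess.run`.

```python
from typing import List


def build_cat_variants_command(reference: str, variants: List[str], output: str,
                               assume_sorted: bool = False,
                               jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Assemble the CatVariants invocation as an argument list (hypothetical helper)."""
    if not variants:
        raise ValueError("at least one input VCF is required")
    cmd = ["java", "-cp", jar, "org.broadinstitute.sting.tools.CatVariants",
           "-R", reference]
    for path in variants:
        cmd += ["-V", path]  # one -V per scattered input file
    cmd += ["-out", output]
    if assume_sorted:
        cmd.append("-assumeSorted")
    return cmd
```

Passing the list directly to `subprocess.run(cmd, check=True)` avoids shell quoting problems with unusual file names.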
This table summarizes the command-line arguments that are specific to this tool. For details, see the list further down below the table.
|--reference||File||NA||genome reference file|
|--variant||List[File]||NA||Input VCF file(s)|
|--outputFile||File||NA||output file name|
|--log_to_file||String||NA||Set the logging location|
|--logging_level||String||INFO||Set the minimum level of logging, i.e. setting INFO gets you INFO up to FATAL, setting ERROR gets you ERROR and FATAL level logging.|
|--variant_index_parameter||Integer||-1||the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator|
|--variant_index_type||GATKVCFIndexType||DYNAMIC_SEEK||which type of IndexCreator to use for VCF/BCF indices|
|--assumeSorted||Boolean||false||assumeSorted should be true if the input files are already sorted (based on the position of the variants)|
|--help||Boolean||false||Generate this help message|
|--version||Boolean||false||Output version information|
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
assumeSorted should be true if the input files are already sorted (based on the position of the variants).
Generate this help message. This is used to indicate whether the user has asked for help.
Set the logging location. This is where the output of the logger is sent.
Set the minimum level of logging, i.e. setting INFO gets you INFO up to FATAL, setting ERROR gets you ERROR and FATAL level logging. The default log level is INFO.
output file name
genome reference file
Input VCF file(s)
the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator.
which type of IndexCreator to use for VCF/BCF indices.
The --variant_index_type argument is an enumerated type (GATKVCFIndexType), which can have one of the following values:
Output version information. This is used to indicate whether the user has asked for the version information.
GATK version 2.8-1-g2a26ec9 built at 2013/12/06 16:54:02.