ValidateVariants

Validates a VCF file with an extra strict set of criteria.

Category: Validation Utilities

Traversal: LocusWalker

PartitionBy: LOCUS


Overview

ValidateVariants is a GATK tool that takes a VCF file and validates much of the information inside it. Beyond checking adherence to the VCF specification, it performs extra checks to ensure that the information in the file is correct. These checks cover the correctness of the reference base(s), the accuracy of the AC and AN values, the rsIDs (tested against dbSNP when a dbSNP file is provided), and the presence of every alternate allele in at least one sample. To test only adherence to the VCF specification, use --validationType NONE.
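The AC/AN check can be pictured as recomputing both values from the sample genotypes and comparing them with the INFO field. The bash sketch below is a simplified illustration, not GATK's implementation; check_chr_counts is a hypothetical name, and the sketch assumes an uncompressed VCF on standard input, a GT key as the first FORMAT entry of every record, and at most one AC and one AN entry in INFO.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not GATK code: recompute AC/AN from genotypes and
# report records whose INFO AC or AN disagrees with the recomputed values.
# Assumes an uncompressed VCF on stdin where GT is the first FORMAT key.
check_chr_counts() {
  awk -F'\t' '
    /^#/ { next }                      # skip meta-information and header lines
    NF < 10 { next }                   # sites-only records have no genotypes
    {
      n_alt = split($5, alts, ",")
      for (a = 1; a <= n_alt; a++) ac[a] = 0
      an = 0
      for (s = 10; s <= NF; s++) {     # sample columns start at column 10
        split($s, fields, ":")
        n = split(fields[1], calls, /[\/|]/)   # GT alleles split on / or |
        for (c = 1; c <= n; c++) {
          if (calls[c] == ".") continue        # no-calls count toward neither AC nor AN
          an++
          if (calls[c] > 0) ac[calls[c]]++     # allele index > 0 means an ALT allele
        }
      }
      expected_ac = ""
      for (a = 1; a <= n_alt; a++)
        expected_ac = expected_ac (a > 1 ? "," : "") ac[a]
      info_ac = ""; info_an = ""
      m = split($8, kv, ";")
      for (k = 1; k <= m; k++) {
        if (kv[k] ~ /^AC=/) info_ac = substr(kv[k], 4)
        if (kv[k] ~ /^AN=/) info_an = substr(kv[k], 4)
      }
      if (info_ac != "" && info_ac != expected_ac)
        printf "%s:%s AC=%s but genotypes give %s\n", $1, $2, info_ac, expected_ac
      if (info_an != "" && info_an != an)
        printf "%s:%s AN=%s but genotypes give %d\n", $1, $2, info_an, an
    }
  '
}
```

For example, running check_chr_counts < input.vcf prints nothing when every AC and AN agrees with the genotypes, and one line per disagreement otherwise.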

Input

A variant set to validate.

Examples

 java -Xmx2g -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T ValidateVariants \
   --variant input.vcf \
   --dbsnp dbsnp.vcf
 

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by ValidateVariants.

Window size

This tool uses a sliding window on the reference.

  • Window start: 0 bp before the locus
  • Window stop: 100 bp after the locus
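A reference-base check needs the reference bases at each variant, which a window starting at the locus presumably supplies. The bash sketch below compares a REF allele with such a window; check_ref_bases and WINDOW_STOP are hypothetical names, and holding the contig sequence in a shell string is a simplification (GATK reads the indexed reference FASTA). It illustrates the idea only, not how GATK performs the check.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not GATK code: compare a record's REF allele with the
# reference bases in a window that starts at the locus (0 bp before it) and
# ends 100 bp after it. Assumes the contig sequence is already in a shell
# string; real tools read an indexed FASTA instead.
WINDOW_STOP=100

check_ref_bases() {
  local contig_seq=$1 pos=$2 ref_allele=$3    # pos is 1-based, as in VCF
  local window expected ref_uc exp_uc
  window=${contig_seq:pos-1:WINDOW_STOP+1}    # locus plus up to 100 bp after it
  # Bases the REF allele should match; a REF longer than the window is
  # truncated here and therefore reported as a mismatch by this sketch.
  expected=${window:0:${#ref_allele}}
  # Compare in upper case: FASTA files often soft-mask bases in lower case.
  exp_uc=$(tr '[:lower:]' '[:upper:]' <<< "$expected")
  ref_uc=$(tr '[:lower:]' '[:upper:]' <<< "$ref_allele")
  if [[ $exp_uc == "$ref_uc" ]]; then
    echo "OK"
  else
    echo "MISMATCH: REF=$ref_allele, reference has $expected"
  fi
}
```

For example, with the contig ACGTACGTAC, check_ref_bases "ACGTACGTAC" 2 T prints a mismatch because position 2 holds C.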

Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

ValidateVariants specific arguments

This table summarizes the command-line arguments that are specific to this tool. For details, see the list below the table.

Name                            Type                        Default value  Summary
Required
--variant                       RodBinding[VariantContext]  NA             Input VCF file
Optional
--dbsnp                         RodBinding[VariantContext]  none           dbSNP file
--doNotValidateFilteredRecords  Boolean                     false          skip validation on filtered records
--validationType                ValidationType              ALL            which validation type to run
--warnOnErrors                  Boolean                     false          just emit warnings on errors instead of terminating the run at the first instance

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--dbsnp / -D ( RodBinding[VariantContext] with default value none )

dbSNP file. --dbsnp binds reference-ordered data (ROD). This argument supports ROD files of the following types: BCF2, VCF, VCF3.
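When a dbSNP file is supplied, the rsID check can be thought of as a lookup: every rs-prefixed ID in the input should appear in dbSNP at the same position. The bash sketch below implements that lookup; check_rsids is a hypothetical name, both files are assumed to be uncompressed VCFs, and the exact matching rules GATK applies (for example, for overlapping indels) are not documented on this page, so treat this as an approximation.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not GATK code: report rs-prefixed IDs in the input VCF
# that dbSNP does not list at the same CHROM and POS.
# Assumes both files are uncompressed, tab-delimited VCFs.
check_rsids() {
  local dbsnp_vcf=$1 input_vcf=$2
  awk -F'\t' -v dbsnp="$dbsnp_vcf" '
    /^#/ { next }                          # skip header lines in both files
    FILENAME == dbsnp {                    # pass 1: index dbSNP IDs by locus
      n = split($3, ids, ";")
      for (i = 1; i <= n; i++) known[$1 ":" $2 ":" ids[i]] = 1
      next
    }
    {                                      # pass 2: check the input records
      n = split($3, ids, ";")
      for (i = 1; i <= n; i++) {
        key = $1 ":" $2 ":" ids[i]
        if (ids[i] ~ /^rs/ && !(key in known))
          printf "%s:%s %s not found in dbSNP\n", $1, $2, ids[i]
      }
    }
  ' "$dbsnp_vcf" "$input_vcf"
}
```

For example, check_rsids dbsnp.vcf input.vcf prints one line per input rsID that dbSNP does not report at that locus; records whose ID is "." are ignored.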

--doNotValidateFilteredRecords / -doNotValidateFilteredRecords ( Boolean with default value false )

Skip validation on filtered records. By default, filtered records are validated as well.
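In VCF, a record is filtered when its FILTER column holds anything other than PASS or the missing value ".". Assuming --doNotValidateFilteredRecords uses that definition, the awk filter below shows which records would remain for validation; unfiltered_records is a hypothetical name and the input is assumed to be an uncompressed VCF on standard input.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of GATK: passes header lines and unfiltered
# records (FILTER column 7 is PASS or ".") through, and drops filtered ones.
# Assumes an uncompressed, tab-delimited VCF on stdin.
unfiltered_records() {
  awk -F'\t' '/^#/ || $7 == "PASS" || $7 == "." { print }'
}
```

For example, unfiltered_records < input.vcf > unfiltered.vcf keeps the header and drops records flagged with a filter such as LowQual.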

--validationType / -type ( ValidationType with default value ALL )

Which validation type to run.
The --validationType argument is an enumerated type (ValidationType), which can have one of the following values:

ALL
REF
IDS
ALLELES
CHR_COUNTS
NONE
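The ALLELES value presumably selects the check described in the overview, that every alternate allele is present in at least one sample. Under that assumption, the bash sketch below reports ALT alleles that no sample genotype uses; check_alt_alleles is a hypothetical name, and the sketch assumes an uncompressed VCF on standard input with GT as the first FORMAT key. It is not GATK's code.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not GATK code: report ALT alleles that no sample's GT uses.
# Assumes an uncompressed VCF on stdin with GT as the first FORMAT key.
check_alt_alleles() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    NF < 10 || $5 == "." { next }          # no samples, or no ALT allele to check
    {
      split("", seen)                      # portable way to empty the array
      for (s = 10; s <= NF; s++) {
        split($s, fields, ":")
        n = split(fields[1], calls, /[\/|]/)
        for (c = 1; c <= n; c++) seen[calls[c]] = 1   # allele indexes in use
      }
      n_alt = split($5, alts, ",")
      for (a = 1; a <= n_alt; a++)
        if (!(a in seen))
          printf "%s:%s ALT allele %s not seen in any sample\n", $1, $2, alts[a]
    }
  '
}
```

For example, a record with ALT G,T whose samples are 0/1 and 0/0 is reported for T, because allele index 2 never appears in a genotype.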

--variant / -V ( required RodBinding[VariantContext] )

Input VCF file. Variants from this VCF file are used by this tool as input. The file must contain at least the standard VCF header lines, but it can be empty (i.e., it need not contain any variants). --variant binds reference-ordered data (ROD). This argument supports ROD files of the following types: BCF2, VCF, VCF3.
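A header-only VCF is the smallest input this argument accepts. The bash sketch below writes one, containing only the ##fileformat line and the #CHROM column line; make_empty_vcf is a hypothetical name, VCFv4.1 is an assumed version, and whether GATK expects further meta-information lines (for example ##contig) is not stated on this page.

```bash
#!/usr/bin/env bash
# Writes a header-only VCF: the ##fileformat line plus the #CHROM column line,
# and no variant records. VCFv4.1 is an assumption; use the version your
# pipeline expects.
make_empty_vcf() {
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
}
```

For example, make_empty_vcf > empty.vcf produces a file that can be passed as --variant empty.vcf.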

--warnOnErrors / -warnOnErrors ( Boolean with default value false )

Emit warnings on errors instead of terminating the run at the first error.
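The difference between the default behavior and --warnOnErrors is the familiar fail-fast versus report-and-continue choice. The bash sketch below illustrates that pattern only; validate_records and ref_is_acgtn are hypothetical names, the toy validator merely checks that REF consists of A, C, G, T or N, and the WARN/ERROR output format is invented for illustration rather than taken from GATK's logs.

```bash
#!/usr/bin/env bash
# Hypothetical toy validator: succeeds when column 4 (REF) contains only A/C/G/T/N.
ref_is_acgtn() {
  local ref
  ref=$(cut -f4 <<< "$1")
  [[ $ref =~ ^[ACGTNacgtn]+$ ]]
}

# Runs a validator on every record read from stdin. With warn_on_errors=true
# each failure is reported and processing continues; otherwise processing
# stops at the first failure and the function returns 1.
validate_records() {
  local warn_on_errors=$1 validator=$2 line chrom pos rest
  while IFS= read -r line; do
    [[ $line == \#* ]] && continue             # header lines are not records
    "$validator" "$line" && continue           # valid record: nothing to report
    IFS=$'\t' read -r chrom pos rest <<< "$line"
    if [[ $warn_on_errors == true ]]; then
      echo "WARN $chrom:$pos"                  # report and keep going
    else
      echo "ERROR $chrom:$pos"                 # report and stop here
      return 1
    fi
  done
  return 0
}
```

For example, validate_records true ref_is_acgtn < input.vcf lists every bad record, while passing false stops at the first one.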



GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.