VariantValidationAssessor

Annotates a validation VCF (from Sequenom, for example) with QC metrics (Hardy-Weinberg equilibrium, % failed probes)

Category: Validation Utilities

Traversal: LocusWalker

PartitionBy: LOCUS


Overview

The Variant Validation Assessor is a tool for vetting and assessing validation data that contain genotypes. It produces a VCF annotated with plate quality-control information. By default, records are soft-filtered (flagged in the FILTER field rather than removed) when they have a high no-call rate or a low Hardy-Weinberg probability. If you have .ped files, first convert them to VCF format.
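The Hardy-Weinberg probability measures how well a site's genotype counts fit the proportions expected under Hardy-Weinberg equilibrium. As an illustration only, the sketch below computes a two-sided exact-test p-value for one biallelic site from its called genotype counts. Which Hardy-Weinberg test the tool uses internally is not stated in this documentation, so treat the choice of the exact test as an assumption; the function name `hwe_exact_pvalue` is hypothetical.

```python
def hwe_exact_pvalue(n_hom_ref: int, n_het: int, n_hom_var: int) -> float:
    """Two-sided exact Hardy-Weinberg test for one biallelic site (illustrative).

    Counts are called genotypes only (no-calls excluded). Small p-values
    indicate a departure from Hardy-Weinberg proportions.
    """
    n = n_hom_ref + n_het + n_hom_var
    if n == 0:
        return 1.0  # nothing to test
    rare = 2 * min(n_hom_ref, n_hom_var) + n_het  # copies of the less common allele

    # probs[h] is proportional to P(h heterozygotes | n samples, rare allele copies).
    # Only h with the same parity as `rare` is possible; other entries stay 0.
    probs = [0.0] * (rare + 1)
    mid = rare * (2 * n - rare) // (2 * n)  # start near the expected het count
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Recurrence downward from mid: fewer hets, more homozygotes of each kind.
    hets, hom_r = mid, (rare - mid) // 2
    hom_c = n - hets - hom_r
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        total += probs[hets - 2]
        hets, hom_r, hom_c = hets - 2, hom_r + 1, hom_c + 1

    # Recurrence upward from mid: more hets, fewer homozygotes of each kind.
    hets, hom_r = mid, (rare - mid) // 2
    hom_c = n - hets - hom_r
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * hom_r * hom_c / ((hets + 2) * (hets + 1))
        total += probs[hets + 2]
        hets, hom_r, hom_c = hets + 2, hom_r - 1, hom_c - 1

    # p-value: total probability of all outcomes no more likely than the observed one.
    p_obs = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)


print(hwe_exact_pvalue(25, 50, 25))  # counts in HW proportions: p close to 1
print(hwe_exact_pvalue(50, 0, 50))   # no heterozygotes at all: p is tiny
```

The recurrence walks outward from the most likely heterozygote count so that no factorials are ever computed, which keeps the calculation numerically stable for large sample sizes.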

Input

A validation VCF to annotate.

Output

An annotated VCF. Additionally, a table like the following will be output:

     Total number of samples assayed:                  185
     Total number of records processed:                152
     Number of Hardy-Weinberg violations:              34 (22%)
     Number of no-call violations:                     12 (7%)
     Number of homozygous variant violations:          0 (0%)
     Number of records passing all filters:            106 (69%)
     Number of passing records that are polymorphic:   98 (92%)
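The percentages in this summary use two different denominators: the violation and passing counts are fractions of the 152 records processed, while the polymorphic count is a fraction of the 106 passing records. The sketch below reproduces the example values. Truncating toward zero is an inference from these numbers (rounding to nearest would give 8% and 70% for the no-call and passing rows), not documented behavior, and the helper name `pct` is hypothetical.

```python
# Counts copied from the example summary table above.
RECORDS = 152
PASSING = 106
ROWS = {
    "Hardy-Weinberg violations": (34, RECORDS),
    "no-call violations": (12, RECORDS),
    "homozygous variant violations": (0, RECORDS),
    "records passing all filters": (PASSING, RECORDS),
    "passing records that are polymorphic": (98, PASSING),  # relative to passing records
}


def pct(part: int, whole: int) -> int:
    """Percentage truncated toward zero (assumed; it matches the example)."""
    return 100 * part // whole


for label, (part, whole) in ROWS.items():
    print(f"{label}: {part} ({pct(part, whole)}%)")

# In this example the violation counts add up to the number of non-passing records.
assert 34 + 12 + 0 == RECORDS - PASSING
```

Because the three violation counts sum to 46, exactly the number of non-passing records, each failing record in this example appears to be counted under a single violation type; the documentation does not say whether that always holds.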
 

Examples

 java -Xmx2g -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T VariantValidationAssessor \
   --variant input.vcf \
   -o output.vcf
 

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by VariantValidationAssessor.

Downsampling settings

This tool applies the following downsampling settings by default.

  • Mode: BY_SAMPLE
  • To coverage: 1,000
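As a simplified illustration of what BY_SAMPLE downsampling to a fixed coverage means, the sketch below keeps at most `to_coverage` reads per sample at one locus using reservoir sampling, so every read of an over-covered sample has the same chance of being kept. The function name and the `(sample, read_id)` input layout are hypothetical, and the GATK engine's own downsampler is not necessarily implemented this way.

```python
import random
from collections import defaultdict


def downsample_by_sample(reads, to_coverage=1000, seed=0):
    """Cap reads per sample at one locus (illustrative reservoir sampling).

    reads: iterable of (sample_name, read_id) pairs.
    Returns {sample_name: [kept read_ids]} with at most to_coverage per sample.
    """
    rng = random.Random(seed)
    kept = defaultdict(list)
    seen = defaultdict(int)
    for sample, read in reads:
        seen[sample] += 1
        bucket = kept[sample]
        if len(bucket) < to_coverage:
            bucket.append(read)  # still under the cap: keep unconditionally
        else:
            j = rng.randrange(seen[sample])  # uniform in [0, reads seen so far)
            if j < to_coverage:
                bucket[j] = read  # replace a kept read with probability cap/seen
    return dict(kept)


reads = [("A", f"a{i}") for i in range(2500)] + [("B", f"b{i}") for i in range(10)]
kept = downsample_by_sample(reads)
print({sample: len(ids) for sample, ids in kept.items()})
```

Samples below the cap (sample B here) keep all of their reads; only samples above it are thinned.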

Window size

This tool uses a sliding window on the reference.

  • Window start: 0 bp before the locus
  • Window stop: 40 bp after the locus

Command-line Arguments

Inherited arguments

Arguments inherited from the GATK engine can also be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

VariantValidationAssessor specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the Argument details list below the table.

     Argument name(s)       Default value   Summary

     Required Inputs
     --variant / -V         NA              Input VCF file

     Optional Outputs
     --out / -o             stdout          File to which variants should be written

     Optional Parameters
     --maxHardy             20.0            Maximum phred-scaled Hardy-Weinberg violation p-value to consider an assay valid
     --maxHomVar            1.1             Maximum homozygous variant rate (as a fraction) to consider an assay valid
     --maxNoCall            0.05            Maximum no-call rate (as a fraction) to consider an assay valid
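The three optional parameters act as independent thresholds on each record. The sketch below shows one plausible way to combine them into a soft-filter decision: it returns the label of every threshold a site exceeds, and an empty list means the record passes. The label strings, the use of all assayed samples as the rate denominator, the strict "greater than" comparisons, and the choice to report every failing criterion (rather than only the first) are all assumptions, not the tool's documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Thresholds:
    max_hardy: float = 20.0    # --maxHardy: max phred-scaled HW p-value
    max_hom_var: float = 1.1   # --maxHomVar: max hom-var fraction (>1 disables)
    max_no_call: float = 0.05  # --maxNoCall: max no-call fraction (>1 disables)


def failed_filters(hom_ref: int, het: int, hom_var: int, no_call: int,
                   hw_phred: float, t: Thresholds = Thresholds()) -> List[str]:
    """Return hypothetical labels for every threshold exceeded; [] means PASS.

    Soft filtering: the record is written either way; only its FILTER field changes.
    """
    n_samples = hom_ref + het + hom_var + no_call  # assumed denominator
    failed = []
    if no_call / n_samples > t.max_no_call:
        failed.append("no_call_rate")
    if hw_phred > t.max_hardy:
        failed.append("hardy_weinberg")
    if hom_var / n_samples > t.max_hom_var:
        failed.append("hom_var_rate")
    return failed


print(failed_filters(40, 50, 10, 5, hw_phred=3.0))   # passes every default threshold
print(failed_filters(40, 50, 10, 20, hw_phred=3.0))  # too many no-calls
```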

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.


--maxHardy

Maximum phred-scaled Hardy-Weinberg violation p-value to consider an assay valid

double  20.0  [ [ -∞  ∞ ] ]
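Phred scaling maps a probability p to Q = -10 log10(p), so the default --maxHardy of 20.0 corresponds to a Hardy-Weinberg p-value of 0.01. Reading the argument summary literally, a site whose p-value falls below 0.01 (Q above 20) would count as a violation, and raising --maxHardy makes the filter less strict. The helper names below are hypothetical.

```python
import math


def phred(p: float) -> float:
    """Phred-scale a probability: Q = -10 * log10(p); p = 0 maps to infinity."""
    return math.inf if p <= 0.0 else -10.0 * math.log10(p)


def unphred(q: float) -> float:
    """Inverse of phred(): p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


# The default --maxHardy of 20.0 corresponds to a p-value of 0.01.
print(unphred(20.0))
# A site with HW p-value 0.001 has Q = 30, above the default threshold.
print(phred(0.001) > 20.0)
```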


--maxHomVar

Maximum homozygous variant rate (as a fraction) to consider an assay valid
To disable, set to a value greater than 1. Because a rate cannot exceed 1, the default of 1.1 leaves this filter disabled.

double  1.1  [ [ -∞  ∞ ] ]
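The reason a value above 1 disables this filter is that a rate expressed as a fraction of samples is at most 1.0, so a "greater than" comparison against 1.1 can never succeed. The sketch below assumes the rate is taken over all assayed samples and that the comparison is strict; both are assumptions, and the function names are hypothetical.

```python
def hom_var_rate(hom_var: int, n_samples: int) -> float:
    """Fraction of assayed samples called homozygous variant (denominator assumed)."""
    return hom_var / n_samples


def exceeds_hom_var(hom_var: int, n_samples: int, max_hom_var: float = 1.1) -> bool:
    # The rate is never above 1.0, so any threshold above 1 never triggers.
    return hom_var_rate(hom_var, n_samples) > max_hom_var


print(exceeds_hom_var(185, 185))       # False: every sample hom-var, default 1.1
print(exceeds_hom_var(100, 185, 0.5))  # True: 100/185 is about 0.54
```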


--maxNoCall

Maximum no-call rate (as a fraction) to consider an assay valid
To disable, set to a value greater than 1.

double  0.05  [ [ -∞  ∞ ] ]
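A fractional threshold is easier to judge as a sample count. With the 185 samples from the example summary table and the default of 0.05, a site can tolerate at most 9 no-calls (9/185 is about 0.049, while 10/185 is about 0.054). The sketch below computes that limit, assuming a strict "greater than" comparison over all assayed samples; the function name is hypothetical.

```python
def max_allowed_no_calls(n_samples: int, max_no_call: float = 0.05) -> int:
    """Largest no-call count k with k / n_samples <= max_no_call (illustrative)."""
    k = int(max_no_call * n_samples)
    # Correct for floating-point error right at the boundary.
    while (k + 1) / n_samples <= max_no_call:
        k += 1
    while k > 0 and k / n_samples > max_no_call:
        k -= 1
    return min(k, n_samples)  # a threshold above 1 tolerates every sample


print(max_allowed_no_calls(185))       # default 0.05 with 185 samples
print(max_allowed_no_calls(185, 1.1))  # disabled: all 185 samples may be no-calls
```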


--out / -o

File to which variants should be written

VariantContextWriter  stdout


--variant / -V

Input VCF file
Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).
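To make "standard VCF header lines" concrete, the sketch below checks an uncompressed VCF text stream for the two header lines the VCF specification requires (the leading `##fileformat` line and the `#CHROM` column header) and counts data records, accepting a file with zero records. Treating exactly these two lines as the tool's requirement is an assumption, the function name is hypothetical, and BCF2 input is not handled.

```python
import io


def count_vcf_records(handle) -> int:
    """Return the number of data records; raise ValueError if required headers are missing."""
    if not handle.readline().startswith("##fileformat=VCF"):
        raise ValueError("missing ##fileformat line")
    saw_chrom = False
    records = 0
    for line in handle:
        if line.startswith("##"):
            continue  # other meta-information lines
        if line.startswith("#CHROM"):
            saw_chrom = True
            continue
        if line.strip():
            if not saw_chrom:
                raise ValueError("data line before #CHROM header")
            records += 1
    if not saw_chrom:
        raise ValueError("missing #CHROM header line")
    return records


header_only = "##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
print(count_vcf_records(io.StringIO(header_only)))  # an empty but valid VCF
```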

--variant binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

RodBinding[VariantContext]  (required)



GATK version 3.2-2-gec30cee built at 2014/07/17 17:54:48. GTD: NA