
VariantValidationAssessor

Annotates a validation VCF (from Sequenom, for example) with QC metrics (Hardy-Weinberg equilibrium, % failed probes)

Category Validation Utilities

Traversal LocusWalker

PartitionBy LOCUS


Overview

The VariantValidationAssessor is a tool for vetting validation data that contain genotypes. It produces a VCF annotated with plate quality-control information and, by default, soft-filters records with a high no-call rate or a low Hardy-Weinberg p-value (failing records are flagged in the FILTER column rather than removed). If you have .ped files, convert them to VCF format first.
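The per-record filtering decision can be sketched as follows. This is a minimal Python illustration, not GATK's implementation: it assumes that the no-call and homozygous-variant rates are fractions of all assayed samples, that only a value strictly above its threshold counts as a violation, and that the Hardy-Weinberg phred score has already been computed. The `GenotypeCounts` class, the `failed_filters` function and the filter labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenotypeCounts:
    """Genotype tallies for one validation record (hypothetical helper)."""
    hom_ref: int
    het: int
    hom_var: int
    no_call: int

    @property
    def total(self) -> int:
        return self.hom_ref + self.het + self.hom_var + self.no_call


def failed_filters(counts: GenotypeCounts, hw_phred: float,
                   max_no_call: float = 0.05,   # --maxNoCall default
                   max_hom_var: float = 1.1,    # --maxHomVar default (check disabled)
                   max_hardy: float = 20.0      # --maxHardy default
                   ) -> List[str]:
    """Return hypothetical labels for the checks a record fails; [] means it passes.

    Assumption: rates are fractions of all assayed samples, and only values
    strictly above a threshold count as violations.
    """
    n = counts.total
    if n == 0:
        raise ValueError("record has no assayed samples")
    failures = []
    if hw_phred > max_hardy:
        failures.append("HardyWeinberg")
    if counts.no_call / n > max_no_call:
        failures.append("NoCall")
    if counts.hom_var / n > max_hom_var:
        failures.append("HomVar")
    return failures


# Illustrative counts for 100 samples (not taken from real data).
print(failed_filters(GenotypeCounts(70, 25, 2, 3), hw_phred=3.0))  # [] (3% no-calls)
print(failed_filters(GenotypeCounts(65, 25, 2, 8), hw_phred=3.0))  # ['NoCall'] (8% no-calls)
```

Because the output is soft-filtered, a record that fails a check in this sketch would still be written; only its FILTER value would change.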

Input

A validation VCF to annotate.

Output

An annotated VCF. Additionally, a table like the following will be output:

     Total number of samples assayed:                  185
     Total number of records processed:                152
     Number of Hardy-Weinberg violations:              34 (22%)
     Number of no-call violations:                     12 (7%)
     Number of homozygous variant violations:          0 (0%)
     Number of records passing all filters:            106 (69%)
     Number of passing records that are polymorphic:   98 (92%)
 
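The percentages in the sample table can be reproduced with integer arithmetic. Assuming they are truncated rather than rounded (12/152 is about 7.9%, yet the table shows 7%), and that the polymorphic percentage is taken relative to passing records (98/106) rather than to all records, the sketch below recovers every value shown. The helper name `truncated_pct` is hypothetical.

```python
def truncated_pct(part: int, whole: int) -> int:
    """Integer percentage truncated toward zero (assumed; matches the sample table)."""
    return (100 * part) // whole if whole else 0


# Counts copied from the sample summary table.
records = 152
passing = 106
rows = {
    "Hardy-Weinberg violations": (34, records),
    "no-call violations": (12, records),
    "homozygous variant violations": (0, records),
    "records passing all filters": (passing, records),
    # Relative to passing records, not to all records processed.
    "passing records that are polymorphic": (98, passing),
}

summary = {name: truncated_pct(part, whole) for name, (part, whole) in rows.items()}
for name, pct in summary.items():
    print(f"{name}: {pct}%")
```

With rounding instead of truncation, the no-call and passing rows would read 8% and 70%, which do not match the table.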

Examples

 java -Xmx2g -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T VariantValidationAssessor \
   --variant input.vcf \
   -o output.vcf
 

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by VariantValidationAssessor.

Window size

This tool uses a sliding window on the reference.

  • Window start: 0 bp before the locus
  • Window stop: 40 bp after the locus
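The window bounds above amount to the following computation. This sketch assumes 1-based, inclusive coordinates and that the window is clipped at the contig end; the function name `reference_window` is hypothetical.

```python
from typing import Tuple


def reference_window(locus: int, contig_length: int,
                     before: int = 0, after: int = 40) -> Tuple[int, int]:
    """Return (start, stop) of the reference window around a 1-based locus.

    Defaults mirror the documented window (0 bp before, 40 bp after).
    Assumption: coordinates are inclusive and the window is clipped to the contig.
    """
    start = max(1, locus - before)
    stop = min(contig_length, locus + after)
    return start, stop


print(reference_window(1_000, contig_length=5_000))  # (1000, 1040)
print(reference_window(4_990, contig_length=5_000))  # (4990, 5000), clipped at the contig end
```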

Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

VariantValidationAssessor specific arguments

This table summarizes the command-line arguments that are specific to this tool. For details, see the list below the table.

     Name         Type                        Default   Summary
     Required
     --variant    RodBinding[VariantContext]  NA        Input VCF file
     Optional
     --maxHardy   double                      20.0      Maximum phred-scaled Hardy-Weinberg violation p-value to consider an assay valid
     --maxHomVar  double                      1.1       Maximum homozygous variant rate (as a fraction) to consider an assay valid
     --maxNoCall  double                      0.05      Maximum no-call rate (as a fraction) to consider an assay valid
     --out        VariantContextWriter        stdout    File to which variants should be written

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--maxHardy ( double with default value 20.0 )

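Maximum phred-scaled Hardy-Weinberg violation p-value for an assay to be considered valid. Phred scaling maps a p-value p to -10 * log10(p), so larger values indicate stronger evidence of a violation; the default of 20.0 corresponds to p = 0.01.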
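To make the phred scale concrete, the sketch below converts a p-value with Q = -10 * log10(p) and computes the Hardy-Weinberg p-value with the SNP exact test of Wigginton, Cutler and Abecasis (2005). Whether GATK uses this particular test, and whether a score exactly equal to --maxHardy passes, are assumptions; the function names and the cap applied when p is 0 are hypothetical.

```python
import math


def hwe_exact_pvalue(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """SNP-HWE exact test (Wigginton et al., 2005) for one biallelic site."""
    n_rare_hom = min(n_hom1, n_hom2)
    n_common_hom = max(n_hom1, n_hom2)
    n = n_het + n_rare_hom + n_common_hom           # called genotypes
    if n == 0:
        return 1.0
    rare = 2 * n_rare_hom + n_het                   # minor-allele copies

    probs = [0.0] * (rare + 1)                      # indexed by heterozygote count
    mid = rare * (2 * n - rare) // (2 * n)          # start near the most likely count
    if mid % 2 != rare % 2:                         # het count must share rare's parity
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Walk down from mid: each step trades two hets for one hom of each kind.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    # Walk up from mid.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    # p-value: relative probability of all configurations no more likely than the observed one.
    observed = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= observed) / total)


def phred(p: float, cap: float = 1000.0) -> float:
    """Phred-scale a p-value: Q = -10 * log10(p). p == 0 is capped (cap is an assumption)."""
    return cap if p <= 0.0 else -10.0 * math.log10(p)


def passes_hardy(n_het: int, n_hom1: int, n_hom2: int, max_hardy: float = 20.0) -> bool:
    return phred(hwe_exact_pvalue(n_het, n_hom1, n_hom2)) <= max_hardy


print(round(phred(0.01), 6))       # 20.0, the --maxHardy default
print(passes_hardy(50, 25, 25))    # True: heterozygote count matches HWE expectation
print(passes_hardy(0, 50, 50))     # False: no heterozygotes at a 50% allele frequency
```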

--maxHomVar ( double with default value 1.1 )

Maximum homozygous variant rate (as a fraction) for an assay to be considered valid. To disable this check, set it to a value greater than 1; the default of 1.1 therefore leaves it disabled.

--maxNoCall ( double with default value 0.05 )

Maximum no-call rate (as a fraction) for an assay to be considered valid; the default of 0.05 corresponds to 5%. To disable this check, set it to a value greater than 1.

--out / -o ( VariantContextWriter with default value stdout )

File to which variants should be written.

--variant / -V ( required RodBinding[VariantContext] )

Input VCF file. Variants from this file are used by this tool as input. The file must contain at least the standard VCF header lines, but it can be empty (i.e., contain no variants). --variant binds reference-ordered data (ROD). This argument supports ROD files of the following types: BCF2, VCF, VCF3.



GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.