# VariantFiltration

Filters variant calls using a number of user-selectable, parameterizable criteria.

## Overview

VariantFiltration is a GATK tool for hard-filtering variant calls based on certain criteria. Records are hard-filtered by changing the value in the FILTER field to something other than PASS.

### Input

A variant set to filter.

### Output

A filtered VCF.

### Examples

    java -Xmx2g -jar GenomeAnalysisTK.jar \
        -R ref.fasta \
        -T VariantFiltration \
        -o output.vcf \
        --variant input.vcf \
        --filterExpression "AB < 0.2 || MQ0 > 50" \
        --filterName "Nov09filters"


### Read filters

Read filters are automatically applied to the data by the Engine before processing by VariantFiltration.

### Downsampling settings

This tool applies the following downsampling settings by default.

• Mode: BY_SAMPLE
• To coverage: 1,000

### Window size

This tool uses a sliding window on the reference.

• Window start: 50 bp before the locus
• Window stop: 50 bp after the locus

## Command-line Arguments

### Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

### VariantFiltration specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.

| Argument name(s) | Default value | Summary |
|---|---|---|
| Required Inputs | | |
| --variant, -V | NA | Input VCF file |
| Optional Inputs | | |
| Optional Outputs | | |
| --out, -o | NA | File to which variants should be written |
| Optional Parameters | | |
| --clusterSize, -cluster | 3 | The number of SNPs which make up a cluster |
| --clusterWindowSize, -window | 0 | The window size (in bases) in which to evaluate clustered SNPs |
| --filterExpression, -filter | NA | One or more expressions used with INFO fields to filter |
| --filterName | NA | Names to use for the list of filters |
| --genotypeFilterExpression, -G_filter | NA | One or more expressions used with FORMAT (sample/genotype-level) fields to filter (see documentation guide for more info) |
| --genotypeFilterName, -G_filterName | NA | Names to use for the list of sample/genotype filters (must be a 1-to-1 mapping); this name is put in the sample-level FT field for genotypes that get filtered |
| --maskExtension, -maskExtend | 0 | How many bases beyond records from a provided 'mask' rod should variants be filtered |
| --maskName, -maskName | NA | The text to put in the FILTER field if a 'mask' rod is provided and overlaps with a variant call |
| Optional Flags | | |
| --filterNotInMask, -filterNotInMask | NA | Filter records NOT in given input mask. |
| --invalidatePreviousFilters | NA | Remove previous filters applied to the VCF |
| --missingValuesInExpressionsShouldEvaluateAsFailing | NA | When evaluating the JEXL expressions, missing values should be considered failing the expression |

### Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

### --clusterSize / -cluster

The number of SNPs which make up a cluster
Works together with the --clusterWindowSize argument.

Integer [-∞, ∞]

### --clusterWindowSize / -window

The window size (in bases) in which to evaluate clustered SNPs
Works together with the --clusterSize argument. To disable the clustered SNP filter, set this value to less than 1.

Integer [-∞, ∞]
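To illustrate how --clusterSize and --clusterWindowSize interact, the following minimal Python sketch flags SNPs that belong to a run of `cluster_size` SNPs spanning at most `window` bases. It is an assumed reading of the descriptions above, not GATK's actual implementation; the function name `clustered_snps` and the exact boundary convention are hypothetical.

```python
def clustered_snps(positions, cluster_size=3, window=0):
    """Return the set of SNP positions that fall in a cluster.

    Assumption: a cluster is any run of `cluster_size` consecutive SNPs
    (sorted by position on one contig) whose first and last positions are
    at most `window` bases apart. GATK's exact boundary rule may differ.
    """
    if window < 1 or cluster_size < 1:
        # A window size below 1 disables the clustered SNP filter.
        return set()
    positions = sorted(positions)
    flagged = set()
    for i in range(len(positions) - cluster_size + 1):
        run = positions[i:i + cluster_size]
        if run[-1] - run[0] <= window:
            flagged.update(run)
    return flagged


snps = [100, 104, 109, 500]
print(sorted(clustered_snps(snps, cluster_size=3, window=10)))  # [100, 104, 109]
print(sorted(clustered_snps(snps, cluster_size=3, window=0)))   # []
```

With the default window of 0 nothing is flagged, which matches the note that a value below 1 disables the filter.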

### --filterExpression / -filter

One or more expressions used with INFO fields to filter
VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using --filterName One --filterExpression "X < 1" --filterName Two --filterExpression "X > 2").

ArrayList[String]

### --filterName / -filterName

Names to use for the list of filters
This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names.

ArrayList[String]
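The pairing of filter names with filter expressions can be sketched in Python as follows. This is an illustration only: Python callables stand in for JEXL expressions, and the helper `apply_filters` is hypothetical rather than part of GATK. Joining several failed filter names with a semicolon follows the VCF convention for the FILTER column.

```python
def apply_filters(info, names, expressions):
    """Return the FILTER value for one record.

    `info` is a dict of INFO annotations. Each entry of `expressions` is a
    Python callable standing in for a JEXL expression (an assumption made
    for illustration only).
    """
    if len(names) != len(expressions):
        raise ValueError("each filter expression needs exactly one filter name")
    failed = [name for name, expr in zip(names, expressions) if expr(info)]
    # Several failed filters are joined with ';' as in the VCF FILTER column.
    return ";".join(failed) if failed else "PASS"


names = ["One", "Two"]
expressions = [lambda info: info["X"] < 1, lambda info: info["X"] > 2]

print(apply_filters({"X": 0.5}, names, expressions))  # One
print(apply_filters({"X": 1.5}, names, expressions))  # PASS
```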

### --filterNotInMask / -filterNotInMask

Filter records NOT in given input mask.
By default, if the -mask argument is used, any variant falling in a mask will be filtered. If this argument is used, the logic is reversed, and variants falling outside the mask will be filtered. This is useful, for example, when the mask is an interval list or BED file of "good" sites. Note that it is up to the user to choose a mask name that makes clear the reverse logic was used (e.g. if masking against Hapmap, use -maskName=hapmap for normal masking and -maskName=not_hapmap for reverse masking).

boolean
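The normal and reversed mask logic can be summarized in a few lines of Python. This is a sketch of the rule as described above, not GATK code; the function `mask_filter` and its parameter names are hypothetical.

```python
def mask_filter(overlaps_mask, mask_name, filter_not_in_mask=False):
    """Return the filter name to add, or None if the variant is kept.

    By default a variant inside the mask is filtered; with the reverse
    flag set, a variant outside the mask is filtered instead.
    """
    should_filter = (not overlaps_mask) if filter_not_in_mask else overlaps_mask
    return mask_name if should_filter else None


print(mask_filter(True, "hapmap"))                               # hapmap
print(mask_filter(False, "not_hapmap", filter_not_in_mask=True))  # not_hapmap
print(mask_filter(True, "not_hapmap", filter_not_in_mask=True))   # None
```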

### --genotypeFilterExpression / -G_filter

One or more expressions used with FORMAT (sample/genotype-level) fields to filter (see documentation guide for more info)
Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record's FILTER tag). One can filter normally based on most fields (e.g. "GQ < 5.0"), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets ("isHet == 1"), refs ("isHomRef == 1"), or homs ("isHomVar == 1"). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object.

ArrayList[String]
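The effect of a genotype-level filter can be pictured with the following Python sketch, which tags matching samples with FT and leaves the record's FILTER value alone. The record layout, the helper names, and the Python `is_het` function (a stand-in for the JEXL expression "isHet == 1") are assumptions for illustration, not GATK's data model.

```python
def is_het(gt):
    """True for a called genotype whose alleles differ, e.g. (0, 1)."""
    return None not in gt and len(set(gt)) > 1


def filter_genotypes(record, name, expr):
    """Add `name` to FT for every sample matching `expr`; FILTER is untouched."""
    for sample in record["samples"].values():
        if expr(sample):
            # Append to an existing FT value rather than overwrite it.
            sample["FT"] = name if "FT" not in sample else sample["FT"] + ";" + name
    return record


record = {
    "FILTER": "PASS",
    "samples": {
        "NA1": {"GT": (0, 1), "GQ": 40},
        "NA2": {"GT": (1, 1), "GQ": 3},
    },
}

filter_genotypes(record, "lowGQ", lambda s: s["GQ"] < 5.0)
filter_genotypes(record, "het", lambda s: is_het(s["GT"]))

print(record["samples"]["NA1"]["FT"])  # het
print(record["samples"]["NA2"]["FT"])  # lowGQ
print(record["FILTER"])                # PASS
```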

### --genotypeFilterName / -G_filterName

Names to use for the list of sample/genotype filters (must be a 1-to-1 mapping); this name is put in the sample-level FT field for genotypes that get filtered
Similar to the INFO field based filter names, but applied to the FORMAT (genotype) fields instead.

ArrayList[String]

### --invalidatePreviousFilters / NA

Remove previous filters applied to the VCF
Invalidates any filters previously applied to the VariantContext, so that only the filters specified in this run are applied.

boolean
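A minimal Python sketch of the difference this flag makes is shown below. It assumes that, without the flag, newly failed filters are added to those already on the record, and that a record with no remaining filters becomes PASS; both points are assumptions about the behavior, and `combine_filters` is a hypothetical helper.

```python
def combine_filters(previous, new, invalidate_previous=False):
    """Merge filter names already on a record with newly failed ones.

    `previous` and `new` are lists of filter names; "PASS" entries in
    `previous` are ignored because they do not name a failed filter.
    """
    kept = [] if invalidate_previous else [f for f in previous if f != "PASS"]
    merged = kept + [f for f in new if f not in kept]
    return ";".join(merged) if merged else "PASS"


print(combine_filters(["LowQual"], ["Nov09filters"]))
# LowQual;Nov09filters
print(combine_filters(["LowQual"], ["Nov09filters"], invalidate_previous=True))
# Nov09filters
print(combine_filters(["LowQual"], [], invalidate_previous=True))
# PASS
```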

### --mask / -mask

Any variant which overlaps entries from the provided mask rod will be filtered. To reverse the logic, i.e. to filter variants that do not overlap the provided mask, use the -filterNotInMask argument. Note that it is up to the user to choose a mask name that makes clear the reverse logic was used (e.g. if masking against Hapmap, use -maskName=hapmap for normal masking and -maskName=not_hapmap for reverse masking).

--mask binds reference ordered data. This argument supports ROD files of the following types: BCF2, BEAGLE, BED, BEDTABLE, EXAMPLEBINARY, GELITEXT, OLDDBSNP, RAWHAPMAP, REFSEQ, SAMPILEUP, SAMREAD, TABLE, VCF, VCF3

RodBinding[Feature]

### --maskExtension / -maskExtend

How many bases beyond records from a provided 'mask' rod should variants be filtered

Integer [-∞, ∞]
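Extending the mask by a number of bases can be pictured as widening each mask record before testing for overlap. The Python sketch below assumes 1-based inclusive coordinates on a single contig and an extension applied to both ends of each mask record; these conventions and the function `overlaps_mask` are assumptions for illustration, not GATK's implementation.

```python
def overlaps_mask(variant_start, variant_end, mask_intervals, extension=0):
    """True if the variant lies within `extension` bases of any mask interval.

    `mask_intervals` is a list of (start, end) pairs, 1-based and inclusive.
    """
    for start, end in mask_intervals:
        if variant_start <= end + extension and variant_end >= start - extension:
            return True
    return False


mask = [(1000, 1010)]
print(overlaps_mask(1005, 1005, mask))               # True
print(overlaps_mask(1015, 1015, mask))               # False
print(overlaps_mask(1015, 1015, mask, extension=5))  # True
```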

### --maskName / -maskName

The text to put in the FILTER field if a 'mask' rod is provided and overlaps with a variant call

String

### --missingValuesInExpressionsShouldEvaluateAsFailing / NA

When evaluating the JEXL expressions, missing values should be considered failing the expression
By default, if JEXL cannot evaluate your expression for a particular record because one of the annotations is not present, the whole expression evaluates as PASSing. Use this argument to have it evaluate as failing filters instead for these cases.

Boolean
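The two behaviors for missing annotations can be contrasted with a short Python sketch. A Python callable stands in for the JEXL expression and a `KeyError` stands in for JEXL being unable to evaluate it; both are assumptions for illustration, and `evaluate` is a hypothetical helper.

```python
def evaluate(expr, info, missing_fails=False):
    """Return True when the record should be filtered by `expr`."""
    try:
        return bool(expr(info))
    except KeyError:
        # The annotation is absent: pass by default, fail if requested.
        return missing_fails


def low_ab(info):
    return info["AB"] < 0.2


print(evaluate(low_ab, {"AB": 0.1}))                     # True
print(evaluate(low_ab, {"MQ0": 3}))                      # False
print(evaluate(low_ab, {"MQ0": 3}, missing_fails=True))  # True
```

In the second call the record has no AB annotation, so it passes by default; the third call shows the same record being filtered when missing values are treated as failing.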

### --out / -o

File to which variants should be written

VariantContextWriter

### --variant / -V

Input VCF file
Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).

--variant binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

RodBinding[VariantContext]