
GenotypeConcordance

Genotype concordance (per-sample and aggregate counts and frequencies, NRD/NRS and site allele overlaps) between two callsets

Category: Variant Evaluation and Manipulation Tools

Traversal: LocusWalker

PartitionBy: LOCUS


Overview

GenotypeConcordance takes two callsets (VCF files) and tabulates the number of sites that overlap and share alleles. For each sample, it also tabulates the genotype-by-genotype counts (for instance, the number of sites at which a sample was called homozygous reference in the EVAL callset but homozygous variant in the COMP callset). It outputs these counts along with convenient proportions (such as the proportion of het calls in the EVAL that were called REF in the COMP) and summary metrics such as non-reference discrepancy (NRD) and non-reference sensitivity (NRS).
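The per-sample genotype table and the summary metrics can be sketched in Python. This is a minimal, hypothetical sketch rather than GATK's implementation: the function names are invented, and it assumes the commonly used definitions of NRD (one minus the concordance among genotypes called in both callsets, leaving out sites where both are homozygous reference) and NRS (the fraction of COMP variant genotypes that EVAL also calls as variant). Exactly which genotype states GATK counts in each denominator may differ.

```python
from collections import Counter

# Genotype states for this sketch. GATK also has UNAVAILABLE (and MIXED),
# which this sketch does not model.
HOM_REF, HET, HOM_VAR, NO_CALL = "HOM_REF", "HET", "HOM_VAR", "NO_CALL"
CALLED = (HOM_REF, HET, HOM_VAR)
VARIANT = (HET, HOM_VAR)


def genotype_table(pairs):
    """Count (eval_genotype, comp_genotype) pairs for one sample."""
    return Counter(pairs)


def proportion(table, eval_gt, comp_gt):
    """Fraction of EVAL calls of type eval_gt that were comp_gt in COMP."""
    row_total = sum(n for (e, _), n in table.items() if e == eval_gt)
    return table[(eval_gt, comp_gt)] / row_total if row_total else float("nan")


def non_reference_discrepancy(table):
    """Discordance among genotypes called in both callsets,
    excluding HOM_REF/HOM_REF agreements."""
    total = sum(table[(e, c)] for e in CALLED for c in CALLED)
    total -= table[(HOM_REF, HOM_REF)]
    concordant = table[(HET, HET)] + table[(HOM_VAR, HOM_VAR)]
    return 1.0 - concordant / total if total else float("nan")


def non_reference_sensitivity(table):
    """Fraction of COMP variant genotypes that EVAL also called as variant.
    EVAL NO_CALLs at COMP variant genotypes count as misses here."""
    comp_variant = sum(n for (_, c), n in table.items() if c in VARIANT)
    confirmed = sum(n for (e, c), n in table.items()
                    if c in VARIANT and e in VARIANT)
    return confirmed / comp_variant if comp_variant else float("nan")


# Toy data for one sample: (EVAL genotype, COMP genotype) at each site.
pairs = ([(HOM_REF, HOM_REF)] * 5 + [(HET, HET)] * 3 + [(HET, HOM_REF)]
         + [(HOM_VAR, HOM_VAR)] * 2 + [(HOM_REF, HET)] + [(NO_CALL, HET)])
table = genotype_table(pairs)
print(proportion(table, HET, HOM_REF))      # 0.25
print(non_reference_discrepancy(table))     # 2/7, about 0.286
print(non_reference_sensitivity(table))     # 5/7, about 0.714
```

Excluding HOM_REF/HOM_REF agreements keeps NRD from being dominated by the many easy reference sites, which is why it is more informative than raw concordance.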

Input

GenotypeConcordance requires two callsets, since it performs a comparison: an EVAL callset and a COMP callset, specified via the -eval and -comp arguments. Optionally, JEXL expressions for genotype-level filtering of EVAL or COMP genotypes can be specified via the -gfe and -gfc arguments, respectively.

Output

GenotypeConcordance writes a GATK report to the file specified via -o, consisting of multiple tables of counts and proportions. These tables can optionally be moltenized via the -moltenize argument.

Additional Information

Read filters

These read filters are automatically applied to the data by the GATK engine before processing by GenotypeConcordance.


Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an engine capability and is therefore available to all GATK walkers).

GenotypeConcordance specific arguments

This table summarizes the command-line arguments that are specific to this tool. For details, see the argument details below the table.

Name                            Type                        Default value  Summary
Required
--comp                          RodBinding[VariantContext]  NA             The variants and genotypes to compare against
--eval                          RodBinding[VariantContext]  NA             The variants and genotypes to evaluate
Optional
--genotypeFilterExpressionComp  ArrayList[String]           []             One or more criteria to use to set COMP genotypes to no-call. These genotype-level filters are only applied to the COMP rod.
--genotypeFilterExpressionEval  ArrayList[String]           []             One or more criteria to use to set EVAL genotypes to no-call. These genotype-level filters are only applied to the EVAL rod.
--ignoreFilters                 boolean                     false          Filters will be ignored
--moltenize                     boolean                     false          Molten rather than tabular output
--out                           PrintStream                 stdout         An output file created by the walker. Existing contents are overwritten if the file exists

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--comp / -comp ( required RodBinding[VariantContext] )

The variants and genotypes to compare against: the callset you want to treat as 'truth'. It can also be of unknown quality, for the sake of comparing two callsets. --comp binds reference-ordered data and supports ROD files of the following types: BCF2, VCF, VCF3.

--eval / -eval ( required RodBinding[VariantContext] )

The variants and genotypes to evaluate: the callset you want to assess. This is typically where you put 'unassessed' callsets. --eval binds reference-ordered data and supports ROD files of the following types: BCF2, VCF, VCF3.
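With one callset bound to -comp and the other to -eval, the per-site allele overlap mentioned in the overview amounts to comparing the alleles each callset reports at a shared position. The following is a minimal sketch that assumes sites are matched by (contig, position) and compared on ALT alleles only. The function name and category labels are hypothetical and may not match the labels in GATK's report.

```python
from collections import Counter


def classify_site(eval_alts, comp_alts):
    """Compare the ALT alleles reported at one site by EVAL and COMP.

    eval_alts / comp_alts: sets of ALT allele strings, or None when the
    callset has no record at this site. Labels are hypothetical.
    """
    if eval_alts is None and comp_alts is None:
        raise ValueError("site absent from both callsets")
    if comp_alts is None:
        return "EVAL_ONLY"
    if eval_alts is None:
        return "COMP_ONLY"
    if eval_alts == comp_alts:
        return "ALLELES_MATCH"
    if eval_alts > comp_alts:   # strict superset
        return "EVAL_SUPERSET_COMP"
    if eval_alts < comp_alts:   # strict subset
        return "EVAL_SUBSET_COMP"
    return "ALLELES_DO_NOT_MATCH"


# Toy sites keyed by (contig, position): (EVAL ALTs, COMP ALTs).
sites = {
    ("20", 1000): ({"A"}, {"A"}),
    ("20", 2000): ({"C", "T"}, {"C"}),
    ("20", 3000): ({"G"}, None),
    ("20", 4000): (None, {"T"}),
    ("20", 5000): ({"A"}, {"G"}),
}
summary = Counter(classify_site(e, c) for e, c in sites.values())
print(summary)
```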

--genotypeFilterExpressionComp / -gfc ( ArrayList[String] with default value [] )

One or more criteria to use to set COMP genotypes to no-call. These genotype-level filters are only applied to the COMP rod. Identical to -gfe, except that the filter is applied to genotypes in the comp rod.

--genotypeFilterExpressionEval / -gfe ( ArrayList[String] with default value [] )

One or more criteria to use to set EVAL genotypes to no-call. These genotype-level filters are only applied to the EVAL rod. Each is a genotype-level JEXL expression applied to eval genotypes; genotypes filtered in this way are replaced by NO_CALL. For instance, -gfe 'GQ<20' sets to no-call any genotype with a genotype quality below 20.
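The effect of -gfe (and, for COMP genotypes, -gfc) can be sketched as follows. This is an illustration only: GATK evaluates JEXL expressions, whereas here each expression is stood in for by a hypothetical Python predicate. The sketch also assumes that a genotype is set to no-call when any one of the supplied expressions matches it.

```python
NO_CALL = "NO_CALL"


def apply_genotype_filters(genotypes, predicates):
    """Return a copy of `genotypes` in which every genotype matching at
    least one predicate has its call replaced by NO_CALL."""
    filtered = {}
    for sample, gt in genotypes.items():
        if any(pred(gt) for pred in predicates):
            filtered[sample] = {**gt, "GT": NO_CALL}  # keep other fields
        else:
            filtered[sample] = gt
    return filtered


def gq_below_20(gt):
    """Hypothetical stand-in for the JEXL expression 'GQ<20'.
    Genotypes without a GQ value are not matched."""
    return "GQ" in gt and gt["GQ"] < 20


eval_genotypes = {
    "S1": {"GT": "HET", "GQ": 12},
    "S2": {"GT": "HOM_VAR", "GQ": 45},
}
# S1 becomes NO_CALL; S2 keeps its HOM_VAR call.
print(apply_genotype_filters(eval_genotypes, [gq_below_20]))
```

Returning a copy leaves the input untouched, so the same genotypes can be evaluated with and without filtering.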

--ignoreFilters ( boolean with default value false )

Filters will be ignored. If this flag is set, the FILTER field of the eval and comp VCFs is ignored. If it is not set, all filtered sites are treated as not being present in the VCF; that is, their genotypes are assigned UNAVAILABLE, as distinct from NO_CALL.
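The distinction between UNAVAILABLE and NO_CALL can be sketched as below. This is a minimal sketch using a hypothetical dictionary layout for VCF records. Following the VCF specification, it treats a FILTER value of PASS or '.' as unfiltered, and it mirrors only the behavior described above.

```python
UNAVAILABLE = "UNAVAILABLE"


def is_filtered(filter_field):
    """A FILTER value of PASS or '.' means the record is not filtered."""
    return filter_field not in ("PASS", ".")


def effective_genotype(record, sample, ignore_filters=False):
    """Genotype used for concordance at one site.

    record: None if the callset has no record here, otherwise a dict with
    a "FILTER" string and a "genotypes" mapping (hypothetical layout).
    """
    if record is None:
        return UNAVAILABLE
    if is_filtered(record["FILTER"]) and not ignore_filters:
        # Without --ignoreFilters, a filtered site counts as absent.
        return UNAVAILABLE
    return record["genotypes"][sample]


filtered_rec = {"FILTER": "LowQual", "genotypes": {"S1": "HET"}}
pass_rec = {"FILTER": "PASS", "genotypes": {"S1": "NO_CALL"}}
print(effective_genotype(filtered_rec, "S1"))                       # UNAVAILABLE
print(effective_genotype(filtered_rec, "S1", ignore_filters=True))  # HET
print(effective_genotype(pass_rec, "S1"))                           # NO_CALL
```

A NO_CALL at a passing site means the caller looked and could not decide. UNAVAILABLE means the site is treated as absent from the callset.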

--moltenize / -moltenize ( boolean with default value false )

Molten rather than tabular output. Moltenizes the count and proportion tables: instead of laying out per-sample data as a two-dimensional table, the output is fully moltenized into individual elements. Without this argument, each row of a table begins with the sample name and continues directly with the counts or proportions for each eval/comp genotype pair (for instance HOM_REF/HOM_REF, HOM_REF/NO_CALL). With -moltenize, each row begins with a sample name, followed by the eval/comp genotype pair (such as HOM_REF/HOM_REF), followed by the count or proportion. This significantly increases the number of rows.
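The reshaping that -moltenize performs is an ordinary wide-to-long transformation. This minimal sketch uses a hypothetical in-memory table with made-up counts. The column labels follow the eval/comp notation above, and the exact column headers in the GATK report may differ.

```python
def moltenize(tabular):
    """Convert {sample: {genotype_pair: value}} into
    (sample, genotype_pair, value) rows, one row per cell."""
    return [
        (sample, pair, value)
        for sample, columns in tabular.items()
        for pair, value in columns.items()
    ]


# Made-up counts for two samples, one row per sample (tabular layout).
tabular = {
    "S1": {"HOM_REF/HOM_REF": 950, "HOM_REF/NO_CALL": 12, "HET/HET": 310},
    "S2": {"HOM_REF/HOM_REF": 941, "HOM_REF/NO_CALL": 7, "HET/HET": 298},
}
for row in moltenize(tabular):
    print(*row, sep="\t")
```

A table with S samples and C genotype-pair columns becomes S times C rows. The output is longer, but it is easier to load into tools that expect one observation per row.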

--out / -o ( PrintStream with default value stdout )

An output file created by the walker. Existing contents are overwritten if the file already exists.



GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.