For a complete, detailed argument reference, refer to the technical documentation page.
You can find detailed information about the various modules here.
Note that the GenotypeConcordance module has been rewritten as a separate walker tool (see its Technical Documentation page).
We in GSA often find ourselves performing an analysis of 2 different call sets. For SNPs, we often show the overlap of the sets (their "venn") and the relative dbSNP rates and/or transition-transversion ratios. The picture provided is an example of such a slide and is easy to create using VariantEval. Assuming you have 2 filtered VCF callsets named 'foo.vcf' and 'bar.vcf', there are 2 quick steps.
java -jar GenomeAnalysisTK.jar \ -R ref.fasta \ -T CombineVariants \ -V:FOO foo.vcf \ -V:BAR bar.vcf \ -priority FOO,BAR \ -o merged.vcf
java -jar GenomeAnalysisTK.jar \ -T VariantEval \ -R ref.fasta \ -D dbsnp.vcf \ -select 'set=="Intersection"' -selectName Intersection \ -select 'set=="FOO"' -selectName FOO \ -select 'set=="FOO-filterInBAR"' -selectName InFOO-FilteredInBAR \ -select 'set=="BAR"' -selectName BAR \ -select 'set=="filterInFOO-BAR"' -selectName InBAR-FilteredInFOO \ -select 'set=="FilteredInAll"' -selectName FilteredInAll \ -o merged.eval.gatkreport \ -eval merged.vcf \ -l INFO
It is wise to check the actual values for the set names present in your file before writing complex VariantEval commands. An easy way to do this is to extract the value of the set fields and then reduce that to the unique entries, like so:
java -jar GenomeAnalysisTK.jar -T VariantsToTable -R ref.fasta -V merged.vcf -F set -o fields.txt grep -v 'set' fields.txt | sort | uniq -c
This will provide you with a list of all of the possible values for 'set' in your VCF so that you can be sure to supply the correct select statements to VariantEval.
The VariantEval output is formatted as a GATKReport.
The VariantEval genotype concordance module emits information the relationship between the eval calls and genotypes and the comp calls and genotypes. The following three slides provide some insight into three key metrics to assess call sensitivity and concordance between genotypes.
##:GATKReport.v0.1 GenotypeConcordance.sampleSummaryStats : the concordance statistics summary for each sample GenotypeConcordance.sampleSummaryStats CompRod CpG EvalRod JexlExpression Novelty percent_comp_ref_called_var percent_comp_het_called_het percent_comp_het_called_var percent_comp_hom_called_hom percent_comp_hom_called_var percent_non-reference_sensitivity percent_overall_genotype_concordance percent_non-reference_discrepancy_rate GenotypeConcordance.sampleSummaryStats compOMNI all eval none all 0.78 97.65 98.39 99.13 99.44 98.80 99.09 3.60
The key outputs:
All defined below.