What do the VariantEval modules do?
Posted in FAQs | Last updated on 2013-03-18 20:26:03


VariantEval accepts two types of modules: stratification and evaluation modules.

  • Stratification modules will stratify (group) the variants based on certain properties.
  • Evaluation modules will compute certain metrics for the variants.

CpG

CpG is a three-state stratification:

  • The locus is a CpG site ("CpG")
  • The locus is not a CpG site ("non_CpG")
  • The locus is either a CpG or not a CpG site ("all")

A CpG site is defined as a site where the reference base at a locus is a C and the adjacent reference base in the 3' direction is a G.
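As an illustration of that definition, here is a minimal Python sketch. It is not VariantEval's own code; the function name `cpg_strata` and the 0-based indexing are assumptions made for the example, and it implements only the rule stated above (reference C followed by reference G in the 3' direction).

```python
from typing import List


def cpg_strata(reference: str, pos: int) -> List[str]:
    """Return the CpG stratification states for the 0-based position `pos`.

    Hypothetical helper: it applies only the definition in the text, i.e. the
    reference base is C and the next reference base (3' direction) is G.
    """
    base = reference[pos].upper()
    # The last base of the sequence has no 3' neighbour, so it cannot be a CpG.
    next_base = reference[pos + 1].upper() if pos + 1 < len(reference) else ""
    state = "CpG" if base == "C" and next_base == "G" else "non_CpG"
    # Every locus also falls into the catch-all "all" state.
    return [state, "all"]


print(cpg_strata("ACGT", 1))  # the C is followed by G
print(cpg_strata("ACGT", 2))  # the G is followed by T
```

Returning two states per locus mirrors the way each locus is counted once in its specific state and once in "all".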

EvalRod

EvalRod is an N-state stratification, where N is the number of eval rods bound to VariantEval.

Sample

Sample is an N-state stratification, where N is the number of samples in the eval files.

Filter

Filter is a three-state stratification:

  • The locus passes QC filters ("called")
  • The locus fails QC filters ("filtered")
  • The locus either passes or fails QC filters ("raw")

FunctionalClass

FunctionalClass is a four-state stratification:

  • The locus is a synonymous site ("silent")
  • The locus is a missense site ("missense")
  • The locus is a nonsense site ("nonsense")
  • The locus is of any functional class ("any")

CompRod

CompRod is an N-state stratification, where N is the number of comp tracks bound to VariantEval.

Degeneracy

Degeneracy is a six-state stratification:

  • The underlying base position in the codon is 1-fold degenerate ("1-fold")
  • The underlying base position in the codon is 2-fold degenerate ("2-fold")
  • The underlying base position in the codon is 3-fold degenerate ("3-fold")
  • The underlying base position in the codon is 4-fold degenerate ("4-fold")
  • The underlying base position in the codon is 6-fold degenerate ("6-fold")
  • The underlying base position in the codon is degenerate at any level ("all")

See the Wikipedia page on degeneracy (http://en.wikipedia.org/wiki/Genetic_code#Degeneracy) for more information.

JexlExpression

JexlExpression is an N-state stratification, where N is the number of JEXL expressions supplied to VariantEval. See the article "Using JEXL expressions".

Novelty

Novelty is a three-state stratification:

  • The locus overlaps the knowns comp track (usually the dbSNP track) ("known")
  • The locus does not overlap the knowns comp track ("novel")
  • The locus either overlaps or does not overlap the knowns comp track ("all")
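The three Novelty states can be sketched as a simple locus lookup. This is an illustrative Python sketch rather than VariantEval's implementation; `novelty_strata` and the `(chrom, pos)` tuple representation are assumptions, and overlap is tested by position only, as the bullets describe.

```python
from typing import List, Set, Tuple

Locus = Tuple[str, int]  # (chromosome, 1-based position)


def novelty_strata(locus: Locus, known_loci: Set[Locus]) -> List[str]:
    """Return the Novelty states for a locus, given the loci of the knowns track.

    Hypothetical helper: overlap is decided by position alone, so the
    alternate allele plays no part in the known/novel decision.
    """
    state = "known" if locus in known_loci else "novel"
    # Every locus is also counted in the catch-all "all" state.
    return [state, "all"]


known = {("1", 10327)}  # a single site in the knowns track
print(novelty_strata(("1", 10327), known))
print(novelty_strata(("1", 20000), known))
```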

CountVariants

CountVariants is an evaluation module that computes the following metrics:

Metric                  Definition
nProcessedLoci          Number of processed loci
nCalledLoci             Number of called loci
nRefLoci                Number of reference loci
nVariantLoci            Number of variant loci
variantRate             Variants per locus
variantRatePerBp        Number of variants per base pair
nSNPs                   Number of SNP loci
nInsertions             Number of insertions
nDeletions              Number of deletions
nComplex                Number of complex loci
nNoCalls                Number of no-call loci
nHets                   Number of het loci
nHomRef                 Number of hom-ref loci
nHomVar                 Number of hom-var loci
nSingletons             Number of singletons
heterozygosity          Heterozygosity per locus
heterozygosityPerBp     Heterozygosity per base pair
hetHomRatio             Heterozygosity to homozygosity ratio
indelRate               Indel rate (insertion count + deletion count)
indelRatePerBp          Indel rate per base pair
deletionInsertionRatio  Deletion to insertion ratio
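Several of these metrics are ratios of the raw counts. The Python sketch below shows one plausible reading of three of them. The formulas are assumptions inferred from the metric names and definitions above (in particular, that hetHomRatio divides nHets by nHomVar and that variantRate divides nVariantLoci by nProcessedLoci), and the counts are invented purely for illustration.

```python
from typing import Dict


def ratio(numerator: int, denominator: int) -> float:
    """Divide two counts, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Compute a few ratio metrics from raw counts.

    Assumed formulas (not confirmed by the article):
      variantRate            = nVariantLoci / nProcessedLoci
      hetHomRatio            = nHets / nHomVar
      deletionInsertionRatio = nDeletions / nInsertions
    """
    return {
        "variantRate": ratio(counts["nVariantLoci"], counts["nProcessedLoci"]),
        "hetHomRatio": ratio(counts["nHets"], counts["nHomVar"]),
        "deletionInsertionRatio": ratio(counts["nDeletions"], counts["nInsertions"]),
    }


# Invented counts, for illustration only.
example_counts = {
    "nProcessedLoci": 1000,
    "nVariantLoci": 10,
    "nHets": 6,
    "nHomVar": 3,
    "nDeletions": 2,
    "nInsertions": 4,
}
print(derived_metrics(example_counts))
```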

CompOverlap

CompOverlap is an evaluation module that computes the following metrics:

Metric           Definition
nEvalSNPs        Number of eval SNP sites
nCompSNPs        Number of comp SNP sites
novelSites       Number of eval sites outside of comp sites
nVariantsAtComp  Number of eval sites at comp sites (that is, sharing the same locus as a variant in the comp track, regardless of whether the alternate allele is the same)
compRate         Percentage of eval sites at comp sites
nConcordant      Number of concordant sites (that is, for the sites that share the same locus as a variant in the comp track, those that have the same alternate allele)
concordantRate   The concordance rate

Understanding the output of CompOverlap

A SNP in the detection set is said to be 'concordant' if its position exactly matches an entry in dbSNP and its alternate allele is the same. To understand this and the other output of CompOverlap, we shall examine a detailed example. First, consider a mock dbSNP file (the '##' header lines are suppressed to keep the example readable):

 $ grep -v '##' dbsnp.vcf
 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
 1       10327   rs112750067     T       C       .       .       ASP;R5;VC=SNP;VP=050000020005000000000100;WGT=1;dbSNPBuildID=132

Now, a detection set file with a single sample, where the variant allele is the same as listed in dbSNP:

 $ grep -v '##' eval_correct_allele.vcf
 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT            001-6
 1       10327   .       T       C       5168.52 PASS    ...     GT:AD:DP:GQ:PL    0/1:357,238:373:99:3959,0,4059

Finally, a detection set file with a single sample, but the alternate allele differs from that in dbSNP:

 $ grep -v '##' eval_incorrect_allele.vcf
 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT            001-6
 1       10327   .       T       A       5168.52 PASS    ...     GT:AD:DP:GQ:PL    0/1:357,238:373:99:3959,0,4059

Running VariantEval with just the CompOverlap module:

 $ java -jar $STING_DIR/dist/GenomeAnalysisTK.jar -T VariantEval \
        -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
        -L 1:10327 \
        -B:dbsnp,VCF dbsnp.vcf \
        -B:eval_correct_allele,VCF eval_correct_allele.vcf \
        -B:eval_incorrect_allele,VCF eval_incorrect_allele.vcf \
        -noEV \
        -EV CompOverlap \
        -o eval.table

We find that the eval.table file contains the following:

 $ grep -v '##' eval.table | column -t 
 CompOverlap  CompRod  EvalRod                JexlExpression  Novelty  nEvalVariants  nCompVariants  novelSites  nVariantsAtComp  compRate      nConcordant  concordantRate
 CompOverlap  dbsnp    eval_correct_allele    none            all      1              1              0           1                100.00000000  1            100.00000000
 CompOverlap  dbsnp    eval_correct_allele    none            known    1              1              0           1                100.00000000  1            100.00000000
 CompOverlap  dbsnp    eval_correct_allele    none            novel    0              0              0           0                0.00000000    0            0.00000000
 CompOverlap  dbsnp    eval_incorrect_allele  none            all      1              1              0           1                100.00000000  0            0.00000000
 CompOverlap  dbsnp    eval_incorrect_allele  none            known    1              1              0           1                100.00000000  0            0.00000000
 CompOverlap  dbsnp    eval_incorrect_allele  none            novel    0              0              0           0                0.00000000    0            0.00000000

As you can see, in both datasets the detection set variant is counted under nVariantsAtComp (meaning the variant was seen at a position listed in dbSNP), but only the eval_correct_allele dataset is concordant at that site, because its alternate allele matches the one listed in dbSNP.
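The counting logic behind this example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CompOverlap source: the function `comp_overlap` and the `(chrom, pos, ref, alt)` tuples are hypothetical, and the concordance rate is assumed to be nConcordant divided by nVariantsAtComp, which is consistent with the table above.

```python
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def comp_overlap(eval_variants: List[Variant],
                 comp_variants: List[Variant]) -> Dict[str, float]:
    """Count eval sites that overlap comp sites and those that also match alleles."""
    # Index the comp track by locus, remembering every alternate allele seen there.
    comp_alts_by_locus: Dict[Tuple[str, int], Set[str]] = {}
    for chrom, pos, _ref, alt in comp_variants:
        comp_alts_by_locus.setdefault((chrom, pos), set()).add(alt)

    n_eval = len(eval_variants)
    n_at_comp = 0
    n_concordant = 0
    for chrom, pos, _ref, alt in eval_variants:
        comp_alts = comp_alts_by_locus.get((chrom, pos))
        if comp_alts is None:
            continue  # novel site: no comp variant at this locus
        n_at_comp += 1  # same locus, regardless of allele
        if alt in comp_alts:
            n_concordant += 1  # same locus and same alternate allele

    return {
        "nEvalVariants": n_eval,
        "novelSites": n_eval - n_at_comp,
        "nVariantsAtComp": n_at_comp,
        "compRate": 100.0 * n_at_comp / n_eval if n_eval else 0.0,
        "nConcordant": n_concordant,
        # Assumed denominator: eval sites that overlap a comp site.
        "concordantRate": 100.0 * n_concordant / n_at_comp if n_at_comp else 0.0,
    }


dbsnp = [("1", 10327, "T", "C")]
eval_correct_allele = [("1", 10327, "T", "C")]
eval_incorrect_allele = [("1", 10327, "T", "A")]
print(comp_overlap(eval_correct_allele, dbsnp))
print(comp_overlap(eval_incorrect_allele, dbsnp))
```

With the three records from the example, both eval sets give nVariantsAtComp = 1, while nConcordant is 1 only for eval_correct_allele, matching the "all" rows of eval.table.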

TiTvVariantEvaluator

TiTvVariantEvaluator is an evaluation module that computes the following metrics:

Metric             Definition
nTi                Number of transition loci
nTv                Number of transversion loci
tiTvRatio          The transition to transversion ratio
nTiInComp          Number of comp transition sites
nTvInComp          Number of comp transversion sites
TiTvRatioStandard  The transition to transversion ratio for comp sites
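For reference, a transition is a substitution between two purines (A and G) or between two pyrimidines (C and T); any other single-base substitution is a transversion. The following Python sketch classifies SNPs and derives the ratio. The helper names are hypothetical, and returning 0.0 when there are no transversions is an assumption made for the example.

```python
from typing import Dict, List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def is_transversion(ref: str, alt: str) -> bool:
    """True for any purine<->pyrimidine substitution."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and pair <= BASES and not is_transition(ref, alt)


def ti_tv(snps: List[Tuple[str, str]]) -> Dict[str, float]:
    """Count transitions and transversions among (ref, alt) SNP pairs."""
    n_ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    n_tv = sum(1 for ref, alt in snps if is_transversion(ref, alt))
    # Assumption: report 0.0 rather than dividing by zero.
    return {"nTi": n_ti, "nTv": n_tv, "tiTvRatio": n_ti / n_tv if n_tv else 0.0}


# The two eval alleles used earlier in this article: T>C and T>A.
print(ti_tv([("T", "C"), ("T", "A")]))
```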
