I ran VariantEval with the following command line:

$ java -Xmx2g -jar /media/data/Applications/GenomeAnalysisTK.jar -R /media/data/GATK/hg19/ucsc.hg19.fasta -T VariantEval -noEV -ST FunctionalClass -o eval759_FunctionalClassreport --eval Sample_759.filtered.vcf -D /media/data/GATK/hg19/dbsnp_138.hg19.vcf
The tables show no data for the missense, nonsense, etc. classes. For example:
CompOverlap  CompRod  EvalRod  FunctionalClass  JexlExpression  Novelty  nEvalVariants  novelSites  nVariantsAtComp  compRate  nConcordant  concordantRate
CompOverlap  dbsnp    eval     all              none            all      2850055        55006       2795049          98.07     2531746      90.58
CompOverlap  dbsnp    eval     all              none            known    2795049        0           2795049          100       2531746      90.58
CompOverlap  dbsnp    eval     all              none            novel    55006          55006       0                0         0            0
CompOverlap  dbsnp    eval     missense         none            all      0              0           0                0         0            0
CompOverlap  dbsnp    eval     missense         none            known    0              0           0                0         0            0
CompOverlap  dbsnp    eval     missense         none            novel    0              0           0                0         0            0
CompOverlap  dbsnp    eval     nonsense         none            all      0              0           0                0         0            0
CompOverlap  dbsnp    eval     nonsense         none            known    0              0           0                0         0            0
CompOverlap  dbsnp    eval     nonsense         none            novel    0              0           0                0         0            0
CompOverlap  dbsnp    eval     silent           none            all      0              0           0                0         0            0
CompOverlap  dbsnp    eval     silent           none            known    0              0           0                0         0            0
CompOverlap  dbsnp    eval     silent           none            novel    0              0           0                0         0            0
This is my first foray into GATK, but I thought everything in the command was OK. Is there something wrong with my choice of dbSNP file or reference that is preventing the SNPs from being split into functional classes?
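Before suspecting dbSNP or the reference, it may be worth confirming that the eval VCF actually carries a per-site functional annotation for the FunctionalClass stratification to read. The sketch below is a minimal Python check, not a GATK tool: it streams the VCF once and tallies records by the value of a single INFO key. The key name SNPEFF_FUNCTIONAL_CLASS is an assumption about what the stratifier looks for (as we understand it, this is the kind of annotation added when snpEff results are merged in with VariantAnnotator), so please verify it against the documentation for your GATK version. If every record lands under "<no annotation>", all-zero missense/nonsense/silent rows like the ones above would be expected.

```python
import gzip
import sys
from collections import Counter

# ASSUMPTION: the INFO key that VariantEval's FunctionalClass stratifier reads.
# This name is not confirmed by the original post; check it for your GATK version.
FUNCTIONAL_KEY = "SNPEFF_FUNCTIONAL_CLASS"


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def parse_info(info_field):
    """Turn 'A=1;B;C=x' into {'A': '1', 'B': True, 'C': 'x'} (flags map to True)."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def count_functional_classes(lines, key=FUNCTIONAL_KEY):
    """Count data records by the value of `key`; records without it count under None."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        counts[parse_info(fields[7]).get(key)] += 1  # column 8 is INFO
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as handle:
        for cls, n in count_functional_classes(handle).most_common():
            print(cls if cls is not None else "<no annotation>", n, sep="\t")
```

Usage, with a hypothetical script name: python count_functional_classes.py Sample_759.filtered.vcf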
I'd like to be able to stratify a multi-sample VCF by values in the FORMAT fields. Almost all of the existing stratifications are based on site-specific information rather than sample-specific information. One stratification in particular that I would like to perform is by ReadDepth: for instance, I would like to separate out all samples with ReadDepth greater than 20. This works for single-sample VCFs, but it produces strange results for multi-sample VCFs, since each VariantContext contains multiple genotypes.
Melting my VCFs so that each position is reported on multiple lines seems possible, but ugly. Splitting the VCFs so that each sample is in its own VCF is also possible, and equally ugly. What is the recommended method for dealing with this sort of stratification?
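The per-genotype (rather than per-site) stratification described here can be sketched outside GATK in a few lines of Python. This is not a VariantEval feature and not a recommended GATK method; it is a minimal sketch assuming an uncompressed VCF with the standard GT and DP FORMAT keys, and the function names and the threshold constant are hypothetical. Instead of melting or splitting the file, it streams through the records once and assigns each sample's genotype its own stratum, so a single record can contribute to several strata.

```python
from collections import Counter

# Hypothetical threshold, taken from the "ReadDepth greater than 20" example.
MIN_DP = 20


def per_sample_strata(lines, field="DP", threshold=MIN_DP):
    """Yield (chrom, pos, sample, stratum, genotype) for every genotype.

    The stratum comes from a per-sample FORMAT value, not from the site,
    so one VCF record can put its genotypes into different strata.
    """
    samples = []
    for line in lines:
        if line.startswith("##"):  # meta-information lines
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):  # header line holds the sample names
            samples = fields[9:]
            continue
        chrom, pos = fields[0], int(fields[1])
        fmt_keys = fields[8].split(":")
        for sample, raw in zip(samples, fields[9:]):
            # Trailing FORMAT values may be dropped, so missing keys are allowed.
            values = dict(zip(fmt_keys, raw.split(":")))
            gt = values.get("GT", "./.")
            value = values.get(field, ".")
            if value in (".", ""):
                stratum = "missing"
            elif float(value) > threshold:
                stratum = f"{field}>{threshold}"
            else:
                stratum = f"{field}<={threshold}"
            yield chrom, pos, sample, stratum, gt


def count_nonref_by_stratum(lines, **kwargs):
    """Count called, non-reference genotypes in each per-sample stratum."""
    counts = Counter()
    for _chrom, _pos, _sample, stratum, gt in per_sample_strata(lines, **kwargs):
        alleles = gt.replace("|", "/").split("/")
        if "." not in alleles and any(a != "0" for a in alleles):
            counts[stratum] += 1
    return counts
```

Any other per-genotype metric (for example a concordance check against a truth genotype) could be accumulated in the same loop, keyed by stratum.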
I would like to evaluate variant calls to produce a plot (pseudo-ROC) of sensitivity vs. specificity (or concordance, etc.) when I condition on a minimum or maximum value of a particular metric (coverage, genotype quality, etc.). I can do this by running VariantEval or GenotypeConcordance multiple times, once for each cutoff value, but this is inefficient, since I believe these values could be computed in a single pass. Alternatively, if there were a simple tool to annotate each variant as concordant or discordant, I could tabulate the results myself. I would like to rely on GATK's variant comparison logic to compare variants (especially indels). Any thoughts on whether current tools can be parameterized, or adapted, for these purposes?
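Once each call carries a concordant/discordant label, every cutoff can indeed be tabulated in one pass: sort the calls by the metric once, then walk down the list keeping running counts. The Python sketch below assumes the labels were already produced elsewhere (it does not reproduce GATK's variant-matching logic, including for indels) and that the number of truth-set variants is known. Because call-level concordance gives no true negatives, it reports precision as a stand-in for specificity; that substitution and the function name are assumptions of this sketch.

```python
from itertools import groupby


def threshold_sweep(calls, n_truth):
    """Sensitivity and precision for every minimum-metric cutoff, in one pass.

    calls:   iterable of (metric_value, is_concordant) pairs, one per call.
    n_truth: number of variants in the truth set (sensitivity denominator).

    Returns (cutoff, n_passing, sensitivity, precision) rows, where a call
    passes a cutoff if its metric value is >= that cutoff.
    """
    ordered = sorted(calls, key=lambda c: c[0], reverse=True)
    rows = []
    tp = fp = 0
    # Group equal metric values so tied calls pass or fail a cutoff together.
    for value, group in groupby(ordered, key=lambda c: c[0]):
        for _value, concordant in group:
            if concordant:
                tp += 1
            else:
                fp += 1
        rows.append((value, tp + fp, tp / n_truth, tp / (tp + fp)))
    return rows
```

For a maximum-value cutoff, sort ascending instead (reverse=False) and read each row as "calls with metric <= cutoff". The cost is one sort plus one linear scan, regardless of how many cutoffs are plotted.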
Thanks for your help in advance,