I just finished running a fairly large number of WGS samples through HaplotypeCaller, and I've been using VariantEval to look at some summary statistics for them. Under '#:GATKTable:VariantSummary:1000 Genomes Phase I summary of variants table' there's a section on structural variants, and apparently one of my samples has roughly 3,300 of them (3,282 in the 'all' row below). Here's the section of the table in question:
#:GATKTable:20:3:%s:%s:%s:%s:%s:%d:%d:%d:%.2f:%s:%d:%.2f:%.1f:%d:%s:%d:%.1f:%d:%s:%d:;
#:GATKTable:VariantSummary:1000 Genomes Phase I summary of variants table
VariantSummary  CompRod  EvalRod  JexlExpression  Novelty  nSamples  nProcessedLoci  nSNPs    TiTvRatio  SNPNoveltyRate  nSNPsPerSample  TiTvRatioPerSample  SNPDPPerSample  nIndels  IndelNoveltyRate  nIndelsPerSample  IndelDPPerSample  nSVs  SVNoveltyRate  nSVsPerSample
VariantSummary  dbsnp    vcf1     none            all      1         3095693981      3446166  2.08       1.34            3446166         2.08                0.0             962028   15.33             962028            0.0               3282  73.58          3282
VariantSummary  dbsnp    vcf1     none            known    1         3095693981      3399907  2.08       0.00            3399907         2.08                0.0             814506   0.00              814506            0.0               867   0.00           867
VariantSummary  dbsnp    vcf1     none            novel    1         3095693981      46259    1.71       100.00          46259           1.71                0.0             147522   100.00            147522            0.0               2415  100.00         2415
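As a quick consistency check on these numbers, the known and novel rows should add up to the all row, and each novelty rate should be novel/all × 100. The sketch below is plain Python that re-reads the values pasted above; `parse_table` and `novelty_rate` are my own hypothetical helpers, not part of GATK.

```python
# Minimal sketch: parse the VariantSummary rows pasted from the VariantEval
# output and check that the known/novel split adds up to the 'all' row.
TABLE = """\
VariantSummary CompRod EvalRod JexlExpression Novelty nSamples nProcessedLoci nSNPs TiTvRatio SNPNoveltyRate nSNPsPerSample TiTvRatioPerSample SNPDPPerSample nIndels IndelNoveltyRate nIndelsPerSample IndelDPPerSample nSVs SVNoveltyRate nSVsPerSample
VariantSummary dbsnp vcf1 none all 1 3095693981 3446166 2.08 1.34 3446166 2.08 0.0 962028 15.33 962028 0.0 3282 73.58 3282
VariantSummary dbsnp vcf1 none known 1 3095693981 3399907 2.08 0.00 3399907 2.08 0.0 814506 0.00 814506 0.0 867 0.00 867
VariantSummary dbsnp vcf1 none novel 1 3095693981 46259 1.71 100.00 46259 1.71 0.0 147522 100.00 147522 0.0 2415 100.00 2415
"""


def parse_table(text):
    """Return {Novelty: {column: value}} for a whitespace-separated table body."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    novelty_idx = header.index("Novelty")
    return {row[novelty_idx]: dict(zip(header, row)) for row in rows}


def novelty_rate(table, count_col):
    """Percentage of the 'all' count that falls in the 'novel' row."""
    novel = int(table["novel"][count_col])
    total = int(table["all"][count_col])
    return round(100.0 * novel / total, 2)


if __name__ == "__main__":
    t = parse_table(TABLE)
    for col in ("nSNPs", "nIndels", "nSVs"):
        known, novel, total = (int(t[k][col]) for k in ("known", "novel", "all"))
        print(f"{col}: {known} known + {novel} novel = {known + novel} "
              f"(all = {total}), novelty {novelty_rate(t, col)}%")
```

For nSVs this gives 867 known + 2415 novel = 3282, with a novelty rate of 73.58%, matching the SVNoveltyRate column. If I'm reading the Novelty stratification correctly, 867 of these "SVs" were matched to a dbsnp record, which suggests they correspond to ordinary records in my VCF rather than something VariantEval made up.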
I didn't think HaplotypeCaller looked for structural variants at all, so I tried to find these records in the VCF, expecting them to be encoded as described here, but I couldn't find any. Could someone explain why VariantEval reports structural variants when I can't find any in the VCF itself? Does VariantEval simply classify a sufficiently large indel as an SV? If so, I can see why it would report some, since a few of this sample's indels are longer than 1 kb.
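To test the "large indel counted as SV" hypothesis against my own VCF, a rough sketch like the one below counts records by type using only the REF/ALT columns. The key assumption, which I have not verified, is that VariantEval's VariantSummary counts an indel as an SV once its length difference exceeds a fixed cutoff; I believe that cutoff is 50 bp, so `SV_LENGTH_CUTOFF` is a guess. The `classify` and `count_types` helpers are hypothetical and only roughly follow htsjdk's SNP/INDEL/MIXED typing.

```python
import gzip
import sys

# ASSUMPTION: length cutoff above which VariantEval counts an indel as an SV.
# 50 bp is my unverified recollection, not a documented value.
SV_LENGTH_CUTOFF = 50


def classify(ref, alts, cutoff=SV_LENGTH_CUTOFF):
    """Classify one VCF record as 'SNP', 'INDEL', 'SV' or 'OTHER' from its alleles."""
    alts = [a for a in alts if a not in (".", "*")]
    if not alts:
        return "OTHER"
    # Symbolic alleles (<DEL>, <DUP>, ...) and breakends are how SV callers encode SVs.
    if any(a.startswith("<") or "[" in a or "]" in a for a in alts):
        return "SV"
    diffs = [abs(len(a) - len(ref)) for a in alts]
    if all(d > 0 for d in diffs):
        # Every ALT changes the length: an indel, or an "SV" if any exceeds the cutoff.
        return "SV" if max(diffs) > cutoff else "INDEL"
    if all(d == 0 for d in diffs) and len(ref) == 1:
        return "SNP"
    return "OTHER"  # MNPs and mixed SNP/indel sites


def count_types(lines, cutoff=SV_LENGTH_CUTOFF):
    """Tally record types over VCF text lines, skipping header lines."""
    counts = {"SNP": 0, "INDEL": 0, "SV": 0, "OTHER": 0}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        counts[classify(ref, alts, cutoff)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as vcf:
        print(count_types(vcf))
```

This counts every record, including filtered sites and sites where the sample is hom-ref, so I wouldn't expect it to match nSVs exactly; but if the SV count lands in the same ballpark as 3,282, that would support the idea that the "SVs" are just long indels.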