VariantEval Structural Variation
Posted in Ask the GATK team

Hi,

I just finished running a fairly large number of WGS samples through HaplotypeCaller, and I've been using VariantEval to look at some summary statistics for them. Under '#:GATKTable:VariantSummary:1000 Genomes Phase I summary of variants table' there's a section on structural variations, and it reports 3,282 of them in one of my samples. Here's the relevant section of the table:

#:GATKTable:20:3:%s:%s:%s:%s:%s:%d:%d:%d:%.2f:%s:%d:%.2f:%.1f:%d:%s:%d:%.1f:%d:%s:%d:;
#:GATKTable:VariantSummary:1000 Genomes Phase I summary of variants table
VariantSummary  CompRod  EvalRod  JexlExpression  Novelty  nSamples  nProcessedLoci  nSNPs    TiTvRatio  SNPNoveltyRate  nSNPsPerSample  TiTvRatioPerSample  SNPDPPerSample  nIndels  IndelNoveltyRate  nIndelsPerSample  IndelDPPerSample  nSVs  SVNoveltyRate  nSVsPerSample
VariantSummary  dbsnp    vcf1     none            all             1      3095693981  3446166       2.08            1.34         3446166                2.08             0.0   962028             15.33            962028               0.0  3282          73.58           3282
VariantSummary  dbsnp    vcf1     none            known           1      3095693981  3399907       2.08            0.00         3399907                2.08             0.0   814506              0.00            814506               0.0   867           0.00            867
VariantSummary  dbsnp    vcf1     none            novel           1      3095693981    46259       1.71          100.00           46259                1.71             0.0   147522            100.00            147522               0.0  2415         100.00           2415

I didn't think HaplotypeCaller looked for structural variations at all, so I tried to find these structural variations in the VCF, expecting them to be encoded as described here, but I couldn't find any. Could someone explain why VariantEval reports structural variations when I can't find any in the VCF? Does VariantEval simply treat a sufficiently large indel as an SV? If so, I can see why it would report some, since this sample contains indels longer than 1 kb.
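One way to test the "large indel counted as SV" hypothesis is to tally the records in the VCF by length change and compare the number of long indels with the nSVs column. The sketch below does this in plain Python. The 50 bp cutoff and the category names are assumptions chosen for illustration; they are not confirmed to match whatever rule VariantEval actually applies, so try a few thresholds.

```python
import gzip
import sys
from collections import Counter

# ASSUMPTION: hypothetical cutoff (in bases) above which an indel is tallied as
# "LARGE_INDEL". Not taken from VariantEval; adjust and compare against nSVs.
SV_LENGTH_THRESHOLD = 50


def classify(ref, alt_field, threshold=SV_LENGTH_THRESHOLD):
    """Classify one VCF record from its REF and ALT columns (illustrative categories)."""
    alts = alt_field.split(",")
    # Symbolic (<DEL>, <DUP>, ...) and breakend alleles are how SVs are usually encoded.
    if any(a.startswith("<") or "[" in a or "]" in a for a in alts):
        return "SYMBOLIC"
    # Length change of each sequence allele; skip missing ('.') and spanning-deletion ('*').
    diffs = [abs(len(a) - len(ref)) for a in alts if a not in (".", "*")]
    if not diffs:
        return "OTHER"
    if max(diffs) > threshold:
        return "LARGE_INDEL"
    if max(diffs) > 0:
        return "INDEL"
    if len(ref) == 1:
        return "SNP"
    return "OTHER"  # MNPs and other same-length complex alleles


def count_classes(lines, threshold=SV_LENGTH_THRESHOLD):
    """Count record classes over VCF lines, skipping header and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[classify(fields[3], fields[4], threshold)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_large_indels.py sample.vcf[.gz]
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as vcf:
        for kind, n in sorted(count_classes(vcf).items()):
            print(kind, n)
```

If the LARGE_INDEL count lands close to nSVs while SYMBOLIC stays at zero, that would support the idea that the SVs in the table are long indels rather than explicitly encoded structural variants.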

Thanks,

Grant
