This document describes the reference confidence model applied by HaplotypeCaller to generate genomic VCFs (gVCFs), invoked by -ERC GVCF or -ERC BP_RESOLUTION (see the FAQ on gVCFs for format details).
Please note that this document may be expanded with more detailed information in the near future.
The mode works by assembling the reads to create potential haplotypes, realigning the reads to their most likely haplotypes, and then projecting these reads back onto the reference sequence via their haplotypes to compute alignments of the reads to the reference. For each position in the genome we have either a non-reference call (via the standard calling mechanism) or we can estimate the chance that some (unknown) non-reference allele is segregating at this position by examining the realigned reads that span the reference base. At this base we perform two calculations:
Based on this, we emit the genotype likelihoods (PL) and compute the GQ (from the PLs) for the least confident of these two models.
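As a rough illustration of that last step, the sketch below derives a GQ from a set of PLs and keeps whichever of two models is less confident. It assumes the common convention that GQ is the gap between the best and second-best PL, capped at 99, and that "least confident" means the lower GQ. The function names and the PL values are hypothetical and are not taken from the HaplotypeCaller source.

```python
from typing import Sequence

MAX_GQ = 99  # assumption: GQ is capped at 99, as is conventional in GATK output


def gq_from_pls(pls: Sequence[int]) -> int:
    """Return GQ as the gap between the two smallest PLs, capped at MAX_GQ."""
    if len(pls) < 2:
        raise ValueError("need at least two genotype likelihoods")
    best, second = sorted(pls)[:2]
    return min(second - best, MAX_GQ)


def least_confident(model_a_pls: Sequence[int], model_b_pls: Sequence[int]) -> Sequence[int]:
    """Return the PLs of whichever model yields the lower GQ (ties go to model A)."""
    return min((model_a_pls, model_b_pls), key=gq_from_pls)


# Made-up PLs for the two calculations at a single site (illustrative only).
model_a_pls = [0, 45, 600]
model_b_pls = [0, 30, 450]

emitted = least_confident(model_a_pls, model_b_pls)
print(emitted, gq_from_pls(emitted))  # [0, 30, 450] 30
```

Reporting the weaker of the two models is the conservative choice: the emitted GQ never overstates how sure we are that the site is reference.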
At sites called homozygous reference, we use a symbolic allele, <NON_REF>, to stand in for any possible non-reference allele. Because this gives the record an ALT allele, we can provide allele-specific PL field values.
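To make the allele-specific PL values concrete, here is a minimal sketch that labels each PL in a gVCF-style record with its genotype. It assumes a diploid sample and the genotype ordering defined by the VCF specification. The record itself is a made-up example, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def genotype_order(n_alleles: int) -> List[Tuple[int, int]]:
    """Diploid genotype order used by the VCF spec for PL: 0/0, 0/1, 1/1, 0/2, ..."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def pl_by_genotype(vcf_line: str) -> Dict[str, int]:
    """Map each diploid genotype of a single-sample VCF record to its PL value."""
    fields = vcf_line.rstrip("\n").split("\t")
    alleles = [fields[3]] + fields[4].split(",")  # REF followed by the ALT alleles
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    pls = [int(value) for value in sample["PL"].split(",")]
    order = genotype_order(len(alleles))
    if len(pls) != len(order):
        raise ValueError("PL count does not match the number of diploid genotypes")
    return {f"{alleles[j]}/{alleles[k]}": pl for (j, k), pl in zip(order, pls)}


# Made-up reference block record, for illustration only.
record = (
    "20\t10000000\t.\tT\t<NON_REF>\t.\t.\tEND=10000116\t"
    "GT:DP:GQ:MIN_DP:PL\t0/0:44:99:38:0,89,1385"
)

print(pl_by_genotype(record))
# {'T/T': 0, 'T/<NON_REF>': 89, '<NON_REF>/<NON_REF>': 1385}
```

With only REF and <NON_REF> present, the three PLs correspond to homozygous reference, heterozygous for some unknown non-reference allele, and homozygous for that allele.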
For details of the gVCF format, please see the document that explains what a gVCF is.