GenotypeConcordance
GenotypeConcordance is an evaluation module for VariantEval. It consists of two subtables, a detailed table and a simplified table, that calculate per-sample and overall metrics related to genotype concordance.
Introduction
GenotypeConcordance computes two tables of metrics describing the genotype concordance between an evaluation callset and a comparison callset. These tables are:
| Table | Description |
|---|---|
| detailedStats | the concordance statistics for each sample |
| simplifiedStats | the concordance statistics summary for each sample |
Understanding the output of each subtable
GenotypeConcordance.detailedStats
Given an evaluation track and a comparison track, this table counts the homozygous-reference, heterozygous, and homozygous-variant sites per sample in the comparison track. Then, for each of those classes, it compares the genotype called in the evaluation track with the genotype given in the comparison track and emits the full "confusion matrix": how many genotypes agree with the comparison data and, for those that differ, which genotype was called instead.
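The per-sample tallying described above can be sketched in Python. This is a minimal sketch, not GATK's implementation: it assumes biallelic sites with VCF-style `GT` strings (such as `0/1` or `1|1`), and `classify` and `tally` are hypothetical helper names. It also assumes that sites where the comparison genotype is a no-call are skipped, since the table has no row for them.

```python
from collections import Counter
from typing import Iterable, Tuple

# Genotype classes; "no_call" is only tracked for the evaluation genotype.
REF, HET, HOM, NO_CALL = "ref", "het", "hom", "no_call"


def classify(gt: str) -> str:
    """Map a biallelic VCF-style GT string (e.g. '0/1', '1|1', './.') to a class."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return NO_CALL
    alt_count = sum(1 for a in alleles if a != "0")
    if alt_count == 0:
        return REF
    if alt_count == len(alleles):
        return HOM
    return HET


def tally(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count (comparison_class, evaluation_class) cells for one sample.

    Assumption: sites whose comparison genotype is a no-call are skipped.
    """
    cells: Counter = Counter()
    for comp_gt, eval_gt in pairs:
        comp = classify(comp_gt)
        if comp == NO_CALL:
            continue
        cells[(comp, classify(eval_gt))] += 1
    return cells
```

For example, `tally([("0/1", "1/1"), ("1/1", "./.")])` records one het site called hom-var and one hom-var site that is a no-call in the evaluation track.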
The detailedStats table has the following columns:
| Column | Description |
|---|---|
| row | the sample name |
| total_true_ref | the total number of true homozygous-reference sites in the sample (as indicated in the comparison track) |
| pct_ref_vs_ref | the percentage of hom-ref sites in the comparison track that are called hom-ref in the evaluation track |
| n_ref_vs_no_call | the number of hom-ref sites in the comparison track that are no-calls in the evaluation track |
| n_ref_vs_ref | the number of hom-ref sites in the comparison track that are called hom-ref in the evaluation track |
| n_ref_vs_het | the number of hom-ref sites in the comparison track that are called heterozygous in the evaluation track |
| n_ref_vs_hom | the number of hom-ref sites in the comparison track that are called hom-var in the evaluation track |
| total_true_het | the total number of true heterozygous sites in the sample (as indicated in the comparison track) |
| pct_het_vs_het | the percentage of het sites in the comparison track that are called het in the evaluation track |
| n_het_vs_no_call | the number of het sites in the comparison track that are no-calls in the evaluation track |
| n_het_vs_ref | the number of het sites in the comparison track that are called hom-ref in the evaluation track |
| n_het_vs_het | the number of het sites in the comparison track that are called het in the evaluation track |
| n_het_vs_hom | the number of het sites in the comparison track that are called hom-var in the evaluation track |
| total_true_hom | the total number of true homozygous-variant sites in the sample (as indicated in the comparison track) |
| pct_hom_vs_hom | the percentage of hom-var sites in the comparison track that are called hom-var in the evaluation track |
| n_hom_vs_no_call | the number of hom-var sites in the comparison track that are no-calls in the evaluation track |
| n_hom_vs_ref | the number of hom-var sites in the comparison track that are called hom-ref in the evaluation track |
| n_hom_vs_het | the number of hom-var sites in the comparison track that are called het in the evaluation track |
| n_hom_vs_hom | the number of hom-var sites in the comparison track that are called hom-var in the evaluation track |
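Given confusion-matrix counts keyed as `(comparison_class, evaluation_class)`, one row of this table could be assembled as sketched below. The function `detailed_row` is hypothetical, and the sketch assumes that each `pct_*` column divides by the matching `total_true_*` count (which includes evaluation no-calls) and reports 0 when that total is zero; GATK's exact handling of these cases may differ.

```python
from typing import Dict, Tuple, Union

CLASSES = ("ref", "het", "hom")
EVAL_CLASSES = ("no_call", "ref", "het", "hom")


def detailed_row(sample: str,
                 cells: Dict[Tuple[str, str], int]) -> Dict[str, Union[str, int, float]]:
    """Build one detailedStats-style row; column order follows the table above."""
    row: Dict[str, Union[str, int, float]] = {"row": sample}
    for comp in CLASSES:
        # All true sites of this class, including evaluation no-calls.
        total = sum(cells.get((comp, ev), 0) for ev in EVAL_CLASSES)
        row[f"total_true_{comp}"] = total
        hits = cells.get((comp, comp), 0)
        # Assumption: percentage over total_true_*, 0.0 when there are no sites.
        row[f"pct_{comp}_vs_{comp}"] = 100.0 * hits / total if total else 0.0
        for ev in EVAL_CLASSES:
            row[f"n_{comp}_vs_{ev}"] = cells.get((comp, ev), 0)
    return row
```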
GenotypeConcordance.simplifiedStats
This table is a simplified version of the one above, containing only the percentage of genotypes of each class called correctly, and the non-reference sensitivity, overall genotype concordance, and non-reference discrepancy rate metrics commonly used in evaluating a variant callset against comparison data.
| Column | Description |
|---|---|
| row | the sample name |
| percent_comp_ref_called_ref | the percentage of hom-ref sites in the comparison track that are called hom-ref in the evaluation track |
| percent_comp_het_called_het | the percentage of het sites in the comparison track that are called het in the evaluation track |
| percent_comp_hom_called_hom | the percentage of hom-var sites in the comparison track that are called hom-var in the evaluation track |
| percent_non_reference_sensitivity | the percentage of sites called variant (A/B or B/B, where A is the reference allele and B the alternate allele) in the comparison track that are also called variant in the evaluation track |
| percent_overall_genotype_concordance | the accuracy of genotype calls at all loci, excluding no-calls in either set. This metric is often biased towards A/A loci and is not recommended for routine analysis. |
| percent_non_reference_discrepancy_rate | the rate of discordant genotype calls at sites called by both sets, excluding concordant A/A genotypes because these are usually numerous and easy to get right. This is a good metric for assessing the accuracy of genotype calls. |
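The three summary metrics can be derived from the same confusion-matrix counts. This is a hedged sketch based on the definitions in the table above, not a copy of GATK's code: `simplified_metrics` is a hypothetical helper, counts are keyed as `(comparison_class, evaluation_class)`, and evaluation no-calls are assumed to count as misses in the NRS denominator while being excluded from OGC and NRD.

```python
from typing import Dict, Tuple

Cells = Dict[Tuple[str, str], int]
CALLED = ("ref", "het", "hom")   # A/A, A/B, B/B
VARIANT = ("het", "hom")         # A/B, B/B


def cell(cells: Cells, comp: str, ev: str) -> int:
    return cells.get((comp, ev), 0)


def pct(num: int, den: int) -> float:
    return 100.0 * num / den if den else 0.0


def simplified_metrics(cells: Cells) -> Dict[str, float]:
    # NRS: comparison-variant sites that are also variant in evaluation.
    # Assumption: evaluation no-calls are included in the denominator as misses.
    nrs_num = sum(cell(cells, c, e) for c in VARIANT for e in VARIANT)
    nrs_den = sum(cell(cells, c, e) for c in VARIANT for e in CALLED + ("no_call",))
    # OGC: concordant calls over all sites genotyped in both sets.
    both = sum(cell(cells, c, e) for c in CALLED for e in CALLED)
    concordant = sum(cell(cells, g, g) for g in CALLED)
    # NRD: discordant calls over sites called in both sets, minus concordant A/A.
    nrd_den = both - cell(cells, "ref", "ref")
    discordant = both - concordant
    return {
        "percent_non_reference_sensitivity": pct(nrs_num, nrs_den),
        "percent_overall_genotype_concordance": pct(concordant, both),
        "percent_non_reference_discrepancy_rate": pct(discordant, nrd_den),
    }
```

Dropping concordant A/A from the NRD denominator keeps the abundant, easy hom-ref sites from diluting the error rate, which is why the table above recommends NRD over OGC.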
GenotypeConcordance.simplifiedStats example
The table below gives an example of what the non-reference sensitivity (NRS), non-reference discrepancy rate (NRD), and overall genotype concordance (OGC) would be given the number of genotypes of each class present in the evaluation set and the comparison set.