Tagged with #genotypeconcordance
0 documentation articles | 0 events or announcements | 4 forum discussions


Sorry, there are no publicly available documents of this type with the tag #genotypeconcordance. Try one of the other types.
Sorry, there are no publicly available documents of this type with the tag #genotypeconcordance. Try one of the other types.

Hi all,

I'd like to know if someone has tested the concordance from output of PhaseByTransmission with SNP array data.

I have calculated the genotype concordance for the most likely GT combination from the VCF obtained from unified genotyper for a family trio based on the GL values against SNP array data and then did the same for the genotypes obtained after using PhaseByTransmission and I'm seeing a drop in concordance.

Is this to be expected?

Thanks!

Hi,

I have the following problem:

I am evaluating genotype concordance using:

-T VariantEval --evalModule GenotypeConcordance -comp ref.vcf -eval sample1.observed.vcf

If I use a reference genotype file with multiple samples in it where one of the genotype columns is NA1234 (the sample in question), then the sensitivity for all SNP types (HOM_REF,HET,HOM_VAR) decreases drastically. This is because the GATK gets confused when there is more than one sample in the reference file. I know this because if I use a reference genotype file (ref.vcf) with only a single hapmap sample (NA1234) everything works fine and sensitivity is good. So this is not a detection problem is a problem when SNPs are being compared against the reference.

I tried passing the sample name using the --sample parameter for -T VariantEval, but this does not work either (sensitivity is still way off).

In previous versions of the GATK this was done automatically where genotypes where compared based on the sample name within the detection vcf file (sample1.observed.vcf ) vs the ref.vcf file without having to specify the sample name explicitly.

How can I avoid this problem? I want to have a master reference genotype file with multiple samples that I can use for different samples.

I am using GATK version v1.6

Thank you, Gene

I'm looking to find all the entries that change between two calls to UG on the same data. I would like to find all the entries where the call in the variant tract are different from those in the comparison track. So in effect I want those entries that would not be result from -using -conc in SelectVariants. From the documentation is is unclear if the -disc option does this:

A site is considered discordant if there exists some sample in the variant track that has a non-reference genotype and either the site isn't present in this track, the sample isn't present in this track, or the sample is called reference in this track.

What if the comp is HOM_VAR and the variant track is HET? Or if they are both HET but disagree on the specific allele?

Thanks.

Hi,

I am using VariantEval --evalModule GenotypeConcordance in order to establish concordance and sensitivity metrics against a HapMap reference. In the resulting GATK report I obtain the following fields for a given SNP category (example with HETs):

GenotypeConcordance  CompRod  EvalRod  JexlExpression  Novelty  variable                                value---
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_HET                       6220
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_HOM_REF                      0
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_HOM_VAR                     20
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_MIXED                        0
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_NO_CALL                    318
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_UNAVAILABLE                  0

What is the meaning of the _MIXED and _UNAVAILABLE fields?

Thx, Gene