# Tagged with #genotypeconcordance 0 documentation articles | 0 announcements | 14 forum discussions

No posts found with the requested search criteria.

Created 2015-07-07 15:01:06 | Updated | Tags: genotypeconcordance

Since VCF files usually do not contain invariant sites, the Sensitivity, Discrepancy, and Concordance computed by GenotypeConcordance (GC) are not actually correct. How can this problem be solved?

Even if the two input VCF files do contain invariant sites, those three metrics should be computed ONLY over the sites shared by both files; currently they are computed over the union of sites.

Thanks for any suggestion!
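To make the union-versus-intersection point concrete, here is a minimal Python sketch. It assumes each call set has already been loaded into a dict keyed by (chrom, pos); the names `concordance_over_sites`, `eval_calls`, and `comp_calls` are hypothetical, and this is not how GenotypeConcordance itself is implemented.

```python
def concordance_over_sites(eval_calls, comp_calls, sites):
    """Fraction of `sites` where both call sets report the same genotype.

    A site missing from either dict counts as a mismatch.
    """
    sites = list(sites)
    if not sites:
        return float("nan")
    matches = sum(
        1
        for s in sites
        if s in eval_calls and s in comp_calls and eval_calls[s] == comp_calls[s]
    )
    return matches / len(sites)


# Toy genotypes keyed by (chrom, pos); the values are illustrative only.
eval_calls = {("1", 100): "0/1", ("1", 200): "1/1", ("1", 300): "0/1"}
comp_calls = {("1", 100): "0/1", ("1", 200): "0/1", ("1", 400): "1/1"}

union = set(eval_calls) | set(comp_calls)
shared = set(eval_calls) & set(comp_calls)

union_rate = concordance_over_sites(eval_calls, comp_calls, union)
shared_rate = concordance_over_sites(eval_calls, comp_calls, shared)
print(f"union: {union_rate:.2f}, intersection: {shared_rate:.2f}")
```

On this toy data the two denominators give different answers (1 of 4 sites versus 1 of 2), which is the discrepancy the post is asking about.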

Created 2015-05-21 06:27:17 | Updated | Tags: haplotypecaller genotypeconcordance

Dear GATK Team,

I performed a variant caller comparison, and the genotype concordance results for HaplotypeCaller look a little strange to me. I can't make sense of them at all.

ALLELES_MATCH: 0
EVAL_SUPERSET_TRUTH: 2438
EVAL_SUBSET_TRUTH: 0
ALLELES_DO_NOT_MATCH: 636
EVAL_ONLY: 20
TRUTH_ONLY: 1014546

Could the first and last values be correct? Can anyone tell me what might cause this? I got very different (and reasonable) results with other callers.

Many thanks!

Created 2015-04-23 23:58:48 | Updated | Tags: jexl genotypeconcordance

We have VCFs to which we've applied genotype-level filters using FilterVariants, and we are interested in using your -gfc/-gfe flags so that all genotypes with a non-"PASS" FT annotation are set to no-call for a concordance check using GenotypeConcordance.

I have tried using the following JEXL expression, 'FT!="PASS"', for -gfc and -gfe, but this has not succeeded in setting the genotypes that FAIL to no-call. Is there a way to achieve this genotype-level filtration within GATK GenotypeConcordance? Is there a different JEXL expression that I should be using?

(Asking on behalf of @azo121 who isn't able to post yet)

Thanks!
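If the JEXL route does not work out, one possible workaround is to mask the failing genotypes before running the concordance check. The sketch below is plain Python text processing done outside GATK, not a GATK feature; `mask_failed_genotypes` is a hypothetical helper, and treating a missing or "." FT value as unfiltered is an assumption.

```python
def mask_failed_genotypes(vcf_line):
    """Return a VCF data line with GT set to no-call wherever FT is not PASS.

    Header lines, and records without both GT and FT in FORMAT, are
    returned unchanged.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return vcf_line
    keys = fields[8].split(":")
    if "GT" not in keys or "FT" not in keys:
        return vcf_line
    gt_idx, ft_idx = keys.index("GT"), keys.index("FT")
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        # Trailing FORMAT values may be dropped; assume a missing FT means unfiltered.
        if ft_idx >= len(values):
            continue
        if values[ft_idx] not in ("PASS", "."):
            # Keep the ploidy but replace every allele with "." (phasing is dropped).
            ploidy = len(values[gt_idx].replace("|", "/").split("/"))
            values[gt_idx] = "/".join(["."] * ploidy)
            fields[i] = ":".join(values)
    newline = "\n" if vcf_line.endswith("\n") else ""
    return "\t".join(fields) + newline


# Toy record: the second sample fails a genotype filter named "lowGQ" (made up).
record = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:FT\t0/1:PASS\t1/1:lowGQ"
masked = mask_failed_genotypes(record)
print(masked)
```

The masked file can then be passed to the concordance tool as usual, with the failing genotypes already reported as no-calls.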

Created 2014-12-18 08:52:46 | Updated | Tags: genotypeconcordance gatk printinterestinsites

Hi, I didn't find the answer anywhere, so I'm asking here. I'm using GATK 2.6-4. I used GenotypeConcordance to compare two datasets, and it was great, exactly what I needed. But then I became interested in the discordant sites, and I saw that printInterestingSites should be a good solution. However, I can't make it work. Just adding "-sites filename" to my command line doesn't work, and I've tried to be creative in changing this option. Then I thought that maybe this option isn't working in my GATK version. Or have I just done something wrong?

Created 2014-10-01 14:56:51 | Updated | Tags: genotypeconcordance

Could anyone help me with two questions about comparing my VCF file to a gold standard?
1. Are sites present in my VCF but absent from the gold standard ignored?
2. Are sites absent from my VCF but present in the gold standard assumed to be 0/0?

Thanks,

Created 2014-09-17 16:45:28 | Updated | Tags: vcf genotypeconcordance

Hello, I just ran genotype concordance in order to determine how similar two samples were. However, in the output, everything is showing up as zero, the NRD is 1, and the overall genotype concordance is also 1. Has anyone encountered this before? The two samples I am using for --eval and --comp are actually the same sample, sequenced using two different methodologies (HiSeq and Ion Proton), so it is odd that I am getting this output. Any help is appreciated.

Thanks, Ricky

Created 2014-04-03 16:05:54 | Updated 2014-04-03 16:06:49 | Tags: genotypeconcordance

I'd like to check the accuracy of genotypes imputed from exome-seq data of NA12891 against genotypes from GWAS chips. But the GWAS chip data I found from HapMap are all in b36. Does anyone know where I can get them for b37?

If there is no chip data for b37 and since the liftover tool isn't working, I wonder how the SNPs are mapped between comp and eval in GenotypeConcordance? Are they mapped by SNP IDs or coordinates? If by SNP IDs, maybe it's OK to compare a vcf file in b37 to one in b36 since the SNP IDs are not changed?

Created 2014-03-27 14:50:15 | Updated | Tags: genotypeconcordance

Hi,

I hope this isn't a stupid question. I would like to compare genotypes between samples in a vcf. For example, in the vcf I have sample A, sample B and sample C and I would like to know the concordance between A and B and A and C. Is there a GATK tool that can do this? I have tried using GenotypeConcordance and VariantEval and supplying the multi-sample vcf as the file for evaluation (--eval) and a vcf containing only sample A (generated using SelectVariants) as the comparison (--comp). However, it doesn't produce the output I want. For example, GenotypeConcordance gives me concordance between ALL samples in the --eval file, rather than on a sample-by-sample basis.

Thanks

Kath
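For sample-by-sample numbers, one option outside GATK is to compare two genotype columns of the same multi-sample VCF directly. The following is a minimal sketch, assuming a tab-delimited VCF read as lines and a GT key in every FORMAT field; `normalize` and `pairwise_concordance` are hypothetical names, and the comparison ignores phasing and skips any site where either sample is not fully called.

```python
def normalize(gt):
    """Order-insensitive allele key, or None for a no-call or partial call."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(sorted(alleles))


def pairwise_concordance(vcf_lines, sample_a, sample_b):
    """Return (matching, compared) genotype counts for two samples of one VCF."""
    columns = None
    matching = compared = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map each column name (including sample names) to its index.
            columns = {name: i for i, name in enumerate(fields)}
            continue
        if columns is None:
            raise ValueError("no #CHROM header line before the first record")
        gt_idx = fields[8].split(":").index("GT")
        gt_a = normalize(fields[columns[sample_a]].split(":")[gt_idx])
        gt_b = normalize(fields[columns[sample_b]].split(":")[gt_idx])
        if gt_a is None or gt_b is None:
            continue  # skip sites where either sample is not fully called
        compared += 1
        matching += int(gt_a == gt_b)
    return matching, compared


# Toy three-sample VCF; sample names A, B, C follow the post above.
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA\tB\tC",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t1/0\t1/1",
    "1\t200\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0/1\t./.",
    "1\t300\t.\tG\tA\t.\t.\t.\tGT\t1/1\t1/1\t1/1",
]
a_vs_b = pairwise_concordance(vcf, "A", "B")
a_vs_c = pairwise_concordance(vcf, "A", "C")
print("A vs B:", a_vs_b, "A vs C:", a_vs_c)
```

This only counts exact genotype matches; it does not reproduce GATK's handling of indels or multi-allelic sites.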

Created 2013-10-29 16:25:23 | Updated 2013-10-29 16:27:22 | Tags: genotypeconcordance

Just to make sure my understanding is correct:

HET: heterozygous
HOM_REF: homozygous reference
HOM_VAR: homozygous variant
MIXED: something like ./1
Mismatching_Alleles: ??
UNAVAILABLE: for internal use
ALLELES_MATCH: ??
ALLELES_DO_NOT_MATCH: ??
EVAL_ONLY: ??
TRUTH_ONLY: does it actually mean the variants present in comp but not in eval, like COMP_ONLY?


How are the following computed?

Non-Reference_Discrepancy
Non-Reference_Sensitivity
Overall_Genotype_Concordance


Thanks a lot!
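As a starting point for the three summary metrics, here is a hedged Python sketch of one common formulation, based on my reading of how GATK describes them. Treat every formula as an assumption to check against your GATK version; in particular, this sketch ignores the Mismatching_Alleles count. The function name and the layout of `counts` (keyed by comp type, then eval type) are hypothetical.

```python
CALLED = ("HOM_REF", "HET", "HOM_VAR")
NON_REF = ("HET", "HOM_VAR")


def concordance_metrics(counts):
    """Compute (NRS, NRD, OGC) from counts[(comp_type, eval_type)] -> int."""

    def cell(comp_type, eval_type):
        return counts.get((comp_type, eval_type), 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    # OGC: matching genotypes over all genotypes called in both comp and eval.
    called_total = sum(cell(c, e) for c in CALLED for e in CALLED)
    called_match = sum(cell(t, t) for t in CALLED)

    # NRD: like 1 - OGC, but with the HOM_REF/HOM_REF cell left out entirely.
    nonref_total = called_total - cell("HOM_REF", "HOM_REF")
    nonref_match = sum(cell(t, t) for t in NON_REF)

    # NRS: comp non-ref genotypes that eval also calls non-ref, over all comp
    # non-ref genotypes (eval no-calls and the like count as misses).
    eval_types = {e for (_, e) in counts}
    comp_nonref = sum(cell(c, e) for c in NON_REF for e in eval_types)
    confirmed = sum(cell(c, e) for c in NON_REF for e in NON_REF)

    nrs = ratio(confirmed, comp_nonref)
    nrd = 1.0 - ratio(nonref_match, nonref_total)
    ogc = ratio(called_match, called_total)
    return nrs, nrd, ogc


# Toy counts, keyed by (comp_type, eval_type); numbers are made up.
counts = {
    ("HOM_REF", "HOM_REF"): 90,
    ("HOM_REF", "HET"): 2,
    ("HET", "HET"): 40,
    ("HET", "HOM_VAR"): 3,
    ("HET", "NO_CALL"): 5,
    ("HOM_VAR", "HOM_VAR"): 20,
}
nrs, nrd, ogc = concordance_metrics(counts)
print(f"NRS={nrs:.4f} NRD={nrd:.4f} OGC={ogc:.4f}")
```

With these toy counts, NRS is 63/68, NRD is 5/65, and OGC is 150/155, which shows how concordant HOM_REF calls raise OGC but do not enter NRD at all.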

Created 2013-06-10 21:09:29 | Updated | Tags: varianteval genotypeconcordance stratification

I would like to evaluate variant calls to produce a plot (pseudo-ROC) of sensitivity vs. specificity (or concordance, etc.) when I condition on a minimum/maximum value for a particular metric (coverage, genotype quality, etc.). I can do this by running VariantEval or GenotypeConcordance multiple times, once for each cutoff value, but this is inefficient, since I believe I should be able to compute these values in one pass. Alternatively, if there were a simple tool to annotate each variant as concordant or discordant, I could tabulate the results myself. I would like to rely upon GATK's variant comparison logic to compare variants (especially indels). Any thoughts on whether current tools can be parameterized or adapted for these purposes?

N
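The "tabulate the results myself" route can be done in a single sort plus prefix sums once each site carries a metric value and a concordant/discordant flag. The sketch below assumes those per-site flags already exist (it does not use GATK's comparison logic); `concordance_by_min_cutoff` and the record layout are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def concordance_by_min_cutoff(records, cutoffs):
    """Map each cutoff to (n_kept, concordance) over records with metric >= cutoff.

    `records` is an iterable of (metric_value, is_concordant) pairs.
    """
    ordered = sorted(records, key=lambda r: r[0])
    values = [v for v, _ in ordered]
    # prefix[i] = number of concordant records among the i lowest-metric records
    prefix = [0] + list(accumulate(int(ok) for _, ok in ordered))
    total_ok = prefix[-1]
    result = {}
    for cutoff in cutoffs:
        first_kept = bisect_left(values, cutoff)
        n_kept = len(values) - first_kept
        n_ok = total_ok - prefix[first_kept]
        result[cutoff] = (n_kept, n_ok / n_kept if n_kept else float("nan"))
    return result


# Toy records: (genotype quality, concordant with the comparison set?)
records = [(5, False), (10, True), (20, True), (30, False), (40, True), (50, True)]
sweep = concordance_by_min_cutoff(records, [0, 20, 45])
for cutoff, (n_kept, rate) in sweep.items():
    print(cutoff, n_kept, round(rate, 3))
```

After the initial sort, each additional cutoff costs only a binary search, so a dense sweep for a pseudo-ROC curve is cheap.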

Created 2012-11-06 19:59:43 | Updated 2012-11-06 20:03:47 | Tags: genotypeconcordance

Hi,

I have the following problem:

I am evaluating genotype concordance using:

-T VariantEval --evalModule GenotypeConcordance -comp ref.vcf -eval sample1.observed.vcf


If I use a reference genotype file with multiple samples in it, where one of the genotype columns is NA1234 (the sample in question), then the sensitivity for all SNP types (HOM_REF, HET, HOM_VAR) decreases drastically. This is because the GATK gets confused when there is more than one sample in the reference file. I know this because if I use a reference genotype file (ref.vcf) with only a single HapMap sample (NA1234), everything works fine and sensitivity is good. So this is not a detection problem; it is a problem with how SNPs are compared against the reference.

I tried passing the sample name using the --sample parameter for -T VariantEval, but this does not work either (sensitivity is still way off).

In previous versions of the GATK this was done automatically: genotypes were compared based on the sample name within the detection VCF file (sample1.observed.vcf) vs. the ref.vcf file, without having to specify the sample name explicitly.

How can I avoid this problem? I want to have a master reference genotype file with multiple samples that I can use for different samples.

I am using GATK version v1.6

Thank you, Gene

Created 2012-11-06 16:13:18 | Updated | Tags: selectvariants gatkdocs genotypeconcordance

I'm looking to find all the entries that change between two calls to UG on the same data. I would like to find all the entries where the calls in the variant track are different from those in the comparison track. So, in effect, I want those entries that would not result from using -conc in SelectVariants. From the documentation it is unclear whether the -disc option does this:

A site is considered discordant if there exists some sample in the variant track that has a non-reference genotype and either the site isn't present in this track, the sample isn't present in this track, or the sample is called reference in this track.

What if the comp is HOM_VAR and the variant track is HET? Or if they are both HET but disagree on the specific allele?

Thanks.
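To make the two edge cases concrete, here is a Python sketch that encodes only the quoted sentence as written. It is a literal reading, not a description of what SelectVariants actually does, and it assumes the second "this track" in the quote refers to the comparison track. All function names are hypothetical.

```python
def _alleles(gt):
    return gt.replace("|", "/").split("/")


def is_non_ref(gt):
    """True if the genotype carries at least one called non-reference allele."""
    return any(a not in ("0", ".") for a in _alleles(gt))


def is_hom_ref(gt):
    """True if every allele is the reference allele."""
    return all(a == "0" for a in _alleles(gt))


def literally_discordant(variant_gts, comp_gts):
    """Apply the quoted definition to one site.

    variant_gts: {sample: GT} from the variant track.
    comp_gts: {sample: GT} from the comparison track, or None if the site
    is absent there.
    """
    for sample, gt in variant_gts.items():
        if not is_non_ref(gt):
            continue
        if comp_gts is None:  # site not present in the comparison track
            return True
        if sample not in comp_gts:  # sample not present in the comparison track
            return True
        if is_hom_ref(comp_gts[sample]):  # sample called reference there
            return True
    return False


# The two cases asked about: HET vs. HOM_VAR, and HETs with different alleles.
het_vs_hom_var = literally_discordant({"S1": "0/1"}, {"S1": "1/1"})
het_vs_other_het = literally_discordant({"S1": "0/1"}, {"S1": "0/2"})
print(het_vs_hom_var, het_vs_other_het)
```

Under this literal reading neither case is flagged, because the comparison genotype is non-reference in both; whether the tool itself behaves that way is the open question.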

Created 2012-10-10 17:04:41 | Updated 2013-01-07 20:30:55 | Tags: phasebytransmission genotypeconcordance community snparray

Hi all,

I'd like to know if someone has tested the concordance from output of PhaseByTransmission with SNP array data.

For a family trio, I calculated genotype concordance against SNP array data using the most likely GT combination (based on the GL values) from the VCF produced by UnifiedGenotyper. I then did the same for the genotypes obtained after running PhaseByTransmission, and I'm seeing a drop in concordance.

Is this to be expected?

Thanks!

Created 2012-09-24 21:04:12 | Updated 2012-09-25 15:53:45 | Tags: varianteval genotypeconcordance

Hi,

I am using VariantEval --evalModule GenotypeConcordance in order to establish concordance and sensitivity metrics against a HapMap reference. In the resulting GATK report I obtain the following fields for a given SNP category (example with HETs):

GenotypeConcordance  CompRod  EvalRod  JexlExpression  Novelty  variable                                value
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_HET                       6220
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_HOM_REF                      0
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_HOM_VAR                     20
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_MIXED                        0
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_NO_CALL                    318
GenotypeConcordance  comp     eval     none            all      n_true_HET_called_UNAVAILABLE                  0


What is the meaning of the _MIXED and _UNAVAILABLE fields?

Thx, Gene
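For orientation, the genotype categories in these field names can be sketched as a small classifier over GT strings. This reflects my understanding of the usual categories (MIXED being a partly called genotype such as ./1, UNAVAILABLE meaning no genotype exists for the sample at all) and should be treated as an assumption rather than GATK's exact logic; `genotype_type` is a hypothetical name.

```python
def genotype_type(gt):
    """Classify a VCF GT string; None or "" means the sample has no genotype."""
    if not gt:
        return "UNAVAILABLE"
    alleles = gt.replace("|", "/").split("/")
    called = [a for a in alleles if a != "."]
    if not called:
        return "NO_CALL"  # every allele is missing, e.g. ./.
    if len(called) < len(alleles):
        return "MIXED"  # some alleles called, some missing, e.g. ./1
    if len(set(called)) > 1:
        return "HET"
    return "HOM_REF" if called[0] == "0" else "HOM_VAR"


for gt in ("0/0", "0/1", "1|1", "./.", "./1", None):
    print(gt, genotype_type(gt))
```

Read this way, n_true_HET_called_MIXED would count comp HET genotypes whose eval genotype is only partly called, and n_true_HET_called_UNAVAILABLE those where eval has no genotype for the sample.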