Tagged with #emit_all_confident_sites
0 documentation articles | 0 announcements | 7 forum discussions

No articles to display.

Created 2014-01-23 08:01:04 | Updated 2014-01-23 08:03:11 | Tags: unifiedgenotyper

Comments (3)

Hi, everyone.

From the GATK documentation:

-out_mode,--output_mode specifies which sites to emit; possible values are EMIT_VARIANTS_ONLY (the default), EMIT_ALL_CONFIDENT_SITES (include confident reference sites), or EMIT_ALL_SITES (any callable site regardless of confidence).

I would really like to know what "confident reference site" means.

When I run GATK UnifiedGenotyper with the EMIT_ALL_CONFIDENT_SITES option on each sample's BAM file, can I distinguish the genotype of each sample (no call, homozygous reference, homozygous alternate, heterozygous)?

In other words, for this purpose I need to know whether a given site is a no-call or homozygous reference.
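The four categories in the question can be read from the GT field of each sample column once a VCF has been produced. The following is a minimal Python sketch, not part of GATK; the function names and the example record are made up for illustration, and it assumes tab-separated VCF records in which GT is the first FORMAT key.

```python
def classify_genotype(gt):
    """Classify a VCF GT string such as '0/0', '0|1', '1/1', './.' or haploid '0'."""
    alleles = gt.replace("|", "/").split("/")
    # Any missing allele ('.') is treated as a no-call.
    if any(allele == "." for allele in alleles):
        return "no_call"
    indices = {int(allele) for allele in alleles}
    if indices == {0}:
        return "hom_ref"
    if len(indices) == 1:
        return "hom_alt"
    return "het"


def genotypes_for_record(line):
    """Return one genotype class per sample column of a tab-separated VCF record."""
    fields = line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    return [classify_genotype(sample.split(":")[gt_index]) for sample in fields[9:]]


# Hypothetical record with four samples, for illustration only.
example = "chr1\t100\t.\tA\tG\t50\t.\tDP=30\tGT:DP\t0/0:10\t0/1:12\t./.:0\t1/1:8"
print(genotypes_for_record(example))
```

Whether a reference site appears in the file at all still depends on the output mode and confidence thresholds used for calling; the sketch only interprets records that were emitted.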

Created 2013-08-16 21:12:22 | Updated | Tags: haplotypecaller

Comments (48)

If I run HaplotypeCaller with a VCF file as the intervals file, -stand_emit_conf 0, and -out_mode EMIT_ALL_SITES, should I get back an output VCF with all the sites from the input VCF, whether or not there was a variant call there? If not, is there a way to force output even if the calls are 0/0 or ./. for everyone in the cohort?

I have been trying to run HC with the above options, but I can't understand why some variants are included in my output file and others aren't. Some positions are output with no alternate allele and GTs of 0 for everyone. However, other positions that I know have coverage are not output at all.
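One way to pin down which sites went missing is to compare the positions in the input VCF with those in the output VCF. Below is a minimal Python sketch of that comparison; the function names and the example lines are hypothetical, and it assumes plain-text, tab-separated VCF lines (header lines starting with "#" are skipped).

```python
def site_keys(vcf_lines):
    """Collect (chromosome, position) pairs from VCF data lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        keys.add((chrom, int(pos)))
    return keys


def missing_sites(input_lines, output_lines):
    """Return sites present in the input VCF but absent from the output VCF."""
    return sorted(site_keys(input_lines) - site_keys(output_lines))


# Made-up lines for illustration only.
requested = ["#CHROM\tPOS\tID\tREF\tALT", "chr1\t100\t.\tA\tG", "chr1\t250\t.\tC\tT"]
emitted = ["#CHROM\tPOS\tID\tREF\tALT", "chr1\t100\t.\tA\t."]
print(missing_sites(requested, emitted))
```

The resulting list can then be checked against the BAM coverage at those positions.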



Created 2013-02-04 16:59:48 | Updated 2013-02-06 21:31:28 | Tags: unifiedgenotyper

Comments (3)

I am using UnifiedGenotyper to call variants from multiple samples, with the output mode set to EMIT_ALL_CONFIDENT_SITES. The output VCF file occasionally has two entries for one position. This always happens at a monomorphic site, and the depth reported by the two entries is quite different. Usually one entry has very high depth, and when I go back to the original BAM file, the depth there does not match. Any idea what I am missing here?
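To list the affected positions, the VCF can be scanned for sites that occur more than once, together with the DP value each entry reports. This is a minimal Python sketch under the assumption of tab-separated records with DP in the INFO column; the function name and example records are hypothetical.

```python
from collections import defaultdict


def duplicated_positions(vcf_lines):
    """Map each (chromosome, position) seen more than once to its (REF, ALT, DP) entries."""
    seen = defaultdict(list)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = fields[0], int(fields[1]), fields[3], fields[4], fields[7]
        depth = None
        for item in info.split(";"):
            if item.startswith("DP="):
                depth = int(item[3:])
        seen[(chrom, pos)].append((ref, alt, depth))
    return {site: entries for site, entries in seen.items() if len(entries) > 1}


# Made-up records for illustration only.
records = [
    "chr2\t500\t.\tA\t.\t40\t.\tDP=35",
    "chr2\t500\t.\tA\t.\t60\t.\tDP=900",
    "chr2\t501\t.\tG\t.\t45\t.\tDP=36",
]
print(duplicated_positions(records))
```

The sketch only reports the duplicates; it does not explain why the caller emitted two records for the same site.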

Created 2012-12-24 03:25:30 | Updated | Tags: unifiedgenotyper indels

Comments (4)

When I use EMIT_ALL_CONFIDENT_SITES for SNPs, I get the expected very large list of genotypes, regardless of whether the genotypes differ from the reference. When I use the same command line but switch the model to indels, I only get a VCF of variant sites. Is the EMIT_ALL_CONFIDENT_SITES option not compatible with indel discovery?

I'm grateful for any clarification.

Created 2012-11-09 20:25:22 | Updated 2013-01-07 20:07:33 | Tags: unifiedgenotyper

Comments (1)


When I use UnifiedGenotyper with --genotype_likelihoods_model SNP --output_mode EMIT_ALL_CONFIDENT_SITES I get the reference SNP homozygote calls (or ./. if insufficient depth/quality etc). Great!

But when I use UnifiedGenotyper with --genotype_likelihoods_model INDEL --output_mode EMIT_ALL_CONFIDENT_SITES I only get non-reference calls; everything else (i.e. reference homozygotes, and anything uncallable) is ./.

I want to be able to select variants (SNPs and INDELs) on call rate across samples, as one would do for array genotype data, and so avoid case-control bias due to differential missingness.


david van heel
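The call-rate selection described in the post above could be sketched as follows. This is an illustration in Python, not a GATK feature: the function names, the 0.95 threshold, and the example record are all hypothetical, and the sketch assumes tab-separated VCF records in which GT is the first FORMAT key. It also shows why the distinction matters: a reference homozygote reported as ./. counts as missing.

```python
def call_rate(line):
    """Fraction of samples in a VCF record whose genotype is not missing."""
    samples = line.rstrip("\n").split("\t")[9:]
    if not samples:
        return 0.0
    # A genotype containing '.' (for example './.' or '.') counts as missing.
    called = sum(1 for sample in samples if "." not in sample.split(":")[0])
    return called / len(samples)


def passes_call_rate(line, min_rate=0.95):
    """True if the record's call rate meets the (hypothetical) threshold."""
    return call_rate(line) >= min_rate


# Made-up record with four samples, one of them a no-call.
record = "chr1\t100\t.\tA\tG\t50\t.\tDP=30\tGT:DP\t0/0:10\t0/1:12\t./.:0\t1/1:8"
print(call_rate(record), passes_call_rate(record))
```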

Created 2012-10-18 17:13:42 | Updated 2012-10-19 01:14:10 | Tags: unifiedgenotyper vqsr

Comments (1)


Does VQSR behave differently when the -out_mode flag in UnifiedGenotyper is set to EMIT_VARIANTS_ONLY as compared to EMIT_ALL_CONFIDENT_SITES? I think that by using EMIT_ALL_CONFIDENT_SITES we might give VQSR more information to train the model, but I may be wrong. Can someone please help me with this? Thanks.

cheers, Rahul

Created 2012-09-13 17:17:38 | Updated 2012-09-13 17:41:47 | Tags:

Comments (1)


I previously reported an issue in which I could not emit all sites or all confident sites when using a ploidy of 1. I downloaded the most recent version and it seems to be able to print the reference calls in these modes. The odd thing is that the quality for all those calls is the same and really low, which I know isn't a reflection of reality in all cases, given what we know about the sequence data and its relationship to the reference. It is also consistent across many samples, whether I run multiple samples through a single GATK run or a single sample on its own. I pasted in an example from a recent run (multiple samples in the same run). The problem is that all of these reference calls get a LowQual filter, which makes them difficult to differentiate from genuine LowQual calls. Any thoughts on whether this is to be expected, and why that might be?

Reference   72  .   T   .   3   LowQual DP=281;MQ=50.69;MQ0=0;NDA=1 GT  .   .   .   .   .   .
Reference   187 .   T   .   3   LowQual DP=301;MQ=51.00;MQ0=0;NDA=1 GT  .   .   .   .   .   .
Reference   188 .   C   .   3   LowQual DP=296;MQ=50.84;MQ0=0;NDA=1 GT  .   .   .   .   .   .
Reference   206 .   A   .   3   LowQual DP=292;MQ=50.14;MQ0=0;NDA=1 GT  .   .   .   .   .   .
Reference   1844    .   T   .   3   LowQual DP=369;MQ=58.59;MQ0=0;NDA=1 GT  .   .   .   .   .   .
Reference   1854    .   C   .   3   LowQual DP=363;MQ=58.63;MQ0=0;NDA=1 GT  .   .   .   .   .   .
Reference   1972    .   A   .   3   LowQual DP=345;MQ=59.11;MQ0=0;NDA=1 GT  .   .   .   .   .   .
Reference   1993    .   T   .   3   LowQual DP=355;MQ=58.54;MQ0=0;NDA=1 GT  .   .   .   .   .   .
Reference   2096    .   C   .   3   LowQual DP=355;MQ=58.92;MQ0=0;NDA=1 GT  .   .   .   .   .   .
Reference   2376    .   T   C   1105.23 .   AC=1;AF=0.167;AN=6;BaseQRankSum=-10.910;DP=417;Dels=0.00;FS=27.994;HaplotypeScore=1.6883;MLEAC=1;MLEAF=0.167;MQ=58.90;MQ0=0;MQRankSum=0.195;NDA=1;QD=16.75;ReadPosRankSum=-5.021;SB=-4.370e+02;Samples=Ba-4599_4    GT:AD:DP:GQ:MLPSAC:MLPSAF:PL    0:43,0:43:99:0:0.00:0,1813  1:0,66:66:99:1:1.00:1143,0  0:38,0:38:99:0:0.00:0,1627  0:127,0:127:99:0:0.00:0,5248    0:42,0:42:99:0:0.00:0,1739  0:100,0:100:99:0:0.00:0,4166

Thanks, John
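As a stopgap for telling the two kinds of LowQual record apart, records can be split on whether the ALT column is "." (a reference call) or names an alternate allele. This is a minimal Python sketch, not a GATK option; the function name is hypothetical and the third example line is made up. It splits on any whitespace so that it also works on records pasted with spaces, as in the post above.

```python
def split_lowqual(vcf_lines):
    """Separate LowQual records into reference calls (ALT is '.') and variant calls."""
    reference_sites, variant_sites = [], []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        chrom, pos, alt, qual, filters = fields[0], int(fields[1]), fields[4], fields[5], fields[6]
        if "LowQual" not in filters.split(";"):
            continue
        entry = (chrom, pos, None if qual == "." else float(qual))
        if alt == ".":
            reference_sites.append(entry)
        else:
            variant_sites.append(entry)
    return reference_sites, variant_sites


lines = [
    # Copied from the records in the post above.
    "Reference   72  .   T   .   3   LowQual DP=281;MQ=50.69;MQ0=0;NDA=1 GT  .   .   .   .   .   .",
    "Reference   187 .   T   .   3   LowQual DP=301;MQ=51.00;MQ0=0;NDA=1 GT  .   .   .   .   .   .",
    # Made-up LowQual variant record, for illustration only.
    "Reference   500 .   A   G   12  LowQual DP=20 GT  0   1",
]
reference_calls, variant_calls = split_lowqual(lines)
print(len(reference_calls), len(variant_calls))
```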