Hello, I have an alignment with 140 reference reads (ref base = C) and 10 variant reads (var base = T) at locus Chr17:7578406. When I use the "EMIT_ALL_SITES" mode, the UnifiedGenotyper generates the following output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT z
17 7578406 rs28934578 C A 0 LowQual AC=0;AF=0.00;AN=2;DB;DP=150;Dels=0.00;FS=0.000;HaplotypeScore=11.1188;MLEAC=0;MLEAF=0.00;MQ=38.26;MQ0=0 GT:AD:DP:GQ:PL 0/0:140,0:150:99:0,361,4503
My questions are:
1) In the ALT column, why is the base "A" being shown? Is it a randomly selected base when no SNP is identified at that position?
2) The ID column shows dbSNP record rs28934578, which is a C>T mutation (which is what my data has). Why is the dbSNP record for the C>T mutation in the output when no variant is identified at this position (or a C>A variant is shown)? Does this imply that the ID column shows ALL dbSNP records at that position rather than the dbSNP record of the identified variant?
3) Is there a document that details the VCF output when EMIT_ALL_SITES is used, so I can understand the output VCF?
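To make the fields in the record above easier to discuss, here is a minimal Python sketch that unpacks it. The helper name `parse_record` is hypothetical, the columns are split on whitespace rather than tabs because that is how the record appears in the post, and the GQ check assumes the commonly described convention that GQ is the gap between the two smallest PL values, capped at 99. It is not GATK code and does not answer why "A" is reported as ALT.

```python
# Header and record copied from the post; whitespace-separated here, not tabs.
HEADER = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT z"
RECORD = (
    "17 7578406 rs28934578 C A 0 LowQual "
    "AC=0;AF=0.00;AN=2;DB;DP=150;Dels=0.00;FS=0.000;HaplotypeScore=11.1188;"
    "MLEAC=0;MLEAF=0.00;MQ=38.26;MQ0=0 "
    "GT:AD:DP:GQ:PL 0/0:140,0:150:99:0,361,4503"
)


def parse_record(header, record):
    """Hypothetical helper: split one single-sample VCF line into dicts."""
    cols = header.lstrip("#").split()
    fields = dict(zip(cols, record.split()))
    info = {}
    for item in fields["INFO"].split(";"):
        key, _, value = item.partition("=")
        info[key] = value if value else True  # flags such as DB carry no value
    sample_name = cols[-1]
    keys = fields["FORMAT"].split(":")
    sample = dict(zip(keys, fields[sample_name].split(":")))
    return fields, info, sample


fields, info, sample = parse_record(HEADER, RECORD)

# AD lists depth for REF and for the ALT allele written in the record.
ad = [int(x) for x in sample["AD"].split(",")]
# Reads counted in the sample DP but not attributed to either listed allele.
unassigned = int(sample["DP"]) - sum(ad)

# Assumed convention: GQ = second-smallest PL minus smallest PL, capped at 99.
pl = sorted(int(x) for x in sample["PL"].split(","))
gq_from_pl = min(99, pl[1] - pl[0])

print("REF/ALT:", fields["REF"], fields["ALT"])
print("AD:", ad, "DP:", sample["DP"], "unassigned reads:", unassigned)
print("GQ reported:", sample["GQ"], "GQ from PL:", gq_from_pl)
```

The gap between DP and the sum of AD equals the number of T reads described in the post, which would be consistent with those reads not being counted toward either allele listed in the record; that reading is an inference from the numbers, not something the output states.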
I am implementing a tool that uses reads to identify potential sites for cancer, but I need a reliable genotype call from non-cancerous tissue. Currently, I'm trying to use GATK to make those calls, filter out the bad ones, and then use the remaining calls when I look over the raw reads in the cancer tissue.
The problem I'm finding is that GATK doesn't provide a GQ for sites that are homozygous for the reference, and I don't know how to relate the QUAL to the GQs reported for homozygous non-reference calls. Is there a way that I'm overlooking to have GATK provide the GQ for homozygous-reference, non-variant sites?
Hello, I noticed that you can use EMIT_ALL_SITES in UnifiedGenotyper to gather information on alternate alleles at every base provided in the interval file. Is there a way to do the same thing with indels? I want to pass in an interval list of indel locations and a BAM file and, for every site in the interval list, get the number of reads supporting the indel and the number supporting the reference. Thank you.
I would like to trigger calls at HapMap sites even if they are HOM_REF in my sample. I used to accomplish this in an older GATK version by passing the following parameter to the UnifiedGenotyper: -B:trigger,VCF hapmap.vcf. Right now I am using version 1.6 of the GATK. How can I accomplish exactly the same thing with this newer version?
What I am trying to do (when running VariantEval on the detected SNPs) is to obtain GenotypeConcordance for all three classes: HET, HOM_REF, and HOM_VAR. Currently I only get the concordance values for HET and HOM_VAR in the VariantEval output. Asked a different way, how can I get the 'n_true_HOM_REF_called_*' fields populated in the VariantEval output?
Thanks for your help, Gene
I have used UnifiedGenotyper with the EMIT_ALL_SITES option on selected sites of interest. However, in some cases a genotype is called (see below), but no probability is given. Is it possible to force UG to write those probabilities?
Here is a monomorphic site for which no probabilities are given:
1 100008719 . C . 13.19 . AN=76;DP=70;MQ=55.57;MQ0=0 GT:DP ./. ./. 0/0:2 ./. 0/0:1 0/0:2 0/0:2 0/0:1 ./. 0/0:2 0/0:3 0/0:2 ./. 0/0:1 0/0:1 0/0:3 ./. 0/0:2 ./. ./. ./. 0/0:1 0/0:1 0/0:3 0/0:1 ./. 0/0:1 0/0:1 ./. 0/0:1 0/0:1 ./. 0/0:2 ./. ./. ./. 0/0:3 ./. 0/0:1 0/0:1 0/0:2 ./. ./. ./. 0/0:2 ./. 0/0:1 ./. 0/0:1 ./. ./. 0/0:3 ./. 0/0:1 0/0:3 0/0:1 0/0:2 ./. 0/0:1 0/0:3 0/0:3 ./. 0/0:2 ./. ./. 0/0:1 ./.
I think I can solve my problem by using the GENOTYPE_GIVEN_ALLELES option in UG. I'm currently checking it.
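For the monomorphic site quoted above, a short Python sketch can confirm what the record does and does not contain. It assumes the spacing is normalized so that ALT is "." and QUAL is 13.19, and that the fused "./.0/0:1" is two samples; the AN value supports that reading, since AN should be twice the number of called diploid genotypes. The variable names are illustrative only.

```python
# Record from the post with spacing normalized (an assumption, see lead-in).
RECORD = (
    "1 100008719 . C . 13.19 . AN=76;DP=70;MQ=55.57;MQ0=0 GT:DP "
    "./. ./. 0/0:2 ./. 0/0:1 0/0:2 0/0:2 0/0:1 ./. 0/0:2 "
    "0/0:3 0/0:2 ./. 0/0:1 0/0:1 0/0:3 ./. 0/0:2 ./. ./. "
    "./. 0/0:1 0/0:1 0/0:3 0/0:1 ./. 0/0:1 0/0:1 ./. 0/0:1 "
    "0/0:1 ./. 0/0:2 ./. ./. ./. 0/0:3 ./. 0/0:1 0/0:1 "
    "0/0:2 ./. ./. ./. 0/0:2 ./. 0/0:1 ./. 0/0:1 ./. "
    "./. 0/0:3 ./. 0/0:1 0/0:3 0/0:1 0/0:2 ./. 0/0:1 0/0:3 "
    "0/0:3 ./. 0/0:2 ./. ./. 0/0:1 ./."
)

fields = RECORD.split()
format_keys = fields[8].split(":")  # per-sample keys declared by the record
samples = fields[9:]                # one entry per sample

# Split each INFO item into (key, value); partition()[::2] drops the "=".
info = dict(item.partition("=")[::2] for item in fields[7].split(";"))
an = int(info["AN"])

called = [s for s in samples if not s.startswith("./.")]
no_calls = len(samples) - len(called)

# No PL, GL or GQ key is declared, so no likelihoods can appear per sample.
has_likelihoods = any(key in ("PL", "GL", "GQ") for key in format_keys)

print("samples:", len(samples), "called:", len(called), "no-calls:", no_calls)
print("AN matches 2 x called:", an == 2 * len(called))
print("FORMAT keys:", format_keys, "likelihood fields present:", has_likelihoods)
```

The FORMAT column of this record declares only GT and DP, so the missing probabilities are absent from the record as a whole rather than dropped for individual samples.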