We are running the Best Practices pipeline for variant discovery with GATK 3.2-2. When running HaplotypeCaller with the flags -A AlleleBalance and -A HomopolymerRun to generate a gVCF, no ABHet/ABHom or HRun annotations show up in the gVCF. I tried running VariantAnnotator on the gVCF and still got no annotations.
The documentation for both of these annotations states that they only work for biallelic variants. I suspected that the ,
Is there a way to either generate the gVCF without the ,
Thanks for your help.
I am using GATK 3.3-0 HaplotypeCaller for variant calling. When I run HaplotypeCaller with the command:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R all_chr_reordered.fa -I 30_S30.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 50 -o 30_S30_control1.vcf -L brca.intervals
I get all the variants I want. However, I also want to get the number of forward and reverse reads that support the REF and ALT alleles. Therefore I use the StrandBiasBySample annotation when running HaplotypeCaller with the command:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R all_chr_reordered.fa -I 30_S30.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 50 -o 30_S30_control2.vcf -L brca.intervals -A StrandBiasBySample
The SB field is added, but a variant that was in 30_S30_control1.vcf is absent from 30_S30_control2.vcf. All the remaining variants are there. The only difference between the two variant calls was adding -A StrandBiasBySample. Why is this one variant absent?
The missing variant: 17 41276152 . CAT C 615.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.147;ClippingRankSum=0.564;DP=639;FS=15.426;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-1.054;QD=0.96;ReadPosRankSum=0.698;SOR=2.181 GT:AD:DP:GQ:PL 0/1:565,70:635:99:653,0,18310
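One way to confirm exactly which records differ between the two outputs is to compare them by position and alleles. The sketch below is only an illustration and is not part of GATK: the function names are made up for this example, and it assumes plain-text, tab-delimited VCF files.

```python
def variant_keys(vcf_lines):
    """Return the set of (CHROM, POS, REF, ALT) tuples for all records."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header lines (## meta lines and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def missing_from_second(first_lines, second_lines):
    """Records present in the first VCF but absent from the second, sorted."""
    return sorted(variant_keys(first_lines) - variant_keys(second_lines))
```

Passing the lines of 30_S30_control1.vcf and 30_S30_control2.vcf (for instance from two open file handles) should list the CAT to C deletion at 17:41276152 if it really is the only record that was dropped.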
So I decided to run HaplotypeCaller without -A StrandBiasBySample and later add the annotations with VariantAnnotator. Here is the command:
java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R all_chr_reordered.fa -I 30_S30.bam --variant 30_S30_control1.vcf -L 30_S30_control1.vcf -o 30_S30_control1_SBBS.vcf -A StrandBiasBySample
However, the output VCF file 30_S30_control1_SBBS.vcf did not differ from the input variant file 30_S30_control1.vcf except for the header; the SB field wasn't added. Why was the SB field not added? Is there any other way to get the number of forward and reverse reads?
Please find 30_S30_control1.vcf, 30_S30_control2.vcf and 30_S30_control1_SBBS.vcf attached.
Hey, GATK team,
I downloaded snpEff and browsed the databases by running $ java -jar snpEff.jar databases
It lists the databases for all available species, including mouse (GRCm38.73).
However, there is no GRCm38 database among the actual databases at all.
Did I miss something, or is the database still under construction? Where can I download the mouse database? I need to do functional annotation of my variant VCF file.
In addition, according to the GATK presentation materials, variant filtering/SelectVariants comes right after annotation. Or does it not matter which one goes first?
Could you tell me how to encourage GATK to annotate my genotype columns (i.e. add annotations to the FORMAT and PANC_R columns in the following file):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PANC_R
chrX 259221 . GA G 136.74 . AC=2;AF=1.00;AN=2;DP=15;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=8.82;MQ0=1;QD=3.04 GT:AD:GQ:PL 1/1:0,2:6:164,6,0
The file was generated with HaplotypeCaller. I used a command line similar to this one to no effect:
java -jar $GATKROOT/GenomeAnalysisTK.jar -T VariantAnnotator -R hg19_random.fa -I chr7_recalibrated.bam -V chr7.vcf --dbsnp dbSNP135_chr.vcf -A Coverage -A QualByDepth -A FisherStrand -A MappingQualityRankSumTest -A ReadPosRankSumTest -o chr7_annotated-again.vcf
Does anyone have any suggestions? Thanks in advance!
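To check which genotype-level annotations a record actually carries, the FORMAT column can be paired with the sample column. This is a minimal sketch, assuming a tab-delimited VCF line; the helper names are hypothetical and not part of GATK.

```python
def genotype_fields(format_col, sample_col):
    """Pair FORMAT keys with one sample's values, e.g. 'GT:AD' and '1/1:0,2'."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    # VCF allows trailing fields to be dropped from a sample column,
    # so pad with '.' (missing) to keep keys and values aligned.
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


def has_format_key(record_line, key, sample_index=0):
    """True if the given FORMAT key is declared for this record."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        # No FORMAT/sample columns: no genotype-level annotations at all.
        return False
    return key in genotype_fields(fields[8], fields[9 + sample_index])
```

Applied to the PANC_R record above, genotype_fields("GT:AD:GQ:PL", "1/1:0,2:6:164,6,0") gives GT, AD, GQ and PL only, which makes it easy to see whether a later VariantAnnotator run changed the FORMAT column at all.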
I have a set of VCFs with identical positions in them:
VCF1: 1 10097 . T . 26 . AN=196;DP=1622;MQ=20.06;MQ0=456 GT:DP
VCF2: 1 10097 . T . 21.34 . AN=198;DP=2338;MQ=19.53;MQ0=633 GT:DP
VCF3: 1 10097 . T . 11.70 . AN=240;DP=3957;MQ=19.74;MQ0=1085 GT:DP
VCF4: 1 10097 . T . 15.56 . AN=134;DP=1348;MQ=18.22;MQ0=442 GT:DP
If I use all of them as input for VariantRecalibrator, which annotations will VariantRecalibrator use? Should I instead merge the VCFs with CombineVariants and run VariantAnnotator, before I run VariantRecalibrator?
I'm not sure whether the forum is for technical questions only or whether questions about best practices are allowed as well. Feel free to delete my question if it doesn't belong here. Thank you.
I'm attempting to use VariantAnnotator to annotate some VCFs produced by samtools so I can run VQSR on them. Unfortunately I've gotten stuck, and I'm trying to figure out why VariantAnnotator isn't annotating indels with MappingQualityRankSumTest and ReadPosRankSumTest; it seems to annotate SNPs fine. Both homs and hets are called in the sample. Could it be that I need to left-align the indels to get enough coverage? What would you suggest is the best way to debug this? Is there a way to make GATK more verbose about why it's refusing an annotation?
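As a first debugging step, it may help to count how many indel and non-indel records actually received a given INFO annotation. The sketch below is an assumption-laden illustration, not a GATK feature: the function names are invented here, records are classified purely by comparing REF and ALT lengths, and the INFO keys (for example MQRankSum and ReadPosRankSum) are taken from the record shown earlier in this thread.

```python
from collections import Counter


def is_indel(ref, alt):
    """True if any ALT allele differs in length from REF (symbolic alleles ignored)."""
    alleles = [a for a in alt.split(",") if a != "." and not a.startswith("<")]
    return any(len(a) != len(ref) for a in alleles)


def info_keys(info_col):
    """Return the set of INFO keys, including flag-style entries without '='."""
    if info_col == ".":
        return set()
    return {entry.split("=", 1)[0] for entry in info_col.split(";") if entry}


def annotation_coverage(vcf_lines, key):
    """Count records by (variant kind, whether the INFO key is present)."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        kind = "indel" if is_indel(fields[3], fields[4]) else "non_indel"
        status = "with" if key in info_keys(fields[7]) else "without"
        counts[(kind, status)] += 1
    return counts
```

Running annotation_coverage over the annotated VCF once for MQRankSum and once for ReadPosRankSum would show whether the annotation is missing from every indel or only from some of them, which narrows down where to look next.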