I am using SnpSift for variant annotation. I now want to add CADD scores to the annotation, and was quickly directed to dbNSFP.
According to its official site (https://sites.google.com/site/jpopgen/dbNSFP), dbNSFP has integrated CADD.
But when I run SnpSift with dbNSFP, CADD is not in the output (while other scores such as SIFT and PolyPhen are output without any problem).
Thanks in advance!
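One thing that may be worth checking is which CADD column names the dbNSFP file actually contains, since SnpSift's dbnsfp command, as far as I know, only adds a default set of fields unless others are named explicitly with its -f option. Below is a minimal sketch, assuming the dbNSFP file is tab-delimited with a header line starting with "#". The function name and the sample header are made up for illustration, and the real column names differ between dbNSFP releases.

```python
def find_columns(header_line, keyword):
    """Return the column names in a tab-delimited header that contain keyword.

    The match is case-insensitive. A leading '#' on the header is ignored.
    """
    names = header_line.rstrip("\n").lstrip("#").split("\t")
    return [name for name in names if keyword.lower() in name.lower()]


# Illustrative header only (an assumption, not copied from a real dbNSFP file).
sample_header = "#chr\tpos(1-based)\tref\talt\tSIFT_score\tCADD_raw\tCADD_phred\n"
cadd_fields = find_columns(sample_header, "cadd")
print(",".join(cadd_fields))
```

The comma-separated names printed at the end are in the form that a field list option would expect. Whether the installed SnpSift version accepts them should be checked against its own help output.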
I have used several tools from GATK and am now wondering what the next step should be. It would be great if you could give me some help. I had raw reads from a metagenomic sample that were mapped against a reference genome of Bathycoccus prasinos. The resulting BAM file was realigned around indels, then sorted. Then I ran the following command line:
java -jar apps/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -R genome/Bathycoccus_genome_FINAL_RELEASE.fasta -T HaplotypeCaller -I BAMfile/ReadsConcat_Bathy_VerySensitiveLocal_bowtie_sorted_readsgroup.realign.bam -o output_HaplotypeCaller_BAMrealigned.vcf
in order to create the VCF file that stores all the variants. Then I thought I should run the VariantAnnotator tool, but it just creates the same VCF file again. I would like to detect the effect of all the modifications found in my field sequences compared to my reference genome. Is there a tool implemented in GATK that will do that, like SnpEff for example?
Hi, I first ran snpEff using the command: java -jar snpEff.jar eff -v -i vcf -o gatk phased.vcf > snpEff_output.vcf
which gave me the snpEff_output.vcf file that looks something like this:
Chr1 7050 . T A 22.76 LowQual AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=11.38 GT:AD:DP:GQ:PL 1/1:0,2:2:6:50,6,0
Chr1 7328 . A G 21.77 LowQual AC=2;AF=1.00;AN=2;DP=3;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=21.77;EFF=synonymous_variant(LOW|SILENT|tcA/tcG|p.Ser188Ser/c.564A>G|702|LOC_Os01g01010|protein_coding|CODING|LOC_Os01g01010.1|5|1),synonymous_variant(LOW|SILENT|tcA/tcG|p.Ser188Ser/c.564A>G|616|LOC_Os01g01010|protein_coding|CODING|LOC_Os01g01010.2|5|1) GT:AD:DP:GQ:PL 1/1:0,3:3:6:49,6,0
Chr1 9055 . T A 23.76 LowQual AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=23.76 GT:AD:DP:GQ:PL 1/1:0,2:2:6:51,6,0
Chr1 9233 . T C 45.28 LowQual AC=2;AF=1.00;AN=2;DP=3;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=22.64;EFF=synonymous_variant(LOW|SILENT|aaT/aaC|p.Asn539Asn/c.1617T>C|616|LOC_Os01g01010|protein_coding|CODING|LOC_Os01g01010.2|9|1),synonymous_variant(LOW|SILENT|aaT/aaC|p.Asn539Asn/c.1617T>C|702|LOC_Os01g01010|protein_coding|CODING|LOC_Os01g01010.1|9|1) GT:AD:DP:GQ:PL 1/1:0,3:3:9:73,9,0
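To make the structure of the EFF annotation in these records explicit, here is a minimal sketch of pulling the effect names and their pipe-separated subfields out of an INFO string. It assumes the layout seen above: effects separated by commas, each written as name(subfield|subfield|...). The function name parse_eff is hypothetical and not part of snpEff or GATK.

```python
import re

# One effect looks like: name(sub1|sub2|...)
EFFECT_RE = re.compile(r"^\s*([^(]+)\((.*)\)\s*$")


def parse_eff(info):
    """Return a list of (effect_name, subfields) tuples from a VCF INFO string.

    Returns an empty list when the INFO string has no EFF= entry.
    """
    for entry in info.split(";"):
        if entry.startswith("EFF="):
            effects = []
            # Assumes no commas occur inside the parentheses.
            for item in entry[len("EFF="):].split(","):
                match = EFFECT_RE.match(item)
                if match:
                    name = match.group(1).strip()
                    effects.append((name, match.group(2).split("|")))
            return effects
    return []


info = ("DP=3;QD=21.77;EFF=synonymous_variant(LOW|SILENT|tcA/tcG|"
        "p.Ser188Ser/c.564A>G|702|LOC_Os01g01010|protein_coding|CODING|"
        "LOC_Os01g01010.1|5|1)")
for name, fields in parse_eff(info):
    print(name, fields[0], fields[8])
```

Listing the distinct effect names this way makes it easy to see which vocabulary (for example synonymous_variant) the snpEff run produced.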
Then I ran VariantAnnotator using the command: java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R genome.fasta -A SnpEff --variant phased.vcf --snpEffFile snpEff_output.vcf -o annotated.vcf
but it gave me a warning for every effect added by snpEff that looks like this (the annotated.vcf file was non-empty though):
WARN 20:00:30,645 SnpEff - Skipping malformed SnpEff effect field at ChrSy:525610. Error was: "synonymous_variant is not a recognized effect type". Field was: "synonymous_variant(LOW|SILENT|caG/caA|p.Gln171Gln/c.513G>A|ChrSy.fgenesh.gene.77|protein_coding|CODING|ChrSy.fgenesh.mRNA.77|2|WARNING_TRANSCRIPT_INCOMPLETE)"
WARN 20:00:30,649 SnpEff - Skipping malformed SnpEff effect field at ChrSy:537868. Error was: "missense_variant is not a recognized effect type". Field was: "missense_variant(LOW|MISSENSE|Gat/Aat|p.Asp298Asn/c.892G>A|ChrSy.fgenesh.gene.79|protein_coding|CODING|ChrSy.fgenesh.mRNA.79|5|WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS)"
WARN 20:00:30,650 SnpEff - Skipping malformed SnpEff effect field at ChrSy:538754. Error was: "stop_gained is not a recognized effect type". Field was: "stop_gained(LOW|NONSENSE|Cag/Tag|p.Gln137*/c.409C>T|ChrSy.fgenesh.gene.79|protein_coding|CODING|ChrSy.fgenesh.mRNA.79|3|WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS)"
WARN 20:00:30,650 SnpEff - Skipping malformed SnpEff effect field at ChrSy:538769. Error was: "missense_variant is not a recognized effect type". Field was: "missense_variant(LOW|MISSENSE|Gac/Aac|p.Asp132Asn/c.394G>A|ChrSy.fgenesh.gene.79|protein_coding|CODING|ChrSy.fgenesh.mRNA.79|3|WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS)"
What am I doing wrong? Clearly, snpEff outputs the effects in terms that GATK does not accept. Please help. Best, nb
Hi. I am reading the GATK manual, which says the snpEff version must be 2.0.5, but I don't see anything in the snpEff manual that gives further details. Is version 2.0.5 the only one currently supported?
I am planning to use snpEff to map my variant data (SNPs & indels) to gene data.
My query is
Thanks for your input!
Hi, We use GATK with snpEff to annotate variants from our WES experiments. We use VariantAnnotator to keep only the effect with the highest impact. We've realized that for some experiments it would be useful to customize the priority assignment (e.g. if a variant is categorized both as DOWNSTREAM for one gene and INTRON for the adjacent gene, the effect picked by default is DOWNSTREAM, whereas we'd like to keep INTRON). Is there a way to modify VariantAnnotator's default behaviour?
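To illustrate the kind of custom ordering meant here, below is a minimal sketch of picking one effect per variant from a user-defined priority list. It does not change VariantAnnotator. It assumes the full list of effect names for a variant has already been extracted from the snpEff output by some other means. The function name pick_effect and the priority list are hypothetical.

```python
def pick_effect(effects, priority):
    """Return the effect that comes first in the user-defined priority list.

    Effects missing from the list rank last; ties keep the input order.
    """
    rank = {name: index for index, name in enumerate(priority)}
    return min(effects, key=lambda effect: rank.get(effect, len(priority)))


# Hypothetical ordering that prefers INTRON over DOWNSTREAM.
custom_priority = ["INTRON", "DOWNSTREAM"]
print(pick_effect(["DOWNSTREAM", "INTRON"], custom_priority))
```

Keeping the ordering in a plain list means it can differ per experiment without touching the annotation step itself.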
I have the genomes of several isolates of a parasite, and I would like to investigate synonymous/non-synonymous substitutions to identify potential antigens, as well as SNPs genome-wide. I am wondering how well BWA/GATK are suited for this purpose. I've been told that BWA only works very well with sequences that are <2% divergent, and some of the antigens in this species are known to be >20% divergent. However, I also know that GATK does local realignment around indels. So I would specifically like to know: is BWA/GATK good for looking at substitutions/SNPs in highly variable genes, and if not, which other alignment tools are compatible and appropriate for this purpose?
Hi GATK Team,
I have been experimenting with newer versions of snpEff and would like to incorporate a newer version into my local GATK build. However, when I update the jar to the snpEff 3.0 release (by updating the ivy specification and copying the jar into the proper settings/repository/... directory) I get an internal Scala compiler error:
scala.compile.public:
[mkdir] Created dir: /hpc/users/lindem03/packages/gatk-mssm/dev/build/scala/classes
[echo] Building Scala...
[scalac] Compiling 87 source files to /hpc/users/lindem03/packages/gatk-mssm/dev/build/scala/classes
[scalac] Compiling 107 source files to /hpc/users/lindem03/packages/gatk-mssm/dev/build/scala/classes
[scalac] Exception in thread "main" java.lang.AssertionError: assertion failed: List(object Byte, object Byte)
[scalac] at scala.tools.nsc.symtab.Symbols$Symbol.suchThat(Symbols.scala:1056)
[scalac] at scala.tools.nsc.symtab.Symbols$Symbol.companionModule0(Symbols.scala:1271)
[scalac] at scala.tools.nsc.symtab.Symbols$Symbol.companionModule(Symbols.scala:1281)
[scalac] at scala.tools.nsc.symtab.Symbols$Symbol.linkedClassOfClass(Symbols.scala:1302)
[scalac] at scala.tools.nsc.symtab.Definitions$definitions$.addModuleMethod$1(Definitions.scala:711)
[scalac] at scala.tools.nsc.symtab.Definitions$definitions$.initValueClasses(Definitions.scala:714)
[scalac] at scala.tools.nsc.symtab.Definitions$definitions$.init(Definitions.scala:791)
[scalac] at scala.tools.nsc.Global$Run.<init>(Global.scala:604)
[scalac] at scala.tools.nsc.Main$.process(Main.scala:105)
[scalac] at scala.tools.nsc.Main$.main(Main.scala:120)
[scalac] at scala.tools.nsc.Main.main(Main.scala)
Any thoughts? Do I need to modify the snpEff jar in some way to get it to play nicely with GATK? That error is so opaque I am not sure where to start debugging. This is against a "clean" directory (I ran ant clean) with java 1.6.0_30
$ java -version
java version "1.6.0_30"
Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)
and GATK 1.6-24-gdc14575.
Broad recommends using snpEff to add annotations to VCF files created by GATK. This gives annotations about the effect of a given variant: Is it in a coding region? Does it cause a frameshift? What transcripts are impacted? However, snpEff does not provide other annotations you might want, such as 1000 Genomes minor allele frequency, SIFT scores, phyloP conservation scores, and so on. I've previously used ANNOVAR to get those sorts of things, and that worked well enough, though I did not find it to be especially user-friendly.
So my question is, what other ways have users found of getting this sort of annotation information? I'm interested specifically in human exomes, but I am sure other users reading this Ask the Community post will be interested in answers for other organisms as well. I'm looking for recommendations on what's quick, simple, easy to use, and has been used successfully with VCFs produced by GATK. I'm open to answers in the form of other software tools or sources of raw data that I can easily manipulate on my own.
Thanks in advance.
Hi there, I've finished a run of HaplotypeCaller on my samples. I'm now analysing everything with snpEff, although I'm doing this "outside" GATK. I had to stop the analysis because of a huge number of errors, all dealing with indels, such as:
Error while processing VCF entry (line 580649) :
chr21 26718345 . TAATCCTGAGTTTAA TATCCTAAATGTTTAC 943.26 […]
java.lang.RuntimeException: Insertion '-A+AT' does not start with '+'. This should never happen!
chr21 35260360 . CATAACAGTTCAT AGAGACAGAG 425.22 […]
java.lang.RuntimeException: Deletion '+G-TTC' does not start with '-'. This should never happen!
Of course, this is a snpEff error; nevertheless, the indel format looks quite different from anything I've seen before. Consider the first line above: shouldn't it be like
chr21 26718345 . AT T 943.26 […]
(I can't resolve the second right now). Any hint is appreciated at this point. I'm also writing to the snpEff developer for the same reason...
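One way to check whether a record like these reduces to a simple indel is to trim the bases that REF and ALT share at either end. The following is a minimal sketch of that trimming, assuming a single ALT allele and keeping at least one base in each allele. The function name trim_alleles is hypothetical, and this is not what HaplotypeCaller or snpEff do internally.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Shared trailing bases are removed first, then shared leading bases.
    The position is shifted by the number of leading bases removed.
    """
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# First record from the error output above.
print(trim_alleles(26718345, "TAATCCTGAGTTTAA", "TATCCTAAATGTTTAC"))
# Second record from the error output above.
print(trim_alleles(35260360, "CATAACAGTTCAT", "AGAGACAGAG"))
```

Under this trimming the first record loses only its two shared leading bases and remains a multi-base substitution, so it does not collapse to AT > T. The second record shares no flanking bases at all. Both therefore look like complex (mixed) variants rather than plain insertions or deletions.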