The answer depends on what tool we're talking about, and whether we're considering variant discovery or variant manipulation.
GATK variant manipulation tools are able to recognize the following types of alleles:
Symbolic alleles, such as the <NON_REF> allele used in GVCFs produced by HaplotypeCaller, the * allele used to signify the presence of a spanning deletion, or undefined events like a very large allele or one that's fuzzy and not fully modeled (i.e. there's some event going on here but we don't know what exactly)
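To make the allele categories concrete, here is a rough shell/awk sketch that labels each record of a tab-separated VCF by the kind of ALT alleles it carries. It is a simplification for illustration only, not GATK's own classification logic: the function name classify_vcf_records and the length-based rules are assumptions, and the labels follow the variant type names used by htsjdk (SNP, MNP, INDEL, MIXED, SYMBOLIC, NO_VARIATION).

```shell
# Simplified sketch (NOT GATK's logic): label each VCF record by its ALT alleles.
# classify_vcf_records is a hypothetical helper name.
classify_vcf_records() {
  awk -F'\t' '
    /^#/ { next }                                  # skip header lines
    {
      ref = $4
      n = split($5, alts, ",")                     # one entry per ALT allele
      snp = 0; mnp = 0; indel = 0; sym = 0
      for (i = 1; i <= n; i++) {
        a = alts[i]
        if (a == ".") { continue }                 # no alternate allele
        if (a == "*" || substr(a, 1, 1) == "<") { sym++ }   # spanning deletion or <...>
        else if (length(a) != length(ref)) { indel++ }      # length change
        else if (length(a) == 1) { snp++ }                  # single-base substitution
        else { mnp++ }                                      # same-length, multi-base
      }
      kinds = (snp > 0) + (mnp > 0) + (indel > 0) + (sym > 0)
      if (kinds == 0)     { type = "NO_VARIATION" }
      else if (kinds > 1) { type = "MIXED" }
      else if (snp)       { type = "SNP" }
      else if (mnp)       { type = "MNP" }
      else if (indel)     { type = "INDEL" }
      else                { type = "SYMBOLIC" }
      print $1 "\t" $2 "\t" $4 "\t" $5 "\t" type
    }
  ' "$@"
}
```

Calling classify_vcf_records my.vcf (file name hypothetical) prints chromosome, position, REF, ALT and the assigned label for each record. Note that this sketch treats any record combining two different kinds of allele as MIXED, which may not match how a given GATK version handles symbolic alleles.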
Note that SelectVariants, the GATK tool most used for VCF subsetting operations, discriminates strictly between these categories. This means that if you use, for example, -selectType INDEL to pull out indels, it will only select pure INDEL records, excluding any MIXED records that might include a SNP allele in addition to the insertion or deletion alleles of interest. To include those, you would also have to specify -selectType MIXED in the same command.
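As a minimal sketch of that combination, the snippet below prints (rather than runs) a GATK 3-style SelectVariants command that requests both INDEL and MIXED records. All file names and the jar path are hypothetical placeholders, and select_indels_cmd is just an illustrative helper name.

```shell
# Hypothetical file names; replace with your own.
GATK_JAR="GenomeAnalysisTK.jar"
REF="reference.fasta"
IN_VCF="input.vcf"
OUT_VCF="indels_and_mixed.vcf"

# Print the SelectVariants command (dry run) so it can be reviewed before launching.
# -selectType is given twice: once for pure INDEL records, once for MIXED records.
select_indels_cmd() {
  printf '%s\n' "java -jar $GATK_JAR -T SelectVariants -R $REF --variant $IN_VCF -selectType INDEL -selectType MIXED -o $OUT_VCF"
}

select_indels_cmd
```

Once the printed command looks right for your data, it can be pasted into the terminal or run with something like select_indels_cmd | sh.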
HaplotypeCaller is a sophisticated variant caller that can call different types of variants at the same time. So in addition to SNPs and indels, it is capable of emitting mixed records by default, as well as symbolic representations for e.g. spanning deletions. It does emit physical phasing information, but in its current version, HaplotypeCaller cannot emit MNPs. If you would like to combine contiguous SNPs into MNPs, you will need to use the ReadBackedPhasing tool with the MNP merging function activated. See the tool documentation for details.
Our older (and now deprecated) variant caller, UnifiedGenotyper, was even more limited. It only called SNPs and indels, and did so separately (even if you ran it in calling mode BOTH, the program performed separate calling operations internally), so it was not able to recognize that SNPs and indels should be emitted together as a joint record when they occur at the same site.
The general release version of GATK is currently not able to detect SVs (structural variations) or CNVs (copy number variations). However, the alpha version of GATK 4 (the next generation of GATK tools) includes tools for performing CNV analysis in exome data. Let us know if you're interested in trying them out by commenting on this article in the forum.
There is also a third-party software package called GenomeSTRiP, built on top of GATK, that provides SV analysis capabilities.
We support the Variant Call Format (VCF) for variant callsets. No other file formats are supported.
No, we like VCF and we think it's important to have a good standard format. Multiplying formats just makes life hard for everyone, both developers and analysts.
Dear GATK team, I have done raw SNP and indel calling with UnifiedGenotyper using the command line below.
java -Xmx16g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -glm BOTH -R ucsc.hg19.fasta -T UnifiedGenotyper -I ERR031029.marked.realigned.fixed.recal.bam -I ERR031030.marked.realigned.fixed.recal.bam -D dbsnp_135.hg19.vcf -o ERR031030.raw.snps.indels.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000
After that, I did SNP filtration using the following command lines.
java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant ERR031030.raw.snps.indels.vcf -o ERR031030.snpsonly.vcf -selectType SNP
java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant ERR031030.raw.snps.indels.vcf -o ERR031030.indelsonly.vcf -selectType INDEL
java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T VariantRecalibrator -R ucsc.hg19.fasta -input ERR031030.snpsonly.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_phase1.indels.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_135.hg19.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an MQ -mode SNP -recalFile ERR031030.snp.recal.vcf -tranchesFile ERR031030.snp.tranches.vcf -rscriptFile ERR031030.plots.R
java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T ApplyRecalibration -input ERR031030.snpsonly.vcf -tranchesFile ERR031030.snp.tranches.vcf -recalFile ERR031030.snp.recal.vcf -o ERR031030.snps.filtered.vcf
java -Xmx16g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T VariantFiltration --variant ERR031030.snps.filtered.vcf -o ERR031030.final.filtered.vcf --filterName "Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5" --filterExpression "HaplotypeScore > 13.0"
The filtered SNP VCF file was produced; however, it seems to contain a problem.
chrM 311 . T C 429.19 Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5;VQSRTrancheSNP99.90to100.00 AC=1;AF=0.250;AN=4;BaseQRankSum=-13.010;DP=2000;Dels=0.00;FS=50.500;HaplotypeScore=382.2016;MLEAC=1;MLEAF=0.250;MQ=50.86;MQ0=0;MQRankSum=1.458;QD=0.43;ReadPosRankSum=-10.687;VQSLOD=-6.143e+02;culprit=HaplotypeScore GT:AD:DP:GQ:PL 0/0:634,353:949:99:0,232,7697 0/1:463,521:945:99:459,0,4190
chrM 410 . A T 64750.20 PASS AC=4;AF=1.00;AN=4;DP=2000;Dels=0.00;FS=0.000;HaplotypeScore=7.3762;MLEAC=4;MLEAF=1.00;MQ=56.04;MQ0=0;QD=32.38;VQSLOD=2.27;culprit=HaplotypeScore GT:AD:DP:GQ:PL 1/1:0,998:998:99:32010,2926,0 1/1:0,999:999:99:32767,2912,0
chrM 711 . G A 62989.20 PASS AC=4;AF=1.00;AN=4;BaseQRankSum=2.500;DP=2000;Dels=0.00;FS=3.751;HaplotypeScore=8.7084;MLEAC=4;MLEAF=1.00;MQ=56.74;MQ0=1;MQRankSum=-0.107;QD=31.49;ReadPosRankSum=-2.169;VQSLOD=2.46;culprit=HaplotypeScore GT:AD:DP:GQ:PL 1/1:0,998:972:99:30899,2808,0 1/1:3,997:972:99:32117,2830,0
chrM 1121 . T C 16719.20 Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5;VQSRTrancheSNP99.90to100.00 AC=4;AF=1.00;AN=4;BaseQRankSum=-0.239;DP=2000;Dels=0.00;FS=2.141;HaplotypeScore=22.9003;MLEAC=4;MLEAF=1.00;MQ=21.32;MQ0=703;MQRankSum=-1.627;QD=8.36;ReadPosRankSum=-0.027;VQSLOD=-4.195e+00;culprit=HaplotypeScore GT:AD:DP:GQ:PL 1/1:3,985:986:99:9547,976,0 1/1:4,983:983:99:7199,739,0
chrM 2489 . A C 34.19 LowQual;Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5 AC=1;AF=0.250;AN=4;BaseQRankSum=-17.321;DP=2000;Dels=0.00;FS=180.208;HaplotypeScore=18.7245;MLEAC=1;MLEAF=0.250;MQ=46.52;MQ0=31;MQRankSum=3.365;QD=0.03;ReadPosRankSum=-4.198 GT:AD:DP:GQ:PL 0/1:278,719:950:64:64,0,4623 0/0:309,688:950:99:0,263,6065
In the FILTER column, most of the filtered SNPs show Nov28filters rather than PASS or LowQual. What's wrong with that? Are there problems with my command lines? Thank you so much for your reply.
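One possible reading of the output above: the FILTER column contains the entire quoted string that was passed to --filterName, and the records carrying it are the ones with HaplotypeScore above 13.0, which is the only condition given to --filterExpression. If the intent was for the QD, ReadPosRankSum, MQ, FS and MQRankSum conditions to be evaluated, they would belong in --filterExpression, with --filterName holding only the short label. The sketch below prints such a command as a dry run. It assumes the conditions were meant to be combined as written (with &&), and it spells the annotation MQRankSum rather than MQRandkSum; the helper name variant_filtration_cmd and the jar path are illustrative.

```shell
GATK_JAR="GenomeAnalysisTK.jar"          # hypothetical path to the GATK 3 jar
REF="ucsc.hg19.fasta"
IN_VCF="ERR031030.snps.filtered.vcf"
OUT_VCF="ERR031030.final.filtered.vcf"

# The label written to the FILTER column, and the expression that triggers it.
FILTER_NAME="Nov28filters"
FILTER_EXPR="QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRankSum < -12.5"

# Print the VariantFiltration command (dry run) with name and expression kept separate.
variant_filtration_cmd() {
  printf 'java -jar %s -T VariantFiltration -R %s --variant %s -o %s --filterName "%s" --filterExpression "%s"\n' \
    "$GATK_JAR" "$REF" "$IN_VCF" "$OUT_VCF" "$FILTER_NAME" "$FILTER_EXPR"
}

variant_filtration_cmd
```

Keep in mind that conditions joined with && flag a site only when all of them are true at once; joining them with || would flag a site when any single condition is true. The HaplotypeScore > 13.0 condition could be added as a second --filterExpression/--filterName pair with its own label.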
I am using the following set of commands on GATK2.1.13 to generate a VCF file
echo `java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -I B2_with_ReadGroup.ddup.sorted.bam -R human_g1k_v37.fasta -T RealignerTargetCreator -o my.intervals -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key`
echo "Realignment Done at `date`"
echo "Starting IndelRealigner at `date`"
echo `java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -I B2_with_ReadGroup.ddup.sorted.bam -R human_g1k_v37.fasta -T IndelRealigner -targetIntervals my.intervals -o myrealignedBam.bam -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key`
echo "Realignment done at `date`"
echo "Starting UnifiedGenotyper at `date`"
echo `java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -l INFO -R human_g1k_v37.fasta -T UnifiedGenotyper -I myrealignedBam.bam -o mygatk_vcf.vcf --output_mode EMIT_ALL_SITES -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key`
echo "Genotyping complete at `date`"
When I do an 'mpileup' for B2_with_ReadGroup.ddup.sorted.bam, I get a decent 10 MB VCF file. But on the last step of the above pipeline, my "mygatk_vcf.vcf" is growing to 81 GB!
Do you know what is wrong?
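A likely factor in the size difference is the --output_mode EMIT_ALL_SITES setting in the UnifiedGenotyper command above: in that mode the tool writes a record for every site it can evaluate, including non-variant ones, whereas the default mode (EMIT_VARIANTS_ONLY) writes only variant sites. One way to check is to count how many records in the output have no alternate allele. The sketch below does this with awk on a tab-separated VCF; count_vcf_sites is a hypothetical helper name.

```shell
# Count variant vs. non-variant records in a VCF.
# A record whose ALT column is "." has no alternate allele (non-variant site).
count_vcf_sites() {
  awk -F'\t' '
    /^#/      { next }              # skip header lines
    $5 == "." { nonvar++; next }    # ALT is ".": non-variant record
              { var++ }             # anything else: variant record
    END { printf "variant=%d non_variant=%d\n", var, nonvar }
  ' "$@"
}
```

Running count_vcf_sites mygatk_vcf.vcf on the large file should show whether most of the records are non-variant sites. If they are, dropping the --output_mode argument (or setting it to EMIT_VARIANTS_ONLY) should bring the file size down considerably.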
Could you tell me when we can use the new version of SnpEff with GATK?
I have some bugs:
caused by exception org.broadinstitute.sting.gatk.walkers.annotator.interfaces.ExperimentalAnnotation.
I don't know if I forgot some other options linked to these annotations. These options are important for me. So I deleted them but if somebody wants to use them ...