Tagged with #indel
1 documentation article | 0 announcements | 12 forum discussions

Created 2014-01-17 15:36:06 | Updated 2016-03-05 11:20:37 | Tags: snp variants symbolicallele indel cnv mnp sv

Comments (2)

The answer depends on what tool we're talking about, and whether we're considering variant discovery or variant manipulation.

Variant manipulation

GATK variant manipulation tools are able to recognize the following types of alleles:

  • SNP (single nucleotide polymorphism)
  • INDEL (insertion/deletion)
  • MIXED (combination of SNPs and indels at a single position)
  • MNP (multi-nucleotide polymorphism, e.g. a dinucleotide substitution)
  • SYMBOLIC (such as the <NON-REF> allele used in GVCFs produced by HaplotypeCaller, the * allele used to signify the presence of a spanning deletion, or undefined events like a very large allele or one that's fuzzy and not fully modeled; i.e. there's some event going on here but we don't know what exactly)

Note that SelectVariants, the GATK tool most used for VCF subsetting operations, discriminates strictly between these categories. This means that if you use for example -selectType INDEL to pull out indels, it will only select pure INDEL records, excluding any MIXED records that might include a SNP allele in addition to the insertion or deletion alleles of interest. To include those you would have to also specify selectType MIXED in the same command.

Variant discovery

The HaplotypeCaller is a sophisticated variant caller that can call different types of variants at the same time. So in addition to SNPs and indels, it is capable of emitting mixed records by default, as well as symbolic representations for e.g. spanning deletions. It does emit physical phasing information, but in its current version, HC is not able to emit MNPs. If you would like to combine contiguous SNPs into MNPs, you will need to use the ReadBackedPhasing tool with the MNP merging function activated. See the tool documentation for details. Our older (and now deprecated) variant caller, UnifiedGenotyper, was even more limited. It only called SNPs and indels, and did so separately (even if you ran in calling mode BOTH, the program performed separate calling operations internally) so it was not able to recognize that SNPs and Indels should be emitted together as a joint record when they occur at the same site.

The general release version of GATK is currently not able to detect SVs (structural variations) or CNVs (copy number variations). However, the alpha version of GATK 4 (the next generation of GATK tools) includes tools for performing CNV (copy number variation) analysis in exome data. Let us know if you're interested in trying them out by commenting on this article in the forum.

There is also a third-party software package called GenomeSTRiP built on top of GATK that provides SV (structural variation) analysis capabilities.

No articles to display.

Created 2015-10-17 21:23:46 | Updated | Tags: indel brca variant-calling

Comments (9)

Hello, I am implementing my pipeline for Roche 454 and I have used GATK for called of variants. I have compared my variants with Roche's variants and for BRCA1 my variants corresponded, but the position of indel for BRCA2 are wrong, the vcf file reported that them position is a few bp before (instead the positions of snp are right) I have searched on genome browser and I have seen that the positions of Roche were right.

For example, this are the coordinates that I have in my files. Roche GATK VARIANT 32893207 32893197 T/- 32900342 32900337 A/- 32900370 32900363 T/- 32900376 32900371 C/- 32900933 32900933 T/A 32907208 32907202 A/- 32907428 32907420 A/- 32907546 32907535 TTT/-

Could help me?

Created 2015-03-12 16:15:59 | Updated | Tags: unifiedgenotyper snp error indel

Comments (3)


when I run the UnifiedGenotyper to call INDELs, I get the following error (detailed command line output below) HAPLOTYPE_MAX_LENGTH must be > 0 but got 0

When I call only SNPs instead, it doe not occur. I have searched to find an answer to why this is happening, but cannot figure out the reason. Could this be a bug? I get the error no matter if I run the Realigner before or not.

Did you observe this problem before?

Command line output: java -Xmx10g -jar ~/work/tools/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /media/rebecca/UUI/Work/BputSem/BputSem_gapfilled.final.fa -I realigned_A.bam -gt_mode DISCOVERY -glm INDEL -stand_call_conf 30 -stand_emit_conf 10 -o rawINDELS_q30_A.vcf -ploidy 10

INFO 15:50:47,987 HelpFormatter - -------------------------------------------------------------------------------- INFO 15:50:47,990 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22 INFO 15:50:47,990 HelpFormatter - Copyright (c) 2010 The Broad Institute INFO 15:50:47,990 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk INFO 15:50:47,996 HelpFormatter - Program Args: -T UnifiedGenotyper -R /media/rebecca/UUI/Work/BputSem/BputSem_gapfilled.final.fa -I realigned_A.bam -gt_mode DISCOVERY -glm INDEL -stand_call_conf 30 -stand_emit_conf 10 -o rawINDELS_q30_A.vcf -ploidy 10 INFO 15:50:48,002 HelpFormatter - Executing as rebecca@rebecca-ThinkPad-T440s on Linux 3.13.0-44-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_65-b32. INFO 15:50:48,002 HelpFormatter - Date/Time: 2015/03/12 15:50:47 INFO 15:50:48,002 HelpFormatter - -------------------------------------------------------------------------------- INFO 15:50:48,002 HelpFormatter - -------------------------------------------------------------------------------- INFO 15:50:48,132 GenomeAnalysisEngine - Strictness is SILENT INFO 15:50:48,402 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 INFO 15:50:48,409 SAMDataSource$SAMReaders - Initializing SAMRecords in serial INFO 15:50:48,447 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04 INFO 15:50:48,594 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files INFO 15:50:48,622 GenomeAnalysisEngine - Done preparing for traversal INFO 15:50:48,622 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] INFO 15:50:48,622 ProgressMeter - | processed | time | per 1M | | total | remaining INFO 15:50:48,623 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime INFO 15:50:48,658 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values. INFO 15:50:48,658 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values. INFO 15:51:06,585 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: HAPLOTYPE_MAX_LENGTH must be > 0 but got 0 at org.broadinstitute.gatk.utils.pairhmm.PairHMM.initialize(PairHMM.java:97) at org.broadinstitute.gatk.utils.pairhmm.N2MemoryPairHMM.initialize(N2MemoryPairHMM.java:60) at org.broadinstitute.gatk.utils.pairhmm.LoglessPairHMM.initialize(LoglessPairHMM.java:66) at org.broadinstitute.gatk.utils.pairhmm.PairHMM.computeLikelihoods(PairHMM.java:194) at org.broadinstitute.gatk.tools.walkers.indels.PairHMMIndelErrorModel.computeGeneralReadHaplotypeLikelihoods(PairHMMIndelErrorModel.java:461) at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyIndelGenotypeLikelihoods.add(GeneralPloidyIndelGenotypeLikelihoods.java:201) at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyIndelGenotypeLikelihoods.add(GeneralPloidyIndelGenotypeLikelihoods.java:124) at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoodsCalculationModel.getLikelihoods(GeneralPloidyGenotypeLikelihoodsCalculationModel.java:270) at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateLikelihoods(UnifiedGenotypingEngine.java:317) at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotypingEngine.java:201) at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:379) at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:151) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255) at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274) at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48) at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99) at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319) at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121) at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248) at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155) at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR ------------------------------------------------------------------------------------------

Created 2015-02-12 13:25:05 | Updated | Tags: snp dp indel

Comments (15)


we are working with GATK 3.2.2 and focus on an optimization of the SNP and indel calling in the context of our data. Thereby, we also determine the frequency of a certain variant. However, we came across some strange results in this case. In the vcf files there is of course the DP in the info field, representing the unfiltered depth, and the DP in the format field, representing the filtered depth. Yet, we also calculated the depth by ourselves in the bam file and looked it up in the IGV. One especially strange case is the following: For a sample GATK calls a deletion. The unfiltered DP (info field) is DP=644. The filtered DP (format field) is DP=505. It consists of 438 reads with the original genotype and 67 reads containing the variant. However, in our raw bam file and in the IGV things are different. There we see 499 reads in total. 425 containing a C (reference), one with an N and 73 with a deletion. We checked and the results are reproduceable. Furthermore, we see similar differences as well. There are many cases in which the number of reads containing the variant or the reference allele as we count it and as it is observable in the IGV is smaller than the reported number of filtered reads. How can this happen? I'm really looking forward to your answers.


Created 2014-09-26 06:33:13 | Updated | Tags: haplotypecaller indel gvcf

Comments (2)

I have WES data of 235 samples, aligned using bwa aln. I followed GATK's Best Pratice for marking duplicates, realignment around INDEL, and BQSR until I got the recalibrated bamfiles. All these are done with GATK 2.7-2. After that I generated gvcf file from the recalibrated bamfiles, using HaplotypeCaller from GATK 3.1-1, followed by GenotypeGVCFs and VQSR.

I came across one INDEL, which passes VQSR,

The variant is chr14 92537378 . G GC 11330.02 PASS AC=11;AF=0.023;AN=470;BaseQRankSum=0.991;ClippingRankSum=-6.820e-01;DP= 13932;FS=0.000;GQ_MEAN=137.20;GQ_STDDEV=214.47;InbreedingCoeff=-0.0311;MLEAC=11;MLEAF=0.023;MQ=43.67;MQ0=0;MQRankSum=-1.073e+00;NCC=0;QD=18.16; ReadPosRankSum=-1.760e-01;VQSLOD=3.70;culprit=FS

Indivudual genotypes seem very good. To name a few: 1800 0/0:65,0:65:99:0,108,1620 0/0:86,0:86:99:0,120,1800 0/1:23,25:48:99:1073,0,1971 0/1:51,39:90:99:1505,0,4106

But when I checked the variants in IGV, something strange popped out.

The pictures (and complete post) can be view here: http://alittlebitofsilence.blogspot.hk/2014/09/to-be-solved-gatk-haplotypecaller-indel.html

Judging from the alignment visualized by IGV, there are no insertions at that site, but an SNP in the next position, in all samples. But HaplotypeCaller calls G->GC insertion in some samples while not in other samples. My questions are: 1) In variant calling HaplotypeCaller would do a local de-novo assembly in some region. Is the inconsistency between bamfiles and gvcf because of the re-assembly? 2) Why is the insertion called in some samples but not others, although the position in all samples looks similar? 3) There are other variants called adjacent or near this site, but later filtered by VQSR. Does that indicate the variants from this region cannot be trusted? chr14 92537353 . C CG 303342.41 VQSRTrancheINDEL0.00to90.00+ chr14 92537354 . C CTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTG,CTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTG,CTGCTGCTGCTGCTGCTG,CTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTG,CTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTG,CTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTG 142809.26 VQSRTrancheINDEL0.00to90.00+ chr14 92537378 . G GC 11330.02 PASS chr14 92537379 rs12896583 T TGCTGCTGCTGCTGC,C 13606.25 VQSRTrancheINDEL0.00to90.00+ chr14 92537385 rs141993435 CTTT C 314.64 VQSRTrancheINDEL0.00to90.00+ chr14 92537387 rs12896588 T G 4301.60 VQSRTrancheSNP99.90to100.00 chr14 92537388 rs12896589 T C 4300.82 VQSRTrancheSNP99.90to100.00 chr14 92537396 . G GC,GCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGC 4213.61 VQSRTrancheINDEL0.00to90.00+

chr14 92537397 . T TGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGCTG,TGCTGCTGCTGCTG,TGCTGCTGCTG,TGCTGCTG 2690.79 VQSRTrancheINDEL0.00to90.00+ 3) I used GATK 2.7 for processing before calling gvcf files. Any possibility that if I change to 3.2, the bamfiles would look more similar to gvcf result? I have tried GATK3.2-2 for generating gvcf from GATK 2.7 bamfiles, the results are the same.

Created 2014-08-19 22:04:32 | Updated | Tags: unifiedgenotyper indel qd

Comments (4)

Dear support,

According to the definition of QD, it equals to the variant confidence (from the QUAL field) divided by the unfiltered depth of non-reference samples.

However, when I ran the Unified Genotyper (v2.8.1) on a pair of samples for indel calls, the QD score for some indels are much lower than expected. For example,

1) chrX 51075762 . GCAGCTGCGT G 142.15 . AC=1;AF=0.250;AN=4;BaseCounts=0,0,204,0;BaseQRankSum=-1.435;DP=204;FS=0.000;GC=70.82;LowMQ=0.0049,0.0049,204;MLEAC=1;MLEAF=0.250;MQ=55.32;MQ0=0;MQRankSum=-2.698;QD=0.15;ReadPosRankSum=-3.784;Samples=tumor_DNA_2C;VariantType=DELETION.NumRepetitions_1.EventLength_9 GT:AD:DP:GQ:PL 0/0:94,0:95:99:0,283,11503 0/1:101,7:108:99:181,0,11532

in this case, the expected QD = 142.15 / 108 = 1.32, but the reported QD = 0.15

2) chr21 45649571 . C CCTGGACCCGCCCTGGACACCCCACGGGGG 526.15 . AC=1;AF=0.250;AN=4;BaseCounts=0,78,0,0;BaseQRankSum=0.787;DP=78;FS=0.000;GC=65.84;LowMQ=0.0000,0.0000,78;MLEAC=1;MLEAF=0.250;MQ=60.33;MQ0=0;MQRankSum=1.453;QD=0.44;ReadPosRankSum=-3.875;Samples=tumor_DNA_2C;VariantType=INSERTION.NOVEL_10orMore GT:AD:DP:GQ:PL 0/0:31,1:37:86:0,86,3672 0/1:30,6:41:99:565,0,4204

the expected QD = 526.15 / 41 = 12.83, the reported QD 0.44 is much smaller.

Do you know what's happening? Should I filter those indel calls (since QD < 2.0)?



Created 2014-04-02 07:06:05 | Updated | Tags: indel fastaalternatereferencemaker indexdictionaryutils

Comments (4)


first of all thanks for the support offered here. But (sadly) I have an other two questions.

I used the FastaAlternateReferenceMaker to create a consesus sequence and it worked great. Here I have a question regarding a warning: "IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation". I do not get this warning. Which dictionary is needed here. Is this warning comming up, because I used .gz and .tbi compressed vcf-files?

And a furhter question regarding the creation of a consesus sequence. I`m unsure what to do in cases of heterozygous INDELs. What would you suggest to do? Skip them as they are only represented by one allele or include them in the consensus. Both seems to be incorrect. Do you know the general handling here?


Created 2013-12-19 04:01:06 | Updated | Tags: haplotypecaller platform454filter indel

Comments (2)


Can i use HaplotypeCaller to call the 454 indel? Can anyone share his/her idea?thanks.

Best, J

Created 2013-10-06 17:28:09 | Updated 2013-10-06 17:29:10 | Tags: indel

Comments (4)

Hello all,

I have narrowed down this issue to UG Indel caller. I am currently running UG will an additional 136 1000Genome individuals to be call at the same time my variants of interest. I run my SNP and Indel calls separate as per GATK best practices. But have noticed this issue only with my Indel calls.

This is the command I'm running:

java -Xmx32g -jar -Djava.io.tmpdir=/tmp/ /home/srynearson/GenomeAnalysisTK-nightly-oct/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /home/srynearson/cAPTUrE/references/human_g1k_v37.fasta -I [each of the 1000Genome .bam files] -I [each of my variants of interest .bam files] --num_threads 15 --standard_min_confidence_threshold_for_calling 30.0 --standard_min_confidence_threshold_for_emitting 30.0 --output_mode EMIT_VARIANTS_ONLY --num_cpu_threads_per_data_thread 2 --genotype_likelihoods_model INDEL

And this is the error I get:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Somehow the requested coordinate is not covered by the read. Alignment 88304365 | 67S24M1D8M1S at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:574) at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:438) at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinateUpToEndOfRead(ReadUtils.java:434) at org.broadinstitute.sting.gatk.walkers.annotator.BaseQualityRankSumTest.getElementForRead(BaseQualityRankSumTest.java:76) at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.getElementForRead(RankSumTest.java:200) at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.fillQualsFromLikelihoodMap(RankSumTest.java:179) at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.annotate(RankSumTest.java:102) at org.broadinstitute.sting.gatk.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:192) at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateGenotypes(UnifiedGenotyperEngine.java:560) at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:234) at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:367) at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:143) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version nightly-2013-10-04-ge260355):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Alignment 88304365 | 67S24M1D8M1S
ERROR ------------------------------------------------------------------------------------------

I have confirmed prior files and the SNP file runs to completion.

Any ideas? Thanks!

Created 2013-09-18 20:26:06 | Updated | Tags: unifiedgenotyper haplotypecaller indel inderealigner

Comments (4)

I have Ion Torrent data. I am trying to call a variant that I know to exist (confirmed with Sanger). In the position where there is the known indel, I have a depth of roughly 80-90 (in two different runs) and a of those between 20-23% of the reads have the insertion called. What parameters should I be adjusting to get this indel to call? I don't mind a large number of false positives.

I've tried several iterations that include indel realignment using known indels (1000G_phase1 and ills_and_1000G_gold_standard) and also excluding them. I have also tried iterations of setting these flags in UnifiedGenotyper:-stand_call_conf 30.0 -stand_emit_conf 0.0 --min_base_quality_score 0 -glm BOTH --dbsnp dbsnp_137.b37.vcf -nt -rf BadCigar -minIndelCnt 3 -minIndelFrac 0.15. I have also attempted to use HaplotypeCaller: -stand_call_conf 30.0 -stand_emit_conf 0.0 --dbsnp dbsnp_137.b37.vcf -rf BadCigar

Any suggestions would be great.

Created 2013-05-23 17:43:39 | Updated 2013-05-23 21:20:48 | Tags: vcf indel

Comments (5)


I'm having problems understanding a GATK output VCF. I have read the VCF standard, but I'm obviously missing something.

I /think/ I understand how SNPs and short indels are represented, but clearly I do not. Below is an excerpt that illustrates sites which I do not understand. I suspect it may be something to do with GATK quality filters that I'm not understanding, or something about using EMIT_ALL_SITES...

The excerpt below was generated using

GATK -l INFO -I my.bam -R my.fa -T UnifiedGenotyper -S LENIENT -nt 8 --heterozygosity 0.1 -o test.vcf --genotype_likelihoods_model BOTH --min_base_quality_score 10 --output_mode EMIT_ALL_SITES -ploidy 2



    CH1 225 .   T   G   12.71   LowQual AC=1;AF=0.500;AN=2;BaseQRankSum=1.978;DP=59;Dels=0.03;FS=0.000;HaplotypeScore=10.2840;MLEAC=1;MLEAF=0.500;MQ=70.25;MQ0=8;MQRankSum=-5.349;QD=0.22;ReadPosRankSum=-3.188 GT:AD:DP:GQ:PL  0/1:41,16:55:20:20,0,1435
    CH1 226 .   T   .   121.53  .   AN=2;DP=59;MQ=70.25;MQ0=8   GT:DP   0/0:43
    CH1 227 .   A   .   121.53  .   AN=2;DP=59;MQ=70.25;MQ0=8   GT:DP   0/0:43
    CH1 228 .   T   .   121.53  .   AN=2;DP=59;MQ=70.25;MQ0=8   GT:DP   0/0:43
    CH1 229 .   A   .   115.53  .   AN=2;DP=57;MQ=69.66;MQ0=8   GT:DP   0/0:38
    CH1 230 .   C   .   115.53  .   AN=2;DP=57;MQ=69.66;MQ0=8   GT:DP   0/0:38
    CH1 231 .   T   .   115.53  .   AN=2;DP=57;MQ=69.66;MQ0=8   GT:DP   0/0:36
    CH1 232 .   G   .   115.53  .   AN=2;DP=57;MQ=69.66;MQ0=8   GT:DP   0/0:36
    CH1 233 .   C   .   115.53  .   AN=2;DP=57;MQ=69.66;MQ0=8   GT:DP   0/0:37
    CH1 234 .   A   .   139.53  .   AN=2;DP=70;MQ=59.20;MQ0=14  GT:DP   0/0:63
    CH1 235 .   A   .   175.53  .   AN=2;DP=84;MQ=51.67;MQ0=15  GT:DP   0/0:79
    CH1 236 .   A   .   175.53  .   AN=2;DP=84;MQ=51.67;MQ0=15  GT:DP   0/0:79
    CH1 237 .   T   .   175.53  .   AN=2;DP=85;MQ=51.37;MQ0=16  GT:DP   0/0:80
    CH1 238 .   A   .   175.53  .   AN=2;DP=102;MQ=46.90;MQ0=28 GT:DP   0/0:97
    CH1 238 .   A   AGAAAGAAAGCTTGTA    83.73   .   AC=1;AF=0.500;AN=2;BaseQRankSum=6.172;DP=102;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=46.90;MQ0=0;MQRankSum=-6.190;QD=0.05;ReadPosRankSum=-5.733 GT:AD:DP:GQ:PL  0/1:27,25:57:99:121,0,4853
    CH1 239 .   A   .   175.53  .   AN=2;DP=102;MQ=46.90;MQ0=28 GT:DP   0/0:101
    CH1 240 .   T   .   175.53  .   AN=2;DP=102;MQ=46.90;MQ0=28 GT:DP   0/0:98
    CH1 241 .   A   .   169.53  .   AN=2;DP=108;MQ=44.14;MQ0=29 GT:DP   0/0:107
*    CH1    242 .   T   .   169.53  .   AN=2;DP=109;MQ=43.94;MQ0=29 GT:DP   0/0:103
*    CH1    242 .   T   .   118.27  .   AN=2;DP=109;MQ=43.94;MQ0=29 GT:AD:DP    0/0:27:55
*    CH1    243 .   C   .   172.53  .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:DP   0/0:108
*    CH1    243 .   CTTTT   .   118.27  .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:AD:DP    0/0:27:56
    CH1 244 .   T   .   91.53   .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:DP   0/0:61
    CH1 245 .   T   .   91.53   .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:DP   0/0:53
    CH1 246 .   T   .   73.53   .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:DP   0/0:41
    CH1 247 .   T   .   91.53   .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:DP   0/0:46
    CH1 248 .   A   .   172.53  .   AN=2;DP=116;MQ=42.61;MQ0=31 GT:DP   0/0:100
    CH1 249 .   A   .   172.53  .   AN=2;DP=116;MQ=42.61;MQ0=31 GT:DP   0/0:100
    CH1 250 .   T   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:101
    CH1 251 .   T   .   169.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:96
    CH1 251 .   T   .   118.27  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:AD:DP    0/0:27:56
    CH1 252 .   C   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:113
    CH1 253 .   C   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:110
    CH1 254 .   T   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:111
    CH1 255 .   T   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:111
    CH1 256 .   T   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:111

Line 1 is a SNP
Lines 14 and 15 are an indel that I do understand
Lines 19 and 20 I do /not/ understand
Lines 21 and 22 I do /not/ understand

Created 2013-03-11 14:11:32 | Updated | Tags: selectvariants indel

Comments (3)

Dear All,

Just thought I should report a possible small bug with SelectVariant --maxIndelSize as I think it's only filtering insertions greater than the given value and not deletions. Of course using JEXL expressions I can still get the variants I'm after...

Cheers, Laurent