Tagged with #realignertargetcreator
2 documentation articles | 0 announcements | 46 forum discussions



Created 2012-07-23 23:56:41 | Updated 2012-07-23 23:56:41 | Tags: realignertargetcreator gatkdocs
Comments (10)

A new tool has been released!

Check out the documentation at RealignerTargetCreator.


Created 2012-07-23 16:48:55 | Updated 2012-09-30 23:35:55 | Tags: indelrealigner realignertargetcreator official
Comments (110)

Realigner Target Creator

For a complete, detailed argument reference, refer to the GATK document page here.


Indel Realigner

For a complete, detailed argument reference, refer to the GATK document page here.


Running the Indel Realigner only at known sites

While we advocate running the Indel Realigner over an aggregated BAM with the full Smith-Waterman alignment algorithm, it also works on a single lane of sequencing data when run in -knownsOnly mode (shown in the command below as --consensusDeterminationModel KNOWNS_ONLY). Novel sites will not be cleaned up, but the majority of a single individual's short indels will already have been seen in dbSNP and/or 1000 Genomes. The known-only, lane-level realignment strategy suits large-scale projects (e.g. 1000 Genomes) where computation time is severely constrained. The example commands below show the arguments needed for known-only, lane-level cleaning.

The RealignerTargetCreator step would need to be done just once for a single set of indels; so as long as the set of known indels doesn't change, the output.intervals file from below would never need to be recalculated.

java -Xmx1g -jar /path/to/GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R /path/to/reference.fasta \
  -o /path/to/output.intervals \
  -known /path/to/indel_calls.vcf

The IndelRealigner step needs to be run on every bam file.

java -Xmx4g -Djava.io.tmpdir=/path/to/tmpdir \
  -jar /path/to/GenomeAnalysisTK.jar \
  -I <lane-level.bam> \
  -R <ref.fasta> \
  -T IndelRealigner \
  -targetIntervals <intervalListFromStep1Above.intervals> \
  -o <realignedBam.bam> \
  -known /path/to/indel_calls.vcf \
  --consensusDeterminationModel KNOWNS_ONLY \
  -LOD 0.4
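
The split between the two steps (target creation once, realignment per BAM) can be sketched as a small Python command builder. This is only an illustration under stated assumptions: the function names, the output naming rule and the example paths are hypothetical, and the flags are simply the ones shown in the two commands above.

```python
from typing import List


def target_creator_cmd(gatk_jar: str, reference: str, known_vcf: str,
                       intervals: str) -> List[str]:
    """RealignerTargetCreator command: needed once per set of known indels."""
    return [
        "java", "-Xmx1g", "-jar", gatk_jar,
        "-T", "RealignerTargetCreator",
        "-R", reference,
        "-o", intervals,
        "-known", known_vcf,
    ]


def indel_realigner_cmd(gatk_jar: str, reference: str, known_vcf: str,
                        intervals: str, bam_in: str, bam_out: str,
                        tmpdir: str) -> List[str]:
    """IndelRealigner command in known-only mode: needed for every BAM."""
    return [
        "java", "-Xmx4g", f"-Djava.io.tmpdir={tmpdir}", "-jar", gatk_jar,
        "-T", "IndelRealigner",
        "-I", bam_in,
        "-R", reference,
        "-targetIntervals", intervals,
        "-o", bam_out,
        "-known", known_vcf,
        "--consensusDeterminationModel", "KNOWNS_ONLY",
        "-LOD", "0.4",
    ]


def known_only_plan(gatk_jar: str, reference: str, known_vcf: str,
                    intervals: str, lane_bams: List[str],
                    tmpdir: str) -> List[List[str]]:
    """One target-creation command followed by one realignment per lane BAM."""
    plan = [target_creator_cmd(gatk_jar, reference, known_vcf, intervals)]
    for bam in lane_bams:
        # Hypothetical naming rule: lane1.bam -> lane1.realigned.bam
        stem = bam[:-4] if bam.endswith(".bam") else bam
        plan.append(indel_realigner_cmd(gatk_jar, reference, known_vcf,
                                        intervals, bam,
                                        stem + ".realigned.bam", tmpdir))
    return plan
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`; as long as the known-indel file is unchanged, the first command does not need to be rerun.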
Comments (5)

Hello,

I used bwa to map my samples to a mitochondrial genome of a non-model organism. Afterwards I created a merged .bam file from multiple (288) sample .bams (using samtools merge and re-assigning RG tags), but when I run UnifiedGenotyper on that file it gets stuck at 32.1% and never moves forward from there. I also wanted to run RealignerTargetCreator, but I always get a truncated realigned.bam file. Any suggestions for how to troubleshoot this?

Thank you.


Created 2015-07-15 10:04:10 | Updated 2015-07-15 10:05:21 | Tags: realignertargetcreator empty out-interval genomeanalysistk-jar
Comments (2)

Hi,

I am running GATK for mouse variation analysis. I am trying to do indel realignment using RealignerTargetCreator. The program runs successfully, but there is no data in the out.intervals file.

**********command **********

java -Xmx5g -Xms5g -Djava.io.tmpdir=`pwd`/tmp -jar /share/apps/gatk/src/GenomeAnalysisTK.jar \
  -nt 36 \
  -T RealignerTargetCreator \
  -R mm_ref_GRCm38.p2_Genome_renamed_reordered.fa \
  -o out.intervals \
  -known:myvcf,VCF /home/cparsania/Database/mm_ref_GRCm38.p2/mgp.v5.merged.snps_all.dbSNP142_renamed_sorted.vcf

**********Output Log**********

INFO  16:07:22,948 HelpFormatter - Executing as cparsania@compute-2-3.local on Linux 2.6.18-308.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17. 
INFO  16:07:22,948 HelpFormatter - Date/Time: 2015/07/15 16:07:22 
INFO  16:07:22,948 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  16:07:22,948 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  16:07:23,280 GenomeAnalysisEngine - Strictness is SILENT 
INFO  16:07:23,403 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  16:07:23,574 MicroScheduler - Running the GATK in parallel mode with 36 total threads, 1 CPU thread(s) for each of 36 data thread(s), of 48 processors available on this machine 
INFO  16:07:23,701 GenomeAnalysisEngine - Preparing for traversal 
INFO  16:07:23,705 GenomeAnalysisEngine - Done preparing for traversal 
INFO  16:07:23,706 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  16:07:23,706 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  16:07:23,706 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
INFO  16:07:53,723 ProgressMeter - NC_000067.6:144540201      1.28E8    30.0 s       0.0 s        5.3%     9.4 m       8.9 m 
INFO  16:08:23,806 ProgressMeter - NC_000068.7:11335701      1.94E8    60.0 s       0.0 s        7.6%    13.2 m      12.2 m 
INFO  16:08:53,841 ProgressMeter - NC_000068.7:78999901   2.62471971E8    90.0 s       0.0 s       10.1%    14.9 m      13.4 m 
INFO  16:09:23,964 ProgressMeter - NC_000068.7:136999901   3.25471971E8   120.0 s       0.0 s       12.2%    16.4 m      14.4 m 
INFO  16:09:54,102 ProgressMeter - NC_000069.6:16999901   3.88585195E8     2.5 m       0.0 s       14.5%    17.3 m      14.8 m 
INFO  16:10:24,427 ProgressMeter - NC_000069.6:81057901   4.46585195E8     3.0 m       0.0 s       16.8%    17.8 m      14.8 m 
INFO  16:10:54,599 ProgressMeter - NC_000069.6:148214401   5.04585195E8     3.5 m       0.0 s       19.3%    18.1 m      14.6 m 
INFO  16:11:24,921 ProgressMeter - NC_000070.6:38804101   5.69624875E8     4.0 m       0.0 s       21.1%    19.0 m      15.0 m 
INFO  16:11:55,015 ProgressMeter - NC_000070.6:105999901   6.29624875E8     4.5 m       0.0 s       23.6%    19.1 m      14.6 m 
INFO  16:12:25,096 ProgressMeter - NC_000071.6:7734201   6.89624875E8     5.0 m       0.0 s       25.8%    19.5 m      14.5 m 
INFO  16:12:55,351 ProgressMeter - NC_000071.6:68405801   7.52132991E8     5.5 m       0.0 s       28.0%    19.7 m      14.2 m 
INFO  16:13:25,454 ProgressMeter - NC_000071.6:140296601   8.12132991E8     6.0 m       0.0 s       30.6%    19.7 m      13.6 m 
INFO  16:13:55,785 ProgressMeter - NC_000072.6:41999901   8.81967675E8     6.5 m       0.0 s       32.6%    20.1 m      13.5 m 
INFO  16:14:25,985 ProgressMeter - NC_000072.6:94999901   9.29967675E8     7.0 m       0.0 s       34.5%    20.4 m      13.3 m 
INFO  16:14:56,287 ProgressMeter - NC_000073.6:7225501   9.88967675E8     7.5 m       0.0 s       36.8%    20.5 m      12.9 m 
INFO  16:15:26,297 ProgressMeter - NC_000073.6:51999901   1.044704221E9     8.0 m       0.0 s       38.4%    20.9 m      12.9 m 
INFO  16:15:56,453 ProgressMeter - NC_000073.6:128014401   1.097704221E9     8.5 m       0.0 s       41.2%    20.7 m      12.2 m 
INFO  16:16:26,581 ProgressMeter - NC_000074.6:32999901   1.17014568E9     9.0 m       0.0 s       43.1%    21.0 m      11.9 m 
INFO  16:16:56,874 ProgressMeter - NC_000074.6:97999901   1.22914568E9     9.6 m       0.0 s       45.5%    21.0 m      11.5 m 
INFO  16:17:27,138 ProgressMeter - NC_000075.6:33693401   1.289546893E9    10.1 m       0.0 s       47.9%    21.0 m      11.0 m 
INFO  16:17:57,593 ProgressMeter - NC_000075.6:83999901   1.348546893E9    10.6 m       0.0 s       49.7%    21.2 m      10.7 m 
INFO  16:18:27,651 ProgressMeter - NC_000076.6:32999901   1.416142003E9    11.1 m       0.0 s       52.4%    21.1 m      10.0 m 
INFO  16:18:57,828 ProgressMeter - NC_000076.6:92999901   1.474142003E9    11.6 m       0.0 s       54.6%    21.2 m       9.6 m 
INFO  16:19:27,910 ProgressMeter - NC_000077.6:20049601   1.532836996E9    12.1 m       0.0 s       56.7%    21.3 m       9.2 m 
INFO  16:19:58,237 ProgressMeter - NC_000077.6:78999901   1.602836996E9    12.6 m       0.0 s       58.9%    21.3 m       8.8 m 
INFO  16:20:28,247 ProgressMeter - NC_000078.6:25999901   1.665919539E9    13.1 m       0.0 s       61.4%    21.3 m       8.2 m 
INFO  16:20:58,267 ProgressMeter - NC_000078.6:90999901   1.732919539E9    13.6 m       0.0 s       63.8%    21.3 m       7.7 m 
INFO  16:21:28,376 ProgressMeter - NC_000079.6:38925901   1.794048561E9    14.1 m       0.0 s       66.3%    21.2 m       7.2 m 
INFO  16:21:58,544 ProgressMeter - NC_000079.6:97482301   1.851048561E9    14.6 m       0.0 s       68.4%    21.3 m       6.7 m 
INFO  16:22:28,722 ProgressMeter - NC_000080.6:45083501   1.9224702E9    15.1 m       0.0 s       70.9%    21.3 m       6.2 m 
INFO  16:22:58,892 ProgressMeter - NC_000080.6:117999901   1.9884702E9    15.6 m       0.0 s       73.6%    21.2 m       5.6 m 
INFO  16:23:28,971 ProgressMeter - NC_000081.6:58225001   2.052372444E9    16.1 m       0.0 s       76.0%    21.2 m       5.1 m 
INFO  16:23:59,036 ProgressMeter - NC_000082.6:15999901   2.110372444E9    16.6 m       0.0 s       78.3%    21.2 m       4.6 m 
INFO  16:24:29,111 ProgressMeter - NC_000082.6:85802101   2.178416129E9    17.1 m       0.0 s       80.8%    21.1 m       4.0 m 
INFO  16:24:59,175 ProgressMeter - NC_000083.6:33999901   2.238623897E9    17.6 m       0.0 s       82.5%    21.3 m       3.7 m 
INFO  16:25:29,331 ProgressMeter - NC_000083.6:84758401   2.290623897E9    18.1 m       0.0 s       84.4%    21.4 m       3.3 m 
INFO  16:25:59,601 ProgressMeter - NC_000084.6:42999901   2.351611168E9    18.6 m       0.0 s       86.4%    21.5 m       2.9 m 
INFO  16:26:29,851 ProgressMeter - NC_000085.6:26999901   2.420313807E9    19.1 m       0.0 s       89.1%    21.4 m       2.3 m 
INFO  16:27:00,149 ProgressMeter - NC_000086.7:55531001   2.506745373E9    19.6 m       0.0 s       92.4%    21.2 m      96.0 s 
INFO  16:27:30,234 ProgressMeter - NC_000086.7:140398101   2.590745373E9    20.1 m       0.0 s       95.5%    21.0 m      56.0 s 
INFO  16:27:49,193 ProgressMeter -            done   2.725537669E9    20.4 m       0.0 s      100.0%    20.4 m       0.0 s 
INFO  16:27:49,193 ProgressMeter - Total runtime 1225.49 secs, 20.42 min, 0.34 hours 
INFO  16:28:12,666 GATKRunReport - Uploaded run statistics report to AWS S3 

I don't understand why out.intervals is empty even though the command ran successfully.

Please help

Thank you, Chirag
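
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the command above passes no -I BAM, so any targets would have to come from the -known file, and that file's name (snps_all) suggests it may hold only SNPs. The Python sketch below counts indel-like records in an uncompressed VCF. The function name is hypothetical and the classification is a simplification: it only compares REF and ALT lengths and skips symbolic or missing alleles.

```python
from typing import Iterable, Tuple


def count_variant_types(vcf_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (same_length, indel_like) record counts for VCF text lines."""
    same_length = 0
    indel_like = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        ref = fields[3]
        # Ignore missing (.), spanning-deletion (*) and symbolic (<...>) alleles.
        alts = [a for a in fields[4].split(",")
                if a not in (".", "*") and not a.startswith("<")]
        if any(len(a) != len(ref) for a in alts):
            indel_like += 1
        elif alts:
            same_length += 1
    return same_length, indel_like
```

Usage would be along the lines of `with open("known.vcf") as fh: print(count_variant_types(fh))`, where the path is a placeholder. A second value of zero would mean the file offers no known indels.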


Created 2015-05-05 09:47:41 | Updated | Tags: indelrealigner realignertargetcreator
Comments (3)

I was just wondering what you guys thought of my realignment interval length distribution. This is 30Mb from a single diploid sample without prior indel position information. Approximately 60,000 events, i.e. one every fifty bases, seems like a lot. How indicative of true indels is the data from RealignerTargetCreator and IndelRealigner? I guess I'll have to check against the ug-vcf calls... Across the genome, the distribution of 'all' events is uniform. Does multi-sample realignment improve the accuracy or efficiency of the realignment process?
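
For anyone wanting to reproduce this kind of length distribution, here is a minimal Python sketch. It assumes the intervals file has one interval per line written as `contig:start-stop` (or `contig:pos` for a single base); the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def interval_length(line: str) -> int:
    """Length in bases of a 'contig:start-stop' or 'contig:pos' interval."""
    # rpartition keeps working if the contig name itself contains ':'.
    _, _, span = line.strip().rpartition(":")
    start, _, stop = span.partition("-")
    return int(stop) - int(start) + 1 if stop else 1


def length_histogram(lines: Iterable[str]) -> Counter:
    """Map interval length -> number of intervals with that length."""
    return Counter(interval_length(l) for l in lines if l.strip())
```

`sum(length_histogram(fh).values())` gives the number of intervals, which can then be compared with the size of the region covered.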


Created 2015-04-20 16:08:37 | Updated | Tags: realignertargetcreator platform454filter
Comments (1)

Hello,

I read on the Realigner Target Creator page that this tool will not work on 454 reads due to the presence of false indels inherent to the technology, but I still tried it to check the quality of my mappings. Below is an example of the results for one sample:

INFO 21:13:47,410 MicroScheduler - 526217 reads were filtered out during the traversal out of approximately 526217 total reads (100.00%)
INFO 21:13:47,411 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 21:13:47,411 MicroScheduler - -> 0 reads (0.00% of total) failing BadMateFilter
INFO 21:13:47,412 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 21:13:47,412 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 21:13:47,413 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 21:13:47,414 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 21:13:47,414 MicroScheduler - -> 175060 reads (33.27% of total) failing MappingQualityZeroFilter
INFO 21:13:47,415 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 21:13:47,416 MicroScheduler - -> 351157 reads (66.73% of total) failing Platform454Filter
INFO 21:13:47,416 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
INFO 21:13:51,655 GATKRunReport - Uploaded run statistics report to AWS S3

The 33.27% might refer to the reads that mapped to more than one location; I will look at them later. But I would like to know what the "Platform454Filter" really does, as I couldn't find any description here.

Furthermore, do you recommend any best practices for calling variants with this technology? In particular, for the duplicate-removal step, does it make a big difference not to remove duplicates with single-end reads?

I would appreciate any help on this. Thanks in advance, Pedro Barbosa
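
The filter name suggests that it drops reads whose read group platform (PL) marks them as 454 data, which would fit the 66.73% of reads failing it in the log above. Assuming that is the behaviour (it is not confirmed in this post), the Python sketch below lists the read groups in a SAM header that would be affected. The function name is hypothetical, and the matching rule (a case-insensitive search for "454" in the PL tag) is an assumption, not GATK's exact rule.

```python
from typing import Iterable, List


def read_groups_flagged_454(header_lines: Iterable[str]) -> List[str]:
    """Return IDs of @RG header lines whose PL tag mentions 454."""
    flagged = []
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:value pairs after the record type.
        tags = dict(field.split(":", 1)
                    for field in line.rstrip("\n").split("\t")[1:]
                    if ":" in field)
        if "454" in tags.get("PL", "").upper():
            flagged.append(tags.get("ID", ""))
    return flagged
```

The header text can be obtained with `samtools view -H` and passed in line by line.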


Created 2015-04-02 21:03:41 | Updated | Tags: realignertargetcreator edge-alignment-state
Comments (16)

Hi

I am trying to run RealignerTargetCreator on a highly soft clipped bam file (soft clipped primer sequences). As soon as I start the command for RealignerTargetCreator it fails with the following error. Is this a known bug/error? Are there specific conditions when this error is thrown? Sorry I could not find any material relevant to this error and hence posted here and sent this email to Appistry as well. Hope someone can help. I can share data snippet if that helps.

Thanks

Siva

INFO 16:46:21,849 HelpFormatter - The Genome Analysis Toolkit (GATK) v2014.3-3.2.2-7-gf9cba99, Compiled 2014/08/06 10:49:54
INFO 16:46:21,849 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 16:46:21,849 HelpFormatter - For support and documentation go to http://gatkdocs.appistry.com/
INFO 16:46:21,853 HelpFormatter - Program Args: -T RealignerTargetCreator -K /usr/prog/gatk/3.2.2-7/gatk.license --input_file /mnt/tmplabdata/as-ngs/CCPV/analysis/Variant_Call_NIBR/bwamem_trimWithinAlign/Sample_1906_HX/Sample_1906_HX.cleaned.bam --reference_sequence /mnt/tmplabdata/as-ngs/referenceGenomes/hg19/bwamem/Homo_sapiens_assembly19.fasta --out /mnt/tmplabdata/as-ngs/CCPV/analysis/Variant_Call_NIBR/bwamem_trimWithinAlign/Sample_1906_HX/Sample_1906_HX.realigned.intervals --validation_strictness SILENT
INFO 16:46:21,860 HelpFormatter - Executing as gowrisi1@clusca530.local on Linux 2.6.18-371.11.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13.
INFO 16:46:21,861 HelpFormatter - Date/Time: 2015/04/02 16:46:21
INFO 16:46:21,861 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 16:46:21,861 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 16:46:22,369 GenomeAnalysisEngine - Strictness is SILENT
INFO 16:46:22,477 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 16:46:22,486 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 16:46:22,505 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 16:46:22,589 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 16:46:22,725 GenomeAnalysisEngine - Done preparing for traversal
INFO 16:46:22,725 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 16:46:22,726 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 16:46:22,726 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalStateException: Cannot make a pileup element from an edge alignment state
    at org.broadinstitute.gatk.utils.locusiterator.AlignmentStateMachine.makePileupElement(AlignmentStateMachine.java:362)
    at org.broadinstitute.gatk.utils.locusiterator.LocusIteratorByState.lazyLoadNextAlignmentContext(LocusIteratorByState.java:317)
    at org.broadinstitute.gatk.utils.locusiterator.LocusIteratorByState.hasNext(LocusIteratorByState.java:233)
    at htsjdk.samtools.util.PeekableIterator.advance(PeekableIterator.java:67)
    at htsjdk.samtools.util.PeekableIterator.next(PeekableIterator.java:54)
    at org.broadinstitute.gatk.engine.executive.WindowMaker$WindowMakerIterator.advance(WindowMaker.java:200)
    at org.broadinstitute.gatk.engine.executive.WindowMaker$WindowMakerIterator.hasNext(WindowMaker.java:169)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusView.advance(LocusView.java:180)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusView.hasNextLocus(LocusView.java:148)
    at org.broadinstitute.gatk.engine.datasources.providers.AllLocusView.advance(AllLocusView.java:127)
    at org.broadinstitute.gatk.engine.datasources.providers.AllLocusView.hasNext(AllLocusView.java:85)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$MapDataIterator.hasNext(TraverseLociNano.java:167)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:268)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:126)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2014.3-3.2.2-7-gf9cba99):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR Visit our website for extensive documentation and answers to
ERROR commonly asked questions http://gatkdocs.appistry.com/
ERROR
ERROR MESSAGE: Cannot make a pileup element from an edge alignment state
ERROR ------------------------------------------------------------------------------------------

Created 2015-03-09 10:08:14 | Updated 2015-03-09 10:10:04 | Tags: realignertargetcreator bug gatk-runtime-error
Comments (1)

I am trying to reproduce running the GATK pipeline (best practices) on Power8, but during the RealignerTargetCreator phase, I run into the following problem:

INFO 06:05:39,926 HelpFormatter - --------------------------------------------------------------------------------
INFO 06:05:39,928 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22
INFO 06:05:39,929 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 06:05:39,929 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 06:05:39,933 HelpFormatter - Program Args: -T RealignerTargetCreator -R genome/hg19.fasta -I results/out_sorted_dup.bam -o results/out_target_intervals.interval_list -known vcf/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -known vcf/1000G_phase1.indels.hg19.sites.vcf
INFO 06:05:39,941 HelpFormatter - Executing as admin@power8-2xl on Linux 3.14.17-100.fc19.ppc64p7 ppc64; OpenJDK 64-Bit Server VM 1.7.0_65-mockbuild_2014_07_26_03_56-b00.
INFO 06:05:39,942 HelpFormatter - Date/Time: 2015/03/09 06:05:39
INFO 06:05:39,942 HelpFormatter - --------------------------------------------------------------------------------
INFO 06:05:39,942 HelpFormatter - --------------------------------------------------------------------------------
INFO 06:05:40,016 GenomeAnalysisEngine - Strictness is SILENT
INFO 06:05:40,150 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 06:05:40,159 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 06:05:40,182 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 06:05:40,461 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 06:05:41,576 GenomeAnalysisEngine - Done preparing for traversal
INFO 06:05:41,577 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 06:05:41,578 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 06:05:41,578 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 06:05:43,206 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.ArrayIndexOutOfBoundsException: -94
    at org.broadinstitute.gatk.utils.BaseUtils.convertIUPACtoN(BaseUtils.java:216)
    at org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:288)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.initializeReferenceSequence(LocusReferenceView.java:150)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.(LocusReferenceView.java:126)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:90)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: -94

This is the command I used: java -Xmx32g -jar ~/workspace/GATK/tools/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R genome/hg19.fasta -I results/out_sorted_dup.bam -o results/out_target_intervals.interval_list -known vcf/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -known vcf/1000G_phase1.indels.hg19.sites.vcf

I tried to look for a similar problem on the forum, but could not find anything. Could you advise how to resolve this situation? The error message suggests this may be a bug.

Thanks in advance!
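
The trace fails inside BaseUtils.convertIUPACtoN with the index -94. A Java byte is signed, so -94 corresponds to the raw byte value 162 (0xA2), which is not an ASCII character. One plausible reading, stated here as an assumption and not confirmed in this post, is that the reference FASTA contains a non-ASCII byte somewhere in a sequence line. The Python sketch below scans for bytes outside the IUPAC nucleotide letters; the function names are hypothetical, and anything else (including gap characters) is reported.

```python
from typing import Iterable, List, Tuple

# IUPAC nucleotide letters in both cases; iterating bytes yields ints.
IUPAC_BYTES = set(b"ACGTUNRYSWKMBDHVacgtunryswkmbdhv")


def find_unexpected_bytes(fasta_lines: Iterable[bytes]) -> List[Tuple[int, int, int]]:
    """Return (line_number, column, byte_value) for each non-IUPAC byte.

    Header lines (starting with '>') are skipped; positions are 1-based.
    """
    hits = []
    for line_no, line in enumerate(fasta_lines, start=1):
        if line.startswith(b">"):
            continue
        for col, value in enumerate(line.rstrip(b"\r\n"), start=1):
            if value not in IUPAC_BYTES:
                hits.append((line_no, col, value))
    return hits


def as_signed_java_byte(value: int) -> int:
    """Show an unsigned byte value (0-255) as Java's signed byte would."""
    return value - 256 if value > 127 else value
```

The file has to be opened in binary mode, e.g. `with open("genome/hg19.fasta", "rb") as fh: print(find_unexpected_bytes(fh)[:10])`. A hit whose byte value maps to -94 through `as_signed_java_byte` would match the error message.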


Created 2014-12-19 01:00:34 | Updated | Tags: realignertargetcreator error
Comments (9)

I keep getting the same error message when running a command in GATK. I am using GATK 3.3. Eventually I would like to call SNPs, but I want to realign around indels first.

To make sure that my input files are correct, I used Picard:

java -jar /programs/ValidateSamFile.jar I=merged_sorted_fixed.bam

No errors were found.

I wrote:

java -jar /programs/GenomeAnalysisTK/GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R target_sequences.fasta \
  -o merged_output.intervals \
  -I merged_sorted_fixed.bam \
  --minReadsAtLocus 4 \
  --mismatchFraction 0

And this is the output:

"Adding rod class GFF
Adding rod class dbSNP
Adding rod class HapMapAlleleFrequencies
Adding rod class SAMPileup
Adding rod class GELI
Adding rod class RefSeq
Adding rod class Table
Adding rod class PooledEM
Adding rod class 1KGSNPs
Adding rod class SangerSNP
Adding rod class HapMapGenotype
Adding rod class Intervals
Adding rod class Variants
76 [main] INFO org.broadinstitute.sting.gatk.WalkerManager - plugin directory: /scratch/atk25/programs/GenomeAnalysisTK/walkers
100 [main] INFO org.broadinstitute.sting.gatk.WalkerManager - * Adding module CountLoci
101 [main] INFO org.broadinstitute.sting.gatk.WalkerManager - * Adding module PrintReads
101 [main] INFO org.broadinstitute.sting.gatk.WalkerManager - * Adding module Pileup
101 [main] INFO org.broadinstitute.sting.gatk.WalkerManager - * Adding module DepthOfCoverage
105 [main] INFO org.broadinstitute.sting.gatk.WalkerManager - * Adding module ValidatingPileup
106 [main] INFO org.broadinstitute.sting.gatk.WalkerManager - * Adding module CountReads
139 [main] FATAL root - Exception caught by base Command Line Program, with message: null
139 [main] FATAL root - with cause: null
java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.WalkerManager.createWalkerByName(WalkerManager.java:93)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.getWalkerByName(GenomeAnalysisEngine.java:157)
    at org.broadinstitute.sting.gatk.CommandLineGATK.getArgumentSources(CommandLineGATK.java:117)
    at org.broadinstitute.sting.utils.cmdLine.CommandLineProgram.start(CommandLineProgram.java:199)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:58)
java.lang.RuntimeException: java.lang.NullPointerException
    at org.broadinstitute.sting.utils.cmdLine.CommandLineProgram.start(CommandLineProgram.java:279)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:58)
Caused by: java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.WalkerManager.createWalkerByName(WalkerManager.java:93)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.getWalkerByName(GenomeAnalysisEngine.java:157)
    at org.broadinstitute.sting.gatk.CommandLineGATK.getArgumentSources(CommandLineGATK.java:117)
    at org.broadinstitute.sting.utils.cmdLine.CommandLineProgram.start(CommandLineProgram.java:199)
    ... 1 more

An error has occurred. Please check your command line arguments for any typos or inconsistencies."

I'd really appreciate any help!
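
A small observation that may help narrow this down, offered as an assumption rather than a diagnosis: in the traces elsewhere on this page, the v3.2.2 and v3.3 runs report classes under org.broadinstitute.gatk, while the v3.1 run reports org.broadinstitute.sting. The output above uses the sting namespace and lists only six modules, so the jar being launched may not be the 3.3 build. The Python sketch below, with a hypothetical function name, reports which of the two namespaces appear in a saved log.

```python
import re
from typing import Set

# Top-level package segment that follows 'org.broadinstitute.' in a trace.
NAMESPACE = re.compile(r"org\.broadinstitute\.(sting|gatk)\b")


def namespaces_in_log(text: str) -> Set[str]:
    """Return the set of GATK package namespaces mentioned in log text."""
    return set(NAMESPACE.findall(text))
```

For example, `namespaces_in_log(open("gatk.log").read())` (the file name is a placeholder) returning `{"sting"}` for a run believed to be 3.3 would be a reason to double-check which jar is on the command line.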


Created 2014-08-13 20:01:33 | Updated 2014-08-13 20:06:48 | Tags: realignertargetcreator bwa-and-gatk
Comments (2)

Hi,

I have used bwa mem to align with the below command:

bwa mem -R '@RG\tID:X\tLB:Y\tSM:Z\tPL:ILLUMINA' ref.fa seq1.fastq seq2.fastq | samtools view -bS - > alignment.bam

I then used the latest version of GATK to create intervals for realignment around indels with RealignerTargetCreator, which gives the error shown below:

Command:

/apps/technic/jdk1.7.0_45/bin/java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref.fa -I alignment.bam -o realign.intervals
Picked up _JAVA_OPTIONS: -Xmx10G
INFO 22:38:37,624 HelpFormatter - --------------------------------------------------------------------------------
INFO 22:38:37,627 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.2-2-gec30cee, Compiled 2014/07/17 15:22:03
INFO 22:38:37,627 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 22:38:37,628 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 22:38:37,634 HelpFormatter - Program Args: -T RealignerTargetCreator -R ref.fa -I alignment.bam -o realign.intervals
INFO 22:38:37,783 HelpFormatter - Executing as marumill@mars.genome.helsinki.fi on Linux 2.6.18-371.3.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18.
INFO 22:38:37,784 HelpFormatter - Date/Time: 2014/08/13 22:38:37
INFO 22:38:37,784 HelpFormatter - --------------------------------------------------------------------------------
INFO 22:38:37,784 HelpFormatter - --------------------------------------------------------------------------------
INFO 22:38:38,466 GenomeAnalysisEngine - Strictness is SILENT
INFO 22:38:38,604 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 22:38:38,617 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 22:38:40,292 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.2-2-gec30cee):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: SAM/BAM file alignment.bam is malformed: Error parsing SAM header. Problem parsing @PG key:value pair ID:X clashes with ID:bwa. Line:
ERROR @PG ID:bwa PN:bwa VN:0.7.10-r789 CL:bwa mem -R @RG ID:X LB:Y SM:Z PL:ILLUMINA ref.fa R1_001_filtered.fastq R2_001_filtered.fastq; File alignment.bam; Line number 43

I recently updated to the latest version and encountered this error, which did not occur with the previous version. Could someone suggest how to fix this?

Thanks in advance!!
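
The error message says that ID:X clashes with ID:bwa on the @PG line, and the quoted line shows the read-group string from -R embedded in the CL field. If those separators are real tab characters in the header (an assumption based on how the line is printed above), a parser would see ID:X, LB:Y and so on as extra tags on the @PG record. The Python sketch below, with a hypothetical function name, reports tags that occur more than once with different values on a single header line.

```python
from typing import List, Tuple


def duplicate_header_tags(header_line: str) -> List[Tuple[str, str, str]]:
    """Return (tag, first_value, clashing_value) for repeated tags on one line."""
    fields = header_line.rstrip("\n").split("\t")
    seen = {}
    clashes = []
    for field in fields[1:]:  # fields[0] is the record type, e.g. @PG
        key, sep, value = field.partition(":")
        if not sep:
            continue  # not a TAG:value field
        if key in seen and seen[key] != value:
            clashes.append((key, seen[key], value))
        else:
            seen.setdefault(key, value)
    return clashes
```

Running it over the output of `samtools view -H alignment.bam` would show whether the @PG line really carries two ID tags.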


Created 2014-07-10 00:27:44 | Updated | Tags: realignertargetcreator
Comments (7)

Hello, I am using your SNP calling pipeline to validate my indel realignment. I just used the default parameters. I hope you can advise me on how to optimize the other parameters in RealignerTargetCreator. Thank you.


Created 2014-06-20 07:52:12 | Updated 2014-06-20 08:29:34 | Tags: realignertargetcreator
Comments (2)

Dear GATK team,

I'm using GATK RealignerTargetCreator and IndelRealigner for a very small region (~100bp) trimmed from the original whole-exome BAM file. For example, the region I need is chr1:1-150 (it contains only one realignment target). I first used samtools to extract the BAM for this region. I then ran into a problem while running RealignerTargetCreator with only chr1 (with chr01.dict) as the reference file. Here is the command I used:

java -Xmx1g -jar ~/programs/GenomeAnalysisTK.jar -T RealignerTargetCreator -R chr01.fa -I Trim_test_sort.bam -o realigner.intervals

Here is the error message: ERROR MESSAGE: Badly formed genome loc: Contig chr2 given as location, but this contig isn't present in the Fasta sequence dictionary

I found that this problem could be solved by using the whole genome as the reference (i.e. hg19.fa). However, it takes a very long time to go through every chromosome (the ProgressMeter step), although the BAM file only contains reads located in chr1:1-150. I also tried deleting some @SQ lines from the trimmed BAM header, but it didn't work.

Just wondering if there is any way to make RealignerTargetCreator only go through chr1:1-150 (or just chr1) to save time?

Many thanks!!

Shan
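
The error names chr2 even though the reference holds only chr1. One possible source, stated as an assumption, is that the trimmed BAM still refers to other contigs, either in its @SQ header lines or in the mate field (RNEXT) of reads whose mates map elsewhere. The Python sketch below takes SAM text (for example from `samtools view -h`) and lists contig names that are absent from a given reference; the function name is hypothetical.

```python
from typing import Iterable, List


def contigs_missing_from_reference(sam_lines: Iterable[str],
                                   reference_contigs: Iterable[str]) -> List[str]:
    """Contigs named in @SQ, RNAME or RNEXT but not in the reference."""
    known = set(reference_contigs)
    seen = set()
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@SQ":
            seen.update(f[3:] for f in fields[1:] if f.startswith("SN:"))
        elif not line.startswith("@") and len(fields) >= 7:
            seen.add(fields[2])      # RNAME: contig the read maps to
            if fields[6] != "=":     # '=' means same contig as RNAME
                seen.add(fields[6])  # RNEXT: contig of the mate
    seen.discard("*")                # '*' marks an unavailable contig
    return sorted(seen - known)
```

An empty result for `reference_contigs=["chr1"]` would suggest the BAM is consistent with the single-chromosome reference; any name returned is a candidate for the contig in the error message.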


Created 2014-06-11 15:10:28 | Updated | Tags: indelrealigner realignertargetcreator indels resources
Comments (3)

Hello there,

Would you please let us know how "IndelRealigner" makes use of "known" resources? I assume it already has the intervals of interest for realignment from the "RealignerTargetCreator". So it's not clear why it needs the resources again. Would it use them for making any sort of decision to reject or accept realigned indels?

Also, please let us know what happens (algorithmically) if we don't provide the same resources used in "RealignerTargetCreator" to the program.

Thank you Amin Zia


Created 2014-04-30 08:27:24 | Updated | Tags: indelrealigner realignertargetcreator gatk-runtime-error
Comments (5)

Hi there,

I'm running GATK (version 3.1-1-g07a4bf8). I have already run RealignerTargetCreator and IndelRealigner without any problem on my computer. However, today RealignerTargetCreator no longer works and I got this error message:

ERROR stack trace

java.lang.ExceptionInInitializerError
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.(GenomeAnalysisEngine.java:167)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.(CommandLineExecutable.java:57)
        at org.broadinstitute.sting.gatk.CommandLineGATK.(CommandLineGATK.java:66)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:106)
Caused by: java.lang.NullPointerException
        at org.reflections.Reflections.scan(Reflections.java:220)
        at org.reflections.Reflections.scan(Reflections.java:166)
        at org.reflections.Reflections.(Reflections.java:94)
        at org.broadinstitute.sting.utils.classloader.PluginManager.(PluginManager.java:79)
        ... 4 more

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

Yesterday, I upgraded my system from Ubuntu 13.10 to 14.04 and Java from 1.7 to "1.8.0_05". Judging from the stack trace (notably "Caused by: java.lang.NullPointerException"), could this Java update be causing the error?

However, since upgrading Java I have successfully run two other Java programs: Picard (the MarkDuplicates.jar and BuildBamIndex.jar tools) and Qualimap_v0.8.
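
Since the post mentions moving from Java 1.7 to 1.8.0_05, a first check is which Java the GATK command actually picks up. The following is a minimal sketch; java_major and installed_java_version are hypothetical helper names, and whether GATK 3.1 runs on Java 8 at all is an assumption to verify against that release's requirements:

```shell
# Reduce a Java version string to its major version.
# Legacy strings start with "1." (1.7.0_55 -> 7, 1.8.0_05 -> 8).
java_major() {
  echo "$1" | awk -F. '{ if ($1 == "1") print $2; else print $1 }'
}

# Read the version string reported by the java on PATH. Not called here,
# since java may not be installed where this sketch is tried.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

java_major "1.8.0_05"
```

On a machine with Java installed, java_major "$(installed_java_version)" reports the major version of the runtime that the java command resolves to.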

Would you have any idea to help me solve this issue? Many thanks !

Fabrice


Created 2014-04-23 15:38:54 | Updated | Tags: indelrealigner realignertargetcreator mouse
Comments (2)

Gidday,

I've run the IndelRealigner on my mouse WGS *bam files with known site data from the Sanger MGP, and now I'm trying to figure out how "well" it worked.

The list created by RealignerTargetCreator contains 6547185 intervals

Parsing the output realigned.bam file for reads that had an "OC" tag added (as suggested in http://www.broadinstitute.org/gatk/events/3391/GATKw1310-BP-2-Realignment.pdf) shows that 1648299 reads were actually realigned.
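
The count of OC-tagged reads described above can be reproduced along these lines. This is a minimal sketch that reads SAM text on standard input; count_oc_reads is a hypothetical helper name, and the two sample records are made up for illustration:

```shell
# Count SAM records that carry an OC (original CIGAR) tag.
count_oc_reads() {
  awk -F'\t' '
    /^@/ { next }                       # skip header lines
    {
      for (i = 12; i <= NF; i++)        # optional tags start at column 12
        if ($i ~ /^OC:Z:/) { n++; break }
    }
    END { print n + 0 }
  '
}

# Two illustrative records: only the first has an OC tag.
printf 'r1\t0\tchr1\t100\t60\t48M2I50M\t*\t0\t0\t*\t*\tOC:Z:100M\nr2\t0\tchr1\t200\t60\t100M\t*\t0\t0\t*\t*\n' |
  count_oc_reads
```

With samtools installed, the same function could be fed from samtools view -h realigned.bam.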

I used the default settings, which means that

1) -model was USE_READS - and from what I've read, this is the correct option to use, given that Smith-Waterman modelling doesn't give greatly improved results;

2) -LOD was 5.0 - but for my data, which is mouse whole-genome sequence at average 10x coverage, this may be too high and I might be losing true positives.

I've tried randomly picking out candidate intervals from the intervals and OC-tagged reads from the realigned.bam file to check, but I was wondering if there's a more empirical way of checking how good the realignment was (I realise there's "no formal measure" as per the presentation but I'm finding it hard to make a judgement call!).

My feeling from looking at the intervals or realigned reads is that the low coverage is a major issue in terms of identifying "true" indels, so preferably we'd go for specificity over sensitivity.

Thanks for any advice/suggestions in advance!


Created 2014-04-23 07:14:49 | Updated | Tags: realignertargetcreator vcf 1000g genomeanalysistk idx
Comments (4)

Hi,

I'm trying to use RealignerTargetCreator as a test with one known-sites file, 1000G_phase1.indels.b37.vcf. At first the contig names didn't match those in my BAM file (chr1/chr2 vs 1/2), so I adjusted that. Now, when I run it for the first time with java -jar [path to GenomeAnalysisTK.jar] -T RealignerTargetCreator -R [.fasta] -I [.bam] -o [.intervals] -known [path to 1000g_phase1_adjusted.indels.b37.vcf], it gives the following error: "I/O error loading or writing tribble index file for [path to 1000g]". When I run it a second time, I get a different error, "Problem detecting index type", because the .idx file was not created correctly.
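
For reference, the contig-name adjustment mentioned above can be done with a small filter. This is a sketch under assumptions: it adds a chr prefix to both the ##contig header lines and the data lines of an uncompressed VCF, the helper name add_chr_prefix is hypothetical, and special cases such as MT versus chrM are not handled:

```shell
# Add a "chr" prefix to contig names in a VCF read from standard input.
add_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^##contig=<ID=/ { sub(/^##contig=<ID=/, "##contig=<ID=chr"); print; next }
    /^#/             { print; next }    # other header lines pass through
    { $1 = "chr" $1; print }            # data lines: rename column 1
  '
}

# Illustrative input: one contig header line, the column header, one record.
printf '##contig=<ID=1,length=249250621>\n#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tAT\n' |
  add_chr_prefix
```

Writing the result to a new file name, rather than editing the VCF in place, avoids reusing an index that was built for the old contents.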

What am I doing wrong?


Created 2014-04-10 16:24:48 | Updated | Tags: indelrealigner realignertargetcreator bqsr knownsites mouse indel-realignment
Comments (5)

Hello,

I was wondering about the format of the known site vcfs used by the RealignerTargetCreator and BaseRecalibrator walkers.

I'm working with mouse whole genome sequence data, so I've been using the Sanger Mouse Genome project known sites from the Keane et al. 2011 Nature paper. From the output, it seems that the RealignerTargetCreator walker is able to recognise and use the gzipped vcf fine:

INFO 15:12:09,747 HelpFormatter - --------------------------------------------------------------------------------
INFO 15:12:09,751 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02
INFO 15:12:09,751 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 15:12:09,752 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 15:12:09,758 HelpFormatter - Program Args: -T RealignerTargetCreator -R mm10.fa -I DUK01M.sorted.dedup.bam -known /tmp/mgp.v3.SNPs.indels/ftp-mouse.sanger.ac.uk/REL-1303-SNPs_Indels-GRCm38/mgp.v3.indels.rsIDdbSNPv137.vcf.gz -o DUK01M.indel.intervals.list
INFO 15:12:09,758 HelpFormatter - Date/Time: 2014/03/25 15:12:09
INFO 15:12:09,758 HelpFormatter - --------------------------------------------------------------------------------
INFO 15:12:09,759 HelpFormatter - --------------------------------------------------------------------------------
INFO 15:12:09,918 ArgumentTypeDescriptor - Dynamically determined type of /fml/chones/tmp/mgp.v3.SNPs.indels/ftp-mouse.sanger.ac.uk/REL-1303-SNPs_Indels-GRCm38/mgp.v3.indels.rsIDdbSNPv137.vcf.gz to be VCF
INFO 15:12:10,010 GenomeAnalysisEngine - Strictness is SILENT
INFO 15:12:10,367 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 15:12:10,377 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 15:12:10,439 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06
INFO 15:12:10,468 RMDTrackBuilder - Attempting to blindly load /fml/chones/tmp/mgp.v3.SNPs.indels/ftp-mouse.sanger.ac.uk/REL-1303-SNPs_Indels-GRCm38/mgp.v3.indels.rsIDdbSNPv137.vcf.gz as a tabix indexed file
INFO 15:12:11,066 IndexDictionaryUtils - Track known doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 15:12:11,201 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files
INFO 15:12:12,333 GenomeAnalysisEngine - Done creating shard strategy
INFO 15:12:12,334 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]

I've checked the indel interval lists for my samples and they do all appear to contain different intervals.

However, when I use the equivalent SNP vcf in the following BQSR step, GATK errors as follows:

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.5-2-gf57256b):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Invalid command line: This calculation is critically dependent on being able to skip over known variant sites. Please provide a VCF file containing known sites of genetic variation.
ERROR ------------------------------------------------------------------------------------------

Which means that the SNP vcf (which has the same format as the indel vcf) is not used by BQSR.

My question is: given that the BQSR step failed, should I be worried that there are no errors from the Indel Realignment step? As the known SNP/indel vcfs are in the same format, I don't know whether I can trust the realigned .bams.

Thanks very much!


Created 2014-04-06 19:36:24 | Updated 2014-04-06 19:39:05 | Tags: realignertargetcreator error incompatible-contigs
Comments (6)

Hello All,

I am running RealignerTargetCreator using GATK version GenomeAnalysisTK-1.2-4-gd9ea764 and I am getting the following error:

ERROR MESSAGE: Input files reads and reference have incompatible contigs: Found contigs with the same name but different lengths:
ERROR contig reads = scaffold69676_size1796 / 3149
ERROR contig reference = scaffold69676_size1796 / 1758.
ERROR reads contigs = [scaffold1_size320545, scaffold2_size291774, scaffold3_size284740..........

I have already checked that I am using the right reference FASTA file and the correct .bam file, which I used for alignment before. I am therefore at a loss as to why I am getting this error. I would appreciate your help with this problem; any suggestion is welcome.
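
A mismatch like the one reported above can be listed for all contigs at once. This is a minimal sketch, assuming the BAM header has been dumped to a text file (for example with samtools view -H) and that the reference has a .fai index; diff_contig_lengths and the file names in the usage comment are hypothetical:

```shell
# Print contigs whose length differs between a SAM header and a .fai index.
#   $1: file with @SQ header lines
#   $2: FASTA index (.fai): contig name in column 1, length in column 2
diff_contig_lengths() {
  awk -F'\t' '
    NR == FNR { ref[$1] = $2; next }            # first file read: the .fai
    $1 == "@SQ" {
      name = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len  = substr($i, 4)
      }
      if (name in ref && ref[name] != len)
        print name, "reads=" len, "reference=" ref[name]
    }
  ' "$2" "$1"
}

# Usage (hypothetical file names):
#   samtools view -H reads.bam > header.sam
#   diff_contig_lengths header.sam reference.fasta.fai
```

Any line of output names a contig for which the BAM was aligned against a sequence of a different length than the one in the reference being passed to GATK.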

Thanks, Namrata


Created 2014-03-17 11:56:12 | Updated | Tags: indelrealigner realignertargetcreator targetintervals
Comments (4)

Hello, I ran the GATK RealignerTargetCreator command and got an output file in which some lines (about 6%) are not in interval format. For example, compare the line chrM:125-346 with the line chrM:7684. When I ran the subsequent IndelRealigner command, it failed with the following error message, indicating the line without an interval:

ERROR MESSAGE: Invalid argument value 'targetIntervals' at position 10.
ERROR Invalid argument value 'GRC13283077_var_list' at position 11.

What is the reason for this output? What can I do about it? Please advise.
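
To put a number on how many lines use the single-position form, the intervals file can be tallied as follows. This is a sketch only; summarize_intervals is a hypothetical helper name, and the tally says nothing about whether the single-position lines are what triggers the error:

```shell
# Tally lines of the form contig:start-stop versus contig:position.
#   $1: intervals file written by RealignerTargetCreator
summarize_intervals() {
  awk '
    /^[^:]+:[0-9]+-[0-9]+$/ { range++;  next }
    /^[^:]+:[0-9]+$/        { single++; next }
    NF                      { other++ }          # any other non-empty line
    END { printf "range=%d single=%d other=%d\n", range, single, other }
  ' "$1"
}

# Usage (hypothetical file name):
#   summarize_intervals realigner.intervals
```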

Thanks, Lily


Created 2014-02-25 09:48:15 | Updated | Tags: realignertargetcreator exception
Comments (1)

Hi there,

I've encountered a NullPointerException when running RealignerTargetCreator. I wasn't able to find any hint that this is a known problem, so I'm posting it here (apologies if I have overlooked something).

INFO  21:41:18,635 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:41:18,638 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.8-1-g932cd3a, Compiled 2013/12/06 16:47:15 
INFO  21:41:18,638 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  21:41:18,638 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  21:41:18,645 HelpFormatter - Program Args: -l INFO -R hg18/hg18.fasta -I aln/hiseq.wholegenome.cov30.stampy.sorted.markdup.bam -T RealignerTargetCreator -nt 12 -o aln/hiseq.wholegenome.cov30.stampy.indels.intervals 
INFO  21:41:18,645 HelpFormatter - Date/Time: 2014/02/24 21:41:18 
INFO  21:41:18,645 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:41:18,646 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  21:41:18,733 GenomeAnalysisEngine - Strictness is SILENT 
INFO  21:41:18,842 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  21:41:18,853 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:18,923 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.07 
INFO  21:41:18,956 MicroScheduler - Running the GATK in parallel mode with 12 total threads, 1 CPU thread(s) for each of 12 data thread(s), of 24 processors available on this machine 
INFO  21:41:19,038 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
INFO  21:41:19,823 GenomeAnalysisEngine - Done preparing for traversal 
INFO  21:41:19,823 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  21:41:19,825 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
INFO  21:41:19,910 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:19,916 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  21:41:19,919 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:19,928 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  21:41:19,937 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:19,944 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  21:41:19,944 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:19,951 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  21:41:19,952 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:19,958 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  21:41:19,958 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:19,964 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  21:41:19,964 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:19,975 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  21:41:19,976 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:19,980 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.00 
INFO  21:41:19,980 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:19,984 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.00 
INFO  21:41:19,985 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:19,989 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.00 
INFO  21:41:19,990 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  21:41:20,010 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02 
INFO  21:41:49,933 ProgressMeter -   chr1:26116497        2.60e+07   30.0 s        1.0 s      0.8%        59.0 m    58.5 m 
INFO  21:42:19,937 ProgressMeter -   chr1:57885373        5.77e+07   60.0 s        1.0 s      1.9%        53.2 m    52.2 m 
INFO  21:42:49,941 ProgressMeter -   chr1:88146121        8.80e+07   90.0 s        1.0 s      2.9%        52.4 m    50.9 m 
INFO  21:43:03,098 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.NullPointerException
        at org.broadinstitute.sting.utils.locusiterator.LocusIteratorByState.getLocation(LocusIteratorByState.java:218)
        at org.broadinstitute.sting.utils.locusiterator.LocusIteratorByState.lazyLoadNextAlignmentContext(LocusIteratorByState.java:294)
        at org.broadinstitute.sting.utils.locusiterator.LocusIteratorByState.hasNext(LocusIteratorByState.java:233)
        at net.sf.picard.util.PeekableIterator.advance(PeekableIterator.java:70)
        at net.sf.picard.util.PeekableIterator.next(PeekableIterator.java:57)
        at org.broadinstitute.sting.gatk.executive.WindowMaker$WindowMakerIterator.advance(WindowMaker.java:200)
        at org.broadinstitute.sting.gatk.executive.WindowMaker$WindowMakerIterator.hasNext(WindowMaker.java:169)
        at org.broadinstitute.sting.gatk.datasources.providers.LocusView.advance(LocusView.java:180)
        at org.broadinstitute.sting.gatk.datasources.providers.LocusView.hasNextLocus(LocusView.java:148)
        at org.broadinstitute.sting.gatk.datasources.providers.AllLocusView.advance(AllLocusView.java:127)
        at org.broadinstitute.sting.gatk.datasources.providers.AllLocusView.hasNext(AllLocusView.java:85)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$MapDataIterator.hasNext(TraverseLociNano.java:167)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:268)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.sting.gatk.executive.ShardTraverser.call(ShardTraverser.java:98)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.8-1-g932cd3a):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

Please let me know if there is more information I can provide.

Best, Tobi


Created 2013-12-17 19:29:51 | Updated | Tags: indelrealigner realignertargetcreator realignment
Comments (4)

Hi

I'm indel realigning with version 2.4-9 using generic commands such as:

java -Xmx4g -jar /path/to/GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R /path/to/reference.fasta \
  -I /path/to/input.bam \
  -o /path/to/realigner.intervals

java -Xmx4g -jar /path/to/GenomeAnalysisTK.jar \
  -T IndelRealigner \
  -R /path/to/reference.fasta \
  -I /path/to/sample-level.bam \
  -targetIntervals /path/to/realigner.intervals.from.rtc \
  -o /path/to/realigned.bam \
  -model USE_SW \
  -LOD 0.4

In most cases this is working fine, but in a few cases it is introducing artefacts that subsequently cause the bam file to fail Picard's ValidateSamFile, and in a couple of cases it introduces errors that can't be fixed by CleanSam and/or FixMateInformation.

Here's an example of an error that can be fixed by CleanSam:

Before indel realignment:

HS24_08564:7:2311:4630:19372#87 69 AAKM01002546 471 0 * = 471 0
HS24_08564:7:2311:4630:19372#87 137 AAKM01002546 471 25 100M = 471 0

After indel realignment:

HS24_08564:7:2311:4630:19372#87 69 AAKM01002546 471 0 * = 471 0
HS24_08564:7:2311:4630:19372#87 137 AAKM01002546 471 35 91M1D9M = 471 0

Here's an example of an error that can't be fixed:

Before indel realignment:

HS24_10061:6:1312:10172:98346#54 69 AAKM01002280 649 0 * = 649 0
HS24_10061:6:1312:10172:98346#54 137 AAKM01002280 649 37 100M = 649 0

After indel realignment:

HS24_10061:6:1312:10172:98346#54 69 AAKM01002280 649 0 * = 649 0
HS24_10061:6:1312:10172:98346#54 137 AAKM01002280 649 47 91M2D9M0S = 649 0
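
The CIGAR in the record above, 91M2D9M0S, ends in a zero-length soft clip. Records of that shape can be listed with a filter like the one below. This is a sketch under the assumption that the zero-length operation is what the validator objects to, which the post does not confirm; find_zero_length_cigar_ops is a hypothetical helper name, and the input is expected to be tab-separated SAM text on standard input:

```shell
# Print read name and CIGAR for records whose CIGAR has a zero-length operation.
find_zero_length_cigar_ops() {
  awk -F'\t' '
    /^@/ { next }                                  # skip header lines
    # A count of exactly 0: "0" directly after the start or after an
    # operation letter, followed by an operation letter.
    $6 ~ /(^|[MIDNSHP=X])0[MIDNSHP=X]/ { print $1, $6 }
  '
}

# Usage (hypothetical file name):
#   samtools view realigned.bam | find_zero_length_cigar_ops
```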

Is this a known bug? Any chance of a fix?

Thanks!

Richard


Created 2013-12-03 17:47:15 | Updated 2013-12-03 18:03:12 | Tags: realignertargetcreator filters
Comments (9)

Dear all, can anybody help me with this problem while running RealignerTargetCreator? In this run, many reads failed to pass the filters. Any suggestions as to why?

Run summary

INFO 12:07:39,628 HelpFormatter - -------------------------------------------------------------------------------- 
INFO 12:07:39,631 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-2-g6bda569, Compiled 2013/08/28 16:30:29 
INFO 12:07:39,631 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO 12:07:39,631 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO 12:07:39,635 HelpFormatter - Program Args: -T RealignerTargetCreator -R /home/sab/ref/human_hg19.fa -I /home/sab/pipeline/A_sorted.bam -o /home/sab/pipeline/A_sorted.IndelRealigner.intervals 
INFO 12:07:39,636 HelpFormatter - Date/Time: 2013/12/03 12:07:39 
INFO 12:07:39,636 HelpFormatter - -------------------------------------------------------------------------------- 
INFO 12:07:39,636 HelpFormatter - -------------------------------------------------------------------------------- 
INFO 12:07:39,697 GenomeAnalysisEngine - Strictness is SILENT 
INFO 12:07:39,789 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO 12:07:39,798 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO 12:07:39,820 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02 
INFO 12:07:39,906 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
INFO 12:07:40,400 GenomeAnalysisEngine - Done preparing for traversal 
INFO 12:07:40,400 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO 12:07:40,401 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining 
INFO 12:08:10,406 ProgressMeter - chr1:85262337 8.53e+07 30.0 s 0.0 s 2.8% 18.2 m 17.7 m 
INFO 12:08:40,407 ProgressMeter - chr1:180404225 1.80e+08 60.0 s 0.0 s 5.8% 17.2 m 16.2 m 
INFO 12:09:10,409 ProgressMeter - chr2:26489345 2.76e+08 90.0 s 0.0 s 8.9% 16.8 m 15.3 m 
INFO 12:09:40,413 ProgressMeter - chr2:126159101 3.75e+08 120.0 s 0.0 s 12.1% 16.5 m 14.5 m 
INFO 12:10:10,415 ProgressMeter - chr2:226503317 4.76e+08 2.5 m 0.0 s 15.4% 16.3 m 13.8 m 
INFO 12:10:40,417 ProgressMeter - chr3:79985305 5.72e+08 3.0 m 0.0 s 18.5% 16.2 m 13.2 m 
INFO 12:11:10,418 ProgressMeter - chr3:178616501 6.71e+08 3.5 m 0.0 s 21.7% 16.1 m 12.6 m 
INFO 12:11:40,424 ProgressMeter - chr4:82438805 7.73e+08 4.0 m 0.0 s 25.0% 16.0 m 12.0 m 
INFO 12:12:10,426 ProgressMeter - chr4:190799281 8.81e+08 4.5 m 0.0 s 28.5% 15.8 m 11.3 m 
INFO 12:12:40,427 ProgressMeter - chr5:95948005 9.78e+08 5.0 m 0.0 s 31.6% 15.8 m 10.8 m 
INFO 12:13:10,429 ProgressMeter - chr6:6479181 1.07e+09 5.5 m 0.0 s 34.5% 15.9 m 10.4 m 
INFO 12:13:40,430 ProgressMeter - chr6:92259205 1.15e+09 6.0 m 0.0 s 37.3% 16.1 m 10.1 m 
INFO 12:14:10,432 ProgressMeter - chr7:12978229 1.25e+09 6.5 m 0.0 s 40.3% 16.1 m 9.6 m 
INFO 12:14:40,433 ProgressMeter - chr7:102060237 1.34e+09 7.0 m 0.0 s 43.1% 16.2 m 9.2 m 
INFO 12:15:10,435 ProgressMeter - chr8:37320269 1.43e+09 7.5 m 0.0 s 46.2% 16.2 m 8.7 m 
INFO 12:15:40,436 ProgressMeter - chr8:134665297 1.53e+09 8.0 m 0.0 s 49.3% 16.2 m 8.2 m 
INFO 12:16:10,437 ProgressMeter - chr9:94340989 1.63e+09 8.5 m 0.0 s 52.8% 16.1 m 7.6 m 
INFO 12:16:40,439 ProgressMeter - chr10:51925797 1.73e+09 9.0 m 0.0 s 56.0% 16.1 m 7.1 m 
INFO 12:17:10,440 ProgressMeter - chr11:10504845 1.83e+09 9.5 m 0.0 s 59.0% 16.1 m 6.6 m 
INFO 12:17:40,442 ProgressMeter - chr11:102575141 1.92e+09 10.0 m 0.0 s 62.0% 16.1 m 6.1 m 
INFO 12:18:10,443 ProgressMeter - chr12:60381241 2.01e+09 10.5 m 0.0 s 65.0% 16.2 m 5.7 m 
INFO 12:18:40,445 ProgressMeter - chr13:27611641 2.11e+09 11.0 m 0.0 s 68.2% 16.1 m 5.1 m 
INFO 12:19:10,446 ProgressMeter - chr14:15346101 2.20e+09 11.5 m 0.0 s 71.6% 16.1 m 4.6 m 
INFO 12:19:40,456 ProgressMeter - chr15:9565601 2.31e+09 12.0 m 0.0 s 74.8% 16.0 m 4.0 m 
INFO 12:20:10,457 ProgressMeter - chr16:5380553 2.42e+09 12.5 m 0.0 s 78.0% 16.0 m 3.5 m 
INFO 12:20:40,459 ProgressMeter - chr17:7484705 2.51e+09 13.0 m 0.0 s 81.0% 16.0 m 3.0 m 
INFO 12:21:10,460 ProgressMeter - chr18:12533677 2.59e+09 13.5 m 0.0 s 83.8% 16.1 m 2.6 m 
INFO 12:21:40,462 ProgressMeter - chr19:34306113 2.69e+09 14.0 m 0.0 s 87.0% 16.1 m 2.1 m 
INFO 12:22:10,463 ProgressMeter - chr20:55319685 2.77e+09 14.5 m 0.0 s 89.6% 16.2 m 100.0 s 
INFO 12:22:40,465 ProgressMeter - chr22:41605677 2.87e+09 15.0 m 0.0 s 92.8% 16.2 m 70.0 s 
INFO 12:23:10,466 ProgressMeter - chrX:84912589 2.97e+09 15.5 m 0.0 s 95.8% 16.2 m 40.0 s 
INFO 12:23:40,468 ProgressMeter - chrY:30850957 3.07e+09 16.0 m 0.0 s 99.1% 16.1 m 8.0 s 
INFO 12:23:47,504 ProgressMeter - done 3.10e+09 16.1 m 0.0 s 100.0% 16.1 m 0.0 s 
INFO 12:23:47,505 ProgressMeter - Total runtime 967.10 secs, 16.12 min, 0.27 hours 
INFO 12:23:47,591 MicroScheduler - 1390162 reads were filtered out during the traversal out of approximately 5682566 total reads (24.46%) 
INFO 12:23:47,591 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter 
INFO 12:23:47,592 MicroScheduler - -> 51548 reads (0.91% of total) failing BadMateFilter 
INFO 12:23:47,592 MicroScheduler - -> 352814 reads (6.21% of total) failing DuplicateReadFilter 
INFO 12:23:47,592 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
INFO 12:23:47,592 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter 
INFO 12:23:47,592 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter 
INFO 12:23:47,592 MicroScheduler - -> 985800 reads (17.35% of total) failing MappingQualityZeroFilter 
INFO 12:23:47,593 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter 
INFO 12:23:47,593 MicroScheduler - -> 0 reads (0.00% of total) failing Platform454Filter 
INFO 12:23:47,593 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter 
INFO 12:23:49,381 GATKRunReport - Uploaded run statistics report to AWS S3
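
The filter summary at the end of the log can be condensed mechanically. The following is a minimal sketch that reads the log on standard input and prints only the filters that removed at least one read, plus their sum; summarize_filters is a hypothetical helper name. For the log above, the three non-zero filters (51548 + 352814 + 985800) add up to the reported 1390162 filtered reads:

```shell
# List non-zero read filters from a GATK log and total them.
summarize_filters() {
  awk '
    # Line shape: ... -> <count> reads (<pct>% of total) failing <FilterName>
    /reads \(.*\) failing / {
      count = 0
      for (i = 1; i < NF; i++)
        if ($i == "->") { count = $(i + 1) + 0; break }
      total += count
      if (count > 0) print $NF, count
    }
    END { print "TOTAL", total + 0 }
  '
}

# Usage (hypothetical file name):
#   summarize_filters < rtc.log
```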

Regards


Created 2013-10-05 15:40:12 | Updated | Tags: realignertargetcreator unifiedgenotyper
Comments (11)

Dear GATK team,

I'm trying to get genotype calls for whole-genome data sequenced on Illumina HiSeq. I have ~1.2B reads from one individual and would like to test the effect of varying the number of input reads on GATK calls. I first divide the unsorted reads into parcels of ~200M. Then I use 1, 2, 3 or more parcels as input to the GATK pipeline to simulate having different numbers of reads. When I ran this process on hg18-aligned data, the VCF files increased in size as the number of input parcels increased, as expected. However, when I ran the same process on hg19-aligned data, the VCF file sizes were all similar, and the loci and read numbers in all the files were similar. The input files to UnifiedGenotyper varied in size as expected, so the problem is likely at the last step.

The only difference between the hg18 and hg19 runs was that for hg18, I used the -L target.intervals option at the RealignerTargetCreator step (and later on whenever required) which solved the filter N cigar problem. Somehow that didn't work for the hg19 run and so I had to use the filter N cigar option. Could this affect the genotyping step?

A diagram of the workflow is attached.

Thank you! Stephanie


Created 2013-08-28 02:13:19 | Updated 2013-08-28 02:16:17 | Tags: realignertargetcreator gatk -l
Comments (3)

root@GR0001:~# java -Xmx2g -Djava.io.tmpdir=/nG/Data/1265/vcf1/node1/9/AnalysisTemp/ -jar /nG/Process/Tools/GATK/GenomeAnalysisTK-2.7-1-g42d771f/GenomeAnalysisTK.jar -T RealignerTargetCreator -nt 3 -L 9 -R '/nG/Reference/CommonName/dog/FASTA/chrAll.fa' -I '/nG/Data/1265/vcf1/node1/9/Databind/9.bam' -o '/nG/Data/1265/vcf1/node1/9/Databind/9.bam.intervals'
INFO 11:10:17,730 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:10:17,732 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-1-g42d771f, Compiled 2013/08/21 23:02:55
INFO 11:10:17,733 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:10:17,733 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:10:17,737 HelpFormatter - Program Args: -T RealignerTargetCreator -nt 3 -L 9 -R /nG/Reference/CommonName/dog/FASTA/chrAll.fa -I /nG/Data/1265/vcf1/node1/9/Databind/9.bam -o /nG/Data/1265/vcf1/node1/9/Databind/9.bam.intervals
INFO 11:10:17,738 HelpFormatter - Date/Time: 2013/08/28 11:10:17
INFO 11:10:17,738 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:10:17,738 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:10:17,879 GenomeAnalysisEngine - Strictness is SILENT
INFO 11:10:18,189 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 11:10:18,198 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:10:18,325 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.13
INFO 11:10:24,257 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.7-1-g42d771f):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Couldn't read file /root/9 because The interval file 9 does not have one of the supported extensions (.bed, .list, .picard, .interval_list, or .intervals). Please rename your file with the appropriate extension. If 9 is NOT supposed to be a file, please move or rename the file at location /root/9
ERROR ------------------------------------------------------------------------------------------

I can't understand what causes this error. Please check it. Is it a bug?

Update: I only have this problem when I run chromosome 9 (numerical name). There is no problem with the other chromosome numbers (1, 2, 3, 4, 5, 6, 7, 8, 10, 11, ...).
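
The error message above says that a file exists at /root/9, which would explain why only -L 9 is treated as an interval file. This is a minimal sketch of a pre-flight check, to be run from the directory where GATK is launched; check_interval_arg is a hypothetical helper name:

```shell
# Report whether an -L value also names something in the working directory.
check_interval_arg() {
  if [ -e "$1" ]; then
    echo "'$1' exists as a file or directory in $(pwd)"
    return 1
  fi
  echo "'$1' does not collide with a local file"
}

# Usage, from the directory where the GATK command is launched:
#   check_interval_arg 9
```

A non-zero return status means the value would be ambiguous between a contig name and a path.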


Created 2013-08-23 14:16:01 | Updated | Tags: realignertargetcreator realignment
Comments (1)

Hello

I am currently trying to run RealignerTargetCreator on some BAM files that were aligned to hg19; however, I am getting this error:

ERROR MESSAGE: Input files known and reference have incompatible contigs: Found contigs with the same name but different lengths:
ERROR contig known = chrM / 16571
ERROR contig reference = chrM / 16569.

After some initial investigation I found that the supplied hg19 reference genome used for mapping contains the rCRS mtDNA. Other than realigning to a different build of hg19, is there any way to easily fix this problem through GATK?


Created 2013-08-07 15:52:41 | Updated 2013-08-07 16:09:01 | Tags: realignertargetcreator
Comments (1)

Hi, I am working on exome capture data for barley (1.3 Gbp). I am interested in variant calling to find SNPs in my sample. I have used SAMtools SNP calling, which finishes in ~1 hr, whereas GATK (in spite of its several steps to prepare the BAM for the variant caller) takes forever. I understand that my reference is large and that, since this is an exome capture, the targeted region is only 60 Mbp of the 1.3 Gbp. RealignerTargetCreator is the step that takes forever, as it looks for sites where indel realignment is required. Does anyone have any suggestions for speeding it up, or for another variant caller to try? I have tried downsampling my BAM with samtools view -s, and even then the log says it needs 8 more days to finish :(

Here is my sample command:

java -Xmx20g -XX:MaxPermSize=40G -jar /software/production/gatk/2.3.9/x86_64/GenomeAnalysisTK.jar -T RealignerTargetCreator -I in.bam -R ref.fa -o out.bam
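
Since the capture targets cover only about 60 Mbp of the 1.3 Gbp reference, restricting the traversal to those targets with -L may reduce the runtime. The following is a sketch under assumptions: the capture design is available as a BED file (hypothetical targets.bed) that can be passed to -L, and bed_total_bases is a hypothetical helper that sums the target sizes as a sanity check first:

```shell
# Sum the sizes of all intervals in a BED file (0-based, half-open coordinates).
# Overlapping intervals are counted more than once.
#   $1: BED file with contig, start, end in the first three columns
bed_total_bases() {
  awk -F'\t' '
    /^(track|browser|#)/ { next }        # skip optional header lines
    NF >= 3 { total += $3 - $2 }
    END { printf "%d\n", total }
  ' "$1"
}

# Usage (hypothetical file name):
#   bed_total_bases targets.bed
```

If the total is close to the expected 60 Mbp, the same file is a reasonable candidate for -L targets.bed in the RealignerTargetCreator command.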

Thanks, D


Created 2013-07-15 06:34:15 | Updated 2013-07-15 06:49:32 | Tags: realignertargetcreator gatk
Comments (1)

Hi, I ran the exact same command line twice with the exact same files and parameters, and the output files differ:

gatk -T RealignerTargetCreator -nt 8 \
  -R /home/ngs/data/tools/gatk/hg/broad_bundle_hg19_v2.2/ucsc.hg19.fasta \
  -I $HOME/jobout/nogroupid_$JOB_ID//JFP0435_02_R2.JFP.lane5.120817FCA_sorted_remdup.bam \
  --known /home/ngs/data/tools/gatk/hg/broad_bundle_hg19_v2.2//1000G_phase1.indels.hg19.vcf \
  --known /home/ngs/data/tools/gatk/hg/broad_bundle_hg19_v2.2//Mills_and_1000G_gold_standard.indels.hg19.vcf \
  --filter_mismatching_base_and_quals \
  -o $HOME/jobout/nogroupid_$JOB_ID//JFP0435_02_R2.JFP.lane5.120817FCA_sorted_remdup.intervals

Note: GATK version 2.3.

The sizes of the intervals files are different, and running 'diff' shows some differences. They are not huge, but I wonder whether they are due to the algorithm.

Here is the diff result:

    158138c158138
    < chr2:905714-905956
    ---
    > chr2:905685-905956
    452144c452144
    < chr3:195511953-195512064
    ---
    > chr3:195511916-195512064
    461418c461418
    < chr4:9241955-9242059
    ---
    > chr4:9241966-9242059
    605566,605567c605566,605569
    < chr5:21481723-21482150
    < chr5:21482294-21482726
    ---
    > chr5:21481723-21481740
    > chr5:21481909-21482150
    > chr5:21482294-21482563
    > chr5:21482690-21482726
    605569c605571,605573
    < chr5:21484233-21484258
    ---
    > chr5:21483481-21483649
    > chr5:21483821-21484114
    > chr5:21484233-21484265
    615246c615250
    < chr5:34189680-34190048
    ---
    > chr5:34189714-34190048
    615248a615253
    > chr5:34191846-34192088
    909440,909441c909445
    < chr7:100643452-100643460
    < chr7:100643595-100643794
    ---
    > chr7:100643452-100643794
    1008760c1008764
    < chr8:86572070-86572117
    ---
    > chr8:86572070-86572085
    1008763c1008767
    < chr8:86573453-86573764
    ---
    > chr8:86573453-86573707
    1478683c1478687
    < chr14:19553519-19553562
    ---
    > chr14:19553519-19553559
    1478828c1478832
    < chr14:20019712-20019990
    ---
    > chr14:20019712-20019951
    1994633c1994637
    < chrUn_gl000212:6736-6842
    ---
    > chrUn_gl000212:6736-6843

Kind regards

Didier


Created 2013-07-10 22:31:37 | Updated 2013-07-10 22:34:34 | Tags: realignertargetcreator commandlinegatk
Comments (5)

I started with BWA-MEM for alignment and used Picard to process the SAM files (conversion to BAM, reordering, AddOrReplaceReadGroups, etc.). The GATK version I'm using is 2.5-2-gf57256b; I cannot run 2.6 because the server only has Java 6 and I cannot upgrade it to Java 7.

I get a huge stack of error messages when I run this command line (RealignerTargetCreator):

java -Xmx2g -jar $CLASSPATH/GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R /Volumes/files/Users/user1/GATK_ref/hg19.fasta \
  -I sorted_Deduped_reorder_grp.bam \
  -o ./GATK/forIndelRealigner.intervals

The error messages are below (sorry, there are a lot). I don't understand why GATK needs to connect to the window server, or which permission is the problem. I am using a remote server running Mac OS X. Thank you.

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.InternalError: Can't connect to window server - not enough permissions.
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1827)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1724)
    at java.lang.Runtime.loadLibrary0(Runtime.java:823)
    at java.lang.System.loadLibrary(System.java:1045)
    at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:50)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.awt.Toolkit.loadLibraries(Toolkit.java:1605)
    at java.awt.Toolkit.(Toolkit.java:1627)
    at sun.awt.AppContext$2.run(AppContext.java:240)
    at sun.awt.AppContext$2.run(AppContext.java:226)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.awt.AppContext.initMainAppContext(AppContext.java:226)
    at sun.awt.AppContext.access$200(AppContext.java:112)
    at sun.awt.AppContext$3.run(AppContext.java:306)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.awt.AppContext.getAppContext(AppContext.java:287)
    at com.sun.jmx.trace.Trace.out(Trace.java:180)
    at com.sun.jmx.trace.Trace.isSelected(Trace.java:88)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.isTraceOn(DefaultMBeanServerInterceptor.java:1830)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:929)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:916)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312)
    at com.sun.jmx.mbeanserver.JmxMBeanServer$2.run(JmxMBeanServer.java:1195)
    at java.security.AccessController.doPrivileged(Native Method)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.initialize(JmxMBeanServer.java:1193)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.(JmxMBeanServer.java:225)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.(JmxMBeanServer.java:170)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.newMBeanServer(JmxMBeanServer.java:1401)
    at javax.management.MBeanServerBuilder.newMBeanServer(MBeanServerBuilder.java:93)
    at javax.management.MBeanServerFactory.newMBeanServer(MBeanServerFactory.java:311)
    at javax.management.MBeanServerFactory.createMBeanServer(MBeanServerFactory.java:214)
    at javax.management.MBeanServerFactory.createMBeanServer(MBeanServerFactory.java:175)
    at sun.management.ManagementFactory.createPlatformMBeanServer(ManagementFactory.java:302)
    at java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:504)
    at org.broadinstitute.sting.gatk.executive.MicroScheduler.(MicroScheduler.java:222)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.(LinearMicroScheduler.java:70)
    at org.broadinstitute.sting.gatk.executive.MicroScheduler.create(MicroScheduler.java:169)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.createMicroscheduler(GenomeAnalysisEngine.java:443)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:272)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-gf57256b):

Created 2013-07-09 14:31:07 | Updated | Tags: realignertargetcreator
Comments (1)

Hi,

I am currently running GATK-2.6.4 with Java-1.7. I am trying to run RTC. It seems to get almost to the end and then I get the following error which I believe is referring to the -o argument:

    INFO 06:12:01,578 ProgressMeter - done 3.10e+09 92.2 m 1.0 s 100.0% 92.2 m 0.0 s
    INFO 06:12:01,578 ProgressMeter - Total runtime 5530.95 secs, 92.18 min, 1.54 hours
    INFO 06:12:01,580 MicroScheduler - 40 reads were filtered out during the traversal out of approximately 112022776 total reads (0.00%)
    INFO 06:12:01,581 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
    INFO 06:12:01,582 MicroScheduler - -> 0 reads (0.00% of total) failing BadMateFilter
    INFO 06:12:01,583 MicroScheduler - -> 8 reads (0.00% of total) failing DuplicateReadFilter
    INFO 06:12:01,583 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
    INFO 06:12:01,584 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO 06:12:01,584 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
    INFO 06:12:01,585 MicroScheduler - -> 32 reads (0.00% of total) failing MappingQualityZeroFilter
    INFO 06:12:01,585 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
    INFO 06:12:01,586 MicroScheduler - -> 0 reads (0.00% of total) failing Platform454Filter
    INFO 06:12:01,586 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
    INFO 06:12:02,411 GATKRunReport - Uploaded run statistics report to AWS S3
    /local/scratch/1373342599.1240274.shell: line 7: -o: command not found

My RTC command is as follows:

java -Xmx2g -jar GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R human_g1k_v37.fasta \
  -I aln.sorted.rg.markdup.bam -o forIndelRealigner.intervals \
  -known 1000G_phase1.indels.b37.vcf \
  -known Mills_and_1000G_gold_standard.indels.b37.sites.vcf \

Is there a new way to specify an outfile in this version of GATK?

Thanks very much!
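
The message `-o: command not found` comes from the shell rather than from GATK: the shell read `-o` as the start of a new command, which typically happens when one line of a multi-line command is missing its trailing backslash. A minimal sketch, in Python, of one way to avoid continuation characters altogether by building the invocation as an argument list. The file names are copied from the command above; the helper name `build_rtc_command` is hypothetical and only the flags already shown in that command are used.

```python
import shlex

def build_rtc_command(jar, reference, bam, out_intervals, known=()):
    """Assemble a RealignerTargetCreator invocation as a list of arguments.

    Hypothetical helper for illustration; it only uses flags that appear
    in the command quoted above (-T, -R, -I, -o, -known).
    """
    cmd = ["java", "-Xmx2g", "-jar", jar,
           "-T", "RealignerTargetCreator",
           "-R", reference,
           "-I", bam,
           "-o", out_intervals]
    for vcf in known:
        cmd += ["-known", vcf]
    return cmd

cmd = build_rtc_command(
    "GenomeAnalysisTK.jar",
    "human_g1k_v37.fasta",
    "aln.sorted.rg.markdup.bam",
    "forIndelRealigner.intervals",
    known=["1000G_phase1.indels.b37.vcf",
           "Mills_and_1000G_gold_standard.indels.b37.sites.vcf"],
)

# Print a single-line, shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

Because the arguments live in a list, no line break can split the command; the list could be handed to `subprocess.run(cmd, check=True)` to execute it.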


Created 2013-07-09 02:16:17 | Updated | Tags: indelrealigner realignertargetcreator realignment
Comments (5)

I am doing exome sequencing in 700 individuals from a species with a large genome and I would like to use GATK to realign around indels. I am using a reduced reference, which is still about 3Gb. I tested out the target creator, but it is taking 5 days for 12 individuals when each is done individually and this time frame is not feasible for all 700 individuals. I tried to run more in parallel (~30 individuals), but there are RAM limitations on our 250G server. I am currently testing out the program by running all 12 test samples as input for the same run and the time estimate is very long (on the order of several hundred weeks). Based on a preliminary run I have also included a vcf file with likely indels to try and speed the process. Can you suggest another way in which I can make the time frame for all 700 individuals more reasonable? Otherwise we will not be able to use this tool.


Created 2013-06-27 04:13:31 | Updated | Tags: indelrealigner realignertargetcreator unifiedgenotyper
Comments (7)

Dear Developers,

I ran UnifiedGenotyper to identify indels, in particular a deletion. I used RealignerTargetCreator to produce forIndelRealigner.intervals, passed that file to IndelRealigner, and gave the resulting BAM file to UnifiedGenotyper as input. I got this result:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BRCA1

gi|262359905|ref|NG_005905.2| 48155 . G A 387.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.466;DP=47;Dels=0.00;FS=2.721;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.500;MQ=29.31;MQ0=0;MQRankSum=-4.812;QD=8.25;ReadPosRankSum=0.535 GT:AD:DP:GQ:PL 0/1:25,22:45:99:416,0,676

But if I run UnifiedGenotyper without these steps (RealignerTargetCreator and IndelRealigner), I get the same output. I also ran the same dataset through samtools and got more SNPs.

I am looking for a deletion in the dataset and would like to get the result from GATK. Could you please advise?

Thanks Sridhar


Created 2013-05-22 15:47:54 | Updated | Tags: indelrealigner realignertargetcreator
Comments (4)

Hi

I've followed the suggested protocol for local realignment - first using RealignerTargetCreator and then IndelRealigner, but have unexpected results.

Let's call the two BAMs I'm realigning "normal" and "tumour" or N and T for short. Once realigned, I've split the resulting NT BAM file (using readgroup tags, although I see from the docs that it can create separate files natively) back into the original N and T BAM files and discovered something odd. I was expecting the pre-realignment N and T files to contain the same number of reads as the post-realignment files, only the coordinates that reads are mapped to would be different.

However, I notice that post-realignment files contain significantly fewer reads because unaligned reads and reads not aligned to the autosomes or sex chromosomes have been removed. However, these reads alone do not account for the difference; large numbers of reads aligned to the 24 chromosomes are now missing.

Can you tell me more about the reads that are removed? I suspect it to be an alignment quality issue, but cannot find direct reference to this behaviour in the documentation. I'm currently keeping both my pre and post-realignment bam files, but ultimately there will be space constraints and I'll have to choose and would like to make the most informed decision possible.

Regards Chris


Created 2013-04-25 12:06:23 | Updated | Tags: realignertargetcreator
Comments (13)

Hi,

I want to run RealignerTargetCreator with this command line :

    qsub -b Y -N RTC -q bigmem.q "/usr/local/java/latest/bin/java -Xmx36g -jar /home/sabotf/sources/GenomeAnalysisTK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /data/projects/assembling-glab/PacBio_test/XL/filtered_subreads_XL.fasta -o /data/projects/assembling-glab/mappingResults/Tog5681Clean_vs_CG14_XL/output.intervals -I /data/projects/assembling-glab/mappingResults/Tog5681Clean_vs_CG14_XL/rmdup.bam"

but it returns this error:

`##### ERROR ------------------------------------------------------------------------------------------

ERROR A USER ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java
ERROR ------------------------------------------------------------------------------------------`

I tried -Xmx4g, then -Xmx12g, then -Xmx48g, and always got the same error. I don't know what to do. Any ideas? Thanks.


Created 2013-04-02 09:12:12 | Updated | Tags: realignertargetcreator
Comments (1)

Hello,

I am using Realigner target creator of GATK toolbox but it keeps on giving me this error:

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.3-6-gebbba25):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
** ERROR MESSAGE: SAM/BAM file Sample_27.sorted.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups**
ERROR ------------------------------------------------------------------------------------------

How can I solve this problem?

Thanks a lot in advance.

Comments (5)

Hello,

I am having trouble calling variants using Haplotype Caller on simulated exome reads. I have been able to call reasonable-looking variants on the exome (simulated with dwgsim) with HaplotypeCaller before running it through the Best Practices Pre-Processing pipeline. The pre-processed data worked fine with UnifiedGenotyper but with HaplotypeCaller, though it runs without errors and seems to walk across the genome, only outputs a VCF header. I have tried calling variants with and without using -L to provide the exome regions (as recommended in this forum post: http://gatkforums.broadinstitute.org/discussion/1681/expected-file-size-haplotype-caller) but this hasn't made a difference - when we run the command with the pre-processed BAMs, we only get a VCF header. Everything has been tested with both 2.4-7 and 2.4-9.

Any help or guidance would be greatly appreciated!

Command Used for HaplotypeCaller:

java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -I exome.realigned.dedup.recal.bam -o exome.raw.vcf -D dbsnp_137.hg19.vcf -stand_emit_conf 10 -rf BadCigar -L Illumin_TruSeq.bed --logging_level DEBUG

Commands Used for pre-processing (run in sequence using a Perl script):

java -Xmx16g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -nt 8 -R ucsc.hg19.fasta -I exome.bam -o exome.intervals -known dbsnp_137.hg19.vcf

java -Xmx4g -jar GenomeAnalysisTK.jar -T IndelRealigner -R ucsc.hg19.fasta -I exome.bam -o exome.realigned.bam -targetIntervals intervals.bam -known dbsnp_137.hg19.vcf

java -Xmx16g -jar MarkDuplicates.jar I=exome.realigned.bam METRICS_FILE=exome.dups O=exome.realigned.dedup.bam

samtools index exome.realigned.dedup

java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -nct 8 -R ucsc.hg19.fasta -I exome.realigned.dedup.bam -o exome.recal_data.grp -knownSites dbsnp_137.hg19.vcf -cov ReadGroupCovariate -cov ContextCovariate -cov CycleCovariate -cov QualityScoreCovariate

java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -nct 8 -R ucsc.hg19.fasta -I exome.realigned.dedup.bam -BQSR exome.recal_data.grp -baq CALCULATE_AS_NECESSARY -o exome.realigned.dedup.recal.bam


Created 2013-03-21 23:28:13 | Updated | Tags: indelrealigner realignertargetcreator
Comments (5)

Hi, I got errors when I ran GATK RealignerTargetCreator and IndelRealigner in v2.4.9. I've checked many related discussions and comments. First, I got an error like "we encountered an extremely high quality score of 69" with the option -S LENIENT, and the GATK program stalled. So I added "--fix_misencoded_quality_scores", and now I get a different error message: "ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '0'". I tried older versions of GATK and both Java 1.6 and 1.7. I'm hoping that you can help with this. Please let me know if I'm missing something. Thanks!

--Giltae


Created 2013-03-18 20:59:06 | Updated | Tags: indelrealigner realignertargetcreator
Comments (5)

Hi,

I am trying to decide between two approaches for performing realignment around indels. I have ~600 samples that have been aligned to a very fragmented draft genome assembly. What is best:
1. take each sample and create a list of targets, followed by realignment on each sample.
2. combine all samples into one large bam file and create a list of targets, followed by realignment on the same large bam file.

Also, would there be any advantages in terms of speed with either approach?

Cheers,

Steve


Created 2013-03-13 12:13:22 | Updated 2013-03-13 12:40:59 | Tags: realignertargetcreator baserecalibrator realignment
Comments (13)

Hi,

I am currently working with a project where we have sequenced a library of approximately 70 bps insert sizes using 2x100 paired-end seq. While this can seem unnecessary, it can improve base qualities a lot.

I have used SeqPrep (https://github.com/jstjohn/SeqPrep), which strips adaptors and merges reads that overlap; in our case the overlap is the entire read most of the time. This also boosts the base qualities: if a base was sequenced twice, its quality improves quite a bit. This way, base qualities can reach 70 and over (the probability of error is 0.0001 x 0.0001 if both reads had Q40 at that base, which gives a merged quality of 80). No funny business there. :)
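
The arithmetic in the parenthesis can be checked with a few lines of Python. This is a sketch of the standard Phred relation Q = -10 log10(p) under the simplifying assumption that the two observations of a base err independently; it is not SeqPrep's actual merging code, and the function names are illustrative.

```python
import math

def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)

def merged_quality(q1, q2):
    """Quality of a base observed twice, assuming independent errors.

    Simplified model for illustration: the base is wrong only if both
    observations are wrong, so the error probabilities multiply.
    """
    return error_to_phred(phred_to_error(q1) * phred_to_error(q2))

# Two Q40 observations (error 0.0001 each) of the same base.
print(round(merged_quality(40, 40)))
```

Under this model the qualities simply add, so two Q40 calls give Q80, which is well above the range GATK expects from a standard Illumina run.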

However, this does not seem to play nicely with GATK. The realignment crashes (see below), saying that the base quals must be erroneous. In my case, however, they are correct. Can I force GATK to work with these BQs? (--validation_strictness LENIENT didn't help, as you can see below :)

cheers Daniel Klevebring

   INFO  13:04:07,408 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:07,411 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.4-7-g5e89f01, Compiled 2013/03/06 01:01:28 
   INFO  13:04:07,411 HelpFormatter - Copyright (c) 2010 The Broad Institute 
   INFO  13:04:07,411 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
   INFO  13:04:07,416 HelpFormatter - Program Args: -T RealignerTargetCreator -I /scratch/3041404/P394_102.prmdup.bam -R /bubo/proj/b2010040/private/GoldenPath/hg19/GATK_resource_bundle/human_g1k_v37_clean.fasta -o /scratch/3041404/P394_102.realn.intervals --intervals /bubo/proj/b2010040/private/GoldenPath/NG_design/1000G_REF_picard_custom_design_target_regions_HG19.bed.interval_list --validation_strictness LENIENT 
   INFO  13:04:07,416 HelpFormatter - Date/Time: 2013/03/13 13:04:07 
   INFO  13:04:07,416 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:07,416 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:08,461 GenomeAnalysisEngine - Strictness is LENIENT 
   INFO  13:04:08,632 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
   INFO  13:04:08,640 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
   INFO  13:04:08,655 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
   INFO  13:04:09,782 IntervalUtils - Processing 39772003 bp from intervals 
   INFO  13:04:10,001 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files 
   INFO  13:04:10,262 GenomeAnalysisEngine - Done creating shard strategy 
   INFO  13:04:10,262 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
   INFO  13:04:10,263 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
   INFO  13:04:18,482 GATKRunReport - Uploaded run statistics report to AWS S3 
   ##### ERROR ------------------------------------------------------------------------------------------
   ##### ERROR A USER ERROR has occurred (version 2.4-7-g5e89f01): 
   ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
   ##### ERROR Please do not post this error to the GATK forum
   ##### ERROR
   ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
   ##### ERROR Visit our website and forum for extensive documentation and answers to 
   ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
   ##### ERROR
   ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/scratch/3041404/P394_102.prmdup.bam} appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 70; please see the GATK --help documentation for options related to this error
   ##### ERROR ------------------------------------------------------------------------------------------

Created 2013-03-04 14:02:23 | Updated | Tags: realignertargetcreator runtime
Comments (5)

Hello,

I am trying to run the realigner target creator with the data divided by chromosome. It seems to run smoothly, but the runtime for chromosome 1 seems never to end. I have checked my BAM files and, as expected, the one for chr 1 is a little bigger than the one for chr 2, but nowhere near in proportion to the difference in predicted runtime. The version I use is GATK 2.3.0, and this is how I run it:

java -Xmx12g -jar programs/GenomeAnalysisTK-2.3-0/GenomeAnalysisTK.jar -l INFO -T VariantRecalibrator -R Homo_sapiens.GRCh37.57_dna_concat.fa -recalFile allchr_varrecal_BOTH_comb_ref.intervals -rscriptFile 15_allchr_varrecal_BOTH_comb_ref.intervals.plots.R -tranchesFile allchr_varrecal_BOTH_comb_ref.intervals.tranches -resource:hapmap,VCF,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf -resource:omni,VCF,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf -resource:dbsnp,VCF,known=true,training=false,truth=false,prior=8.0 dbsnp_135.b37.vcf -resource:mills,VCF,known=true,training=true,truth=true,prior=12.0 Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ --mode BOTH -nt 8 -input allchr_real_recal_resrt_raw_BOTH_comb_ref.vcf -L Agilent_SureSelect.V4.GRCh37.70_targets_nochr.bed.pad100.interval_list --pedigreeValidationType SILENT --pedigree my_fam.fam

predicted runtime looks like:

    INFO 15:11:33,778 ProgressMeter - 1:33587200 3.36e+07 2.3 h 4.1 m 1.1% 8.8 d 8.7 d
    INFO 15:11:33,778 ProgressMeter - 11:73495886 1.82e+09 2.3 h 4.5 s 61.0% 3.7 h 87.4 m
    INFO 15:11:33,778 ProgressMeter - 10:6623707 1.68e+09 2.3 h 4.9 s 54.5% 4.2 h 114.3 m
    INFO 15:11:33,778 ProgressMeter - 11:572298 1.82e+09 2.3 h 4.5 s 58.7% 3.9 h 96.4 m
    INFO 15:11:33,779 ProgressMeter - 11:122078003 1.82e+09 2.3 h 4.5 s 62.6% 3.6 h 81.7 m
    INFO 15:11:53,780 ProgressMeter - 11:10589211 1.82e+09 2.3 h 4.5 s 59.0% 3.9 h 95.3 m
    INFO 15:11:53,780 ProgressMeter - 11:105031869 1.82e+09 2.3 h 4.5 s 62.1% 3.7 h 83.9 m
    INFO 15:11:53,781 ProgressMeter - 11:46994477 1.82e+09 2.3 h 4.5 s 60.2% 3.8 h 90.8 m
    INFO 15:12:03,779 ProgressMeter - 11:8677862 1.82e+09 2.3 h 4.5 s 58.9% 3.9 h 95.7 m
    INFO 15:12:03,779 ProgressMeter - 11:130690257 1.82e+09 2.3 h 4.5 s 62.9% 3.6 h 81.1 m
    INFO 15:12:13,779 ProgressMeter - 11:84816026 1.82e+09 2.3 h 4.5 s 61.4% 3.7 h 86.4 m
    INFO 15:12:13,779 ProgressMeter - 10:17039143 1.68e+09 2.3 h 4.9 s 54.8% 4.2 h 113.3 m
    INFO 15:12:23,780 ProgressMeter - 11:18556991 1.82e+09 2.3 h 4.5 s 59.3% 3.9 h 94.6 m
    INFO 15:12:23,780 ProgressMeter - 11:113365440 1.82e+09 2.3 h 4.5 s 62.3% 3.7 h 83.2 m
    INFO 15:12:23,782 ProgressMeter - 11:55284794 1.82e+09 2.3 h 4.5 s 60.4% 3.8 h 90.1 m
    INFO 15:12:33,780 ProgressMeter - 1:33783808 3.38e+07 2.3 h 4.1 m 1.1% 8.8 d 8.7 d
    INFO 15:12:33,780 ProgressMeter - 12:3311608 1.95e+09 2.3 h 4.2 s 63.1% 3.6 h 80.5 m
    INFO 15:12:43,779 ProgressMeter - 11:93292307 1.82e+09 2.3 h 4.6 s 61.7% 3.7 h 85.8 m
    INFO 15:12:43,780 ProgressMeter - 10:24841443 1.68e+09 2.3 h 4.9 s 55.1% 4.2 h 112.5 m
    INFO 15:12:43,780 ProgressMeter - 11:19408689 1.82e+09 2.3 h 4.6 s 59.3% 3.9 h 94.8 m
    INFO 15:12:53,965 ProgressMeter - 11:26512308 1.82e+09 2.3 h 4.6 s 59.5% 3.9 h 94.0 m
    INFO 15:12:53,965 ProgressMeter - 11:121738354 1.82e+09 2.3 h 4.6 s 62.6% 3.7 h 82.6 m
    INFO 15:12:53,965 ProgressMeter - 11:63468191 1.82e+09 2.3 h 4.6 s 60.7% 3.8 h 89.4 m
    INFO 15:13:03,781 ProgressMeter - 12:11960820 1.95e+09 2.3 h 4.3 s 63.4% 3.6 h 79.8 m
    INFO 15:13:13,780 ProgressMeter - 11:101551879 1.82e+09 2.3 h 4.6 s 61.9% 3.7 h 85.1 m
    INFO 15:13:13,780 ProgressMeter - 10:32238974 1.68e+09 2.3 h 4.9 s 55.3% 4.2 h 111.8 m
    INFO 15:13:13,780 ProgressMeter - 11:27204534 1.82e+09 2.3 h 4.6 s 59.5% 3.9 h 94.1 m
    INFO 15:13:23,966 ProgressMeter - 11:71529440 1.82e+09 2.3 h 4.6 s 61.0% 3.8 h 88.8 m
    INFO 15:13:23,968 ProgressMeter - 11:34170083 1.82e+09 2.3 h 4.6 s 59.8% 3.9 h 93.4 m
    INFO 15:13:23,977 ProgressMeter - 11:129889715 1.82e+09 2.3 h 4.6 s 62.9% 3.7 h 81.9 m
    INFO 15:13:33,781 ProgressMeter - 1:33980416 3.40e+07 2.3 h 4.1 m 1.1% 8.8 d 8.7 d

Any suggestions?

Regards,

Måns


Created 2013-03-03 19:16:19 | Updated | Tags: indelrealigner realignertargetcreator
Comments (23)

Hi,

I downloaded the newest version of GATK (version 2.4-3) this week and tried to perform local realignment on my targeted sequencing data. The reference genome, SNP and indel data files were downloaded from the resource bundle. However, I encountered two issues during the realignment.

First, in the RealignerTargetCreator step: with the same command line, version 2.4-3 gives the error message "MESSAGE: -49" (with no further details), while the older version 2.3-9 runs without errors.

Second, in the IndelRealigner step: I got the error message "MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'". However, the reference genome was downloaded from the bundle. I am not sure how to fix this issue.

I hope someone can help me with these issues. Let me know if more info is needed.

Thanks!


Created 2013-02-23 19:50:26 | Updated 2013-02-23 19:56:12 | Tags: realignertargetcreator picard
Comments (9)

Hi there,

I get an error when I try to run GATK with the following command:

java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa -I  merged_bam_files_indexed_markduplicate.bam -o reads.intervals

However I get this error:

SAM/BAM file SAMFileReader{/merged_bam_files_indexed_markduplicate.bam} is malformed: Read HWI-ST303_0093:5:5:13416:34802#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.  Please use http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem

It suggests that this is a header issue; however, my BAM file has a header:

samtools view -h merged_bam_files_indexed_markduplicate.bam | grep ^@RG
@RG     ID:test1      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test   CN:japan
@RG     ID:test2      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test    CN:japan

when I grep the read within the error:

HWI-ST303_0093:5:5:13416:34802#0        99      1       1090    29      23S60M17S       =       1150    160     TGTTTGGGTTGAAGATTGATACTGGAAGAAGATTAGAATTGTAGAAAGGGGAAAACGATGTTAGAAAGTTAATACGGCTTACTCCAGATCCTTGGATCTC        GGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGFGGGGGGGGGDGFGFGGGGGFEDFGEGGGDGEG?FGGDDGFFDGGEDDFFFFEDG?E        MD:Z:60 PG:Z:MarkDuplicates     RG:Z:test1      XG:i:0  AM:i:29 NM:i:0  SM:i:29 XM:i:0  XO:i:0  XT:A:M
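
One way to test for the mismatch the error describes is to compare a record's RG:Z tag with the IDs declared on the @RG header lines. The sketch below does this on text pasted from `samtools view -h`; it is an illustration, not a GATK or Picard routine. The function names are hypothetical, the record is abbreviated (sequence and qualities shortened), and fields are split on any whitespace because the pasted text above is space-aligned, whereas a real SAM file is tab-delimited.

```python
def header_read_groups(header_lines):
    """Collect the ID value of every @RG line in a SAM header."""
    ids = set()
    for line in header_lines:
        if line.startswith("@RG"):
            for field in line.split()[1:]:
                if field.startswith("ID:"):
                    ids.add(field[3:])
    return ids

def record_read_group(record_line):
    """Return the RG:Z tag of a SAM record, or None if the tag is absent.

    Optional tags start after the 11 mandatory SAM columns.
    """
    for field in record_line.split()[11:]:
        if field.startswith("RG:Z:"):
            return field[5:]
    return None

header = [
    "@RG ID:test1 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan",
    "@RG ID:test2 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan",
]
# Abbreviated copy of the record shown above (SEQ and QUAL shortened).
record = ("HWI-ST303_0093:5:5:13416:34802#0 99 1 1090 29 23S60M17S = 1150 160 "
          "TGTT GGGG MD:Z:60 PG:Z:MarkDuplicates RG:Z:test1 XG:i:0")

rg = record_read_group(record)
print(rg, rg in header_read_groups(header))
```

If the check passes on the pasted text but GATK still complains, the header actually stored in the BAM being read may differ from the one inspected, which is worth ruling out first.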

Following the suggested Picard solution:

java -XX:MaxDirectMemorySize=4G -jar picard-tools-1.85/AddOrReplaceReadGroups.jar I= test.bam O= test.header.bam SORT_ORDER=coordinate RGID=test RGLB=test  RGPL=Illumina RGSM=test/ RGPU=HWI-ST303  RGCN=japan CREATE_INDEX=True 

I get this error after 2 min.:

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 12247781, Read name HWI-ST303_0093:5:26:10129:50409#0, MAPQ should be 0 for unmapped read.

Any recommendation on how to solve this issue ?

My plan is to do the following to resolve the issue:

picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=LENIENT
samtools  index test_markduplicate.bam

I see a lot of messages like the one below, but the command is still running:

Ignoring SAM validation error: ERROR: Record (number), Read name HWI-ST303_0093:5:5:13416:34802#0, RG ID on SAMRecord not found in header: test1

while running the command

then try the GATK RealignerTargetCreator

I already tried to do the following

java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa -I  merged_bam_files_indexed_markduplicate.bam -o reads.intervals --validation_strictness LENIENT

But I still got the same error

N.B.: the same command ran with no issue with GATK version 1.2.

My pipeline in short: mapping the paired end reads with

bwa aln -q 20 ref.fa read > files.sai
bwa sampe ref.fa file1.sai file2.sai read1 read2 > test1.sam
samtools view -bS test1.sam | samtools sort - test
samtools  index test1.bam
samtools merge -rh RG.txt test test1.bam test2.bam

RG.txt

@RG     ID:test1      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test   CN:japan
@RG     ID:test2      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test    CN:japan

samtools  index test.bam
picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=SILENT
samtools  index test_markduplicate.bam

Created 2012-12-06 10:59:01 | Updated 2012-12-06 15:45:06 | Tags: realignertargetcreator markduplicates
Comments (1)

I have the following questions about running the RealignerTargetCreator module in GATK 1.4.

1) Is it recommended to provide the target capture BED file to RealignerTargetCreator for targeted/exome experiments? Without the BED file, the tool takes a long time (~6-7 hrs). What is the optimal approach here?

2) Does running MarkDuplicates before or after RealignerTargetCreator have any effect on the number of SNPs/indels? Which order is recommended?

Look forward to your comments. Raj


Created 2012-11-27 15:38:23 | Updated | Tags: indelrealigner realignertargetcreator
Comments (17)

I can't seem to run the IndelRealigner on reads that contain colons, ":" in the reference scaffold names. The RealignerTargetCreator step works correctly and generates the interval table, but the second, IndelRealigner, step fails. When I look at the generated interval table, I see the interval delimiter is a colon, which I imagine is the problem.

Unfortunately, I have a set of human references that have a colon in every scaffold name, so changing this would be a massive undertaking.

I believe this problem could be solved if you searched for the colon delimiter from the end of the interval string instead of from the beginning, so I'm hoping this is a really simple fix.
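
A minimal sketch of what "search from the end" would look like, written in Python for brevity. This is not GATK's interval parser; the function name `parse_interval` and the scaffold name in the usage line are made up for illustration.

```python
def parse_interval(text):
    """Split 'contig:start-stop' on the LAST colon.

    Splitting from the right lets the contig name itself contain ':'.
    A single position ('contig:pos') is returned with start == stop.
    """
    contig, sep, span = text.rpartition(":")
    if not sep:
        raise ValueError("no ':' delimiter in %r" % text)
    start, _, stop = span.partition("-")
    return contig, int(start), int(stop or start)

# Hypothetical scaffold name containing colons.
print(parse_interval("scaf:A:1:100-200"))
# An ordinary interval parses the same way as before.
print(parse_interval("chr2:905714-905956"))
```

One caveat with this approach: a bare contig name with no coordinates (a whole-contig interval) becomes ambiguous when names may contain colons, so the parser would need to check the name against the reference dictionary first.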

Thanks!


Created 2012-11-08 12:31:03 | Updated 2012-11-08 18:05:56 | Tags: indelrealigner realignertargetcreator unifiedgenotyper variants file-size
Comments (1)

HI

I am using the following set of commands on GATK2.1.13 to generate a VCF file

echo `java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -I B2_with_ReadGroup.ddup.sorted.bam -R human_g1k_v37.fasta -T RealignerTargetCreator  -o my.intervals -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key`
echo "RealignerTargetCreator done at `date`"
echo "Starting IndelRealigner at `date`"

echo `java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -I B2_with_ReadGroup.ddup.sorted.bam -R human_g1k_v37.fasta -T IndelRealigner -targetIntervals my.intervals -o myrealignedBam.bam  -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key`
echo "Realignment done at `date`"
echo "Starting UnifiedGenotyper at `date`"
echo `java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -l INFO -R human_g1k_v37.fasta -T UnifiedGenotyper    -I myrealignedBam.bam    -o mygatk_vcf.vcf    --output_mode EMIT_ALL_SITES -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key`
echo "Genotyping complete at `date`"

When I run 'mpileup' on B2_with_ReadGroup.ddup.sorted.bam, I get a decent 10 MB VCF file. But at the last step of the above pipeline, my "mygatk_vcf.vcf" grows to 81 GB!

Do you know what is wrong?


Created 2012-11-01 14:16:20 | Updated 2012-11-01 19:16:53 | Tags: realignertargetcreator
Comments (20)

java -jar GenomeAnalysisTK.jar -R exampleFASTA.fasta -I exampleBAM.bam -T RealignerTargetCreator -o exampleTarget.intervals

I used the example files and got no error report, but the output contains no data; that is, exampleTarget.intervals is empty.

Why is this happening?

And when I add --known dbsnp_135.hg19.vcf (which is in the bundle), I get an error:

ERROR MESSAGE: Input files known and reference have incompatible contigs: Found contigs with the same name but different lengths:
ERROR contig known = chr1 / 249250621
ERROR contig reference = chr1 / 100000.
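
The error above reports `chr1` as 249250621 bases in the known-sites file but only 100000 in the reference. A mismatch like this can be spotted before running the tool by comparing contig lengths directly. The sketch below is a hypothetical helper, not part of GATK. It assumes the VCF header carries `##contig=<ID=...,length=...>` lines and that a samtools-style `.fai` index (contig name and length in the first two tab-separated columns) exists for the reference.

```python
import re

CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def vcf_contig_lengths(header_lines):
    """Collect {contig: length} from '##contig=<ID=...,length=...>' header lines."""
    lengths = {}
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if not match:
            continue
        # Naive comma split; fragments without '=' (e.g. from quoted values) are skipped.
        fields = dict(
            item.split("=", 1) for item in match.group(1).split(",") if "=" in item
        )
        if "ID" in fields and "length" in fields:
            lengths[fields["ID"]] = int(fields["length"])
    return lengths


def fai_contig_lengths(fai_lines):
    """Collect {contig: length} from the first two columns of a .fai index."""
    lengths = {}
    for line in fai_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            lengths[parts[0]] = int(parts[1])
    return lengths


def mismatched_contigs(known, reference):
    """Return contigs present in both inputs whose lengths differ."""
    return {
        name: (known[name], reference[name])
        for name in known.keys() & reference.keys()
        if known[name] != reference[name]
    }


# Toy inputs that mirror the lengths in the error message above.
vcf_header = ["##fileformat=VCFv4.1", "##contig=<ID=chr1,length=249250621>"]
fai = ["chr1\t100000\t6\t60\t61"]
print(mismatched_contigs(vcf_contig_lengths(vcf_header), fai_contig_lengths(fai)))
```

With real files, the header lines and index lines would be read from the VCF and from the `.fai` file next to the reference FASTA.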

Created 2012-10-22 08:23:25 | Updated 2013-01-07 20:12:22 | Tags: realignertargetcreator bwa realignment
Comments (1)

Hello,

Previously I used only BWA and, as described in the Best Practices, performed the realignment step. Now I want to integrate Stampy, combined with BWA, into my pipeline.

Do you think I should still perform the realignment step?

Thanks!


Created 2012-08-07 05:52:36 | Updated 2012-08-08 15:44:43 | Tags: realignertargetcreator
Comments (4)

Hi all,

We're doing some analysis on quite big data and time is an issue, so I did a bit of scaling testing on a subset of the data before beginning. The results were unexpected.

When I run GATK RealignerTargetCreator with -nt 8 and give it 8 cores to work with, it actually takes about 2.5 times LONGER than if I just run it single-threaded. I don't mean that the user or CPU time goes up; the real wall-clock time goes up. In the -nt 8 case, the 8 cores would have been on a single node of our cluster with shared memory.

I tried testing on two different kinds of subsets of the data, and both performed worse when multithreaded. I first tried restricting the input data by genomic region, i.e. just analysing chr22. When multithreading didn't seem to be working as expected in this test, I thought that maybe GATK was trying to parallelise over genomic regions, so I instead tried testing on a single lane of input data (a 9.6G bam file spread over the whole genome). This also ran more slowly when multithreaded.

So my question is: should I use -nt 8 in my real analysis even though it was a bad option in testing? Is it possible that multithreading will be bad for small amounts of data, but good in the large-data case? Or, does this indicate that I'm doing something wrong when trying to run RealignerTargetCreator multithreaded?

I really would like to use the fastest option for the real data as it will be very big. Any help much appreciated.
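
A comparison like the one described above depends on separating wall-clock time from CPU time. Here is a minimal Python sketch of such a timing harness. `time_command` is a hypothetical helper, and the command in the example is a placeholder to be replaced by the actual runs with and without -nt 8. The child CPU figures from `os.times()` are meaningful on Unix-like systems; on Windows they are reported as zero.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (wall_seconds, child_cpu_seconds)."""
    before = os.times()
    wall_start = time.perf_counter()
    subprocess.run(argv, check=True)
    wall = time.perf_counter() - wall_start
    after = os.times()
    # CPU time consumed by the finished child process (user + system).
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu


if __name__ == "__main__":
    # Placeholder workload; substitute the real command line for each configuration.
    wall, cpu = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s")
```

CPU time well above wall-clock time indicates that work ran concurrently. Wall-clock time that rises while CPU time stays roughly flat suggests the threads are mostly waiting rather than computing.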

Thanks, Clare


Created 2012-08-03 05:30:36 | Updated 2012-08-03 05:32:30 | Tags: indelrealigner realignertargetcreator mergeintervallists
Comments (1)

Hi all - I'm using the GATK realigner, which can take several hours on my samples. I'm trying to optimize my pipeline by dividing the work up by chromosome across the nodes in my cluster. I can call RealignerTargetCreator using the -L parameter for each chromosome, which results in a bunch of interval files. Now I either want to call IndelRealigner using the -L parameter for each chromosome and then merge the resulting BAM files, or merge the interval files into one and then call IndelRealigner.

1) I don't see a way to merge interval files using GATK. Is this possible?

or

2) Can I call IndelRealigner and process each chromosome separately then merge the resulting BAM files together?
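
For option 1, per-chromosome interval files can also be combined outside GATK. This is a minimal sketch, assuming each file holds one interval per line in `contig:start-stop` form and that the merged file should follow the contig order of the reference. `merge_interval_lines` and the example contig names are hypothetical.

```python
def merge_interval_lines(per_file_lines, contig_order):
    """Combine interval lines from several files, sorted by contig order, then position."""
    rank = {name: index for index, name in enumerate(contig_order)}

    def sort_key(line):
        # Split at the last colon so contig names containing ':' still work.
        contig, _, span = line.rpartition(":")
        bounds = [int(value) for value in span.split("-")]
        # A contig missing from contig_order raises KeyError here.
        return rank[contig], bounds

    # Strip whitespace, skip blank lines, and drop exact duplicates.
    unique = {
        line.strip() for lines in per_file_lines for line in lines if line.strip()
    }
    return sorted(unique, key=sort_key)


# Toy inputs standing in for the lines of two per-chromosome interval files.
chr2_lines = ["chr2:500-520\n"]
chr1_lines = ["chr1:900-910\n", "chr1:100-130\n"]
for line in merge_interval_lines([chr2_lines, chr1_lines], ["chr1", "chr2"]):
    print(line)
```

With real data, each inner list would come from reading one interval file, and `contig_order` from the reference sequence dictionary or `.fai` index.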