Tagged with #commandlinegatk
1 documentation article | 1 announcement | 34 forum discussions



Created 2014-10-02 17:29:07 | Updated 2014-10-02 18:22:46 | Tags: commandlinegatk commandline argument syntax

Comments (0)

Overview

This document describes how GATK commands are structured and how to add arguments to basic command examples.


Basic java syntax

Commands for GATK always follow the same basic syntax:

java [Java arguments] -jar GenomeAnalysisTK.jar [GATK arguments]

The core of the command is java -jar GenomeAnalysisTK.jar, which starts up the GATK program in a Java Virtual Machine (JVM). Any additional Java-specific arguments (such as -Xmx to increase memory allocation) should be inserted between java and -jar, like this:

java -Xmx4G -jar GenomeAnalysisTK.jar [GATK arguments]

The order of arguments between java and -jar is not important.
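To make the ordering rule concrete, here is a minimal Python sketch that assembles the argument list for the command: JVM options go between java and -jar, and everything after the jar path is handed to GATK. The helper name build_gatk_command is hypothetical. It only builds the list and does not launch Java.

```python
from typing import List, Sequence


def build_gatk_command(jvm_args: Sequence[str], gatk_args: Sequence[str],
                       jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Assemble: java [Java arguments] -jar GenomeAnalysisTK.jar [GATK arguments].

    Hypothetical helper for illustration only; it returns the argv list.
    """
    # JVM options (e.g. -Xmx4G) must sit between "java" and "-jar".
    # Anything placed after the jar path is passed to GATK, not to the JVM.
    return ["java", *jvm_args, "-jar", jar, *gatk_args]


if __name__ == "__main__":
    cmd = build_gatk_command(
        ["-Xmx4G"],
        ["-T", "HaplotypeCaller", "-R", "human_b37.fasta",
         "-I", "sample1.bam", "-o", "raw_variants.vcf"],
    )
    print(" ".join(cmd))
    # To actually run it, the list could be passed to subprocess.run(cmd, check=True).
```

Keeping the command as a list rather than a single string avoids shell quoting problems if file names contain spaces.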


GATK arguments

Two universal arguments are required for every GATK command (with very few exceptions, namely the clp-type utilities): -R for the reference (e.g. -R human_b37.fasta) and -T for the tool name (e.g. -T HaplotypeCaller).

Additional arguments fall into two categories:

  • Engine arguments like -L (for specifying a list of intervals) which can be given to all tools and are technically optional but may be effectively required at certain steps for specific analytical designs (e.g. the -L argument for calling variants on exomes);

  • Tool-specific arguments which may be required, like -I (to provide an input file containing sequence reads to tools that process BAM files) or optional, like -alleles (to provide a list of known alleles for genotyping).

The ordering of GATK arguments is not important, but we recommend always passing the tool name (-T) and reference (-R) first for consistency. It is also a good idea to consistently order arguments by some kind of logic in order to make it easy to compare different commands over the course of a project. It’s up to you to choose what that logic should be.

All available engine and tool-specific arguments are listed in the tool documentation section. Arguments typically have both a long name (prefixed by --) and a short name (prefixed by -). The GATK command line parser recognizes both equally, so you can use either, depending on whether you want your commands to be more verbose or more succinct.

Finally, a note about flags. Flags are arguments that have boolean values, i.e. TRUE or FALSE. They are typically used to enable or disable specific features; for example, --keep_program_records will make certain GATK tools output additional information in the BAM header that would be omitted otherwise. In GATK, all flags are set to FALSE by default, so if you want to set one to TRUE, all you need to do is add the flag name to the command. You don't need to specify an actual value.
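The two behaviors described above (short and long names being interchangeable, and flags being FALSE unless they are present) can be mimicked with Python's argparse. This is only an illustration, not GATK's own Java parser. The long names --analysis_type, --reference_sequence and --out are the GATK 3 long forms of -T, -R and -o. Only this small subset is modeled, and the prog name is made up.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="gatk-sketch")  # hypothetical program name
    # Each argument is registered under a short and a long name; either one works.
    p.add_argument("-T", "--analysis_type", required=True)
    p.add_argument("-R", "--reference_sequence", required=True)
    p.add_argument("-o", "--out")
    # A flag: False by default, True when the name appears; it takes no value.
    p.add_argument("--keep_program_records", action="store_true")
    return p


if __name__ == "__main__":
    parser = make_parser()
    short = parser.parse_args(["-T", "HaplotypeCaller", "-R", "human_b37.fasta"])
    long_ = parser.parse_args(["--analysis_type", "HaplotypeCaller",
                               "--reference_sequence", "human_b37.fasta",
                               "--keep_program_records"])
    print(short.analysis_type == long_.analysis_type)  # True
    print(short.keep_program_records, long_.keep_program_records)  # False True
```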


Examples of complete GATK command lines

This is a very simple command that runs HaplotypeCaller in default mode on a single input BAM file containing sequence data and outputs a VCF file containing raw variants.

java -Xmx4G -jar GenomeAnalysisTK.jar -R human_b37.fasta -T HaplotypeCaller -I sample1.bam -o raw_variants.vcf

If the data is from exome sequencing, we should additionally provide the exome targets using the -L argument:

java -Xmx4G -jar GenomeAnalysisTK.jar -R human_b37.fasta -T HaplotypeCaller -I sample1.bam -o raw_variants.vcf -L exome_intervals.list

If we just want to genotype specific sites of interest using known alleles based on results from a previous study, we can change the HaplotypeCaller’s genotyping mode using -gt_mode, provide those alleles using -alleles, and restrict the analysis to just those sites using -L:

java -Xmx4G -jar GenomeAnalysisTK.jar -R human_b37.fasta -T HaplotypeCaller -I sample1.bam -o raw_variants.vcf -L known_alleles.vcf -alleles known_alleles.vcf -gt_mode GENOTYPE_GIVEN_ALLELES

For more examples of commands and for specific tool commands, see the tool documentation section.


Created 2014-10-02 18:27:29 | Updated | Tags: commandlinegatk commandline syntax

Comments (0)

I'm not sure why it hadn't occurred to us to do this before, but we've finally done it: an FAQ article that formally explains how GATK commands are structured, what the basic types of arguments are, and how to string them all together.

We realized that command structure requirements can be confusing if you are new to command-line programs, if only because so many toolkits use fairly different conventions. For example, Picard tools (which are also developed at the Broad!) have a separate jar file for each tool in the toolkit, while GATK has one jar file containing all the tools. The Picard syntax for passing argument values is also different: Picard uses = to join the argument name and value, while GATK commands just use a space.
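The syntax difference can be sketched in a few lines of Python that render the same name/value pair in both styles. The helper names are hypothetical. The Picard form shown is the NAME=value style described above.

```python
from typing import Dict, List


def gatk_style(args: Dict[str, str]) -> List[str]:
    """GATK: the argument name and its value are separate tokens (a space between them)."""
    out: List[str] = []
    for name, value in args.items():
        out += [f"-{name}", value]
    return out


def picard_style(args: Dict[str, str]) -> List[str]:
    """Picard: the name and value are joined with '=' into a single token."""
    return [f"{name}={value}" for name, value in args.items()]


if __name__ == "__main__":
    args = {"I": "sample1.bam"}
    print(" ".join(gatk_style(args)))    # -I sample1.bam
    print(" ".join(picard_style(args)))  # I=sample1.bam
```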

So if that's something you need help with, check out the doc! We'd love to hear from people who are new to GATK about whether this is helpful and how we can improve it further.


Created 2016-04-15 22:01:44 | Updated | Tags: commandlinegatk haplotypecaller genotypegvcfs joint-calling

Comments (3)

I have been following the Best Practices outline for calling SNPs on our samples, but I'm a little confused about what to do with the VCF file produced by the joint genotyping (GenotypeGVCFs) step.

I understand the principle of gVCF calling for the most part, but my confusion is about what we are supposed to do with the VCF file once we have done the joint genotyping step. We are looking at an F1 mapping population of a non-model organism, so does this VCF file have the individual progeny (BAM file names) indicated within it? I think not, since I can't find any of the sample names while scrolling through it.

Can this VCF file be used to construct a pedigree file for use during genotype refinement? Should it somehow be fed back into HaplotypeCaller to inform likely calls during a second round of variant calling? Or do you use it to go back to the individual gVCF files to extract the high-confidence variants?

There seems to be a good amount of literature on the Broad websites about what a gVCF file is and how to perform joint genotyping, but not much direction about what to do with the joint genotyped VCF file once it is produced.

Any advice or referral to other walkthroughs/guides would be very appreciated.

Michael

[extra project information: My project involves calling SNPs across a mapping population for a non-model organism with the intent of mapping a trait. The goal is to produce robust SNP calls for each individual progeny (of which we have 30 currently, and >60 in the near future) and the two parents. We only have halfway-decent sequencing coverage of ~10-20x for each sample, which is thus why doing gVCF calling and joint genotyping sounds attractive to us. Since we work on a non-model, we also lack previously produced "gold standard" SNP sets or other resources allowing us to refine genotypes.]


Created 2016-03-01 13:02:06 | Updated 2016-03-01 13:08:47 | Tags: baserecalibrator commandlinegatk queue qscript analyzecovariates

Comments (8)

Dear GATK team,

I'd like to ask whether HaplotypeCaller and the AnalyzeCovariates step can run in parallel. I developed a QScript that runs indel realignment, BQSR, variant calling (producing a gVCF file) and then BaseRecalibrator a second time, followed by AnalyzeCovariates. Looking at the QScript job report PDF, I noticed that the second BaseRecalibrator run was performed after HaplotypeCaller, although, as far as I understand, HaplotypeCaller and the second BaseRecalibrator run are independent with respect to their data and could potentially run in parallel.

Can I run them in parallel somehow in order to save time?


Created 2016-02-23 18:01:29 | Updated | Tags: commandlinegatk queue qscript analyzecovariates

Comments (2)

Dear GATK team,

I use Queue to build a pipeline with GATK tools: RealignerTargetCreator, IndelRealigner, BaseRecalibrator, AnalyzeCovariates, and HaplotypeCaller, so I developed the corresponding QScript. When I tested it the very first time, all functions finished with no errors. But when I then invoked it with -startFromScratch, it failed to execute AnalyzeCovariates, saying:

ERROR 19:08:14,037 FunctionEdge - Error: 'java' '-Xmx16384m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=tmp' '-cp' 'Queue.jar' 'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'AnalyzeCovariates' '-L' 'intervals_to_process.interval_list' '-R' 'Homo_sapiens_assembly38.fasta' '-before' 'recal-table1.txt' '-after' 'recal-table2.txt' '-plots' 'bqsr-report.pdf' '-csv' 'bqsr-report.csv'
ERROR 19:08:14,045 FunctionEdge - Contents of bqsr-report.pdf.out: [...]

In the bqsr-report.out file I can see no errors:

INFO 17:25:12,764 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:25:12,767 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:40
INFO 17:25:12,767 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 17:25:12,767 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 17:25:12,771 HelpFormatter - Program Args: -T AnalyzeCovariates -L intervals_to_process.interval_list -R Homo_sapiens_assembly38.fasta -before recal-table1.txt -after recal-table2.txt -plots bqsr-report.pdf -csv bqsr-report.csv
INFO 17:25:12,779 HelpFormatter - Executing as [...]
INFO 17:25:12,779 HelpFormatter - Date/Time: 2016/02/23 17:25:12
INFO 17:25:12,780 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:25:12,780 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:25:12,838 GenomeAnalysisEngine - Strictness is SILENT
INFO 17:25:13,079 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 17:25:13,196 IntervalUtils - Processing 3088286401 bp from intervals
INFO 17:25:13,277 GenomeAnalysisEngine - Preparing for traversal
INFO 17:25:13,287 GenomeAnalysisEngine - Done preparing for traversal
INFO 17:25:13,287 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 17:25:13,288 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 17:25:13,289 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 17:25:13,764 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 17:25:13,921 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 17:25:13,931 AnalyzeCovariates - Generating csv file 'bqsr-report.csv'
INFO 17:25:14,164 AnalyzeCovariates - Generating plots file 'bqsr-report.pdf'
INFO 17:25:20,279 Walker - [REDUCE RESULT] Traversal result is: org.broadinstitute.gatk.tools.walkers.bqsr.AnalyzeCovariates$None@3ce295f9
INFO 17:25:20,282 ProgressMeter - done 0.0 6.0 s 11.6 w 100.0% 6.0 s 0.0 s
INFO 17:25:20,283 ProgressMeter - Total runtime 7.00 secs, 0.12 min, 0.00 hours
INFO 17:25:21,513 GATKRunReport - Uploaded run statistics report to AWS S3

I can see that AnalyzeCovariates and HaplotypeCaller start nearly at the same time:

INFO 17:25:09,264 FunctionEdge - Starting: 'java' '-Xmx16384m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=tmp' '-cp' 'Queue.jar' 'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'AnalyzeCovariates' '-L' 'intervals_to_process.interval_list' '-R' 'Homo_sapiens_assembly38.fasta' '-before' 'recal-table1.txt' '-after' 'recal-table2.txt' '-plots' 'bqsr-report.pdf' '-csv' 'bqsr-report.csv'
INFO 17:25:09,264 FunctionEdge - Output written to bqsr-report.pdf.out
INFO 17:25:21,548 FunctionEdge - Starting: 'java' '-Xmx16384m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=tmp' '-cp' 'Queue.jar' 'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'HaplotypeCaller' '-I' 'recaled.bam' '-L' 'intervals_to_process.interval_list' '-R' 'Homo_sapiens_assembly38.fasta' '-variant_index_type' 'LINEAR' '-variant_index_parameter' '128000' '-o' 'sample.gvcf' '-D' 'dbsnp_144.hg38.vcf' '-ERC' 'GVCF' '-pcrModel' 'CONSERVATIVE'
INFO 17:25:21,548 FunctionEdge - Output written to sample.gvcf.out

and after HaplotypeCaller finishes successfully (I checked that it produced a gVCF file), this message about AnalyzeCovariates error is printed:

INFO 19:08:14,029 QGraph - 0 Pend, 2 Run, 0 Fail, 10 Done
ERROR 19:08:14,037 FunctionEdge - Error: 'java' '-Xmx16384m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=tmp' '-cp' 'Queue.jar' 'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'AnalyzeCovariates' '-L' 'intervals_to_process.interval_list' '-R' 'Homo_sapiens_assembly38.fasta' '-before' 'recal-table1.txt' '-after' 'recal-table2.txt' '-plots' 'bqsr-report.pdf' '-csv' 'bqsr-report.csv'

Of course, AnalyzeCovariates didn't produce CSV and PDF reports this way.

When I invoked the QScript again (this time not from scratch) to reproduce the situation, it executed with no errors, and AnalyzeCovariates generated both CSV and PDF reports.

After that I executed AnalyzeCovariates manually with -l DEBUG and it produced only CSV report, no PDF. Here is the central part of the debug output of AnalyzeCovariates:

DEBUG 20:38:28,593 RecalUtils - R command line: Rscript (resource)org/broadinstitute/gatk/engine/recalibration/BQSR.R bqsr-report.csv recal-table1.txt bqsr-report1.pdf
DEBUG 20:38:28,607 RScriptExecutor - Executing:
DEBUG 20:38:28,607 RScriptExecutor - Rscript
DEBUG 20:38:28,607 RScriptExecutor - -e
DEBUG 20:38:28,608 RScriptExecutor - tempLibDir = '/tmp/Rlib.4876869209103817519';source('/tmp/BQSR.3779158431229884689.R');
DEBUG 20:38:28,608 RScriptExecutor - bqsr-report.csv
DEBUG 20:38:28,608 RScriptExecutor - recal-table1.txt
DEBUG 20:38:28,608 RScriptExecutor - bqsr-report1.pdf

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

lowess

Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
DEBUG 20:38:34,790 RScriptExecutor - Result: 0
INFO 20:38:34,792 Walker - [REDUCE RESULT] Traversal result is: org.broadinstitute.gatk.tools.walkers.bqsr.AnalyzeCovariates$None@44723d95
INFO 20:38:34,795 ProgressMeter - done 0.0 7.0 s 11.6 w 100.0% 7.0 s 0.0 s
INFO 20:38:34,795 ProgressMeter - Total runtime 7.03 secs, 0.12 min, 0.00 hours

So every time I start my QScript from scratch, AnalyzeCovariates fails, but it finishes successfully when I invoke the script the next time without -startFromScratch. When I execute AnalyzeCovariates manually with -l DEBUG, it produces only the CSV report.

Should I use BQSR.R script directly?

I will be very grateful for any tips and help.


Created 2016-02-19 10:02:26 | Updated | Tags: realignertargetcreator commandlinegatk too-high-quality-score

Comments (4)

Dear GATK team,

I process a human FASTQ file with paired-end Illumina reads according to the GATK Best Practices. Everything was fine until I launched RealignerTargetCreator on a BAM file with marked duplicates. RealignerTargetCreator output the following:

ERROR MESSAGE: SAM/BAM/CRAM file htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter@57ba286d appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 62. Please see https://www.broadinstitute.org/gatk/guide?id=6470 for more details and options related to this error.

I generated a quality score histogram with Picard QualityScoreDistribution and got the following distribution:

HISTOGRAM java.lang.Byte

QUALITY  COUNT_OF_Q
33       782589
36       3094
37       6348
38       34557
39       45918
40       28015
41       31737
42       11324
43       6256
44       20213
45       8860
46       20973
47       25098
48       27728
49       61243
50       38519
51       38296
52       26707
53       53208
54       65936
55       89735
56       113264
57       184975
58       154421
59       142914
60       221025
61       354233
62       585657
63       400468
64       592692
65       1466342
66       1599723
67       1129709
68       2347202
69       1681311
70       2746055
71       3321385
72       6099820

My FASTQ file uses Illumina 1.5 quality encoding (according to FastQC output). I can see nothing special here regarding this encoding. Maybe I should convert these scores to Sanger Phred+33 encoding?

I found the following thread in bcbio-nextgen github repository: https://github.com/chapmanb/bcbio-nextgen/issues/190 Apparently (according to the errors discussed), they use GATK inside. They used the following sed code snippet to convert some old Illumina encoding to the Sanger encoding: https://github.com/chapmanb/bcbio-nextgen/commit/e99933ae6970e4cc11e75f7b809591d60d71e511.
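For reference, Illumina 1.5 stores Phred+64 qualities, while Sanger (and Illumina 1.8+) uses Phred+33. The same Phred score Q is therefore written as chr(Q + 64) in one scheme and chr(Q + 33) in the other, so converting is a shift of 31 ASCII codes per character. Read this way, the histogram values of 33 to 72 above (decoded as if they were Phred+33) would correspond to true qualities of about 2 to 41. The Python sketch below shows only that arithmetic. The function name is hypothetical, and it assumes every read really is Phred+64 encoded (it rejects characters below '@', ASCII 64).

```python
def phred64_to_phred33(qual: str) -> str:
    """Re-encode a FASTQ quality string from Phred+64 to Phred+33.

    Hypothetical helper: assumes the whole string is Phred+64 encoded.
    """
    out = []
    for ch in qual:
        code = ord(ch)
        if code < 64:
            # Phred+64 cannot produce a character below '@' (which is Q0).
            raise ValueError(f"{ch!r} is not a valid Phred+64 quality character")
        # chr(Q + 64) -> chr(Q + 33): the Phred score is unchanged, the offset drops by 31.
        out.append(chr(code - 31))
    return "".join(out)


if __name__ == "__main__":
    print(phred64_to_phred33("hhB"))  # II#  (Q40, Q40, Q2)
```

In a real FASTQ only every fourth line (the quality line) would be converted. Sequence, header and '+' lines must be left untouched.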


Created 2016-02-03 15:40:20 | Updated | Tags: commandlinegatk haplotypecaller pcrmodel

Comments (6)

Dear GATK team,

I'm a bit confused by the --pcr_indel_model argument in HaplotypeCaller. As I can see from the docs, this argument is not required, but its description still says: "VERY IMPORTANT: when using PCR-free sequencing data we definitely recommend setting this argument to NONE". Is some PCR-bias-oriented filtering performed by default (so that for PCR-free datasets I should set it to NONE), or do I actually not need to set this argument to NONE (even when processing PCR-free datasets) if I simply don't use it?

Best regards, Svyatoslav


Created 2016-01-20 16:24:38 | Updated | Tags: commandlinegatk markduplicates

Comments (5)

Dear GATK team,

Am I right that since MarkDuplicates considers only 5' coordinates of reads, it should work properly on reads (both paired-end and single-end) that have different lengths (due to quality trimming from 3')?
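The reasoning in the question can be checked with a simplified Python sketch. It assumes ungapped alignments with no clipping. Picard actually works from unclipped 5' positions and also takes library, orientation and mate information into account, none of which is modeled here. The function name is hypothetical. For a forward-strand read the 5' end is the alignment start. For a reverse-strand read it is the alignment end, so trimming the 3' end of a reverse-strand read moves its alignment start but not its 5' position.

```python
def five_prime_position(start: int, length: int, reverse: bool) -> int:
    """1-based reference position of a read's 5' end (ungapped, unclipped read)."""
    end = start + length - 1
    # The 5' end of a reverse-strand read is its rightmost aligned base.
    return end if reverse else start


if __name__ == "__main__":
    # Forward read before (50 bp) and after 3' trimming (40 bp): same 5' position.
    print(five_prime_position(100, 50, False), five_prime_position(100, 40, False))  # 100 100
    # Reverse read: 3' trimming removes its leftmost bases, so the start shifts but the 5' end does not.
    print(five_prime_position(151, 50, True), five_prime_position(161, 40, True))  # 200 200
```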


Created 2016-01-15 16:27:38 | Updated | Tags: indelrealigner bqsr commandlinegatk knownsites

Comments (2)

Dear GATK team,

I'd like to check which files from the hg38 bundle I should use for indel realignment and BQSR (I read the manual on this topic, https://broadinstitute.org/gatk/guide/article?id=1247, but would just like to be sure):

1) Am I right that for indel realignment I should use Mills_and_1000G_gold_standard.indels.hg38.vcf and 1000G_phase1.snps.high_confidence.hg38.vcf.gz ?

2) Am I right that for BQSR I should use Mills_and_1000G_gold_standard.indels.hg38.vcf , 1000G_phase1.snps.high_confidence.hg38.vcf.gz , and dbsnp_144.hg38.vcf ?

3) Are there any other files with known sites I should use for indel realignment and BQSR?


Created 2016-01-14 19:02:12 | Updated | Tags: commandlinegatk haplotypecaller multi-sample gatk error

Comments (8)

Hi,

I would like to force call a list of variants across my cohort using HaplotypeCaller to get more accurate QC metrics for each variant. I am using the following command:

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -et NO_ET -K my.key -I my.cohort.list --alleles my.vcf -L my.vcf -out_mode EMIT_ALL_SITES -gt_mode GENOTYPE_GIVEN_ALLELES -stand_call_conf 30.0 -stand_emit_conf 0.0 -dt NONE -o final_my.vcf

Here is a link to the input VCF file: VCF File

Unfortunately, I keep running into the following error (I've tried GATK ver3.3 and ver3.5):

INFO 18:49:21,288 ProgressMeter - chr1:11177077 21138.0 49.5 m 39.0 h 69.4% 71.3 m 21.8 m
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at htsjdk.variant.variantcontext.VariantContext.getAlternateAllele(VariantContext.java:845)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:248)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:1059)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:221)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.3.0-mssm-0-gaa95802):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Index: 3, Size: 3
##### ERROR ------------------------------------------------------------------------------------------

Would appreciate your help in solving this issue.


Created 2016-01-14 16:12:55 | Updated | Tags: commandlinegatk markduplicates rmdup

Comments (2)

Dear GATK team,

I'm going to do variant calling for several tens of samples using hg38 reference with GATK. I have several questions about this process. They are partially covered on forums and in FAQs, but I'd like to clarify some points:

1) Am I right that MarkDuplicates can process a BAM file that contains both paired-end and single-end reads? (Picard FAQ hints it can, but just to be sure.)

2) Am I right that MarkDuplicates is significantly slower than samtools rmdup (because of its algorithm that marks not only dupes from the same chromosome, but also dupes from different chromosomes)?

3) Is there any evidence that use of MarkDuplicates is significantly better for the downstream analysis with GATK than use of samtools rmdup? (Of course, MarkDuplicates is used in the Best Practices, but Picard tools are used everywhere in that guide.)

Remarks:

1) I use bowtie2 --very-sensitive for read mapping.

2) I'd like to get a gVCF file for each sample.


Created 2015-11-13 19:02:51 | Updated | Tags: commandlinegatk developer development customwalker maven

Comments (2)

Hello.

Because the thread about distributing custom walkers has been retired, and the other thread I found does not help with my issue, I would like to ask this question again.

I'm planning to write custom walkers for the GATK public framework (using a Maven dependency via jitpack.io) and to distribute that software, including the custom walkers, as a jar file with dependencies, where the only walkers available from the command line will be the custom ones. Although the logging for my command line will still report the version of the GATK framework in use, I would like to use different logging at program startup and on errors (instead of the link to the GATK webpage), so I will need to create and/or extend classes from the GATK engine (for the command line).

Of course, the software will be open source (MIT license), but I wonder whether it is possible to distribute software in this way. In addition, I would like to know whether the developers could make a tutorial on how to start the GATK engine with a custom walker without using the GATK command line.

Thank you very much in advance.


Created 2015-09-14 15:21:42 | Updated | Tags: commandlinegatk selectvariants runtime-error

Comments (3)

Hi, I'm trying to extract a few samples from a large VCF file with many samples using SelectVariants and I keep running into this error.

Command: java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T SelectVariants -R /Volumes/odin/reference/16484/mafa5/mafa5. -V 16557.all.exons.mafa5.ann.vcf.gz -o 16580.mafa5.M1M1.vcf -sn CY0320 -sn CY0321 -sn CY0322 -sn CY0323 -sn CY0324 -sn CY0325

ERROR stack trace

java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:195)
    at org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:329)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.initializeReferenceSequence(LocusReferenceView.java:150)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.<init>(LocusReferenceView.java:126)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:90)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

Any thoughts as to why this might be happening?


Created 2015-07-31 22:11:45 | Updated | Tags: commandlinegatk haplotypecaller gatk error

Comments (4)

Hello,

I am receiving the following error. I am working with SAM files that were exported from CLC and then edited with Picard tools to add read groups. I am not sure whether I need an additional step to solve this problem, and I cannot find any documentation about this error.

Please let me know what I need to do to correct this issue.

Thank you!

gatk -T HaplotypeCaller -R spinach_assembly-repeatdetect_PACBIO_V1.3_formated_60.fa -I .sam.list -drf DuplicateRead --alleles Unfiltered_Spinach_PacBio_Reseq_12_Geno_Assay_SNP.fixed.noblanks.vcf --genotyping_mode GENOTYPE_GIVEN_ALLELES --output_mode EMIT_ALL_SITES -o output_raw_unfiltered_spinach_snps_gbs.vcf
INFO 14:48:44,450 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:48:44,453 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 14:48:44,454 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 14:48:44,454 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 14:48:44,458 HelpFormatter - Program Args: -T HaplotypeCaller -R spinach_assembly-repeatdetect_PACBIO_V1.3_formated_60.fa -I .sam.list -drf DuplicateRead --alleles Unfiltered_Spinach_PacBio_Reseq_12_Geno_Assay_SNP.fixed.noblanks.vcf --genotyping_mode GENOTYPE_GIVEN_ALLELES --output_mode EMIT_ALL_SITES -o output_raw_unfiltered_spinach_snps_gbs.vcf
INFO 14:48:44,468 HelpFormatter - Executing as ahulse@jalapeno.genomecenter.ucdavis.edu on Linux 2.6.18-348.12.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_05-b13.
INFO 14:48:44,469 HelpFormatter - Date/Time: 2015/07/31 14:48:44
INFO 14:48:44,469 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:48:44,470 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:48:45,102 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:48:45,385 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 14:48:45,394 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 14:48:48,432 SAMDataSource$SAMReaders - Init 50 BAMs in last 3.04 s, 50 of 80 in 3.04 s / 0.05 m (16.46 tasks/s). 30 remaining with est. completion in 1.82 s / 0.03 m
INFO 14:48:50,052 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 4.66
INFO 14:48:50,164 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 14:48:54,742 RMDTrackBuilder - Writing Tribble index to disk for file /local/scratch/scratch/Amanda/Spinach_GBS/Unfiltered_Spinach_PacBio_Reseq_12_Geno_Assay_SNP.fixed.noblanks.vcf.idx
INFO 14:48:58,784 GenomeAnalysisEngine - Preparing for traversal over 80 BAM files
INFO 14:49:00,054 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A BAM ERROR has occurred (version 3.4-46-gbc02625):
ERROR
ERROR This means that there is something wrong with the BAM file(s) you provided.
ERROR The error message below tells you what is the problem.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum until you have followed these instructions:
ERROR - Make sure that your BAM file is well-formed by running Picard's validator on it
ERROR (see http://picard.sourceforge.net/command-line-overview.shtml#ValidateSamFile for details)
ERROR - Ensure that your BAM index is not corrupted: delete the current one and regenerate it with 'samtools index'
ERROR
ERROR MESSAGE: Cannot retrieve file pointers within SAM text files.
ERROR ------------------------------------------------------------------------------------------

Created 2015-03-04 17:41:22 | Updated | Tags: unifiedgenotyper commandlinegatk

Comments (2)

I'm encountering a problem similar to one I had with a mismatch between reference and cosmic files. Specifically, I created .bam files using human_g1k_v3.fasta, which names chromosomes 1...22, X, Y, etc. When I try to run UnifiedGenotyper with this reference file, the .bam file I created, and dbsnp_137.b37.vcf, I get an error because the SNP chromosomes are named chr1...chr22, chrX, chrY, etc. Other than writing a script to remove all occurrences of "chr", is there another way around this problem, i.e. a dbSNP file that uses the desired chromosome names without the "chr" prefix?

I resolved the problem with reference/cosmic by finding a cosmic file with consistent notation, but can't find a similar fix for this one. I'd appreciate any suggestions.
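If a script does turn out to be necessary, here is a minimal Python sketch of the renaming. It is hypothetical and assumes a plain (uncompressed) VCF whose names differ only by the "chr" prefix, except chrM, which it maps to MT. Renaming alone may not be enough: hg19 and b37 use different mitochondrial sequences and a different contig order, and GATK expects the VCF contigs to match the reference sequence dictionary. A dbSNP file built for b37, where one is available, is therefore the safer fix.

```python
import re
from typing import Iterable, Iterator

# Hypothetical mapping for contigs whose b37 name is not simply "chr" stripped.
SPECIAL = {"chrM": "MT"}
CONTIG_RE = re.compile(r"^(##contig=<ID=)([^,>]+)")


def strip_chr(name: str) -> str:
    if name in SPECIAL:
        return SPECIAL[name]
    return name[3:] if name.startswith("chr") else name


def rename_vcf_lines(lines: Iterable[str]) -> Iterator[str]:
    """Rename contigs in ##contig=<ID=...> header lines and in the CHROM column."""
    for line in lines:
        if line.startswith("##contig="):
            line = CONTIG_RE.sub(lambda m: m.group(1) + strip_chr(m.group(2)), line)
        elif not line.startswith("#"):
            # Data line: CHROM is everything before the first tab.
            chrom, sep, rest = line.partition("\t")
            line = strip_chr(chrom) + sep + rest
        yield line


def rename_vcf_file(in_path: str, out_path: str) -> None:
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(rename_vcf_lines(src))
```

Usage would be along the lines of rename_vcf_file("dbsnp_chr.vcf", "dbsnp_nochr.vcf") (file names hypothetical), after which any existing .idx index for the old file should be discarded so that GATK regenerates it.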


Created 2015-02-24 23:21:25 | Updated | Tags: commandlinegatk intervals

Comments (3)

Hi, I hope this is a quick question; but does using the 'include intervals' command line option '-L' only include the region specified?

For instance, if I have a file that includes reads for chromosomes 1, 2, 6, X, and Y and I specify "-L 6", will the walker only process chromosome 6, or will it include the rest of my data as well?

Thank you for the clarification!
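As a rough illustration of what `-L` means: each interval restricts traversal to that region. A bare contig name such as `6` stands for the whole contig, so with `-L 6` only data on contig 6 is processed, and reads on 1, 2, X and Y are skipped. The Python sketch below is hypothetical. It mimics only the common 1-based, inclusive interval string forms (`6`, `6:1000`, `6:1000-2000`) and is not GATK's own parser. Contig names containing `:` would break it.

```python
from typing import Optional, Tuple

# Hypothetical representation: (contig, start, stop); stop=None means "to the end".
Interval = Tuple[str, int, Optional[int]]


def parse_interval(text: str) -> Interval:
    """Parse 'contig', 'contig:pos' or 'contig:start-stop' (1-based, inclusive)."""
    contig, _, span = text.strip().partition(":")
    if not span:
        return contig, 1, None  # whole contig
    start, _, stop = span.partition("-")
    start_pos = int(start.replace(",", ""))
    stop_pos = int(stop.replace(",", "")) if stop else start_pos
    return contig, start_pos, stop_pos


def overlaps(interval: Interval, contig: str, pos: int) -> bool:
    """True if the 1-based position pos on contig falls inside the interval."""
    iv_contig, start, stop = interval
    return contig == iv_contig and pos >= start and (stop is None or pos <= stop)
```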


Created 2014-12-31 10:00:00 | Updated | Tags: commandlinegatk

Comments (3)

Hi everyone,

I'm using GATK HaplotypeCaller, and I recently read a document about optimizing GATK that mentions GPUs, IBM Power8, etc. Its conclusion is that GATK could run faster than the current implementation. In this document you suggest developing a C or C++ version alongside the current Java implementation. Have you made any progress on this so far, and is a release planned? I'm also interested to know which technology you're considering for GATK HC: C, C++, MPI, OpenMP, GPU...?

Regards

Frédéric


Created 2014-11-13 20:22:25 | Updated 2014-11-13 20:27:19 | Tags: commandlinegatk depthofcoverage

Comments (2)

Hi,

I ran the DepthOfCoverage tool with two different BED files, using the following commands:

  java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R ucsc_hg19.fa -I WT_recalibrated.bam -L coverage_summary.bed -ct 1 -ct 10 -ct 20 -ct 30 -ct 50 -ct 100 -o WT_cov

The line count for the input and output:

$ wc -l WT_cov.sample_interval_summary
4988 WT_cov.sample_interval_summary
$ wc -l coverage_summary.bed 
10585 coverage_summary.bed

In the other case:

  java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R ucsc_hg19.fa -I WT_recalibrated.bam -L exon.bed -ct 1 -ct 10 -ct 20 -ct 30 -ct 50 -ct 100 -o WT_exon

Line count for the input and output:

 $ wc -l WT_exon.sample_interval_summary 
 5065 WT_exon.sample_interval_summary
 $ wc -l exon.bed 
 5065 exon.bed

The input in both the cases is of the standard format as shown below:

 chr1    6529578 6529755  
 chr1    6530273 6530442
 chr1    6530543 6530721
 chr1    6530773 6530980 
 chr1    6531028 6531730
 chr1    6531768 6531914
 chr1    6532563 6532713
 chr1    6533023 6533273

Could anyone help me interpret the discrepancy between the number of target regions in the BED file and the number in the _interval_summary file in the two cases above?
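One likely explanation, not confirmed in the thread, is interval merging. Before traversal, GATK merges overlapping intervals, and under its default `--interval_merging` setting also adjacent ones. A BED file with overlapping or abutting targets would then produce fewer rows in `sample_interval_summary` than it has lines. The summary file also begins with a header row, which adds one line to its `wc -l` count. The Python sketch below counts how many intervals remain after such a merge, so the result can be compared with the summary. It treats the input as BED (0-based, half-open), where `[a, b)` and `[b, c)` abut. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

BedInterval = Tuple[str, int, int]  # (contig, start, end), BED 0-based half-open


def read_bed(lines: Iterable[str]) -> List[BedInterval]:
    """Parse BED lines into (contig, start, end), skipping blank and header lines."""
    intervals = []
    for line in lines:
        fields = line.split()  # tolerates tabs, spaces and trailing whitespace
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def merge_intervals(intervals: Iterable[BedInterval], merge_abutting: bool = True) -> List[BedInterval]:
    """Merge overlapping (and optionally abutting) half-open intervals per contig."""
    merged: List[BedInterval] = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig:
            _, last_start, last_end = merged[-1]
            touches = start <= last_end if merge_abutting else start < last_end
            if touches:
                merged[-1] = (contig, last_start, max(last_end, end))
                continue
        merged.append((contig, start, end))
    return merged
```

For example, `len(merge_intervals(read_bed(open("coverage_summary.bed"))))` gives the number of distinct targets to compare against the summary's data rows.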


Created 2014-10-29 10:11:15 | Updated | Tags: commandlinegatk variantfiltration filterexpression

Comments (2)

Hi all, I tried to filter my raw VCF file with the following command:
java -Xmx30g -jar ../GATK/GenomeAnalysisTK.jar -R ../ref.fa -T VariantFiltration --filterExpression " QD < 20.0 || ReadPosRankSum < -8.0 || FS > 10.0 || QUAL < $MEANQUAL || MQ <30.0 || DP< 10.0 " --filterName LowQualFilter --missingValuesInExpressionsShouldEvaluateAsFailing --variant ../s1.raw.vcf --logging_level ERROR -o ../s1.makered.raw.vcf

grep -v "Filter" s1.makered.raw.vcf >s1.flt.vcf

Afterwards I checked the result file s1.flt.vcf and found the following records marked "PASS". The command evidently doesn't work, since DP=8 should have caused these records to be marked "LowQualFilter".

Chr01 231575 . A G 241.78 PASS AC=2;AF=1.00;AN=2;DP=8;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=29.00;MQ0=0;QD=30.22 GT:AD:DP:GQ:PL 1/1:0,8:8:24:270,24,0
Chr01 237476 . T C 238.78 PASS AC=2;AF=1.00;AN=2;DP=8;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=29.00;MQ0=0;QD=29.85 GT:AD:DP:GQ:PL 1/1:0,8:8:24:267,24,0

No error is reported. Any suggestions would be appreciated.
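To sanity-check what the filter ought to do, the Python sketch below evaluates the same thresholds on the first record above. It treats a missing annotation as failing, which is the intent of `--missingValuesInExpressionsShouldEvaluateAsFailing`. By these rules the record fails on both `MQ` and `DP`. `ReadPosRankSum` is absent altogether, which is expected at a homozygous-alt site with no reference reads. The value `mean_qual=100.0` is a hypothetical stand-in for `$MEANQUAL`. That variable is expanded by the shell, so it is worth confirming it is set where the command runs. Splitting the expression into several `--filterExpression`/`--filterName` pairs would also show which term fires. The function names are hypothetical.

```python
import operator
from typing import Dict, List


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string like 'DP=8;MQ=29.00' into a dict (flags map to '')."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def failing_terms(qual: float, info: str, mean_qual: float) -> List[str]:
    """Return names of the filter terms that fail; missing values count as failing."""
    values = parse_info(info)
    values["QUAL"] = str(qual)
    # Same thresholds as the --filterExpression above.
    rules = [
        ("QD", operator.lt, 20.0),
        ("ReadPosRankSum", operator.lt, -8.0),
        ("FS", operator.gt, 10.0),
        ("QUAL", operator.lt, mean_qual),
        ("MQ", operator.lt, 30.0),
        ("DP", operator.lt, 10.0),
    ]
    failed = []
    for name, compare, threshold in rules:
        raw = values.get(name)
        if raw is None or raw == "" or compare(float(raw), threshold):
            failed.append(name)
    return failed
```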


Created 2014-09-15 08:35:02 | Updated | Tags: commandlinegatk haplotypecaller runtime-error

Comments (1)

Hi,

I'm trying to call variants on WGS data using the following command (on a high-performance cluster using 4 cores per job) :

java -Xmx6G -jar $CLASSPATH -T HaplotypeCaller --dbsnp GRCh37-lite.vcf -nct 4 -R GRCh37-lite.fa -I /user/data/gent/gvo000/gvo00027/vsc40035/StJude/001/001_D.bam -maxAltAlleles 10 --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o 001_D.vcf

$CLASSPATH contains the location of the .jar file.

For 20 of the 70 samples the script finished without a problem; for the other 50 samples the same error message was returned, shown below. Can you tell what's going wrong?

Kind regards, Steve

INFO 07:28:05,514 ProgressMeter - 2:92269030 2.49e+08 3.1 h 44.0 s 11.0% 27.9 h 24.8 h
INFO 07:29:05,515 ProgressMeter - 2:92309304 2.49e+08 3.1 h 44.0 s 11.0% 28.1 h 25.0 h
INFO 07:30:05,516 ProgressMeter - 2:92320228 2.49e+08 3.1 h 44.0 s 11.0% 28.2 h 25.1 h
INFO 07:30:06,052 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.NullPointerException
	at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:443)
	at org.broadinstitute.sting.gatk.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeDiploidHaplotypeLikelihoods(PairHMMLikelihoodCalculationEngine.java:417)
	at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.calculateGLsForThisEvent(GenotypingEngine.java:385)
	at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:222)
	at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:880)
	at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:141)
	at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
	at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
	at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

Created 2014-09-09 11:11:57 | Updated 2014-09-09 17:13:58 | Tags: combinevariants commandlinegatk

Comments (19)

Hi,

I have used CombineVariants to combine variants from GATK and samtools as shown below:

java -jar GenomeAnalysisTK.jar -T CombineVariants -R ref.fa --variant:GatkSNP GATKsnp.vcf --variant:GatkINDEL GATKind.vcf --variant:SamSNP Samsnp.vcf --variant:SamINDEL Samind.vcf -o allvar.vcf -genotypeMergeOptions PRIORITIZE -priority GatkSNP,GatkINDEL,SamSNP,SamINDEL --filteredrecordsmergetype KEEP_UNCONDITIONAL

This merges all the variants. However, with the above command, variants present in both the GATK and samtools call sets are emitted with the samtools record.

I would like to get all the variants such that:

  • variants present in both GATK and samtools, taken from the GATK VCF files
  • variants in only GATK
  • variants in only samtools

Could someone suggest some ideas, or is there something to be fixed in the command?

Thanks
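The behaviour being asked for is a priority merge: for each site found in several call sets, keep the record from the highest-priority source. The Python sketch below shows that logic, with hypothetical names and records simplified to dicts. It matches records on CHROM, POS, REF and ALT. If GATK and samtools represent the same indel differently, for example at a different position or with different padding, those records are not treated as the same site and both are kept. Whether CombineVariants matches records in a similar way is an assumption worth checking on the actual files.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]               # hypothetical: {"chrom", "pos", "ref", "alt", ...}
SiteKey = Tuple[str, int, str, str]


def priority_merge(sources: Dict[str, List[Record]], priority: List[str]) -> Dict[SiteKey, Tuple[str, Record]]:
    """Keep, for each (chrom, pos, ref, alt), the record from the highest-priority source.

    Sources not named in `priority` are ignored.
    """
    chosen: Dict[SiteKey, Tuple[str, Record]] = {}
    for name in priority:  # earlier names win because setdefault keeps the first entry
        for rec in sources.get(name, []):
            key = (rec["chrom"], int(rec["pos"]), rec["ref"], rec["alt"])
            chosen.setdefault(key, (name, rec))
    return chosen
```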


Created 2014-08-09 01:34:03 | Updated | Tags: indelrealigner commandlinegatk

Comments (3)

Dear GATK help team,

I have a cut chromosome file (chr17) that I have processed through sorting, alignment, adding headers, and RealignerTargetCreator. However, when I try to run IndelRealigner, I receive an error:
ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The contig index 0 is bad, doesn't equal the contig index 17 of the contig from a string chr17

I split the genome into chromosomes and processed chr17 first, since that is my region of interest and I thought it might avoid memory issues.

I am a newbie. I have checked the forum for help but found only one similar post, with no solution. Please help; I'm stuck at this stage. My IndelRealigner command is the following:
java -jar /Users/yotsukurasohiya/build/softwares/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -T IndelRealigner -R /Volumes/Pegasus/broadref/ucsc.hg19.fasta -I /Volumes/Pegasus/tmp/mardup.pregatk.bam -targetIntervals 2_target_intervals.list -known /Volumes/Pegasus/broadref/Mills_and_1000G_gold_standard.indels.hg19.vcf -known /Volumes/Pegasus/broadref/dbsnp_138.hg19.vcf -known /Volumes/Pegasus/broadref/1000G_phase1.snps.high_confidence.hg19.vcf -o 2_realigned_reads.bam

I added the read-group headers with Picard AddOrReplaceReadGroups: SO=coordinate CREATE_INDEX=true SM=temp PL=Illumina PU=barcode LB=bar ID=id
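The message suggests that the BAM's sequence dictionary (its `@SQ` header lines) lists chr17 as contig 0, whereas in ucsc.hg19.fasta chr17 is contig 17. That pattern would be expected if the reads were aligned against a reference containing only chr17 and then realigned against the full genome. One way to confirm it is to compare the `@SQ` order in `samtools view -H` output for the BAM with the order in the reference's `.dict` file. The Python sketch below does that comparison on header text; the function names are hypothetical. If the orders differ, the usual options are to align against the same full reference used downstream, or to reorder the BAM with Picard ReorderSam.

```python
from typing import List, Optional, Tuple


def sq_names(header_text: str) -> List[str]:
    """Return the SN: values of @SQ lines, in order (SAM headers and .dict files)."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def first_order_mismatch(bam_names: List[str], ref_names: List[str]) -> Optional[Tuple[str, int, Optional[int]]]:
    """Return (contig, index in BAM, index in reference) for the first contig whose index differs."""
    ref_index = {name: i for i, name in enumerate(ref_names)}
    for i, name in enumerate(bam_names):
        if ref_index.get(name) != i:
            return name, i, ref_index.get(name)
    return None
```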


Created 2014-04-11 04:25:31 | Updated | Tags: commandlinegatk catvariants

Comments (2)

Running CatVariants from the command line fails.

Program version: GATK 3.1-1-g07a4bf8.

ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: CatVariants

Code to invoke:

java -jar GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T CatVariants -R file.fasta

Note that in the current documentation for CatVariants the example lists the name as 'org.broadinstitute.sting.tools.CatVariants' rather than just CatVariants. Trying the listed string fails with the same error.


Created 2013-10-30 17:03:49 | Updated | Tags: commandlinegatk catvariants

Comments (3)

Below is the command:

java -cp $CLASSPATH/GenomeAnalysisTK.jar org.broadinstitute.sting.tools.CatVariants \
-R GATK_ref/hg19.fasta \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-1.vcf \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-2.vcf \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-3.vcf \
-out ../GATK/VQSR/parallel_batch/combined_raw.snps_indels.vcf \
-log ../GATK/VQSR/parallel_batch/log/combined.log \
-assumeSorted

After this, the combined_raw.snps_indels.vcf file contains only the header from raw.snps_indels-1.vcf. What might be wrong?


Created 2013-10-24 17:59:58 | Updated | Tags: commandlinegatk

Comments (9)

I'm running the latest GATK nightly build to process human exome-seq data (12 samples). It seemed to be faster than the older version until I ran HaplotypeCaller, whose run summary shows it will take 14 days to finish. Is there anything in the command below that I could change to make it faster without losing data in the output?

java -Xmx10g -Djava.io.tmpdir=/temp/GATK_temp \
-jar $CLASSPATH/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R ../GATK_ref/hg19.fasta \
-I ./compressedbam.list \
-L ../GATK_ref/hg19knownGene_UCSC_sorted.bed \
-log ../GATK/VQSR/log/HaplotypeCaller_20131018.log \
-o ../GATK/VQSR/raw.snps_indels.vcf
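A common way to shorten wall-clock time without dropping data is to scatter the work. Split the target intervals into several chunks, run one HaplotypeCaller job per chunk with its own `-L` file, then concatenate the per-chunk VCFs afterwards, e.g. with CatVariants. The Python sketch below is one hypothetical way to do the splitting. It balances chunks by total base pairs and keeps each interval whole, so no target is cut at a chunk boundary. It assumes the BED file is already sorted, so that concatenating the chunk outputs in order stays sorted.

```python
from typing import List, Sequence, Tuple

BedInterval = Tuple[str, int, int]  # (contig, start, end), BED 0-based half-open


def split_intervals(intervals: Sequence[BedInterval], n_chunks: int) -> List[List[BedInterval]]:
    """Split sorted intervals into at most n_chunks consecutive groups of similar total length."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    total = sum(end - start for _, start, end in intervals)
    target = total / n_chunks
    chunks: List[List[BedInterval]] = []
    current: List[BedInterval] = []
    current_len = 0
    for interval in intervals:
        current.append(interval)
        current_len += interval[2] - interval[1]
        # Close this chunk once it reaches its share; the last chunk takes the remainder.
        if current_len >= target and len(chunks) < n_chunks - 1:
            chunks.append(current)
            current, current_len = [], 0
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be written out as its own BED file and passed to a separate job via `-L`.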

Created 2013-07-31 16:25:00 | Updated | Tags: commandlinegatk

Comments (2)

If I put my input files as a list in a file named "input.list", how do I set the output names? Or do I just set the output folder, and the output files are named automatically?
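In general a GATK walker writes to the file given to `-o`. A run over `input.list` therefore produces a single output, which for the variant callers is one multi-sample VCF, rather than one file per input. IndelRealigner is an exception: its `--nWayOut` option writes one BAM per input. For other tools, one approach is to loop over the list and run the tool once per file. The Python sketch below only derives per-input output names from the list. The output directory and the `.realigned.bam` suffix in the usage are hypothetical choices.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable


def output_names(list_lines: Iterable[str], out_dir: str, suffix: str) -> Dict[str, str]:
    """Map each input path in a GATK .list file to a hypothetical output path in out_dir."""
    mapping = {}
    for line in list_lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines
        stem = PurePosixPath(path).name
        if stem.endswith(".bam"):
            stem = stem[: -len(".bam")]
        mapping[path] = str(PurePosixPath(out_dir) / (stem + suffix))
    return mapping
```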


Created 2013-07-10 22:31:37 | Updated 2013-07-10 22:34:34 | Tags: realignertargetcreator commandlinegatk

Comments (5)

I used BWA-MEM for alignment and Picard to process the SAM files (conversion to BAM, ReorderSam, AddOrReplaceReadGroups, etc.). The GATK version I'm using is 2.5-2-gf57256b; I cannot run 2.6 because the server only has Java 6 and I cannot upgrade it to Java 7.

I got a long stack of error messages when I ran this RealignerTargetCreator command:

java -Xmx2g -jar $CLASSPATH/GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-R /Volumes/files/Users/user1/GATK_ref/hg19.fasta \
-I sorted_Deduped_reorder_grp.bam \
-o ./GATK/forIndelRealigner.intervals

The error messages are below (sorry, there are a lot). I don't understand why GATK needs to connect to a window server, or what the permission problem is. I am using a remote server running Mac OS X. Thank you.

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.InternalError: Can't connect to window server - not enough permissions.
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1827)
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1724)
	at java.lang.Runtime.loadLibrary0(Runtime.java:823)
	at java.lang.System.loadLibrary(System.java:1045)
	at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:50)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.awt.Toolkit.loadLibraries(Toolkit.java:1605)
	at java.awt.Toolkit.<clinit>(Toolkit.java:1627)
	at sun.awt.AppContext$2.run(AppContext.java:240)
	at sun.awt.AppContext$2.run(AppContext.java:226)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.awt.AppContext.initMainAppContext(AppContext.java:226)
	at sun.awt.AppContext.access$200(AppContext.java:112)
	at sun.awt.AppContext$3.run(AppContext.java:306)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.awt.AppContext.getAppContext(AppContext.java:287)
	at com.sun.jmx.trace.Trace.out(Trace.java:180)
	at com.sun.jmx.trace.Trace.isSelected(Trace.java:88)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.isTraceOn(DefaultMBeanServerInterceptor.java:1830)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:929)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:916)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312)
	at com.sun.jmx.mbeanserver.JmxMBeanServer$2.run(JmxMBeanServer.java:1195)
	at java.security.AccessController.doPrivileged(Native Method)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.initialize(JmxMBeanServer.java:1193)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.<init>(JmxMBeanServer.java:225)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.<init>(JmxMBeanServer.java:170)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.newMBeanServer(JmxMBeanServer.java:1401)
	at javax.management.MBeanServerBuilder.newMBeanServer(MBeanServerBuilder.java:93)
	at javax.management.MBeanServerFactory.newMBeanServer(MBeanServerFactory.java:311)
	at javax.management.MBeanServerFactory.createMBeanServer(MBeanServerFactory.java:214)
	at javax.management.MBeanServerFactory.createMBeanServer(MBeanServerFactory.java:175)
	at sun.management.ManagementFactory.createPlatformMBeanServer(ManagementFactory.java:302)
	at java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:504)
	at org.broadinstitute.sting.gatk.executive.MicroScheduler.<init>(MicroScheduler.java:222)
	at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.<init>(LinearMicroScheduler.java:70)
	at org.broadinstitute.sting.gatk.executive.MicroScheduler.create(MicroScheduler.java:169)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.createMicroscheduler(GenomeAnalysisEngine.java:443)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:272)
	at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
	at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-gf57256b):

Created 2013-06-21 07:05:03 | Updated | Tags: unifiedgenotyper commandlinegatk

Comments (6)

Dear GATK Users,

Could anybody tell me how to identify deletions from a BAM file using a GATK tool? I used UnifiedGenotyper, but I am getting a list like this:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT human

gi|262 48155 . G A 80.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.103;DP=10;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.500;MQ=28.61;MQ0=0;MQRankSum=-1.453;QD=8.08;ReadPosRankSum=-0.336 GT:AD:DP:GQ:PL 0/1:5,5:10:99:109,0,146

Thanks Sridhar
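Two points may help here. First, UnifiedGenotyper calls only SNPs by default; indels, including deletions, are called when `-glm INDEL` or `-glm BOTH` is given. Second, in the resulting VCF a deletion is a record whose REF allele is longer than its ALT allele (e.g. REF `GTA`, ALT `G`). The Python sketch below classifies alleles that way and pulls out deletion records from a tab-delimited VCF. The function names are hypothetical, and symbolic alleles such as `<DEL>` are not handled.

```python
from typing import Iterable, List


def allele_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "DELETION" if len(ref) > len(alt) else "INSERTION"


def deletion_records(vcf_lines: Iterable[str]) -> List[str]:
    """Return data lines with at least one ALT allele shorter than REF."""
    hits = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if any(allele_type(ref, alt) == "DELETION" for alt in alts):
            hits.append(line)
    return hits
```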


Created 2013-06-12 16:09:40 | Updated | Tags: commandlinegatk workflow rnaseq

Comments (15)

Hi all: I find that among all the GATK workflows (http://www.broadinstitute.org/gatk/guide/topic?name=methods-and-workflows) there are none for RNA-seq analysis. I understand that GATK mainly focuses on variant calling; can anyone tell me how to use GATK for RNA-seq analysis?

Thanks, Daniel


Created 2013-06-07 18:45:10 | Updated 2013-06-07 18:45:59 | Tags: commandlinegatk phasebytransmission genotype-likelihood

Comments (8)

Hello Team,

I am attempting to run GATK's PhaseByTransmission to phase a VCF file containing a father, mother and son trio, generated with the Complete Genomics mkvcf command.

After creating the PED file and running the command, I get the error: "MESSAGE: BUG: Attempted to get likelihoods as strings and neither the vector nor the string is set!". I am not sure what this means.

When I check my file against the documentation, I can see that the 'GL' field is present in the file. Could the GL field still be the problem? I have attached a few lines from the VCF I am using.

Any help with resolving this issue would be greatly appreciated.

Thank you

JumaQuar
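One thing worth checking, as a guess rather than a confirmed cause, is whether every sample has a value in the likelihood field on every record. The FORMAT column can list `GL` while an individual sample's entry is `.` or truncated, for example for no-call genotypes. The Python sketch below reports such records for a given FORMAT key in a tab-delimited VCF. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def missing_likelihoods(vcf_lines: Iterable[str], key: str = "GL") -> List[Tuple[str, str, str]]:
    """List (chrom, pos, sample) where FORMAT declares `key` but the sample has no value."""
    samples: List[str] = []
    problems = []
    for line in vcf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if line.startswith("#") or len(fields) < 10:
            continue
        format_keys = fields[8].split(":")
        if key not in format_keys:
            continue
        idx = format_keys.index(key)
        for sample, entry in zip(samples, fields[9:]):
            values = entry.split(":")
            # Trailing FORMAT fields may be dropped entirely, or given as '.'.
            if idx >= len(values) or values[idx] in ("", "."):
                problems.append((fields[0], fields[1], sample))
    return problems
```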


Created 2013-06-06 21:08:30 | Updated | Tags: commandlinegatk queue hadoop mapreduce google

Comments (8)

Hello, I'm new to GATK and Queue. I understand that we can write a QScript in Queue to generate separate GATK jobs and run them on a cluster of several nodes. Can GATK or Queue be run on Google's Hadoop?


Created 2013-04-22 13:27:13 | Updated 2013-04-22 13:29:47 | Tags: commandlinegatk intervals

Comments (3)

I got this error message, when trying to use a file to specify at which positions to emit variants:

ERROR MESSAGE: Couldn't read file /lustre/scratch109/sanger/tc9/agv/wgs/pipeline/union4x.positions because The interval file /lustre/scratch109/sanger/tc9/agv/wgs/pipeline/union4x.positions does not have one of the supported extensions (.bed, .list, .picard, .interval_list, or .intervals). Please rename your file with the appropriate extension.

Is there a GATK page describing those 5 file formats? Some of them, e.g. .list, are unknown to me.

I asked my question here, but please ignore it: http://gatkforums.broadinstitute.org/discussion/2219/l-option

Thanks a lot.

Also, the error message does not mention support for VCF files, but the documentation does. Are VCF files supported?
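As the error message says, GATK chooses the interval file type by extension. Suppose `union4x.positions` holds one `contig position` pair per line; this is an assumption, since its layout isn't shown. It can then be rewritten in the one-interval-per-line `contig:position` form (or `contig:start-stop`) that GATK reads from `.list`/`.intervals` files, using 1-based coordinates. `.bed` files use 0-based, half-open coordinates instead, so a positions file should not simply be renamed to `.bed`. A hypothetical Python sketch of the conversion:

```python
from typing import Iterable, List


def positions_to_intervals(lines: Iterable[str]) -> List[str]:
    """Turn 'contig position' lines (1-based) into GATK-style 'contig:position' lines."""
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank lines, comments and malformed rows
        out.append(f"{fields[0]}:{int(fields[1])}")
    return out
```

Writing `"\n".join(...)` of the result to a file such as `union4x.intervals` would give a file that `-L` accepts by extension.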


Created 2013-03-25 10:37:06 | Updated 2013-03-25 10:42:06 | Tags: commandlinegatk printreadswalker

Comments (5)

Hi all! I'm trying to complete my first GATK run. The steps I ran are listed in the "EXECUTION STEP" section below.

Please tell me whether the steps are generally correct.

------------------------------------------------------------------ERRORS----------------------------------------------------------------

Step 4.1 does not run unless I add -maxCycle 1500.

When I try to execute step 4.2, I get the following error:

ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Key 1036 is too large for dimension 2 (max is 1001)
	at org.broadinstitute.sting.utils.collections.NestedIntegerArray.put(NestedIntegerArray.java:128)
	at org.broadinstitute.sting.utils.recalibration.RecalibrationReport.parseAllCovariatesTable(RecalibrationReport.java:157)
	at org.broadinstitute.sting.utils.recalibration.RecalibrationReport.<init>(RecalibrationReport.java:68)
	at org.broadinstitute.sting.utils.recalibration.BaseRecalibration.<init>(BaseRecalibration.java:74)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.setBaseRecalibration(GenomeAnalysisEngine.java:217)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:253)
	at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
	at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Key 1036 is too large for dimension 2 (max is 1001)
ERROR ------------------------------------------------------------------------------------------

---------------------------------------------------------------EXECUTION STEP---------------------------------------------------------

2 MARKING PCR DUPLICATE

java -Xmx4g -Djava.io.tmpdir=/tmp -jar MarkDuplicates.jar INPUT=M9.bam OUTPUT=m9.marked.bam METRICS_FILE=metrics CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT

3 LOCAL REALIGNMENT AROUND INDEL

3.1

java -Xmx4g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta -known dbsnp_137.hg19.vcf -o m9.list -I m9.marked.bam

3.2

java -Xmx4g -Djava.io.tmpdir=/tmp -jar GenomeAnalysisTK.jar -I m9.marked.bam -R ucsc.hg19.fasta -T IndelRealigner -targetIntervals m9.list -known dbsnp_137.hg19.vcf -o m9.marked.realigned.bam

3.2.1

java -Xmx4g -jar GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T ReduceReads -I m9.marked.realigned.bam -o m9.marked.realigned.reduce.bam

3.3

java -Djava.io.tmpdir=/tmp/flx-auswerter -Xmx4g -jar FixMateInformation.jar INPUT=m9.marked.realigned.reduce.bam OUTPUT=m9.marked.realigned.reduce.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

4 QUALITY SCORE RECALIBRATION

4.1

java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R ucsc.hg19.fasta -knownSites dbsnp_137.hg19.vcf -I m9.marked.realigned.reduce.fixed.bam -T BaseRecalibrator -maxCycle 1500 -cov ReadGroupCovariate -cov QualityScoreCovariate -o m9.recal_data.grp

4.2 ***

java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R ucsc.hg19.fasta -I m9.marked.realigned.reduce.fixed.bam -BQSR m9.recal_data.grp -o m9.marked.realigned.reduce.fixed.recal.bam

5 SNP CALLING

5.1

java -Xmx4g -jar GenomeAnalysisTK.jar -nct 4 --num_threads 4 -glm BOTH -R ucsc.hg19.fasta -T UnifiedGenotyper --sample_ploidy 5 -I m9.marked.realigned.reduce.fixed.bam -D dbsnp_137.hg19.vcf -o m9.vcf -stand_call_conf 20.0 -stand_emit_conf 20.0 -A DepthOfCoverage -A AlleleBalance


Created 2012-10-31 01:51:31 | Updated 2012-10-31 22:20:05 | Tags: baserecalibrator commandlinegatk phone-home

Comments (1)

Hi, when I run BaseRecalibrator with the following command:

java -Xmx4g -jar /usr/bin/GenomeAnalysisTK.jar -T BaseRecalibrator -I realignedBam.bam  -R /data1/human_g1k_v37.fasta --knownSites /data1/snp132.vcf -o recalibration_report.grp

I get the following error:

INFO  07:15:53,380 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,380 HttpMethodDirector - Retrying request 
INFO  07:15:53,386 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,387 HttpMethodDirector - Retrying request 
INFO  07:15:53,393 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,393 HttpMethodDirector - Retrying request 
INFO  07:15:53,398 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,398 HttpMethodDirector - Retrying request 
INFO  07:15:53,405 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,405 HttpMethodDirector - Retrying request 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.0-34-g07bda93): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
##### ERROR          Name        FeatureType   Documentation
##### ERROR          BCF2     VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_bcf2_BCF2Codec.html
##### ERROR        BEAGLE      BeagleFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html
##### ERROR           BED         BEDFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_bed_BEDCodec.html
##### ERROR      BEDTABLE       TableFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html
##### ERROR EXAMPLEBINARY            Feature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_example_ExampleBinaryCodec.html
##### ERROR      GELITEXT    GeliTextFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_gelitext_GeliTextCodec.html
##### ERROR      OLDDBSNP    OldDbSNPFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_dbsnp_OldDbSNPCodec.html
##### ERROR     RAWHAPMAP   RawHapMapFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html
##### ERROR        REFSEQ      RefSeqFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html
##### ERROR     SAMPILEUP   SAMPileupFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
##### ERROR       SAMREAD     SAMReadFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html
##### ERROR         TABLE       TableFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_table_TableCodec.html
##### ERROR           VCF     VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html
##### ERROR          VCF3     VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------
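The tribble message means GATK could not determine the format of `/data1/snp132.vcf` from its contents. A quick check is whether the file really is VCF: a VCF begins with a `##fileformat=VCFv4...` line. A dbSNP table exported in another format would not, though whether that is the case here is only a guess. If the file does turn out to be VCF, the message itself suggests adding an explicit type tag to the argument (`:NAME`, e.g. `--knownSites:VCF`). The Python sketch below performs the header check; the function names are hypothetical.

```python
import gzip


def first_line(path: str) -> str:
    """Read the first line of a plain or gzip-compressed text file."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    opener = gzip.open if magic == b"\x1f\x8b" else open  # gzip magic bytes
    with opener(path, "rt") as handle:
        return handle.readline()


def looks_like_vcf(line: str) -> bool:
    """True if the line is a VCF version header such as '##fileformat=VCFv4.1'."""
    return line.startswith("##fileformat=VCF")
```

Usage: `looks_like_vcf(first_line("/data1/snp132.vcf"))`.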

Created 2012-10-30 14:27:46 | Updated 2012-10-30 16:36:58 | Tags: commandlinegatk user-error

Comments (1)

Hi, I have a strange problem with GATK. Every time I try to run it, my Console shows the following error message:

30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR ------------------------------------------------------------------------------------------
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR A USER ERROR has occurred (version 2.1-13-g0f021e6): 
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR Please do not post this error to the GATK forum
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR Visit our website and forum for extensive documentation and answers to 
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR MESSAGE: Argument with name '--analysis_type' (-T) is missing.
30.10.12 15:10:06   [0x0-0xe20e2].com.apple.JarLauncher[1114]   ##### ERROR ------------------------------------------------------------------------------------------

Can you show me my mistakes please? With regards Oliver


Created 2012-10-19 02:07:55 | Updated 2012-10-19 02:32:55 | Tags: commandlinegatk java

Comments (2)

Hello,

While running the DepthOfCoverage tool, I get the error: /tmp/RsQHCt1W: No space left on device

I have tried changing (and exporting) the TMPDIR environment variable, but I still eventually get the same error. Is there a way to change the temporary directory that GATK uses?

I'm running GATK v2.1-8-g5efb575 on a Linux system.

Thanks, Rick