I'm trying to run the BaseRecalibrator tool on my data and am getting the following error:
INFO 14:58:17,399 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:58:17,400 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-13-g1706365, Compiled 2012/10/12 19:21:06
INFO 14:58:17,400 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 14:58:17,400 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 14:58:17,401 HelpFormatter - Program Args: -T BaseRecalibrator -I /home/sheenams/gatk_test/LMG-206.GATKinitialrmdup.srt.bam -R /home/genetics/Genomes/gatk-bundle/human_g1k_v37.fasta -knownSites /home/genetics/Genomes/gatk-bundle/dbsnp_135.b37.vcf -knownSites /home/genetics/Genomes/gatk-bundle/Mills_and_1000G_gold_standard.indels.b37.sites.vcf -knownSites /home/genetics/Genomes/gatk-bundle/1000G_phase1.indels.b37.vcf -o /home/sheenams/gatk_test/LMG-206.recal_data.csv -log /home/sheenams/gatk_test/LMG-206.gatk_log
INFO 14:58:17,401 HelpFormatter - Date/Time: 2012/10/17 14:58:17
INFO 14:58:17,401 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:58:17,401 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:58:17,407 ArgumentTypeDescriptor - Dynamically determined type of /home/genetics/Genomes/gatk-bundle/dbsnp_135.b37.vcf to be VCF
INFO 14:58:17,409 ArgumentTypeDescriptor - Dynamically determined type of /home/genetics/Genomes/gatk-bundle/Mills_and_1000G_gold_standard.indels.b37.sites.vcf to be VCF
INFO 14:58:17,410 ArgumentTypeDescriptor - Dynamically determined type of /home/genetics/Genomes/gatk-bundle/1000G_phase1.indels.b37.vcf to be VCF
INFO 14:58:17,414 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:58:17,463 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 14:58:17,479 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 14:58:17,487 RMDTrackBuilder - Loading Tribble index from disk for file /home/genetics/Genomes/gatk-bundle/dbsnp_135.b37.vcf
WARN 14:58:17,574 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has UNBOUNDED but standard is A
INFO 14:58:17,575 RMDTrackBuilder - Loading Tribble index from disk for file /home/genetics/Genomes/gatk-bundle/Mills_and_1000G_gold_standard.indels.b37.sites.vcf
WARN 14:58:17,589 VCFStandardHeaderLines$Standards - Repairing standard header line for field GQ because -- type disagree; header has Float but standard is Integer
INFO 14:58:17,590 RMDTrackBuilder - Loading Tribble index from disk for file /home/genetics/Genomes/gatk-bundle/1000G_phase1.indels.b37.vcf
WARN 14:58:17,603 VCFHeader - Found GL format, but no PL field. As the GATK now only manages PL fields internally automatically adding a corresponding PL field to your VCF header
WARN 14:58:17,603 VCFStandardHeaderLines$Standards - Repairing standard header line for field AC because -- count types disagree; header has UNBOUNDED but standard is A -- descriptions disagree; header has 'Alternate Allele Count' but standard is 'Allele count in genotypes, for each ALT allele, in the same order as listed'
WARN 14:58:17,603 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has INTEGER but standard is A -- descriptions disagree; header has 'Global Allele Frequency based on AC/AN' but standard is 'Allele Frequency, for each ALT allele, in the same order as listed'
INFO 14:58:18,093 BaseRecalibrator - The covariates being used here:
INFO 14:58:18,093 BaseRecalibrator - ReadGroupCovariate
INFO 14:58:18,093 BaseRecalibrator - QualityScoreCovariate
INFO 14:58:18,094 BaseRecalibrator - ContextCovariate
INFO 14:58:18,094 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 14:58:18,094 BaseRecalibrator - CycleCovariate
INFO 14:58:18,136 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO 14:58:18,137 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 14:58:35,886 GATKRunReport - Uploaded run statistics report to AWS S3
org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Key 2002 is too large for dimension 2 (max is 2001)
	at org.broadinstitute.sting.utils.collections.NestedIntegerArray.put(NestedIntegerArray.java:77)
	at org.broadinstitute.sting.gatk.walkers.bqsr.AdvancedRecalibrationEngine.updateDataForPileupElement(AdvancedRecalibrationEngine.java:97)
	at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:244)
	at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:106)
	at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
	at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
	at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
	at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
	at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
I couldn't find any other questions on the forum that address this. Can you please advise how to fix this error? I'm running GATK v2.1-13.
Thanks,
Sheena