Tagged with #realignment
0 documentation articles | 0 announcements | 9 forum discussions


Comments (4)

Hi

I'm indel realigning with version 2.4-9 using generic commands such as:

java -Xmx4g -jar /path/to/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R /path/to/reference.fasta \
    -I /path/to/input.bam \
    -o /path/to/realigner.intervals

java -Xmx4g -jar /path/to/GenomeAnalysisTK.jar \
    -T IndelRealigner \
    -R /path/to/reference.fasta \
    -I /path/to/sample-level.bam \
    -targetIntervals /path/to/realigner.intervals.from.rtc \
    -o /path/to/realigned.bam \
    -model USE_SW \
    -LOD 0.4

In most cases this is working fine, but in a few cases it is introducing artefacts that subsequently cause the bam file to fail Picard's ValidateSamFile, and in a couple of cases it introduces errors that can't be fixed by CleanSam and/or FixMateInformation.

Here's an example of an error that can be fixed by CleanSam:

Before indel realignment:

HS24_08564:7:2311:4630:19372#87 69 AAKM01002546 471 0 * = 471 0
HS24_08564:7:2311:4630:19372#87 137 AAKM01002546 471 25 100M = 471 0

After indel realignment:

HS24_08564:7:2311:4630:19372#87 69 AAKM01002546 471 0 * = 471 0
HS24_08564:7:2311:4630:19372#87 137 AAKM01002546 471 35 91M1D9M = 471 0

Here's an example of an error that can't be fixed:

Before indel realignment:

HS24_10061:6:1312:10172:98346#54 69 AAKM01002280 649 0 * = 649 0
HS24_10061:6:1312:10172:98346#54 137 AAKM01002280 649 37 100M = 649 0

After indel realignment:

HS24_10061:6:1312:10172:98346#54 69 AAKM01002280 649 0 * = 649 0
HS24_10061:6:1312:10172:98346#54 137 AAKM01002280 649 47 91M2D9M0S = 649 0
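As a rough aid for inspecting records like the ones above, here is a minimal Python sketch that decodes the FLAG field and parses the CIGAR string. It is not what Picard's ValidateSamFile does internally, and the function names are purely illustrative. It shows that flags 69 and 137 describe an unmapped first read and its mapped second-in-pair mate, that the mapped read's reference span grows from 100 to 101 or 102 bases after realignment, and that 91M2D9M0S contains a zero-length element.

```python
import re

# Flag bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operators that advance along the reference


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar):
    """Return [(length, op), ...]; raise ValueError if the string is malformed."""
    if cigar == "*":
        return []
    elements = CIGAR_ELEMENT.findall(cigar)
    if "".join(n + op for n, op in elements) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar}")
    return [(int(n), op) for n, op in elements]


def zero_length_elements(cigar):
    """Return CIGAR elements whose length is 0, e.g. ['0S']."""
    return [f"{n}{op}" for n, op in parse_cigar(cigar) if n == 0]


def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


for flag in (69, 137):
    print(flag, decode_flag(flag))
for cigar in ("100M", "91M1D9M", "91M2D9M0S"):
    print(cigar, reference_span(cigar), zero_length_elements(cigar))
```

Whether the longer reference span matters here (for example, if the read sits near the end of its contig) cannot be told from the records alone; that would need the contig lengths from the BAM header.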

Is this a known bug? Any chance of a fix?

Thanks!

Richard

Comments (1)

Hello

I am trying to run RealignerTargetCreator on some BAM files that were aligned to hg19, but I am getting this error:

ERROR MESSAGE: Input files known and reference have incompatible contigs: Found contigs with the same name but different lengths:
ERROR contig known = chrM / 16571
ERROR contig reference = chrM / 16569.

After some initial investigation I found that the supplied hg19 reference genome that was used for mapping contains the rCRS mtDNA. Other than realigning to a different build of hg19, is there an easy way to fix this problem within GATK?
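To list every contig affected by this kind of mismatch, not only the first one reported, one option is to compare the contig lengths in the reference index with those declared in the known-sites file. The sketch below is an assumption-laden illustration: it assumes the reference has a samtools-style .fai index, that the known file is a VCF with ##contig header lines, and that header values contain no quoted commas. The helper names and the example lines are made up for the demonstration.

```python
import re

CONTIG_LINE = re.compile(r"^##contig=<(.*)>\s*$")


def fai_lengths(lines):
    """Map contig name -> length from .fai lines (name<TAB>length<TAB>...)."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def vcf_contig_lengths(lines):
    """Map contig name -> length from ##contig=<ID=...,length=...> header lines."""
    lengths = {}
    for line in lines:
        match = CONTIG_LINE.match(line)
        if not match:
            continue
        # Naive split: assumes no quoted values containing commas.
        attrs = dict(kv.split("=", 1) for kv in match.group(1).split(",") if "=" in kv)
        if "ID" in attrs and "length" in attrs:
            lengths[attrs["ID"]] = int(attrs["length"])
    return lengths


def length_mismatches(reference, known):
    """Return {contig: (known_length, reference_length)} for shared contigs that differ."""
    return {
        name: (known[name], reference[name])
        for name in known
        if name in reference and known[name] != reference[name]
    }


# Hypothetical example lines mirroring the lengths in the error message.
fai = ["chrM\t16569\t6\t50\t51\n", "chr1\t249250621\t16915\t50\t51\n"]
vcf_header = [
    "##fileformat=VCFv4.1\n",
    "##contig=<ID=chrM,length=16571,assembly=hg19>\n",
    "##contig=<ID=chr1,length=249250621,assembly=hg19>\n",
]
print(length_mismatches(fai_lengths(fai), vcf_contig_lengths(vcf_header)))
```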

Comments (5)

I am doing exome sequencing in 700 individuals from a species with a large genome and I would like to use GATK to realign around indels. I am using a reduced reference, which is still about 3Gb. I tested out the target creator, but it is taking 5 days for 12 individuals when each is done individually and this time frame is not feasible for all 700 individuals. I tried to run more in parallel (~30 individuals), but there are RAM limitations on our 250G server. I am currently testing out the program by running all 12 test samples as input for the same run and the time estimate is very long (on the order of several hundred weeks). Based on a preliminary run I have also included a vcf file with likely indels to try and speed the process. Can you suggest another way in which I can make the time frame for all 700 individuals more reasonable? Otherwise we will not be able to use this tool.
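One approach sometimes used for this kind of scaling problem is to split the work by contig and run the pieces as separate jobs. The Python sketch below only builds the command lines; it does not run anything. It assumes that restricting each job with the engine's -L interval argument suits this data set, and the contig names, file paths and output naming scheme are hypothetical. How the per-contig interval files are then handed to IndelRealigner is left open and is untested here.

```python
def scatter_commands(contigs, gatk_jar, reference, bam, out_prefix, memory="4g"):
    """Build one RealignerTargetCreator command (as an argument list) per contig."""
    commands = []
    for contig in contigs:
        commands.append([
            "java", f"-Xmx{memory}", "-jar", gatk_jar,
            "-T", "RealignerTargetCreator",
            "-R", reference,
            "-I", bam,
            "-L", contig,  # restrict this job to a single contig
            "-o", f"{out_prefix}.{contig}.intervals",
        ])
    return commands


# Hypothetical contig names and paths, for illustration only.
for cmd in scatter_commands(
    ["scaffold_1", "scaffold_2"],
    "/path/to/GenomeAnalysisTK.jar",
    "/path/to/reference.fasta",
    "/path/to/input.bam",
    "/path/to/realigner",
):
    print(" ".join(cmd))
```

Returning argument lists rather than one shell string means each command can be passed to a job scheduler or to subprocess.run without further quoting.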

Comments (14)

Hi, I'm currently working with bwa, samtools and GATK to call SNPs in Medicago truncatula. I'm using my own reference sequence, with the 8 chromosomes in the same FASTA file:

C1_lenght=155648
AAAGATAGAGA..
C2_lenght=125018
ATGGATC...
etc.

The alignments ran without problems, but for GATK I do: rmdup --> CreateSequenceDictionary.jar (Picard) --> samtools sort --> read groups (Picard) --> samtools index, and then the pre-realignment step with:

java -jar -Xmx4g /usr/local/bioinfo/src/GATK/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar -nt 8 -T RealignerTargetCreator -R REF.fa -o RTC.intervals -I INPUT_muq30_RMDUP_RG.bam

This step runs without problems, but when I try to run the realignment:

java -jar -Xmx4g /usr/local/bioinfo/src/GATK/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar -T IndelRealigner -R REF.fa -I INPUT_muq30_RMDUP_RG.bam -targetIntervals RTC.intervals -o INPUT_muq30_RMDUP_RG_REAL.bam

I get this error message:

ERROR MESSAGE: Bad input:We encountered a non-standard non-IUPAC base in the provided reference: '13'

I couldn't find any explanation for this error on Google. Could you please help me?
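One way to narrow this down is to scan the reference for bytes that are not nucleotide codes. The sketch below assumes the '13' in the message is a raw byte value; in ASCII, byte 13 is the carriage return that Windows-style (CRLF) line endings add to each line. That reading is a guess, not something the error message confirms. The accepted alphabet used here is the standard IUPAC nucleotide code set, which may not match GATK's own list exactly, and the example FASTA content is invented.

```python
from collections import Counter

# IUPAC nucleotide codes in both cases, stored as byte values.
IUPAC_BYTES = set(b"ACGTUNRYSWKMBDHVacgtunryswkmbdhv")


def unexpected_bytes(fasta_bytes):
    """Count byte values in sequence lines that are not IUPAC nucleotide codes."""
    counts = Counter()
    for line in fasta_bytes.split(b"\n"):
        if line.startswith(b">"):
            continue  # header lines are not checked
        for byte in line:
            if byte not in IUPAC_BYTES:
                counts[byte] += 1
    return dict(counts)


# Invented example with CRLF line endings.
example = b">C1\r\nAAAGATAGAGA\r\nATGGATC\r\n"
print(unexpected_bytes(example))
```

For a real file, the bytes can be read with open(path, "rb").read(); for a multi-gigabyte reference it would be better to iterate over the file line by line instead.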

vschilling

Comments (11)

Hi,

I am currently working on a project where we have sequenced a library with insert sizes of approximately 70 bp using 2x100 paired-end sequencing. While this may seem unnecessary, it can improve base qualities a lot.

I have used SeqPrep (https://github.com/jstjohn/SeqPrep), which strips adaptors and merges reads that overlap; in our case that is most often the entire read. This also boosts the base qualities: if a base was sequenced twice, its quality improves quite a bit. As a result, base qualities can reach 70 and above (the probability of error is 0.0001 x 0.0001 if both reads had Q40 at that base, i.e. a merged quality of 80). No funny business there. :)

However, this does not seem to play nicely with GATK. The realignment crashes (see below), saying that the base quals must be erroneous, but in my case they are correct. Can I force GATK to work with these BQs? (--validation_strictness LENIENT didn't help, as you can see below. :)
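The arithmetic in the paragraph above can be checked with a short Python sketch. It uses the standard Phred definition Q = -10 log10(p) and simply multiplies the two error probabilities, as the post does; this is not necessarily the formula SeqPrep implements. The last lines show how a quality of 70 looks when written with the usual offset of 33, and what the same character would mean under the older offset of 64, which is one plausible reason a tool might suspect the wrong encoding.

```python
import math


def phred_to_prob(q):
    """Error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Phred quality score for an error probability."""
    return -10 * math.log10(p)


def encode_quality(q, offset=33):
    """ASCII character used to store quality q with the given offset."""
    return chr(q + offset)


def decode_quality(char, offset=33):
    """Quality value stored in an ASCII character with the given offset."""
    return ord(char) - offset


# Two independent Q40 observations of the same base, multiplied as in the post.
merged = prob_to_phred(phred_to_prob(40) * phred_to_prob(40))
print(round(merged, 6))

# A quality of 70 stored with offset 33, then misread with offset 64.
char = encode_quality(70)
print(char, decode_quality(char, offset=64))
```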

Cheers,
Daniel Klevebring

   INFO  13:04:07,408 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:07,411 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.4-7-g5e89f01, Compiled 2013/03/06 01:01:28 
   INFO  13:04:07,411 HelpFormatter - Copyright (c) 2010 The Broad Institute 
   INFO  13:04:07,411 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
   INFO  13:04:07,416 HelpFormatter - Program Args: -T RealignerTargetCreator -I /scratch/3041404/P394_102.prmdup.bam -R /bubo/proj/b2010040/private/GoldenPath/hg19/GATK_resource_bundle/human_g1k_v37_clean.fasta -o /scratch/3041404/P394_102.realn.intervals --intervals /bubo/proj/b2010040/private/GoldenPath/NG_design/1000G_REF_picard_custom_design_target_regions_HG19.bed.interval_list --validation_strictness LENIENT 
   INFO  13:04:07,416 HelpFormatter - Date/Time: 2013/03/13 13:04:07 
   INFO  13:04:07,416 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:07,416 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:08,461 GenomeAnalysisEngine - Strictness is LENIENT 
   INFO  13:04:08,632 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
   INFO  13:04:08,640 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
   INFO  13:04:08,655 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
   INFO  13:04:09,782 IntervalUtils - Processing 39772003 bp from intervals 
   INFO  13:04:10,001 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files 
   INFO  13:04:10,262 GenomeAnalysisEngine - Done creating shard strategy 
   INFO  13:04:10,262 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
   INFO  13:04:10,263 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
   INFO  13:04:18,482 GATKRunReport - Uploaded run statistics report to AWS S3 
   ##### ERROR ------------------------------------------------------------------------------------------
   ##### ERROR A USER ERROR has occurred (version 2.4-7-g5e89f01): 
   ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
   ##### ERROR Please do not post this error to the GATK forum
   ##### ERROR
   ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
   ##### ERROR Visit our website and forum for extensive documentation and answers to 
   ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
   ##### ERROR
   ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/scratch/3041404/P394_102.prmdup.bam} appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 70; please see the GATK --help documentation for options related to this error
   ##### ERROR ------------------------------------------------------------------------------------------
Comments (1)

Hello,

Until now I have only used BWA and, as described in the Best Practices, I performed the realignment step. Now I want to integrate Stampy, in combination with BWA, into my pipeline.

Do you think I should still run the realignment step?

Thanks !

Comments (5)

I recently used the latest version of GATK to carry out indel realignment on my BAM files. I have read in a few places that one needs to fix the mate position information, as the reads' mapping positions may change during realignment. My question is: is this step taken care of by GATK automatically, or does one need to run Picard fixmates before going on to subsequent analysis?

Thanks

Comments (9)

Dear Community and GATK's team,

I have one question about the cleaning step before SNP calling, mainly about local realignment around indels.

I read on some websites describing their workflows that alignments may change during the realignment process, so it is preferable to fix the mate information, and that Picard offers a utility to do this. Is that true? Or is it only the insert sizes that can change? If insert sizes do change, is there a tool that checks that these changes are OK?

Do you use Picard's tool, FixMateInformation.jar, to fix the mate information when using paired-end data?

Up to now, I have not used it in my pipeline. Maybe this is a mistake. If we have to add this step, should we add it after the realignment step or after the recalibration step?
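To see whether mate information in a given file is actually out of sync, one can compare each read's RNEXT/PNEXT fields with the RNAME/POS of its mate. The Python sketch below is a minimal illustration of that idea on SAM text lines (for example from samtools view). It is not a replacement for Picard's FixMateInformation or ValidateSamFile: it ignores TLEN and the mate-related flag bits, skips secondary and supplementary records, and the function name and example records are invented.

```python
def mate_position_mismatches(sam_lines):
    """Return [(read_name, read_number), ...] where RNEXT/PNEXT disagree with the mate."""
    pairs = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        rnext, pnext = fields[6], int(fields[7])
        if not flag & 0x1 or flag & 0x900:
            continue  # unpaired, secondary or supplementary
        if rnext == "=":
            rnext = rname
        read_number = 1 if flag & 0x40 else 2
        pairs.setdefault(qname, {})[read_number] = (rname, pos, rnext, pnext)

    problems = []
    for qname, pair in pairs.items():
        if len(pair) != 2:
            continue  # mate not present in the input
        for this, mate in ((1, 2), (2, 1)):
            _, _, rnext, pnext = pair[this]
            mate_rname, mate_pos = pair[mate][0], pair[mate][1]
            if (rnext, pnext) != (mate_rname, mate_pos):
                problems.append((qname, this))
    return problems


# Invented pair: read 2 now starts at 205, but read 1 still points at 200.
records = [
    "r1\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "r1\t147\tchr1\t205\t60\t50M\t=\t100\t-150\t*\t*",
]
print(mate_position_mismatches(records))
```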

Thank you for your help,

Tiphaine

Comments (3)

Hello,

I am new to using GATK (v 2.1-3). I do exome sequencing by sample using the following steps:

Alignment with BWA (0.6.2)
GATK: Local realignment around INDELs
PICARD (1.67): FixMateInformation
GATK: Recalibration (BaseRecalibrator + PrintReads -BQSR)
Samtools for calling variants

Samtools seems to run properly, but no files (*.vcf and *.bcf) are created and no error message is printed:

cd Sample_09
+ samtools mpileup -BE -ug -q 20 -Q 20 -D -f human_g1k_v37.fasta realigned_fixed_recal.bam -C50
+ bcftools view -bvcg -
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
[bcfview] 100000 sites processed.
[afs] 0:89274.054 1:6411.053 2:4314.893
[bcfview] 200000 sites processed.
[afs] 0:89100.642 1:6125.883 2:4773.474
[bcfview] 300000 sites processed.
[afs] 0:87374.996 1:7439.238 2:5185.766
[bcfview] 400000 sites processed.
[afs] 0:87890.186 1:7087.628 2:5022.185
[bcfview] 500000 sites processed.
[afs] 0:85322.061 1:8454.843 2:6223.096
[bcfview] 600000 sites processed.
[afs] 0:85864.368 1:8410.777 2:5724.854
[bcfview] 700000 sites processed.
[afs] 0:88813.814 1:6828.001 2:4358.185
[bcfview] 800000 sites processed.
[afs] 0:89070.318 1:6302.924 2:4626.758
[bcfview] 900000 sites processed.
[afs] 0:88364.380 1:6796.962 2:4838.658
[bcfview] 1000000 sites processed.
[afs] 0:86892.531 1:7268.088 2:5839.381
[bcfview] 1100000 sites processed.
[afs] 0:87184.845 1:7153.073 2:5662.081
[bcfview] 1200000 sites processed.
[afs] 0:86762.756 1:7241.236 2:5996.008
[bcfview] 1300000 sites processed.
[afs] 0:89346.143 1:6159.989 2:4493.868
[bcfview] 1400000 sites processed.
[afs] 0:88658.355 1:7160.555 2:4181.089
[bcfview] 1500000 sites processed.
[afs] 0:85985.305 1:8308.039 2:5706.656
[bcfview] 1600000 sites processed.
[afs] 0:87346.636 1:7708.883 2:4944.480
[afs] 0:63097.202 1:3950.127 2:3572.670
+ bcftools view .bcf

+ cd ..

I have seen that some groups run Picard's AddOrReplaceReadGroups after realignment, and I wonder whether I should do the same before calling variants with samtools.

Thanks in advance for any advice you can give me.

Chris