Tagged with #realignment
0 documentation articles | 0 events or announcements | 5 forum discussions


Sorry, there are no publicly available documents of this type with the tag #realignment. Try one of the other types.
Sorry, there are no publicly available documents of this type with the tag #realignment. Try one of the other types.

Hello,

I am new at using GATK (v 2.1-3). I do exome sequencing by sample using the following steps: Alignment with BWA (0.6.2) GATK :Local realignment around INDELs PICARD (1.67): FixMateInformation GATK: Recalibration (BaseRecalibrator + PrintReads -BQSR) Samtools for calling variants

Samtools seems to run properly but no file (*.vcf and *.bcf) are created and no error message is prompted :

cd Sample_09 + samtools mpileup -BE -ug -q 20 -Q 20 -D -f human_g1k_v37.fasta realigned_fixed_recal.bam -C50 + bcftools view -bvcg - [mpileup] 1 samples in 1 input files Set max per-file depth to 8000 [bcfview] 100000 sites processed. [afs] 0:89274.054 1:6411.053 2:4314.893 [bcfview] 200000 sites processed. [afs] 0:89100.642 1:6125.883 2:4773.474 [bcfview] 300000 sites processed. [afs] 0:87374.996 1:7439.238 2:5185.766 [bcfview] 400000 sites processed. [afs] 0:87890.186 1:7087.628 2:5022.185 [bcfview] 500000 sites processed. [afs] 0:85322.061 1:8454.843 2:6223.096 [bcfview] 600000 sites processed. [afs] 0:85864.368 1:8410.777 2:5724.854 [bcfview] 700000 sites processed. [afs] 0:88813.814 1:6828.001 2:4358.185 [bcfview] 800000 sites processed. [afs] 0:89070.318 1:6302.924 2:4626.758 [bcfview] 900000 sites processed. [afs] 0:88364.380 1:6796.962 2:4838.658 [bcfview] 1000000 sites processed. [afs] 0:86892.531 1:7268.088 2:5839.381 [bcfview] 1100000 sites processed. [afs] 0:87184.845 1:7153.073 2:5662.081 [bcfview] 1200000 sites processed. [afs] 0:86762.756 1:7241.236 2:5996.008 [bcfview] 1300000 sites processed. [afs] 0:89346.143 1:6159.989 2:4493.868 [bcfview] 1400000 sites processed. [afs] 0:88658.355 1:7160.555 2:4181.089 [bcfview] 1500000 sites processed. [afs] 0:85985.305 1:8308.039 2:5706.656 [bcfview] 1600000 sites processed. [afs] 0:87346.636 1:7708.883 2:4944.480 [afs] 0:63097.202 1:3950.127 2:3572.670 + bcftools view .bcf

+ cd ..

I have seen that some groups use after realignment Picard:AddOrReplaceReadGroups and I wonder if I should use before calling the variant with samtools.

Thanks in advance for any advice you can give me.

Chris

I recently used GATK, the latest version to carry out Indel realignment for my bam files. I have read at few places that one needs to fix the mate position information as the reads mapping position may change during realignment. My question is that does this step is taken care of y GATK automatically or one needs to run Picard fixmates before going for subsequent analysis.

Thanks

Hi,

I am currently working with a project where we have sequenced a library of approximately 70 bps insert sizes using 2x100 paired-end seq. While this can seem unnecessary, it can improve base qualities a lot.

I have used SeqPrep (https://github.com/jstjohn/SeqPrep) which strips adaptors and merges reads that overlap, in our case the entire read most of the times. This also boosts the base qualities, if a base was sequenced twice, the quality improves quite a bit. This way, base qualities can stretch up to 70 and over (probability of error 0.0001 x 0.0001 if both reads had Q40 at that base, it merged qual = 80). No funny business there. :)

However, this does not seem to play nicely with GATK. The realignment crashes (see below) saying the the base quals must be erroneous. In my case, but they are correct. Can I force GATK to work with these BQs? (--validation_strictness LENIENT didn't help as you can see below :)

cheers Daniel Klevebring

   INFO  13:04:07,408 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:07,411 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.4-7-g5e89f01, Compiled 2013/03/06 01:01:28 
   INFO  13:04:07,411 HelpFormatter - Copyright (c) 2010 The Broad Institute 
   INFO  13:04:07,411 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
   INFO  13:04:07,416 HelpFormatter - Program Args: -T RealignerTargetCreator -I /scratch/3041404/P394_102.prmdup.bam -R /bubo/proj/b2010040/private/GoldenPath/hg19/GATK_resource_bundle/human_g1k_v37_clean.fasta -o /scratch/3041404/P394_102.realn.intervals --intervals /bubo/proj/b2010040/private/GoldenPath/NG_design/1000G_REF_picard_custom_design_target_regions_HG19.bed.interval_list --validation_strictness LENIENT 
   INFO  13:04:07,416 HelpFormatter - Date/Time: 2013/03/13 13:04:07 
   INFO  13:04:07,416 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:07,416 HelpFormatter - -------------------------------------------------------------------------------- 
   INFO  13:04:08,461 GenomeAnalysisEngine - Strictness is LENIENT 
   INFO  13:04:08,632 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
   INFO  13:04:08,640 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
   INFO  13:04:08,655 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
   INFO  13:04:09,782 IntervalUtils - Processing 39772003 bp from intervals 
   INFO  13:04:10,001 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files 
   INFO  13:04:10,262 GenomeAnalysisEngine - Done creating shard strategy 
   INFO  13:04:10,262 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
   INFO  13:04:10,263 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
   INFO  13:04:18,482 GATKRunReport - Uploaded run statistics report to AWS S3 
   ##### ERROR ------------------------------------------------------------------------------------------
   ##### ERROR A USER ERROR has occurred (version 2.4-7-g5e89f01): 
   ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
   ##### ERROR Please do not post this error to the GATK forum
   ##### ERROR
   ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
   ##### ERROR Visit our website and forum for extensive documentation and answers to 
   ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
   ##### ERROR
   ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/scratch/3041404/P394_102.prmdup.bam} appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 70; please see the GATK --help documentation for options related to this error
   ##### ERROR ------------------------------------------------------------------------------------------

Hello,

before I only used BWA and as you described in the best pratice I performed the realign step. Now I want to integrate in my pipeline Stampy associated with BWA.

Do you think, I should make the realign step ?

Thanks !

Dear Community and GATK's team,

I have one question about the cleaning step before SNP calling, mainly about local realignment around indels.

I read on some website describing their workflow that alignments may change during the realignment process, it would prefer to fix the mate information and Picard offers this utility to do that for us. is it true? Or are there only any insert sizes that can change? If there are some change of insert sizes, is there a tool that checks that these changes are ok?

What do you use Picard's tool, FixMateInformation.jar, to fix the mate information when using paired-end data ?

Up to now, I have not used in my pipeline. Maybe this is a mistake. If we have to add this step, should we add this step after the realignment step or recalibration step?

Thank you for your help,

Tiphaine