Hello,
I am new to GATK (v2.1-3). I am doing per-sample exome sequencing with the following steps:
1. BWA (0.6.2): alignment
2. GATK: local realignment around indels
3. Picard (1.67): FixMateInformation
4. GATK: recalibration (BaseRecalibrator + PrintReads -BQSR)
5. Samtools: variant calling
cd Sample_09
+ samtools mpileup -BE -ug -q 20 -Q 20 -D -f human_g1k_v37.fasta realigned_fixed_recal.bam -C50
+ bcftools view -bvcg -
[mpileup] 1 samples in 1 input files
I have seen that some groups run Picard's AddOrReplaceReadGroups after realignment, and I wonder whether I should run it before calling variants with samtools.
Thanks in advance for any advice you can give me.
Chris
I recently used the latest version of GATK to carry out indel realignment on my BAM files. I have read in a few places that one needs to fix the mate position information, since a read's mapping position may change during realignment. My question: does GATK take care of this step automatically, or does one need to run Picard's mate-fixing tool (FixMateInformation) before subsequent analysis?
Thanks
Hi,
I am currently working on a project where we have sequenced a library with insert sizes of approximately 70 bp using 2x100 paired-end sequencing. While this can seem unnecessary, it can improve base qualities a lot.
I have used SeqPrep (https://github.com/jstjohn/SeqPrep), which strips adaptors and merges reads that overlap; in our case the overlap is the entire read most of the time. This also boosts the base qualities: if a base was sequenced twice, its quality improves quite a bit. This way, base qualities can reach 70 and above (if both reads had Q40 at a base, the error probability is 0.0001 x 0.0001, so the merged quality is 80). No funny business there. :)
However, this does not seem to play nicely with GATK. The realignment crashes (see below), saying that the base quals must be erroneous. In my case, however, they are correct. Can I force GATK to work with these BQs? (--validation_strictness LENIENT didn't help, as you can see below :)
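The arithmetic behind the merged quality can be checked with a small sketch. It assumes the simple model described in this post (two independent, agreeing observations whose error probabilities multiply); SeqPrep's actual scoring may differ, and the function names are hypothetical.

```python
import math

def phred_to_error(q):
    # Phred definition: Q = -10 * log10(p), so p = 10^(-Q/10)
    return 10 ** (-q / 10)

def error_to_phred(p):
    # Inverse of the conversion above
    return -10 * math.log10(p)

def merged_quality(q1, q2):
    # Assumption: both reads agree on the base and their errors are
    # independent, so the combined error probability is the product.
    return error_to_phred(phred_to_error(q1) * phred_to_error(q2))

# Two Q40 observations: 0.0001 x 0.0001 = 1e-8, i.e. Q80
print(round(merged_quality(40, 40)))
```

Under this model the merged quality is just the sum of the two input qualities, which is why merged bases can easily exceed the range a single read normally reports.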
cheers Daniel Klevebring
INFO 13:04:07,408 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:04:07,411 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.4-7-g5e89f01, Compiled 2013/03/06 01:01:28
INFO 13:04:07,411 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 13:04:07,411 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 13:04:07,416 HelpFormatter - Program Args: -T RealignerTargetCreator -I /scratch/3041404/P394_102.prmdup.bam -R /bubo/proj/b2010040/private/GoldenPath/hg19/GATK_resource_bundle/human_g1k_v37_clean.fasta -o /scratch/3041404/P394_102.realn.intervals --intervals /bubo/proj/b2010040/private/GoldenPath/NG_design/1000G_REF_picard_custom_design_target_regions_HG19.bed.interval_list --validation_strictness LENIENT
INFO 13:04:07,416 HelpFormatter - Date/Time: 2013/03/13 13:04:07
INFO 13:04:07,416 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:04:07,416 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:04:08,461 GenomeAnalysisEngine - Strictness is LENIENT
INFO 13:04:08,632 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 13:04:08,640 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 13:04:08,655 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO 13:04:09,782 IntervalUtils - Processing 39772003 bp from intervals
INFO 13:04:10,001 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files
INFO 13:04:10,262 GenomeAnalysisEngine - Done creating shard strategy
INFO 13:04:10,262 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 13:04:10,263 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 13:04:18,482 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.4-7-g5e89f01):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/scratch/3041404/P394_102.prmdup.bam} appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 70; please see the GATK --help documentation for options related to this error
##### ERROR ------------------------------------------------------------------------------------------
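To see which reads carry the scores the error message refers to, the QUAL column of a SAM record can be decoded by hand. This is a minimal sketch assuming the standard Phred+33 encoding from the SAM specification; the function names and the example quality string are hypothetical and not taken from the BAM file above.

```python
def decode_qualities(qual_string, offset=33):
    # SAM QUAL field: each character is (Phred score + 33) in ASCII
    return [ord(c) - offset for c in qual_string]

def max_quality(qual_string, offset=33):
    # Highest base quality found in one read's QUAL string
    return max(decode_qualities(qual_string, offset))

# Hypothetical QUAL string: three Q40 bases ('I') and one Q70 base ('g')
example = "II" + chr(70 + 33) + "I"
print(decode_qualities(example))
print(max_quality(example))
```

Feeding the tenth column of `samtools view` output through `max_quality` would show whether merged reads really contain scores such as 70, rather than the file being in a different encoding.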
Hello,
Previously I only used BWA, and I performed the realignment step as described in the Best Practices. Now I want to integrate Stampy, combined with BWA, into my pipeline.
Do you think I should still perform the realignment step?
Thanks !
Dear Community and GATK team,
I have one question about the cleaning step before SNP calling, mainly about local realignment around indels.
I read on some websites describing their workflows that, since alignments may change during the realignment process, it is preferable to fix the mate information, and that Picard offers a utility to do this. Is this true? Or is it only the insert sizes that can change? If some insert sizes change, is there a tool that checks that these changes are OK?
Do you use Picard's tool, FixMateInformation.jar, to fix the mate information when using paired-end data?
Up to now, I have not used it in my pipeline. Maybe this is a mistake. If we have to add this step, should we add it after the realignment step or after the recalibration step?
Thank you for your help,
Tiphaine