Tagged with #mappingqualityfilter
0 documentation articles | 0 announcements | 6 forum discussions

No articles to display.

Created 2015-08-07 14:52:28 | Updated | Tags: vqsr mappingqualityfilter joint-calling

Comments (3)

Hi all, I went through the whole variant calling pipeline on my whole exome sequencing data. Now I have three questions.

Q1. Is it necessary to perform an additional quality filter to remove low quality reads and barcode contamination before mapping? As there are deduplication and BQSR steps downstream, can I assume that the effects of low quality bases and barcode contamination will be eliminated in those steps?

Q2. Is it better to do joint calling than to call variants individually? We aim to find pathological mutations by comparing SNPs between the affected and the normal individuals in one family. For each family, we have data sets from 3-4 individuals. I marked each individual with different @RG tags. In my first trial, I just used the basic command, calling SNPs one sample at a time. I learned that VCF mode accepts multiple bam files: I can type -I No1.bam -I No2.bam -I No3.bam -I .... But gVCF mode only accepts one bam file at a time, so I should merge multiple bams using 'PrintReads' before using 'HaplotypeCaller'. My confusion is that 'BaseRecalibrator' only accepts one bam file and outputs one BQSR table at a time. So should I 'cat' all tables and use the result as -BQSR for 'PrintReads'? Which will be better: to keep using VCF mode with multiple bam files as input, or to merge multiple bam files in advance and do gVCF calling?

Q3. Should I use hard filters instead of VQSR? Though we are working on whole exome data, we are analyzing fewer than 30 samples at a time. I saw in one of your answers that the sample number should reach at least 30 to fit the Gaussian model. Though no error was reported when I ran VQSR in my first trial, the Ti/Tv values in my tranches files were bad and the model plots looked different from your example in the Best Practices. So I think maybe I should just use hard filters then?

Created 2014-10-08 07:06:57 | Updated | Tags: unifiedgenotyper mappingqualityfilter

Comments (6)

Hi, I have a VCF that was generated using UnifiedGenotyper with output mode EMIT_ALL_SITES. Several positions in the VCF with ALT as "." also have QUAL as ".", which I understand to mean "reference" with unknown quality. However, FILTER for these is set to PASS. I am wondering how this is possible. Does this mean that UnifiedGenotyper did not print a QUAL even though the site scored high enough to PASS?

I am pasting some parts of the vcf below. Any help is appreciated.

##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=3,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Filtered Depth">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=Dels,Number=1,Type=Float,Description="Fraction of Reads Containing Spanning Deletions">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
##UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[x.bam] sample_metadata=[] read_buffer_size=null phone_home=STANDARD read_filter=[] intervals=[x.bed] excludeIntervals=null reference_sequence=hg19.fasta rodBind=[dbsnp_132.hg19.vcf] rodToIntervalTrackName=null BTI_merge_rule=UNION nonDeterministicRandomSeed=false DBSNP=null downsampling_type=null downsample_to_fraction=null downsample_to_coverage=null baq=CALCULATE_AS_NECESSARY baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 interval_merging=ALL read_group_black_list=null processingTracker=null restartProcessingTracker=false processingTrackerStatusFile=null processingTrackerID=-1 allow_intervals_with_unindexed_bam=false disable_experimental_low_memory_sharding=false logging_level=INFO log_to_file=null help=false genotype_likelihoods_model=BOTH p_nonref_model=EXACT heterozygosity=0.0010 pcr_error_rate=1.0E-4 genotyping_mode=DISCOVERY output_mode=EMIT_ALL_SITES standard_min_confidence_threshold_for_calling=50.0 standard_min_confidence_threshold_for_emitting=10.0 noSLOD=false assume_single_sample_reads=null abort_at_too_much_coverage=-1 min_base_quality_score=17 min_mapping_quality_score=20 max_deletion_fraction=0.05 min_indel_count_for_genotyping=5 indel_heterozygosity=1.25E-4 indelGapContinuationPenalty=10.0 indelGapOpenPenalty=45.0 indelHaplotypeSize=80 doContextDependentGapPenalties=true getGapPenaltiesFromData=false indel_recal_file=indel.recal_data.csv indelDebug=false dovit=false GSA_PRODUCTION_ONLY=false exactCalculation=LINEAR_EXPERIMENTAL ignoreSNPAlleles=false output_all_callable_bases=false genotype=false out=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub debug_file=null metrics_file=null annotation=[DepthOfCoverage, RMSMappingQuality]"
chr1    14468   .       G       .       .       PASS    DP=110;HaplotypeScore=0.0000;MQ=2.10;MQ0=104    GT      ./.
chr1    14469   .       C       .       .       PASS    DP=109;HaplotypeScore=0.0000;MQ=2.11;MQ0=103    GT      ./.
chr1    14470   .       G       .       .       PASS    DP=105;HaplotypeScore=0.0000;MQ=2.15;MQ0=99     GT      ./.
chr1    14471   .       C       .       .       PASS    DP=103;HaplotypeScore=0.0000;MQ=2.17;MQ0=97     GT      ./.
chr1    14472   .       A       .       .       PASS    DP=106;HaplotypeScore=0.0000;MQ=2.14;MQ0=100    GT      ./.
chr1    14473   .       G       .       .       PASS    DP=103;HaplotypeScore=0.0000;MQ=2.17;MQ0=97     GT      ./.
chr1    14474   .       G       .       .       PASS    DP=98;HaplotypeScore=0.0000;MQ=2.23;MQ0=92      GT      ./.
chr1    14553   .       C       .       .       PASS    DP=98;HaplotypeScore=0.0000;MQ=2.33;MQ0=94      GT      ./.
chr1    14554   .       G       .       .       PASS    DP=99;HaplotypeScore=0.0000;MQ=2.32;MQ0=95      GT      ./.
chr1    14555   .       C       .       .       PASS    DP=101;HaplotypeScore=0.0000;MQ=3.17;MQ0=96     GT      ./.
chr1    14556   .       T       .       32.99   LowQual AC=0;AF=0.00;AN=2;DP=101;MQ=3.17;MQ0=96 GT:DP:GQ:PL     0/0:101:3:0,3,27
chr1    14557   .       C       .       32.99   LowQual AC=0;AF=0.00;AN=2;DP=102;MQ=3.15;MQ0=97 GT:DP:GQ:PL     0/0:102:3:0,3,27
chr1    14587   .       T       .       35.99   LowQual AC=0;AF=0.00;AN=2;DP=100;MQ=4.41;MQ0=90 GT:DP:GQ:PL     0/0:100:6:0,6,51
chr1    14640   .       C       .       50.96   PASS    AC=0;AF=0.00;AN=2;DP=123;MQ=5.73;MQ0=107        GT:DP:GQ:PL     0/0:123:20.97:0,21,174
chr1    14641   .       A       .       50.96   PASS    AC=0;AF=0.00;AN=2;DP=123;MQ=5.84;MQ0=106        GT:DP:GQ:PL     0/0:123:20.97:0,21,174
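
To see how many sites show the pattern described above, the records can be scanned for a QUAL of "." combined with a FILTER of PASS. The following is only a minimal sketch in Python, not part of GATK: the function name `find_pass_without_qual` is made up for illustration, and it assumes the columns are separated by whitespace (as in the pasted excerpt) and that no column contains embedded spaces.

```python
def find_pass_without_qual(vcf_lines):
    """Return (chrom, pos) for records whose QUAL is '.' while FILTER is PASS.

    Assumption: columns are whitespace-separated and contain no spaces.
    """
    hits = []
    for line in vcf_lines:
        # Skip blank lines and header lines (## meta lines and the #CHROM line).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:
            continue  # not a complete VCF record
        chrom, pos, _id, _ref, _alt, qual, filt = fields[:7]
        if qual == "." and filt == "PASS":
            hits.append((chrom, int(pos)))
    return hits


# Three records copied from the excerpt above.
records = [
    "chr1    14468   .       G       .       .       PASS    DP=110;HaplotypeScore=0.0000;MQ=2.10;MQ0=104    GT      ./.",
    "chr1    14556   .       T       .       32.99   LowQual AC=0;AF=0.00;AN=2;DP=101;MQ=3.17;MQ0=96 GT:DP:GQ:PL     0/0:101:3:0,3,27",
    "chr1    14640   .       C       .       50.96   PASS    AC=0;AF=0.00;AN=2;DP=123;MQ=5.73;MQ0=107        GT:DP:GQ:PL     0/0:123:20.97:0,21,174",
]

print(find_pass_without_qual(records))
```

In the excerpt, every record flagged this way also has the genotype ./. (no call), which may be worth checking across the full file.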

Created 2014-03-13 10:30:49 | Updated | Tags: mappingqualityfilter

Comments (3)

I want to filter out reads with low mapping quality, but I can't find the arguments for it. Which arguments should I set, and at which step is it best to filter out the low mapping quality reads? The following links describe this, but I can't find the arguments in HaplotypeCaller, IndelRealigner, BaseRecalibrator and other tools.

http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_filters_MappingQualityFilter.html#--min_mapping_quality_score
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_haplotypecaller_HCMappingQualityFilter.html

Could you please help me out with this problem?
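
For readers unsure what such a filter does conceptually, here is a minimal sketch in Python. It is not the GATK implementation and says nothing about which GATK arguments to use; it only shows the idea of dropping reads whose MAPQ (the fifth column of a SAM record) is below a threshold. The function name `keep_by_mapq`, the threshold of 20, and the three example reads are all made up for illustration.

```python
def keep_by_mapq(sam_lines, min_mapq=20):
    """Keep SAM header lines and reads whose MAPQ is at least min_mapq.

    Assumption: records are tab-separated, as in a regular SAM file.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record (11 mandatory fields)
        if int(fields[4]) >= min_mapq:  # MAPQ is the fifth field
            kept.append(line)
    return kept


# Hypothetical reads with MAPQ 0, 20 and 60.
example_reads = [
    "read_a\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "read_b\t0\tchr1\t200\t20\t4M\t*\t0\t0\tACGT\tIIII",
    "read_c\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
]

for read in keep_by_mapq(example_reads, min_mapq=20):
    print(read.split("\t")[0])
```

Filtering inside the calling tool and pre-filtering the BAM are different choices; the sketch only illustrates the comparison itself.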

Created 2012-10-11 21:54:39 | Updated 2012-10-18 01:33:56 | Tags: unifiedgenotyper badmatefilter mappingqualityfilter

Comments (13)

The documentation is a bit unclear about what the BadMateFilter does. Any information would be appreciated!

Created 2012-08-21 16:20:00 | Updated 2013-01-07 20:45:30 | Tags: mappingqualityfilter

Comments (5)

Dear all,

I am failing to obtain variant calls (.vcf) using GATK 2.0, although a similar example works well. Comparing the two, I found a difference in mapping quality: the example that works has a mapping quality of 60, whereas the other has 255 (the latter was aligned with bfast, which assigns this value).

Googling, I found that this could be because GATK doesn't take into account mapping qualities of 255. Is this true? http://www.biostars.org/post/show/43540/gatk-baq-and-dbsnp-option-in-countvariates/ (Note that this solution applies to GATK 1.4 and I am using GATK 2.0.)

I also checked that the reference genome was OK (and also the .bed file with the exome positions).

I repeated the process after manually changing the "255" to "60", with the same result.

Any ideas of what could be the problem?

The executed command:

java -jar /home/public/biotools/GATK_2.0/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /home/public/mnt/cubix/public/biodata/hg19_norm/hg19.fa -L /home/public/mnt/cubix/public/biodata/hg19_norm/bed/all_captured_exomes_hg19.bed -I /home/public/test/outer_test/intermediate/AUT143_chr15_extract_cpy_sorted.bam -glm SNP -nt 4 -o /home/public/test/outer_test/intermediate/AUT143_chr15_extract_cpy_sorted.bam.vcf

$ samtools view AUT143_chr15_extract_cpy_sorted.bam | head
1629_189_1658_F3 0 15 20002423 255 2I48M 0 0 TATCCAAATATCCACTTGCAGATTCCACGAAAATAGTGTTTCAAAACTGC 5:=59B98@>!!4<8C?72!!!!92::!!1373!!00!!-00!!.!!!!% XA:i:2 MD:Z:48 XE:Z:-----------3--------300-----1-----2---2----1--0-0- PG:Z:bfast RG:Z:sample IH:i:1 NH:i:1 HI:i:1 CM:i:10 NM:i:2 CQ:Z:!7B90A;/:A@<%>.>?5'.:?7>+/=)9'&22%%%&%(%%1&/%(/%%% OQ:Z:ARZ`SR!!LUU`]E>!!!!RCUO!!6AM@!!44!!3?@!!6!!!!% AS:i:925 CS:Z:T03320100333301120131300020111200032211200211000203
1396_1160_190_F3 16 15 20006225 255 50M 0 0 CTCAATCTAAAGATAGGTTCAACTCTCTGAGATGAGTGCACACATCACAA !!!!!!6/206B1!!!!38!!!!!2535:D41426372@A?86!!!!!!! XA:i:2 MD:Z:26G4T18 XE:Z:0-310--------2-3---2-00--------------------13-2-2- PG:Z:bfast RG:Z:sample IH:i:1 NH:i:1 HI:i:1 CM:i:13 NM:i:2 CQ:Z:!'''1+5-1:<A5%4.+''%8@+%1))'?%,?%1&%8%<<-.018:-%)- OQ:Z:!!!!!!RJGDRJ!!!!?M!!!!!;C?9TF57;BKBC_`_TG!!!!!!! AS:i:475 CS:Z:T02121311111131122132221222200022013223220032201320
....

(I also tried manually changing the 255 value in the .sam file to 60, still with no result ...)
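
For context, the SAM specification defines a MAPQ of 255 as "mapping quality is not available". A quick way to check how many reads in a file carry that value is to tally the fifth column of the `samtools view` output. The sketch below is a minimal Python illustration with a made-up function name (`mapq_counts`); it assumes whitespace-separated records as pasted above, and the two example lines are the first six fields of the reads shown in this post. It does not establish how any particular GATK version treats such reads.

```python
from collections import Counter


def mapq_counts(sam_lines):
    """Count how often each MAPQ value (fifth SAM column) occurs."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        # Split only the leading columns; later optional tags may contain spaces.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # too short to contain a MAPQ column
        counts[int(fields[4])] += 1
    return counts


# First six fields of the two reads shown above.
reads = [
    "1629_189_1658_F3 0 15 20002423 255 2I48M",
    "1396_1160_190_F3 16 15 20006225 255 50M",
]

for mapq, n in sorted(mapq_counts(reads).items()):
    print(mapq, n)
```

The same function accepts any iterable of lines, so it could be fed `sys.stdin` when piping in the output of `samtools view`.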


Created 2012-08-02 17:07:55 | Updated 2012-08-02 17:07:55 | Tags: unifiedgenotyper mappingqualityfilter

Comments (5)

Hi GATK, I am still using the GenomeAnalysisTK-1.6-5-g557da77 version for UnifiedGenotyper. This is probably a silly question, but is there a way to set a minimum mapping quality score for reads when deciding whether to evaluate them for variant detection? I know there is a --min_base_quality_score parameter, but I don't see one for mapping quality. http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_walkers_genotyper_UnifiedGenotyper.html