# Tagged with #strand-bias 0 documentation articles | 0 announcements | 10 forum discussions

No posts found with the requested search criteria.

Created 2015-09-21 08:14:52 | Updated | Tags: strand-bias mrnaseq

Hello,

I was wondering if it makes sense to filter for strand bias as stated in the Best Practice RNAseq Variant Calling guide, as most of today's RNAseq data is strand specific. I would actually expect high strand bias for variants and be suspicious of variants which do NOT show strand bias =) ...or did I get something wrong with the Fisher Strand values?

Thank you

Created 2015-07-31 01:07:30 | Updated | Tags: documentation strand-bias

refRatio and altRatio should be calculated as min/max rather than max/min values. That is what the code does here: https://github.com/broadgsa/gatk-protected/blob/master/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/StrandOddsRatio.java (assuming the documentation and the code are meant to correspond to each other). I coded it the same way myself and get more precise results too.

Created 2015-07-23 11:26:42 | Updated | Tags: fisherstrand haplotypecaller downsampling strand-bias

Hi GATK team, again, thanks a lot for the wonderful tools you're offering to the community.

I have recently switched from UnifiedGenotyper to Haplotype Caller (1 sample at a time, DNASeq). I was planning to use the same hard filtering procedure that I was using previously, including the filter of the variants with FS > 60. However I am facing an issue probably due to the downsampling done by HC.

I should have 5000 reads, but DP is around 500-600, which I understand is due to downsampling (even with -dt NONE). I understand that this does not affect the calling itself. However, it is a problem for two reasons:

1. Calculating the frequency of the variant from the AD field is not correct (it is not based on all reads).
2. I get variants with FS > 60 whereas, when you look at the entire set of reads, there is absolutely no strand bias.

Example with this variant:

chr17 41245466 rs1799949 G A 7441.77 STRAND_BIAS; AC=1;AF=0.500;AN=2;BaseQRankSum=7.576;DB;DP=1042;FS=63.090;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.666;QD=7.14;ReadPosRankSum=-11.896;SOR=5.810 GT:AD:GQ:PL:SB 0/1:575,258:99:7470,0,21182:424,151,254,4

When I observe all reads I have the following counts, well shared on the + and - strands: Allele G: 1389 (874+, 515-); Allele A: 1445 (886+, 559-).

Could you please tell me how to avoid this issue? (By the way, this variant is a true one and should not be filtered out.)

Thanks a lot.
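To make the discrepancy concrete, here is a minimal sketch that compares the allele fraction and the per-allele forward-strand fraction implied by the AD and SB fields of the record above with the full-pileup counts quoted in the post. It assumes the SB field is ordered ref-forward, ref-reverse, alt-forward, alt-reverse; the `summarize` helper is hypothetical and not part of GATK.

```python
# Sample column copied from the record in the post.
sample_format = "GT:AD:GQ:PL:SB"
sample_value = "0/1:575,258:99:7470,0,21182:424,151,254,4"

fields = dict(zip(sample_format.split(":"), sample_value.split(":")))

# ASSUMPTION: SB is ordered ref-forward, ref-reverse, alt-forward, alt-reverse.
ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["SB"].split(","))
ad_ref, ad_alt = (int(x) for x in fields["AD"].split(","))


def summarize(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Return (ALT allele fraction, forward fraction of REF reads,
    forward fraction of ALT reads) for one 2x2 strand table."""
    ref = ref_fwd + ref_rev
    alt = alt_fwd + alt_rev
    return alt / (ref + alt), ref_fwd / ref, alt_fwd / alt


# Reads that HaplotypeCaller reported (after downsampling).
downsampled = summarize(ref_fwd, ref_rev, alt_fwd, alt_rev)
# Counts from the full pileup, as quoted in the post.
all_reads = summarize(874, 515, 886, 559)

for label, (alt_frac, ref_fwd_frac, alt_fwd_frac) in (
    ("reported by HC", downsampled),
    ("all reads", all_reads),
):
    print(f"{label}: ALT fraction={alt_frac:.3f}, "
          f"REF on + strand={ref_fwd_frac:.3f}, ALT on + strand={alt_fwd_frac:.3f}")
```

With these numbers, the reads reported by HaplotypeCaller put roughly 98% of ALT reads on the forward strand, while the full set of reads puts about 61% there, which is the gap behind the STRAND_BIAS filter on this record.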

Created 2015-05-06 14:48:11 | Updated | Tags: unifiedgenotyper strand-bias

Hi there, I have been struggling with the interpretation of the SOR annotation. I do get that a higher value is a sign of strand bias and that this value is never negative.

I did read the SOR annotation documentation but still cannot figure out how you calculate the SOR.

Here is one of my examples where strand bias is present for sure: REF + strand = 2185 reads, REF - strand = 5 reads, ALT + strand = 7 reads, ALT - strand = 2370 reads.

When I calculate "R" as indicated in the documentation, I obtain a very high value of 147 955. But for this variant in the VCF file, SOR = 11.382.

Thanks Manon
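For anyone trying to reproduce the number, here is a minimal sketch of the calculation as I read it from the StrandOddsRatio code linked in the earlier post on this page. It is not the GATK implementation itself. The assumptions are: a pseudocount of 1 is added to every cell, R is made symmetric by adding its inverse, refRatio and altRatio are min/max, and the natural log is taken at the end. The function name `strand_odds_ratio` is hypothetical.

```python
import math


def strand_odds_ratio(ref_fwd, ref_rev, alt_fwd, alt_rev, pseudocount=1.0):
    """Sketch of a symmetric strand odds ratio on a 2x2 strand table.

    ASSUMPTIONS (not confirmed by the post): pseudocount of 1 per cell,
    min/max ratios, natural log of the final product.
    """
    a = ref_fwd + pseudocount
    b = ref_rev + pseudocount
    c = alt_fwd + pseudocount
    d = alt_rev + pseudocount

    r = (a * d) / (b * c)          # odds ratio R
    symmetric = r + 1.0 / r        # same value if the strands are swapped

    ref_ratio = min(a, b) / max(a, b)
    alt_ratio = min(c, d) / max(c, d)

    return math.log(symmetric * ref_ratio / alt_ratio)


# Raw R without pseudocounts, as computed in the post.
raw_r = (2185 * 2370) / (5 * 7)

# Same counts run through the full calculation.
sor = strand_odds_ratio(2185, 5, 7, 2370)

print(f"raw R = {raw_r:.1f}")
print(f"SOR   = {sor:.3f}")
```

Under these assumptions the raw R matches the 147 955 from the post, and the final value comes out close to the SOR = 11.382 in the VCF. The large difference between the two is mostly the logarithm: SOR is on a log scale, R is not.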

Created 2015-01-26 11:01:54 | Updated 2015-01-26 11:47:17 | Tags: fisherstrand haplotypecaller jexl strand-bias filtering hardfilters

Hi, I need to apply hard filters to my data. In cases where I have lower coverage I plan to use the Fisher Strand annotation, and in higher coverage variant calls, SOR (using a JEXL expression to switch between them: DP < 20 ? FS > 50.0 : SOR > 3).

The variant call below (some annotations snipped), which is from a genotyped gVCF from HaplotypeCaller (using a BQSR'ed BAM file), looks well supported (high QD, high MQ, zero MQ0). However, there appears to be some strand bias (SOR=3.3):

788.77 . DP=34;FS=5.213;MQ=35.37;MQ0=0;QD=25.44;SOR=3.334 GT:AD:DP:GQ:PL 1/1:2,29:31:35:817,35,0

In this instance the filter example above would be applied.
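The switch can be sketched in Python to check which branch a given record takes. This only mirrors the logic of the JEXL ternary above; the function name and the INFO parsing are illustrative and have nothing to do with how GATK evaluates JEXL.

```python
def strand_bias_filter(dp, fs, sor, dp_cutoff=20, fs_cutoff=50.0, sor_cutoff=3.0):
    """Mirror of the JEXL expression: DP < 20 ? FS > 50.0 : SOR > 3."""
    if dp < dp_cutoff:
        # Low coverage: judge strand bias by FisherStrand.
        return fs > fs_cutoff
    # Higher coverage: judge strand bias by the strand odds ratio.
    return sor > sor_cutoff


# INFO annotations copied from the example record (other annotations snipped).
info = "DP=34;FS=5.213;MQ=35.37;MQ0=0;QD=25.44;SOR=3.334"
annotations = {key: float(value)
               for key, value in (item.split("=") for item in info.split(";"))}

flagged = strand_bias_filter(annotations["DP"], annotations["FS"], annotations["SOR"])
print("filtered" if flagged else "passes")
```

With DP=34 the SOR branch is used, and SOR=3.334 exceeds the cut-off of 3, so the record is flagged even though FS=5.213 is far below 50.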

## My Question

Is this filtering out a true positive? And what kind of cut-offs should I be using for FS and SOR?

The snipped annotations ReadPosRankSum=-1.809 and BaseQRankSum=-0.8440 for this variant also indicate some minor bias in the evidence supporting this variant call (the variant appears nearer the ends of reads and in lower quality bases, compared to the reads supporting the reference allele).

## My goal

This is part of a larger hard filter I'm applying to a set of genotyped gVCFs called from HaplotypeCaller.

I'm filtering HomRef positions using this JEXL filter:

vc.getGenotype("%sample%").isHomRef() ? ( vc.getGenotype("%sample%").getAD().size == 1 ? (DP < 10) : ( ((DP - MQ0) < 10) || ((MQ0 / (1.0 * DP)) >= 0.1) || MQRankSum > 3.2905 || ReadPosRankSum > 3.2905 || BaseQRankSum > 3.2905 ) ) : false

And filtering HomVar positions using this JEXL:

vc.getGenotype("%sample%").isHomVar() ? ( vc.getGenotype("%sample%").getAD().0 == 0 ? ( ((DP - MQ0) < 10) || ((MQ0 / (1.0 * DP)) >= 0.1) || QD < 5.0 || MQ < 30.0 ) : ( BaseQRankSum < -3.2905 || MQRankSum < -3.2905 || ReadPosRankSum < -3.2905 || ((MQ0 / (1.0 * DP)) >= 0.1) || QD < 5.0 || (DP < 20 ? FS > 60.0 : SOR > 3.5) || MQ < 30.0 || QUAL < 100.0 ) ) : false

My goal is true positive variants only and I have high coverage data, so the filtering should be relatively stringent. Unfortunately I don't have a database I could use to apply VQSR, hence the comprehensive filtering strategy.

Created 2014-07-23 12:33:37 | Updated | Tags: mutect strand-bias

Hi, Mutect does have a filter for strand bias, but it does not give strand information (like DP4 or similar) in its output .call file or .vcf file. Sometimes I want to check the strand distribution of the called SNVs and do further filtering. How can I get this information?

Thanks! Hartblue

Created 2014-06-30 09:16:53 | Updated | Tags: strand-bias

Hi, again, thanks a lot for the amazing workshop in Brussels! I have a question on dealing with strand bias, regarding the SB flag in the VCF file. The value of SB (strand bias) is calculated by Fisher's exact test, using a 2x2 table that contains the reference and non-reference depths on the forward and reverse strands. Playing a little with the numbers given to Fisher's exact test through a web calculator, I noticed that combinations which look like clear strand bias receive a non-significant value (e.g. 30,1,110,2 for ref-fwd, ref-reverse, non-ref-fwd, non-ref-reverse receives a p-value of 0.52, or 2.84 when Phred-scaled). Such variants are therefore considered unbiased. The cases that are defined as biased are ones where 3 out of the 4 values are similar to each other and only one is extremely different. As far as I understand, cases like the one I mentioned should be regarded as biased. Do you recommend using this strand bias value and filtering variants based on it?

Maya
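The web-calculator result can be reproduced with a short standard-library sketch of a two-sided Fisher's exact test on the 2x2 strand table, followed by Phred scaling. This is a generic textbook version and does not attempt to reproduce any GATK-specific preprocessing of the table, so it will not necessarily match the values GATK writes to a VCF. The function names are hypothetical.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def log_prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = log_prob(a)
    tolerance = 1e-7  # guards against floating-point ties
    total = sum(math.exp(lp)
                for lp in (log_prob(x) for x in range(lowest, highest + 1))
                if lp <= observed + tolerance)
    return min(1.0, total)


def phred(p):
    """Phred-scale a p-value; the floor avoids log10(0)."""
    return -10.0 * math.log10(max(p, 1e-300))


# ref-fwd, ref-reverse, non-ref-fwd, non-ref-reverse from the post.
p_value = fisher_exact_two_sided(30, 1, 110, 2)
print(f"p = {p_value:.3f}, Phred-scaled = {phred(p_value):.2f}")
```

For 30,1,110,2 this lands near the p-value of 0.52 quoted in the post. The test asks whether the allele (ref vs. non-ref) is associated with the strand, not whether reads are evenly split between strands. Here both alleles sit almost entirely on the forward strand, so there is little association and the p-value is not significant.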

Created 2012-11-13 16:05:23 | Updated | Tags: fisherstrand strand-bias exome

Hi,

I have seen the definition of strand bias on this site (below) but I need a little clarification. Does the FS filter (a) highlight instances where reads are only present on a single strand and contain a variant (as may occur toward the end of exome capture regions) or does it (b) specifically look for instances where there are reads on both strands but the variant allele is disproportionately represented on one strand (as might be indicative of a false positive), or does it (c) do both?

I had thought it did (b) but have encountered some disagreement.

How much evidence is there for Strand Bias (the variation being seen on only the forward or only the reverse strand) in the reads? Higher SB values denote more bias (and therefore are more likely to indicate false positive calls).

Created 2012-10-04 05:22:20 | Updated 2013-01-07 20:32:11 | Tags: strand-bias

All,

The Strand Bias (SB) value in the VCF file looks very different between GATK 1.5 and 2.1:

1. On GATK 1.5, SB=0 or 0.1 is acceptable, the smaller the SB the better ... etc
2. On GATK 2.1, SB~-6.1e03 ???

My question is: what is an acceptable cut-off SB value?

Thanks, Q

Created 2012-10-03 05:37:48 | Updated 2013-01-07 20:31:53 | Tags: unifiedgenotyper strand-bias

Hi,

How can I get all SNP variants with strand bias (high SB value)?

GenomeAnalysisTK.jar -T UnifiedGenotyper  \
-R  ucsc.hg19.fasta   -D dbsnp_135.hg19.vcf  \
-stand_emit_conf 1.5 -stand_call_conf 1.5 \
-o Sample_2.snv.vcf  -I Sample_2.recal.bam -nt 10

Thanks, Q
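One way to pull such records out after calling is to scan the output VCF for an INFO annotation above a cut-off. The sketch below uses only the Python standard library and is not a GATK feature. The annotation key and the threshold are parameters because the right annotation (SB, FS, ...) and cut-off depend on the GATK version. The two data records in `example_lines` are made-up toy records for illustration, not real calls.

```python
def info_value(info, key):
    """Return the numeric value of `key` in a VCF INFO string, or None."""
    for item in info.split(";"):
        name, _, value = item.strip().partition("=")
        if name == key and value:
            try:
                return float(value)
            except ValueError:
                return None
    return None


def records_above(lines, key, threshold):
    """Yield VCF data lines whose INFO[key] is greater than `threshold`."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # not a complete VCF data line
        value = info_value(columns[7], key)
        if value is not None and value > threshold:
            yield line.rstrip("\n")


# HYPOTHETICAL toy records, for illustration only.
example_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=30;FS=75.2",
    "chr1\t200\t.\tC\tT\t60\t.\tDP=28;FS=1.3",
]

for record in records_above(example_lines, key="FS", threshold=60.0):
    print(record)
```

To run it on a real file, pass an open file handle instead of the list, for example `records_above(open("Sample_2.snv.vcf"), "SB", cutoff)` with a `cutoff` chosen for the GATK version in use.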