Tagged with #strand-bias
0 documentation articles | 0 announcements | 6 forum discussions


No posts found with the requested search criteria.
No posts found with the requested search criteria.

Created 2015-05-06 14:48:11 | Updated | Tags: unifiedgenotyper strand-bias
Comments (8)

Hi there, I have been struggling with the interpretation of the SOR annotation. I do get that a higher value is the sign of a strand bias and that this value is never negative.

I did read the SOR annotation documention but still cannot figure out how you calculate the SOR.

Here is one of my example where a SB is present for sure REF + strand = 2185 reads REF - strand = 5 reads ALT + strand = 7 reads ALT - strand = 2370 reads.

When I calculate "R" as indicated in the documentation, I obtain a very high value of 147 955. But for this variant in the VCF file, SOR = 11.382

How is the 11.382 obtained ? Could you please help me in understanding the computation ?

Thanks Manon


Created 2015-01-26 11:01:54 | Updated 2015-01-26 11:47:17 | Tags: fisherstrand haplotypecaller jexl strand-bias filtering hardfilters vcf-filtering
Comments (1)

Hi, I need to apply hard filters to my data. In cases where I have lower coverage I plan to use the Fisher Strand annotation, and in higher coverage variant calls, SOR (using a JEXL expression to switch between them: DP < 20 ? FS > 50.0 : SOR > 3).

The variant call below (some annotations snipped), which is from a genotyped gVCF from HaplotypeCaller (using a BQSR'ed BAM file), looks well supported (high QD, high MQ, zero MQ0). However, there appears to be some strand bias (SOR=3.3):

788.77 . DP=34;FS=5.213;MQ=35.37;MQ0=0;QD=25.44;SOR=3.334 GT:AD:DP:GQ:PL 1/1:2,29:31:35:817,35,0

In this instance the filter example above would be applied.

My Question

Is this filtering out a true positive? And what kind of cut-offs should I be using for FS and SOR?

The snipped annotations ReadPosRankSum=-1.809 and BaseQRankSum=-0.8440 for this variant also indicate minor bias that the evidence to support this variant call also has some bias (the variant appears near the end of reads in low quality bases, compared to the reads supporting the reference allele).

My goal

This is part of a larger hard filter I'm applying to a set of genotyped gVCFs called from HaplotypeCaller.

I'm filtering HomRef positions using this JEXL filter:

vc.getGenotype("%sample%").isHomRef() ? ( vc.getGenotype("%sample%").getAD().size == 1 ? (DP < 10) : ( ((DP - MQ0) < 10) || ((MQ0 / (1.0 * DP)) >= 0.1) || MQRankSum > 3.2905 || ReadPosRankSum > 3.2905 || BaseQRankSum > 3.2905 ) ) : false

And filtering HomVar positions using this JEXL:

vc.getGenotype("%sample%").isHomVar() ? ( vc.getGenotype("%sample%").getAD().0 == 0 ? ( ((DP - MQ0) < 10) || ((MQ0 / (1.0 * DP)) >= 0.1) || QD < 5.0 || MQ < 30.0 ) : ( BaseQRankSum < -3.2905 || MQRankSum < -3.2905 || ReadPosRankSum < -3.2905 || (MQ0 / (1.0 * DP)) >= 0.1) || QD < 5.0 || (DP < 20 ? FS > 60.0 : SOR > 3.5) || MQ < 30.0 || QUAL < 100.0 ) ) : false

My goal is true positive variants only and I have high coverage data, so the filtering should be relatively stringent. Unfortunately I don't have a database I could use to apply VQSR, henceforth the comprehensive filtering strategy.


Created 2014-06-30 09:16:53 | Updated | Tags: strand-bias
Comments (5)

Hi, again, thanks a lot for the amazing workshop in Brussels! I have a question on dealing with strand bias, regarding the SB flag in the vcf file: The value of SB (strand bias) is calculated by Fisher exact test, using a 2X2 table that contains the reference, non-reference, fwd and reverse depths. Playing a little with the numbers given to Fisher exact test through web calculator I noticed that combinations which seem as clear strand bias receive non-significant value (e.g. 30,1,110,2 for ref-fwd, ref-reverse, non-ref-fwd, non-ref-reverse receive p-value of 0.52 or 2.84 when phred-scaled). Such variant are therefore considered as unbiased. The cases that are defined as bias are ones where 3 out of the 4 values are similar to each and only one is extremely different. As far as I understand, cases as the one I mentioned should be referred as biased. Do you recommend using this strand bias value and filter variants based on it?

Maya


Created 2012-11-13 16:05:23 | Updated | Tags: fisherstrand strand-bias exome
Comments (11)

Hi,

I have seen the definition of strand bias on this site (below) but I need a little clarification. Does the FS filter (a) highlight instances where reads are only present on a single strand and contain a variant (as may occur toward the end of exome capture regions) or does it (b) specifically look for instances where there are reads on both strands but the variant allele is disproportionately represented on one strand (as might be indicative of a false positive), or does it (c) do both?

I had thought it did (b) but have encountered some disagreement.

** How much evidence is there for Strand Bias (the variation being seen on only the forward or only the reverse strand) in the reads? Higher SB values denote more bias (and therefore are more likely to indicate false positive calls.


Created 2012-10-04 05:22:20 | Updated 2013-01-07 20:32:11 | Tags: strand-bias
Comments (0)

All,

The Strand Bias (SB) value on the vcf file looks so difference from the gatk1.5 & 2.1 1. On the gatk 1.5, SB=0 or 0.1 is acceptable, the smaller SB the better ... etc 2. On gatk 2.1 SB~-6.1e03 ???

My question is what is cut-off SB value (acceptable!)

Thanks, Q


Created 2012-10-03 05:37:48 | Updated 2013-01-07 20:31:53 | Tags: unifiedgenotyper strand-bias
Comments (7)

Hi,

How to get all SNPs variants with Strand Bias (high SB value) ?

GenomeAnalysisTK.jar -T UnifiedGenotyper  \
    -R  ucsc.hg19.fasta   -D dbsnp_135.hg19.vcf  \
    -stand_emit_conf 1.5 -stand_call_conf 1.5 \
    -o Sample_2.snv.vcf  -I Sample_2.recal.bam -nt 10

Thanks, Q