Tagged with #non-model organisms

Created 2013-09-29 17:07:55 | Updated | Tags: bqsr

I have been working primarily with non-model organisms (and mostly inbred mapping populations, but that's a topic for a different discussion). To recalibrate base qualities, I first run Indel Realignment and SNP and INDEL calling, then filter around INDELs. I work with multi-sample VCFs, and my approach to base quality recalibration is as follows: I take the SNPs in my filtered SNP VCF file whose ALTQ is above the 90th percentile of all SNPs in that file. Then, for each SAMPLE in the VCF file (I usually have between 100 and 300 samples), I write those top SNPs to a SEPARATE VCF file, keeping a site only if GQ > 90 for that sample and the site is a SNP in that sample. I then use these per-SAMPLE HQ VCF files for the BQSR tools.
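
To make the description above concrete, here is a minimal sketch of the filtering logic in plain Python. It is not the script itself. It assumes that ALTQ corresponds to the VCF QUAL column, that a nearest-rank percentile is acceptable, and that only single-base REF/ALT records count as SNPs. The function names `percentile_cutoff` and `split_high_quality` are hypothetical.

```python
import math


def percentile_cutoff(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def split_high_quality(vcf_lines, min_gq=90, pct=90):
    """Return {sample: [single-sample data lines]} of high-quality SNPs.

    Assumption: the QUAL column stands in for the ALTQ value in the post.
    """
    samples, records = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        ref, alt, qual = fields[3], fields[4], fields[5]
        # Keep only simple biallelic SNPs that have a numeric QUAL.
        if len(ref) == 1 and len(alt) == 1 and alt != "." and qual != ".":
            records.append(fields)

    out = {sample: [] for sample in samples}
    if not records:
        return out

    cutoff = percentile_cutoff([float(r[5]) for r in records], pct)
    for r in records:
        if float(r[5]) <= cutoff:
            continue  # site is not above the requested percentile
        keys = r[8].split(":")
        for sample, call in zip(samples, r[9:]):
            data = dict(zip(keys, call.split(":")))
            alleles = data.get("GT", ".").replace("|", "/").split("/")
            gq = data.get("GQ", ".")
            # The sample must carry a non-reference allele and pass the GQ limit.
            is_variant = any(a not in ("0", ".") for a in alleles)
            if is_variant and gq != "." and float(gq) > min_gq:
                out[sample].append("\t".join(r[:9] + [call]))
    return out
```

Each list in the returned dict can then be written out, together with the original header lines, as one VCF per sample.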

I have a simple Python script for this, located here

usage: GetHighQualVcfs.py [-h] -i INFILE -o OUTDIR [--ploidy PLOIDY] [--GQ GQ]
                          [--percentile PERCENTILE]

Split multi-sample VCFs into single sample VCFs of high quality SNPs.

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        Multi-sample VCF file
  -o OUTDIR, --outdir OUTDIR
                        Directory to output HQ VCF files.
  --ploidy PLOIDY       1 for haploid; 2 for diploid
  --GQ GQ               Filters out variants with GQ < this limit.
  --percentile PERCENTILE
                        Reduces to variants with ALTQ > this percentile.
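
For reference, a command-line interface like the one in the help text above could be declared with `argparse` roughly as follows. This is only a sketch: the option names and help strings are taken from the usage output, but the defaults (ploidy 2, GQ 90, percentile 90) are assumptions based on the values mentioned in this post, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the usage text."""
    parser = argparse.ArgumentParser(
        prog="GetHighQualVcfs.py",
        description="Split multi-sample VCFs into single sample VCFs "
                    "of high quality SNPs.",
    )
    parser.add_argument("-i", "--infile", required=True,
                        help="Multi-sample VCF file")
    parser.add_argument("-o", "--outdir", required=True,
                        help="Directory to output HQ VCF files.")
    # The defaults below are assumptions, not confirmed values from the script.
    parser.add_argument("--ploidy", type=int, default=2,
                        help="1 for haploid; 2 for diploid")
    parser.add_argument("--GQ", type=int, default=90,
                        help="Filters out variants with GQ < this limit.")
    parser.add_argument("--percentile", type=int, default=90,
                        help="Reduces to variants with ALTQ > this percentile.")
    return parser
```

With this sketch, `build_parser().parse_args(["-i", "all_samples.vcf", "-o", "hq_vcfs"])` returns a namespace with `infile`, `outdir`, `ploidy`, `GQ`, and `percentile` attributes; the file and directory names here are placeholders.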

Thoughts? Concerns? Perhaps I'm going about this in a completely wrong way?