Illumina sequencers perform an internal quality filtering procedure called the chastity filter, and reads that pass this filter are called PF (pass filter). According to Illumina, chastity is defined as the ratio of the brightest base intensity to the sum of the brightest and second-brightest base intensities. A cluster passes the filter if no more than one base call has a chastity value below 0.6 in the first 25 cycles. This filtration process removes the least reliable clusters from the image analysis results.
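The definition above can be sketched in a few lines of Python. This is an illustrative reimplementation of the stated rule, not Illumina's code: the function names are hypothetical, the intensities are made up, and treating an all-dark cycle as failing is an assumption.

```python
from typing import List, Sequence


def chastity(intensities: Sequence[float]) -> float:
    """Brightest channel intensity divided by (brightest + second brightest).

    Assumes at least two non-negative channel intensities for one cycle.
    """
    brightest, second = sorted(intensities, reverse=True)[:2]
    total = brightest + second
    # Assumption: an all-dark cycle (total == 0) gets chastity 0.0, i.e. it fails.
    return brightest / total if total > 0 else 0.0


def passes_chastity_filter(
    cycles: List[Sequence[float]],
    threshold: float = 0.6,
    first_n_cycles: int = 25,
    max_failures: int = 1,
) -> bool:
    """True if at most `max_failures` of the first `first_n_cycles` cycles
    have a chastity value below `threshold`."""
    failures = sum(1 for c in cycles[:first_n_cycles] if chastity(c) < threshold)
    return failures <= max_failures


# Made-up intensities; the channel order does not matter.
clean_cycle = [900.0, 100.0, 50.0, 20.0]   # chastity = 900 / 1000 = 0.9
murky_cycle = [500.0, 480.0, 10.0, 5.0]    # chastity = 500 / 980, about 0.51
cluster_ok = [clean_cycle] * 24 + [murky_cycle]        # one low-chastity call
cluster_bad = [clean_cycle] * 23 + [murky_cycle] * 2   # two low-chastity calls
print(passes_chastity_filter(cluster_ok), passes_chastity_filter(cluster_bad))  # True False
```

Note that with non-negative intensities chastity can never drop below 0.5, so the 0.6 cutoff only rejects calls where the top two channels are nearly tied.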

For additional information on chastity filters, please see:

  • Illumina, Inc. (2015). Calculating Percent Passing Filter for Patterned and Non-Patterned Flow Cells: A comparison of methods for calculating percent passing filter on Illumina flow cells. Available at http://www.illumina.com


I have been given ~2000 gVCFs generated by Illumina (one sample per gVCF). Although they are in standard gVCF format, they were generated by an Illumina pipeline (https://support.basespace.illumina.com/knowledgebase/articles/147078-gvcf-file if you're really curious) and not by HaplotypeCaller. As a result (I think...), GATK refuses to process them (I have tried CombineGVCFs and GenotypeGVCFs to no avail). Is there a GATK walker or some other tool that will make my gVCFs GATK-friendly? I need to merge this data to make it analyzable, because in single-sample VCF format it's pretty useless at the moment.

My only other thought has been to expand all the reference blocks and then merge everything together, but this would likely produce a massive amount of data.
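To see why expanding reference blocks blows up the data, here is a minimal Python sketch that turns one gVCF record into one entry per covered position. It assumes only that non-variant blocks carry an `END=` key in the INFO column (as both Illumina and GATK gVCFs do) and a single sample column; the function names and the example record are hypothetical.

```python
from typing import Dict, Iterator, Tuple, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO column into key/value pairs (flags map to True)."""
    parsed: Dict[str, Union[str, bool]] = {}
    if info == ".":
        return parsed
    for field in info.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            parsed[key] = value
        elif field:
            parsed[field] = True
    return parsed


def expand_ref_block(record: str) -> Iterator[Tuple[str, int, str]]:
    """Yield one (CHROM, POS, sample_column) tuple per position covered by a
    gVCF record; records without END= cover only their own POS."""
    cols = record.rstrip("\n").split("\t")
    chrom, start = cols[0], int(cols[1])
    end = int(parse_info(cols[7]).get("END", start))
    sample = cols[9] if len(cols) > 9 else "."
    for pos in range(start, end + 1):
        yield chrom, pos, sample


# Made-up 10 kb reference block for one sample.
block = "chr1\t1000\t.\tA\t.\t.\tPASS\tEND=10999;BLOCKAVG_min30p3a\tGT:DP\t0/0:35"
sites = list(expand_ref_block(block))
print(len(sites))  # 10000
```

A single input line becomes 10,000 output lines, and a whole-genome sample approaches one line per reference base, which is then multiplied across all ~2000 samples. Tools that merge gVCFs avoid this by keeping the blocks as intervals rather than expanding them.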




Hi, I am analyzing human data from the new Illumina HiSeq X Ten. Since the quality scores are binned and are not "standard" Phred scores, I was wondering whether there are any problems in using the tools (BaseRecalibrator, etc.) for variant calling on this kind of data. I have already analyzed data from older Illumina platforms and everything worked just fine; I just want to be sure that I can keep using the same pipeline.
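Binned scores are still Phred-scaled values; binning only collapses each range of raw scores onto one representative value. The sketch below illustrates that idea. The bin table is hypothetical and chosen for illustration only; it is not Illumina's published HiSeq X scheme, and the function names are made up.

```python
from typing import List, Tuple

# Hypothetical bin table: (lowest raw score in bin, score reported for the bin).
# Illustrative only, NOT Illumina's actual HiSeq X binning scheme.
EXAMPLE_BINS: List[Tuple[int, int]] = [
    (2, 6), (10, 15), (20, 22), (25, 27), (30, 33), (35, 37), (40, 40),
]


def bin_quality(q: int, bins: List[Tuple[int, int]] = EXAMPLE_BINS) -> int:
    """Map a raw Phred score to the reported value of the bin it falls in."""
    reported = q  # scores below the first bin boundary are left unchanged
    for lower, value in bins:  # bins are sorted by their lower bound
        if q >= lower:
            reported = value
    return reported


def error_probability(q: int) -> float:
    """Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


raw = [12, 23, 31, 38, 41]
binned = [bin_quality(q) for q in raw]
print(binned)                 # [15, 22, 33, 37, 40]
print(error_probability(33))  # about 5.0e-04
```

Because each binned value keeps its Phred meaning, a downstream tool sees fewer distinct quality levels rather than a different scale.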



Hi, I use GATK for variant calling in a research unit. When I run RealignerTargetCreator, I get an error saying that I'm using the wrong encoding for quality scores. Which encoding (Sanger, Illumina, Solexa) is OK to use with GATK? Do you have any tool to convert to that encoding? Thank you
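GATK works with Sanger-style Phred+33 quality strings. The Python sketch below shows what a conversion involves. Illumina 1.3+ Phred+64 needs only an ASCII shift of 31, while old Solexa+64 scores use a different (odds-based) scale and must be rescaled. The function names are hypothetical, and the offset detection is a heuristic. It cannot tell Phred+64 from Solexa+64, and it misreads a Phred+33 read whose qualities are all Q26 or higher.

```python
import math


def guess_offset(qualities: str) -> int:
    """Heuristic only: a character below ';' (ASCII 59) cannot occur in
    Solexa+64 or Phred+64 data, so it implies Phred+33."""
    return 33 if min(ord(c) for c in qualities) < 59 else 64


def phred64_to_phred33(qualities: str) -> str:
    """Illumina 1.3+ Phred+64 -> Sanger Phred+33 (same scale, shift by 31)."""
    return "".join(chr(ord(c) - 31) for c in qualities)


def solexa64_to_phred33(qualities: str) -> str:
    """Solexa odds-based scores need a rescale, not just a shift:
    Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1), rounded."""
    out = []
    for c in qualities:
        q_solexa = ord(c) - 64
        q_phred = 10 * math.log10(10 ** (q_solexa / 10) + 1)
        out.append(chr(round(q_phred) + 33))
    return "".join(out)


print(guess_offset("II#5"), guess_offset("hh@e"))  # 33 64
print(phred64_to_phred33("h@"))                    # I!
print(solexa64_to_phred33(";@h"))                  # "$I
```

In practice you would apply the conversion to every quality line of the FASTQ (or the QUAL field of a BAM) before alignment, after confirming the encoding from the sequencing provider rather than relying on the heuristic alone.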

Hi all, I am working with GATK on Illumina data from yeast (SK1 strain). Since this is sequencing of a colony rather than a single organism, I am using a filter I wrote based on the fraction of reads that support the alternative allele. Is anyone else using something similar? Is there a built-in filter like that?
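A filter of this kind can be written against the per-sample FORMAT/AD field, which lists the reference read depth first and then one depth per ALT allele. The Python sketch below is one possible version, not the poster's filter. The function names and the 0.2 default threshold are hypothetical, and summing all ALT depths for multiallelic sites is a design assumption.

```python
from typing import Optional


def alt_allele_fraction(format_keys: str, sample_field: str) -> Optional[float]:
    """Fraction of reads supporting any ALT allele, from the FORMAT/AD field.

    AD lists the reference depth first, then one depth per ALT allele.
    Returns None when AD is missing or the total depth is zero.
    """
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    ad = values.get("AD")
    if ad is None or ad == ".":
        return None
    depths = [0 if d == "." else int(d) for d in ad.split(",")]
    total = sum(depths)
    if total == 0:
        return None
    return sum(depths[1:]) / total


def passes_alt_fraction_filter(format_keys: str, sample_field: str,
                               min_fraction: float = 0.2) -> bool:
    """Hypothetical filter: keep a call only if its ALT fraction >= min_fraction."""
    fraction = alt_allele_fraction(format_keys, sample_field)
    return fraction is not None and fraction >= min_fraction


print(alt_allele_fraction("GT:AD:DP", "0/1:30,10:40"))         # 0.25
print(passes_alt_fraction_filter("GT:AD:DP", "0/1:30,10:40"))  # True
print(passes_alt_fraction_filter("GT:AD:DP", "0/1:45,5:50"))   # False
```

Calls without an AD value are rejected here; whether to keep or drop them instead depends on how the VCF was produced.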