VariantFiltration using PL values on a multisample VCF file
Posted in Ask the team

Dear all,

I want to use GATK to filter a multisample VCF based on the PL values, but I cannot figure out how to do this. I want to flag genotypes that are likely to be homozygous reference (with the aim of removing them eventually). I would like to look at the PL field and, if the first PL value (the Phred-scaled likelihood of the homozygous reference genotype, where a lower value means that genotype is more plausible) is 10 or less, flag that genotype.

The VariantFiltration documentation page is quite helpful and suggests accessing array values with expressions such as: 'vc.getGenotype("NA12878").getAD().0 > 10'

but this is not quite what I want: I want the filtering to be genotype-based (not variant-based), and I want it applied to each sample separately rather than to a single named sample. Basically, I am looking for something like this:

--genotypeFilterExpression "PL.0 <= 10"

where PL.0 is the first number of the PL array. I cannot find a way to do this anywhere. Can someone suggest a recipe to achieve this?
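To make the goal concrete, here is a minimal sketch in plain Python (not GATK) of the behaviour I am after. It assumes an uncompressed, tab-delimited VCF record with a PL entry in the FORMAT column, and it writes the flag to the per-genotype FT field. The function name flag_low_hom_ref_pl and the filter name lowHomRefPL are made up for illustration.

```python
def flag_low_hom_ref_pl(vcf_line, threshold=10, filter_name="lowHomRefPL"):
    """Set FT on every sample whose first PL value is <= threshold.

    filter_name is a hypothetical label chosen for this example.
    """
    line = vcf_line.rstrip("\n")
    fields = line.split("\t")
    # Header lines and records without sample columns pass through unchanged.
    if line.startswith("#") or len(fields) < 10:
        return line

    keys = fields[8].split(":")
    if "PL" not in keys:
        return line
    pl_idx = keys.index("PL")
    if "FT" not in keys:
        keys.append("FT")
    ft_idx = keys.index("FT")

    samples = []
    for sample in fields[9:]:
        values = sample.split(":")
        # Trailing FORMAT fields may be omitted in a VCF; pad with missing values.
        values += ["."] * (len(keys) - len(values))
        first_pl = values[pl_idx].split(",")[0]
        if first_pl not in (".", "") and int(first_pl) <= threshold:
            existing = values[ft_idx]
            if existing in (".", "PASS"):
                values[ft_idx] = filter_name
            elif filter_name not in existing.split(";"):
                values[ft_idx] = existing + ";" + filter_name
        samples.append(":".join(values))

    fields[8] = ":".join(keys)
    return "\t".join(fields[:9] + samples)


# Made-up record with three samples; only the first has PL[0] <= 10.
record = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:PL\t0/1:5,0,200\t0/1:120,0,300\t1/1:300,30,0"
print(flag_low_hom_ref_pl(record))
```

In this made-up record only the first sample (PL 5,0,200) would be flagged; the other two keep a missing FT value. What I am looking for is a way to get the same per-sample result from VariantFiltration itself.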

Thank you in advance.

