I want to use GATK to filter a multisample VCF based on the PL values, but I cannot figure out how to do this. I want to flag genotypes that are likely to be homozygous reference (with the aim of removing them eventually). Specifically, I would like to look at the PL field and flag every genotype whose first PL value (the Phred-scaled likelihood of the homozygous reference genotype) is 10 or less.
The VariantFiltration documentation page is quite helpful and suggests accessing arrays with expressions like: 'vc.getGenotype("NA12878").getAD().0 > 10'
but this is not quite what I want: I want the filtering to be genotype-based (not variant-based), and I want it applied to each sample separately rather than to a single named sample. Basically I am looking for something like this:
--genotypeFilterExpression "PL.0 <= 10"
where PL.0 is the first value of the PL array. I cannot find a way to do this anywhere. Can someone suggest a recipe to achieve this?
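To make the per-genotype logic I am after concrete, here is a rough sketch in plain Python. It is not GATK and is not meant as a solution, only an illustration of the behaviour I want. The function name flag_low_pl_genotypes, the filter name lowPL0 and the default threshold are my own placeholders. It assumes a tab-separated VCF data line with integer PL values, and it writes the flag into the per-sample FT field, leaving unflagged genotypes as "." when FT has to be added.

```python
def flag_low_pl_genotypes(vcf_line, threshold=10, filter_name="lowPL0"):
    """Flag each genotype whose first PL value is <= threshold.

    Hypothetical helper: operates on a single tab-separated VCF line and
    returns the line with the per-sample FT field set on flagged genotypes.
    """
    line = vcf_line.rstrip("\n")
    fields = line.split("\t")

    # Header lines and lines without sample columns are returned unchanged.
    if line.startswith("#") or len(fields) < 10:
        return line

    format_keys = fields[8].split(":")
    if "PL" not in format_keys:
        return line

    pl_index = format_keys.index("PL")
    if "FT" not in format_keys:
        format_keys.append("FT")
    ft_index = format_keys.index("FT")
    fields[8] = ":".join(format_keys)

    # Every sample is evaluated on its own, not one named sample.
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        # VCF allows trailing FORMAT values to be omitted, so pad with ".".
        values += ["."] * (len(format_keys) - len(values))

        first_pl = values[pl_index].split(",")[0]
        if first_pl not in (".", "") and int(first_pl) <= threshold:
            existing = values[ft_index]
            if existing in (".", "PASS"):
                values[ft_index] = filter_name
            else:
                values[ft_index] = existing + ";" + filter_name

        fields[i] = ":".join(values)

    return "\t".join(fields)


if __name__ == "__main__":
    example = "1\t100\t.\tA\tG\t50\t.\t.\tGT:PL\t0/0:0,30,300\t0/1:40,0,200"
    print(flag_low_pl_genotypes(example))
```

In this made-up example line, only the first sample (PL 0,30,300) would be flagged, because its first PL value is at or below the threshold; the second sample (PL 40,0,200) would be left alone.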
Thank you in advance.