I'd like to be able to stratify a multi-sample VCF by values in the FORMAT fields. Almost all of the existing stratifications are based on site-specific information rather than sample-specific information. One stratification in particular that I would like to perform is by ReadDepth: for instance, I would like to separate out all samples with ReadDepth greater than 20. This works in single-sample VCFs, but it produces strange results in VCFs with multiple samples, since each VariantContext contains multiple genotypes.
Melting my VCFs and reporting multiple lines for each position seems possible, but ugly. Splitting the VCFs so that each sample is in its own VCF is also possible, and equally ugly. What is the recommended method for dealing with this sort of stratification?
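For illustration, here is a minimal sketch of the "melting" idea done in memory rather than by writing new files: every data line is expanded into one (sample, genotype) observation per sample column, and each observation is binned by its own per-sample depth. It assumes that "ReadDepth" corresponds to the per-sample `DP` key in the FORMAT column, and that the VCF is uncompressed text. It parses the lines by hand and does not use GATK or VariantEval; the function `stratify_by_sample_dp` and the stratum labels are hypothetical names chosen for this example.

```python
from collections import Counter
from typing import Iterable


def stratify_by_sample_dp(vcf_lines: Iterable[str], threshold: int = 20) -> Counter:
    """Count genotypes per (sample, DP stratum) in an uncompressed multi-sample VCF.

    Hypothetical helper: assumes ReadDepth is the per-sample FORMAT key "DP".
    Each site with N samples contributes N observations, one per genotype.
    """
    counts: Counter = Counter()
    samples: list = []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        keys = fields[8].split(":")  # e.g. ["GT", "DP"]
        for sample, value in zip(samples, fields[9:]):
            # Trailing FORMAT values may be omitted (e.g. "./."), so use .get below.
            data = dict(zip(keys, value.split(":")))
            dp = data.get("DP", ".")
            if dp in (".", ""):
                stratum = "DP_missing"
            elif int(dp) > threshold:
                stratum = f"DP>{threshold}"
            else:
                stratum = f"DP<={threshold}"
            counts[(sample, stratum)] += 1
    return counts
```

With two samples, a genotype with `DP=25` lands in `DP>20` while the other sample's genotype at the same site can land in `DP<=20`, which is exactly the per-sample split that a site-level stratification cannot express.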