Using JEXL expressions
From GSA
Examples of JEXL expressions:
"QUAL / DP < 10.0"
"QUAL > 30.0 && DP == 10"
"MY_STRING_KEY == 'foo'" (note the need for single-quotes)
Working with complex expressions
It is very important to note that JEXL cannot correctly handle missing annotations and throws an exception when it encounters this scenario. The default behavior of the GATK is to handle this by having the entire expression evaluate to FALSE in such cases (although some tools provide options to change this behavior). This is extremely important especially when constructing complex expressions. For example, take the following expression:
"QD < 2.0 || ReadPosRankSum < -20.0 || FS > 200.0"
When run against a VCF record with INFO field "QD=10.0;FS=300.0;ReadPosRankSum=-10.0" it will evaluate to TRUE (because the FS value is greater than 200.0). But when run against a VCF record with INFO field "QD=10.0;FS=300.0" it will evaluate to FALSE (because there is no ReadPosRankSum value defined at all and JEXL fails to evaluate it). This means then that when e.g. trying to filter out records with VariantFiltration, the previous record would be marked as PASSing, even though it contains a bad FS value.
For this reason, we highly recommend that complex expressions involving OR operations be split up into separate expressions whenever possible. For example, the previous example would have 3 distinct expressions: "QD < 2.0", "ReadPosRankSum < -20.0", and "FS > 200.0". This way, although the ReadPosRankSum expression evaluates to FALSE when the annotation is missing, the record can still get filtered (again using the example of VariantFiltration) when the FS value is greater than 200.0.
Accessing the underlying VariantContext directly
If you are family with the VariantContext, Genotype and it's associated classes and methods, you can directly access the full range of capabilities of the underlying objects from the command line. The underlying VariantContext object is available through the vc variable.
For example, suppose I want to select with SelectVariants all of the sites where sample NA12878 is homozygous reference. This can be accomplished by assessing the underlying VariantContext as follows:
java -Xmx4g -jar dist/GenomeAnalysisTK.jar -T SelectVariants -R b37/human_g1k_v37.fasta --variant my.vcf -select 'vc.getGenotype("NA12878").isHomRef()'
Here's a more sophisticated example JEXL that finds all novel variants in the total set with allele frequency > 0.25 but not 1, is not filtered, and is non-reference in 01-0263 sample.
! vc.getGenotype("01-0263").isHomRef() && (vc.getID() == null || vc.getID().equals(".")) && AF > 0.25 && AF < 1.0 && vc.isNotFiltered() && vc.isSNP() -o 01-0263.high_freq_novels.vcf -sn 01-0263
Added June 16, 2011.
Known Issues
- 1) The types used in your expression must exactly match that of the value you are trying to match. In other words, if you have a QUAL field with non-integer values (e.g. 45.3) and your filter expression is written as an integer (e.g. "QUAL < 50"), the system will throw a Java exception.
- 2) Currently, VCF INFO field keys are case-sensitive. [This may actually be the correct behavior, but it's still under debate.]
