Leveraging the Sparsity of Epistatic Interactions to Understand and Improve Models of Fitness Functions
The Walsh-Hadamard (WH) transform provides a powerful tool for analyzing fitness functions in terms of epistatic interactions between the amino acids in a sequence. Empirical evidence suggests that many natural fitness functions are substantially sparse when represented in terms of epistatic interactions. Here we examine several ways this observation can be leveraged to design experiments that probe fitness functions and to improve the modeling of such functions from fitness data. First, we explain how to extend the WH transform to alphabets with more than two elements using generalized graph Fourier transforms, which makes possible the analysis of fitness functions over complete nucleotide and amino acid alphabets. Next, we consider how the natural sparsity of fitness functions in the graph Fourier representation can be combined with compressed sensing theory to determine the number of experimental measurements that must be acquired to model a fitness function effectively. Finally, we describe Epistatic Net, a method for regularizing the training of neural network models of fitness functions that encourages the model to maintain a sparse representation in terms of epistatic interactions. We show that applying this empirically motivated inductive bias improves the accuracy of fitness models in predicting the fitness of unobserved sequences.
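To make the WH representation concrete, the following is a minimal sketch (not the paper's code) for the simplest binary-alphabet case: the fitness function over all length-L binary sequences is expanded in the Walsh basis, and a toy fitness function built from one additive effect and one pairwise interaction yields exactly two nonzero epistatic coefficients.

```python
# Sketch: epistatic coefficients of a binary-alphabet fitness function
# via the dense Walsh-Hadamard transform. Toy setup, plain NumPy.
import itertools

import numpy as np

L = 3  # sequence length; kept small so the dense 2^L x 2^L transform is cheap

# Rows/columns indexed by all binary sequences; H[x, z] = (-1)^<x, z>
# are the Walsh functions evaluated at each sequence.
seqs = np.array(list(itertools.product([0, 1], repeat=L)))
H = (-1.0) ** (seqs @ seqs.T)

# Toy sparse fitness function: an additive effect at site 0 plus a
# pairwise interaction between sites 1 and 2.
f = 1.0 * (-1.0) ** seqs[:, 0] + 0.5 * (-1.0) ** (seqs[:, 1] + seqs[:, 2])

# Forward transform: epistatic coefficients beta = H f / 2^L.
beta = H @ f / 2**L

# Only the two planted terms are nonzero (up to float error):
# beta[4] = 1.0 (site 0) and beta[3] = 0.5 (sites 1 and 2 jointly).
print(np.round(beta, 6))
```

Sequences with more than two letters per site require the generalized graph Fourier bases discussed in the text; this binary case is the special instance where the graph Fourier basis reduces to the Hadamard matrix.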
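The compressed-sensing argument can likewise be illustrated with a small hypothetical experiment (again not the paper's algorithms or bounds): when the epistatic representation is sparse, measuring the fitness of a random subset of sequences, far fewer than all 2^L of them, can suffice to recover the coefficients, here via a plain iterative soft-thresholding (ISTA) solver for the LASSO.

```python
# Sketch: recovering sparse epistatic coefficients from a subsample of
# fitness measurements, using randomly chosen rows of the Walsh basis
# as the sensing matrix and ISTA as a simple LASSO solver.
import itertools

import numpy as np

rng = np.random.default_rng(0)
L = 6
seqs = np.array(list(itertools.product([0, 1], repeat=L)))
H = (-1.0) ** (seqs @ seqs.T)  # Walsh basis over all 2^L = 64 sequences

# Ground truth: only 3 of 64 epistatic coefficients are nonzero.
beta_true = np.zeros(2**L)
beta_true[[1, 9, 33]] = [1.0, -0.7, 0.5]
f = H @ beta_true  # full fitness landscape

# Measure the fitness of only m randomly chosen sequences (m << 2^L).
m = 32
rows = rng.choice(2**L, size=m, replace=False)
A, y = H[rows], f[rows]

# ISTA for the LASSO objective 0.5*||A b - y||^2 + lam*||b||_1.
lam = 0.1
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / squared spectral norm
b = np.zeros(2**L)
for _ in range(3000):
    g = b - step * (A.T @ (A @ b - y))
    b = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)

# Indices of the recovered interactions (expected to match the planted ones).
print(np.flatnonzero(np.abs(b) > 0.25))
```

The point of the sketch is the measurement count: 32 fitness measurements recover a 3-sparse set of coefficients in a 64-dimensional representation, whereas a dense representation would require all 64.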
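Finally, the regularization idea behind Epistatic Net can be sketched in miniature (a hypothetical toy setup with hand-written gradients, not the authors' implementation): alongside the usual data-fit loss on observed sequences, the training objective penalizes the L1 norm of the WH transform of the network's predictions over all sequences, nudging the learned fitness function toward a sparse epistatic representation.

```python
# Sketch: training a tiny one-hidden-layer network on a subset of
# fitnesses while penalizing the L1 norm of the WH transform of its
# predictions over the full sequence space. Gradients written by hand.
import itertools

import numpy as np

rng = np.random.default_rng(1)
L = 4
seqs = np.array(list(itertools.product([0, 1], repeat=L)))
X = (-1.0) ** seqs             # +/-1 input encoding, shape (16, L)
H = (-1.0) ** (seqs @ seqs.T)  # Walsh basis over all 2^L sequences

# Toy sparse target: one additive effect plus one pairwise interaction.
y = X[:, 0] + 0.5 * X[:, 1] * X[:, 2]
train = rng.choice(2**L, size=10, replace=False)  # observe 10 of 16 fitnesses

W1, b1 = 0.1 * rng.standard_normal((L, 8)), np.zeros(8)
W2, b2 = 0.1 * rng.standard_normal(8), 0.0
lam, lr = 0.01, 0.05  # penalty weight and learning rate (arbitrary choices)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

for _ in range(2000):
    h, pred = forward(X)            # predict on ALL sequences
    beta = H @ pred / 2**L          # epistatic coefficients of the model
    # Gradient of: MSE on the training subset + lam * ||beta||_1.
    g_pred = np.zeros(2**L)
    g_pred[train] = 2 * (pred[train] - y[train]) / len(train)
    g_pred += lam * H @ np.sign(beta) / 2**L  # penalty term (H is symmetric)
    g_h = np.outer(g_pred, W2) * (1 - h**2)   # backprop through tanh layer
    W2 -= lr * h.T @ g_pred
    b2 -= lr * g_pred.sum()
    W1 -= lr * X.T @ g_h
    b1 -= lr * g_h.sum(axis=0)

_, pred = forward(X)
train_mse = float(np.mean((pred[train] - y[train]) ** 2))
print(train_mse)
```

In this toy the penalty is computed exactly by enumerating all 2^L sequences; for realistic sequence lengths that enumeration is intractable, and the practical method must approximate the transform, a point the sketch deliberately glosses over.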