Eddy Lab, Harvard
Interpretable convolutional networks for regulatory genomics
Deep learning methods have the potential to make a significant impact in biology and healthcare, but a major challenge is understanding the reasons behind their predictions. In this talk, I will demonstrate how interpreting these “black box” models can: 1) provide novel biological insights and 2) guide better model design for big, noisy biological sequence data. In the first part, I will present results from interrogating a convolutional neural network (CNN) trained to infer the sequence specificities of RNA-binding proteins. We find that, in addition to sequence motifs, our CNN learns a model that considers the number of motifs, their spacing, and both positive and negative effects of RNA structure context. In the second part, I will discuss ongoing research that demonstrates how interpreting deep learning models can help design better models for protein contact prediction. Specifically, we interpret a variational autoencoder (VAE) trained on aligned, homologous protein sequences. We find that our VAEs capture phylogenetic relationships with an approximate Bayesian mixture model of profiles, i.e., site-independent amino-acid probability models, a result that serves as a good null model for contact prediction.
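For readers unfamiliar with the term, a profile in the sense used above can be sketched in a few lines. The following is a minimal illustration, not the speaker's implementation: the alphabet ordering, the pseudocount, the function names (fit_profile, log_likelihood), and the toy alignment are all assumptions made for this example. It fits a single profile; a mixture of profiles would combine several such matrices with mixing weights.

```python
import numpy as np

# Assumed encoding: 20 standard amino acids plus a gap character.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def fit_profile(alignment, pseudocount=1.0):
    """Estimate per-column amino-acid probabilities from aligned sequences.

    Each alignment column is modeled independently of every other column,
    which is what makes the model site-independent.
    """
    length = len(alignment[0])
    # Start every count at the pseudocount so no probability is exactly zero.
    counts = np.full((length, len(ALPHABET)), pseudocount, dtype=float)
    for seq in alignment:
        if len(seq) != length:
            raise ValueError("all aligned sequences must have the same length")
        for pos, aa in enumerate(seq):
            counts[pos, INDEX[aa]] += 1.0
    # Normalize each column's counts into a probability distribution.
    return counts / counts.sum(axis=1, keepdims=True)


def log_likelihood(profile, seq):
    """Log-probability of a sequence: a sum of independent per-site terms."""
    return float(sum(np.log(profile[pos, INDEX[aa]]) for pos, aa in enumerate(seq)))


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["MKV-A", "MKI-A", "MRVQA", "MKVQG"]
profile = fit_profile(toy_alignment)
```

Because the log-likelihood is a sum over positions with no interaction terms, such a model cannot represent covariation between pairs of sites, which is why it is a natural null model when looking for contact signal.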