Machine learning-based design of proteins (and small molecules and beyond)

Jennifer Listgarten
Dept. of Electrical Engineering and Computer Sciences, Center for Computational Biology, Berkeley AI Research Lab, UC Berkeley; Chan Zuckerberg Biohub

Data-driven design is making headway into a number of application areas, including protein, small-molecule, and materials engineering. The design goal is to construct an object with desired properties, such as a protein that binds to a target more tightly than previously observed. To that end, costly experimental measurements are being replaced with calls to a high-capacity regression model trained on labeled data, which can be leveraged in an in silico search for promising design candidates. The aim, then, is to discover designs that are better than the best design in the observed data. This goal puts machine learning-based design in a much more difficult spot than traditional applications of predictive modelling, since successful design requires, by definition, some degree of extrapolation: pushing the predictive model to its unknown limits, in parts of the design space that are a priori unknown. In this talk, I will anchor this overall problem in protein engineering, and discuss our emerging computational approaches to tackle it.