Microsoft Research

Engineered proteins play increasingly essential roles in applications spanning pharmaceuticals, molecular tools, synthetic biology, and more. Deep generative models can accelerate protein engineering for therapeutic and biological applications. Recently, a family of generative models called diffusion models has demonstrated the potential for unprecedented capability and control in de novo design. In this talk, we introduce biologically grounded diffusion models for generating protein structures and sequences.

We first present a new diffusion-based generative model that designs protein structures by mirroring the biophysics of the native protein folding process. To expand beyond the subset of protein biology captured in structural data, we reasoned that sequence – not structure – could serve as a universal design space for protein generation. We thus developed a general-purpose diffusion framework, EvoDiff, that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models for controllable protein design in sequence space alone. We envision that these modeling frameworks will enable new capabilities in protein engineering towards programmable, functional design.
