Marks Lab, Harvard Medical School

Antibodies are valuable tools for molecular biology and therapeutics because they can detect low concentrations of target antigens with high sensitivity and specificity. The increasing demand for, and success with, rapid and efficient discovery of novel antibodies and nanobodies using phage and yeast display methods have spurred interest in the design of optimal starting libraries. Synthetic libraries often contain a substantial fraction of non-functional proteins because current library construction methods lack higher-order sequence constraints. To overcome these limitations, we can design smart libraries of fit and diverse nanobodies by leveraging the information in sequences from natural repertoires and experimental assays. However, state-of-the-art generative models rely on sequence families and alignments, and alignment-based methods are inherently unsuitable for the statistical description of the variable-length, hypermutated complementarity-determining regions (CDRs) of antibody sequences, which encode the diverse specificities of binding to antigens. We developed a deep generative model, adapted from natural language processing, for prediction and design of diverse functional sequences without the need for alignments. By training on natural nanobody repertoires, we designed and tested a 10^5-nanobody library that shows better expression than a state-of-the-art, 1000-fold larger synthetic library. While natural repertoires contain examples of generally fit sequences, experimental assays can explicitly interrogate individual fitness features such as thermostability, polyreactivity, and affinity; from these measurements we can train statistical models and generate sequences optimized for each trait. With sequence models of both unlabeled natural repertoires and labeled experimental data, we can design a biased nanobody library to improve expression, stability, and capacity to bind target antigens.
