Determinants of base editing outcomes from target library analysis & machine learning

Max Shen
Liu Group, Regev Lab, Broad Institute; Dept. of Computational and Systems Biology, Massachusetts Institute of Technology

Base editing is a powerful and popular genome editing approach that enables targeted conversion of single genomic nucleotides with high efficiency and purity, and it has broad scientific and therapeutic applications. However, base editing strategies have often been designed by brute force or heuristics rather than by quantitative, principled, or data-driven approaches. Design challenges include navigating a landscape of dozens of base editors with distinct activities, a lack of comparative studies beyond a handful of targets, and designing around bystander editing of nearby nucleotides, which can frustrate applications requiring single-nucleotide precision. In this talk, we discuss our work systematically characterizing 11 base editors at 40,000 target sequences, which provides a fine-grained view of base editing activity. We design and train BE-Hive, a suite of machine learning models that accurately predict editing efficiency and editing patterns, using a deep conditional autoregressive model to solve a sequence-to-sequence problem. We uncover sequence determinants of rare transversion base editing outcomes, and discuss how we use our trained models to achieve highly efficient and pure correction of pathogenic transversion SNPs, a class of disease-related alleles not previously thought to be correctable with high efficiency or purity by canonical base editors.
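The phrase "conditional autoregressive model for a sequence-to-sequence problem" can be made concrete with the chain rule: the probability of an edited outcome sequence y given a target sequence x factors as P(y | x) = prod_i P(y_i | x, y_<i). This lets bystander edits at one position influence the predicted outcome at later positions. The Python sketch below illustrates only that factorization. It is an assumption-laden toy and not BE-Hive: the function names (`toy_conditional`, `outcome_probability`, `outcome_distribution`) are hypothetical, and the per-position probabilities are hand-set numbers, not a trained deep network or real editing data.

```python
from math import isclose, prod


# Hypothetical per-position conditional, NOT the BE-Hive model: hand-set numbers
# showing that P(y_i | x, y_<i) may depend on outcomes at earlier positions.
def toy_conditional(x, prefix, i):
    """Return P(y_i = b | x, y_<i) as a dict over bases b."""
    if x[i] != "C":
        return {x[i]: 1.0}  # toy cytosine editor: non-C positions never change
    # Assumption: an edit at any earlier position makes an edit here more likely.
    earlier_edit = any(p != q for p, q in zip(prefix, x))
    p_t = 0.6 if earlier_edit else 0.4  # C->T transition (toy value)
    p_g = 0.05                          # rare C->G transversion (toy value)
    return {"T": p_t, "G": p_g, "C": 1.0 - p_t - p_g}


def outcome_probability(x, y):
    """Chain rule: P(y | x) = prod_i P(y_i | x, y_<i)."""
    return prod(toy_conditional(x, y[:i], i).get(y[i], 0.0) for i in range(len(x)))


def outcome_distribution(x):
    """Expand outcomes left to right, conditioning each step on the prefix so far."""
    dist = {"": 1.0}
    for i in range(len(x)):
        extended = {}
        for prefix, p in dist.items():
            for base, q in toy_conditional(x, prefix, i).items():
                if q > 0:
                    extended[prefix + base] = p * q
        dist = extended
    return dist


if __name__ == "__main__":
    dist = outcome_distribution("ACCA")
    for outcome, p in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(outcome, round(p, 4))
    print("total:", round(sum(dist.values()), 6))
```

With the toy numbers, the double edit "ATTA" gets probability 0.4 × 0.6 = 0.24, higher than the 0.4 × 0.4 it would get if positions were independent. This is the kind of correlated bystander pattern an autoregressive decoder can represent and a per-position model cannot.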