New machine learning tool predicts base editing outcomes

BE-Hive can tell which base editor performs best to repair thousands of disease-causing mutations

Credit: Susanna M. Hamilton, Broad Communications

Researchers at the Broad Institute of MIT and Harvard and Harvard University have invented a tool to identify which base editors are most likely to achieve desired genetic edits. By analyzing experimental data on the single-letter changes made by 11 of the most popular base editors (BEs) at more than 38,000 target sites in human and mouse cells, the team created a machine learning model that accurately predicts which edits each of the base editors will make. 

The new tool, called BE-Hive, is available for public use as a web app and described this month in Cell.

Base editors are molecular machines that can switch out one base, or letter, for another in the genome. They are a promising way to treat genetic diseases in humans, and researchers have developed many new and improved versions of base editors. But the base editor boom comes with a challenge: Scientists can sink huge amounts of time into searching for the best editor to solve genetic malfunctions like those that cause sickle cell anemia or progeria.

“New base editors come out seemingly every week,” said senior author and Broad core institute member David Liu, Richard Merkin Professor and director of the Merkin Institute of Transformative Technologies in Healthcare at the Broad Institute, a professor of chemistry and chemical biology at Harvard University, and a Howard Hughes Medical Institute investigator. “The progress is terrific, but it leaves researchers with a bewildering array of choices for what base editor to use.”

“If you set out to use base editing to correct a single disease-causing mutation,” said Mandana Arbab, a postdoctoral fellow in the Liu lab and co-first author on the study, “you’re left with a mountain of possible ways to do it and it is difficult to know which ones are most likely to work.”

Now, researchers can use the interactive BE-Hive web app to enter a target DNA sequence and see the predicted edits that the 11 different base editors will create on that target. “BE-Hive predicts, down to the individual DNA sequence level, what will be the distribution of products that results from each of those base editors acting on that target site,” said Liu.

Teasing apart editing activity

Base editors offer some advantages compared to other forms of gene editing, but they can still cause unwanted, often unpredictable, edits outside the intended genetic target. Each editor also has its own eccentricities. Different types operate within smaller or larger editing “windows,” stretches of DNA about two to five letters wide. Some editors might overshoot or undershoot their targets; others might change just one of two As in a given window.

“If the sequence within the window is GACA,” Liu said, “and you’re using an adenine base editor to change one of those As, will one be preferentially edited over the other?”

The answer depends on the base editor, its paired guide RNA — the chaperone that ferries the editor to the appropriate DNA work site — and the surrounding DNA sequence. To corral all these complicating factors, the team first collected a massive amount of data. Over about a year, Arbab said, she and her colleagues equipped cells with over 38,000 DNA target sites and then treated them with the 11 most popular base editors, paired with guide RNAs. After the treatment, they sequenced the DNA of the cells to collect billions of data points on how each base editor impacted each cell.

To analyze this bounty, co-first author Max Shen, a PhD student in the Massachusetts Institute of Technology’s Computational and Systems Biology program, designed and trained a machine learning model to predict each base editor’s particular eccentricities. This enabled the model to tell which edits each base editor would make at a particular target site. (In a previous study, Shen and his lab mates trained a different machine learning model to analyze data from another common gene editing tool, CRISPR-Cas9, and dispelled a popular misconception that the tool yields unpredictable and generally useless insertions and deletions, Shen said. Instead, they showed that even if humans can’t predict where those insertions and deletions occur, machine learning could.)
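As a rough illustration of what a "distribution of products" means, the sketch below takes the GACA window from Liu's example, enumerates every sequence an adenine base editor (which converts A to G) could leave behind, and assigns each one a probability. The function name, the per-position probabilities, and the assumption that each A is edited independently are all hypothetical and chosen only for illustration; this is not BE-Hive's model, which learns such patterns from experimental data.

```python
from itertools import product


def outcome_distribution(window, edit_prob, target="A", edited="G"):
    """Return {product sequence: probability} for one editing window.

    Hypothetical toy model: each `target` base is converted to `edited`
    independently, with the probability given in `edit_prob`
    (a dict keyed by 0-based position within the window).
    """
    positions = [i for i, base in enumerate(window) if base == target]
    dist = {}
    # Try every combination of "edited" / "not edited" across the target bases.
    for choices in product([False, True], repeat=len(positions)):
        seq = list(window)
        prob = 1.0
        for pos, do_edit in zip(positions, choices):
            q = edit_prob.get(pos, 0.0)
            if do_edit:
                seq[pos] = edited
                prob *= q
            else:
                prob *= 1.0 - q
        dist["".join(seq)] = prob
    return dist


# Made-up numbers: the first A (position 1) is edited more often than the second.
dist = outcome_distribution("GACA", {1: 0.6, 3: 0.2})
for seq, prob in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{seq}: {prob:.2f}")
```

With two As in the window there are four possible products (GACA, GACG, GGCA and GGCG). A predictive tool is useful because the real proportions depend on the editor, the guide RNA and the surrounding sequence, and do not follow a simple rule like the one assumed here.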

New base editing properties

The development of BE-Hive also produced more than a catalog of outcomes; the machine learning model revealed new and surprising properties and capabilities of the analyzed base editors.

“Sometimes,” Liu said, “for reasons that our primate brains aren't sufficiently sophisticated to predict, the model could accurately tell us that even though there are two Cs right in the editing window, this particular editor will only edit the second one, for example.”

BE-Hive learned when base editors can make so-called transversion edits: Instead of changing a C to a T, some base editors changed a C to a G or an A, a rare and abnormal but potentially valuable quirk. The researchers then used BE-Hive to correct 174 disease-causing transversion mutations with minimal byproducts. They also used BE-Hive to discover previously unknown base editor properties, which they used to design novel tools with new capabilities, adding a few more genetic tools to the ever-growing toolbox.

This research was funded in part by the National Institutes of Health, St. Jude Collaborative Research Consortium, the Howard Hughes Medical Institute, the National Human Genome Research Institute, an NWO Rubicon Fellowship, an NSF Graduate Research Fellowship, and a Marion Abbe Fellowship of the Damon Runyon Cancer Research Foundation.

Additional authors on the paper include Beverly Mok, Christopher Wilson, Żaneta Matuszek, and Christopher Cassa.

This story was adapted from Harvard University, Department of Chemistry and Chemical Biology.

Paper(s) cited

Arbab M, Shen MW, et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell. Online June 12, 2020. DOI: 10.1016/j.cell.2020.05.037