How one lab uses machine learning to solve a key gene therapy problem

A bioengineer and a computer scientist team up to find better adeno-associated viruses for gene delivery.

When Ben Deverman joined the Broad Institute of MIT and Harvard in 2018, he was tackling a longstanding challenge in his research. Deverman had spent years at Caltech building a technology that could quickly screen large numbers of inactivated adeno-associated viruses (AAVs), viral vectors that don’t cause disease but are engineered to deliver potentially life-changing gene therapies to specific cells in the body.

Deverman’s technology could find AAVs that can cross the blood-brain barrier and deliver their therapeutic cargo to the brain — a potential way to treat certain neurological disorders. But it had only identified AAVs that work in mice. To find AAVs that can deliver gene therapies in humans and to specific organs in the body such as the brain, Deverman knew he needed a new approach. He wanted to harness machine learning — computational tools that can be “trained” to analyze and detect important signals in data — to find AAVs that could one day lead to more and better gene therapies.

It wasn’t until Deverman met and joined forces with Broad researcher and machine learning expert Fatma Elzahraa Eid that the Deverman lab was finally able to work toward this elusive goal. 

Eid was well suited for the job. Before coming to Broad as a postdoc, she had spent her PhD building machine learning models to study viruses and proteins. 

“She just really had the right background to approach this problem in the way it needed to be approached,” said Deverman, senior director of vector engineering at the Stanley Center for Psychiatric Research at Broad.

That approach was to start from scratch. Eid, Deverman, and their colleagues redesigned key experiments to generate data in an unbiased way, producing data that machine learning algorithms could more easily analyze. Now their four-year collaboration is paying off.

In a recent preprint, Eid, Deverman, and their co-authors describe how their new machine learning approach, called Fit4Function, was about 90 percent successful at finding AAVs that have multiple desirable traits, such as the ability to target a specific cell type while avoiding others and to work in more than one species. With a much higher success rate than many traditional methods, the technology could potentially accelerate the development of new gene therapies for more diseases and with fewer side effects. 

The study also demonstrates the benefits of biologists and computer scientists coming together at the beginning of a project and rethinking experiments to make the best use of machine learning models. 

“This is like opening a goldmine,” Eid said. “It’s really, really exciting.”

Viruses for good

Gene therapy and CRISPR-based gene-editing medicines hold tremendous promise to treat, even cure, genetic diseases, by replacing disease-causing genes with functional ones or by correcting pathogenic mutations. A few gene therapies have already received regulatory approval in the United States and Europe. But the field is still dogged by a major challenge: how to effectively and safely deliver therapeutic genes or gene-editing machinery to specific cells and organs in the body. 

AAVs are a commonly used gene-delivery vehicle. Because they are derived from viruses, they are very good at entering cells and delivering their cargo. Engineers strip out the viruses’ infective contents and replace those contents with payloads such as CRISPR, therapeutic genes, or cargos designed to turn off production of disease-causing proteins.

A portrait of Ben Deverman at the Stanley Center for Psychiatric Research

Ben Deverman and his lab are engineering better viral vectors for the next generation of gene therapies. Credit: Juliana Sohn

However, AAVs don’t reach certain tissues and organs very efficiently and can cause immune responses. So researchers have used large doses of AAVs to get high enough levels of their cargo to the intended cells, which has led to serious adverse effects in gene therapy clinical trials. 

To better target AAVs to specific tissues or organs, researchers can modify them by designing custom capsids, the outer protein shells of the virus. Conventional AAV engineering approaches are laborious and slow, with low success rates. Scientists have traditionally screened large libraries of randomly generated AAV variants in mice to identify promising candidates, but the AAVs often don’t work in other species. Even the technology developed by Deverman in 2013, called Cre Recombinase-based Targeted Evolution (CREATE), which can screen millions of AAVs at a time in just weeks, is still most effective at identifying AAVs that work in mice.

Moreover, these methods typically optimize AAVs for one desirable function at a time, such as targeting the mouse brain, making it inefficient to engineer an AAV with multiple traits.

When Deverman spoke about these challenges during a talk at Broad in the spring of 2018, Eid was in the audience. She was already brainstorming ways to apply machine learning to tough protein engineering problems. 

Finding Fatma

Eid was born and raised in Egypt, where she studied computer science and systems engineering at Al-Azhar University in Cairo. Halfway through her studies, she started delving into advancements in machine learning. Eid said she and her classmates were “doing machine learning by hand.” With limited machine learning software and libraries, they wrote algorithms from scratch and coded their models with pen and paper.

Eid moved to the United States in 2012 to pursue a PhD at Virginia Tech, where she built machine learning programs to study proteins and viruses. She began exploring how machine learning algorithms, as powerful as they are, can still fail to find new patterns or trends in biomedical data because of inherent biases in the data. 

This was running through Eid’s mind as she sat in the audience during Deverman’s talk at Broad. Eid decided to attend the event only after her Broad colleagues, who knew she had an interest in viruses, convinced her to go. She was finishing up a one-year postdoc at Broad at the time and was looking for important problems to solve at the intersection of proteins and viruses.

Eid was fascinated as she listened to Deverman. She wondered whether he had considered using machine learning, but as she and other scientists went up to talk to him after the presentation, she grew shy.

“I just stood with the crowd, just watching and hoping that someone would ask him if he was interested in using machine learning,” Eid said.

A friend of Eid’s asked the question and introduced her to Deverman as an expert. Eid and Deverman met again the following day for two hours, and soon they were spending every Friday afternoon building the foundation for a new AAV screening technology.

They came up with a plan: create a machine learning pipeline that could automatically sort through millions of capsid sequences on the computer and narrow down the best AAVs that would work in humans and target desired organs, including the brain. The pipeline should still be able to identify AAVs that work in mice, since gene therapies are usually tested in mice before humans. The duo envisioned a system that could be instructed to find AAVs with specific attributes.

“We’re trying to use machine learning to find viruses that have multiple enhanced functions,” Deverman explained. “We’d like them to work in multiple species and work in human models, and we want to be able to manufacture them.”

Once they had their blueprint, the duo, led by Eid, presented their idea to a panel of Broad scientists and core faculty in 2019 as part of a competition that funds collaborative projects within the Broad. Their idea won, and with the $200,000 grant, they started putting together a team of wet lab biologists and computer scientists to build the technology.

Fitting the functions

Eid has spent much of her career thinking about why some machine learning models used in biology haven’t lived up to expectations. She explained to Deverman during their early discussions that many are trained on biomedical data, such as cell or MRI images, that were collected for humans to easily read and understand. This introduces biases into the data that can trip up machine learning models, causing them to focus on the wrong signals and leading researchers astray in their work. 

Deverman and Eid decided that rather than asking machine learning algorithms to interpret existing biased datasets, they should instead design new experiments that generate data specifically for machine learning algorithms to analyze. This meant that Deverman’s group would need to create new “machine-learning friendly” libraries of AAV capsid sequences. 

Ben Deverman examines a sample in his lab with two other lab members working alongside him.

Deverman (middle) with members of his lab Nuria Romero (front) and Ken Chan (back). Credit: Juliana Sohn

Ken Chan and Isabelle Tobey from the Deverman group took on this monumental task, which the team described in their recent preprint. To generate enough data, they made multiple copies of these “Fit4Function” libraries in the lab, with help from Simon Pacouret, who scaled up AAV production. They populated the libraries only with sequences that are known to properly form capsids. Then they screened the capsid sequences for a variety of desirable functions, producing the reproducible data they needed to train and build their machine learning models.
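For readers who think in code, the library-building step might be pictured roughly as follows. This is only an illustrative sketch, not the team's actual method: the function `build_library`, the `production_score` values, the cutoff, and the choice of uniform random sampling are all assumptions made for the example.

```python
import random

def build_library(candidates, production_score, min_score, size, seed=0):
    """Pick `size` sequences at random from those that pass a production filter.

    All names and the uniform-sampling choice are hypothetical; the article
    only says the libraries contain sequences known to form capsids.
    """
    # Keep only sequences whose (hypothetical) production score passes the cutoff.
    viable = sorted(s for s in candidates if production_score[s] >= min_score)
    if size > len(viable):
        raise ValueError("not enough viable sequences for the requested size")
    # Sample without replacement so no region of sequence space is favored.
    rng = random.Random(seed)
    return rng.sample(viable, size)

# Toy data: made-up 7-residue inserts with made-up production scores.
scores = {"AAAAAAA": 0.9, "CCCCCCC": 0.1, "DDDDDDD": 0.7, "EEEEEEE": 0.8}
library = build_library(scores.keys(), scores, min_score=0.5, size=2)
```

The point of the sketch is the ordering: filter for capsids that can be made at all, then sample without favoring any function, so that later screens start from an unbiased pool.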

“This massive effort was unlike anything we had done before and was necessary to learn how to make this approach successful,” Deverman said.

The team trained each of their machine learning models to predict capsid sequences for AAVs that could perform a particular function, such as targeting the liver or brain. They then combined six models into one that could simultaneously predict AAVs capable of multiple functions relevant to liver gene therapy. With tireless help from Albert Chen and Alina Chan in the Deverman lab, the group prepared their data and findings for publication.
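One way to picture the combining step, as a rough sketch only: treat each trained model as a function that scores a capsid sequence for one trait, and keep the sequences that clear every threshold at once. The scoring functions below are toy stand-ins rather than trained models, and the trait names, thresholds, and sequences are invented for illustration.

```python
from typing import Callable, Dict, Iterable, List

Model = Callable[[str], float]  # maps a capsid sequence to a predicted score

def multi_function_filter(
    sequences: Iterable[str],
    models: Dict[str, Model],
    thresholds: Dict[str, float],
) -> List[str]:
    """Keep sequences that every per-function model scores at or above its threshold."""
    return [
        seq
        for seq in sequences
        if all(models[name](seq) >= thresholds[name] for name in models)
    ]

def residue_fraction(residues: str) -> Model:
    """Toy stand-in for a trained model: fraction of positions in `residues`."""
    def model(seq: str) -> float:
        return sum(ch in residues for ch in seq) / len(seq)
    return model

# Hypothetical traits and cutoffs; real models would be trained on screening data.
models = {
    "liver_binding": residue_fraction("KR"),
    "production": residue_fraction("AGS"),
}
thresholds = {"liver_binding": 0.25, "production": 0.25}

candidates = ["KRAGSTT", "TTTTTTT", "KKKKKKK", "AGSAGSK"]
hits = multi_function_filter(candidates, models, thresholds)
print(hits)  # ['KRAGSTT']
```

Here only the first candidate passes both toy filters. A sequence that excels at one trait but fails another, like `KKKKKKK`, is dropped, which is the behavior a multi-trait search needs.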

Eid and Deverman were optimistic that their approach would be able to find desirable AAVs, but Eid kept her expectations in check. 

“Machine learning in biology is very hard, and there are a lot of confounding factors,” she said. “We would have been very happy if we got a 10 percent success rate.”

So it came as a huge surprise when the team found that their models were about 90 percent successful at predicting AAV variants that simultaneously performed multiple desired functions. Eid thought there had to be a mistake: a bug in the code or a miscalculated variable somewhere.

It was not until they repeated the calculations multiple times that they realized what they had achieved. Their models enabled them to find AAVs that can deliver cargo to the mouse liver and to human liver cells more efficiently than the natural, unmodified AAV, and this finding has translated to additional species as well. 

Deverman said this work is just the beginning. His lab has built a variety of Fit4Function AAV capsid libraries that they’re using for new applications. 

“We expect this same approach could be used for machine learning-guided engineering of any protein that can be screened in a high throughput manner like nanobodies,” Deverman said. “We will be excited to see others use Fit4Function for their protein engineering objectives.” 

Collaboration from the start

Deverman’s group is continuing to test the most promising AAVs they discovered with their new technology, in hopes of further developing them as vectors to treat a variety of genetic diseases. In addition to the Fit4Function project, the team is using other approaches to develop AAVs that can reach the brain, including engineering the viruses to bind to specific proteins in the blood-brain barrier.

A closeup photo shows a technician's hands at the lab bench preparing tissue samples for microscopic analysis.

Nuria Romero, a member of the Deverman lab, is preparing AAV-treated mouse brain sections to examine under the microscope. Credit: Juliana Sohn

The biotech industry is taking notice. In the spring of 2022, the Broad Institute and Deerfield Management Company launched a gene therapy startup named Apertura Gene Therapy, based in part on the technology from the Deverman lab, including the Fit4Function approach. 

The success of the project also shows the power of close collaboration between computer scientists and biologists. “We work as a family,” Eid said. “There’s a very high sense of respect in the lab. The Fit4Function project was a huge team effort. Almost everyone in the lab contributed over the course of the four years of work.” 

Deverman credits Eid as a cornerstone for the lab’s success — without her, he wouldn’t have been able to properly guide the computational team, he said.

“It’s been challenging for me to be a non-computational scientist and trying to build such a computational group, because I can’t necessarily judge the details of their work,” Deverman said. “It’s been great having Fatma in the lab because she has a very good sense for that.”

Eid, meanwhile, said that Deverman’s mentorship and humility helped propel innovation in the lab. 

“He encourages us to ask questions, contribute our ideas, and be open about our mistakes,” Eid said. “The culture that Dr. Deverman has created and maintains in the lab has led to a high level of growth and productivity.”


This research was funded in part by a Broad Shark Tank Award, the Stanley Center for Psychiatric Research, the National Institute of Neurological Disorders and Stroke, the NIH Common Fund through the Somatic Cell Genome Editing program, and Apertura Gene Therapy.