Wengong Jin planned to research language processing for his computer science PhD. But when Jin learned about research on machine-learning for drug discovery at the MIT Computer Science and Artificial Intelligence Laboratory, he told his advisor, Regina Barzilay, that he’d had a change of heart.
“She thought I was jet lagged, because I’d just come over from China and that was a really big switch,” he said.
Jin, now a fellow at the Eric and Wendy Schmidt Center (EWSC), stayed the course. Six years later, he and a team of researchers have come up with a new kind of model that automatically designs antibodies, an advance with huge potential for immunotherapy.
Meanwhile, another EWSC fellow, PhD candidate Adit Radhakrishnan, recently developed a simple yet powerful method for virtually screening new drug candidates. That framework appears in a study published on April 11 in Proceedings of the National Academy of Sciences. Jin and Radhakrishnan are among ten graduate student and post-doctoral fellows at the EWSC — a recently launched center at the Broad Institute of MIT and Harvard.
“A number of research institutes have started using machine-learning to answer key questions in biology. But at the EWSC, as Jin’s and Radhakrishnan’s research shows, our goal is also going in the other direction, by using biomedical problems to drive advances in machine-learning,” said Caroline Uhler, Co-Director of the EWSC, a core member of the Broad Institute, and Associate Professor in the Department of Electrical Engineering and Computer Science and the Institute for Data, Systems and Society at MIT.
Game-changer for antibody design
Discovering drugs has traditionally been a labor-intensive process, with researchers toiling away for years to test millions of molecules only to come up with a handful of candidates. Now, researchers like Jin and Radhakrishnan are working on automating that process.
“The idea is that we don't need experts to get a cup of coffee and then work all night trying to figure out a new molecule, but rather, to let the machine do the heavy lifting,” Jin said.
During his PhD, Jin was part of a research team that developed a new machine-learning algorithm to speed up antibiotic discovery. The researchers found a new antibiotic that was effective against bacteria that are resistant to multiple drugs. In this instance, the team provided the model with roughly a million possible compounds to sort through.
That left Jin and other researchers wondering: Could they use artificial intelligence to design molecules from scratch?
The answer was yes. Jin and other researchers developed a generative model that designed antibodies capable of neutralizing the SARS-CoV-2 virus. Antibodies are Y-shaped proteins that bind to viruses, bacteria, and other pathogens, activating the body’s immune response. The team’s findings were published earlier this year in a paper at the International Conference on Learning Representations.
It used to take researchers hours to manually design just one antibody that might not even work. “The new model can propose in a couple of seconds an antibody that has a high likelihood of working — totally changing the game,” said Jin.
While researchers had worked on generative models for antibody discovery before, those models could only come up with a protein’s amino acid sequence, not its shape. In contrast, the new model, which represents the antibody as a graph, simultaneously designs both the sequence and structure of its binding region. “Whether or not the antibody is the right shape to bind to a virus or other pathogen is crucial to its success,” said Jin.
“While human experts have methods to generate neutralizing antibodies, it takes time and effort. The task becomes even more challenging when additional properties need to be enforced. As our understanding of disease biology and immune system deepens, the number of such desired characteristics will continue to grow. Computational methods for antibody design are particularly useful to address this challenge,” said Regina Barzilay, the AI faculty lead for the MIT Jameel Clinic for Machine Learning in Health.
And, because so many types of data are structured as networks, the model also represents an advance in the field of machine learning. “It’s an example of how biology proposed a new problem for machine learning to solve,” said Jin.
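To make the idea of representing a protein as a graph more concrete, here is a minimal sketch in Python. It is not the team’s model: the function name `residue_graph`, the five-letter sequence, the coordinates, and the choice of connecting each residue to its nearest neighbors are all assumptions made for illustration. It only shows how a sequence (the node labels) and a structure (the node positions and edge distances) can live in one data structure.

```python
import numpy as np

def residue_graph(sequence, coords, k=2):
    """Build a toy graph: one node per residue, edges to the k nearest residues.

    Hypothetical helper for illustration only; not taken from the study.
    """
    coords = np.asarray(coords, dtype=float)
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    # Pairwise Euclidean distances between residue positions.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a residue is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    nodes = [{"residue": aa, "xyz": xyz} for aa, xyz in zip(sequence, coords)]
    edges = [
        (i, int(j), float(dists[i, j]))
        for i in range(len(sequence))
        for j in neighbors[i]
    ]
    return nodes, edges

# Made-up sequence and coordinates, spaced 3.8 units apart along a bent chain.
sequence = "ARDYW"
coords = [[0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [7.6, 3.8, 0], [7.6, 7.6, 0]]
nodes, edges = residue_graph(sequence, coords, k=2)

for i, j, d in edges:
    print(f"{sequence[i]}{i} -> {sequence[j]}{j}  distance {d:.1f}")
```

A generative model working on such a graph can propose both the node labels and the node positions, which is the sense in which sequence and structure are designed together.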
An old machine-learning method repurposed for virtual drug screening
Adit Radhakrishnan's father had pursued mathematics in India prior to immigrating to the U.S. He instilled in his son a love of mathematics, which led the younger Radhakrishnan to pursue a PhD of his own in electrical engineering and computer science at MIT.
Radhakrishnan researches the fundamentals of deep-learning, a kind of artificial intelligence that is loosely modeled after the human brain and can process unstructured data. Much of his research as an EWSC fellow rests on understanding why deep-learning is successful and using that knowledge to build novel models for healthcare and genomics.
Over the past few years, deep-learning has become widely adopted in biological applications, and researchers are increasingly turning to it to screen potential new drugs. Performing well on such tasks typically takes extremely large deep-learning models that require significant computing power. Moreover, the complexity of these models makes it hard for scientists to understand why they make a given prediction.
To get around the complexities of deep-learning, Radhakrishnan and other researchers, including Uhler and Mikhail Belkin, a professor at the Halıcıoğlu Data Science Institute at the University of California, San Diego, turned to an older class of machine learning models: kernel methods. Prior to the recent wave of deep-learning, kernel methods were a prominent and computationally simple approach for machine learning tasks. These models have recently become popular again since they can serve as a proxy for using very large deep-learning models without the computational burden.
The team came up with a simple yet highly adaptable kernel framework that was able to predict the effect that a drug has on gene expression, which is a measure of how cells change in response to a drug. “In contrast to the expertise needed to train large deep-learning models to solve a particular problem, it takes about three lines of code to train the kernel method to do this,” said Radhakrishnan.
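For readers curious what training a kernel method in a few lines can look like, here is a minimal sketch of generic kernel ridge regression in Python with NumPy. It is not the team’s framework: the Gaussian kernel, the bandwidth, the ridge value, and the synthetic “drug features” and “expression responses” are all assumptions chosen for illustration. The three lines marked at the end are the whole training and prediction step.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.5):
    """Gaussian (RBF) kernel between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

# Synthetic stand-ins (assumptions): 3 features per "drug", 2 "gene" responses.
rng = np.random.default_rng(0)
W = np.array([[0.8, -0.5], [0.3, 0.9], [-0.6, 0.4]])
X_train = rng.normal(size=(300, 3))
Y_train = np.tanh(X_train @ W)
X_test = rng.normal(size=(50, 3))
Y_test = np.tanh(X_test @ W)

# Training and prediction in three lines.
K = gaussian_kernel(X_train, X_train)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(K)), Y_train)
Y_pred = gaussian_kernel(X_test, X_train) @ alpha

test_mse = float(np.mean((Y_pred - Y_test) ** 2))
print("test mean squared error:", test_mse)
```

Training amounts to solving one linear system, so there is no iterative optimization to tune, and each prediction is a weighted sum over training examples, which makes it easier to trace why the model produced a given output.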
The framework has uses beyond biology; the researchers demonstrated, for example, that it could be used by video streaming providers to predict how a viewer would rank a particular movie they hadn’t yet seen. And the framework allows researchers to gain insights into how more complex deep-learning models function.
According to Radhakrishnan, who is not trained as a biologist, the best part of being a fellow at the EWSC is that the center puts machine learning experts and biologists in constant conversation with each other.
“You don’t just have computational researchers running their methods on a biology data set without a biologist in the mix. You can get continuous feedback on: Is this actually useful?” said Radhakrishnan. “So it gives you a much more guided focus on what biological problems are important and what computational methods are missing.”
The Eric and Wendy Schmidt Center brings together a global network of scientists from academia and industry to promote interdisciplinary research between the data and life sciences to transform biology and ultimately improve human health.