A marriage of microscopy and machine learning
Anne Carpenter and her lab are helping to transform cell imaging into a data science.
With a dish of cells as a canvas, Anne Carpenter’s collaborators apply layers of color. Each one highlights a different cellular feature: A fluorescent blue dye to stain the nuclei. Orange to label the cell membranes. Red tints on the mitochondria.
This approach, called “Cell Painting,” uses six biological dyes to stain eight major cell structures. Together, they create not just beautiful images, but also a detailed portrait of the cells’ size, shape or morphology, and—if you can read the signs—physiological state. Seeing and interpreting those clues about cellular function is nearly impossible for even the best observer. But where human eyes may struggle to see subtle differences in hundreds of features across thousands of cells, computers can excel—not least because machines are unburdened by hopes, expectations, or assumptions about what the data might, or might not, reveal.
Human cells imaged using Cell Painting are treated with fluorescent dyes that mark different cellular components including the nucleus, nucleoli, cytoplasmic RNA, Golgi, endoplasmic reticulum, mitochondria, actin cytoskeleton, and plasma membrane. Credit: Carpenter lab
Researchers in the Carpenter lab and Stuart Schreiber's group, both at the Broad Institute of MIT and Harvard, started developing Cell Painting in 2009. And so began Carpenter’s journey towards using computer vision and machine learning to turn microscope images into rich data sources. “Even though we use only six stains, the response of the cell as reflected in those six stains contains a tremendous amount of information,” said Carpenter, an institute scientist and senior director of the Imaging Platform at the Broad.
The efforts of Carpenter and her lab to make Cell Painting openly accessible and easy to use for biologists without a background in data science are bringing large-scale image-based cell profiling to a growing number of researchers in a range of fields spanning cancer biology, toxicology, and drug discovery. Cell Painting has helped researchers find drug candidates faster, learn about the effects of chemicals in the environment, and gain insight into how cells function. “Images as a readout are extremely flexible in the kinds of things that we are able to detect,” Carpenter said. “There aren't many aspects of biology that can't be captured using imagery.”
“I've always been enamored by the morphology of cells. It's what got me interested in biology in the first place,” Carpenter said. As a postdoctoral fellow during the mid-2000s in David Sabatini’s lab at the Whitehead Institute for Biomedical Research, she realized she needed software that could reliably quantify cell features from microscopy images. Nothing suitable existed, she said, so she started writing programs herself and quickly discovered that she loved both software engineering and distilling quantitative data from qualitative images. While in Sabatini’s lab, she began developing the open-source software programs CellProfiler and CellProfiler Analyst, which have since been cited in more than 8,000 publications.
“Once I got into the computational world, I realized, wow, we could measure all kinds of things from cells that would report on the state of the cell and how the cell is responding to its environment,” she said. Carpenter wanted to turn these measurements into robust datasets that could be systematically collected and analyzed.
When she joined the Broad Institute in 2007, she deliberately created a space focused on melding biological imaging and advanced data science techniques. “I saw a real need for a lab that was focused fully on getting the most data possible out of those images,” she said.
Carpenter has worked with Shantanu Singh (left), senior group leader with the Broad Institute’s Imaging Platform, and many others to develop the computational methods needed to analyze Cell Painting images. Credit: Bearwalk Cinema
Cell Painting is based on the idea that environmental or biological conditions, such as disease, a genetic tweak, or exposure to a drug or toxin, lead to changes in a cell’s morphology that can be captured by a microscope. Researchers grow batches of cells in multi-well plates, treat each well with a drug or other specific perturbation, then apply the six stains and image the cells with a fluorescence microscope. Automated image collection and analysis can extract and measure more than 1,500 morphological features (for example, organelle size and shape, texture, and staining intensity) to create a profile for each cell that captures the effect of the perturbation.
Comparing the profiles of cells that have received different treatments can highlight similarities or differences in how each drug, disease, or other change affects the cells.
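To make the idea of comparing profiles concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the Carpenter lab's pipeline: the per-cell measurements are synthetic random numbers, and the choices of a feature-wise median, z-scoring against control cells, and Pearson correlation as the similarity measure are assumptions made for the example. The names `drug_1`, `drug_2`, and `drug_3` are hypothetical.

```python
import math
import random
import statistics


def median_profile(cells):
    """Collapse per-cell feature vectors into one profile (feature-wise median)."""
    return [statistics.median(column) for column in zip(*cells)]


def normalize(profile, control_cells):
    """Z-score each feature of a profile against the untreated control cells."""
    normalized = []
    for value, column in zip(profile, zip(*control_cells)):
        mean = statistics.mean(column)
        spread = statistics.stdev(column) or 1.0  # guard against zero spread
        normalized.append((value - mean) / spread)
    return normalized


def pearson(a, b):
    """Pearson correlation between two profiles of equal length."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def simulate_cells(feature_means, rng, n_cells=200):
    """Synthetic stand-in for measured cells: Gaussian noise around each feature mean."""
    return [[rng.gauss(m, 1.0) for m in feature_means] for _ in range(n_cells)]


rng = random.Random(0)
n_features = 20  # real Cell Painting profiles have more than 1,500 features

# Two hypothetical treatment effects; drug_1 and drug_2 share the same one.
effect_a = [rng.uniform(-2, 2) for _ in range(n_features)]
effect_b = [rng.uniform(-2, 2) for _ in range(n_features)]

control = simulate_cells([0.0] * n_features, rng)
wells = {
    "drug_1": simulate_cells(effect_a, rng),
    "drug_2": simulate_cells(effect_a, rng),
    "drug_3": simulate_cells(effect_b, rng),
}

profiles = {
    name: normalize(median_profile(cells), control) for name, cells in wells.items()
}

sim_same = pearson(profiles["drug_1"], profiles["drug_2"])
sim_diff = pearson(profiles["drug_1"], profiles["drug_3"])
print(f"drug_1 vs drug_2 (same effect):      {sim_same:.2f}")
print(f"drug_1 vs drug_3 (different effect): {sim_diff:.2f}")
```

In this toy setup, the two treatments that share an underlying effect produce highly correlated profiles even though no single simulated cell looks identical to another, which is the kind of consistent signal that is hard to pick out by eye.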
In theory, researchers could make these comparisons by eye. But given the high level of inherent variability between cells, a person comparing two cells—never mind hundreds or thousands—would have trouble identifying tiny but consistent differences.
Instead, Carpenter’s team enlists machine-learning algorithms to evaluate the profiles and cluster them based on similar patterns of changes that the algorithms detect in the features. This clustering can help distinguish healthy from diseased cells, identify drugs that restore diseased cells to a healthy state, or group genetic changes that have similar effects.
The researchers are using “unsupervised” machine learning, which means that they don’t teach the computer algorithms ahead of time what to look for, but simply ask the computer to cluster the samples based on which look most similar. This approach allows the strongest biological signals to emerge without a researcher’s bias toward their favorite gene or molecular pathway, Carpenter said.
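The following sketch shows what "unsupervised" means in practice. The article does not say which clustering algorithm the team uses, so this example swaps in a simple average-linkage agglomerative clustering with correlation distance, written in plain Python. The profiles are synthetic, the mechanism labels `m1` and `m2` are hypothetical, and fixing the number of clusters in advance is a simplification for the example.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two profiles of equal length."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def average_linkage(profiles, group_1, group_2):
    """Mean correlation distance (1 - r) over all cross-group pairs."""
    distances = [
        1.0 - pearson(profiles[a], profiles[b]) for a in group_1 for b in group_2
    ]
    return sum(distances) / len(distances)


def cluster_profiles(profiles, n_clusters):
    """Repeatedly merge the two closest groups until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(
            pairs,
            key=lambda p: average_linkage(profiles, clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


rng = random.Random(1)
n_features = 30

# Two hidden, hypothetical mechanisms. The names below carry the mechanism
# only so the result can be checked; the algorithm never reads the names.
mechanisms = {
    "m1": [rng.gauss(0, 1) for _ in range(n_features)],
    "m2": [rng.gauss(0, 1) for _ in range(n_features)],
}
profiles = {}
for mechanism, effect in mechanisms.items():
    for k in range(3):
        profiles[f"{mechanism}_compound_{k}"] = [
            value + rng.gauss(0, 0.3) for value in effect
        ]

clusters = cluster_profiles(profiles, n_clusters=2)
for members in clusters:
    print(sorted(members))
```

The algorithm is never told which compounds belong together. It only sees the numbers, and in this toy example the grouping by hidden mechanism emerges from the similarity of the profiles alone.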
IMAGING FOR ALL
Since developing the method, which was published in 2013, Carpenter and her team have worked to make image-based profiling as accessible as possible, especially for researchers who may not have experience with data science or machine learning. The software for Cell Painting is open-source and the assay deliberately uses common, inexpensive biological dyes that can be visualized with a conventional fluorescence microscope.
A follow-up paper in 2016 caught the eye of Joshua Harrill, a researcher with the U.S. Environmental Protection Agency’s Center for Computational Toxicology and Exposure in Durham, N.C. With a background in high-content imaging of neurons, he was immediately drawn to the technique.
Harrill and Johanna Nyffeler, an Oak Ridge Institute for Science and Education-supported postdoctoral fellow in his research group, are now using Cell Painting for toxicology screens to learn how chemicals in the environment affect different human cell types. Nyffeler said Cell Painting is ideal for her work because she doesn’t always know what those effects will be. “A lot of these chemicals don't necessarily have a specific target in human cells, unlike drug compounds,” Nyffeler said. She hopes that profiling will give a better picture of what types of biological effects she and her collaborators may need to look for and how those effects vary with concentration levels.
Human U2OS cells expressing the mutant form of a cancer-related protein are stained with the Cell Painting dyes. Each of the nine sections of the image shows a different combination of fluorescent markers (Rohban et al, eLife 2017 DOI: 10.7554/eLife.24060). Credit: Broad Institute’s Cancer Program and Center for the Development of Therapeutics, and the Carpenter lab
Drug developers are also attracted to Cell Painting. Carpenter and the Broad are partnering with a consortium of pharmaceutical companies to create a publicly available dataset containing profiles of cells, each treated with one of around 100,000 chemical compounds. Researchers will use machine learning algorithms to compare the profiles and learn what molecular targets potential drugs might be hitting. “You might see a cluster of 10 drugs grouping together,” Carpenter said. “It may be that those drugs all have the same molecular target and so that tells us something about how those drugs are working.” Such comparisons might help identify new drug targets, explore gene function, or elucidate molecular pathways underlying disease.
One biotech company, Utah-based Recursion Pharmaceuticals, was an early adopter of Cell Painting to accelerate drug discovery and development. After just six years of using the technique, the company already has two drug candidates in early-stage clinical trials. (Carpenter is a scientific advisor for Recursion.)
The company used the method to screen its library of compounds and find ones that restored a healthy appearance in cells carrying the disease mutation of interest. Image-based profiling allows the company to work on more diseases and screen molecules more quickly and at lower cost than it could with traditional screening methods, said Ron Alfa, Recursion’s senior vice president of translational discovery.
Alfa credits Cell Painting’s unbiased approach with helping his team look for treatments even for conditions where the biology is not yet well understood.
“Humans can look at how the cells are going to change, but we don't really know what to pay attention to,” Alfa said. “Instead we can let the computer algorithms tell us what to pay attention to. And similarly we let them tell us what things are being reversed when we add the right drugs.”
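One simple way to picture a screen for compounds that restore a healthy appearance is to score how much of the disease-related shift in a profile disappears after treatment. The sketch below is a hypothetical illustration, not Recursion's method: the four-feature profiles are invented numbers, and the score, based on Euclidean distance to the healthy profile, is an assumption chosen for clarity.

```python
import math


def reversal_score(healthy, disease, treated):
    """Fraction of the disease shift that the treatment removes.

    1.0 means the treated profile matches the healthy one, 0.0 means the
    treated cells are as far from healthy as untreated disease cells, and
    negative values mean the treatment moved the cells further away.
    """
    disease_shift = math.dist(disease, healthy)
    remaining_shift = math.dist(treated, healthy)
    return 1.0 - remaining_shift / disease_shift


# Invented, already-normalized profiles with four features each.
healthy = [0.0, 0.0, 0.0, 0.0]
disease = [2.0, -1.0, 1.5, 0.0]

# Hypothetical profiles of disease cells after treatment with each compound.
treated_profiles = {
    "compound_A": [0.2, -0.1, 0.1, 0.0],   # close to healthy
    "compound_B": [1.9, -1.0, 1.6, 0.1],   # essentially unchanged
    "compound_C": [2.5, -2.0, 3.0, 1.0],   # further from healthy
}

scores = {
    name: reversal_score(healthy, disease, profile)
    for name, profile in treated_profiles.items()
}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

Ranking by a score like this lets the analysis flag which compounds reverse the disease signature without anyone specifying in advance which individual features matter.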
Carpenter’s lab is now combining Cell Painting with other techniques, such as approaches that can alter gene expression. “One experiment we would really like to do is to knock out every gene in the genome, one by one, using CRISPR, and similarly overexpress every gene in the genome, and see what the impact is on cell morphology,” Carpenter said.
The goal would be to identify the effect of each gene on the cell’s shape or activity — in essence, creating a morphological atlas of functional genomics that researchers could reference as a baseline in their studies of the effects of mutations, drugs or other stressors on cells. “Our aim over the next few years is to make a publicly available dataset that contains as many genetic and chemical perturbations as we can,” Carpenter said.
Maria Alimova (right), senior research scientist at the Broad, works with her colleagues at the Broad’s Center for the Development of Therapeutics to capture imaging data using the Cell Painting method. Credit: Bearwalk Cinema
Carpenter’s team is also taking Cell Painting’s machine learning component a step further, by analyzing cell images using a type of machine learning called deep learning. A deep learning-based method extracts cellular features directly from the raw pixels of the image rather than relying on researcher-defined features such as size and shape.
An advantage is that the method is not limited by what scientists define as meaningful and so could capture new or little-known features, Carpenter said. But that lack of definition also makes the results challenging to interpret. “Those features are not so readily understood by a biologist, but they have the potential to represent the image in a very objective and hopefully very biologically informative way,” she said. “That's the cutting edge of the field right now.”
According to Carpenter, similar profiling methods could likely be applied to other types of biological images in the near future. Different dyes could be used to measure gene or protein expression levels in cells, or scientists could adapt Cell Painting to image larger sections of tissue or whole organisms. Even 3D imaging or video could be classified based on extracted features. Most aspects of biology can be captured using images, she said. “We want images to become as computable as the genome.”
Gustafsdottir, SM, Ljosa, V, et al. Multiplex cytological profiling assay to measure diverse cellular states. PLoS ONE. 2013. DOI: 10.1371/journal.pone.0080999
Bray, M, Singh, S, Han, H, et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nature Protocols. 2016. DOI: 10.1038/nprot.2016.105
Anne Carpenter shows that, in biomedicine, a picture can be worth a million data points