New type of map connects the dots in cellular reprogramming

A new use of an old mathematical method analyzes a massive single-cell RNA sequencing experiment to explore how cells move from one state to another.

Credit: Susanna M. Hamilton, Broad Communications

Single-cell RNA sequencing (scRNA-seq) shows which genes an individual cell is expressing at a given moment, and can deliver an enormous amount of data on how cells develop over time.

However, scRNA-seq destroys cells, so scientists cannot precisely trace the path a cell takes as it moves from one state to another. As a result, there is still a great deal we do not know about, for example, how cells transform during normal embryonic development, or when they are reprogrammed from a mature to a stem-cell-like state.

Seeking to fill these knowledge gaps, Broad scientists have leveraged a powerful mathematical method called “optimal transport” to create a framework dubbed Waddington-OT. They subsequently used this approach to predict how populations of cells transition from one state to another in a massive scRNA-seq time-course study of stem cell reprogramming.

This work gives the biology community both new analytic capabilities and a large body of developmental data.

"In developmental biology, we want to be able to understand the origins and fates of cells at every stage of development, and recognize the regulatory programs that control those fates," said Geoffrey Schiebinger, a postdoctoral fellow with Broad core institute member and Klarman Cell Observatory director Aviv Regev. "By stitching snapshots of data together into movies, optimal transport helps reveal how fine details of developmental processes unfold."

Schiebinger is one of the co-first authors on a paper presenting the work in Cell, along with Jian Shu, a postdoctoral fellow with Broad Institute president and founding director Eric Lander; Regev lab postdoctoral fellow Marcin Tabaka; and Lander and Regev lab graduate student Brian Cleary.

"Previously, there was no large-scale scRNA-seq roadmap for the reprogramming process," said Shu, who also works with Whitehead Institute biologist Rudolf Jaenisch. "Now we are providing two resources, a new method for analyzing this reprogramming process and the first high-resolution lab data for the process."

"Single-cell RNA sequencing is enormously powerful, both in the unprecedented amount of detail it can reveal about a developmental process and its high resolution in time," Regev said. "Making sense of it all not only requires creating new mathematical approaches from scratch, but also that we look back to see how we can adapt mathematical innovations from the past in new ways."

Finding the "optimal" paths

Waddington's "epigenetic landscape." (Credit: Waddington, CH. The Strategy of the Genes. 1957.)


The classic metaphor for how cells move forward into specific lineages is an "epigenetic landscape" proposed by British biologist C.H. Waddington in 1957, in which cells roll like marbles down a ski slope with ridges and valleys.

Waddington’s metaphor does not, however, trace the probable paths a group of cells might take as they mature. Which cells at the top of the slope give rise to the cells in each valley?

"All we can see are the cells that exist at one time point, and then the cells that exist at the next time point," explained Schiebinger, who is also a postdoctoral fellow with Philippe Rigollet in Massachusetts Institute of Technology's Center for Statistics, and also works closely with Lander. "There are many ways to connect the cells present at the beginning to those at the end."

To predict the connections, Schiebinger, Tabaka, Cleary, and their colleagues turned to optimal transport. This mathematical technique was first explored in the 1780s by a French mathematician, Gaspard Monge, to calculate the most efficient ways for soldiers to move dirt while building fortifications; Napoleon made use of the method during his campaign in Egypt. Schiebinger and his colleagues reasoned that they could use it to see how cells find the optimal developmental routes open to them.

By incorporating cell growth and death into the equations, the scientists adapted optimal transport for biology to create the Waddington-OT framework. "It looks for the simplest possible explanation of the data," Schiebinger said. "To the best of our knowledge, this may be the first application of optimal transport in biology."
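
For readers who want a concrete feel for the idea, the Python sketch below couples two snapshots of cells using entropy-regularized optimal transport, solved with Sinkhorn iterations. It is an illustration under stated assumptions, not the Waddington-OT implementation: the function name `sinkhorn_coupling`, the `growth` weights, the `epsilon` value, and the toy coordinates are all hypothetical, and the sketch treats growth in the simplest way it can, by giving faster-growing cells more mass to send forward.

```python
import numpy as np

def sinkhorn_coupling(early, late, growth=None, epsilon=0.1,
                      max_iter=10_000, tol=1e-9):
    """Couple cells at one time point to cells at the next (illustrative only)."""
    early = np.asarray(early, dtype=float)
    late = np.asarray(late, dtype=float)
    n, m = len(early), len(late)

    # Cost of "moving" each early cell to each late cell: squared distance
    # in expression space, rescaled so the largest cost is 1.
    cost = ((early[:, None, :] - late[None, :, :]) ** 2).sum(axis=2)
    cost = cost / cost.max()

    # Assumption: early cells expected to divide more get more mass to send.
    a = np.ones(n) if growth is None else np.asarray(growth, dtype=float)
    a = a / a.sum()
    b = np.full(m, 1.0 / m)  # every late cell receives the same mass

    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    for _ in range(max_iter):
        v = b / (kernel.T @ u)   # rescale columns toward the late marginal
        u = a / (kernel @ v)     # rescale rows to match the early marginal
        plan = u[:, None] * kernel * v[None, :]
        if np.abs(plan.sum(axis=0) - b).max() < tol:
            break
    return plan

# Toy data (made up): two groups of cells that drift slightly between snapshots.
early = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]])
late = early + np.array([0.0, 0.1])

plan = sinkhorn_coupling(early, late)
print(np.round(plan, 3))  # entry (i, j): mass sent from early cell i to late cell j
```

In this toy case nearly all of the mass stays within each group of nearby cells, which is the "simplest possible explanation" of the two snapshots. Passing `growth=[3, 3, 1, 1]` would instead require the faster-growing group to account for a larger share of the later cells.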

In this short video, Geoff Schiebinger explains how 18th-century math can help unearth new insights in biology.

Tracking time points around the clock

Shu and others on the team applied Waddington-OT to a massive scRNA-seq study that traced how mature cells, in this case mouse fibroblasts, are reprogrammed into induced pluripotent stem cells (iPSCs).

The study involved two separate reprogramming experiments. Initially, the team collected samples every 48 hours over 16 days, generating about 65,000 scRNA-seq gene expression profiles. When this gave promising results, Shu and his collaborators collected a second, super-high-density time course, sampling every 12 hours over an even longer period of 18 days. Overall, they collected over 315,000 profiles, by far the largest study of its kind.

Applying Waddington-OT to the resulting dataset, the investigators found that cellular reprogramming unleashes a much broader range of developmental programs and state changes than scientists had previously known about. A day and a half into the reprogramming process, for instance, they saw that the cells started to divide into two main groups: one that gave rise to stroma-like cells (supportive structural and connective cells), and another that underwent what is called a mesenchymal-to-epithelial transition and gave rise to cells resembling iPSCs, neurons, and placental cells.

In addition, they saw that those early fates were not necessarily fixed: some cells that started developing in one of the main groups later shifted to the other.

Waddington-OT also highlighted many stages at which different subclasses bloomed and dwindled, and even revealed genomic aberrations in some cell types.

The team conducted follow-up experiments to test two of Waddington-OT’s predictions, examining how adding a transcription factor called Obox6 and a cytokine called GDF9 to the reprogramming cells might affect reprogramming efficiency. As Waddington-OT suggested, both factors enhanced stem-cell proliferation, showing that the framework could reveal opportunities to improve the reprogramming process.

Transport mapping for all

A Waddington-OT visualization of the iPS cell reprogramming process, from mouse fibroblasts (lower right) to stem-like cells (upper left). (Credit: Schiebinger G, Shu J, Tabaka M, Cleary B, et al. Cell. 2019.)


The research team is making both their data and a downloadable interactive Waddington-OT viewer freely available to the research community.

"With the viewer we can visualize the descendants of cells over time, and then their descendants," Schiebinger said, "and visualize these trajectories forwards and backwards. Anyone can type in a favorite gene and see the gene expression pattern for that gene.

"Moreover, it’s possible to do this for signatures of biological pathways such as cell proliferation, or any other process," he added.

"It’s wonderful that a 230-year old mathematical method for rearranging dirt and a five-year old experimental method for studying cells can combine to help provide a method to explore the frontiers of developmental biology,” Lander said.

Support for this work came from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, the National Institute of Mental Health, the National Institute of Neurological Disorders and Stroke, the Klarman Family Foundation, and other sources.

Paper(s) cited

Schiebinger G, Shu J, Tabaka M, Cleary B, et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell. Online January 31, 2019. DOI: 10.1016/j.cell.2019.01.006.