Single-cell analysis in the age of LLMs
Assistant Professor,
Dept. of Computer Science & Dept. of Internal Medicine,
Yale University
In this talk, I will argue that biology operates like a language: systems such as the immune response communicate through combinatorial interactions, much as words combine to form sentences. I will present recent work from our lab, starting with CINEMA-OT, a causal inference method that we applied to combinatorial cytokine stimulation to reveal nonlinear interactions between cytokines. I will then focus on Cell2Sentence, a project that transforms single-cell data into 'cell sentences' so that LLMs can be trained to generate and predict cellular behaviors. Finally, I will briefly discuss CaLMFlow, in which LLMs are adapted to model continuous systems, highlighting their versatility beyond discrete language tasks. Together, these projects illustrate how LLMs are advancing single-cell analysis and biological research.
Relevant Resources:
- CINEMA-OT: https://www.nature.com/articles/s41592-023-02040-5
- Cell2Sentence: https://www.biorxiv.org/content/10.1101/2023.09.11.557287v3
- CaLMFlow: https://arxiv.org/abs/2410.05292
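
Illustrative sketch (not taken from the Cell2Sentence code base): the 'cell sentence' idea mentioned in the abstract can be pictured as turning one cell's expression vector into a line of text. The sketch below assumes a sentence is formed by listing gene names in order of decreasing expression and dropping unexpressed genes. The function name `cell_to_sentence`, the `top_k` parameter, and the toy counts are all hypothetical and chosen only for illustration.

```python
def cell_to_sentence(expression, gene_names, top_k=None):
    """Turn one cell's expression values into a space-separated 'sentence'.

    Assumption for this sketch: genes are ordered by decreasing expression,
    genes with zero expression are omitted, and ties keep their input order.
    """
    if len(expression) != len(gene_names):
        raise ValueError("expression and gene_names must have the same length")

    # Indices of expressed genes only.
    expressed = [i for i, value in enumerate(expression) if value > 0]

    # Python's sort is stable, so equally expressed genes keep input order.
    ranked = sorted(expressed, key=lambda i: -expression[i])

    # Optionally keep only the most highly expressed genes.
    if top_k is not None:
        ranked = ranked[:top_k]

    return " ".join(gene_names[i] for i in ranked)


# Toy example with made-up counts for four genes in a single cell.
genes = ["CD3E", "GAPDH", "IL2", "ACTB"]
counts = [5, 40, 0, 12]
print(cell_to_sentence(counts, genes))  # GAPDH ACTB CD3E
```

Text of this form can be tokenized like ordinary language, which is what makes it usable as LLM training input; the preprocessing actually used in the project may differ from this simplification.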