Computational Biology and Data Science group, Vesalius Therapeutics

The high proportion of zeros in typical scRNA-seq datasets has led to widespread but inconsistent use of terminology such as "dropout" and "missing data". Here, we argue that much of this terminology is unhelpful and confusing, and outline simple ideas to help reduce confusion. These include: (1) observed scRNA-seq counts reflect both true gene expression levels and measurement error, and carefully distinguishing these contributions helps clarify thinking; and (2) method development should start with a Poisson measurement model, rather than more complex models, because it is simple and generally consistent with existing data. We outline how several existing methods can be viewed within this framework and highlight how these methods differ in their assumptions about expression variation. We also illustrate how our perspective helps address questions of biological interest, such as whether mRNA expression levels are multimodal among cells.

MIA Talks Search