Multimodal single-cell data, open benchmarks, and a NeurIPS 2021 competition

Jonathan Bloom
Cellarity

Alexandra-Chloé Villani
Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital; Harvard Medical School; Broad Institute

Angela Pisco
Chan Zuckerberg Biohub

Daniel Burkhardt
Cellarity

Single-cell measurement of chromatin accessibility (DNA), gene expression (RNA), and proteins has revealed rich cellular diversity across tissues, organisms, and disease states. However, single-cell data poses significant modeling challenges: datasets are high-dimensional in both observations and features, with complex sparsity; biological signals are mixed with donor and technical batch effects; and ground truth is scarce relative to other fields where machine learning has shined. Here we leverage recent advances in multimodal single-cell technologies, which, by simultaneously measuring two layers of cellular processing in the same cell, provide ground truth analogous to language translation. We formalize tasks to predict one modality from another and to learn integrated representations of cellular state. We also generate a novel dataset of human bone marrow specifically designed for benchmarking methods. The dataset and tasks are accessible through an open-source framework that facilitates centralized evaluation of community-submitted methods, and they form the basis for a competition at NeurIPS 2021 (openproblems.bio/neurips).