Capturing cell heterogeneity in representations of cell populations for image-based profiling using contrastive learning.

PLoS computational biology
Abstract

Image-based cell profiling is a powerful tool that compares perturbed cell populations by measuring thousands of single-cell features and summarizing them into profiles. Typically, a sample is represented by averaging across cells, but this fails to capture the heterogeneity within cell populations. We introduce CytoSummaryNet, a Deep Sets-based approach that improves mechanism-of-action prediction by 30-68% in mean average precision compared to average profiling on a public dataset. CytoSummaryNet uses self-supervised contrastive learning in a multiple-instance learning framework, providing an easier-to-apply method for aggregating single-cell feature data than previously published strategies. Interpretability analysis suggests that the model achieves this improvement by downweighting small mitotic cells or those with debris and prioritizing large uncrowded cells. The approach requires only perturbation labels for training, which are readily available in all cell profiling datasets. CytoSummaryNet offers a straightforward post-processing step for single-cell profiles that can significantly boost retrieval performance on image-based profiling datasets.
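
To make the contrast in the abstract concrete, the sketch below compares plain average profiling with a permutation-invariant weighted aggregation of single-cell features. It is an illustration only and not the CytoSummaryNet implementation: the softmax-weighted mean is a simplified stand-in for a learned Deep Sets-style model, and the feature layout (`[area, intensity]`), the toy values, and the scoring function `prefer_large_cells` are all hypothetical assumptions.

```python
import math


def average_profile(cells):
    """Baseline: feature-wise mean across all cells in a sample."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]


def weighted_profile(cells, score):
    """Softmax-weighted mean of cells; the result ignores cell order.

    `score` maps one cell's feature vector to a scalar. Here it is a
    hand-written, hypothetical stand-in for a learned scoring network.
    """
    scores = [score(cell) for cell in cells]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [
        sum(w * x for w, x in zip(weights, column))
        for column in zip(*cells)
    ]


def prefer_large_cells(cell):
    """Hypothetical scoring rule: larger area gives a larger weight."""
    return cell[0]


# Toy sample: each cell is [area, intensity]; the values are made up.
cells = [[1.0, 0.2], [1.0, 0.4], [4.0, 0.9]]

mean_profile = average_profile(cells)
pooled_profile = weighted_profile(cells, prefer_large_cells)
shuffled_profile = weighted_profile(list(reversed(cells)), prefer_large_cells)

print("average profile: ", mean_profile)
print("weighted profile:", pooled_profile)
```

In this toy sample the weighted profile is pulled toward the large cell, while the average treats all three cells equally. Reordering the cells leaves the weighted profile unchanged up to floating-point rounding, which is the set-level property a Deep Sets-style aggregator relies on.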

Year of Publication
2024
Journal
PLoS computational biology
Volume
20
Issue
11
Pages
e1012547
Date Published
11/2024
ISSN
1553-7358
DOI
10.1371/journal.pcbi.1012547
PubMed ID
39527652