Causal representation learning of genetic perturbations: identifiability and combinatorial extrapolation

Jiaqi Zhang

Causal disentanglement aims to uncover a representation of data using latent variables that are interrelated through a causal model. Such a representation is identifiable if the latent model that explains the data is unique. Inspired by single-cell assays and CRISPR experiments, we focus on the scenario where unpaired observational and interventional data are available, with each intervention changing the mechanism of a latent variable. When the causal variables are fully observed, statistically consistent algorithms have been developed to identify the causal model under faithfulness assumptions. Here we show that identifiability can still be achieved when the causal variables are unobserved, given a generalized notion of faithfulness.

Apart from identifiability, our results also guarantee that, in the limit of infinite data, we can predict the effect of unseen combinations of interventions. In the finite-data regime, we implement our causal disentanglement framework by developing a scalable algorithm based on autoencoding variational Bayes. We discuss this framework in the context of single-cell biology, showing that it can be used to predict the effects of combinatorial perturbations that were not observed during training.
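
To make the idea of combinatorial extrapolation concrete, the following is a minimal toy sketch and not the algorithm presented in the talk. It assumes a linear latent causal model Z = A Z + eps with a linear map X = G Z to the observations, and it models each intervention as a shift in the mean of one latent variable's mechanism. The matrices `A` and `G`, the vector `mu`, the shift sizes, and the helper `observed_mean` are all hypothetical values chosen for illustration. Under these simplifying assumptions, the mean effect of a double intervention is the sum of the two single-intervention effects, so it can be predicted without ever observing the combination.

```python
import numpy as np

# Hypothetical toy parameters (illustrative only, not taken from the talk).
# A[i, j] is the causal effect of latent Z_j on latent Z_i (strictly lower triangular = DAG).
A = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, -0.5, 0.0]])
mu = np.array([0.5, -1.0, 0.2])      # mean of the exogenous noise for each latent
G = np.array([[1.0, 0.5, 0.0],       # linear map from 3 latents to 4 observed features
              [0.0, 1.0, 0.3],
              [0.2, 0.0, 1.0],
              [1.0, 1.0, 1.0]])


def observed_mean(shifts):
    """Mean of X = G Z when Z = A Z + eps and E[eps] = mu + shift.

    `shifts` maps a latent index to the mean shift applied to its mechanism.
    """
    shift_vec = np.zeros(len(mu))
    for node, delta in shifts.items():
        shift_vec[node] += delta
    # Solve (I - A) E[Z] = mu + shift for the latent mean, then map to observations.
    z_mean = np.linalg.solve(np.eye(len(mu)) - A, mu + shift_vec)
    return G @ z_mean


# "Seen" conditions: observational data and two single interventions.
baseline = observed_mean({})
effect_0 = observed_mean({0: 1.0}) - baseline
effect_1 = observed_mean({1: 2.0}) - baseline

# Predict the unseen double intervention by composing the single effects.
predicted_double = baseline + effect_0 + effect_1

# Ground truth under the toy model, used here only to check the prediction.
actual_double = observed_mean({0: 1.0, 1: 2.0})
print(np.allclose(predicted_double, actual_double))
```

The additivity used here is a property of this linear, mean-shift toy model. With nonlinear mixing or other intervention types, the combination would instead have to be predicted by applying both mechanism changes inside a learned latent model, which is the setting the abstract describes.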
