NNF Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
Our group has recently shown how variational autoencoders (VAEs), a class of deep-learning-based generative models, can be leveraged to provide insights into the complex interplay and relationships present in large biological datasets (specifically, associations between drugs and omics profiles from patients newly diagnosed with type 2 diabetes). In this primer, we take a deep dive into our MOVE (multi-omics variational autoencoders) pipeline (Allesøe et al., Nature Biotechnology, 2023). We start by briefly reviewing the steps of data pre-processing and model optimization, which yield a model that can compress and integrate multi-modal data (both categorical and continuous variables, such as clinical measurements, microbiome census data, transcriptomics, proteomics, and diet and lifestyle records) into a meaningful latent space. Next, we focus on the method we devised to determine associations between omics variables and categorical labels (such as drug intake). After perturbing the original dataset, we inspected our model's output and identified significant differences between the baseline and perturbed results using two approaches: the first relies on univariate statistical methods and ensemble modeling, whereas the second draws on Bayesian decision theory. Finally, we discuss the outlook for our pipeline and the improvements and additions we are currently working on.
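The general perturbation idea can be illustrated with a minimal, self-contained sketch. It is not the MOVE implementation. The toy linear "model" standing in for a trained VAE, the hypothetical function names (`toy_reconstruct`, `perturb_categorical`, `association_scores`, `select_features`), the log-odds score, and the 5% false-discovery threshold are all illustrative assumptions, loosely in the spirit of the Bayesian decision-theory approach. A categorical column (a binary "drug" flag) is switched on for every sample, each model in a small ensemble reconstructs the continuous features before and after the switch, and features whose reconstructions shift consistently in one direction are flagged.

```python
import math
import random

# HYPOTHETICAL stand-in for a trained, ensemble-refitted model (not a VAE):
# each "model" is a list of per-feature weights on a binary drug column.
def toy_reconstruct(row, weights, rng):
    drug, covariate = row
    return [w * drug + 0.5 * covariate + rng.gauss(0.0, 0.1) for w in weights]

def perturb_categorical(rows, col, new_value):
    """Copy every row, setting one categorical column to new_value."""
    return [row[:col] + [new_value] + row[col + 1:] for row in rows]

def association_scores(baseline, perturbed, eps=1e-8):
    """Per feature: fraction of (model, sample) pairs in which the perturbed
    reconstruction exceeds the baseline one, plus the log-odds of that fraction.
    Inputs are indexed [model][sample][feature]."""
    n_features = len(baseline[0][0])
    scores = []
    for f in range(n_features):
        pairs = [(b[f], p[f])
                 for model_b, model_p in zip(baseline, perturbed)
                 for b, p in zip(model_b, model_p)]
        prob = sum(p > b for b, p in pairs) / len(pairs)
        scores.append((prob, math.log(prob + eps) - math.log(1 - prob + eps)))
    return scores

def select_features(scores, max_fdr=0.05):
    """Rank features by |log-odds| and keep the longest prefix whose
    estimated false discovery rate stays at or below max_fdr (assumed value)."""
    order = sorted(range(len(scores)), key=lambda i: -abs(scores[i][1]))
    selected, cum_error = [], 0.0
    for rank, i in enumerate(order, start=1):
        prob = scores[i][0]
        # Probability that the called direction of change is wrong.
        cum_error += 1.0 - max(prob, 1.0 - prob)
        if cum_error / rank <= max_fdr:
            selected = order[:rank]
    return selected

rng = random.Random(0)
rows = [[0, rng.random()] for _ in range(200)]           # [drug, covariate]
# Five toy "refits": feature 0 responds to the drug, feature 1 does not.
ensemble = [[2.0 + rng.gauss(0.0, 0.05), 0.0] for _ in range(5)]
baseline = [[toy_reconstruct(r, w, rng) for r in rows] for w in ensemble]
treated = perturb_categorical(rows, col=0, new_value=1)
perturbed = [[toy_reconstruct(r, w, rng) for r in treated] for w in ensemble]

scores = association_scores(baseline, perturbed)
selected = select_features(scores)
print(selected)  # [0]
```

Only feature 0, whose toy weight on the drug column is non-zero, is selected; feature 1 changes in a random direction in roughly half of the pairs, so its log-odds stays near zero. Pooling over several refitted models, rather than trusting a single fit, reflects the ensemble idea mentioned in the text, although the real pipeline's scoring and thresholds may differ.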