NNF Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark

Unsupervised machine learning is a powerful technique for learning patterns in large data
sets. In this talk, I will present my group's journey into developing and applying variational
autoencoders (VAEs) for the analysis of various types of multi-omics data. First, I will describe
our work on integrating microbiome-derived data for identification and reconstruction of bacterial and viral genomes
in metagenomics data (Nissen et al., Nature Biotechnology, 2021; Johansen et al., Nature
Communications, 2022). Second, I will present how we used VAEs for data-driven
stratification of major depressive disorder (MDD) and schizophrenia (SCZ) in a large cohort
of 42,000 individuals, integrating genotype and multiple registry data (Allesøe et al., Science
Advances, 2022). Finally, I will describe how we integrate patient-level multi-omics data,
extensive clinical characterization, diet, accelerometry and medication data from a Type 2
Diabetes cohort (Allesøe et al., Nature Biotechnology, 2023). Our framework (MOVE) can
integrate these into a meaningful latent representation, is robust to missing data, and is able to
identify cross-modality associations. To achieve this, we use virtual perturbations, similar to
gedankenexperiments, of an ensemble of trained models to estimate the effect of one
feature across the omics data. We use this to identify drug-omics associations, compare
predicted drug-omics responses, and estimate the overall effect of each drug across omics.
