Multitask learning approaches to biological network inference

Dayanne Castro
Bonneau Lab, NYU
Multitask learning approaches to biological network inference: linking model estimation across diverse related datasets

Abstract: Due to the increasing availability of biological data, methods that properly integrate data generated across the globe are becoming essential for extracting reproducible insights into relevant research questions. We developed a framework to reconstruct gene regulatory networks from expression datasets generated in separate studies, which are challenging to integrate because of technical variation (different dates, handlers, laboratories, protocols, etc.). In this talk, I will introduce how we currently learn regulatory networks from gene expression data and then how we extend our methods to learn multiple networks jointly from related datasets through multitask learning. In particular, our method aims to detect weaker patterns that are conserved across datasets while still detecting dataset-unique interactions. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge, including orthogonal genomics experiments. Since underlying regulatory mechanisms are often shared across conditions and/or cohorts, we hypothesized that multitask approaches, which draw conclusions from several data sources, would improve the performance of network inference. Using two unicellular model organisms, we show that joint network inference outperforms inference from a single dataset. Finally, we demonstrate that our method is robust to false edges in the prior and to low condition overlap across datasets. Given the growing practice of data sharing in biology, we speculate that cross-study inference methods will become highly valuable in the near future, increasing our ability to learn more robust and generalizable hypotheses and concepts.
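The joint inference described in this abstract can be sketched, under stated assumptions, as a regularized regression of one target gene on its candidate regulators across K datasets. This is a minimal illustration of one common multitask formulation (a "dirty"-style model), not the speaker's implementation: each regulator's weight in dataset k is the sum of a shared component, penalized with a group (l1/l2) norm across datasets so that weak but conserved effects can pool evidence, and a dataset-specific component, penalized with an l1 norm so that unique interactions can still appear. The hypothetical `prior_weights` argument stands in for adaptive penalties: values below 1 make prior-supported regulators cheaper to include. The function name, parameter names, and the use of proximal gradient descent are all assumptions made for illustration.

```python
import numpy as np


def fit_dirty_multitask(Xs, ys, lam_shared, lam_unique, prior_weights=None, n_iter=2000):
    """Hypothetical sketch: regress one target gene on p regulators in K datasets jointly.

    Xs: list of K arrays of shape (n_k, p); ys: list of K arrays of shape (n_k,).
    prior_weights: optional length-p penalty multipliers (smaller = prior support).
    Returns (S, B), each (p, K): shared and dataset-specific weights; W = S + B.
    """
    K, p = len(Xs), Xs[0].shape[1]
    w = np.ones(p) if prior_weights is None else np.asarray(prior_weights, dtype=float)
    S = np.zeros((p, K))
    B = np.zeros((p, K))
    # Step size 1/L; because the loss depends on S + B, its curvature in (S, B)
    # is twice the largest eigenvalue of any X_k^T X_k / n_k.
    L = 2.0 * max(np.linalg.eigvalsh(X.T @ X / X.shape[0]).max() for X in Xs)
    step = 1.0 / L
    for _ in range(n_iter):
        W = S + B
        # Column k: gradient of 1/(2 n_k) * ||y_k - X_k W[:, k]||^2.
        G = np.column_stack([X.T @ (X @ W[:, k] - y) / X.shape[0]
                             for k, (X, y) in enumerate(zip(Xs, ys))])
        # Shared part: group soft-thresholding of each regulator's row across datasets,
        # so a weak but conserved effect can pool evidence from every dataset.
        S_tmp = S - step * G
        row_norms = np.linalg.norm(S_tmp, axis=1, keepdims=True)
        shrink = 1.0 - step * lam_shared * w[:, None] / np.maximum(row_norms, 1e-12)
        S = np.maximum(shrink, 0.0) * S_tmp
        # Dataset-specific part: element-wise soft-thresholding keeps unique edges sparse.
        B_tmp = B - step * G
        B = np.sign(B_tmp) * np.maximum(np.abs(B_tmp) - step * lam_unique * w[:, None], 0.0)
    return S, B
```

Setting a regulator's weight below 1 lowers both of its penalties, which is one simple way to let prior-supported interactions enter the model at weaker signal levels. The split between S and B need not be unique, so W = S + B is the quantity to inspect.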

Richard Bonneau
Center for Genomics and Systems Biology, New York University
Primer: Inference of biological networks with biophysically motivated methods

Abstract: Through a confluence of advances in genomic technology and computation, network inference methods that automatically learn large, comprehensive models of cellular regulation are closer than ever to being possible. This talk will focus on enumerating the elements of computational strategies that, when coupled with appropriate experimental designs, can lead to accurate large-scale models of chromatin-state and transcriptional regulatory structure and dynamics. We highlight four research questions that require further investigation to make progress in network inference: the use of overall constraints on network structure, such as sparsity; the use of informative priors and data integration to constrain individual model parameters; the estimation of latent regulatory factor activity under varying cell conditions; and new methods for learning and modeling regulatory factor interactions. We conclude with examples of applying this strategy to 1) human and mouse lymphocyte development and function and 2) inference from single-cell and spatial transcriptomics of healthy and diseased brain and spinal tissues.
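The third question in this abstract, estimating latent regulatory factor activity, can be illustrated with one common simplification. This is an assumption made here, not necessarily the approach presented in the talk: treat each gene's expression as a linear combination of the activities of the transcription factors that a prior network says regulate it, X ≈ P A, and solve for the activity matrix A by least squares. The function name and the toy matrices in the test are hypothetical.

```python
import numpy as np


def estimate_tf_activity(prior, expression):
    """Hypothetical sketch: least-squares TF activity from a prior network.

    prior: (genes, TFs) signed connectivity matrix (e.g. +1 activation, -1 repression, 0 none).
    expression: (genes, samples) matrix.
    Returns activity: (TFs, samples) minimizing ||expression - prior @ activity||_F.
    """
    prior = np.asarray(prior, dtype=float)
    expression = np.asarray(expression, dtype=float)
    # lstsq solves for every sample column at once and tolerates rank-deficient priors.
    activity, *_ = np.linalg.lstsq(prior, expression, rcond=None)
    return activity
```

In this formulation, each TF's activity is inferred from the expression of all of its prior targets, so it can differ from the TF's own mRNA level; the estimated activities can then serve as predictors in place of TF expression.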