Price Lab, Harvard School of Public Health
Linking gut microbiomes, genomes and phenotypes via linear mixed models and kernel methods
Abstract: The gut microbiome is increasingly recognized as having fundamental roles in human physiology and health, and is often referred to as our second genome. However, the associations among the microbiome, our genome, our environment and our health are not well understood. I will discuss our recent work to elucidate these relations, using a cohort of ~1,000 Israeli individuals with detailed microbiome, genotype, clinical and environmental measurements, with an emphasis on methods to handle the high dimensionality and heterogeneity of such data. Our approaches combine linear mixed models – the statistical backbone of GWAS and phenotype prediction methods – with common techniques from statistical ecology, and with kernel regression approaches from machine learning.
In the first part of the talk, I will describe approaches to investigate the role of host genetics in shaping the gut microbiome. In the second part, I will describe approaches to investigate how host genetics and the microbiome interact with traits such as obesity and glucose levels. I will show that the fraction of phenotypic variance explained by the microbiome is often comparable to that explained by host genetics, which provides a positive outlook for microbiome-based therapeutics for metabolic disorders.
This is joint work with Daphna Rothschild and Elad Barkan from Eran Segal's group at the Weizmann Institute of Science. It has recently been accepted for publication in Nature. [preprint]
Lander Lab, Broad Institute
Primer: Kernel Methods and the Kernel "Trick"
Abstract: We have a variety of linear methods for data analysis and machine learning that are familiar & intuitive, but our data are often nonlinear in complicated ways, or come in a form where the idea of "linear" doesn't have an obvious meaning, such as DNA sequences or graphs based on protein interactions.
Kernel methods allow us to apply some of our familiar linear tools to nonlinear and structured data, using similarities between data points as the basis for classification, regression, and other analyses like PCA. I'll explain the "kernel trick" as a principled way to extend linear methods to work with similarities, talk about algorithms based on kernels (support vector machines, support vector regression, & kernelized PCA), introduce example kernels for a variety of data types (e.g., vectors, graphs, strings), and discuss approximations that allow kernels to be applied to very large datasets.
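As a small illustration of the kernel trick described above, the following sketch (not taken from the talk) first checks that a degree-2 polynomial kernel equals an ordinary inner product in an explicit feature space, and then fits a nonlinear function using only a matrix of pairwise similarities. The function names, the choice of an RBF kernel, the use of kernel ridge regression in place of the support vector methods named in the abstract, and the values of `gamma` and `lam` are all assumptions made for illustration.

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return float(np.dot(x, y)) ** 2

def poly2_features(x):
    """Explicit feature map for 2-D inputs whose inner product equals poly2_kernel."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def rbf_kernel_matrix(A, B, gamma=1.0):
    """Gaussian (RBF) kernel between every row of A and every row of B."""
    sq_dists = (
        np.sum(A * A, axis=1)[:, None]
        + np.sum(B * B, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def kernel_ridge_fit(K, y, lam=1e-3):
    """Dual coefficients of ridge regression: solve (K + lam * I) alpha = y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

# 1) The "trick": the kernel computes an inner product in feature space
#    without ever building the feature vectors.
x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
k_direct = poly2_kernel(x, y)
k_explicit = float(np.dot(poly2_features(x), poly2_features(y)))
print(k_direct, k_explicit)

# 2) A linear method (ridge regression) applied through similarities only:
#    the model never sees the inputs except via the kernel matrix.
X_train = np.linspace(-3.0, 3.0, 40)[:, None]
y_train = np.sin(X_train[:, 0])
alpha = kernel_ridge_fit(rbf_kernel_matrix(X_train, X_train), y_train)

X_test = np.array([[0.5], [1.5]])
y_pred = rbf_kernel_matrix(X_test, X_train) @ alpha
print(y_pred, np.sin(X_test[:, 0]))
```

The same pattern carries over to strings or graphs: only the kernel function changes, while the fitting and prediction steps stay the same because they depend on the data solely through the kernel matrix.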