
Tamara Broderick, The kernel interaction trick; Raj Agrawal, Intro to GPs for regression

Raj Agrawal
Broderick Group, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology; ArbiLex
Primer: Gaussian processes: An introduction

This primer introduces Gaussian processes (GPs) as a practical tool for data science and machine learning. We focus on GPs' popular use as a flexible tool for regression, with a fully nonparametric relation between covariates and response, as well as a coherent mechanism for reporting uncertainty. We will discuss how GPs are an example of "Bayesian nonparametrics" (BNP) and are able to learn more nuance from data as data set size increases. We will cover the standard GP model; the covariance (or kernel) function that specifies the GP; GP inference; benefits and limitations of GPs; and uses of GPs as a tool or module in data analyses beyond GP regression.
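As a rough illustration of the standard GP regression model mentioned above, the following sketch computes the posterior mean and variance of a zero-mean GP at a few test inputs. It is not taken from the primer: the squared-exponential kernel, the length scale, the noise variance, the toy sine data, and all function names are assumptions chosen for illustration, and the linear algebra is written out in plain Python only to keep the example dependency-free.

```python
import math

def rbf_kernel(x, z, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((x - z) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_lower_transpose(L, y):
    """Solve L^T a = y by back substitution."""
    n = len(y)
    a = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * a[k] for k in range(i + 1, n))
        a[i] = (y[i] - s) / L[i][i]
    return a

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and latent variance of a zero-mean GP at each test input."""
    n = len(x_train)
    # Training covariance with observation noise added on the diagonal.
    K = [[rbf_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transpose(L, solve_lower(L, y_train))  # K^{-1} y
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf_kernel(xs, xi) for xi in x_train]
        v = solve_lower(L, k_star)
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        variances.append(rbf_kernel(xs, xs) - sum(vi * vi for vi in v))
    return means, variances

# Toy data (assumed): noiseless samples of sin(x) on a coarse grid.
x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]
means, variances = gp_posterior(x_train, y_train, [0.5, 5.0])
```

The first test input lies between training points, so its posterior variance is small; the second lies far from the data, so its variance reverts toward the prior variance. This is the uncertainty-reporting behavior the abstract refers to.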

 

Tamara Broderick
Dept. of Electrical Engineering and Computer Science, Computer Science and Artificial Intelligence Laboratory, Statistics and Data Science Center, Massachusetts Institute of Technology
Meeting: Fast discovery of pairwise interactions in high dimensions using Bayes

Discovering interaction effects on a response of interest is a fundamental problem in medicine, economics, and many other disciplines. In theory, Bayesian methods for discovering pairwise interactions enjoy many benefits such as coherent uncertainty quantification, the ability to incorporate background knowledge, and desirable shrinkage properties. In practice, however, Bayesian methods are often computationally intractable for problems of even moderate dimension p. Our key insight is that many hierarchical models of practical interest admit a particular Gaussian process (GP) representation; the GP allows us to capture the posterior with a vector of O(p) kernel hyper-parameters rather than O(p^2) interactions and main effects. With the implicit representation, we can run Markov chain Monte Carlo (MCMC) over model hyper-parameters in time and memory linear in p per iteration. We focus on sparsity-inducing models; on datasets with a variety of covariate behaviors, we show that our method: (1) reduces runtime by orders of magnitude over naive applications of MCMC, (2) provides lower Type I and Type II error relative to state-of-the-art LASSO-based approaches, and (3) offers improved computational scaling in high dimensions relative to existing Bayesian and LASSO-based approaches.
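To give a sense of why a kernel representation can avoid the O(p^2) cost, the sketch below uses a simplified stand-in model, not necessarily the hierarchical model from the talk. Assume independent pairwise effects theta_ij ~ N(0, kappa_i^2 kappa_j^2) and f(x) = sum over i < j of theta_ij x_i x_j. Then Cov(f(x), f(z)) is a sum over all pairs, but it can also be computed in O(p) time from the p values a_i = kappa_i^2 x_i z_i, since the sum over i < j of a_i a_j equals 0.5 * [(sum_i a_i)^2 - sum_i a_i^2]. The function names and the random test data are hypothetical.

```python
import random

def interaction_kernel_explicit(x, z, kappa):
    """O(p^2): sum over pairs i < j of kappa_i^2 kappa_j^2 x_i x_j z_i z_j."""
    p = len(x)
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            total += (kappa[i] * kappa[j]) ** 2 * x[i] * x[j] * z[i] * z[j]
    return total

def interaction_kernel_implicit(x, z, kappa):
    """O(p): the same value via 0.5 * [(sum a_i)^2 - sum a_i^2]."""
    a = [k * k * xi * zi for k, xi, zi in zip(kappa, x, z)]
    s = sum(a)
    return 0.5 * (s * s - sum(ai * ai for ai in a))

# Random inputs (assumed) to compare the two computations.
random.seed(0)
p = 50
x = [random.gauss(0.0, 1.0) for _ in range(p)]
z = [random.gauss(0.0, 1.0) for _ in range(p)]
kappa = [random.uniform(0.0, 1.0) for _ in range(p)]

explicit = interaction_kernel_explicit(x, z, kappa)
implicit = interaction_kernel_implicit(x, z, kappa)
```

Both functions return the same covariance up to floating-point error, but only the p entries of kappa are ever stored or touched in the implicit version. This is the sense in which O(p) kernel hyper-parameters can stand in for O(p^2) explicit interaction terms.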