Broad Institute of MIT and Harvard

Generalized linear models are a mainstay of applied statistics and data analysis, in large part because they are interpretable. Applying them in the high-dimensional setting, where the number of covariates (features) is large, raises both computational and statistical challenges. In this primer I introduce Bayesian variable selection, a powerful approach to fitting generalized linear models in the high-dimensional setting that is formulated as a Bayesian model selection problem. One benefit of this formulation is that it yields, by construction, a Posterior Inclusion Probability (PIP) for each feature: an interpretable score that encodes the statistical evidence for that feature's importance in explaining the response variable. I place special emphasis on comparing Bayesian variable selection to alternative approaches to high-dimensional statistics, including the Lasso and continuous shrinkage priors such as the Horseshoe.
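
To make the PIP concrete, here is one common way to write it down. The notation is assumed for illustration and is not taken from the talk: P is the number of features, \gamma \in \{0,1\}^P is a vector of inclusion indicators that picks out a model, and \mathcal{D} is the observed data.

```latex
% Posterior over models: marginal likelihood of the model times its prior
p(\gamma \mid \mathcal{D}) \;\propto\; p(\mathcal{D} \mid \gamma)\, p(\gamma),
\qquad \gamma \in \{0,1\}^P

% PIP of feature j: total posterior mass of all models that include feature j
\mathrm{PIP}_j \;=\; p(\gamma_j = 1 \mid \mathcal{D})
\;=\; \sum_{\gamma \,:\, \gamma_j = 1} p(\gamma \mid \mathcal{D})
```

Under this sketch, the sum runs over 2^{P-1} models, which is one way to see why the computational challenges become pressing when P is large.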
