MIA Talks

DM I. Bayesian logistic regression and mixed models: Revenge of the Gibbs

October 13, 2015
Broad Institute

Our aim is to give background and motivation for Scott's talk next week. Consider SNP association testing against a binary phenotype (disease vs. no disease). While linear regression enjoys very efficient inference, its simplest version falls short because of:

  • erroneous hard calls of variants (go with probabilities)
  • multiple testing (go Bonferroni, FDR)
  • confounding by ancestry, batch effects (go add PCs)
  • cryptic relatedness (go full mixed model)
  • binary phenotype (go logistic)
  • overfitting (go Bayesian)
  • nonlinear dependence of phenotype on covariates (go Gaussian process?)
  • admixture (go topic model?)
  • non-normal distribution of effect sizes (go GMM prior?)
  • sparsity (go lasso?)
  • epistasis (go neural net?)
  • ascertainment bias (go do some research)
  • high-dimensional phenotypes, both continuous and categorical (go do some modeling)

We will describe models addressing some of these points, including Bayesian probit, logit, and mixed logit models, and, time permitting, some fancier models mixing continuous and discrete structure. Our emphasis will be on how exponential-family conjugacy makes inference easy via Gibbs sampling in certain cases, whereas its absence leads one toward despair (at least for six more days).
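
To make the conjugacy point concrete, here is a minimal sketch of a Gibbs sampler for Bayesian probit regression using the standard Albert-Chib latent-variable augmentation. It is an illustration under stated assumptions, not necessarily the construction presented in the talk: the function names (`sample_latent`, `gibbs_probit`), the independent N(0, `prior_var`) prior on each coefficient, the coordinate-wise coefficient updates, and the simulated genotype data are all hypothetical choices made for this example. Each observation gets a latent z_i ~ N(x_i·beta, 1) with y_i = 1 exactly when z_i > 0; given z the model is an ordinary Gaussian linear regression, so every conditional distribution is a (truncated) normal that can be sampled directly.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its cdf and inverse cdf


def sample_latent(mean, y, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 if y == 1, or z <= 0 if y == 0."""
    p_nonpos = STD_NORMAL.cdf(-mean)  # P(z <= 0) before truncation
    lo, hi = (p_nonpos, 1.0) if y == 1 else (0.0, p_nonpos)
    u = rng.uniform(lo, hi)
    # inv_cdf needs 0 < u < 1; the clamp is a crude guard for extreme means
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)


def gibbs_probit(X, y, n_iter=800, burn_in=200, prior_var=10.0, seed=0):
    """Return post-burn-in draws of beta; prior is beta_j ~ N(0, prior_var) (assumed)."""
    rng = random.Random(seed)
    p = len(X[0])
    beta = [0.0] * p
    # The latent noise variance is 1, so each conditional variance is fixed.
    cond_var = [1.0 / (sum(row[j] ** 2 for row in X) + 1.0 / prior_var)
                for j in range(p)]
    draws = []
    for it in range(n_iter):
        # Step 1: latent z_i | beta, y_i is a truncated normal.
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        z = [sample_latent(m, yi, rng) for m, yi in zip(eta, y)]
        # Step 2: beta_j | z, beta_{-j} is normal by Gaussian conjugacy.
        for j in range(p):
            s = sum(row[j] * (zi - (e - beta[j] * row[j]))
                    for row, zi, e in zip(X, z, eta))
            new_bj = rng.gauss(cond_var[j] * s, cond_var[j] ** 0.5)
            eta = [e + (new_bj - beta[j]) * row[j] for row, e in zip(X, eta)]
            beta[j] = new_bj
        if it >= burn_in:
            draws.append(list(beta))
    return draws


# Toy data (simulated, not real): 200 individuals, one SNP with allele
# frequency 0.3, and a true probit effect of 0.8 per allele.
sim = random.Random(1)
n = 200
genotypes = [sum(sim.random() < 0.3 for _ in range(2)) for _ in range(n)]
y = [1 if -0.5 + 0.8 * g + sim.gauss(0.0, 1.0) > 0 else 0 for g in genotypes]
g_bar = sum(genotypes) / n
X = [[1.0, g - g_bar] for g in genotypes]  # intercept plus centered genotype

draws = gibbs_probit(X, y)
slope_mean = sum(d[1] for d in draws) / len(draws)
print(f"posterior mean of genotype effect: {slope_mean:.2f}")
```

The probit link is what makes both steps closed-form. Swapping in the logistic link removes this Gaussian structure, so the same two-step scheme no longer applies as written.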