Inference in generalized bilinear models / Generalized linear models and latent factor models

Jeff Miller
Dept. of Biostatistics, Harvard University
Meeting: Inference in generalized bilinear models

Latent factor models are widely used to discover and adjust for hidden variation in modern applications. However, most methods do not fully account for uncertainty in the latent factors, which can lead to miscalibrated inferences such as overconfident p-values. In this article, we develop a fast and accurate method for uncertainty quantification in generalized bilinear models (GBMs), which are a flexible extension of generalized linear models to include latent factors as well as row covariates, column covariates, and interactions. In particular, we introduce delta propagation, a general technique for propagating uncertainty among model components using the delta method. Further, we provide a rapidly converging algorithm for maximum a posteriori GBM estimation that extends earlier methods by estimating row and column dispersions. In simulation studies, we find that our method provides approximately correct frequentist coverage of most parameters of interest. We demonstrate the method on RNA-seq gene expression analysis and copy ratio estimation in cancer genomics.

Will Townes
Engelhardt Group, Princeton University
Primer: Generalized linear models and latent factor models

Generalized linear models (GLMs) are widely used in the statistical analysis of data with non-normally distributed errors. Examples include logistic regression for binary outcomes and negative binomial regression for over-dispersed counts. In this primer, we review the fundamental components of the GLM as well as standard algorithms for estimating the unknown parameters. GLMs are a form of supervised learning: they describe the effect of one or more predictor variables X on a single outcome variable Y. However, many modern datasets consist of large numbers of measurements that are all jointly of interest, and unsupervised learning is more appropriate. Latent factor models such as principal component analysis are a popular approach to dimension reduction in this setting. We examine their basic properties from a probabilistic perspective. This lays the foundation for generalized bilinear models, which enable latent factor models to be fit to non-Gaussian data just as GLMs are in the supervised setting.
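
As an illustration of the kind of standard GLM fitting algorithm the primer refers to, here is a minimal sketch of iteratively reweighted least squares (IRLS) for logistic regression with an intercept and a single predictor. It is not taken from the talk: the function name fit_logistic_irls, the simulated data, and the "true" coefficients are all hypothetical, and the 2x2 weighted normal equations are solved by hand only to keep the example free of dependencies.

```python
import math
import random


def fit_logistic_irls(x, y, n_iter=25, tol=1e-10):
    """Fit logit P(Y=1) = b0 + b1*x by IRLS (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the weighted normal equations X'WX b = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi                 # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))  # mean via inverse logit link
            w = max(mu * (1.0 - mu), 1e-10)    # IRLS weight (Bernoulli variance)
            z = eta + (yi - mu) / w            # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 system with the explicit matrix inverse.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = max(abs(new_b0 - b0), abs(new_b1 - b1)) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Simulated data with assumed coefficients, for illustration only.
random.seed(0)
true_b0, true_b1 = -0.5, 1.0
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [
    1.0 if random.random() < 1.0 / (1.0 + math.exp(-(true_b0 + true_b1 * xi))) else 0.0
    for xi in x
]
b0_hat, b1_hat = fit_logistic_irls(x, y)
print(b0_hat, b1_hat)
```

Each pass fits a weighted least-squares regression of the working response z on x, which for the canonical logit link is the same as a Newton step on the log-likelihood. Other GLMs would change only the link, the variance function, and hence the weights.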