Andrew Gordon Wilson
Courant Institute, NYU
Loss Valleys, Uncertainty, and Generalization in Deep Learning
In this talk we discuss how to exploit the geometry of training objectives for scalable Bayesian model averaging, leading to better point predictions as well as improved uncertainty estimates and calibration in deep learning. We will focus primarily on five works, which include the surprising discovery of mode connectivity and its implications.
Bayesian methods can provide full predictive distributions and well-calibrated uncertainties in modern deep learning. The Bayesian approach is especially relevant in scientific and healthcare applications, where we wish to have reliable predictive distributions for decision making and the facility to naturally incorporate domain expertise. With a Bayesian approach, we do not merely want to find a single point that optimizes a loss, but rather to integrate over a loss landscape to form a Bayesian model average. The geometric properties of the loss surface, rather than the specific locations of optima, therefore greatly influence the predictive distribution in a Bayesian procedure. By better understanding loss geometry, we can realize the significant benefits of Bayesian methods in modern deep learning, overcoming challenges of dimensionality. In this talk, we review work on Bayesian inference and loss geometry in modern deep learning, including challenges, new opportunities, and applications.