This talk is part of the NLP Seminar Series.

Backpropagation Through the Void: Optimizing control variates for black-box gradient estimation

Will Grathwohl, University of Toronto
Date: 12:00 pm - 1:20 pm, July 26, 2018
Venue: NLP Lunch (open only to the Stanford NLP group)


Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
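To make the idea of a control variate concrete, here is a minimal sketch in Python. It is not the method presented in the talk, which learns a neural-network control variate jointly with the model. Instead it uses the simplest possible stand-in, a constant baseline subtracted inside a score-function (REINFORCE) estimator, to show that the estimate stays unbiased while its variance drops. The objective `f`, the parameter `theta`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def f(b):
    # Hypothetical black-box objective; we only ever evaluate it, never differentiate it.
    return (b - 0.45) ** 2


def score(b, theta):
    # d/dtheta of log Bernoulli(b | theta).
    return (b - theta) / (theta * (1.0 - theta))


def sample_bernoulli(theta, n, rng):
    return (rng.random(n) < theta).astype(float)


def score_function_estimates(theta, n, rng, baseline=0.0):
    # Per-sample estimates of d/dtheta E[f(b)], b ~ Bernoulli(theta).
    # Subtracting a constant baseline keeps the estimator unbiased
    # because the score has zero mean under the sampling distribution.
    b = sample_bernoulli(theta, n, rng)
    return (f(b) - baseline) * score(b, theta)


def exact_gradient():
    # E[f(b)] = theta * f(1) + (1 - theta) * f(0), so the gradient is f(1) - f(0).
    return f(1.0) - f(0.0)


rng = np.random.default_rng(0)
theta = 0.3
n = 200_000

# Baseline (assumption): an estimate of E[f(b)] from a separate batch,
# so it is independent of the samples used in the gradient estimate.
baseline = f(sample_bernoulli(theta, n, rng)).mean()

plain = score_function_estimates(theta, n, rng)
with_cv = score_function_estimates(theta, n, rng, baseline=baseline)

print("exact gradient:     ", exact_gradient())
print("plain mean / var:   ", plain.mean(), plain.var())
print("baseline mean / var:", with_cv.mean(), with_cv.var())
```

Both estimators should average to roughly the exact gradient, while the per-sample variance of the baseline version is much smaller. A learned, input-dependent control variate of the kind the abstract describes aims to push that variance down further than any constant can.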


Will Grathwohl is a first-year PhD student at the University of Toronto, where he is co-supervised by Richard Zemel and David Duvenaud. He received his undergraduate degree in mathematics from MIT in 2014. His research focuses on approximate inference and generative models for high-dimensional data. Prior to graduate school, he spent four years working on the applied side of machine learning in Silicon Valley. He is currently interning at OpenAI in San Francisco.