When we generate language, we model what to say; why not also model how listeners will react? We show how pragmatic inference in Rational Speech Acts-style models can be used to both generate and interpret natural language instructions for complex, sequential tasks. Our pragmatics-enabled models reason about how listeners will react upon hearing instructions, and reason counterfactually about why speakers produced the instructions they did. We find that this inference procedure improves state-of-the-art listener models (at correctly interpreting human instructions) and speaker models (at generating instructions correctly interpreted by humans) in diverse settings, including navigation through indoor environments conditioned on real visual imagery.
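To make the inference procedure concrete, here is a minimal sketch of this style of pragmatic rescoring: a pragmatic speaker picks the candidate instruction a base listener is most likely to interpret as intended, and a pragmatic listener picks the candidate interpretation that best explains why a speaker would have said what it did. The function names and the `candidates` / `log_prob` interfaces are hypothetical placeholders for illustration; the talk's actual models are neural sequence models.

```python
def pragmatic_speaker(goal, candidate_instructions, listener_log_prob):
    """Speaker that models how listeners will react: among candidate
    instructions (e.g. sampled from a base speaker), choose the one a
    base listener is most likely to interpret as the intended goal."""
    return max(candidate_instructions,
               key=lambda instr: listener_log_prob(goal, instr))


def pragmatic_listener(instruction, candidate_goals, speaker_log_prob):
    """Listener that reasons counterfactually about the speaker: among
    candidate interpretations (e.g. sampled from a base listener),
    choose the one that best explains why a base speaker would have
    produced this instruction."""
    return max(candidate_goals,
               key=lambda goal: speaker_log_prob(instruction, goal))
```

In practice the candidates would typically come from beam search or sampling over the base models, and the rescoring term is often interpolated with the base model's own score rather than used alone; the exact combination here is an assumption, not a claim about the talk's specific configuration.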
Daniel Fried is a fourth-year PhD student at UC Berkeley advised by Dan Klein. His research interests currently focus on grounded semantics, instruction following, and structured prediction. Previously, he graduated with a BS from the University of Arizona and an MPhil from the University of Cambridge. His work has been supported by Churchill, NDSEG, Huawei, and Tencent fellowships.