This talk is part of the NLP Seminar Series.

Telling Embodied Agents What To Do (And The Agents Sometimes Doing It In Reality)

Stefan Lee, Oregon State University
Date: Nov 11, 2021, 11:00am - 12:00pm PT
Venue: Zoom (link hidden)


Embodied tasks, in which agents navigate or manipulate the world based on natural language instructions, have received growing interest in the language grounding community. In contrast to tasks based on static web imagery, these tasks offer perceptual experiences closer to those of robotic agents and require grounding instructions in both the visual world and the agent's physical capabilities. But how different are the visual grounding needs of these tasks from those of static web imagery? And how well do models trained for these tasks transfer to the real world? In this talk, I'll focus on Vision-and-Language Navigation to examine (1) how large-scale pretraining on static web image-text pairs can improve agent performance, (2) how performance on this task translates to the real world, and (3) what we might do to bridge the remaining gap.


Stefan Lee is an assistant professor in the School of Electrical Engineering and Computer Science at Oregon State University and a member of the Collaborative Robotics and Intelligent Systems (CoRIS) Institute there. His work addresses problems at the intersection of computer vision, natural language processing, and control. He is the recipient of the DARPA Rising Research Plenary Speaker Selection (DARPA 2019), two best paper awards (EMNLP 2017; CVPR 2014 Workshop on Egocentric Vision), multiple awards for review quality (CVPR 2017, 2019, 2020; ICCV 2017; NeurIPS 2017-2018; ICLR 2018-2019; ECCV 2020), the Bradley Postdoctoral Fellowship (Virginia Tech), and an Outstanding Research Scientist Award (Georgia Tech).