This talk is part of the NLP Seminar Series.

Doing More with Less: Behavioral Efficiency in Language Model Agents

Daniel Fried, Carnegie Mellon University
Date: 11:00am - 12:00 noon PT, Thursday, May 7
Venue: Room 287, Gates Computer Science Building
Zoom: https://stanford.zoom.us/j/93941842999?pwd=vH7x9wB9bfuIaV1HnQthRmqA8BKTGh.1
Sign-ups for 1:1s: https://docs.google.com/spreadsheets/d/1CEJl6LiZ09El-EjSuBD5wE5pQH-rw9vuc1A39PLJMqQ/edit?usp=sharing

Abstract

Current language model agents accomplish increasingly complex tasks, but they do so wastefully -- taking far more steps and producing far more language than necessary. I present our work on making agents more efficient at multiple levels. First, I show how multimodal models, trained on simulated interactions with simple notions of success and cost, can learn to form linguistic conventions, reducing message length by up to 41% while improving communicative success with people. Second, I present agent skill induction, where agents learn reusable programmatic skills (tools) online, compressing multi-step procedures into single function calls and improving both success rate and efficiency on web navigation tasks. Finally, I introduce Odysseys, a benchmark of 200 long-horizon web tasks derived from real browsing behavior; these tasks often take models hours to complete, highlighting the need for agents that succeed not just eventually, but economically.

Bio

Daniel Fried is an assistant professor in the Language Technologies Institute at Carnegie Mellon University. His research focuses on NLP, grounding and interaction, and modeling the strategic use of language, with particular emphasis on language interfaces such as LLM agents and code generation. Previously, he was a postdoc at Meta AI and the University of Washington and completed a PhD at UC Berkeley. His research has been supported by a Microsoft Faculty Fellowship, an NSF CAREER Award, and the Okawa Research Award.