Note: This talk is internal and for Stanford affiliates only.
Language models are now tackling tasks so complex that solving, or even verifying, them requires significant time and expertise, which makes it challenging to acquire training data at scale. In this talk, I will present three ongoing lines of work that address this problem. First, we will show that models can learn a surprising amount from relative quality differences between paired examples, even when that data is of lower absolute quality than what the model can already produce. Second, we will explore using verifiable environments to procedurally generate training problems whose difficulty adapts to the model, enabling efficient reinforcement learning. Finally, we will discuss our work on training a long-form "deep research" model by iteratively and adaptively constructing rubrics that provide discriminative training signals on complex long-form tasks.
Pang Wei Koh is an assistant professor in the Allen School of Computer Science and Engineering at the University of Washington and a research lead at the Allen Institute for AI. His research has been recognized by the AI2050 Early Career Fellowship, the MIT Technology Review Innovators Under 35 Asia Pacific award, a Google ML and Systems Junior Faculty Award, and best paper awards at ICML, KDD, and ACL. He received his PhD and BS in Computer Science from Stanford University. Prior to his PhD, he was the third employee and Director of Partnerships at Coursera.
Excited to see everyone at the seminar!
Thanks,
Stanford NLP Seminar Organizers