Note: This talk is internal and for Stanford affiliates only.
Language models are now tackling tasks so complex that solving, or even verifying, them requires significant time and expertise, which makes it challenging to acquire training data at scale. In this talk, I will present three ongoing lines of work that address this problem. First, we will show that models can learn a surprising amount from relative quality differences between paired examples, even when that data is of lower absolute quality than what the model can already produce. Second, we will explore using verifiable environments to procedurally generate training problems whose difficulty adapts to the model, enabling efficient reinforcement learning. Finally, we will discuss our work on training a long-form "deep research" model by iteratively and adaptively constructing rubrics that provide discriminative training signals on complex long-form tasks.
Pang Wei Koh is an assistant professor in the Allen School of Computer Science and Engineering at the University of Washington and a research lead at the Allen Institute for AI. His research has been recognized by the AI2050 Early Career Fellowship, the MIT Technology Review Innovators Under 35 Asia Pacific award, a Google ML and Systems Junior Faculty Award, and best paper awards at ICML, KDD, and ACL. He received his PhD and BS in Computer Science from Stanford University. Prior to his PhD, he was the third employee and Director of Partnerships at Coursera.
Excited to see everyone at the seminar!
Thanks,
Stanford NLP Seminar Organizers