This talk is part of the NLP Seminar Series.

Successful evals for large language models

Jason Wei, OpenAI
Date: 11:00am - 12:00pm, May 23rd, 2024
Venue: Room 287, Gates Computer Science Building


Evals have been undervalued in recent years. In this talk, I will discuss what makes an eval successful and give examples of successful evals. Then I will discuss the most common mistakes that make evals unsuccessful. Finally, I will share some thoughts on recent evals in the LLM space.


Jason Wei is an AI researcher living in San Francisco, currently working at OpenAI. He was previously a research scientist at Google Brain, where he popularized key ideas in large language models such as chain-of-thought prompting, instruction tuning, and emergent phenomena.