Evals have been undervalued in recent years. In this talk I will discuss what makes an eval successful and give examples of successful evals. Then I will cover the most common mistakes that keep evals from succeeding. Finally, I will share some thoughts on recent evals in the LLM space.
Jason Wei is an AI researcher living in San Francisco, currently working at OpenAI. He was previously a research scientist at Google Brain, where he popularized key ideas in large language models such as chain-of-thought prompting, instruction tuning, and emergent phenomena.