This talk is part of the NLP Seminar Series.

Note: This talk is internal and for Stanford affiliates only.

Toward Scientific Discovery: Lessons from an AI's IMO Gold Performance

Noam Brown, OpenAI
Date: 11:00am - 12:00 noon PT, Thursday, Oct 23
Venue: Room 287, Gates Computer Science Building
Zoom (internal, please do not share outside of Stanford): https://stanford.zoom.us/j/97840340995?pwd=K1iGX608WXzZGP0nvcSUqU9AYwRozk.1
There are no 1:1s this week, but Noam will attend NLP Lunch.

Abstract

In July 2025, an experimental OpenAI large language model achieved gold-medal performance (35/42 points) on that year's International Mathematical Olympiad (IMO) problems by outputting natural-language proofs, meeting a long-standing grand challenge in the field of AI. Moreover, it did so without any tool use or added context, and within the standard 4.5-hour time windows. When we evaluated this experimental IMO model on a broader set of benchmarks, we found that it achieved state-of-the-art performance not just on math but on a wide variety of our hardest benchmarks. In this talk I will explain what reasoning models are, how they have progressed over the past year, and the potential for these models to assist researchers on general scientific reasoning. I will also discuss a broader perspective on the trajectory of AI progress and what the future may hold for the field.

Bio

Noam Brown is a research scientist at OpenAI, specializing in multi-step reasoning, self-play, and multi-agent AI. Previously at Meta’s FAIR, he co-developed CICERO, the first AI to achieve human-level performance in the strategy game Diplomacy. His prior work includes Libratus and Pluribus, which defeated top human poker professionals in Human vs. Machine competitions. Brown’s honors include the Marvin Minsky Medal for Outstanding Achievements in AI and recognition as one of MIT Technology Review’s 35 Innovators Under 35; Pluribus was named a runner-up for Science magazine’s Breakthrough of the Year in 2019. He earned his PhD in computer science from Carnegie Mellon University and previously worked at the Federal Reserve Board researching algorithmic trading in financial markets.

Excited to see everyone at the seminar!

Thanks,
Stanford NLP Seminar Organizers