Note: This talk is internal and for Stanford affiliates only.
In July 2025, an experimental OpenAI large language model achieved gold-medal performance (35/42 points) on that year's International Mathematical Olympiad (IMO) problems by producing natural-language proofs, a long-standing grand challenge in AI. Moreover, it did so without any tool use or added context, and within the standard 4.5-hour time windows. When evaluated on a broader set of benchmarks, we found that this experimental IMO model achieved state-of-the-art performance not just on math but on a wide variety of our hardest benchmarks. In this talk I will explain what reasoning models are, how they have progressed over the past year, and how these models may assist researchers with general scientific reasoning. I will also offer a broader perspective on the trajectory of AI progress and what the future may hold for the field.
Noam Brown is a research scientist at OpenAI, specializing in multi-step reasoning, self-play, and multi-agent AI. Previously, at Meta's FAIR, he co-developed CICERO, the first AI to achieve human-level performance in the strategy game Diplomacy. His earlier work includes Libratus and Pluribus, which defeated top human poker professionals in human-versus-machine competitions. Brown's honors include the Marvin Minsky Medal for Outstanding Achievements in AI and recognition as one of MIT Technology Review's 35 Innovators Under 35; Pluribus was named a runner-up for Science magazine's Breakthrough of the Year in 2019. He earned his PhD in computer science from Carnegie Mellon University and previously worked at the Federal Reserve Board researching algorithmic trading in financial markets.
Excited to see everyone at the seminar!
Thanks,
Stanford NLP Seminar Organizers