The Stanford Natural Language Processing Group

This talk is part of the NLP Seminar Series.

Evaluation is Power. How can we use it well?

Ziang Xiao, Johns Hopkins University
Date: 11:00am - 12:00 noon PT, Thursday, May 21st
Venue: Room 287, Gates Computer Science Building

Abstract

Evaluation steers the field of AI. It shapes what is worth building, what counts as progress, and what gets deployed and regulated. However, are our evaluation practices keeping pace with that responsibility? Benchmarks often fail to measure what they claim, and practitioners rarely find that these evaluations translate into actionable improvements. In this talk, I argue that good AI evaluation rests on two foundations. First, validity. Drawing on measurement science, we developed a conceptual framework with tools that treat benchmark design as the disciplined construction of measurement instruments. This approach exposes hidden assumptions and makes evaluation accountable to the constructs it intends to capture. Second, human-centeredness. A methodologically rigorous evaluation can widen the sociotechnical gap if the construct was chosen without the people it concerns. I show how HCI methods can help us reveal frictions in human-AI interaction that benchmarks often overlook. I will close by introducing OpenEval, an ongoing infrastructure effort to realize these two foundations, and show how it enables more valid, auditable, and participatory evaluation. Evaluation is power. We should use it well.

Bio

Ziang Xiao is an assistant professor in the Department of Computer Science at Johns Hopkins University. His research is motivated by the fundamental question of understanding humans at scale. Through this work, Ziang aims to create a more connected research community and democratize novel technologies to operationalize intuitions and curiosities about how we think and behave. His current research focuses on three topics: AI for social science, human-centered model evaluation, and information seeking. Broadly, Ziang's work lies at the intersection of human-computer interaction, natural language processing, and social psychology. He has authored award-winning papers in top-tier HCI, NLP, and AI conferences and journals. He was a postdoc at Microsoft Research Montreal and completed his PhD in computer science at the University of Illinois Urbana-Champaign.