This talk is part of the NLP Seminar Series.

How to Know What Language Models Know

Jennifer Hu, Harvard University
Date: Mar 7th, 2024, 11:00am - 12:00pm
Venue: Room 287, Gates Computer Science Building

Abstract

Evaluation has long been essential to progress in AI and NLP. But as language models (LMs) become more sophisticated, their performance on benchmarks is increasingly being interpreted as evidence for reasoning, commonsense, or even intelligence itself. As such, one of the most important questions for our field is: how can we know what language models know? In this talk, I will first describe a framework for interpreting the outcomes of LM evaluations, inspired by the concept of linking hypotheses in cognitive science. I will then illustrate how different linking hypotheses can lead to vastly different conclusions about LMs' competence. In particular, prompt-based evaluations (e.g., "Is the following sentence grammatical? [sentence]") yield systematically lower performance than direct measurements of token probabilities. These results underscore the importance of specifying the assumptions behind our evaluation design choices before we draw conclusions about LMs' capabilities.
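To make the contrast in the abstract concrete, here is a minimal illustrative sketch (not the speaker's actual evaluation code) of the two kinds of measurement: a "direct" evaluation that compares the token probabilities an LM assigns to a grammatical vs. ungrammatical sentence, and a prompt-based ("metalinguistic") evaluation that asks the model the question in natural language and reads off its answer. The model name ("gpt2"), the minimal pair, the prompt wording, and the Yes/No answer tokens are all placeholder assumptions chosen for illustration.

```python
# Illustrative sketch: direct probability measurement vs. prompt-based judgment.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def sentence_log_prob(sentence: str) -> float:
    """Direct measurement: total log probability the LM assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log probability of each token given its left context.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_log_probs = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum().item()


def prompted_judgment(sentence: str) -> str:
    """Prompt-based measurement: ask a metalinguistic question and compare
    the model's preference for 'Yes' vs. 'No' as the next token."""
    prompt = f"Is the following sentence grammatical? {sentence}\nAnswer:"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_logits = model(ids).logits[0, -1]
    yes_id = tokenizer(" Yes").input_ids[0]
    no_id = tokenizer(" No").input_ids[0]
    return "Yes" if next_logits[yes_id] > next_logits[no_id] else "No"


grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."

# Direct method: does the LM assign higher probability to the grammatical form?
print(sentence_log_prob(grammatical) > sentence_log_prob(ungrammatical))

# Prompt-based method: what does the model *say* when asked?
print(prompted_judgment(grammatical), prompted_judgment(ungrammatical))
```

The point of the sketch is that both procedures query the same model, yet each embodies a different linking hypothesis about what counts as the model's "answer"; as the abstract notes, the prompt-based route can yield systematically lower performance than the direct probability comparison.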

Bio

Jennifer Hu is a Research Fellow at the Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University, and an incoming Assistant Professor of Cognitive Science at Johns Hopkins University (starting July 2025). She earned a PhD from MIT in the Department of Brain and Cognitive Sciences, where she studied how language models can inform theories of human linguistic knowledge. Her research combines computational and experimental approaches to investigate how language works in minds and machines.