This talk is part of the NLP Seminar Series.

Rethinking LLM Alignment: Mechanisms, Data, and Evaluation

Pratyush Maini, CMU, DatologyAI
Date: May 8, 2025, 11:00 am - 12:00 pm PT
Venue: Room 287, Gates Computer Science Building

Abstract

As large language models scale, our understanding of "safety" has remained frustratingly surface-level, often reduced to red teaming, RLHF, and output filtering. In this talk, I'll discuss how research into memorization provides a fundamentally different lens on safety. I'll trace the development of methods to precisely quantify memorization, detect unauthorized use of training data, and rigorously evaluate attempts at unlearning. This perspective illuminates broader societal and legal concerns, including emerging questions around copyright and ownership in a rapidly evolving technological landscape. Ultimately, this research underscores the need for natively safe models, as opposed to models that merely have alignment patched on top of a harmful base.

Bio

Pratyush is a Ph.D. candidate in the Machine Learning Department at Carnegie Mellon University and a founding member of DatologyAI. He has developed scalable, performant methods for improving the quality of the data that machine learning models are trained on, as well as methods to evaluate, locate, and mitigate the memorization of data points by neural networks. His work has been recognized with a best paper award nomination at NeurIPS and multiple oral and spotlight presentations at major ML conferences.