Aligning LLMs with human intents and values is a central focus for the AI community, and this talk explores three essential aspects of LLM alignment: mechanism, synthetic data generation, and evaluation, which together form a continuous loop for improving model alignment. First, we examine the alignment mechanism to understand how post-training adjustments affect base LLMs, analyzing token distribution shifts to reveal the specific effects of alignment training on model behavior. Building on these findings, we introduce URIAL, a simple in-context alignment method that aids data curation and sheds light on training dynamics. Next, we present Magpie, an efficient method for generating high-quality, open-source alignment data. This scalable approach synthesizes extensive datasets by prompting aligned LLMs with minimal input and, combined with an automated data curation pipeline, enables the creation of one of the most effective open-source instruction-tuning datasets with minimal licensing restrictions. Finally, we focus on evaluation-centric alignment, exploring robust methods to assess the performance of aligned LLMs and to leverage feedback for continuous improvement. We demonstrate how challenging, real-world user tasks can serve as evaluation benchmarks, and we analyze existing metrics, including those on the ZeroEval leaderboard. The goal of this talk is to provide new insights into these three pillars of LLM research and to inspire future directions in the pursuit of better-aligned models.
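To make the Magpie idea concrete, the following is a minimal illustrative sketch, not the authors' actual pipeline: an aligned LLM is shown only the "pre-query" portion of its chat template, so its completion becomes a synthetic user instruction, which the same model then answers to form an (instruction, response) pair. The model choice, template string, and sampling settings below are assumptions made for illustration.

```python
# Minimal sketch of Magpie-style instruction synthesis (illustrative only).
# Assumptions: a Llama-3-style instruct model and its chat template; the exact
# templates, sampling settings, and filtering used by Magpie may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed model choice
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

# Step 1: give the aligned model only the pre-query part of its chat template
# (up to the start of the user turn) and let it autocomplete a plausible
# user instruction.
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
inputs = tok(pre_query, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=1.0)
instruction = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Step 2: feed the synthesized instruction back as a normal user turn to get
# the paired response, yielding one (instruction, response) training example.
chat = [{"role": "user", "content": instruction}]
prompt_ids = tok.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(prompt_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tok.decode(out[0, prompt_ids.shape[1]:], skip_special_tokens=True)

print({"instruction": instruction, "response": response})
```

In the full method, this generation step is run at scale and paired with the automated data curation pipeline mentioned above, which filters and organizes the raw synthesized pairs into the final instruction-tuning dataset.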
Bill Yuchen Lin is a Research Scientist at the Allen Institute for AI (Ai2) and an Affiliate Assistant Professor at the University of Washington (UW). His research centers on aligning large language models (LLMs), training AI agents, reasoning, and multimodal LLMs, with a focus on post-training, evaluation, reward modeling, and synthetic data generation. He also aims to deepen the core understanding of language models and to explore their limitations, drawing on experience in enhancing the safety, generalization, robustness, and efficiency of LLMs. Lin has received several honors, including the Best Paper Award at the LangRob Workshop at CoRL 2024, the Best Paper Award Runner-up at The Web Conference 2020, the Best Paper Award at TrustNLP 2021, and recognition as an AI Rising Star by Baidu Scholar. He serves as a Senior Area Chair for the Association for Computational Linguistics (ACL) and as an Area Chair for the International Conference on Learning Representations (ICLR). Lin earned his Ph.D. from the University of Southern California in 2022. He completed his bachelor's degree in the IEEE Honor Class at Shanghai Jiao Tong University (2014–2018), where he received the Best Thesis Award.