This talk is part of the NLP Seminar Series.

Controlling and Editing Knowledge in Large Language Models

Peter Hase, University of North Carolina at Chapel Hill
Date: Feb 15, 2024, 11:00am - 12:00pm
Venue: Room 287, Gates Computer Science Building

Abstract

The success of large-scale pretraining has equipped LLMs with extensive world knowledge, but developing methods for controlling and editing that knowledge remains challenging. In this talk, I discuss recent work on model editing and model generalization that helps us understand how we can steer and control the knowledge (and beliefs) expressed by LLMs. Specifically, I (1) frame model editing as propagating factual information through model belief graphs and review methods in this area, (2) show how model editing can be used to unlearn sensitive information in LLMs, after formalizing an adversarial threat model for LLM unlearning, and (3) argue that easy-to-hard generalization is an important problem for understanding how we can effectively supervise models in domains where it would be useful for them to be knowledgeable (e.g., STEM domains) but where labeling data is especially difficult. Collectively, these problems exemplify the broader challenge of understanding and controlling the knowledge and beliefs of LLMs.

Bio

Peter Hase is a fifth-year PhD student in the UNC-NLP lab at the University of North Carolina at Chapel Hill. His research focuses on interpretable ML and NLP, and he is particularly interested in techniques for explaining model behavior and improving model safety. His work at UNC is supported by a Google PhD Fellowship. He has previously worked at AI2, FAIR, and Google Research.