This talk is part of the NLP Seminar Series.

The shape of AI accountability and its contours in copyright

Johnny Wei, USC
Date: 11:00am - 12:00 noon PT, Thursday, Jan 29
Venue: Room 287, Gates Computer Science Building
Zoom: https://stanford.zoom.us/j/93941842999?pwd=vH7x9wB9bfuIaV1HnQthRmqA8BKTGh.1

Abstract

How do we establish accountability for AI? While the shape of AI accountability at large remains amorphous, its contours are revealed in the ongoing copyright challenge to AI. In this talk, I’ll outline a legal theory of change and situate two works in this context. The first work focuses on the legal setup, theorizing how the judiciary can establish copyright accountability for LLMs by interrogating LLM training decisions and examining how they affect the model’s memorization. Further progress in copyright then depends on deriving best practices for auditing and mitigating undesirable memorization. The second work focuses on the scientific follow-up and our release of Hubble, a model suite to advance the study of LLM memorization. Hubble models are trained on English text, with controlled insertions designed to emulate key memorization risks. I’ll summarize the main findings and conclude with the potential of controlled insertions for safety-critical concerns beyond copyright.

Bio

Johnny Tian-Zheng Wei is a final-year PhD student at USC whose interdisciplinary research spans machine learning, statistics, and law. He has published in a range of conferences including AIES, FAccT, and ACL, and recently led the open-source release of Hubble, which was supported by NVIDIA through the NAIRR pilot program. He also co-organized the First Workshop on LLM Memorization at ACL 2025.