ReFT: Representation Finetuning for Language Models

April 05, 2024
ReFT is a novel approach to parameter-efficient, powerful, and interpretable fine-tuning of language models. It draws inspiration from our interpretability work on distributed alignment search (DAS). Instead of training any model weights, we train interventions that edit representations on-the-fly. We show that editing a very small number of representations is enough to achieve or approach state-of-the-art (SoTA) performance across a wide range of tasks.
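To make the "edit representations on-the-fly" idea concrete, here is a minimal numpy sketch of a LoReFT-style intervention, Φ(h) = h + Rᵀ(Wh + b − Rh), which edits a hidden state only inside a low-rank subspace. The matrices here are random stand-ins for illustration; in ReFT they are the trained parameters, and the base model's weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4  # hidden size, rank of the intervention subspace

# R: projection with orthonormal rows (rows span the edited subspace)
R = np.linalg.qr(rng.standard_normal((d, r)))[0].T  # shape (r, d)
W = rng.standard_normal((r, d)) * 0.1               # learned linear map (random here)
b = rng.standard_normal(r) * 0.1                    # learned bias (random here)

def loreft(h):
    """Edit hidden state h only within the r-dim subspace spanned by R's rows."""
    return h + R.T @ (W @ h + b - R @ h)

h = rng.standard_normal(d)
h_edited = loreft(h)
```

Because the edit R.T @ (…) lies entirely in the row space of R, the component of h orthogonal to that subspace passes through unchanged, which is what makes the intervention both parameter-efficient and interpretable.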

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

May 09, 2023
Obtaining robust, human-interpretable explanations of large, general-purpose language models is an urgent goal for AI. Building on the theory of causal abstraction, we release a generic library encapsulating Boundless DAS, introduced in our paper, for finding representations that play a given causal role in LLMs with billions of parameters.
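The core operation behind DAS-style methods is the interchange intervention in a learned rotated basis: swap a few rotated coordinates of one run's hidden state with another's and rotate back. The numpy sketch below illustrates this; the rotation Q is a random orthogonal matrix here purely for illustration, whereas DAS learns it (and Boundless DAS also learns the subspace boundary).

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 8, 2  # hidden size, number of aligned (swapped) dimensions

# Random orthogonal rotation via QR; in DAS this matrix is learned.
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]

def interchange(h_base, h_source, k):
    """Swap the first k rotated coordinates of h_base with h_source's,
    then rotate back -- the interchange intervention at the heart of DAS."""
    z_base, z_source = Q @ h_base, Q @ h_source
    z_base[:k] = z_source[:k]
    return Q.T @ z_base

h_base = rng.standard_normal(d)
h_source = rng.standard_normal(d)
h_new = interchange(h_base, h_source, k)
```

If the swapped subspace really carries a causal variable, the intervened model should behave as if that variable took the source input's value, which is the alignment signal the search optimizes.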

Computer Science Ph.D. Statement of Purpose

June 30, 2022
It took me years to transition from an aerospace engineering student to an NLP Ph.D. student. I want to share my experience as much as I can, so people can build on it and make their own experience even better. For my SOP, I have to credit my good friend Nelson F. Liu: I wrote mine based on his! I applied twice, and I am also happy to share the version from my failed attempt. One takeaway for me: you need a big vision that is grounded in specific past experience.