ReFT: Representation Finetuning for Language Models
April 5, 2024
ReFT is a novel approach to parameter-efficient, powerful, and interpretable fine-tuning of language models. It draws inspiration from our interpretability work on distributed alignment search (DAS). Instead of training any model weights, we train interventions that edit representations on-the-fly. We demonstrate that editing a very small number of representations is sufficient to match or approach state-of-the-art (SoTA) performance across a wide range of tasks.
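To make the core idea concrete, here is a minimal PyTorch sketch of a low-rank representation intervention in the spirit of LoReFT: the base model's weights stay frozen, and only a small set of intervention parameters is trained to edit a hidden representation h as h + Rᵀ(Wh + b − Rh). This is an illustrative sketch, not the pyreft library API; the class name `LowRankIntervention`, the dimensions, and the hook wiring are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class LowRankIntervention(nn.Module):
    """Sketch of a LoReFT-style edit: h + R^T (W h + b - R h),
    where R projects hidden states into a low-rank subspace (rank << dim)."""

    def __init__(self, hidden_dim: int, rank: int):
        super().__init__()
        # R: low-rank projection; ideally kept (semi-)orthonormal during training.
        self.R = nn.Parameter(torch.empty(rank, hidden_dim))
        nn.init.orthogonal_(self.R)
        # W, b: learned map producing the target values in the subspace.
        self.W = nn.Parameter(torch.zeros(rank, hidden_dim))
        self.b = nn.Parameter(torch.zeros(rank))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (..., hidden_dim)
        proj = h @ self.R.t()             # R h,      shape (..., rank)
        source = h @ self.W.t() + self.b  # W h + b,  shape (..., rank)
        # Replace the component of h in the subspace spanned by R
        # with the learned source values, leaving the rest of h untouched.
        return h + (source - proj) @ self.R


# Usage sketch (hypothetical model/layer handles): freeze the base model and
# attach the intervention to one layer's output via a forward hook, so only
# the intervention parameters are passed to the optimizer.
#
# for p in model.parameters():
#     p.requires_grad_(False)
# intervention = LowRankIntervention(hidden_dim=768, rank=4)
# model.transformer.h[8].register_forward_hook(
#     lambda module, inputs, output: intervention(output)
# )
```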