This talk is part of the NLP Seminar Series.

Robust and accurate fine-tuning for large neural networks

Mitchell Wortsman, University of Washington
Date: April 6th, 2023, 11:00am - 12:00pm
Venue: Room 287, Gates Computer Science Building; Zoom (link hidden)


I'll discuss fine-tuning methods that improve model robustness and accuracy. These methods leverage the observation that fine-tuned models often appear to lie in a single low-error region. To improve robustness, we therefore interpolate the weights of the pre-trained and fine-tuned models. This achieves the best of both worlds: capturing the robustness of the pre-trained model and the in-distribution accuracy of the fine-tuned model. We then generalize this approach to interpolate the weights of multiple fine-tuned models. The conventional procedure for maximizing model performance is to try many different hyperparameters, then select the best model and discard the remainder. We propose an alternative to this procedure in the context of fine-tuning: we average the weights of multiple fine-tuned models, often producing a better model with no added inference cost.
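In weight-space terms, the two operations described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the talk's actual implementation: model parameters are represented here as plain dicts mapping parameter names to lists of floats, and the function names (`interpolate`, `average_soup`) are hypothetical.

```python
# Sketch of weight-space interpolation and averaging.
# Assumption: a "model" is a dict {param_name: [float, ...]} with
# all models sharing the same names and shapes (e.g. a frozen
# architecture fine-tuned from one pre-trained initialization).

def interpolate(pretrained, finetuned, alpha):
    """Blend weights: (1 - alpha) * pretrained + alpha * finetuned.

    alpha = 0 recovers the (robust) pre-trained model,
    alpha = 1 recovers the (accurate in-distribution) fine-tuned model.
    """
    return {
        name: [(1 - alpha) * p + alpha * f
               for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }

def average_soup(models):
    """Uniformly average the weights of several fine-tuned models.

    An alternative to keeping only the best hyperparameter run:
    combine all runs at no added inference cost.
    """
    n = len(models)
    return {
        name: [sum(m[name][i] for m in models) / n
               for i in range(len(models[0][name]))]
        for name in models[0]
    }
```

With real networks the same arithmetic would be applied elementwise to each parameter tensor (e.g. over a framework's state dict), but the averaging logic is unchanged.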


Mitchell is a fourth-year PhD student at the University of Washington, advised by Ludwig Schmidt and Ali Farhadi. His research interests include large models, transfer learning, and robustness.