Recent work has shown that multi-task instruction tuning significantly enhances the usability and robustness of pre-trained language models. However, the performance trade-offs associated with different decisions made during the instruction-tuning process remain poorly understood. These decisions include the scale and diversity of the instruction-tuning data, the variation in task representations, the use of specialized datasets, and ultimately, the fine-tuning objectives and hyperparameters. We present project OPT-IML, a study that characterizes the effect of supervised instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a comprehensive benchmark for Instruction Meta-Learning (IML), consolidating 2000 NLP tasks derived from 8 existing NLP benchmarks. We design an evaluation framework to measure three types of model generalization: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first share insights about instruction-tuning decisions as applied to OPT-30B and then use these insights to train OPT-IML 30B and 175B, the instructable counterparts of OPT. OPT-IML demonstrates all three generalization abilities at both scales across diverse tasks and input formats, proving highly competitive with existing models fine-tuned on each specific benchmark. The OPT-IML models have been publicly released, and we are in the process of releasing our evaluation framework. We conclude the talk with a discussion that contrasts task-oriented instruction-tuned language models with the latest conversational ones.
Victoria Lin is a Research Scientist at Meta’s Foundational AI Research (FAIR) team, while also pursuing her doctoral studies at the University of Washington. Her primary research interest centers on the development of general intelligent systems that efficiently process massive amounts of information and assist humans in various knowledge-intensive tasks. Her latest work focuses on large-scale language modeling and instruction tuning. Prior to Meta, she was a Research Scientist at Salesforce Research, where her work spanned language-to-code generation, natural language interfaces, and question answering over structured data.