Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions, something which current NLP systems still largely struggle to do. I will talk about GPT-3, an autoregressive language model with 175 billion parameters, which demonstrates how scaling up language models can greatly improve task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. GPT-3 can be applied to tasks without any gradient updates or fine-tuning, with few-shot demonstrations specified purely via text interaction with the model. I will give an overview of what GPT-3 is and how it works, talk through the capabilities we see from such a system and how they enable a new way of interacting with language models, and additionally focus on the limitations and broader questions these interactions raise.
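To make the "few-shot demonstrations specified purely via text" idea concrete, here is a minimal sketch (function and task names are illustrative, not from the talk) of how demonstrations and a new query are concatenated into a single prompt that a GPT-3-style model would then complete, with no gradient updates involved:

```python
def build_few_shot_prompt(instruction, demonstrations, query):
    """Format an instruction, a few (input, output) demonstrations,
    and a new query as one text prompt for an autoregressive LM.
    The model would be asked to continue the text after the final
    'Output:' marker; the model call itself is omitted here."""
    lines = [instruction]
    for x, y in demonstrations:
        lines.append(f"Input: {x}\nOutput: {y}")
    # The query is formatted like the demonstrations, but with the
    # output left blank for the model to fill in.
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

# Hypothetical translation task used purely for illustration.
demos = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", demos, "cat")
print(prompt)
```

The key point is that the task specification lives entirely in the prompt text: changing the instruction or the demonstrations changes the task, while the model's weights stay fixed.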
Melanie is a Computer Science PhD student at Columbia University working on Natural Language Processing with Professor Kathleen McKeown. Prior to starting her PhD, she was a Research Engineer first at Apple and then at OpenAI. At OpenAI, she worked on GPT-3 and subsequently the API built on GPT-3. Melanie's research focuses on natural language generation and understanding.