This talk is part of the NLP Seminar Series.

Towards Pretraining Language Models on Extremely Long Sequences

Mike Lewis, Meta
Date & Time: 11:00am - 12:00pm, November 2nd, 2023
Venue: Room 287, Gates Computer Science Building

Abstract

Pretraining large language models (LLMs) to reason over extremely long contexts would unlock many new applications, but poses significant challenges for both modelling and data. First, I will describe the MegaByte architecture for efficient long-sequence modelling. MegaByte is a multiscale decoder that scales to sequences of over 1 million bytes and shows strong results on text, image and audio modalities. A second challenge is that LLM training data consists predominantly of short web documents, limiting how well pretraining can teach models to reason over long sequences. I will then describe the recent In Context Pretraining method, which constructs pretraining sequences by concatenating many short, related documents, thereby training LLMs to read and reason across document boundaries. In Context Pretraining yields large improvements on many downstream tasks.
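To illustrate the data-construction idea behind In Context Pretraining, here is a minimal Python sketch, not the authors' implementation: a toy bag-of-words similarity stands in for the real retriever used to find related documents, and the names docs, CONTEXT_LEN, SEP and build_sequences are hypothetical. Each pretraining sequence is packed by greedily chaining a document with its most similar unused neighbour, so that adjacent documents within a sequence are related.

from collections import Counter
import math

SEP = "<|doc|>"      # hypothetical document-separator token
CONTEXT_LEN = 30     # toy target sequence length, in whitespace tokens

def bow(text):
    # Bag-of-words counts as a crude stand-in for a document embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_sequences(docs, context_len=CONTEXT_LEN):
    # Greedily chain each document with its most similar unused neighbour,
    # then pack the chain into fixed-length pretraining sequences.
    vecs = [bow(d) for d in docs]
    unused = set(range(len(docs)))
    cur = unused.pop()               # start from an arbitrary document
    sequences, current, cur_len = [], [], 0
    while True:
        n_tokens = len(docs[cur].split())
        if current and cur_len + n_tokens > context_len:
            sequences.append(f" {SEP} ".join(current))
            current, cur_len = [], 0
        current.append(docs[cur])
        cur_len += n_tokens
        if not unused:
            break
        # Next document = most similar remaining one, so neighbours are related.
        nxt = max(unused, key=lambda j: cosine(vecs[cur], vecs[j]))
        unused.remove(nxt)
        cur = nxt
    if current:
        sequences.append(f" {SEP} ".join(current))
    return sequences

docs = [
    "the eiffel tower is in paris",
    "paris is the capital of france",
    "attention layers compare queries and keys",
    "transformers are built from attention layers",
]
for seq in build_sequences(docs):
    print(seq)

The key property is that documents sharing content end up in the same context window, which is what lets the model learn to read and reason across document boundaries; a real pipeline would use dense retrieval over a full corpus and a more careful document-ordering algorithm rather than this greedy toy.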

Bio

Mike Lewis is a research scientist at Meta working on the next generation of open foundation models. His research interests include pretraining language models (e.g. BART and RoBERTa), retrieval augmentation (e.g. kNN-LM and RAG) and negotiation dialogue agents (such as the Cicero Diplomacy model). Previously, he was a postdoc at the University of Washington (working with Luke Zettlemoyer), and he holds a PhD from the University of Edinburgh (advised by Mark Steedman). He received a Best Paper Award at EMNLP 2016, the Best Resource Paper Award at ACL 2017, and a Best Paper Honourable Mention at ACL 2018. His work has been extensively covered in the media, with varying levels of accuracy.