In this talk I will present work related to two trends in NLP: ever larger models and the use of self-attention in modern architectures. Increased model size can improve accuracy, but not all inputs benefit equally from higher-capacity models. Moreover, current models perform the same amount of computation regardless of whether an input is easy or hard to process. To address this, I will present the idea of adaptive-depth sequence models, which dynamically adjust the number of layers used to process an input example: the full model is applied when needed, and a shallower model otherwise. Next, I will talk about self-attention, which is widely regarded as an important factor in the strong performance of modern models. To better understand its role, I will present a simple type of convolution that can outperform self-attention on a range of NLP tasks. This enables more efficient models and suggests that self-attention is not as essential to good performance as commonly believed.
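As a rough illustration of the adaptive-depth idea, the sketch below shows an early-exit style forward pass: layers run one at a time, and computation stops as soon as an attached classifier is confident. This is a minimal toy with made-up random weights, not the mechanism from the talk; the layer count, threshold, and shapes are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the layers of a deep sequence model.
layers = [rng.normal(size=(8, 8)) for _ in range(6)]
classifier = rng.normal(size=(8, 4))  # shared output projection

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_depth_forward(h, threshold=0.9):
    """Run layers sequentially; exit early once the classifier's
    confidence exceeds the threshold (an early-exit sketch, not the
    exact method presented in the talk)."""
    for depth, W in enumerate(layers, start=1):
        h = np.tanh(h @ W)
        probs = softmax(h @ classifier)
        if probs.max() >= threshold:  # easy input: stop here
            return probs, depth
    return probs, depth               # hard input: use the full model

probs, depth_used = adaptive_depth_forward(rng.normal(size=8))
print(depth_used)  # number of layers actually executed, at most 6
```

Easy inputs exit after few layers while hard ones use the full stack, so average compute drops without changing the model's maximum capacity.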
Michael Auli is a research scientist at Facebook AI Research in Menlo Park. During his PhD at the University of Edinburgh, he worked on CCG parsing, advised by Adam Lopez and Philipp Koehn. While at Microsoft Research, he did some of the early work on neural machine translation and neural dialogue models. After this, he led the team that developed convolutional sequence-to-sequence models. Currently, Michael works on semi-supervised and self-supervised learning applied to natural language processing and speech recognition. His team ranked first in the WMT news translation task in the most competitive language direction for the past two years. http://michaelauli.github.io