Sequence-to-sequence neural networks, especially LSTM-based RNNs, have been applied to various NLP tasks with great success. But they suffer from a number of drawbacks, including slow training, difficulty with long sequences, and implementation issues. Tensor2Tensor Transformers are a set of new architectures that combine research on neural attention mechanisms (alignments) with fast, parallel autoregressive training. We will explain these notions and a few basic tensor2tensor architectures, and show how they address the slow-training problem and make it possible to handle long sequences without deterioration. We will also introduce the open-source tensor2tensor library, which aims to alleviate some of the engineering problems.
Lukasz Kaiser is a senior research scientist at Google Brain, where he works on machine learning with deep neural networks. Earlier, he worked on semantic parsing at Google, and before that he was a tenured researcher at University Paris 7, working on logic and automata theory. Lukasz received his PhD from RWTH Aachen University in 2008, having earlier earned two master's degrees (in mathematics and computer science) from the University of Wroclaw, Poland.