Pre-trained word embeddings have been critical to the success of deep learning for NLP, since they allow models to take advantage of the nearly unlimited amount of unannotated text on the web. In the last few years, conditional language models have been used to generate pre-trained contextual representations, which are much richer and more powerful than plain embeddings. This talk describes BERT (Bidirectional Encoder Representations from Transformers), a new pre-training technique that generates deeply bidirectional pre-trained language representations. BERT obtains state-of-the-art results on the Stanford Question Answering Dataset (SQuAD), MultiNLI, the Stanford Sentiment Treebank, and many other tasks.
Jacob Devlin is a Staff Research Scientist at Google. His primary research interest is developing fast, powerful, and scalable deep learning models for information retrieval, question answering, and other language understanding tasks. From 2014 to 2017, he worked as a Principal Research Scientist at Microsoft Research, where he led Microsoft Translator's transition from phrase-based translation to neural machine translation (NMT). Mr. Devlin was the recipient of the ACL 2014 Best Long Paper award and the NAACL 2012 Best Short Paper award. He received his Master's in Computer Science from the University of Maryland in 2009, advised by Dr. Bonnie Dorr.