This talk is part of the NLP Seminar Series.

STACL: Simultaneous Translation with Integrated Anticipation and Controllable Latency

Liang Huang, Baidu Research / Oregon State University
Date: 11:00 am - 12:00 pm, Jan 10 2019
Venue: Room 104, Gates Computer Science Building

Abstract

Simultaneous translation, which translates sentences before they are finished, is useful in many scenarios but is notoriously difficult due to word-order differences. While the conventional seq-to-seq framework is only suitable for full-sentence translation, we propose a novel prefix-to-prefix framework for simultaneous translation that seamlessly integrates anticipation and translation. Within this framework, we present a very simple yet surprisingly effective “wait-k” model trained to generate the target sentence concurrently with the source sentence, but always k words behind, for any given k. We also formulate a new latency metric that addresses the deficiencies of previous ones. Experiments show our strategy achieves low latency and reasonable quality (compared to full-sentence translation) on 4 directions: Chinese<->English and German<->English.
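To make the "always k words behind" idea concrete, here is a minimal Python sketch of the read/write schedule that a wait-k policy implies. It is an illustration only, not the speaker's implementation. It assumes that target word t is emitted once min(k + t - 1, source length) source words have been read; the abstract does not spell out this formula. The function names wait_k_schedule and read_write_actions are hypothetical, and the sketch covers only the timing of reads and writes, not the trained translation model or the latency metric.

```python
def wait_k_schedule(source_len, target_len, k):
    """For each target position t = 1..target_len, return how many source
    words have been read when target word t is emitted.

    Assumed rule (not stated in the abstract): min(k + t - 1, source_len).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]


def read_write_actions(source_len, target_len, k):
    """Expand the schedule into an interleaved list of READ/WRITE actions."""
    actions = []
    num_read = 0
    for needed in wait_k_schedule(source_len, target_len, k):
        # Read source words until the decoder has seen enough of the prefix.
        while num_read < needed:
            actions.append("READ")
            num_read += 1
        # Emit one target word based on the source prefix read so far.
        actions.append("WRITE")
    return actions


if __name__ == "__main__":
    print(wait_k_schedule(5, 5, 2))
    print(read_write_actions(3, 3, 2))
```

Under this assumed rule, the decoder waits for k source words, then alternates one write per read until the source is exhausted, and finishes the remaining target words with the full sentence in view. A smaller k means lower latency but requires more anticipation of source words not yet seen.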

This technique has been successfully deployed to simultaneously translate Chinese speeches into English subtitles at the 2018 Baidu World Conference. It has also been covered by numerous media reports:

https://simultrans-demo.github.io/

Bio

Liang Huang is Principal Scientist at Baidu Research USA and Assistant Professor (on leave) at Oregon State University. He received his PhD from the University of Pennsylvania in 2008 (under the late Aravind Joshi) and BS from Shanghai Jiao Tong University in 2003. He has been a research scientist at Google, a research assistant professor at USC/ISI, an assistant professor at CUNY, a part-time research scientist at IBM, and an assistant professor at Oregon State University. He is a leading expert in natural language processing (NLP), where he is known for his work on fast algorithms and provable theory in parsing, machine translation, and structured prediction. Dr. Huang also works on applying the same linear-time algorithms he developed for parsing to computational structural biology. He received a Best Paper Award at ACL 2008, a Best Paper Honorable Mention at EMNLP 2016, several best paper nominations (ACL 2007, EMNLP 2008, ACL 2010, and SIGMOD 2018), two Google Faculty Research Awards (2010 and 2013), a Yahoo! Faculty Research Award (2015), and a University Teaching Prize at Penn (2005). The NLP group he runs at Oregon State University ranks 15th on csrankings.org. He also enjoys teaching algorithms and co-authored a best-selling textbook in China on algorithms for programming contests.