In the first half of the talk, I will discuss NLP work that I have done at Google Brain, describing a novel decoding method for conversational modeling (EMNLP 2017; Best Paper Award at the ICML Language Generation Workshop), a large-scale exploration of neural machine translation architectures (EMNLP 2017), and an open-source seq2seq framework in TensorFlow (tf-seq2seq; 4,450+ stars and 1,000+ forks on GitHub).
In Generating High-Quality and Informative Conversation Responses with Sequence to Sequence Models, we addressed a well-known shortcoming of sequence-to-sequence models for conversational modeling: their tendency to produce short, generic responses (e.g., “I don’t know”, which has high likelihood in virtually any context). Our approach generates responses segment by segment using stochastic beam search with negative sampling. We also proposed a computationally efficient form of self-attention that helped the model maintain coherence as we increased the length of the responses, avoiding common failure modes like “I live in the center of the sun in the center of the sun…” Our method produced responses that had lower perplexity and that human evaluators preferred over those of baseline sequence-to-sequence models with explicit length promotion.
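To make the decoding idea concrete, here is a minimal, self-contained sketch of stochastic beam search over a toy next-token distribution. It is not the paper's implementation: the function name, the `step_logprobs` interface, and the exponential-weight sampling scheme are all illustrative assumptions. The key contrast with standard beam search is that candidate continuations are sampled in proportion to their probability rather than kept deterministically top-k, which diversifies beams away from the single most likely (often generic) response.

```python
import math
import random

def stochastic_beam_search(step_logprobs, beam_size, steps, rng):
    """Toy stochastic beam search (illustrative, not the paper's code).

    step_logprobs(prefix) -> {token: log-prob of token given prefix}.
    Instead of keeping the top-k extensions, we sample k candidates
    without replacement, weighted by sequence probability.
    """
    beams = [([], 0.0)]  # each beam is (token_list, cumulative_logprob)
    for _ in range(steps):
        # Expand every beam by every possible next token.
        candidates = []
        for tokens, score in beams:
            for tok, lp in step_logprobs(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        # Sample beam_size candidates without replacement,
        # weighted by exp(score), i.e. by sequence probability.
        pool = [(cand, math.exp(cand[1])) for cand in candidates]
        chosen = []
        for _ in range(min(beam_size, len(pool))):
            total = sum(w for _, w in pool)
            r = rng.random() * total
            acc = 0.0
            for i, (cand, w) in enumerate(pool):
                acc += w
                if acc >= r:
                    chosen.append(cand)
                    pool.pop(i)
                    break
        beams = chosen
    # Return the highest-scoring sampled sequence.
    return max(beams, key=lambda b: b[1])
```

In the paper's segment-by-segment setting, one would rerun a search like this per segment, conditioning each segment on what has been generated so far; the toy version above only shows the token-level sampling step.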
In the second half of the talk, I will describe the non-confidential aspects of the moonshot that I am leading at Google Brain in the area of Machine Learning for Systems (alongside Jeff Dean and my long-time collaborator Azalia Mirhoseini). We developed a deep reinforcement learning method for efficiently partitioning large computational graphs across multiple hardware devices such as CPUs and GPUs, as described in A Hierarchical Model for Device Placement (ICLR 2018). To scale to large computational graphs with tens of thousands of operations, we take a hierarchical approach: we learn first to group the operations of the graph and then to place those groups onto devices. The two models are trained jointly using policy gradient, with the runtime of the generated placement as the reward signal. We were able to achieve runtime reductions of up to 60% with no humans in the loop.
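The training loop described above can be sketched with a heavily simplified REINFORCE example. This is an illustrative toy, not the hierarchical model from the paper: it assumes operations are already grouped, places each group on one of just two devices via independent Bernoulli choices, and simulates "runtime" as the load on the busier device. The function name, the moving-average baseline, and the cost model are all assumptions for the sketch; the real system measures actual execution time of the placed graph.

```python
import math
import random

def train_placement_policy(group_costs, episodes, lr, rng):
    """Toy REINFORCE loop for device placement (illustrative only).

    Each op group is assigned to device 0 or 1; the simulated runtime
    of a placement is the total cost on the busier device, and the
    reward is the negated runtime, as in the policy-gradient setup
    where reward = runtime of the generated placement (negated so
    lower runtime means higher reward).
    """
    n = len(group_costs)
    logits = [0.0] * n   # per-group preference for device 1
    baseline = None      # moving-average reward baseline (variance reduction)
    for _ in range(episodes):
        # Sample a placement from the stochastic policy.
        probs = [1.0 / (1.0 + math.exp(-l)) for l in logits]
        placement = [1 if rng.random() < p else 0 for p in probs]
        # Simulated runtime: makespan of the two devices.
        load = [0.0, 0.0]
        for cost, dev in zip(group_costs, placement):
            load[dev] += cost
        reward = -max(load)
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        # REINFORCE update: d log p(a) / d logit = a - p for a Bernoulli policy.
        for i in range(n):
            logits[i] += lr * advantage * (placement[i] - probs[i])
    # Greedy final placement from the learned policy.
    probs = [1.0 / (1.0 + math.exp(-l)) for l in logits]
    return [1 if p > 0.5 else 0 for p in probs]
```

The sketch keeps only the core signal flow of the method: sample a placement from a learned distribution, measure the runtime it induces, and push the policy toward placements that ran faster.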
Anna Goldie is a researcher on the Google Brain team, working on question answering, conversational modeling, machine translation, meta-learning, and most recently, machine learning for systems (i.e., how to use ML to optimize and automate the design of computer systems). She completed her Bachelor's in Computer Science, Bachelor's in Linguistics, and Master's in Computer Science at MIT, where she built a Mandarin-speaking dialogue system for her thesis. Anna likes learning languages and speaks fluent Mandarin, Japanese, and French, as well as some Korean, Italian, German, and Spanish.