The Stanford Natural Language Processing Group

This talk is part of the NLP Seminar Series.

Deep Sequence Models: Context Representation, Regularization, and Application to Language.

Adji Bousso Dieng, Columbia University
Date: 11:00am - 12:00pm, Apr 12 2018
Venue: Room 219, Gates Computer Science Building

Abstract

Recurrent Neural Networks (RNNs) are the most successful models for sequential data. They have achieved state-of-the-art results in many tasks including language modeling, image and text generation, speech recognition, and machine translation. Despite all these successes, RNNs still face some challenges: they fail to capture long-term dependencies (don't believe the myth that they do!) and they easily overfit.

The ability to capture long-term dependencies in sequential data depends on how context is represented. In theory, RNNs capture all the dependencies in a sequence through recurrence and parameter sharing. In practice, however, RNNs face optimization issues, and the assumptions made to counter these issues hinder their ability to capture long-term dependencies. The overfitting problem, on the other hand, stems from the strong dependence of the hidden units on each other.

I will talk about my research on context representation and regularization for RNNs. First, I will make the case that in the context of language, topic models are very effective at representing context and can be used jointly with RNNs to facilitate learning and capture long-term dependencies. Second, I will discuss NOISIN, our newly proposed method for regularizing RNNs. NOISIN injects unbiased noise into the hidden units of RNNs to reduce co-adaptation. It significantly improves the generalization capabilities of existing RNN-based models. For example, it improves RNNs with dropout by as much as 12.2% on the Penn Treebank and 9.4% on the WikiText-2 dataset.
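The idea of unbiased noise injection can be illustrated with a small sketch. This is not the speaker's implementation: the abstract only states that the noise is unbiased and applied to the hidden units, so the multiplicative Gaussian noise with mean 1, the vanilla tanh cell, and the names `rnn_step`, `inject_unbiased_noise`, and `run_noisy_rnn` are all assumptions chosen for illustration.

```python
import math
import random


def rnn_step(x, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h = tanh(W_x x + W_h h_prev + b)."""
    h = []
    for i in range(len(b)):
        pre = b[i]
        pre += sum(W_x[i][j] * x[j] for j in range(len(x)))
        pre += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(pre))
    return h


def inject_unbiased_noise(h, sigma, rng):
    """Scale each hidden unit by Gaussian noise with mean 1 (an assumed
    noise choice), so the expected noisy state equals the clean state."""
    return [h_i * rng.gauss(1.0, sigma) for h_i in h]


def run_noisy_rnn(xs, h0, W_x, W_h, b, sigma, rng):
    """Unroll the RNN over xs, feeding the noisy state into the next step."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(x, h, W_x, W_h, b)
        h = inject_unbiased_noise(h, sigma, rng)
        states.append(h)
    return states


if __name__ == "__main__":
    rng = random.Random(0)
    W_x = [[0.5], [-0.3]]
    W_h = [[0.1, 0.2], [0.0, 0.4]]
    b = [0.0, 0.1]
    xs = [[1.0], [0.5], [-1.0]]
    for state in run_noisy_rnn(xs, [0.0, 0.0], W_x, W_h, b, 0.2, rng):
        print(state)
```

In a sketch like this, noise would be injected only during training; setting `sigma` to 0 recovers the deterministic RNN for evaluation.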

Bio

Adji Bousso Dieng is a PhD student at Columbia University, where she works with David Blei and John Paisley. Her work at Columbia combines probabilistic graphical modeling and deep learning to design better sequence models. She develops these models within the framework of variational inference, which enables efficient and scalable learning. She hopes her research can be applied to many real-world problems, particularly natural language understanding.

Before joining Columbia, she worked as a Junior Professional Associate at the World Bank. She completed her undergraduate training in France within the Grandes Ecoles system, attending Lycee Henri IV and Telecom ParisTech. She holds a Diplome d'Ingenieur from Telecom ParisTech and spent the third year of its curriculum at Cornell University, where she earned a Master's in Statistics.