The Stanford Natural Language Processing Group

This talk is part of the NLP Seminar Series.

Building linguistically informed models for low-resource settings

Nanyun Peng, Information Sciences Institute, University of Southern California
Date: 11:00 am - 12:00 pm, Nov 15 2018
Venue: Room 392, Gates Computer Science Building

Abstract

Recent advances in data-driven approaches, such as deep neural networks, have demonstrated strong performance by using vast amounts of labeled data. However, for highly specialized domains (e.g., the biomedical domain) or lesser-explored languages (e.g., Indonesian), it is often impractical to assume that abundant annotated data is available, which limits the feasibility of these data-hungry models. In this talk, I will present a suite of models that overcome this barrier by leveraging linguistic knowledge. Specifically, I will demonstrate how linguistically informed graph LSTMs can model biomedical relation extraction using only a few thousand training examples. Then, I will discuss how language properties, such as word order, play a role in cross-lingual transfer learning, helping a dependency parser work on distant target languages with training data only in the source language.

Bio

Nanyun (Violet) Peng is a Research Assistant Professor in the computer science department and a Computer Scientist at the Information Sciences Institute, University of Southern California. She received her Ph.D. from Johns Hopkins University. Her research focuses on low-resource information extraction, creative language generation, and phonology/morphology modeling. Nanyun is the 2016 Fred Jelinek Fellow. She has a background in computational linguistics and economics and holds BAs in both.