The Stanford Natural Language Processing Group

This talk is part of the NLP Seminar Series.

What is wrong with my model? Detection and analysis of bugs in NLP models

Marco Tulio Ribeiro, Microsoft Research
Date: 11:00am - 12:00pm, Feb 27, 2020
Venue: Room 392, Gates Computer Science Building

Abstract

I will present two projects that deal with evaluation and analysis of NLP models beyond cross-validation accuracy. First, I will talk about Errudite (ACL 2019), a tool and set of principles for model-agnostic error analysis that is scalable and reproducible. Instead of manually inspecting a small set of examples, we propose systematically grouping instances with filtering queries and, where possible, applying counterfactual analysis.

Then, I will talk about ongoing work in which we borrow insights from software engineering (unit tests, etc.) to propose a new testing methodology for NLP models. Our tests reveal a variety of critical failures in multiple tasks and models, and we show via a user study that the methodology can be used to easily detect previously unknown bugs.

Bio

Marco Tulio Ribeiro is a Senior Researcher at Microsoft Research and an Affiliate Assistant Professor at the University of Washington. His work focuses on facilitating communication between humans and machine learning models, which includes interpretability, trust, debugging, feedback, and robustness. He received his PhD from the University of Washington.