The Stanford Natural Language Processing Group

This talk is part of the NLP Seminar Series.

Rethinking Benchmarking in AI

Douwe Kiela, Facebook AI Research
Date: 10:00am - 11:00am PT, Nov 5 2020
Venue: Zoom (link hidden)

Abstract

The current benchmarking paradigm in AI has many issues: benchmarks saturate quickly, are susceptible to overfitting, contain exploitable annotator artifacts, have unclear or imperfect evaluation metrics, and do not measure what we really care about. I will talk about my work on rethinking the way we do benchmarking in AI, specifically in natural language processing, covering the Adversarial NLI and Hateful Memes datasets, as well as the recently launched Dynabench platform.

Bio

Douwe Kiela is a Research Scientist at Facebook AI Research, working on natural language processing and multimodal reasoning and understanding. His work has mainly focused on representation learning, grounded language learning, and multi-agent communication. Recently, he has become interested in improving the way we evaluate AI systems.