Humans reasons about social dynamics when navigating everyday situations. Due to limited expressivity of existing NLP approaches, reasoning about the biased and harmful social dynamics in language remains a challenge, and can backfire against certain populations.
In the first part of the talk, I will analyze a failure case of NLP systems, namely, racial bias in automatic hate speech detection. We uncover severe racial skews in training corpora, and show that models trained on hate speech corpora acquire and propagate these racial biases. This results in tweets by self-identified African Americans being up to two times more likely to be labelled as offensive compared to others. We propose ways to reduce these biases, by making a tweet's dialect more explicit during the annotation process.
Then, I will introduce Social Bias Frames, a conceptual formalism that models the pragmatic frames in which people project social biases and stereotypes onto others to reason about biased or harmful implications in language. Using a new corpus of 150k structured annotations, we show that models can learn to reason about high-level offensiveness of statements, but struggle to explain why a statement might be harmful. I will conclude with future directions for better reasoning about biased social dynamics.
Maarten Sap is a 5th year PhD student at the University of Washington advised by Noah Smith and Yejin Choi. He is interested in natural language processing (NLP) for social understanding; specifically in understanding how NLP can help us understand human behavior, and how we can endow NLP systems with social intelligence, social commonsense, or theory of mind. He's interned on project Mosaic at AI2, working on social commonsense for artificial intelligence systems, and at Microsoft Research working on long-term memory and storytelling with Eric Horvitz.