Humans and machines are prone to picking up on frequent superficial patterns that lead to skewed generalizations. I'll present two case studies showing how implicit biases in our minds and in our data may lead to undesired outcomes. In the first part of my talk, I'll present a framework for examining how people are portrayed in media articles. Our analysis of media coverage of the #MeToo movement addresses questions like “Whom does the media portray as sympathetic?” and “Whom does the media portray as powerful?” We demonstrate that although this movement has empowered women by encouraging them to share their stories, this empowerment does not necessarily carry over into online media coverage of these events. In the second part, I'll focus on text classification. Off-the-shelf text classifiers are biased toward learning frequent spurious correlations in the training data that may act as confounds in the actual classification task. A major challenge in fixing such systems is to discover features that are not merely correlated with the signals in the training data, but are true indicators of those signals, and therefore generalize well. I'll present a methodology to adversarially disentangle true indicators from latent confounds. I'll conclude with directions for future work.
Yulia Tsvetkov is an assistant professor in the Language Technologies Institute at Carnegie Mellon University. Her group's research projects currently focus on language generation, multilinguality, and NLP for social good. Prior to joining LTI, Yulia was a postdoc in the Department of Computer Science at Stanford University; she received her PhD from CMU.