Language conveys implications that can perpetuate misinformation or toxic behavior, yet large language models often fail to detect these nuances. In this talk, I address challenges in machine understanding of false or harmful language. First, I introduce Misinfo Reaction Frames, a structured formalism for encoding reader reactions to news headlines (e.g., would a reader share an article with a friend?), instantiated as a corpus of over 200k headline/annotated-dimension pairs. These predicted reaction frames go beyond binary classification of misinformation and directly model its impact. Next, I introduce Go Figure, a system for measuring how effectively generation evaluation metrics detect factuality errors. I define conditions an effective factuality metric should satisfy, and I begin to categorize the factuality errors and hallucinations common in text summarization. Finally, I show that large language models not only struggle to detect implied toxicity (as highlighted by prior work such as Social Bias Frames) but also perpetuate harmful social biases by generating hate speech. I explore how machine-generated hate speech can act as a defense against bad actors. Through these works, I argue for the importance of pragmatic approaches to machine understanding of false or harmful language. I conclude with next steps for bridging the gap between human and machine understanding of misinformation and toxicity.
Saadia Gabriel is a fifth-year PhD student at the University of Washington, where she is advised by Prof. Yejin Choi. She is particularly excited about natural language generation and social commonsense reasoning, as well as testing the robustness of machine learning algorithms. Her previous work includes evaluating factuality in generation and improving fairness and explainability in toxic language detection.