Determining when a machine learning model is “good enough” is challenging since held-out accuracy metrics significantly overestimate real-world performance. In this talk, I will describe automated techniques to detect bugs that can occur naturally when a model is deployed. I will start by identifying “semantically equivalent” replacement rules for a model that should not change the meaning of the input but lead to a change in the model’s predictions. Then I will present our work on evaluating the consistency behavior of the model by exploring performance on new instances that are implied by the model’s predictions. I will also describe a method to understand and debug models by adversarially modifying the training data to change the model’s predictions. The talk will include applications of these ideas on a number of NLP tasks, such as reading comprehension, visual QA, and knowledge graph completion.
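To make the first idea concrete, a "semantically equivalent" replacement rule can be sketched as a simple string rewrite: applying it to an input should not change the model's prediction, so any prediction flip signals a bug. The rule, the mock model, and all names below are illustrative stand-ins, not the actual method or models from the talk.

```python
# Toy sketch of a semantically equivalent replacement rule:
# rewriting "What's" -> "What is" should preserve meaning, so a
# prediction change under the rewrite is a candidate bug.

def apply_rule(text, pattern, replacement):
    """Apply a meaning-preserving rewrite rule to the input."""
    return text.replace(pattern, replacement)

def mock_model(question):
    """Stand-in QA model; deliberately brittle to surface form."""
    return "Paris" if question.startswith("What's") else "unknown"

def find_bugs(questions, pattern, replacement, model):
    """Return inputs whose prediction flips under the rewrite."""
    bugs = []
    for q in questions:
        q2 = apply_rule(q, pattern, replacement)
        if q2 != q and model(q) != model(q2):
            bugs.append((q, q2, model(q), model(q2)))
    return bugs

questions = ["What's the capital of France?", "Who wrote Hamlet?"]
bugs = find_bugs(questions, "What's", "What is", mock_model)
for orig, rewritten, p1, p2 in bugs:
    print(f"{orig!r} -> {p1!r}, but {rewritten!r} -> {p2!r}")
```

In practice such rules would be mined automatically and applied to a real trained model rather than hand-written against a toy one, but the consistency check itself has this shape.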
Sameer Singh is an Assistant Professor of Computer Science at the University of California, Irvine. He works on large-scale and interpretable machine learning applied to natural language processing and information extraction. Sameer was a postdoctoral researcher at the University of Washington and received his PhD from the University of Massachusetts, Amherst, during which he also worked at Microsoft Research, Google Research, and Yahoo! Labs. His group has received funding from the Allen Institute for AI, NSF, DARPA, Adobe Research, and FICO, and Sameer was selected as a DARPA Riser in 2015. He has published extensively at top-tier machine learning and natural language processing conferences. (http://sameersingh.org/)