Today’s neural NLP models achieve high accuracy on in-distribution data and are being widely deployed in production systems. This talk will discuss attacks on such models that not only expose worrisome security and privacy vulnerabilities, but also provide new perspectives on how and why the models work. Concretely, I will show how realistic adversaries can extract secret training data, steal model weights, and manipulate test predictions, all using only black-box access to models at either training or test time. These attacks reveal different insights, including how NLP models rely on dataset biases and spurious correlations, and how their training dynamics affect the memorization of individual examples. Finally, I will discuss defenses against these vulnerabilities and suggest practical takeaways for developing secure NLP systems.
Eric Wallace is a second-year PhD student at UC Berkeley, advised by Dan Klein and Dawn Song. His research interests are in making NLP models more secure, private, and robust. Eric's work received the Best Demo Award at EMNLP 2019.