Much of human knowledge is available on the internet, but natural language is notoriously difficult for computers to interpret. In this talk I will present some recent work on extracting structured knowledge from text, with an eye toward real-time information found on social media. I argue that we cannot rely exclusively on traditional methods that learn from small, hand-annotated datasets if we hope to extract a broad range of relations and events from diverse text genres at scale. As an alternative to human labeling, I will describe an approach that reasons about latent variables to learn robust information extraction models from large, opportunistically gathered datasets. I will further show how edits to a knowledge base can be leveraged as distant supervision for learning to extract events. This approach can be used to accurately recommend edits to Wikipedia before new facts have been added by human editors. Finally, I will present an approach to open-domain forecasting of uncertain events (elections, awards, etc.) by extracting and aggregating users’ predictions on the web.
Alan Ritter is an assistant professor in computer science at Ohio State University. His research interests include natural language processing, social media analysis, and machine learning. Alan completed his PhD in Computer Science at the University of Washington and was a postdoctoral fellow in the Machine Learning Department at Carnegie Mellon. He has received an NDSEG fellowship, a best student paper award at IUI, and an NSF CRII, and has served as an area chair for ACL, EMNLP, and NAACL.