Information retrieval has been a dominant force in the last 20 years of computing. Within the domain of natural language processing (NLP), tasks such as summarization and question answering have been framed as requiring deep understanding of information, but historically most approaches to these tasks have relied heavily on retrieval, surfacing information that already exists in a corpus. In this talk, I argue that the field of NLP is seeing a shift towards information synthesis: methods that can combine existing pieces of information to produce new conclusions. Systems built with such methods promise to produce greater insights for their users than pure retrievers, but many challenges remain to be addressed. I will first discuss our work on models that can take combinations of premise statements and deduce conclusions from them to construct natural language “proofs” of hypotheses, paving the way for explainable textual reasoning. I will then describe some shortcomings of doing this kind of reasoning with large language models and suggest how explanations can help calibrate the inferences they make. Finally, I will discuss the recent impact of GPT-3 on text summarization, showing how the incredible new synthesis capabilities of these models will need to be fleshed out and benchmarked in the coming years.
Greg Durrett is an assistant professor of Computer Science at UT Austin. His current research focuses on making natural language processing systems more interpretable, controllable, and generalizable, spanning application domains including question answering, textual reasoning, summarization, and information extraction. His work is funded by a 2022 NSF CAREER award and other grants from agencies including the NSF, Open Philanthropy, DARPA, Salesforce, and Amazon. He completed his Ph.D. at UC Berkeley in 2016, where he was advised by Dan Klein, and from 2016 to 2017, he was a research scientist at Semantic Machines.