Over the last decade, advances in natural language processing technologies have paved the way for the emergence of complex spoken language interfaces. A persistent and important problem in the development of these systems is their lack of robustness when confronted with understanding errors. The problem stems mostly from the unreliability of current speech recognition technology, and is present across all domains and interaction types. My research addresses this problem by: (1) endowing spoken language interfaces with better error awareness, (2) constructing and evaluating a rich repertoire of error recovery strategies, and (3) developing data-driven approaches for making error handling decisions.
In this talk, I focus on the first of these problems: error awareness. Traditionally, spoken dialog systems rely on recognition confidence scores and simple heuristics to guard against potential misunderstandings. While confidence scores provide an initial reliability assessment, ideally a system should also leverage information from subsequent user turns in the conversation to continuously update and improve the accuracy of its beliefs.
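The heuristic baseline mentioned above can be illustrated with a minimal sketch: a fixed pair of confidence thresholds maps each recognition hypothesis to an accept, confirm, or reject action. The threshold values and function name here are illustrative assumptions, not part of any particular system.

```python
def heuristic_action(confidence: float,
                     accept_threshold: float = 0.8,
                     reject_threshold: float = 0.3) -> str:
    """Map a speech recognition confidence score to a dialog action.

    The thresholds are illustrative; deployed systems tune them per
    domain, and this one-shot decision never revisits the hypothesis
    in light of later user turns -- the limitation discussed above.
    """
    if confidence >= accept_threshold:
        return "accept"    # treat the hypothesis as correct
    if confidence >= reject_threshold:
        return "confirm"   # ask the user to confirm the hypothesis
    return "reject"        # discard the hypothesis and re-prompt
```

Because the decision is made once, from the confidence score alone, any evidence in the user's next turn (e.g., a correction or a disconfirmation) is ignored.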
I describe a scalable machine learning solution for this belief updating problem. The proposed approach relies on a compressed, abstracted concept-level representation of beliefs and casts the belief updating problem as a multinomial regression task. Experimental results indicate that the constructed belief updating models significantly outperform typical heuristic rules used in current systems. Furthermore, a user study with a mixed-initiative spoken dialog system shows that the proposed approach leads to significant improvements in both the effectiveness and the efficiency of the interaction across a wide range of recognition error rates.
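The core idea of casting belief updating as multinomial regression can be sketched as follows: the belief over a small, abstracted set of concept hypotheses is updated after each user turn by scoring every hypothesis with a linear function of the prior belief and turn-level features, then normalizing with a softmax. The feature names and weights below are hypothetical stand-ins; in the work described, the model parameters are learned from annotated dialog data.

```python
import math

def belief_update(prior, features, weights):
    """One belief-updating step framed as multinomial regression.

    prior    -- belief over the compressed hypothesis set, e.g.
                [P(top hypothesis), P(everything else)]
    features -- evidence from the latest user turn, e.g.
                {"confirm_yes": 1.0} (hypothetical feature)
    weights  -- per-feature, per-hypothesis coefficients (made up
                here; learned from data in the actual approach)
    """
    scores = []
    for i, p in enumerate(prior):
        # log-prior term keeps the update anchored to the old belief
        s = weights["prior"] * math.log(max(p, 1e-6))
        for name, value in features.items():
            s += weights[name][i] * value
        scores.append(s)
    # softmax normalization (shifted by the max for stability)
    m = max(scores)
    exp_scores = [math.exp(s - m) for s in scores]
    z = sum(exp_scores)
    return [e / z for e in exp_scores]

# Illustrative use: after the user answers "yes" to a confirmation,
# belief in the top hypothesis should rise.
w = {"prior": 1.0, "confirm_yes": [2.0, -2.0]}
posterior = belief_update([0.6, 0.4], {"confirm_yes": 1.0}, w)
```

The compressed concept-level representation is what keeps this scalable: rather than tracking every recognition hypothesis, the belief is folded into a handful of abstracted classes, so one small regression model suffices per concept.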