Partially Observable Markov Decision Processes for Spoken Dialogue Systems

Steve Young
University of Cambridge

Abstract

Current spoken dialogue systems (SDS) typically employ hand-crafted decision networks or flow-charts to determine what action to take at each point in a conversation. The result is a system which is fragile to speech recognition errors and which is unable to adapt and learn from experience. There are two key features needed to build robust and adaptable spoken dialogue systems. Firstly, the system must have an explicit mechanism for modelling uncertainty and, secondly, the system must have an objective measure of dialogue success which can be used as the basis of policy optimisation. The framework of Partially Observable Markov Decision Processes (POMDPs) provides both of these. The talk will begin with a simple example to illustrate the underlying principles and potential advantages of the POMDP approach. POMDPs provide a Bayesian model of belief and a principled mathematical framework for modelling uncertainty. They can be trained from real data and they yield policies which can be optimised using reinforcement learning. However, exact belief update and policy optimisation algorithms are intractable and as a result there are many issues inherent in scaling POMDP-based systems to handle real-world tasks. Therefore, the main part of the talk will focus on some of the recent work conducted in the Dialogue Systems Group at Cambridge to develop ways of scaling POMDPs to allow them to be used in practical real-world dialogue systems. Two specific systems will be outlined and performance results from user trials will be presented. The talk will conclude by summarising the main issues which need further work.
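The Bayesian belief monitoring the abstract refers to can be sketched in a few lines. The update is b'(s') ∝ P(o|s') Σ_s P(s'|s) b(s); the two-state dialogue domain and all probabilities below are invented for illustration and are not from any system described in the talk.

```python
# Exact POMDP belief update over a toy two-state user goal,
# following b'(s') ∝ P(o|s') * sum_s P(s'|s) b(s).
# States: 0 = user wants restaurant info, 1 = user wants hotel info.
states = ["restaurant", "hotel"]

# Transition model T[i][j] = P(s_j | s_i): the user's goal
# mostly persists from one dialogue turn to the next (assumed values).
T = [[0.9, 0.1],
     [0.1, 0.9]]

# Observation model O[j][o] = P(o | s_j): a noisy ASR/understanding
# hypothesis of the goal (assumed error rates).
O = [[0.8, 0.2],
     [0.3, 0.7]]

def belief_update(b, obs):
    """One exact Bayesian belief update after observing `obs`."""
    n = len(b)
    # Predict: propagate the belief through the transition model.
    predicted = [sum(b[i] * T[i][j] for i in range(n)) for j in range(n)]
    # Correct: weight by the likelihood of the observation.
    unnorm = [O[j][obs] * predicted[j] for j in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

b = [0.5, 0.5]              # uniform prior over user goals
b = belief_update(b, 0)     # noisy evidence for "restaurant"
print(b)                    # belief shifts towards "restaurant" (~0.73)
```

Even this tiny example shows why exact updates do not scale: a realistic dialogue state factorises into many slots, so the state space, and hence the summation above, grows exponentially, which motivates the approximate methods the talk describes.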