Instance-based Natural Language Generation

Sebastian Varges
Computational Semantics Lab, CSLI, Stanford University

Abstract

In recent years, ranking approaches to Natural Language Generation have become increasingly popular. Rather than treating generation as a deterministic decision-making process, they combine overgeneration with ranking at some stage of processing.

In this talk, I will describe the use of instance-based ranking methods for surface realization in Natural Language Generation. Our approach to instance-based Natural Language Generation employs two basic components: a rule system that generates a number of realization candidates from a meaning representation and an instance-based ranker that scores the candidates according to their similarity to examples taken from a training corpus. The instance-based ranker uses information retrieval methods to rank output candidates.
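As a rough illustration of the ranking component only, the following minimal sketch scores candidate strings by their best TF-IDF cosine similarity to a small instance base. It is not the system described in the talk: the feature set (word unigrams and bigrams), the smoothing, the function names, and the example sentences are all assumptions made for this sketch, and the rule system that would produce the candidates is replaced by a hand-written list.

```python
import math
from collections import Counter


def features(text):
    """Lowercased word unigrams and bigrams, so word order affects the score."""
    words = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


def build_idf(instances):
    """Smoothed inverse document frequency over the instance base."""
    n = len(instances)
    df = Counter()
    for inst in instances:
        df.update(set(features(inst)))
    idf = {f: math.log((1 + n) / (1 + c)) + 1.0 for f, c in df.items()}
    unseen = math.log(1 + n) + 1.0  # weight for features absent from the corpus
    return idf, unseen


def vectorize(text, idf, unseen):
    """Sparse TF-IDF vector as a dict from feature to weight."""
    tf = Counter(features(text))
    return {f: c * idf.get(f, unseen) for f, c in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(candidates, instances):
    """Return (candidate, score) pairs, best first.

    A candidate's score is its similarity to the nearest instance.
    """
    idf, unseen = build_idf(instances)
    inst_vecs = [vectorize(inst, idf, unseen) for inst in instances]
    scored = []
    for cand in candidates:
        vec = vectorize(cand, idf, unseen)
        best = max((cosine(vec, iv) for iv in inst_vecs), default=0.0)
        scored.append((cand, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy data, loosely in the management succession domain.
instances = [
    "smith was named chairman of acme",
    "jones resigned as president of the company",
]
candidates = [
    "chairman of acme smith was named",
    "smith was named chairman of acme",
    "acme named smith chairman was of",
]

for text, score in rank_candidates(candidates, instances):
    print(f"{score:.3f}  {text}")
```

Including bigrams is a design choice of this sketch: with unigrams alone, all three toy candidates would receive the same score, since they contain the same words in different orders.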

The approach is corpus-based in that it uses a treebank (a subset of the Penn Treebank II containing management succession texts) in combination with manual semantic markup to automatically produce a generation grammar. The same corpus also supplies the examples used by the instance-based ranker.

I will present an efficient search technique, based on the A* algorithm, for identifying the optimal candidate, and detail the annotation scheme and the grammar construction algorithm. I will then examine the output of the generator and discuss some issues that are relevant to surface generation in general, for example input coverage and the trade-off between fluency and faithfulness. I will conclude with a discussion of how the presented methods have been applied to two other tasks: referring expression generation, i.e. content determination, and realization in a dialogue system.
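To make the idea of A* search over realization candidates concrete, here is a generic sketch. It does not reproduce the search space or scoring function of the talk. It assumes a hypothetical setting in which a sentence is built left to right from a list of slots, each with several alternative phrases, and in which each choice has a non-negative local cost plus a non-negative cost for following the previous choice. The names `astar_best_sequence`, `local_cost`, and `transition_cost`, and all cost values, are invented for illustration.

```python
import heapq
from itertools import count


def astar_best_sequence(slots, local_cost, transition_cost):
    """Find the cheapest way to pick one alternative per slot.

    slots: list of non-empty lists of alternatives.
    local_cost(i, alt) and transition_cost(prev, alt) must be >= 0.
    Returns (total_cost, chosen_alternatives).
    """
    n = len(slots)
    # h[i] is an optimistic estimate for filling slots i..n-1: the sum of
    # the cheapest local costs, ignoring transition costs. Because it never
    # overestimates, the first complete sequence popped is optimal.
    h = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        h[i] = h[i + 1] + min(local_cost(i, alt) for alt in slots[i])

    tie = count()  # tie-breaker so the heap never compares sequences
    frontier = [(h[0], next(tie), 0.0, ())]
    while frontier:
        _, _, g, seq = heapq.heappop(frontier)
        i = len(seq)
        if i == n:
            return g, list(seq)
        for alt in slots[i]:
            step = local_cost(i, alt)
            if seq:
                step += transition_cost(seq[-1], alt)
            g2 = g + step
            heapq.heappush(frontier, (g2 + h[i + 1], next(tie), g2, seq + (alt,)))
    return None


# Invented toy lattice and costs; lower cost means a better candidate.
slots = [
    ["the board", "directors"],
    ["named", "appointed"],
    ["him chairman", "chairman him"],
]
local = {
    "the board": 1.0, "directors": 0.5,
    "named": 1.0, "appointed": 1.5,
    "him chairman": 0.5, "chairman him": 2.0,
}
transitions = {("directors", "named"): 2.0, ("directors", "appointed"): 1.0}

result = astar_best_sequence(
    slots,
    lambda i, alt: local[alt],
    lambda prev, alt: transitions.get((prev, alt), 0.0),
)
print(result)
```

In this toy lattice, picking the locally cheapest phrase for each slot would start with "directors" and end up with a more expensive sentence overall, which is the kind of case where a best-first search with an admissible estimate helps. If candidates are scored by similarity rather than cost, one option is to convert scores to costs first, for example by negation plus an offset that keeps them non-negative.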