In recent years, ranking approaches to Natural Language Generation have become increasingly popular. They abandon the idea of generation as a deterministic decision-making process in favor of methods that combine overgeneration with ranking at some stage of processing.
In this talk, I will describe the use of instance-based ranking methods for surface realization in Natural Language Generation. Our approach to instance-based Natural Language Generation employs two basic components: a rule system that generates a number of realization candidates from a meaning representation and an instance-based ranker that scores the candidates according to their similarity to examples taken from a training corpus. The instance-based ranker uses information retrieval methods to rank output candidates.
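To make the ranking step concrete, here is a minimal sketch of an instance-based ranker. It rests on assumptions that the abstract does not state: candidates and corpus sentences are treated as bags of lowercased words, weighted with a smoothed TF-IDF scheme, and a candidate's score is its highest cosine similarity to any corpus instance. The function names (`tokens`, `build_idf`, `vectorize`, `cosine`, `rank`) and the example sentences are hypothetical, and the actual system may use different features and a different similarity measure.

```python
import math
from collections import Counter


def tokens(sentence):
    # Assumption: whitespace tokenization of lowercased text.
    return sentence.lower().split()


def build_idf(corpus):
    # Smoothed inverse document frequency over the training instances.
    n = len(corpus)
    df = Counter(t for s in corpus for t in set(tokens(s)))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    unseen = math.log(1 + n) + 1.0  # weight for words absent from the corpus
    return idf, unseen


def vectorize(sentence, idf, unseen):
    # Sparse TF-IDF vector represented as a dict from term to weight.
    tf = Counter(tokens(sentence))
    return {t: f * idf.get(t, unseen) for t, f in tf.items()}


def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(candidates, corpus):
    # Score each candidate by its closest corpus instance, best first.
    idf, unseen = build_idf(corpus)
    instances = [vectorize(s, idf, unseen) for s in corpus]
    scored = []
    for cand in candidates:
        vec = vectorize(cand, idf, unseen)
        score = max((cosine(vec, inst) for inst in instances), default=0.0)
        scored.append((score, cand))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical training instances and realization candidates.
corpus = [
    "john smith was named president of acme corp",
    "mary jones resigned as chairman of the board",
]
candidates = [
    "ann lee got the job",
    "ann lee was named president of acme corp",
]
for score, cand in rank(candidates, corpus):
    print(f"{score:.3f}  {cand}")
```

Scoring by the single nearest instance is only one possible design choice. Averaging over the k most similar instances would be an equally plausible variant.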
The approach is corpus-based in that it uses a treebank (a subset of the Penn Treebank II containing management succession texts), combined with manual semantic markup, to automatically produce a generation grammar. The same corpus also supplies the examples used by the instance-based ranker.
I will present an efficient search technique, based on the A* algorithm, for identifying the optimal candidate, and detail the annotation scheme and the grammar construction algorithm. Furthermore, I will examine the output of the generator and discuss some issues that are relevant to surface generation in general, such as input coverage and the trade-off between fluency and faithfulness. I will conclude with a discussion of how the presented methods have been applied to two other tasks: referring expression generation (i.e. content determination) and realization in a dialogue system.
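As a rough illustration of how an A*-style search can find the best candidate without enumerating all of them, consider the toy sketch below. It is not the search procedure from the talk. It assumes that a candidate is built left to right from a sequence of slots, that each slot offers alternative phrases with a non-negative cost (lower is better), and that an optional non-negative `join_cost` penalizes awkward adjacent choices. The names `astar_best_candidate`, `slots`, and `join_cost` are hypothetical, and a similarity-based ranker score would need its own admissible bound.

```python
import heapq


def astar_best_candidate(slots, join_cost=lambda prev, phrase: 0.0):
    """Return (sentence, cost) of the cheapest realization.

    slots: list of non-empty lists of (phrase, cost) alternatives.
    join_cost: non-negative penalty for placing `phrase` after `prev`
               (`prev` is None at the start of the sentence).
    """
    n = len(slots)
    # Admissible heuristic: cheapest completion of the remaining slots,
    # ignoring join costs (which can only add to the true cost).
    remaining = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        remaining[i] = remaining[i + 1] + min(cost for _, cost in slots[i])

    # Frontier entries: (f = g + h, g, index of next slot, phrases so far).
    frontier = [(remaining[0], 0.0, 0, ())]
    while frontier:
        _, g, i, phrases = heapq.heappop(frontier)
        if i == n:
            # The first complete candidate popped is optimal.
            return " ".join(phrases), g
        prev = phrases[-1] if phrases else None
        for phrase, cost in slots[i]:
            g_next = g + cost + join_cost(prev, phrase)
            f_next = g_next + remaining[i + 1]
            heapq.heappush(frontier, (f_next, g_next, i + 1, phrases + (phrase,)))
    return None


# Hypothetical slots and costs for a management succession sentence.
slots = [
    [("Smith", 1.0), ("Mr. Smith", 0.5)],
    [("was named", 0.2), ("became", 0.4)],
    [("president", 0.1)],
]


def join_cost(prev, phrase):
    # Hypothetical penalty for one particular combination.
    return 1.0 if (prev, phrase) == ("Mr. Smith", "was named") else 0.0


print(astar_best_candidate(slots, join_cost))
```

Because the heuristic never overestimates the remaining cost, partial candidates that cannot beat the current best are left unexpanded on the frontier, which is where the efficiency gain over exhaustive ranking comes from.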