The
Hypergraph Parsing
Project Page



Overview

Parsing tasks can be seen as analysis of an associated hypergraph. Reachability corresponds to parsability and shortest paths correspond to best parses. This view allows one to adapt graph algorithms for use in parsing. In particular, Dijkstra's algorithm can be generalized to work on hypergraphs, and gives rise to a probabilistic chart parser. A-star search can also be generalized to this setting. The A-star parser gives a way of reducing the work needed to find a best parse which does not sacrifice the correctness or time bounds of the parser, and which gives speed-ups comparable to best-first parsing (which neither guarantees correctness nor preserves the worst-case cubic bounds). Additionally, these algorithms work on fully general PCFGs and can be shown to be correct over a wide range of parsing control strategies and grammar encodings.

We have used these parsers to investigate properties of wide-coverage PCFG grammars. With increased coverage generally comes increased ambiguitiy. For a grammar like the Penn treebank PCFG grammar, essentially any sequence can derive any category. Whether a rule can apply tends to depend on the teminal symbols which anchor that tag. Models based on this observation closely predict parser load. In particular, they explain the somewhat counterintuitive fact that the time to parse a sentence grows more rapidly with more efficient (faster) rule encodings and tree transformations.

Publications



Contact Information

Comments about the project page? Feel free to email Dan.