Translation Alignment using Suffix Trees

Martin Kay
Stanford University

Abstract

For several purposes, it is interesting to be able to align parts of a text with parts of its translation in another language. When the parts are at least as large as sentences, this turns out to be quite easy. Below the sentence level, the relevant parts are generally taken to be words or short phrases, and aligning them is fairly difficult. When word boundaries are unreliable, as in German, or nonexistent, as in Japanese, and when phrases are not always grammatical constituents, there are no generally accepted approaches. I will suggest a new one, based on an elegant and underappreciated data structure called a suffix tree, which has many applications beyond this one.