Probabilistic Part of Speech Tagging
Part-of-speech tagging is the task of assigning the correct part of speech (noun, verb, etc.) to each word in a text. We have worked on building probabilistic conditional log-linear models for tagging. Our best-performing part-of-speech tagger for English uses both preceding and following tag context, along with many lexical features. Its accuracy is state of the art for tagging the Penn Treebank. A model for Chinese has also been developed, based on the work of Toutanova et al. (2003).
A Java implementation of the Stanford Tagger is available online.
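To make the idea of a conditional log-linear tagging model more concrete, the following is a minimal sketch in Python of how such a model could score the tags for one word, given lexical features and the tags on both sides. It is not the Stanford Tagger's code: the tag set, the feature templates, the weights, and the function names (features, tag_distribution) are all hypothetical and hand-set for illustration. A real model would learn its weights from a treebank and would also need a search procedure over whole tag sequences, which is omitted here.

```python
import math

# Toy tag set (hypothetical; the Penn Treebank tag set is much larger).
TAGS = ["DT", "NN", "VB"]

# Hypothetical hand-set weights for (feature, tag) pairs.
# A trained model would estimate these from annotated data.
WEIGHTS = {
    ("word=the", "DT"): 3.0,
    ("prev=DT", "NN"): 2.0,
    ("next=NN", "DT"): 1.0,
    ("suffix=s", "NN"): 0.5,
    ("prev=NN", "VB"): 1.5,
    ("next=DT", "VB"): 1.0,
}


def features(word, prev_tag, next_tag):
    """Active context features: two lexical ones plus the tags on both sides."""
    return [
        f"word={word.lower()}",
        f"suffix={word[-1].lower()}",
        f"prev={prev_tag}",
        f"next={next_tag}",
    ]


def tag_distribution(word, prev_tag, next_tag):
    """Return P(tag | word, prev_tag, next_tag) under the log-linear model."""
    active = features(word, prev_tag, next_tag)
    # Linear score per tag: sum of the weights of the active features.
    scores = {t: sum(WEIGHTS.get((f, t), 0.0) for f in active) for t in TAGS}
    # Normalize with a softmax; subtracting the max keeps exp() stable.
    top = max(scores.values())
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


dist = tag_distribution("dog", prev_tag="DT", next_tag="VB")
best = max(dist, key=dist.get)
print(best, dist)
```

In this toy setting the preceding determiner tag is the only feature that fires for "dog", so the noun tag receives the highest probability. Adding further feature templates (prefixes, capitalization, wider tag windows) only changes what features returns; the normalization step stays the same.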