Stanford Log-linear Part-Of-Speech Tagger download

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other token). Computational applications generally use more fine-grained POS tags, like 'noun-plural'. This software is a Java implementation of the log-linear part-of-speech (POS) taggers described in:

Kristina Toutanova and Christopher D. Manning. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), Hong Kong.
Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003, pages 252-259.

The system requires Java 1.5 or later to be installed. 120 MB of memory is required to run a trained tagger (i.e., you'll need to give java an option like -mx120m). Training a tagger needs considerably more memory: the amount depends on the complexity of the model, but at least 1 GB is recommended. Two trained tagger models for English are included. The tagger can be retrained for other languages using POS-annotated training text.
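
As a rough illustration of the memory requirement, the following Java sketch reports the maximum heap the JVM was started with and compares it to the 120 MB figure quoted above. The class name HeapCheck and the constant REQUIRED_MB are hypothetical and are not part of the tagger distribution. Treat the result as approximate, since some JVMs report slightly less than the value passed on the command line.

```java
// Rough check that the JVM has enough heap to run a trained tagger.
// HeapCheck and REQUIRED_MB are illustrative names, not part of the tagger.
class HeapCheck {
    // 120 MB is the figure quoted above for running a trained tagger.
    static final long REQUIRED_MB = 120;

    // Returns true if maxHeapBytes is at least requiredMb megabytes.
    static boolean hasEnoughHeap(long maxHeapBytes, long requiredMb) {
        return maxHeapBytes >= requiredMb * 1024L * 1024L;
    }

    public static void main(String[] args) {
        // maxMemory() reflects the heap limit set with an option like -mx120m;
        // some JVMs report a value slightly below the requested limit.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxHeap / (1024L * 1024L)) + " MB");
        if (!hasEnoughHeap(maxHeap, REQUIRED_MB)) {
            System.out.println("Consider a larger heap, e.g. java -mx" + REQUIRED_MB + "m");
        }
    }
}
```

Running it with the same memory option you plan to give the tagger shows what heap limit the JVM actually applies.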

Part-of-speech name abbreviations: The two included taggers use the Penn Treebank tag set. Here are some links to documentation of the Penn Treebank English POS tag set: 1993 Computational Linguistics article in PDF, AMALGAM page, Aoife Cahill's list.
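
To give a feel for the abbreviations, here is a minimal Java sketch that maps a small sample of Penn Treebank tags to their descriptions. It is only an illustration: the class PennTagExamples and its describe method are hypothetical names, not part of the tagger's API, and the table covers only a handful of tags from the full set documented at the links above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A small, incomplete sample of Penn Treebank POS tags.
// PennTagExamples is an illustrative class, not part of the tagger.
class PennTagExamples {
    static final Map<String, String> TAGS = new LinkedHashMap<String, String>();
    static {
        TAGS.put("NN", "noun, singular or mass");
        TAGS.put("NNS", "noun, plural");
        TAGS.put("NNP", "proper noun, singular");
        TAGS.put("VB", "verb, base form");
        TAGS.put("VBD", "verb, past tense");
        TAGS.put("JJ", "adjective");
        TAGS.put("RB", "adverb");
        TAGS.put("DT", "determiner");
        TAGS.put("IN", "preposition or subordinating conjunction");
    }

    // Returns the description for a tag, or a placeholder if it is not in the sample.
    static String describe(String tag) {
        String description = TAGS.get(tag);
        return description != null ? description : "(not in this sample)";
    }

    public static void main(String[] args) {
        // Print every tag in the sample with its description.
        for (Map.Entry<String, String> entry : TAGS.entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

For example, the fine-grained 'noun-plural' category mentioned earlier corresponds to the tag NNS.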

The tagger is licensed under the GNU GPL. (Note that this is the full GPL, which allows its use for research purposes or other free software projects but does not allow its incorporation into any type of distributed proprietary software, even in part or in translation; see GPL FAQ.) Source is included. The package includes components for command-line invocation and a Java API.

The download is a 14 MB gzipped tar file (mainly consisting of the included model files). Once you unpack the tar file, you should have everything needed. This software only provides a command-line interface and an API. A simple script is included to invoke the tagger on a Unix system. On other systems, an appropriate java command can be given, as described in the included README.txt. Please send any questions, feedback, extensions, or bug fixes to: java-nlp-support@lists.stanford.edu.

Download Stanford Tagger version 2006-05-21 (requires JDK 1.5.0 or above)

Download OLD Stanford Tagger version 2006-01-20 (requires JDK 1.5.0 or above)

Download OLD Stanford Tagger version 1.0 (2004-08-16) [works in JDK 1.4; English only]