The Stanford NLP Group makes parts of our Natural Language Processing software available to everyone. These are variously statistical NLP, deep learning NLP, and rule-based NLP tools for major computational linguistics problems, which can be incorporated into applications with human language technology needs. These packages are widely used in industry, academia, and government.

All our main supported software distributions are written in Java. Versions of our software from October 2014 forward require Java 8+. (Versions from March 2013 to September 2014 required Java 1.6+; earlier versions, from 2005 on, required Java 1.5+.) Distribution packages include components for command-line invocation, jar files, a Java API, and source code. A number of helpful people have extended our work with bindings or translations for other languages. As a result, much of this software can also easily be used from Python (or Jython), Ruby, Perl, JavaScript, F#, and other .NET languages.
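The Java version requirement above can be checked programmatically before loading any of the tools. The following is a minimal sketch, not part of any Stanford distribution: the class name JavaVersionCheck and its two helper methods are hypothetical names introduced here for illustration. It relies only on the standard java.specification.version system property, which reports "1.x" for releases before Java 9 and a bare major number afterwards.

```java
public class JavaVersionCheck {

    // Hypothetical helper: converts a java.specification.version string
    // (for example "1.6", "1.8", "9", "11") to its major version number.
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Releases before Java 9 report "1.x"; later releases report "x".
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // Hypothetical helper: true if the given version string is at least requiredMajor.
    static boolean meetsRequirement(String specVersion, int requiredMajor) {
        return majorVersion(specVersion) >= requiredMajor;
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.specification.version");
        if (meetsRequirement(running, 8)) {
            System.out.println("Java " + running + " satisfies the Java 8+ requirement.");
        } else {
            System.out.println("Java " + running
                + " is too old for releases from October 2014 forward (Java 8+).");
        }
    }
}
```

Passing 6 or 5 as the second argument to meetsRequirement would give the corresponding check for the older release ranges mentioned above.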

Supported software distributions

This code is under active development, and we try to answer questions and fix bugs on a best-effort basis.

All these software distributions are open source, licensed under the GNU General Public License (v3 or later for Stanford CoreNLP; v2 or later for the other releases). Note that this is the full GPL, which allows many free uses, but does not allow the software to be incorporated (even in part or in translation) into any type of proprietary software which you distribute. Commercial licensing is also available; please contact us if you are interested. Bug fixes and code contributions are very welcome; see the contributing page on our GitHub site.

Stanford CoreNLP
An integrated suite of natural language processing tools for English, Spanish, and (mainland) Chinese in Java, including tokenization, part-of-speech tagging, named entity recognition, parsing, and coreference. See also: Stanford Deterministic Coreference Resolution, the online CoreNLP demo, and the CoreNLP FAQ.
Stanford Parser
Implementations of probabilistic natural language parsers in Java: highly optimized PCFG and dependency parsers, a lexicalized PCFG parser, a neural-network dependency parser, and a deep learning reranker. See also: Online parser demo, the Stanford Dependencies page, neural-network dependency parser documentation, and Parser FAQ.
Stanford POS Tagger
A maximum-entropy (CMM) part-of-speech (POS) tagger for English, Arabic, Chinese, French, German, and Spanish, in Java.
Stanford Named Entity Recognizer
A Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition in English, Chinese, German, and Spanish. See also: Online NER demo.
Stanford Word Segmenter
A CRF-based word segmenter in Java. Supports Arabic and Chinese.
Stanford Classifier
A machine learning classifier, with good feature templates for text categorization. Provides Naive Bayes and a conditional loglinear classifier (a.k.a., a maximum entropy or multiclass logistic regression model).
Tregex, Tsurgeon, and Semgrex
Tools for matching patterns in linguistic trees (following the tgrep/tgrep2 tradition) and a tree-transformation utility built on top of this matching language. Also, a similar utility for matching patterns in dependency graphs.
Phrasal
A state-of-the-art phrase-based machine translation system.
Stanford EnglishTokenizer
A fast tokenizer for English text (producing roughly Penn Treebank tokenization).
Stanford TokensRegex
A tool for matching regular expressions over tokens.
Stanford Temporal Tagger (SUTime)
A rule-based temporal tagger for English text. See also: Online SUTime demo.
Stanford Pattern-based Information Extraction and Diagnostics (SPIED)
A bootstrapped pattern-based entity extraction system.
Stanford Relation Extractor
A tool for extracting relations between entities.

Other open source software distributions

GloVe: Global Vectors for Word Representations
Software in C for learning state-of-the-art distributed word representations, and a number of sets of pre-trained word vectors.
Topic Modeling Toolbox (TMT)
A suite of topic modeling tools for social scientists and others who wish to perform analysis on datasets that have a substantial textual component. Unfortunately, this software is no longer developed or supported.
Stanford Biomedical Event Parser (SBEP)
Biomedical Event Extraction for the BioNLP 2009/2011 shared task.

Binary software distributions

These systems are not available as source code, but only as compiled Java byte-code and libraries.

Entailment-based MT evaluation software
Software to predict the adequacy of MT system output. The scoring is based on assessing the quality of entailment between the system output and the reference translation.

End-of-life distributions

This is software that we distributed at one point but no longer maintain, either because we are unable to or because we feel it is no longer useful to do so. It's still here in case it's useful, but we won't answer questions about it.

FrameNet Reader software
Support files for reading FrameNet XML files (as they existed in 2002-03 - FrameNet version 0.75/1.0) into Java data structures.
Simple manual annotation tool
A simple tool for annotating spans of text with classes suitable for supervised training of named entity recognition and information extraction models. Works on plain text and HTML documents. Click to download stanford-manual-annotation-tool-2004-05-16.tar.gz.

Questions / Support

In addition to the documentation for each package, there is also a general FAQ.

If your question is still not answered, the best way to discuss other topics with Stanford NLP developers and users is by joining the java-nlp-user mailing list (via a webpage). You can send licensing questions, other questions, and feedback to java-nlp-support@lists.stanford.edu. See the email contact guide.