Interface Summary
Interface	Description
AnnotationCreator	Creates an annotation from an input source
Annotator	This is an interface for adding annotations to a partially annotated Annotation.
CoreNLPProtos.CorefChain.CorefMentionOrBuilder
CoreNLPProtos.CorefChainOrBuilder
CoreNLPProtos.DependencyGraph.EdgeOrBuilder
CoreNLPProtos.DependencyGraph.NodeOrBuilder
CoreNLPProtos.DependencyGraphOrBuilder
CoreNLPProtos.DocumentOrBuilder
CoreNLPProtos.EntityOrBuilder
CoreNLPProtos.IndexedWordOrBuilder
CoreNLPProtos.MapIntStringOrBuilder
CoreNLPProtos.MapStringStringOrBuilder
CoreNLPProtos.MentionOrBuilder
CoreNLPProtos.NERMentionOrBuilder
CoreNLPProtos.OpenIETripleOrBuilder
CoreNLPProtos.OperatorOrBuilder
CoreNLPProtos.ParseTreeOrBuilder
CoreNLPProtos.PolarityOrBuilder
CoreNLPProtos.QuoteOrBuilder
CoreNLPProtos.RelationOrBuilder
CoreNLPProtos.SentenceFragmentOrBuilder
CoreNLPProtos.SentenceOrBuilder
CoreNLPProtos.SpanOrBuilder
CoreNLPProtos.SpeakerInfoOrBuilder
CoreNLPProtos.TimexOrBuilder
CoreNLPProtos.TokenOrBuilder
JSONOutputter.Writer	A tiny little functional interface for writing a (key, value) pair.

Class Summary
Class	Description
AbstractTextAnnotationCreator	Stub implementation for creating annotations from various input sources, using String as the main input source
Annotation	An annotation representing a span of text in a document.
AnnotationOutputter	An interface for outputting CoreNLP Annotations to different output formats.
AnnotationOutputter.Options
AnnotationPipeline	This class is designed to apply multiple Annotators to an Annotation.
AnnotationSerializer
AnnotationSerializer.IntermediateEdge
AnnotationSerializer.IntermediateNode
AnnotationSerializer.IntermediateSemanticGraph
Annotator.Requirement	The Requirement is a general way of describing the pre and post conditions of an Annotator running.
AnnotatorFactories	A companion to `AnnotatorFactory` defining the common annotators.
AnnotatorFactory	A Factory for creating a certain type of Annotator.
AnnotatorImplementations	A class abstracting the implementation of various annotators.
AnnotatorPool	An object for keeping track of Annotators.
BinarizerAnnotator	This annotator takes unbinarized trees (from the parser annotator or elsewhere) and binarizes them in the attachment.
CharniakParserAnnotator	This class will add parse information to an Annotation from the BLLIP parser.
ChineseSegmenterAnnotator	This class will add segmentation information to an Annotation.
ChunkAnnotationUtils	Utility functions for annotating chunks
CleanXmlAnnotator	An annotator which removes all XML tags (as identified by the tokenizer) and possibly selectively keeps the text between them.
ColumnDataClassifierAnnotator	Created by joberant on 9/8/14.
CoNLLOutputter	Write a subset of our CoreNLP output in CoNLL format.
CoNLLUOutputter	Write a subset of our CoreNLP output in CoNLL-U format.
CorefAnnotator	This class adds coref information to an Annotation.
CoreMapAggregator	Function that aggregates several core maps into one
CoreMapAttributeAggregator	Functions for aggregating token attributes.
CoreMapAttributeAggregator.ConcatAggregator
CoreMapAttributeAggregator.ConcatCoreMapListAggregator<T extends CoreMap>
CoreMapAttributeAggregator.ConcatListAggregator<T>
CoreMapAttributeAggregator.ConcatTextAggregator
CoreMapAttributeAggregator.MostFreqAggregator
CoreNLPProtos
CoreNLPProtos.CorefChain	Protobuf type `edu.stanford.nlp.pipeline.CorefChain`
CoreNLPProtos.CorefChain.Builder	Protobuf type `edu.stanford.nlp.pipeline.CorefChain`
CoreNLPProtos.CorefChain.CorefMention	Protobuf type `edu.stanford.nlp.pipeline.CorefChain.CorefMention`
CoreNLPProtos.CorefChain.CorefMention.Builder	Protobuf type `edu.stanford.nlp.pipeline.CorefChain.CorefMention`
CoreNLPProtos.DependencyGraph	Protobuf type `edu.stanford.nlp.pipeline.DependencyGraph`
CoreNLPProtos.DependencyGraph.Builder	Protobuf type `edu.stanford.nlp.pipeline.DependencyGraph`
CoreNLPProtos.DependencyGraph.Edge	Protobuf type `edu.stanford.nlp.pipeline.DependencyGraph.Edge`
CoreNLPProtos.DependencyGraph.Edge.Builder	Protobuf type `edu.stanford.nlp.pipeline.DependencyGraph.Edge`
CoreNLPProtos.DependencyGraph.Node	Protobuf type `edu.stanford.nlp.pipeline.DependencyGraph.Node`
CoreNLPProtos.DependencyGraph.Node.Builder	Protobuf type `edu.stanford.nlp.pipeline.DependencyGraph.Node`
CoreNLPProtos.Document	Protobuf type `edu.stanford.nlp.pipeline.Document`
CoreNLPProtos.Document.Builder	Protobuf type `edu.stanford.nlp.pipeline.Document`
CoreNLPProtos.Entity	Protobuf type `edu.stanford.nlp.pipeline.Entity`
CoreNLPProtos.Entity.Builder	Protobuf type `edu.stanford.nlp.pipeline.Entity`
CoreNLPProtos.IndexedWord	Protobuf type `edu.stanford.nlp.pipeline.IndexedWord`
CoreNLPProtos.IndexedWord.Builder	Protobuf type `edu.stanford.nlp.pipeline.IndexedWord`
CoreNLPProtos.MapIntString	Protobuf type `edu.stanford.nlp.pipeline.MapIntString`
CoreNLPProtos.MapIntString.Builder	Protobuf type `edu.stanford.nlp.pipeline.MapIntString`
CoreNLPProtos.MapStringString	Protobuf type `edu.stanford.nlp.pipeline.MapStringString`
CoreNLPProtos.MapStringString.Builder	Protobuf type `edu.stanford.nlp.pipeline.MapStringString`
CoreNLPProtos.Mention	Protobuf type `edu.stanford.nlp.pipeline.Mention`
CoreNLPProtos.Mention.Builder	Protobuf type `edu.stanford.nlp.pipeline.Mention`
CoreNLPProtos.NERMention	Protobuf type `edu.stanford.nlp.pipeline.NERMention`
CoreNLPProtos.NERMention.Builder	Protobuf type `edu.stanford.nlp.pipeline.NERMention`
CoreNLPProtos.OpenIETriple	Protobuf type `edu.stanford.nlp.pipeline.OpenIETriple`
CoreNLPProtos.OpenIETriple.Builder	Protobuf type `edu.stanford.nlp.pipeline.OpenIETriple`
CoreNLPProtos.Operator	Protobuf type `edu.stanford.nlp.pipeline.Operator`
CoreNLPProtos.Operator.Builder	Protobuf type `edu.stanford.nlp.pipeline.Operator`
CoreNLPProtos.ParseTree	Protobuf type `edu.stanford.nlp.pipeline.ParseTree`
CoreNLPProtos.ParseTree.Builder	Protobuf type `edu.stanford.nlp.pipeline.ParseTree`
CoreNLPProtos.Polarity	Protobuf type `edu.stanford.nlp.pipeline.Polarity`
CoreNLPProtos.Polarity.Builder	Protobuf type `edu.stanford.nlp.pipeline.Polarity`
CoreNLPProtos.Quote	Protobuf type `edu.stanford.nlp.pipeline.Quote`
CoreNLPProtos.Quote.Builder	Protobuf type `edu.stanford.nlp.pipeline.Quote`
CoreNLPProtos.Relation	Protobuf type `edu.stanford.nlp.pipeline.Relation`
CoreNLPProtos.Relation.Builder	Protobuf type `edu.stanford.nlp.pipeline.Relation`
CoreNLPProtos.Sentence	Protobuf type `edu.stanford.nlp.pipeline.Sentence`
CoreNLPProtos.Sentence.Builder	Protobuf type `edu.stanford.nlp.pipeline.Sentence`
CoreNLPProtos.SentenceFragment	Protobuf type `edu.stanford.nlp.pipeline.SentenceFragment`
CoreNLPProtos.SentenceFragment.Builder	Protobuf type `edu.stanford.nlp.pipeline.SentenceFragment`
CoreNLPProtos.Span	Protobuf type `edu.stanford.nlp.pipeline.Span`
CoreNLPProtos.Span.Builder	Protobuf type `edu.stanford.nlp.pipeline.Span`
CoreNLPProtos.SpeakerInfo	Protobuf type `edu.stanford.nlp.pipeline.SpeakerInfo`
CoreNLPProtos.SpeakerInfo.Builder	Protobuf type `edu.stanford.nlp.pipeline.SpeakerInfo`
CoreNLPProtos.Timex	Protobuf type `edu.stanford.nlp.pipeline.Timex`
CoreNLPProtos.Timex.Builder	Protobuf type `edu.stanford.nlp.pipeline.Timex`
CoreNLPProtos.Token	Protobuf type `edu.stanford.nlp.pipeline.Token`
CoreNLPProtos.Token.Builder	Protobuf type `edu.stanford.nlp.pipeline.Token`
CustomAnnotationSerializer	Serializes Annotation objects using our own format.
DefaultPaths	Default model paths for StanfordCoreNLP. All these paths point to files distributed with the model jar file (stanford-corenlp-models-*.jar)
DependencyParseAnnotator	This class adds dependency parse information to an Annotation.
DeterministicCorefAnnotator	Implements the Annotator for the new deterministic coreference resolution system.
EntityMentionsAnnotator	Annotator that marks entity mentions in a document.
GenderAnnotator	This class adds gender information (MALE / FEMALE) to tokens as GenderAnnotations.
GenericAnnotationSerializer	Serializes Annotation objects using the default Java serializer
JSONOutputter	Output an Annotation to human readable JSON.
JSONOutputter.JSONWriter	Our very own little JSON writing class.
LabeledChunkIdentifier	Identifies chunks based on labels that use an IOB-like encoding. Assumes labels have the form <tag>-<type>, where the tag is a prefix indicating where in the chunk it is.
LabeledChunkIdentifier.LabelTagType	Class representing a label, tag and type
MentionAnnotator	This class adds mention information to an Annotation.
MorphaAnnotator	This class will add the lemmas of all the words to the Annotation.
NERCombinerAnnotator	This class will add NER information to an Annotation using a combination of NER models.
ParserAnnotator	This class will add parse information to an Annotation.
ParserAnnotatorUtils
POSTaggerAnnotator	Wrapper for the maxent part of speech tagger.
ProtobufAnnotationSerializer	A serializer using Google's protocol buffer format.
QuoteAnnotator	An annotator which picks quotations out of the given text.
RegexNERAnnotator	This class adds NER information to an annotation using the RegexNERSequenceClassifier.
RelationExtractorAnnotator	Annotating relations between entities produced by the NER system.
Requirement	Stores and describes a set of requirements for the typical use of the pipeline.
SentenceAnnotator	A parent class for annotators which might want to analyze one sentence at a time, possibly in a multithreaded manner.
SentimentAnnotator	This annotator attaches a binarized tree with sentiment annotations to each sentence.
StanfordCoreNLP	This is a pipeline that takes in a string and returns various analyzed linguistic forms.
StanfordCoreNLPClient	An annotation pipeline in spirit identical to `StanfordCoreNLP`, but with the backend supported by a web server.
StanfordCoreNLPServer	This class creates a server that runs a new Java annotator in each thread.
StanfordCoreNLPServer.FileHandler	Serve a file from the filesystem or classpath
StanfordCoreNLPServer.PingHandler	A simple ping test.
TextAnnotationCreator	Creates an annotation from text
TextOutputter
TokenizerAnnotator	This class will PTB tokenize the input.
TokensRegexAnnotator	Uses TokensRegex patterns to annotate tokens.
TokensRegexAnnotator.Options
TokensRegexNERAnnotator	TokensRegexNERAnnotator labels tokens with types based on a simple manual mapping from regular expressions to the types of the entities they are meant to describe.
TrueCaseAnnotator
UDFeatureAnnotator	Extracts universal dependencies features from a tree
WordsToSentencesAnnotator	This class assumes that there is a `List<CoreLabel>` under the `TokensAnnotation` field, and runs it through `WordToSentenceProcessor` and puts the new `List<Annotation>` under the `SentencesAnnotation` field.
XMLOutputter	An outputter to XML format.

Enum Summary
Enum	Description
CoreNLPProtos.Language	Protobuf enum `edu.stanford.nlp.pipeline.Language`
CoreNLPProtos.NaturalLogicRelation	Protobuf enum `edu.stanford.nlp.pipeline.NaturalLogicRelation`
CoreNLPProtos.Sentiment	Protobuf enum `edu.stanford.nlp.pipeline.Sentiment`
TokenizerAnnotator.TokenizerType	Enum to identify the different TokenizerTypes.

Exception Summary
Exception	Description
ProtobufAnnotationSerializer.LossySerializationException	An exception to denote that the serialization would be lossy.

Package edu.stanford.nlp.pipeline Description

Linguistic Annotation Pipeline

The point of this package is to enable people to quickly and painlessly get complete linguistic annotations of their text. It is designed to be highly flexible and extensible. I will first discuss the organization and functions of the classes, and then I will give some sample code and a run-down of the implemented Annotators.

Annotation

An Annotation is the data structure which holds the results of annotators. An Annotation is basically a map, from keys to bits of annotation, such as the parse, the part-of-speech tags, or named entity tags. Annotations are designed to operate at the sentence level; however, depending on the Annotators you use, this may not be how you choose to use the package.
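
To make the "map from keys to bits of annotation" idea concrete, here is a minimal, self-contained sketch of a map keyed by classes, where each key class declares the type of the value stored under it. ToyAnnotation, Key, TextKey and TokensKey are hypothetical names invented for this illustration; they are not the classes of this package, and the real Annotation has a richer interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a class-keyed annotation map. NOT the CoreNLP Annotation class. */
final class ToyAnnotation {

    /** A key class declares, through V, the type of value stored under it. */
    interface Key<V> {}

    // Hypothetical keys, for illustration only.
    static final class TextKey implements Key<String> {}
    static final class TokensKey implements Key<List<String>> {}

    private final Map<Class<?>, Object> values = new HashMap<>();

    /** Stores a value under the given key class. */
    <V> void set(Class<? extends Key<V>> key, V value) {
        values.put(key, value);
    }

    /** Returns the value stored under the key class, or null if absent. */
    @SuppressWarnings("unchecked")
    <V> V get(Class<? extends Key<V>> key) {
        return (V) values.get(key);
    }

    /** Reports whether some value has been stored under the key class. */
    boolean has(Class<? extends Key<?>> key) {
        return values.containsKey(key);
    }
}
```

Because the value type is carried by the key class, a call such as get(TextKey.class) can be assigned to a String without a cast at the call site, which is the same shape as the token.get(TextAnnotation.class) calls in the sample usage.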

Annotators

Annotators are the backbone of this package. They are a lot like functions, except that they operate over Annotations instead of Objects. They do things like tokenize, parse, or NER tag sentences. In the javadocs of your Annotator you should specify what the Annotator assumes already exists (for instance, the NERAnnotator assumes that the sentence has been tokenized) and where to find these annotations (in that example, it would be TextAnnotation.class). You should also specify what the Annotator adds to the annotation, and where.
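
The "what it assumes" and "what it adds" contract can also be checked mechanically. The sketch below is one possible way to do that, under the assumption that each annotator's contract is written down as two sets of plain strings. RequirementCheck, Step and the string labels are hypothetical and are not part of this package.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy check that an ordering of annotators satisfies each one's stated assumptions. */
final class RequirementCheck {

    /** Hypothetical stand-in for the contract an annotator documents in its javadoc. */
    static final class Step {
        final String name;
        final Set<String> requires; // annotations assumed to exist already
        final Set<String> provides; // annotations this step adds

        Step(String name, Set<String> requires, Set<String> provides) {
            this.name = name;
            this.requires = requires;
            this.provides = provides;
        }
    }

    /** Small helper for building a set of labels. */
    static Set<String> setOf(String... labels) {
        return new HashSet<>(Arrays.asList(labels));
    }

    /**
     * Walks the steps in order and returns the name of the first step whose
     * requirements were not provided by earlier steps, or null if the order is valid.
     */
    static String firstUnsatisfied(List<Step> steps) {
        Set<String> available = new HashSet<>();
        for (Step step : steps) {
            if (!available.containsAll(step.requires)) {
                return step.name;
            }
            available.addAll(step.provides);
        }
        return null;
    }
}
```

With a "tokenize" step that provides "tokens" and a "ner" step that requires "tokens", the order tokenize, ner is accepted, while ner, tokenize is rejected at "ner".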

AnnotationPipeline

An AnnotationPipeline is where many Annotators are strung together to form a linguistic annotation pipeline. It is, itself, an Annotator. AnnotationPipelines usually also keep track of how much time they spend annotating and loading, to help users find where the time sinks are. However, the class AnnotationPipeline is not meant to be used as is. It serves as an example of how to build your own pipeline. If you just want to use a typical NLP pipeline, take a look at StanfordCoreNLP (described later in this document).
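
Since AnnotationPipeline is described as an example of how to build your own pipeline, here is a minimal sketch of the two ideas in the paragraph above: a pipeline that is itself a stage (so pipelines can be nested) and that records the time spent in each stage. ToyStage and ToyPipeline are hypothetical names, a plain Map stands in for Annotation, and none of this is the actual AnnotationPipeline implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an annotator: it reads from and writes to a shared map. */
interface ToyStage {
    void annotate(Map<String, Object> annotation);
}

/** Toy pipeline: runs its stages in order and is itself a ToyStage. */
final class ToyPipeline implements ToyStage {
    private final List<ToyStage> stages = new ArrayList<>();
    private final List<Long> nanos = new ArrayList<>(); // accumulated time per stage

    /** Appends a stage; returns this so calls can be chained. */
    ToyPipeline add(ToyStage stage) {
        stages.add(stage);
        nanos.add(0L);
        return this;
    }

    @Override
    public void annotate(Map<String, Object> annotation) {
        for (int i = 0; i < stages.size(); i++) {
            long start = System.nanoTime();
            stages.get(i).annotate(annotation);
            nanos.set(i, nanos.get(i) + (System.nanoTime() - start));
        }
    }

    /** Accumulated nanoseconds per stage, in the order the stages were added. */
    List<Long> nanosPerStage() {
        return new ArrayList<>(nanos);
    }

    /** Total accumulated nanoseconds over all stages. */
    long totalNanos() {
        long sum = 0;
        for (long n : nanos) {
            sum += n;
        }
        return sum;
    }
}
```

Because ToyPipeline implements ToyStage, one pipeline can be added to another as a single stage, and the per-stage timings show which stage dominates the running time.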

Sample Usage

Here is some sample code which illustrates the intended usage of the package:

 import edu.stanford.nlp.ling.CoreAnnotations;
 import edu.stanford.nlp.ling.CoreLabel;
 import edu.stanford.nlp.pipeline.*;
 import edu.stanford.nlp.trees.Tree;
 import edu.stanford.nlp.trees.TreeCoreAnnotations;
 import edu.stanford.nlp.util.CoreMap;

 public void testPipeline(String text) throws Exception {
     // create pipeline
     AnnotationPipeline pipeline = new AnnotationPipeline();
     pipeline.addAnnotator(new TokenizerAnnotator(false, "en"));
     pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
     pipeline.addAnnotator(new POSTaggerAnnotator(false));
     pipeline.addAnnotator(new MorphaAnnotator(false));
     pipeline.addAnnotator(new NERCombinerAnnotator(false));
     pipeline.addAnnotator(new ParserAnnotator(false, -1));
     // create annotation with text
     Annotation document = new Annotation(text);
     // annotate text with pipeline
     pipeline.annotate(document);
     // demonstrate typical usage
     for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
         // get the tree for the sentence
         Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
         // get the tokens for the sentence and iterate over them
         for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
             // get token attributes
             String tokenText = token.get(CoreAnnotations.TextAnnotation.class);
             String tokenPOS = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
             String tokenLemma = token.get(CoreAnnotations.LemmaAnnotation.class);
             String tokenNE = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
         }
     }
 }

Existing Annotators

There already exist Annotators for many common tasks, all of which include default model locations, so they can just be used off the shelf. They are:

TokenizerAnnotator - tokenizes the text based on language or Tokenizer class specifications
WordsToSentencesAnnotator - splits a sequence of words into a sequence of sentences
POSTaggerAnnotator - annotates the text with part-of-speech tags
MorphaAnnotator - morphological normalizer (generates lemmas)
NERCombinerAnnotator - combines several NER models
TrueCaseAnnotator - detects the true case of words in free text (useful for all upper or lower case text)
ParserAnnotator - generates constituent and dependency trees
NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates
TimeWordAnnotator - recognizes common temporal expressions, such as "teatime"
QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities
DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model
NFLAnnotator - implements entity and relation mention extraction for the NFL domain

How Do I Use This?

You do not have to construct your pipeline from scratch! For typical NLP processing, use StanfordCoreNLP. This pipeline implements the most common functionality needed: tokenization, lemmatization, POS tagging, NER, parsing and coreference resolution. Read below for how to use this pipeline from the command line, or directly in your Java code.

Using StanfordCoreNLP from the Command Line

The command line for StanfordCoreNLP is:

 ./bin/stanfordcorenlp.sh

 java -cp stanford-corenlp-YYYY-MM-DD.jar:stanford-corenlp-YYYY-MM-DD-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP [ -props YOUR_CONFIGURATION_FILE ] -file YOUR_INPUT_FILE

where the following properties are defined: (if -props or annotators is not defined, default properties will be loaded via the classpath)

        "annotators" - comma separated list of annotators
                The following annotators are supported: tokenize, ssplit, pos, lemma, ner, truecase, parse, dcoref, nfl
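
As an illustration of the "annotators" property, the sketch below reads properties-file text with java.util.Properties and splits the comma-separated value into individual annotator names. AnnotatorsProperty and its methods are hypothetical helpers written for this example, and the splitting shown here is an assumption about how such a value can be consumed, not a description of how StanfordCoreNLP parses it internally.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Toy reader for the "annotators" property of a configuration file. */
final class AnnotatorsProperty {

    /** Parses properties-file text, such as the contents of YOUR_CONFIGURATION_FILE. */
    static Properties parse(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits the "annotators" value on commas, trimming whitespace and skipping blanks. */
    static List<String> annotatorNames(Properties props) {
        List<String> names = new ArrayList<>();
        for (String name : props.getProperty("annotators", "").split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }
}
```

A configuration file containing the single line "annotators = tokenize, ssplit, pos, lemma" would yield the four names in that order. In Java code, a Properties object of this kind is what gets passed to the StanfordCoreNLP constructor.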

More information is available here: Stanford CoreNLP

The StanfordCoreNLP API

More information is available here: Stanford CoreNLP

Author: Jenny Finkel, Mihai Surdeanu, Steven Bethard, David McClosky
Last modified: May 7, 2012