This is an interface for adding annotations to a partially annotated
Annotation. In some ways, it is just a glorified function, except
that it explicitly operates in place on Annotation objects. Annotators
are meant to be given to an AnnotationPipeline in order to build
annotation pipelines (the whole motivation of this package).
Implementers of this interface should therefore be designed to play
well with other Annotators, and their javadocs should explicitly
state which annotations they assume already exist in the annotation
(such as parse, POS tag, etc.), which keys they expect them under
(see, for instance, the ones in CoreAnnotations), and which
annotations they will add (or modify), along with the keys for those.
If you would like to look at the code for a relatively simple
Annotator, I recommend NERAnnotator. For a lot of existing code you
could just add "implements Annotator" directly, but I recommend
wrapping the code instead because I believe that it will help to keep
the pipeline code more manageable.
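
To illustrate the wrapping approach recommended above, here is a minimal
sketch. It does not use the real CoreNLP classes: SimpleAnnotator, the
Map-based annotation, WhitespaceTokenizer, and the "text" and "tokens"
keys are all hypothetical stand-ins chosen only to keep the example
self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class WrapperAnnotatorSketch {

  /** Hypothetical stand-in for the Annotator interface: mutates the annotation in place. */
  public interface SimpleAnnotator {
    void annotate(Map<String, Object> annotation);
  }

  /** An existing component that knows nothing about annotations. */
  public static class WhitespaceTokenizer {
    public String[] tokenize(String text) {
      return text.trim().split("\\s+");
    }
  }

  /**
   * Wraps the tokenizer instead of making it implement the interface itself.
   * Assumes a String already exists under the key "text".
   * Adds a String[] under the key "tokens".
   */
  public static class TokenizerAnnotator implements SimpleAnnotator {
    private final WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();

    @Override
    public void annotate(Map<String, Object> annotation) {
      Object text = annotation.get("text");
      if (!(text instanceof String)) {
        throw new IllegalArgumentException("Expected a String under key \"text\"");
      }
      // Operate in place: the result is stored on the same annotation object.
      annotation.put("tokens", tokenizer.tokenize((String) text));
    }
  }

  public static void main(String[] args) {
    Map<String, Object> annotation = new HashMap<>();
    annotation.put("text", "the quick fox");
    new TokenizerAnnotator().annotate(annotation);
    System.out.println(Arrays.toString((String[]) annotation.get("tokens")));
  }
}
```

The wrapper's javadoc states what it assumes and what it adds, and the
tokenizer itself stays free of any pipeline-specific code.
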
An Annotator should also provide a description of what it produces and
a description of what it requires to have been produced by using Sets.
The StanfordCoreNLP version of the AnnotationPipeline can
enforce requirements, throwing an exception if an annotator does
not have all of its prerequisites met. An Annotator which does not
participate in this system can simply return Collections.emptySet()
for both requires() and requirementsSatisfied().
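
The kind of requirement enforcement described above could be sketched as
follows. This is not the StanfordCoreNLP implementation: the
SimpleAnnotator interface, the use of plain strings as requirement
names, and the checkOrder helper are assumptions made so that the
example runs on its own.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RequirementCheckSketch {

  /** Hypothetical stand-in for Annotator; requirements are plain strings here. */
  public interface SimpleAnnotator {
    Set<String> requires();
    Set<String> requirementsSatisfied();
  }

  /** Convenience factory for an annotator with fixed requirement sets. */
  public static SimpleAnnotator annotator(final Set<String> requires,
                                          final Set<String> satisfies) {
    return new SimpleAnnotator() {
      @Override public Set<String> requires() { return requires; }
      @Override public Set<String> requirementsSatisfied() { return satisfies; }
    };
  }

  /** Throws if any annotator would run before its prerequisites are produced. */
  public static void checkOrder(List<SimpleAnnotator> pipeline) {
    Set<String> available = new HashSet<>();
    for (SimpleAnnotator a : pipeline) {
      if (!available.containsAll(a.requires())) {
        Set<String> missing = new HashSet<>(a.requires());
        missing.removeAll(available);
        throw new IllegalArgumentException("Missing requirements: " + missing);
      }
      available.addAll(a.requirementsSatisfied());
    }
  }

  public static void main(String[] args) {
    SimpleAnnotator tokenizer =
        annotator(Collections.<String>emptySet(), Collections.singleton("tokens"));
    SimpleAnnotator tagger =
        annotator(Collections.singleton("tokens"), Collections.singleton("pos"));

    checkOrder(Arrays.asList(tokenizer, tagger)); // prerequisites met, no exception
    try {
      checkOrder(Arrays.asList(tagger, tokenizer)); // tagger runs too early
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

An annotator that returns Collections.emptySet() from both methods
passes this check in any position, which matches the opt-out behavior
described above.
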
We extensively use Properties objects to configure each Annotator.
In particular, CoreNLP has most of its properties in an informal
namespace, with property names like "parse.maxlen" to specify that
a property only applies to a parser annotator. There can also be
global properties; they should not have any periods in their names.
Each Annotator knows its own name; we assume these don't collide badly,
though possibly two parsers could share the "parse.*" namespace.
An Annotator should have a constructor that simply takes a Properties
object. At this point, the Annotator should expect to be getting
properties in namespaces. The classes that annotators call (like
a concrete parser, tagger, or whatever) mainly expect properties
not in namespaces. In general, the annotator should subset the
passed-in properties to keep only global properties and ones in
its own namespace, and then strip the namespace prefix from the
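
The subsetting step described above might look roughly like the
following sketch. The helper name subsetFor and every sample key other
than "parse.maxlen" are hypothetical, and CoreNLP's own utilities may
handle this differently. The sketch also assumes that a namespaced
property should win if it collides with a global one after stripping.

```java
import java.util.Properties;

public class NamespacedProperties {

  /**
   * Keeps global properties (names without a period) and properties under
   * "annotatorName.", stripping that prefix from the latter.
   */
  public static Properties subsetFor(String annotatorName, Properties props) {
    String prefix = annotatorName + ".";
    Properties result = new Properties();
    // First pass: copy global properties unchanged.
    for (String key : props.stringPropertyNames()) {
      if (key.indexOf('.') < 0) {
        result.setProperty(key, props.getProperty(key));
      }
    }
    // Second pass: namespaced properties override globals of the same name.
    for (String key : props.stringPropertyNames()) {
      if (key.startsWith(prefix)) {
        result.setProperty(key.substring(prefix.length()), props.getProperty(key));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("parse.maxlen", "80");
    props.setProperty("pos.model", "tagger.bin"); // hypothetical key for another annotator
    props.setProperty("encoding", "UTF-8");       // hypothetical global property

    Properties parserProps = subsetFor("parse", props);
    System.out.println(parserProps.getProperty("maxlen"));
    System.out.println(parserProps.getProperty("encoding"));
    System.out.println(parserProps.getProperty("pos.model"));
  }
}
```

The resulting Properties object can then be handed to a class that
expects un-namespaced names such as "maxlen".
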
A mapping from an annotator to its default transitive dependencies.
Note that this is not guaranteed to be accurate, as properties set in the annotator
can change the annotator's dependencies, but it's a reasonable guess if you're using