Interface | Description |
---|---|
AbstractCoreLabel | |
AbstractToken |
An abstract token.
|
CoreAnnotation<V> |
The base class for any annotation that can be marked on a
CoreMap ,
parameterized by the type of the value associated with the annotation. |
CoreLabel.GenericAnnotation<T> |
Class that all "generic" annotations extend.
|
Datum<L,F> |
Interface for Objects which can be described by their features.
|
Document<L,F,T> |
Represents a text document as a list of Words with a String title.
|
Featurizable<F> |
Interface for Objects that can be described by their features.
|
HasCategory |
Something that implements the
HasCategory interface
knows about categories. |
HasContext | |
HasIndex | |
HasLemma |
Something that implements the
HasLemma interface
knows about lemmas. |
HasNER |
This token is able to produce NER tags
|
HasOffset |
Something that implements the
HasOffset interface
carries char offset references to an original text String. |
HasOriginalText |
This token can produce / set original texts
|
HasTag |
Something that implements the
HasTag interface
knows about part-of-speech tags. |
HasWord |
Something that implements the
HasWord interface
knows about words. |
Label |
Something that implements the
Label interface can act as a
constituent, node, or word label with linguistic attributes. |
Labeled<E> |
Interface for Objects that have a label, whose label is an Object.
|
LabelFactory |
A
LabelFactory object acts as a factory for creating
objects of class Label , or some descendant class. |
Class | Description |
---|---|
AnnotationLookup |
Provides a mapping between CoreAnnotation keys, which are classes, and a text String that names them,
which is needed for things like text serializations and the Semgrex query language.
|
BasicDatum<LabelType,FeatureType> |
Basic implementation of Datum interface that can be constructed with a
Collection of features and one or more labels.
|
BasicDocument<L> |
Basic implementation of Document that should be suitable for most needs.
|
CategoryWordTag |
A
CategoryWordTag object acts as a complex Label
which contains a category, a head word, and a tag. |
CategoryWordTagFactory |
A
CategoryWordTagFactory is a factory that makes
a Label which is a CategoryWordTag triplet. |
CoreAnnotations |
Set of common annotations for
CoreMap s. |
CoreAnnotations.AbbrAnnotation | |
CoreAnnotations.AbgeneAnnotation | |
CoreAnnotations.AbstrAnnotation | |
CoreAnnotations.AfterAnnotation |
Annotation for the whitespace characters appearing after this word.
|
CoreAnnotations.AnswerAnnotation |
The standard key for the answer which is a String
|
CoreAnnotations.AnswerObjectAnnotation | |
CoreAnnotations.AnswerProbAnnotation |
The matching probability for the AnswerAnnotation
|
CoreAnnotations.AntecedentAnnotation |
The CoreMap key identifying the annotation's antecedent.
|
CoreAnnotations.ArabicCharAnnotation |
For Arabic: character level information, segmentation.
|
CoreAnnotations.ArabicSegAnnotation |
For Arabic: the segmentation information from the segmenter.
|
CoreAnnotations.ArgDescendentAnnotation | |
CoreAnnotations.ArgumentAnnotation |
The standard key for a propbank label which is of type Argument
|
CoreAnnotations.AuthorAnnotation |
Author for the document
(really should be a set of authors, but just have single string for simplicity)
|
CoreAnnotations.BagOfWordsAnnotation | |
CoreAnnotations.BeAnnotation |
annotation stolen from the lex parser
|
CoreAnnotations.BeforeAnnotation |
Annotation for the whitespace characters appearing before this word.
|
CoreAnnotations.BeginIndexAnnotation |
This indexes the beginning of a span of words, e.g., a constituent in a
tree.
|
CoreAnnotations.BestCliquesAnnotation |
Used in Task3 Pascal system
|
CoreAnnotations.BestFullAnnotation | |
CoreAnnotations.CalendarAnnotation |
The CoreMap key identifying the date and time associated with an
annotation.
|
CoreAnnotations.CanonicalEntityMentionIndexAnnotation |
Index into the list of entity mentions in a document for canonical entity mention.
|
CoreAnnotations.CategoryAnnotation | |
CoreAnnotations.CategoryFunctionalTagAnnotation |
The standard key for storing category with functional tags.
|
CoreAnnotations.CharacterOffsetBeginAnnotation |
The CoreMap key identifying the offset of the first char of an
annotation.
|
CoreAnnotations.CharacterOffsetEndAnnotation |
The CoreMap key identifying the offset of the first character after the end
of an annotation.
|
CoreAnnotations.CharAnnotation | |
CoreAnnotations.ChineseCharAnnotation |
For Chinese: character level information, segmentation.
|
CoreAnnotations.ChineseIsSegmentedAnnotation |
Not sure exactly what this is, but it is different from
ChineseSegAnnotation and seems to indicate if the text is segmented
|
CoreAnnotations.ChineseOrigSegAnnotation |
For Chinese: the segmentation info existing in the original text.
|
CoreAnnotations.ChineseSegAnnotation |
For Chinese: the segmentation information from the segmenter.
|
CoreAnnotations.ChunkAnnotation | |
CoreAnnotations.CoarseNamedEntityTagAnnotation |
The CoreMap key for getting the coarse named entity tag (i.e.
|
CoreAnnotations.CoarseTagAnnotation |
CoNLL dep parsing - coarser POS tags.
|
CoreAnnotations.CodepointOffsetBeginAnnotation |
Some codepoints count as more than one character.
|
CoreAnnotations.CodepointOffsetEndAnnotation |
Some codepoints count as more than one character.
|
CoreAnnotations.ColumnDataClassifierAnnotation | |
CoreAnnotations.CommonWordsAnnotation | |
CoreAnnotations.CoNLLDepAnnotation |
CoNLL dep parsing - the dependency type, such as SBJ or OBJ.
|
CoreAnnotations.CoNLLDepParentIndexAnnotation |
CoNLL dep parsing - the index of the word which is the parent of this word
in the dependency tree
|
CoreAnnotations.CoNLLDepTypeAnnotation |
CoNLL dep parsing - the dependency type, such as SBJ or OBJ.
|
CoreAnnotations.CoNLLPredicateAnnotation |
CoNLL SRL/dep parsing - whether the word is a predicate
|
CoreAnnotations.CoNLLSRLAnnotation |
CoNLL SRL/dep parsing - map which, for the current word, specifies its
specific role for each predicate
|
CoreAnnotations.CoNLLUFeats |
CoNLL-U dep parsing - List of morphological features
|
CoreAnnotations.CoNLLUMisc |
CoNLL-U dep parsing - Any other annotation
|
CoreAnnotations.CoNLLUSecondaryDepsAnnotation |
CoNLL-U dep parsing - List of secondary dependencies
|
CoreAnnotations.CoNLLUTokenSpanAnnotation |
CoNLL-U dep parsing - span of multiword tokens
|
CoreAnnotations.ContextsAnnotation | |
CoreAnnotations.CorefMentionToEntityMentionMappingAnnotation |
Mapping from coref mentions to the corresponding NER-derived entity mentions.
|
CoreAnnotations.CostMagnificationAnnotation |
Key for relative value of a word - used in RTE
|
CoreAnnotations.CovertIDAnnotation | |
CoreAnnotations.D2_LBeginAnnotation | |
CoreAnnotations.D2_LEndAnnotation | |
CoreAnnotations.D2_LMiddleAnnotation | |
CoreAnnotations.DayAnnotation | |
CoreAnnotations.DependentsAnnotation | |
CoreAnnotations.DictAnnotation | |
CoreAnnotations.DistSimAnnotation | |
CoreAnnotations.DoAnnotation |
annotation stolen from the lex parser
|
CoreAnnotations.DocDateAnnotation | |
CoreAnnotations.DocIDAnnotation |
This refers to the unique identifier for a "document", where document may
vary based on your application.
|
CoreAnnotations.DocSourceTypeAnnotation |
Document source type
What kind of place did the document come from: newswire, discussion forum, web...
|
CoreAnnotations.DocTitleAnnotation |
Document title
What is the document title
|
CoreAnnotations.DocTypeAnnotation |
Document type
What kind of document is it: story, multi-part article, listing, email, etc
|
CoreAnnotations.DomainAnnotation |
Used in CRFClassifier stuff. PositionAnnotation should possibly be an int;
it's present as either an int or a string depending on context. CharAnnotation
may be "CharacterAnnotation"; not sure.
|
CoreAnnotations.EmptyIndexAnnotation |
Some datasets - for example, the UD Estonian EWT dataset - use
"empty" nodes to represent words that were unspoken / unwritten
but can be inferred from the structure of the sentence.
|
CoreAnnotations.EndIndexAnnotation |
This indexes the end of a span of words, e.g., a constituent in a
tree.
|
CoreAnnotations.EntityClassAnnotation | |
CoreAnnotations.EntityMentionIndexAnnotation |
Index into the list of entity mentions in a document.
|
CoreAnnotations.EntityMentionToCorefMentionMappingAnnotation |
Mapping from NER-derived entity mentions to coref mentions.
|
CoreAnnotations.EntityRuleAnnotation | |
CoreAnnotations.EntityTypeAnnotation | |
CoreAnnotations.ExceptionAnnotation |
Stores an exception associated with processing this document
|
CoreAnnotations.FeaturesAnnotation |
The standard key for the features which is a Collection
|
CoreAnnotations.FemaleGazAnnotation | |
CoreAnnotations.FineGrainedNamedEntityTagAnnotation |
The CoreMap key for getting the fine grained named entity tag (i.e.
|
CoreAnnotations.FirstChildAnnotation |
used in binarized trees to specify the first child in the rule for which
this node is the parent
|
CoreAnnotations.ForcedSentenceEndAnnotation |
This indicates the sentence should end at this token.
|
CoreAnnotations.ForcedSentenceUntilEndAnnotation |
This indicates that starting at this token, the sentence should not be ended until
we see a ForcedSentenceEndAnnotation.
|
CoreAnnotations.FreqAnnotation | |
CoreAnnotations.GazAnnotation |
Possibly this should be grouped with gazetteer annotation - original key
was "gaz".
|
CoreAnnotations.GazetteerAnnotation |
The standard key for the gazetteer information
|
CoreAnnotations.GenderAnnotation |
The CoreMap key identifying an entity mention's potential gender.
|
CoreAnnotations.GenericTokensAnnotation |
The CoreMap key for getting the tokens (words, phrases, or anything else of type CoreMap) contained by an annotation.
|
CoreAnnotations.GeniaAnnotation | |
CoreAnnotations.GoldAnswerAnnotation |
The standard key for gold answer which is a String
|
CoreAnnotations.GovernorAnnotation | |
CoreAnnotations.GrandparentAnnotation |
specifies the base state of the parent of this node in the parse tree
|
CoreAnnotations.HaveAnnotation |
annotation stolen from the lex parser
|
CoreAnnotations.HeadWordStringAnnotation |
The key for storing a Head word as a string rather than a pointer (as in
TreeCoreAnnotations.HeadWordAnnotation)
|
CoreAnnotations.HeightAnnotation |
Used in srl.unsup
|
CoreAnnotations.IDAnnotation | |
CoreAnnotations.IDFAnnotation |
Inverse document frequency of the word this label represents
|
CoreAnnotations.INAnnotation | |
CoreAnnotations.IndexAnnotation |
This indexes a token number inside a sentence.
|
CoreAnnotations.InterpretationAnnotation |
The standard key for the semantic interpretation
|
CoreAnnotations.IsDateRangeAnnotation |
it really seems like this should have a different name or else be a boolean
|
CoreAnnotations.IsFirstWordOfMWTAnnotation |
The CoreLabel key identifying whether a token is the first word derived
from a multi-word-token.
|
CoreAnnotations.IsMultiWordTokenAnnotation |
The CoreLabel key identifying whether a token is a multi-word-token
This is attached to
CoreLabel s. |
CoreAnnotations.IsNewlineAnnotation |
The CoreLabel key identifying whether a token is a newline or not
This is attached to
CoreLabel s. |
CoreAnnotations.IsURLAnnotation |
it really seems like this should have a different name or else be a boolean
|
CoreAnnotations.KBPTriplesAnnotation |
An annotation for a sentence tagged with its KBP relation.
|
CoreAnnotations.LabelAnnotation |
Used in wsd.supwsd package
|
CoreAnnotations.LabelIDAnnotation | |
CoreAnnotations.LabelWeightAnnotation | |
CoreAnnotations.LastGazAnnotation | |
CoreAnnotations.LastTaggedAnnotation | |
CoreAnnotations.LBeginAnnotation |
Used in Gale2007ChineseSegmenter
|
CoreAnnotations.LeftChildrenNodeAnnotation |
used in incremental DAG parser
|
CoreAnnotations.LeftTermAnnotation |
The Standard key for storing the left terminal number relative to the root
of the tree of the leftmost terminal dominated by the current node
|
CoreAnnotations.LemmaAnnotation |
The CoreMap key for getting the lemma (morphological stem, lexeme form) of a token.
|
CoreAnnotations.LEndAnnotation | |
CoreAnnotations.LengthAnnotation | |
CoreAnnotations.LineNumberAnnotation |
Line number for a sentence in a document delimited by newlines
instead of punctuation.
|
CoreAnnotations.LinkAnnotation | |
CoreAnnotations.LMiddleAnnotation | |
CoreAnnotations.LocationAnnotation |
Reference location for the document
|
CoreAnnotations.MaleGazAnnotation | |
CoreAnnotations.MarkingAnnotation |
Another key used for propbank - to signify core arg nodes or predicate
nodes
|
CoreAnnotations.MentionsAnnotation | |
CoreAnnotations.MentionTokenAnnotation |
used in dcoref.
|
CoreAnnotations.MonthAnnotation |
Used in nlp.coref
|
CoreAnnotations.MorphoCaseAnnotation | |
CoreAnnotations.MorphoGenAnnotation | |
CoreAnnotations.MorphoNumAnnotation | |
CoreAnnotations.MorphoPersAnnotation | |
CoreAnnotations.MWTTokenMiscAnnotation |
CoNLL-U misc features specifically on the MWT part of a token rather than the word
|
CoreAnnotations.MWTTokenTextAnnotation |
Text of the token that was used to create this word during a multi word token split.
|
CoreAnnotations.NamedEntityTagAnnotation |
The CoreMap key for getting the token-level named entity tag (e.g., DATE,
PERSON, etc.)
This key is typically set on token annotations.
|
CoreAnnotations.NamedEntityTagProbsAnnotation |
Label and probability pair representing the coarse grained label and probability
|
CoreAnnotations.NeighborsAnnotation | |
CoreAnnotations.NERIDAnnotation |
This is an NER ID annotation (in case the all caps parsing didn't work out
for you...)
|
CoreAnnotations.NormalizedNamedEntityTagAnnotation |
The key for the normalized value of numeric named entities.
|
CoreAnnotations.NotAnnotation |
annotation stolen from the lex parser
|
CoreAnnotations.NumericCompositeObjectAnnotation |
Annotation indicating the numeric object associated with an annotation.
|
CoreAnnotations.NumericCompositeTypeAnnotation |
Annotation indicating whether the numeric phrase the token is part of
represents a NUMBER or ORDINAL (twenty first => ORDINAL ORDINAL).
|
CoreAnnotations.NumericCompositeValueAnnotation |
Annotation indicating the numeric value of the phrase the token is part of
(twenty first => 21 21).
|
CoreAnnotations.NumericObjectAnnotation | |
CoreAnnotations.NumericTypeAnnotation | |
CoreAnnotations.NumericValueAnnotation | |
CoreAnnotations.NumerizedTokensAnnotation | |
CoreAnnotations.NumTxtSentencesAnnotation |
Used by RTE to track number of text sentences, to determine when hyp
sentences begin.
|
CoreAnnotations.OriginalCharAnnotation |
Seems like this could be consolidated with something else...
|
CoreAnnotations.OriginalTextAnnotation |
The exact original surface form of a token.
|
CoreAnnotations.ParagraphAnnotation |
used in dcoref.
|
CoreAnnotations.ParagraphIndexAnnotation |
used in ParagraphAnnotator.
|
CoreAnnotations.ParagraphsAnnotation |
The CoreMap key for getting the paragraphs contained by an annotation.
|
CoreAnnotations.ParaPositionAnnotation | |
CoreAnnotations.ParentAnnotation |
The standard key for the parent which is a String
|
CoreAnnotations.PartOfSpeechAnnotation |
The CoreMap key for getting the Penn part of speech of a token.
|
CoreAnnotations.PercentAnnotation |
annotation stolen from the lex parser
|
CoreAnnotations.PhraseWordsAnnotation | |
CoreAnnotations.PhraseWordsTagAnnotation | |
CoreAnnotations.PolarityAnnotation | |
CoreAnnotations.PositionAnnotation | |
CoreAnnotations.PossibleAnswersAnnotation | |
CoreAnnotations.PredictedAnswerAnnotation | |
CoreAnnotations.PresetAnswerAnnotation |
The standard key for the answer which is a String
|
CoreAnnotations.PrevChildAnnotation |
used in binarized trees to say the name of the most recent child
|
CoreAnnotations.PriorAnnotation |
Used in propbank.srl
|
CoreAnnotations.ProtoAnnotation | |
CoreAnnotations.QuotationIndexAnnotation |
Unique identifier within a document for a given quotation.
|
CoreAnnotations.QuotationsAnnotation |
The CoreMap key for getting the quotations contained by an annotation.
|
CoreAnnotations.QuotedAnnotation |
Indicate whether a sentence is quoted
|
CoreAnnotations.QuotesAnnotation |
Store a list of CoreMaps representing quotes
|
CoreAnnotations.RoleAnnotation |
The standard key for the semantic role label of a phrase.
|
CoreAnnotations.SectionAnnotation |
Section of a document
|
CoreAnnotations.SectionAuthorCharacterOffsetBeginAnnotation |
Store the beginning of the author mention for this section
|
CoreAnnotations.SectionAuthorCharacterOffsetEndAnnotation |
Store the end of the author mention for this section
|
CoreAnnotations.SectionDateAnnotation |
Date for a section of a document
|
CoreAnnotations.SectionEndAnnotation |
Indicates that the token ends a section and the label of the section
|
CoreAnnotations.SectionIDAnnotation |
Id for a section of a document
|
CoreAnnotations.SectionIndexAnnotation |
Store an index into a list of sections
|
CoreAnnotations.SectionsAnnotation |
Store a list of sections in the document
|
CoreAnnotations.SectionStartAnnotation |
Indicates that the token starts a new section and the attributes
that should go into that section
|
CoreAnnotations.SectionTagAnnotation |
Store the xml tag for the section as a CoreLabel
|
CoreAnnotations.SemanticHeadTagAnnotation |
The standard key for Semantic Head Word POS which is a String
|
CoreAnnotations.SemanticHeadWordAnnotation |
The standard key for Semantic Head Word which is a String
|
CoreAnnotations.SemanticTagAnnotation | |
CoreAnnotations.SemanticWordAnnotation | |
CoreAnnotations.SentenceBeginAnnotation |
The index of the sentence that this annotation begins in.
|
CoreAnnotations.SentenceEndAnnotation |
The index of the sentence that this annotation ends in.
|
CoreAnnotations.SentenceIDAnnotation | |
CoreAnnotations.SentenceIndexAnnotation |
Unique identifier within a document for a given sentence.
|
CoreAnnotations.SentencePositionAnnotation | |
CoreAnnotations.SentencesAnnotation |
The CoreMap key for getting the sentences contained in an annotation.
|
CoreAnnotations.ShapeAnnotation |
The standard key for the "shape" of a word: a String representing the type
of characters in a word, such as "Xx" for a capitalized word.
|
CoreAnnotations.SpaceBeforeAnnotation |
Used in Chinese segmenters for whether there was space before a character.
|
CoreAnnotations.SpanAnnotation |
The standard key for span which is an IntPair
|
CoreAnnotations.SpeakerAnnotation |
used in dcoref.
|
CoreAnnotations.SpeakerTypeAnnotation |
used to store speaker type information for coref
|
CoreAnnotations.SRLIDAnnotation |
The key for semantic role labels (Note: please add to this description if
you use this key)
|
CoreAnnotations.SRLInstancesAnnotation | |
CoreAnnotations.StackedNamedEntityTagAnnotation |
The CoreMap key for getting the token-level named entity tag (e.g., DATE,
PERSON, etc.) from a previous NER tagger.
|
CoreAnnotations.StateAnnotation |
The base version of the parser state, like NP or VBZ or ...
|
CoreAnnotations.StatementTextAnnotation |
The CoreMap key identifying the annotation's text, as formatted by the
QuestionToStatementTranslator . |
CoreAnnotations.StemAnnotation |
Stem of the word this label represents.
|
CoreAnnotations.SubcategorizationAnnotation | |
CoreAnnotations.TagLabelAnnotation |
Used in Trees
|
CoreAnnotations.TextAnnotation |
The CoreMap key identifying the annotation's text.
|
CoreAnnotations.TokenBeginAnnotation |
The CoreMap key identifying the first token included in an annotation.
|
CoreAnnotations.TokenEndAnnotation |
The CoreMap key identifying the first token after the end of an annotation.
|
CoreAnnotations.TokensAnnotation |
The CoreMap key for getting the tokens contained by an annotation.
|
CoreAnnotations.TopicAnnotation |
Used for Topic Assignments from LDA or its equivalent models.
|
CoreAnnotations.TrueCaseAnnotation |
The CoreMap key for getting the token-level true case annotation (e.g.,
INIT_UPPER)
This key is typically set on token annotations.
|
CoreAnnotations.TrueCaseTextAnnotation |
The CoreMap key identifying the annotation's true-cased text.
|
CoreAnnotations.TrueTagAnnotation | |
CoreAnnotations.UBlockAnnotation | |
CoreAnnotations.UnaryAnnotation |
whether the node is the parent in a unary rule
|
CoreAnnotations.UnclosedQuotationsAnnotation |
The CoreMap key for getting the unclosed quotations contained by an annotation.
|
CoreAnnotations.UnknownAnnotation |
Note: this is not a catchall "unknown" annotation but seems to have a
specific meaning for sequence classifiers
|
CoreAnnotations.UseMarkedDiscourseAnnotation |
used in dcoref.
|
CoreAnnotations.UtteranceAnnotation |
used in dcoref.
|
CoreAnnotations.UTypeAnnotation | |
CoreAnnotations.ValueAnnotation |
Contains the "value" - an ill-defined string used widely in MapLabel.
|
CoreAnnotations.VerbSenseAnnotation |
Propbank key for the Verb sense given in the Propbank Annotation, should
only be in the verbnode
|
CoreAnnotations.WebAnnotation | |
CoreAnnotations.WikipediaEntityAnnotation |
An annotation for the Wikipedia page (i.e., canonical name) associated with
this token.
|
CoreAnnotations.WordFormAnnotation | |
CoreAnnotations.WordnetSynAnnotation | |
CoreAnnotations.WordPositionAnnotation | |
CoreAnnotations.WordSenseAnnotation | |
CoreAnnotations.XmlContextAnnotation |
Used in CleanXMLAnnotator.
|
CoreAnnotations.XmlElementAnnotation |
Used in SimpleXMLAnnotator.
|
CoreAnnotations.YearAnnotation | |
CoreLabel |
A CoreLabel represents a single word with ancillary information
attached using CoreAnnotations.
|
CoreUtilities | |
DocumentReader<L> |
Basic mechanism for reading in Documents from various input sources.
|
IndexedWord |
This class provides a
CoreLabel that uses its
DocIDAnnotation, SentenceIndexAnnotation, and IndexAnnotation to implement
Comparable/compareTo, hashCode, and equals. |
LabeledWord |
A
LabeledWord object contains a word and its tag. |
MultiTokenTag |
Represents a tag for a multi token expression
Can be used to annotate individual tokens without
having nested annotations
|
MultiTokenTag.Tag | |
RVFDatum<L,F> |
A basic implementation of the Datum interface that can be constructed with a
Collection of features and one or more labels.
|
SegmenterCoreAnnotations | |
SegmenterCoreAnnotations.CharactersAnnotation | |
SegmenterCoreAnnotations.XMLCharAnnotation | |
SentenceUtils |
SentenceUtils holds a couple of utility methods for lists that are sentences.
|
StringLabel |
A
StringLabel object acts as a Label by containing a
single String, which it sets or returns in response to requests. |
StringLabelFactory |
A
StringLabelFactory object makes a simple
StringLabel out of a String . |
Tag |
A
Tag object acts as a Label by containing a
String that is a part-of-speech tag. |
TaggedWord |
A
TaggedWord object contains a word and its tag. |
TaggedWordFactory |
A
TaggedWordFactory acts as a factory for creating objects of
class TaggedWord . |
ValueLabel |
A
ValueLabel object acts as a Label with linguistic
attributes. |
Word |
A
Word object acts as a Label by containing a String. |
WordFactory |
A
WordFactory acts as a factory for creating objects of
class Word . |
WordLemmaTag |
A WordLemmaTag corresponds to a pair of a tagged (e.g., for part of speech)
word and its lemma.
|
WordLemmaTagFactory |
A
WordLemmaTagFactory acts as a factory for creating
objects of class WordLemmaTag . |
WordTag |
A WordTag corresponds to a tagged (e.g., for part of speech) word
and is implemented with String-valued word and tag.
|
WordTagFactory |
A
WordTagFactory acts as a factory for creating
objects of class WordTag . |
Enum | Description |
---|---|
CoreAnnotations.SRL_ID | |
CoreLabel.OutputFormat | |

This package contains the different data structures used by JavaNLP throughout the years for dealing with linguistic objects in general, of which words are the most generally used. Most data structures in this package are deprecated. The current recommendation is to represent an annotated word as a CoreMap (e.g., an ArrayCoreMap) from the util package.
CoreMap is a basic type-safe data structure that maps keys to corresponding values, where each value's type must be consistent with the key's definition. The CoreAnnotations class in this package contains many common annotations used by different portions of JavaNLP, but you can define new keys locally to a package if they aren't of general applicability. See the CoreMap unit tests for an example usage of CoreMap and of defining a key.
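The typed-key idea described above can be sketched in plain Java. What follows is a simplified, self-contained stand-in, not the actual CoreMap implementation: `Key`, `TypedMap`, `TextKey`, and `IndexKey` are hypothetical names chosen for illustration. As with CoreAnnotations, each key is a class, and the key's type parameter fixes the type of the value stored under it.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedMapSketch {

    /** Hypothetical marker interface: a key class parameterized by its value type. */
    interface Key<V> {}

    /** Hypothetical keys, defined locally the way a package might define its own. */
    static final class TextKey implements Key<String> {}
    static final class IndexKey implements Key<Integer> {}

    /** A minimal heterogeneous map from key classes to values of the matching type. */
    static final class TypedMap {
        private final Map<Class<?>, Object> values = new HashMap<>();

        /** Stores a value and returns the previous one (or null); mismatched value types do not compile. */
        <V> V set(Class<? extends Key<V>> key, V value) {
            @SuppressWarnings("unchecked")
            V previous = (V) values.put(key, value);
            return previous;
        }

        /** Returns the value stored under the key, or null if the key is absent. */
        @SuppressWarnings("unchecked")
        <V> V get(Class<? extends Key<V>> key) {
            return (V) values.get(key);
        }
    }

    public static void main(String[] args) {
        TypedMap token = new TypedMap();
        token.set(TextKey.class, "dogs");
        token.set(IndexKey.class, 3);

        String text = token.get(TextKey.class);   // no cast needed at the call site
        int index = token.get(IndexKey.class);
        System.out.println(text + " @ " + index); // prints "dogs @ 3"
        // token.set(IndexKey.class, "three");    // would not compile: String is not Integer
    }
}
```

Because the key class carries the value type, a package can declare its own keys without touching the map class, which is the property the paragraph above relies on.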
The oldest code in JavaNLP uses various types of ValueLabel, and might expect data types from the Has* family (like HasWord, HasTag, et al., denoting presence or absence of that particular annotation). Second-generation code made use of the MapLabel family (including AbstractMapLabel, FeatureLabel, and IndexedFeatureLabel), but this code has all been converted to use CoreLabel. More modern code uses CoreMap as its basic data structure. CoreLabel is a CoreMap that unifies all the families of interfaces into a single view of an underlying (Array)CoreMap.
It is recommended that new code use the ArrayCoreMap class from the util package as the base representation of a word when possible. Any CoreMap can be presented as one of the older data structures (MapLabel, HasWord, etc.), by simply wrapping it in a CoreLabel "view" with CoreLabel.forCoreMap(map).
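The "view" idea can be illustrated without the library. The sketch below is a simplified, self-contained model and not CoreLabel itself: `WordView`, its string keys, and the local `HasWord` and `HasTag` interfaces are hypothetical stand-ins written for this example. It shows a wrapper that implements the older Has* style of interface by reading from and writing to a shared backing map, so a change made through the view is visible in the map.

```java
import java.util.HashMap;
import java.util.Map;

public class LabelViewSketch {

    /** Local stand-ins for the Has* interface family (assumed shapes, for illustration only). */
    interface HasWord { String word(); void setWord(String word); }
    interface HasTag  { String tag();  void setTag(String tag); }

    /** Hypothetical view: holds no state of its own, only a reference to the backing map. */
    static final class WordView implements HasWord, HasTag {
        private final Map<String, Object> backing;

        WordView(Map<String, Object> backing) { this.backing = backing; }

        @Override public String word() { return (String) backing.get("word"); }
        @Override public void setWord(String word) { backing.put("word", word); }
        @Override public String tag() { return (String) backing.get("tag"); }
        @Override public void setTag(String tag) { backing.put("tag", tag); }
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        map.put("word", "runs");

        WordView view = new WordView(map);
        view.setTag("VBZ");  // write through the view; the backing map is updated

        System.out.println(view.word() + "/" + map.get("tag"));  // prints "runs/VBZ"
    }
}
```

Older code that expects a HasWord or HasTag can be handed the view, while newer code keeps working with the map directly.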
Legacy description: Classes for linguistic concepts which are common to many NLP classes, such as Word, Tag, etc. Also contains classes for building and operating on documents and data collections. Two of the basic interfaces are Document for representing a document as a list of words with meta-data, and DataCollection for representing a collection of documents. The most common document class you will probably use is BasicDocument, which provides support for constructing documents from a variety of input sources.