Interface Summary |
CoreTokenFactory<IN extends CoreMap> |
A factory for making tokens such as CoreMap or CoreLabel. |
DocumentProcessor<IN,OUT,L,F> |
Top-level interface for transforming Documents. |
LexedTokenFactory<T> |
Constructs a token (of arbitrary type) from a String and its position
in the underlying text. |
ListProcessor<IN,OUT> |
An interface for things that operate on a List. |
SerializableFunction<T1,T2> |
This interface is a conjunction of Function and Serializable, which is
a bad idea from the perspective of the type system, but one that seems
more palatable than other bad ideas until Java's type system is flexible
enough to support type conjunctions. |
Tokenizer<T> |
Tokenizers break up text into individual Objects. |
WordSegmenter |
An interface for segmenting strings into words
(in languages written without spaces between words). |
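
The Tokenizer and LexedTokenFactory descriptions above can be illustrated with a small, library-free sketch. Everything named here (`OffsetTokenizerSketch`, `TokenFactory`, `Span`, `tokenize`) is hypothetical and written for illustration only; it is not the package's code and does not claim to match its signatures. It assumes that an end offset means the position of the character after the token's last character.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetTokenizerSketch {

    /** Hypothetical stand-in for a token factory: builds a token from its text and position. */
    interface TokenFactory<T> {
        T makeToken(String text, int begin, int length);
    }

    /** Hypothetical token type that records the text and its character offsets. */
    static final class Span {
        final String text;
        final int begin; // offset of the first character
        final int end;   // offset of the character after the last one

        Span(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + begin + "," + end + ")";
        }
    }

    /** Splits on whitespace, discards it, and passes each remaining run to the factory. */
    static <T> List<T> tokenize(String input, TokenFactory<T> factory) {
        List<T> tokens = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            // Skip any whitespace before the next token.
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the run of non-whitespace characters.
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(factory.makeToken(input.substring(start, i), start, i - start));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Span> spans = tokenize("The  quick fox",
                (text, begin, length) -> new Span(text, begin, begin + length));
        System.out.println(spans); // prints [The[0,3), quick[5,10), fox[11,14)]
    }
}
```

Keeping the factory separate from the splitting loop means the same loop can yield plain strings, offset-carrying spans, or richer label objects without change, which appears to be the motivation for having token factory interfaces alongside Tokenizer.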
Class Summary |
AbstractListProcessor<IN,OUT,L,F> |
Class AbstractListProcessor |
AbstractTokenizer<T> |
An abstract tokenizer. |
Americanize |
Takes a HasWord or String and returns an Americanized version of it. |
CoreLabelTokenFactory |
Constructs CoreLabels from Strings, optionally with
beginning and ending (character after the end) offset positions in
an original text. |
DocumentPreprocessor |
Produces a list of sentences from either a plain text or XML document. |
LexerTokenizer |
An implementation of Tokenizer designed to work with
Lexer implementing classes. |
Morphology |
Morphology computes the base form of English words, by removing just
inflections (not derivational morphology). |
PTBEscapingProcessor<IN extends HasWord,L,F> |
Produces a new Document of Words in which special characters of the PTB
have been properly escaped. |
PTBTokenizer<T extends HasWord> |
Fast, rule-based tokenizer implementation, initially written to
conform to the Penn Treebank tokenization conventions, but now providing
a range of tokenization options over a broader space of Unicode text. |
PTBTokenizer.PTBTokenizerFactory<T extends HasWord> |
This class provides a factory which will vend instances of PTBTokenizer
which wrap a provided Reader. |
StripTagsProcessor<L,F> |
A Processor whose process method deletes all
SGML/XML/HTML tags (tokens starting with < and ending
with >). |
TokenizerAdapter |
This class adapts between a java.io.StreamTokenizer
and an edu.stanford.nlp.process.Tokenizer. |
WhitespaceTokenizer<T extends HasWord> |
A WhitespaceTokenizer is a tokenizer that splits on and discards only
whitespace characters. |
WhitespaceTokenizer.WhitespaceTokenizerFactory<T extends HasWord> |
A factory which vends WhitespaceTokenizers. |
WordSegmentingTokenizer |
A tokenizer that works by calling a WordSegmenter. |
WordTokenFactory |
Constructs a Word from a String. |
WordToSentenceProcessor<IN> |
Transforms a Document of Words into a Document of Sentences by grouping the
Words. |
WordToTaggedWordProcessor<IN extends HasWord,L,F> |
Transforms a Document of Words into a Document consisting wholly or partly of
TaggedWords by breaking words on a tag divider character. |
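
Two of the processors summarized above, StripTagsProcessor and WordToTaggedWordProcessor, describe simple token-level transformations that a short sketch can make concrete. The class and method names below (`ProcessorSketch`, `stripTags`, `splitTag`) are hypothetical, the code works on plain strings rather than the package's Document and Word types, and splitting at the last occurrence of the divider is an assumption of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProcessorSketch {

    /** Drops tokens that look like SGML/XML/HTML tags: they start with '<' and end with '>'. */
    static List<String> stripTags(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            boolean isTag = token.startsWith("<") && token.endsWith(">");
            if (!isTag) {
                kept.add(token);
            }
        }
        return kept;
    }

    /**
     * Breaks "word/TAG" on the divider character and returns {word, tag}.
     * Returns {token, null} when there is no usable divider, so the result
     * can be only partly tagged. Splitting at the LAST divider is an assumption.
     */
    static String[] splitTag(String token, char divider) {
        int idx = token.lastIndexOf(divider);
        if (idx <= 0 || idx == token.length() - 1) {
            return new String[] { token, null };
        }
        return new String[] { token.substring(0, idx), token.substring(idx + 1) };
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("<p>", "Dogs/NNS", "bark", "</p>");
        for (String token : stripTags(raw)) {
            String[] parts = splitTag(token, '/');
            System.out.println(parts[0] + " -> " + parts[1]);
        }
        // prints:
        // Dogs -> NNS
        // bark -> null
    }
}
```

Splitting at the last divider keeps a token such as `1/2/CD` intact as the word `1/2` with tag `CD`; splitting at the first divider would be the other obvious choice and gives a different result for such tokens.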