Package edu.stanford.nlp.process

Interface Summary
CoreTokenFactory<IN extends CoreMap> A factory for making tokens such as CoreMap or CoreLabel.
LexedTokenFactory<T> Constructs a token (of arbitrary type) from a String and its position in the underlying text.
SerializableFunction<T1,T2> This interface is a conjunction of Function and Serializable. That is a bad idea from the perspective of the type system, but it seems more palatable than the other bad ideas until Java's type system is flexible enough to support type conjunctions.
Tokenizer<T> Tokenizers break up text into individual Objects.
WordSegmenter An interface for segmenting strings into words (for languages whose text is not already segmented into words).
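
The idea behind a token factory, building a token of arbitrary type from a String and its position in the underlying text, can be sketched in plain Java. This is a minimal sketch under stated assumptions: SketchTokenFactory, Span, and split are hypothetical names invented for illustration, and none of them is the library's own interface or signature.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only. None of these types come from edu.stanford.nlp.process.
final class TokenFactorySketch {

    /** Hypothetical factory: builds a token of type T from its text and position. */
    interface SketchTokenFactory<T> {
        T makeToken(String text, int begin, int length);
    }

    /** Hypothetical token type that records where it came from in the text. */
    static final class Span {
        final String text;
        final int begin; // index of the first character
        final int end;   // exclusive: index of the character after the last one

        Span(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Splits on the space character and lets the factory choose the token type. */
    static <T> List<T> split(String input, SketchTokenFactory<T> factory) {
        List<T> tokens = new ArrayList<>();
        int begin = 0;
        for (int i = 0; i <= input.length(); i++) {
            boolean atBoundary = (i == input.length()) || input.charAt(i) == ' ';
            if (atBoundary) {
                if (i > begin) {
                    // Non-empty run of characters: hand text and position to the factory.
                    tokens.add(factory.makeToken(input.substring(begin, i), begin, i - begin));
                }
                begin = i + 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "to be or not";
        // Same splitting logic, two different token types.
        List<String> words = split(text, (s, b, len) -> s);
        List<Span> spans = split(text, (s, b, len) -> new Span(s, b, b + len));
        System.out.println(words);
        for (Span sp : spans) {
            System.out.println(sp.text + " [" + sp.begin + ", " + sp.end + ")");
        }
    }
}
```

Separating the splitting logic from token construction is what lets one tokenizer produce plain strings for one caller and offset-carrying objects for another.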

Class Summary
AbstractTokenizer<T> An abstract tokenizer.
Americanize Takes a HasWord or String and returns an Americanized version of it.
CoreLabelTokenFactory Constructs CoreLabels from Strings, optionally with begin and end character offsets into the original text (the end offset is the position of the character after the token's last character).
DocumentPreprocessor Produces a list of sentences from either a plain text or XML document.
LexerTokenizer An implementation of Tokenizer designed to work with classes that implement Lexer.
Morphology Morphology computes the base form of English words, by removing just inflections (not derivational morphology).
PTBTokenizer<T extends HasWord> Fast, rule-based tokenizer implementation, initially written to conform to the Penn Treebank tokenization conventions, but now providing a range of tokenization options over a broader space of Unicode text.
PTBTokenizer.PTBTokenizerFactory<T extends HasWord> This class provides a factory which will vend instances of PTBTokenizer which wrap a provided Reader.
TokenizerAdapter This class adapts between a java.io.StreamTokenizer and an edu.stanford.nlp.process.Tokenizer.
WhitespaceTokenizer<T extends HasWord> A WhitespaceTokenizer is a tokenizer that splits on and discards only whitespace characters.
WhitespaceTokenizer.WhitespaceTokenizerFactory<T extends HasWord> A factory which vends WhitespaceTokenizers.
WordSegmentingTokenizer A tokenizer that works by calling a WordSegmenter.
WordTokenFactory Constructs a Word from a String.
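
Two conventions from the summaries above are easier to see in code: a tokenizer that splits on and discards only whitespace, and offsets whose end position is the character after the token's end. The following is a minimal sketch in plain Java, not the library's WhitespaceTokenizer or CoreLabelTokenFactory. The class WhitespaceOffsetsSketch and its Token type are hypothetical, and using Character.isWhitespace to define whitespace is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only: this is NOT edu.stanford.nlp.process.WhitespaceTokenizer.
final class WhitespaceOffsetsSketch {

    /** Hypothetical token with half-open offsets [begin, end) into the original text. */
    static final class Token {
        final String text;
        final int begin; // index of the first character
        final int end;   // index of the character after the last one

        Token(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Splits on runs of whitespace, discards them, and keeps everything else intact. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        final int n = input.length();
        int i = 0;
        while (i < n) {
            // Skip (and discard) whitespace between tokens.
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int begin = i;
            // Consume the token: every character up to the next whitespace.
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            tokens.add(new Token(input.substring(begin, i), begin, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Dr. Smith  arrived.";
        for (Token t : tokenize(text)) {
            System.out.println(t.text + " [" + t.begin + ", " + t.end + ")");
        }
    }
}
```

With half-open offsets, input.substring(begin, end) recovers the token text directly and end - begin is its length. Punctuation stays attached to the neighboring word here because only whitespace is treated as a separator.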

Enum Summary
DocumentPreprocessor.DocType  



Stanford NLP Group