Package edu.stanford.nlp.process

Interface Summary
LexedTokenFactory<T> Constructs a token (of arbitrary type) from a String and its position in the underlying text.
Tokenizer<T> Tokenizers break up text into individual Objects.
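
As a rough illustration of how the two interfaces above can fit together, the following standalone Java sketch pairs a token factory with a tokenizer. It does not use the library. `SketchTokenFactory`, `Span`, and `SketchTokenizer` are hypothetical names invented for this example, and splitting on whitespace is an assumption chosen only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a token factory: builds a token of type T from
// the matched text and its position. Not the library's actual interface.
interface SketchTokenFactory<T> {
    T makeToken(String text, int begin, int length);
}

// A simple token type that remembers where it came from in the source text.
final class Span {
    final String text;
    final int begin;
    final int end; // exclusive

    Span(String text, int begin, int end) {
        this.text = text;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return text + "[" + begin + "," + end + ")";
    }
}

// Hypothetical tokenizer: walks the input and hands each
// whitespace-delimited run of characters to the factory.
final class SketchTokenizer<T> implements Iterator<T> {
    private final String input;
    private final SketchTokenFactory<T> factory;
    private int pos = 0;

    SketchTokenizer(String input, SketchTokenFactory<T> factory) {
        this.input = input;
        this.factory = factory;
    }

    private void skipWhitespace() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }

    @Override
    public boolean hasNext() {
        skipWhitespace();
        return pos < input.length();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int begin = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return factory.makeToken(input.substring(begin, pos), begin, pos - begin);
    }

    // Convenience method: drain the remaining tokens into a list.
    List<T> tokenize() {
        List<T> out = new ArrayList<>();
        while (hasNext()) {
            out.add(next());
        }
        return out;
    }
}
```

With a factory such as `(t, b, n) -> new Span(t, b, b + n)`, tokenizing `"  Hello big  world"` yields three `Span` objects, the first covering offsets 2 to 7. Swapping in a different factory changes the token type without touching the splitting logic, which is the separation of concerns the two interfaces suggest.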
 

Class Summary
AbstractTokenizer<T> An abstract tokenizer.
Americanize Takes a HasWord or String and returns an Americanized version of it.
CoreLabelTokenFactory Constructs CoreLabels from Strings, each with a corresponding begin and end position.
LexerTokenizer An implementation of Tokenizer designed to work with classes that implement Lexer.
Morphology Morphology computes the base form of English words by removing only inflections (not derivational morphology).
PTBTokenizer<T extends HasWord> Tokenizer implementation that conforms to the Penn Treebank tokenization conventions.
PTBTokenizer.PTBTokenizerFactory<T extends HasWord>  
TokenizerAdapter This class adapts between a java.io.StreamTokenizer and an edu.stanford.nlp.process.Tokenizer.
WhitespaceTokenizer A WhitespaceTokenizer is a tokenizer that splits on and discards only whitespace characters.
WhitespaceTokenizer.WhitespaceTokenizerFactory A factory which vends WhitespaceTokenizers.
WordTokenFactory Constructs a Word from a String.
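
The adapter idea mentioned for TokenizerAdapter can be sketched with the standard library alone. The example below wraps a `java.io.StreamTokenizer` in a plain `java.util.Iterator<String>`. It is an assumption-laden illustration, not the library's implementation: `StreamTokenizerIterator` is a hypothetical name, the target interface is `Iterator` rather than the package's own Tokenizer, and the `StreamTokenizer` is left in its default configuration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical adapter: exposes a java.io.StreamTokenizer as an Iterator<String>.
final class StreamTokenizerIterator implements Iterator<String> {
    private final StreamTokenizer st;
    private String lookahead; // next token to return, or null at end of input

    StreamTokenizerIterator(Reader reader) {
        this.st = new StreamTokenizer(reader);
        this.lookahead = advance();
    }

    // Reads one token from the wrapped StreamTokenizer and renders it as a String.
    private String advance() {
        try {
            int type = st.nextToken();
            switch (type) {
                case StreamTokenizer.TT_EOF:
                    return null;
                case StreamTokenizer.TT_WORD:
                    return st.sval;
                case StreamTokenizer.TT_NUMBER:
                    // Numbers are parsed as doubles by default.
                    return Double.toString(st.nval);
                default:
                    // Quoted strings carry their body in sval; any other
                    // character is returned as a one-character token.
                    return st.sval != null ? st.sval : String.valueOf((char) type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return lookahead != null;
    }

    @Override
    public String next() {
        if (lookahead == null) {
            throw new NoSuchElementException();
        }
        String current = lookahead;
        lookahead = advance();
        return current;
    }

    // Convenience method: tokenize a whole String into a list.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new StreamTokenizerIterator(new StringReader(text));
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

Calling `StreamTokenizerIterator.tokenize("Hello, world")` returns the three tokens `Hello`, `,`, and `world`. Note that the default `StreamTokenizer` settings treat `/` as a comment character and parse digits, `.`, and `-` as numbers, so real text usually needs the syntax table adjusted before wrapping.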
 



Stanford NLP Group