Package edu.stanford.nlp.process

Contains classes for processing documents.


Interface Summary
CoreTokenFactory<IN extends CoreMap> A factory for making tokens such as CoreMap or CoreLabel.
LexedTokenFactory<T> Constructs a token (of arbitrary type) from a String and its position in the underlying text.
ListProcessor<IN,OUT> An interface for things that operate on a List.
Tokenizer<T> Tokenizers break up text into individual Objects.
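Together, these interfaces split the work: a Tokenizer breaks text into individual objects, and a LexedTokenFactory constructs each of those objects from a String and its position in the underlying text. The stdlib-only Java sketch below illustrates that split. It does not use the CoreNLP classes. SimpleTokenFactory, OffsetToken and SketchTokenizer are hypothetical names, the tokenizer is assumed to behave like a java.util.Iterator, and it splits on whitespace only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the role of a lexed token factory: build a token
// of any type T from its text and its position in the underlying string.
interface SimpleTokenFactory<T> {
  T makeToken(String text, int begin, int length);
}

// Hypothetical token type that records character offsets
// (end = index of the character after the last one).
final class OffsetToken {
  final String word;
  final int begin;
  final int end;

  OffsetToken(String word, int begin, int end) {
    this.word = word;
    this.begin = begin;
    this.end = end;
  }

  @Override
  public String toString() {
    return word + "[" + begin + "," + end + ")";
  }
}

// A tokenizer modeled as an Iterator over tokens. It splits on whitespace
// and delegates token construction to the supplied factory.
final class SketchTokenizer<T> implements Iterator<T> {
  private final String text;
  private final SimpleTokenFactory<T> factory;
  private int pos = 0;

  SketchTokenizer(String text, SimpleTokenFactory<T> factory) {
    this.text = text;
    this.factory = factory;
  }

  private void skipWhitespace() {
    while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
      pos++;
    }
  }

  @Override
  public boolean hasNext() {
    skipWhitespace();
    return pos < text.length();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    int begin = pos;
    while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
      pos++;
    }
    return factory.makeToken(text.substring(begin, pos), begin, pos - begin);
  }

  // Drains all remaining tokens into a list.
  List<T> tokenize() {
    List<T> out = new ArrayList<>();
    while (hasNext()) {
      out.add(next());
    }
    return out;
  }
}
```

For example, tokenizing "Hello  world" with the factory `(w, b, len) -> new OffsetToken(w, b, b + len)` yields Hello[0,5) and world[7,12). Supplying a different factory changes the token type without touching the tokenizer.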
 

Class Summary
AbstractTokenizer<T> An abstract tokenizer.
Americanize Takes a HasWord or String and returns an Americanized version of it.
CoreLabelTokenFactory Constructs CoreLabels from Strings, optionally with beginning and ending offset positions (the ending offset being the character after the end) in the original text.
Morphology Morphology computes the base form of English words by removing only inflections (not derivational morphology).
PTBTokenizer<T extends HasWord> Fast, rule-based tokenizer implementation, initially written to conform to the Penn Treebank tokenization conventions, but now providing a range of tokenization options over a broader space of Unicode text.
PTBTokenizer.PTBTokenizerFactory<T extends HasWord> A factory that vends instances of PTBTokenizer, each wrapping a provided Reader.
TokenizerAdapter This class adapts between a java.io.StreamTokenizer and an edu.stanford.nlp.process.Tokenizer.
WhitespaceTokenizer<T extends HasWord> A WhitespaceTokenizer is a tokenizer that splits on and discards only whitespace characters.
WhitespaceTokenizer.WhitespaceTokenizerFactory<T extends HasWord> A factory which vends WhitespaceTokenizers.
WordShapeClassifier Provides static methods which map any String to another String indicative of its "word shape" -- e.g., whether capitalized, numeric, etc.
WordTokenFactory Constructs a Word from a String.
WordToSentenceProcessor<IN> Transforms a Document of Words into a Document of Sentences by grouping the Words.
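WordToSentenceProcessor is described as grouping a document of words into a document of sentences. As a rough, self-contained illustration of that grouping step, the hypothetical SentenceGrouper below ends a sentence after any token whose text is ".", "?" or "!". That boundary set is an assumption made for this sketch and is not taken from the real class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical, simplified analogue of sentence grouping: splits a flat token
// list into sentences, closing a sentence after an assumed sentence-final mark.
final class SentenceGrouper<IN> {
  private static final Set<String> BOUNDARIES = Set.of(".", "?", "!");
  private final Function<IN, String> wordOf; // extracts the text of a token

  SentenceGrouper(Function<IN, String> wordOf) {
    this.wordOf = wordOf;
  }

  List<List<IN>> process(List<IN> words) {
    List<List<IN>> sentences = new ArrayList<>();
    List<IN> current = new ArrayList<>();
    for (IN w : words) {
      current.add(w);
      if (BOUNDARIES.contains(wordOf.apply(w))) {
        sentences.add(current);
        current = new ArrayList<>();
      }
    }
    // Tokens after the last boundary still form a final sentence.
    if (!current.isEmpty()) {
      sentences.add(current);
    }
    return sentences;
  }
}
```

Given the tokens `Hi there . How are you ? Bye`, this returns three sentences. The trailing `Bye`, which has no closing punctuation, becomes a sentence of its own rather than being dropped.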
 

Package edu.stanford.nlp.process Description

Contains classes for processing documents. The key here is the Processor interface, whose single method, Document process(Document), takes a document and returns another, processed document (which may be parsed, stoplisted, stemmed, and so on).
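Because each processor maps a document to another document, processors compose naturally into a pipeline. The sketch below shows that pattern with hypothetical stand-ins (SimpleDocument, SimpleProcessor, Processors) rather than the package's own Document and Processor types. The `andThen` helper and the toy stoplist are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical minimal stand-in for a document: an ordered list of words.
final class SimpleDocument {
  final List<String> words;

  SimpleDocument(List<String> words) {
    this.words = List.copyOf(words);
  }
}

// One method mapping a document to a new, processed document.
interface SimpleProcessor {
  SimpleDocument process(SimpleDocument in);

  // Hypothetical helper: apply this processor, then `next`.
  default SimpleProcessor andThen(SimpleProcessor next) {
    return in -> next.process(process(in));
  }
}

final class Processors {
  private Processors() {}

  // Lowercases every word.
  static final SimpleProcessor LOWERCASE = in -> {
    List<String> out = new ArrayList<>();
    for (String w : in.words) {
      out.add(w.toLowerCase(Locale.ROOT));
    }
    return new SimpleDocument(out);
  };

  // Removes every word contained in the given stoplist.
  static SimpleProcessor stoplist(Set<String> stopWords) {
    return in -> {
      List<String> out = new ArrayList<>();
      for (String w : in.words) {
        if (!stopWords.contains(w)) {
          out.add(w);
        }
      }
      return new SimpleDocument(out);
    };
  }
}
```

Chaining `LOWERCASE` with a stoplist of "the" and "a" turns `The Cat sat on A mat` into `cat sat on mat`. Ordering matters here: lowercasing first lets a lowercase stoplist catch capitalized words.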


Sepandar David Kamvar
Last modified: Thu Oct 31 11:14:34 PST 2002



Stanford NLP Group