Package edu.stanford.nlp.process

Interface Summary
DocumentProcessor<IN,OUT,L,F> Top-level interface for transforming Documents.
LexedTokenFactory<T> Constructs a token (of arbitrary type) from a String and its position in the underlying text.
ListProcessor<IN,OUT> An interface for things that operate on a List.
Tokenizer<T> Tokenizers break up text into individual Objects.
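
The Tokenizer idea summarized above (break text into individual objects and hand them out one at a time) can be illustrated with a minimal standalone sketch. WhitespaceTokenizerSketch and its tokenize() helper are hypothetical names chosen for this example and are not part of edu.stanford.nlp.process. The sketch splits on whitespace only and does not attempt to follow Penn Treebank conventions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: NOT the library's Tokenizer<T>; it only mirrors the
// general idea of exposing tokens through an iterator-style interface.
class WhitespaceTokenizerSketch implements Iterator<String> {
    private final String text;
    private int pos = 0;

    WhitespaceTokenizerSketch(String text) {
        this.text = text;
    }

    // Advance past whitespace so pos points at the next token start
    // (or at the end of the text).
    private void skipWhitespace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    @Override
    public boolean hasNext() {
        skipWhitespace();
        return pos < text.length();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return text.substring(start, pos);
    }

    // Drain all remaining tokens into a list (assumed convenience method).
    List<String> tokenize() {
        List<String> out = new ArrayList<>();
        while (hasNext()) {
            out.add(next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new WhitespaceTokenizerSketch("The quick  brown fox.").tokenize());
    }
}
```

Keeping the tokenizer lazy (one token per call to next()) means callers can stop early on large inputs, while tokenize() covers the common case of wanting everything at once.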
 

Class Summary
AbstractListProcessor<IN,OUT,L,F> Class AbstractListProcessor
AbstractTokenizer<T> An abstract tokenizer.
Americanize Takes a HasWord or String and returns an Americanized version of it.
CoreLabelTokenFactory Constructs CoreLabels from Strings, each with a corresponding begin and end position.
Morphology Morphology computes the base form of English words, by removing just inflections (not derivational morphology).
PTBTokenizer<T extends HasWord> Tokenizer implementation that conforms to the Penn Treebank tokenization conventions.
PTBTokenizer.PTBTokenizerFactory<T extends HasWord>  
StripTagsProcessor<L,F> A Processor whose process method deletes all SGML/XML/HTML tags (tokens starting with < and ending with >).
TokenizerAdapter This class adapts between a java.io.StreamTokenizer and an edu.stanford.nlp.process.Tokenizer.
WordShapeClassifier Provides static methods which map any String to another String indicative of its "word shape" -- e.g., whether capitalized, numeric, etc.
WordTokenFactory Constructs a Word from a String.
WordToSentenceProcessor<IN,L,F> Transforms a Document of Words into a Document of Sentences by grouping the Words.
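
To make the "word shape" notion from the WordShapeClassifier entry concrete, here is a minimal standalone sketch of one possible mapping. WordShapeSketch and its shape() method are hypothetical. The scheme used here (X for uppercase, x for lowercase, d for digits, other characters kept, repeated shape characters collapsed) is an assumption for illustration and is not claimed to match any output of WordShapeClassifier.

```java
// Hypothetical sketch: one possible word-shape mapping, not the library's.
class WordShapeSketch {

    // Map each character to a shape class and collapse runs of the same class.
    static String shape(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            char s;
            if (Character.isUpperCase(c)) {
                s = 'X';
            } else if (Character.isLowerCase(c)) {
                s = 'x';
            } else if (Character.isDigit(c)) {
                s = 'd';
            } else {
                s = c; // keep punctuation and other symbols unchanged
            }
            // Append only when the shape class changes, so "Stanford" gives "Xx".
            if (sb.length() == 0 || sb.charAt(sb.length() - 1) != s) {
                sb.append(s);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Stanford", "NLP", "2024", "mid-1990s"}) {
            System.out.println(w + " -> " + shape(w));
        }
    }
}
```

Collapsing runs keeps the number of distinct shapes small, which is usually what makes such a string useful as a feature. A scheme that preserves length would simply drop the run check.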
 



Stanford NLP Group