Package edu.stanford.nlp.process

Interface Summary
DocumentProcessor<IN,OUT,L,F> Top-level interface for transforming Documents.
LexedTokenFactory<T> Constructs a token (of arbitrary type) from a String and its position in the underlying text.
ListProcessor<IN,OUT> An interface for things that operate on a List.
Tokenizer<T> Tokenizers break up text into individual Objects.
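To illustrate the idea of a tokenizer that hands out tokens one at a time, here is a minimal stand-alone sketch in plain Java. It does not use or reproduce the edu.stanford.nlp.process API: the class name SimpleWhitespaceTokenizer and its tokenize() helper are hypothetical, and the only behavior assumed is the one described for whitespace tokenization (split on whitespace and discard it).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not a class from edu.stanford.nlp.process.
class SimpleWhitespaceTokenizer implements Iterator<String> {
    private final String text;
    private int pos = 0;

    SimpleWhitespaceTokenizer(String text) {
        this.text = text;
    }

    // Advance past whitespace so that pos rests on the next token or at the end.
    private void skipWhitespace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    @Override
    public boolean hasNext() {
        skipWhitespace();
        return pos < text.length();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int start = pos;
        // Consume characters up to the next whitespace character or end of text.
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return text.substring(start, pos);
    }

    // Drain all remaining tokens into a list.
    List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        while (hasNext()) {
            tokens.add(next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [The, quick, brown, fox.]
        System.out.println(new SimpleWhitespaceTokenizer("  The quick\tbrown fox.\n").tokenize());
    }
}
```

Note that punctuation stays attached to the neighboring word ("fox."), since only whitespace is treated as a separator in this sketch.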
 

Class Summary
AbstractListProcessor<IN,OUT,L,F> Class AbstractListProcessor
AbstractTokenizer<T> An abstract tokenizer.
Americanize Takes a HasWord or String and returns an Americanized version of it.
CoreLabelTokenFactory Constructs CoreLabels from Strings, optionally with beginning and ending offset positions in the original text (the ending offset is the position of the character after the end of the token).
DocumentPreprocessor Fully customizable preprocessor for XML, HTML, and PLAIN text documents.
Morphology Morphology computes the base form of English words, by removing just inflections (not derivational morphology).
PTBEscapingProcessor<IN extends HasWord,L,F> Produces a new Document of Words in which special characters of the PTB have been properly escaped.
PTBTokenizer<T extends HasWord> Tokenizer implementation that conforms to the Penn Treebank tokenization conventions.
PTBTokenizer.PTBTokenizerFactory<T extends HasWord>  
StripTagsProcessor<L,F> A Processor whose process method deletes all SGML/XML/HTML tags (tokens starting with < and ending with >).
TokenizerAdapter This class adapts between a java.io.StreamTokenizer and an edu.stanford.nlp.process.Tokenizer.
TransformXML<T> Reads XML from an input file or stream and writes XML to an output file or stream, while transforming text appearing inside specified XML tags by applying a specified Function.
TransformXML.SAXInterface<T>  
WhitespaceTokenizer A WhitespaceTokenizer is a tokenizer that splits on and discards only whitespace characters.
WhitespaceTokenizer.WhitespaceTokenizerFactory A factory which vends WhitespaceTokenizers.
WordShapeClassifier Provides static methods which map any String to another String indicative of its "word shape" -- e.g., whether capitalized, numeric, etc.
WordTokenFactory Constructs a Word from a String.
WordToSentenceProcessor<IN> Transforms a Document of Words into a Document of Sentences by grouping the Words.
WordToTaggedWordProcessor<IN extends HasWord,L,F> Transforms a Document of Words into a Document consisting wholly or partly of TaggedWords by breaking words on a tag divider character.
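Two of the transformations summarized above, deleting tag tokens and breaking words on a tag divider character, can be sketched in plain Java as follows. This is a stand-alone illustration and not the library's implementation: the class TokenSketches and the methods stripTags and splitTag are hypothetical names, plain Strings stand in for Words and TaggedWords, and splitting on the last occurrence of the divider is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not a class from edu.stanford.nlp.process.
class TokenSketches {

    // Keep every token except those that start with '<' and end with '>'.
    static List<String> stripTags(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            boolean isTag = token.startsWith("<") && token.endsWith(">");
            if (!isTag) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Split "word/TAG" into {word, TAG}.
    // Assumption: the LAST divider separates word from tag, so "1/2/CD" gives {"1/2", "CD"}.
    // A token with no usable divider (absent, first, or last character) is left
    // untagged and returned as {token, null}.
    static String[] splitTag(String token, char divider) {
        int i = token.lastIndexOf(divider);
        if (i <= 0 || i == token.length() - 1) {
            return new String[] {token, null};
        }
        return new String[] {token.substring(0, i), token.substring(i + 1)};
    }

    public static void main(String[] args) {
        // prints [Hello, world]
        System.out.println(stripTags(Arrays.asList("<p>", "Hello", "world", "</p>")));
        // prints [dog, NN]
        System.out.println(Arrays.toString(splitTag("dog/NN", '/')));
        // prints [dog, null]
        System.out.println(Arrays.toString(splitTag("dog", '/')));
    }
}
```

Returning a null tag for tokens without a divider mirrors the "wholly or partly" wording: some words end up tagged while others pass through unchanged.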
 



Stanford NLP Group