Package edu.stanford.nlp.process

Interface Summary
LexedTokenFactory<T> Constructs a token (of arbitrary type) from a String and its position in the underlying text.
ListProcessor<IN,OUT> An interface for things that operate on a List.
Processor<IN,OUT> Top-level interface for transforming Documents.
SerializableFunction<T1,T2> Combines Function and Serializable in a single interface. This is a bad idea from the perspective of the type system, but it seems more palatable than the other bad ideas available until Java's type system is flexible enough to support type conjunctions.
Tokenizer<T> Tokenizers break up text into individual Objects.
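
To illustrate how a tokenizer and a token factory can cooperate, here is a minimal, self-contained sketch. It does not use the package's own types: SketchTokenFactory, SketchToken, SketchWhitespaceTokenizer, and the makeToken signature are hypothetical names chosen for this example, and the real LexedTokenFactory and Tokenizer interfaces may differ. The tokenizer splits on whitespace, discards it, and hands each piece of text plus its position to the factory, which decides what kind of token object to build.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token factory: builds a token of type T
// from the token text and its position in the underlying text.
interface SketchTokenFactory<T> {
    T makeToken(String text, int begin, int length);
}

// Hypothetical token type that records begin and end character offsets.
final class SketchToken {
    final String text;
    final int begin; // offset of the first character
    final int end;   // offset one past the last character

    SketchToken(String text, int begin, int end) {
        this.text = text;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return text + "[" + begin + "," + end + ")";
    }
}

// Hypothetical tokenizer: splits on whitespace, discards it, and delegates
// construction of each token to the supplied factory.
final class SketchWhitespaceTokenizer {
    static <T> List<T> tokenize(String input, SketchTokenFactory<T> factory) {
        List<T> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            // Skip any run of whitespace.
            while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume a run of non-whitespace characters.
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(factory.makeToken(input.substring(start, i), start, i - start));
            }
        }
        return tokens;
    }
}

final class TokenizerSketchDemo {
    public static void main(String[] args) {
        List<SketchToken> tokens = SketchWhitespaceTokenizer.<SketchToken>tokenize(
            "The  quick fox",
            (text, begin, length) -> new SketchToken(text, begin, begin + length));
        System.out.println(tokens);
    }
}
```

Keeping token construction behind a factory means the same splitting logic can produce plain strings, offset-bearing tokens, or any other type, which is the idea behind parameterizing the tokenizer on T.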
 

Class Summary
AbstractListProcessor<IN,OUT> Abstract base class for ListProcessor implementations.
AbstractTokenizer<T> An abstract tokenizer.
Americanize Takes a HasWord or String and returns an Americanized version of it.
CoreLabelTokenFactory Constructs CoreLabels from Strings, each with corresponding begin and end positions.
DocumentPreprocessor Fully customizable preprocessor for XML, HTML, and PLAIN text documents.
Morphology Morphology computes the base form of English words, by removing just inflections (not derivational morphology).
PTBEscapingProcessor Produces a new Document of Words in which special characters of the PTB have been properly escaped.
PTBTokenizer<T> Tokenizer implementation that conforms to the Penn Treebank tokenization conventions.
PTBTokenizer.PTBTokenizerFactory<T>  
StripTagsProcessor A Processor whose process method deletes all SGML/XML/HTML tags (tokens starting with < and ending with >).
TokenizerAdapter This class adapts between a java.io.StreamTokenizer and an edu.stanford.nlp.process.Tokenizer.
WhitespaceTokenizer A WhitespaceTokenizer is a tokenizer that splits on and discards only whitespace characters.
WordTokenFactory Constructs a Word from a String.
WordToSentenceProcessor Transforms a Document of Words into a Document of Sentences by grouping the Words.
WordToTaggedWordProcessor Transforms a Document of Words into a Document consisting wholly or partly of TaggedWords by splitting each word at a tag divider character.
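
Two of the transformations described above, deleting tag tokens and splitting words at a tag divider, can be sketched in a few lines. This is an illustration only, not the package's implementation: ProcessorSketch, stripTags, and splitTags are hypothetical names, documents are modeled as plain lists of strings rather than Document objects, and splitting at the last occurrence of the divider is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; not part of edu.stanford.nlp.process.
final class ProcessorSketch {

    // Drops every token that starts with '<' and ends with '>'.
    static List<String> stripTags(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            boolean isTag = token.startsWith("<") && token.endsWith(">");
            if (!isTag) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Splits each token into {word, tag} at the last divider (an assumption).
    // Tokens with no usable divider are kept untagged: {token, null}.
    static List<String[]> splitTags(List<String> tokens, char divider) {
        List<String[]> result = new ArrayList<>();
        for (String token : tokens) {
            int idx = token.lastIndexOf(divider);
            if (idx > 0 && idx < token.length() - 1) {
                result.add(new String[] {token.substring(0, idx), token.substring(idx + 1)});
            } else {
                result.add(new String[] {token, null});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("<p>", "The/DT", "dog/NN", "barks", "</p>");
        for (String[] pair : splitTags(stripTags(tokens), '/')) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Running the tag-stripping step first matters in this sketch: a closing tag such as </p> contains the divider character and would otherwise be split like a tagged word.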
 



Stanford NLP Group