Package edu.stanford.nlp.process

Contains classes for processing documents.

Interface Summary
Function<T1,T2> An interface for classes that act as a function transforming one object to another.
LexedTokenFactory Constructs a token (of arbitrary type) from a String and its position in the underlying text.
ListProcessor<IN,OUT> An interface for things that operate on a List.
Processor Top-level interface for transforming Documents.
Tokenizer<T> Tokenizers break up text into individual Objects.
 

Class Summary
AbstractListProcessor<IN,OUT> Class AbstractListProcessor
AbstractTokenizer<T> An abstract tokenizer.
Americanize Takes a HasWord or String and returns an Americanized version of it (British spellings converted to American ones).
FeatureLabelTokenFactory  
InvertiblePTBTokenizer Tokenizer implementation that conforms to the Penn Treebank tokenization conventions.
InvertiblePTBTokenizer.InvertiblePTBTokenizerFactory  
PTBTokenizer Tokenizer implementation that conforms to the Penn Treebank tokenization conventions.
PTBTokenizer.PTBTokenizerFactory  
StripTagsProcessor A Processor whose process method deletes all SGML/XML/HTML tags (tokens starting with < and ending with >).
WordShapeClassifier Provides static methods which map any String to another String indicative of its "word shape" -- e.g., whether capitalized, numeric, etc.
WordTokenFactory Constructs a Word from a String.
WordToSentenceProcessor Transforms a Document of Words into a Document of Sentences by grouping the Words.
 

Package edu.stanford.nlp.process Description

Contains classes for processing documents. The central abstraction is the Processor interface. It declares a single method, Document process(Document), which takes a document and returns a new document that has been processed in some way (parsed, stoplisted, stemmed, etc.).
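
The document-in, document-out contract described above can be illustrated with a small, self-contained sketch. This is not the library's code: the Document class below is a hypothetical stand-in (a plain list of word strings), the Processor interface only mirrors the shape of the method named above, and TagStripper, LowercaseProcessor, and chain are illustrative names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ProcessorSketch {

    // Hypothetical stand-in for the library's Document type: a list of word strings.
    static final class Document {
        final List<String> words;

        Document(List<String> words) {
            this.words = new ArrayList<String>(words);
        }
    }

    // Mirrors the shape described above: one method, Document in, Document out.
    interface Processor {
        Document process(Document in);
    }

    // Illustrative processor: drops tokens that start with '<' and end with '>'.
    static final class TagStripper implements Processor {
        public Document process(Document in) {
            List<String> kept = new ArrayList<String>();
            for (String w : in.words) {
                if (!(w.startsWith("<") && w.endsWith(">"))) {
                    kept.add(w);
                }
            }
            return new Document(kept);
        }
    }

    // Illustrative processor: lowercases every word.
    static final class LowercaseProcessor implements Processor {
        public Document process(Document in) {
            List<String> lowered = new ArrayList<String>();
            for (String w : in.words) {
                lowered.add(w.toLowerCase(Locale.ROOT));
            }
            return new Document(lowered);
        }
    }

    // Because input and output types match, processors compose into a pipeline.
    static Processor chain(final Processor... steps) {
        return new Processor() {
            public Document process(Document in) {
                Document current = in;
                for (Processor step : steps) {
                    current = step.process(current);
                }
                return current;
            }
        };
    }

    public static void main(String[] args) {
        Document raw = new Document(Arrays.asList("<p>", "Hello", "World", "</p>"));
        Document out = chain(new TagStripper(), new LowercaseProcessor()).process(raw);
        System.out.println(out.words); // prints [hello, world]
    }
}
```

Each processor in the sketch returns a new Document rather than modifying its input, so the steps can be reordered or reused without affecting one another. Whether the real implementations copy or mutate their input is not stated here.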


Sepandar David Kamvar
Last modified: Thu Oct 31 11:14:34 PST 2002



Stanford NLP Group