Package edu.stanford.nlp.process

Contains classes for processing documents.


Interface Summary
Appliable  
Feature This provides an interface for a feature that can be used to define a partition over the space of possible unseen words.
FeatureValue This defines an interface for the set of possible values that a Feature can assume.
Processor Top-level interface for transforming Documents.
Tokenizer Tokenizers break up text into individual Objects.
 

Class Summary
AbstractTokenizer Abstract tokenizer.
CapitalFeature Provides a partition over the set of possible unseen words that corresponds to the capitalization of characters in the word.
DocumentProcessor Processor that takes an Appliable and applies it to every element in the input Document.
DummyTokenizer Tokenizer implementation that conforms to the Penn Treebank tokenization conventions.
LowercaseProcessor Processor whose process method converts a collection of mixed-case Words to a collection of lowercase Words.
NumAndCapFeature Provides a partition over the set of possible unseen words that corresponds to the capitalization and inclusion of numbers in the word.
NumberFeature Provides a partition over the set of possible unseen words that corresponds to the formatting of numbers in the word.
NumberProcessor Processor whose process method converts numbers to the word "*NUMBER*".
PTBTokenizer Tokenizer implementation that conforms to the Penn Treebank tokenization conventions.
SentenceToWordProcessor Transforms a Document of Sentences to a Document of Words by flattening out the Sentences.
SimpleTokenizer Simple tokenizer implementation that wraps a StringTokenizer.
Stemmer Implements the Porter stemming algorithm, which transforms a word into its root form.
StopList Simple stoplist class.
StoplistFilter Filter which removes stop-listed words.
StripTagsProcessor A Processor whose process method deletes all SGML/XML/HTML tags (tokens starting with < and ending with >).
TreeToSentenceAppliable Appliable that turns a Tree into its Sentence yield.
WordExtractor Pulls the word String from a Word.
WordToSentenceProcessor Transforms a Document of Words into a Document of Sentences by grouping the Words.
WordToTaggedWordProcessor Transforms a Document of Words into a Document consisting wholly or partly of TaggedWords by splitting each word at a tag divider character.
 

Package edu.stanford.nlp.process Description

Contains classes for processing documents. The central abstraction is the Processor interface, whose sole method, Document process(Document), takes a document and returns a processed document. The result may be parsed, stoplisted, stemmed, and so on.
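Because each processor maps a document to a document, processors can be chained. The following is a minimal, self-contained sketch of that idea and does not use the package's actual classes: Doc, DocProcessor, Lowercaser, StopFilter, and runAll are hypothetical stand-ins invented for illustration, with Doc reduced to a plain list of word strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ProcessorSketch {

    // Hypothetical stand-in for a Document: an ordered list of word strings.
    static class Doc {
        final List<String> words;

        Doc(List<String> words) {
            this.words = new ArrayList<>(words);
        }
    }

    // Assumed shape only: one method, document in, processed document out.
    interface DocProcessor {
        Doc process(Doc in);
    }

    // Returns a new Doc with every word lowercased.
    static class Lowercaser implements DocProcessor {
        public Doc process(Doc in) {
            List<String> out = new ArrayList<>();
            for (String w : in.words) {
                out.add(w.toLowerCase(Locale.ROOT));
            }
            return new Doc(out);
        }
    }

    // Returns a new Doc without the words found in the stop list.
    static class StopFilter implements DocProcessor {
        private final Set<String> stopWords;

        StopFilter(Set<String> stopWords) {
            this.stopWords = stopWords;
        }

        public Doc process(Doc in) {
            List<String> out = new ArrayList<>();
            for (String w : in.words) {
                if (!stopWords.contains(w)) {
                    out.add(w);
                }
            }
            return new Doc(out);
        }
    }

    // Applies the processors in order, feeding each result to the next step.
    static Doc runAll(Doc in, DocProcessor... steps) {
        Doc current = in;
        for (DocProcessor step : steps) {
            current = step.process(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Doc in = new Doc(Arrays.asList("The", "Quick", "fox"));
        Set<String> stops = new HashSet<>(Arrays.asList("the"));
        Doc out = runAll(in, new Lowercaser(), new StopFilter(stops));
        System.out.println(out.words); // prints [quick, fox]
    }
}
```

Order matters in this sketch: the stop list holds lowercase entries, so lowercasing has to run before filtering for "The" to be removed.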


Sepandar David Kamvar
Last modified: Thu Oct 31 11:14:34 PST 2002



Stanford NLP Group