Interface Summary | |
Appliable | |
Feature | This provides an interface for a feature that can be used to define a partition over the space of possible unseen words. |
FeatureValue | This defines an interface for the set of possible values that a Feature can assume. |
Processor | Top-level interface for transforming Documents. |
Tokenizer | Tokenizers break up text into individual Objects. |
Class Summary | |
AbstractTokenizer | Abstract tokenizer. |
CapitalFeature | Provides a partition over the set of possible unseen words that corresponds to the capitalization of characters in the word. |
DocumentProcessor | Processor that takes an Appliable and applies it to every element in the input Document. |
DummyTokenizer | Tokenizer implementation that conforms to the Penn Treebank tokenization conventions. |
LowercaseProcessor | Processor whose process method converts a collection of mixed-case Words to a collection of lowercase Words. |
NumAndCapFeature | Provides a partition over the set of possible unseen words that corresponds to the capitalization and inclusion of numbers in the word. |
NumberFeature | Provides a partition over the set of possible unseen words that corresponds to the formatting of numbers in the word. |
NumberProcessor | Processor whose process method converts numbers to the word "*NUMBER*". |
PTBTokenizer | Tokenizer implementation that conforms to the Penn Treebank tokenization conventions. |
SentenceToWordProcessor | Transforms a Document of Sentences to a Document of Words by flattening out the Sentences. |
SimpleTokenizer | Simple tokenizer implementation that wraps a StringTokenizer. |
Stemmer | Stemmer implementing the Porter Stemming Algorithm. The Stemmer class transforms a word into its root form. |
StopList | Simple stoplist class. |
StoplistFilter | Filter which removes stop-listed words. |
StripTagsProcessor | A Processor whose process method deletes all SGML/XML/HTML tags (tokens starting with < and ending with >). |
TreeToSentenceAppliable | Appliable that turns a Tree into its Sentence yield. |
WordExtractor | Pulls the word String from a Word. |
WordToSentenceProcessor | Transforms a Document of Words into a Document of Sentences by grouping the Words. |
WordToTaggedWordProcessor | Transforms a Document of Words into a Document consisting wholly or partly of TaggedWords by breaking words on a tag divider character. |
Contains classes for processing documents. The central abstraction is the
Processor interface, whose single method, Document process(Document),
takes a document and returns another document, which may have been
parsed, stoplisted, stemmed, etc.
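Because every Processor maps a Document to a Document, processors can be applied one after another. The following is a minimal sketch of that idea, not the package's actual code: the Document class here is a hypothetical stand-in (a plain list of word strings), and the Lowercase and StripTags classes and the runAll helper are illustrative names that only loosely imitate LowercaseProcessor and StripTagsProcessor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ProcessorSketch {

    // Hypothetical stand-in: a document is just an ordered list of word strings.
    static final class Document {
        final List<String> words;

        Document(List<String> words) {
            this.words = new ArrayList<>(words); // defensive copy
        }
    }

    // Mirrors the single-method contract described above (assumed shape).
    interface Processor {
        Document process(Document in);
    }

    // Lowercases every word, in the spirit of LowercaseProcessor.
    static final class Lowercase implements Processor {
        public Document process(Document in) {
            List<String> out = new ArrayList<>();
            for (String w : in.words) {
                out.add(w.toLowerCase(Locale.ROOT));
            }
            return new Document(out);
        }
    }

    // Drops tokens that start with '<' and end with '>',
    // in the spirit of StripTagsProcessor.
    static final class StripTags implements Processor {
        public Document process(Document in) {
            List<String> out = new ArrayList<>();
            for (String w : in.words) {
                if (!(w.startsWith("<") && w.endsWith(">"))) {
                    out.add(w);
                }
            }
            return new Document(out);
        }
    }

    // Hypothetical helper: applies each stage in order,
    // feeding the output of one stage into the next.
    static Document runAll(Document doc, Processor... stages) {
        Document current = doc;
        for (Processor p : stages) {
            current = p.process(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Document raw = new Document(Arrays.asList("<b>", "Hello", "World", "</b>"));
        Document result = runAll(raw, new StripTags(), new Lowercase());
        System.out.println(result.words); // prints [hello, world]
    }
}
```

Each stage returns a new Document rather than modifying its input, so the original document remains available after the chain has run.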