Interface Summary |
DocumentProcessor<IN,OUT,L,F> |
Top-level interface for transforming Documents. |
LexedTokenFactory<T> |
Constructs a token (of arbitrary type) from a String and its position
in the underlying text. |
ListProcessor<IN,OUT> |
An interface for things that operate on a List. |
SerializableFunction<T1,T2> |
This interface is a conjunction of Function and Serializable, which is
a bad idea from the perspective of the type system, but one that seems
more palatable than other bad ideas until Java's type system is flexible
enough to support type conjunctions. |
Tokenizer<T> |
Tokenizers break up text into individual Objects. |
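The conjunction that SerializableFunction describes can be illustrated with a minimal sketch. It assumes java.util.function.Function as a stand-in for the Function type the package refers to, and the names SerFn and SerFnSketch are hypothetical, not part of this package. The idea shown is only that a named interface extending both types lets a value be applied as a function and also written to an object stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical stand-in: one named interface that is both a Function and Serializable.
interface SerFn<T1, T2> extends Function<T1, T2>, Serializable {}

final class SerFnSketch {

    /** Writes the function to an in-memory byte stream and reads it back. */
    @SuppressWarnings("unchecked")
    static <A, B> SerFn<A, B> roundTrip(SerFn<A, B> fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerFn<A, B>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // Wrapped so that callers need no checked-exception handling.
            throw new IllegalStateException(e);
        }
    }

    /** A lambda targeting SerFn is serializable because SerFn extends Serializable. */
    static SerFn<String, String> upperCase() {
        return s -> s.toUpperCase();
    }
}
```

For example, SerFnSketch.roundTrip(SerFnSketch.upperCase()) yields a function that still upper-cases its argument after being serialized and deserialized.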
Class Summary |
AbstractListProcessor<IN,OUT,L,F> |
Abstract base class for ListProcessor implementations. |
AbstractTokenizer<T> |
An abstract tokenizer. |
Americanize |
Takes a HasWord or String and returns an Americanized version of it. |
CoreLabelTokenFactory |
Constructs CoreLabels from Strings, each with a corresponding BEGIN and END position. |
DocumentPreprocessor |
Fully customizable preprocessor for XML, HTML, and PLAIN text documents. |
Morphology |
Morphology computes the base form of English words, by removing just
inflections (not derivational morphology). |
PTBEscapingProcessor<IN extends HasWord,L,F> |
Produces a new Document of Words in which special characters of the PTB
have been properly escaped. |
PTBTokenizer<T extends HasWord> |
Tokenizer implementation that conforms to the Penn Treebank tokenization
conventions. |
PTBTokenizer.PTBTokenizerFactory<T extends HasWord> |
|
StripTagsProcessor<L,F> |
A Processor whose process method deletes all
SGML/XML/HTML tags (tokens starting with < and ending
with >). |
TokenizerAdapter |
This class adapts between a java.io.StreamTokenizer
and an edu.stanford.nlp.process.Tokenizer. |
TransformXML |
Reads XML from an input file or stream and writes XML to an output
file or stream, while transforming text appearing inside specified
XML tags by applying a specified Function. |
TransformXML.SAXInterface |
|
WhitespaceTokenizer |
A WhitespaceTokenizer is a tokenizer that splits on and discards only
whitespace characters. |
WhitespaceTokenizer.WhitespaceTokenizerFactory |
A factory which vends WhitespaceTokenizers. |
WordShapeClassifier |
Provides static methods which
map any String to another String indicative of its "word shape" -- e.g.,
whether capitalized, numeric, etc. |
WordTokenFactory |
Constructs a Word from a String. |
WordToSentenceProcessor<IN,L,F> |
Transforms a Document of Words into a Document of Sentences by grouping the
Words. |
WordToTaggedWordProcessor<IN extends HasWord,L,F> |
Transforms a Document of Words into a Document consisting wholly or partly of
TaggedWords by breaking words on a tag divider character. |
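As a rough illustration of the two simplest ideas in this summary, splitting on whitespace and breaking a word on a tag divider character, the sketch below uses only the Java standard library. All names in it (TagSplitSketch, TaggedToken, splitOnWhitespace, splitTag, process) are hypothetical and are not the package's API. It also assumes that the last occurrence of the divider separates word from tag and that a token without a usable divider stays untagged; the actual classes may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of these are classes from this package.
final class TagSplitSketch {

    /** A word plus an optional tag; tag is null when no usable divider was found. */
    static final class TaggedToken {
        final String word;
        final String tag;

        TaggedToken(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    /** Splits on runs of whitespace and discards the whitespace itself. */
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {   // an empty input yields a single empty piece
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Breaks a token at the last divider (assumption: the last occurrence wins). */
    static TaggedToken splitTag(String token, char divider) {
        int i = token.lastIndexOf(divider);
        // No divider, or a divider at either end, leaves the token untagged.
        if (i <= 0 || i == token.length() - 1) {
            return new TaggedToken(token, null);
        }
        return new TaggedToken(token.substring(0, i), token.substring(i + 1));
    }

    /** Tokenizes on whitespace, then splits each token on the divider. */
    static List<TaggedToken> process(String text, char divider) {
        List<TaggedToken> result = new ArrayList<>();
        for (String token : splitOnWhitespace(text)) {
            result.add(splitTag(token, divider));
        }
        return result;
    }
}
```

Calling TagSplitSketch.process("The/DT dog/NN barks", '/') gives three tokens, the first two tagged and the last untagged, which mirrors the "all or partly" wording in the WordToTaggedWordProcessor description.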