- Type Parameters:
- T - The type of the tokens returned by the Tokenizer
- All Superinterfaces:
- IteratorFromReaderFactory<T>
- All Known Implementing Classes:
- ArabicTokenizer.ArabicTokenizerFactory, FrenchTokenizer.FrenchTokenizerFactory, PTBTokenizer.PTBTokenizerFactory, SpanishTokenizer.SpanishTokenizerFactory, TreeTokenizerFactory, WhitespaceTokenizer.WhitespaceTokenizerFactory
public interface TokenizerFactory<T>
extends IteratorFromReaderFactory<T>
A TokenizerFactory converts a java.io.Reader into a Tokenizer
(an extension of Iterator) over the objects of type T represented by the
text in that Reader. It is mainly a convenience: the
IteratorFromReaderFactory superinterface already produces an Iterator
over the same text, which you could cast down to a Tokenizer yourself.
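The relationship between the typed factory method and the "cast down" alternative can be sketched as follows. This is a minimal, self-contained illustration and not the JavaNLP source: SketchTokenizer, SketchIteratorFactory, SketchTokenizerFactory and WhitespaceFactory are hypothetical stand-ins defined here so the example runs without the library, and the whitespace splitting via java.util.Scanner is an assumption chosen for brevity.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Scanner;

class ReaderTokenizerSketch {

  // Hypothetical stand-in for Tokenizer<T>: an Iterator with one extra helper.
  interface SketchTokenizer<T> extends Iterator<T> {
    // Drains the remaining tokens into a list.
    default List<T> tokenize() {
      List<T> out = new ArrayList<>();
      while (hasNext()) {
        out.add(next());
      }
      return out;
    }
  }

  // Hypothetical stand-in for the iterator-only superinterface.
  interface SketchIteratorFactory<T> {
    Iterator<T> getIterator(Reader r);
  }

  // Hypothetical stand-in for TokenizerFactory<T>: adds a typed accessor.
  interface SketchTokenizerFactory<T> extends SketchIteratorFactory<T> {
    SketchTokenizer<T> getTokenizer(Reader r);
  }

  // Assumed example implementation: splits the Reader's text on whitespace.
  static class WhitespaceFactory implements SketchTokenizerFactory<String> {
    @Override
    public SketchTokenizer<String> getTokenizer(Reader r) {
      Scanner scanner = new Scanner(r); // Scanner's default delimiter is whitespace
      return new SketchTokenizer<String>() {
        @Override public boolean hasNext() { return scanner.hasNext(); }
        @Override public String next() { return scanner.next(); }
      };
    }

    @Override
    public Iterator<String> getIterator(Reader r) {
      // The iterator view is the same object as the tokenizer.
      return getTokenizer(r);
    }
  }

  public static void main(String[] args) {
    SketchTokenizerFactory<String> factory = new WhitespaceFactory();

    // Typed call: no cast needed.
    List<String> typed =
        factory.getTokenizer(new StringReader("The quick  brown fox")).tokenize();

    // Iterator-only call: the caller has to cast down to reach tokenize().
    SketchTokenizer<String> cast =
        (SketchTokenizer<String>) factory.getIterator(new StringReader("The quick  brown fox"));

    System.out.println(typed);
    System.out.println(cast.tokenize());
  }
}
```

Both calls yield the same tokens; the typed method only saves the caller the cast and the risk of a ClassCastException if some factory returns a plain Iterator.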
IMPORTANT NOTE:
A class that implements TokenizerFactory should also provide two static methods:
public static TokenizerFactory<? extends HasWord> newTokenizerFactory();
public static TokenizerFactory<Word> newWordTokenizerFactory(String options);
These are expected by certain JavaNLP code (e.g., LexicalizedParser),
which looks the methods up by reflection to produce a TokenizerFactory.
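How a caller might locate such static methods by reflection can be sketched as below. This is an illustration under stated assumptions, not the lookup code LexicalizedParser actually uses: SketchFactory, WhitespaceSketchFactory, load and loadWithOptions are hypothetical names, the return types are simplified to a local interface so the example runs without the library, and the "lowercase" option is invented purely to show the options string being passed through.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

class ReflectiveFactorySketch {

  // Hypothetical, simplified stand-in for a tokenizer factory.
  interface SketchFactory {
    List<String> tokenize(String text);
  }

  // Hypothetical implementation exposing the two conventional static entry points.
  public static class WhitespaceSketchFactory implements SketchFactory {
    private final String options;

    private WhitespaceSketchFactory(String options) {
      this.options = options;
    }

    public static SketchFactory newTokenizerFactory() {
      return new WhitespaceSketchFactory("");
    }

    public static SketchFactory newWordTokenizerFactory(String options) {
      return new WhitespaceSketchFactory(options);
    }

    @Override
    public List<String> tokenize(String text) {
      // "lowercase" is an invented option, used only for this illustration.
      String input = options.contains("lowercase") ? text.toLowerCase() : text;
      return Arrays.asList(input.trim().split("\\s+"));
    }
  }

  // Looks up the no-argument static method by name and invokes it.
  static SketchFactory load(String className) {
    try {
      Method m = Class.forName(className).getMethod("newTokenizerFactory");
      return (SketchFactory) m.invoke(null); // null receiver: the method is static
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("No usable newTokenizerFactory() on " + className, e);
    }
  }

  // Looks up the String-argument static method by name and invokes it.
  static SketchFactory loadWithOptions(String className, String options) {
    try {
      Method m = Class.forName(className).getMethod("newWordTokenizerFactory", String.class);
      return (SketchFactory) m.invoke(null, options);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("No usable newWordTokenizerFactory(String) on " + className, e);
    }
  }

  public static void main(String[] args) {
    // In practice the class name would come from configuration or a command-line flag.
    String className = WhitespaceSketchFactory.class.getName();
    System.out.println(load(className).tokenize("Hello  World"));
    System.out.println(loadWithOptions(className, "lowercase").tokenize("Hello World"));
  }
}
```

Because the compiler cannot enforce the presence of static methods on implementing classes, a missing method only surfaces at run time, here as an IllegalStateException wrapping the NoSuchMethodException.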
- Author:
- Christopher Manning