edu.stanford.nlp.process
Class WordSegmentingTokenizer
java.lang.Object
edu.stanford.nlp.process.AbstractTokenizer<HasWord>
edu.stanford.nlp.process.WordSegmentingTokenizer
- All Implemented Interfaces:
- Tokenizer<HasWord>, Iterator<HasWord>
public class WordSegmentingTokenizer
- extends AbstractTokenizer<HasWord>
A tokenizer that produces its tokens by delegating to a WordSegmenter.
It is used for Chinese and Arabic, where splitting on whitespace alone does not yield the desired word tokens.
- Author:
- Galen Andrew, Spence Green
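The delegation described above can be sketched without the library. This is a minimal illustration under stated assumptions, not the class's actual implementation: `SegmentingTokenizerSketch`, `charSegmenter`, and `tokenizeAll` are hypothetical names, a `Function<String, List<String>>` stands in for the WordSegmenter, and the input is assumed to be handed to the segmenter one line at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a segmenter-backed tokenizer; not the library class. */
class SegmentingTokenizerSketch {
    // Plays the role of the WordSegmenter: maps one chunk of text to its words.
    private final Function<String, List<String>> segmenter;
    private final BufferedReader reader;
    // Words already segmented but not yet handed out.
    private final Deque<String> pending = new ArrayDeque<>();

    SegmentingTokenizerSketch(Function<String, List<String>> segmenter, Reader r) {
        this.segmenter = segmenter;
        this.reader = new BufferedReader(r);
    }

    /** Returns the next word, or null once the input is exhausted. */
    String getNext() {
        try {
            // Refill from the segmenter one line at a time; empty results are skipped.
            while (pending.isEmpty()) {
                String line = reader.readLine();
                if (line == null) {
                    return null;
                }
                pending.addAll(segmenter.apply(line));
            }
            return pending.poll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Toy segmenter (an assumption for the demo): one word per non-whitespace character. */
    static List<String> charSegmenter(String text) {
        List<String> words = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> words.add(new String(Character.toChars(cp))));
        return words;
    }

    /** Convenience helper that drains the tokenizer into a list. */
    static List<String> tokenizeAll(Function<String, List<String>> segmenter, String text) {
        SegmentingTokenizerSketch tok =
            new SegmentingTokenizerSketch(segmenter, new StringReader(text));
        List<String> out = new ArrayList<>();
        for (String w = tok.getNext(); w != null; w = tok.getNext()) {
            out.add(w);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [a, b, c, d, e]; the blank line contributes no tokens.
        System.out.println(tokenizeAll(SegmentingTokenizerSketch::charSegmenter, "ab c\n\nde"));
    }
}
```

In real use the per-character toy would be replaced by a trained segmenter; the point of the sketch is only that the tokenizer owns the input stream and buffering while the segmenter decides where the word boundaries fall.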
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
WordSegmentingTokenizer
public WordSegmentingTokenizer(WordSegmenter segmenter, Reader r)
WordSegmentingTokenizer
public WordSegmentingTokenizer(WordSegmenter segmenter, Tokenizer<CoreLabel> tokenizer)
getNext
protected HasWord getNext()
- Description copied from class: AbstractTokenizer
- Internally fetches the next token.
- Specified by:
- getNext in class AbstractTokenizer<HasWord>
- Returns:
- the next token in the token stream, or null if none exists.
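The null-at-end contract of getNext is what lets a base class supply the Iterator methods on a subclass's behalf. The sketch below shows one way this can be done with a single token of lookahead. It is an illustration under assumptions, not the source of AbstractTokenizer: `LookaheadTokenizerSketch` and `ArrayBackedTokenizerSketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical base class: builds hasNext()/next() on top of getNext(). */
abstract class LookaheadTokenizerSketch<T> implements Iterator<T> {
    // Token fetched ahead of time by hasNext(); null means nothing is buffered.
    private T lookahead;

    /** Subclasses return the next token, or null if none exists. */
    protected abstract T getNext();

    @Override
    public boolean hasNext() {
        if (lookahead == null) {
            lookahead = getNext();
        }
        return lookahead != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = lookahead;
        lookahead = null; // consume the buffered token
        return result;
    }

    /** Collects all remaining tokens into a list. */
    List<T> tokenize() {
        List<T> out = new ArrayList<>();
        while (hasNext()) {
            out.add(next());
        }
        return out;
    }
}

/** Hypothetical subclass that only has to implement getNext(). */
class ArrayBackedTokenizerSketch extends LookaheadTokenizerSketch<String> {
    private final String[] words;
    private int pos = 0;

    ArrayBackedTokenizerSketch(String... words) {
        this.words = words;
    }

    @Override
    protected String getNext() {
        return pos < words.length ? words[pos++] : null;
    }

    public static void main(String[] args) {
        // Prints [one, two, three]
        System.out.println(new ArrayBackedTokenizerSketch("one", "two", "three").tokenize());
    }
}
```

Because null signals the end of the stream in this design, a token itself can never be null; a subclass that needs to represent an empty token has to use a non-null placeholder object.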
factory
public static TokenizerFactory<HasWord> factory(WordSegmenter wordSegmenter)
Stanford NLP Group