edu.stanford.nlp.process
Class WordSegmentingTokenizer
java.lang.Object
edu.stanford.nlp.process.AbstractTokenizer<Word>
edu.stanford.nlp.process.WordSegmentingTokenizer
- All Implemented Interfaces:
- Tokenizer<Word>, Iterator<Word>
public class WordSegmentingTokenizer
- extends AbstractTokenizer<Word>
A tokenizer that delegates word segmentation to a WordSegmenter.
It is used for Chinese and Arabic, where words cannot be recovered by splitting on whitespace alone.
- Author:
- Galen Andrew
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
WordSegmentingTokenizer
public WordSegmentingTokenizer(WordSegmenter wordSegmenter, Reader r)
getNext
protected Word getNext()
- Description copied from class:
AbstractTokenizer
- Internally fetches the next token.
- Specified by:
- getNext in class AbstractTokenizer<Word>
- Returns:
- the next token in the token stream, or null if none exists.
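The getNext contract (return the next token, or null once the stream is exhausted) lets a base class supply the public Iterator methods through one token of lookahead. The following is a minimal, self-contained sketch of that pattern and not the library's source: `LookaheadTokenizer`, `LineSegmentingTokenizer`, and `perCharacter` are hypothetical stand-ins, a plain `Function<String, List<String>>` takes the place of `WordSegmenter`, and segmenting the input one line at a time is an assumption made only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

final class GetNextSketch {

  /** Hypothetical stand-in for AbstractTokenizer: builds Iterator on top of getNext(). */
  abstract static class LookaheadTokenizer<T> implements Iterator<T> {
    private T lookahead; // token fetched but not yet handed out, or null

    /** Returns the next token, or null when the stream is exhausted. */
    protected abstract T getNext();

    @Override public boolean hasNext() {
      if (lookahead == null) lookahead = getNext();
      return lookahead != null;
    }

    @Override public T next() {
      if (!hasNext()) throw new NoSuchElementException();
      T result = lookahead;
      lookahead = null; // force a fresh getNext() on the following call
      return result;
    }
  }

  /** Hypothetical subclass: segments the input one line at a time (an assumption). */
  static final class LineSegmentingTokenizer extends LookaheadTokenizer<String> {
    private final Function<String, List<String>> segmenter; // stand-in for WordSegmenter
    private final BufferedReader reader;
    private Iterator<String> pending = Collections.emptyIterator();

    LineSegmentingTokenizer(Function<String, List<String>> segmenter, Reader r) {
      this.segmenter = segmenter;
      this.reader = new BufferedReader(r);
    }

    @Override protected String getNext() {
      try {
        // Refill from the next line until some words are available.
        while (!pending.hasNext()) {
          String line = reader.readLine();
          if (line == null) return null; // end of input
          pending = segmenter.apply(line).iterator();
        }
        return pending.next();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Toy segmenter for demonstration only: one word per character. */
  static List<String> perCharacter(String s) {
    List<String> words = new ArrayList<>();
    for (int i = 0; i < s.length(); i++) words.add(String.valueOf(s.charAt(i)));
    return words;
  }

  public static void main(String[] args) {
    Iterator<String> tok =
        new LineSegmentingTokenizer(GetNextSketch::perCharacter, new StringReader("ab\ncd"));
    List<String> words = new ArrayList<>();
    tok.forEachRemaining(words::add);
    System.out.println(words); // prints [a, b, c, d]
  }
}
```

Because null signals the end of the stream in this design, a tokenizer built this way cannot emit null as a legitimate token.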
segmentWords
public ArrayList<Word> segmentWords(String s)
factory
public static TokenizerFactory<Word> factory(WordSegmenter wordSegmenter)
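A factory of this shape binds one segmenter up front and then produces a fresh tokenizer for each input. The sketch below illustrates the idea with self-contained stand-ins rather than the library's own types: `SimpleSegmenter`, `SimpleTokenizerFactory`, and `readAll` are hypothetical names, and reading the whole input before segmenting is an assumption, not documented behavior.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class SegmenterFactorySketch {

  /** Hypothetical stand-in for WordSegmenter. */
  interface SimpleSegmenter {
    List<String> segment(String text);
  }

  /** Hypothetical stand-in for TokenizerFactory: one tokenizer per Reader. */
  interface SimpleTokenizerFactory {
    Iterator<String> getTokenizer(Reader r);
  }

  /** Binds the segmenter once; every tokenizer the factory creates reuses it. */
  static SimpleTokenizerFactory factory(SimpleSegmenter segmenter) {
    return r -> segmenter.segment(readAll(r)).iterator();
  }

  /** Reads the full contents of the reader into a string. */
  static String readAll(Reader r) {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[4096];
    int n;
    try {
      while ((n = r.read(buf)) != -1) sb.append(buf, 0, n);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Toy segmenter for demonstration only: split on single spaces.
    SimpleTokenizerFactory f = factory(s -> Arrays.asList(s.split(" ")));
    Iterator<String> first = f.getTokenizer(new StringReader("one two"));
    Iterator<String> second = f.getTokenizer(new StringReader("three"));
    first.forEachRemaining(System.out::println);
    second.forEachRemaining(System.out::println);
  }
}
```

Separating the segmenter from the Reader in this way lets a single, possibly expensive-to-load segmenter serve many documents.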
Stanford NLP Group