edu.stanford.nlp.process
Class WordSegmentingTokenizer

java.lang.Object
  extended by edu.stanford.nlp.process.AbstractTokenizer<Word>
      extended by edu.stanford.nlp.process.WordSegmentingTokenizer
All Implemented Interfaces:
Tokenizer<Word>, java.util.Iterator<Word>

public class WordSegmentingTokenizer
extends AbstractTokenizer<Word>

A tokenizer that produces its tokens by delegating to a WordSegmenter. It is used for Chinese and Arabic, where words cannot be found by splitting on whitespace alone.

Author:
Galen Andrew

Field Summary
 
Fields inherited from class edu.stanford.nlp.process.AbstractTokenizer
nextToken
 
Constructor Summary
WordSegmentingTokenizer(WordSegmenter wordSegmenter, java.io.Reader r)
          Creates a tokenizer that segments the text read from r using wordSegmenter.
 
Method Summary
static TokenizerFactory<Word> factory(WordSegmenter wordSegmenter)
          Returns a factory for tokenizers that use the given WordSegmenter.
protected  Word getNext()
          Internally fetches the next token.
 Sentence<Word> segmentWords(java.lang.String s)
          Segments the string s into words using the underlying WordSegmenter.
 
Methods inherited from class edu.stanford.nlp.process.AbstractTokenizer
hasNext, next, peek, remove, tokenize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordSegmentingTokenizer

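public WordSegmentingTokenizer(WordSegmenter wordSegmenter,
                               java.io.Reader r)
Creates a tokenizer that segments the text read from r using wordSegmenter.

Parameters:
wordSegmenter - the segmenter used to split the text into words
r - the reader that supplies the text to tokenize
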
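
As a rough illustration of the constructor's two arguments, the sketch below pairs a segmenter with a java.io.Reader. It is self-contained and does not use the Stanford classes: Segmenter, SketchTokenizer, and the fixed-width toy segmenter twoCharChunks are hypothetical stand-ins. Reading the input one line at a time is also an assumption, not a description of how WordSegmentingTokenizer consumes its reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class SegmentingTokenizerSketch {

    /** Hypothetical stand-in for WordSegmenter: splits a string into words. */
    public interface Segmenter {
        List<String> segmentWords(String s);
    }

    /** Hypothetical stand-in for a tokenizer built from a segmenter and a reader. */
    public static class SketchTokenizer {
        private final Segmenter segmenter;
        private final BufferedReader reader;
        private Iterator<String> words = Collections.emptyIterator();

        public SketchTokenizer(Segmenter segmenter, Reader r) {
            this.segmenter = segmenter;
            this.reader = new BufferedReader(r);
        }

        /** Returns the next word, or null once the reader is exhausted. */
        public String getNext() {
            try {
                // Refill from the reader until a line yields at least one word.
                while (!words.hasNext()) {
                    String line = reader.readLine();
                    if (line == null) {
                        return null;
                    }
                    words = segmenter.segmentWords(line).iterator();
                }
                return words.next();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Toy segmenter (assumption): cuts text into chunks of at most two characters. */
    public static List<String> twoCharChunks(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i += 2) {
            out.add(s.substring(i, Math.min(i + 2, s.length())));
        }
        return out;
    }

    /** Builds a tokenizer over the given text and drains it into a list. */
    public static List<String> tokenize(String text) {
        SketchTokenizer tok = new SketchTokenizer(
            SegmentingTokenizerSketch::twoCharChunks, new StringReader(text));
        List<String> tokens = new ArrayList<>();
        for (String w = tok.getNext(); w != null; w = tok.getNext()) {
            tokens.add(w);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abcde\nfg")); // prints [ab, cd, e, fg]
    }
}
```

A real segmenter for Chinese or Arabic would replace twoCharChunks; the surrounding loop would stay the same.
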
Method Detail

getNext

protected Word getNext()
Description copied from class: AbstractTokenizer
Internally fetches the next token.

Specified by:
getNext in class AbstractTokenizer<Word>
Returns:
the next token in the token stream, or null if none exists.
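
The inherited nextToken field, together with the rule that getNext returns null when no token remains, suggests a one-token lookahead buffer. The sketch below shows how hasNext, next, and peek could be layered on getNext under that assumption. LookaheadTokenizer and ListTokenizer are hypothetical names, and this is not the source of AbstractTokenizer.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LookaheadSketch {

    /** Hypothetical base class: subclasses supply getNext(), the rest is derived. */
    public abstract static class LookaheadTokenizer<T> implements Iterator<T> {
        /** One-token buffer; null means nothing is buffered. */
        protected T nextToken;

        /** Returns the next token, or null if none exists. */
        protected abstract T getNext();

        @Override
        public boolean hasNext() {
            if (nextToken == null) {
                nextToken = getNext();
            }
            return nextToken != null;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            T result = nextToken;
            nextToken = null; // consume the buffered token
            return result;
        }

        /** Returns the next token without consuming it. */
        public T peek() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return nextToken;
        }
    }

    /** Hypothetical subclass that serves tokens from a fixed list. */
    public static class ListTokenizer extends LookaheadTokenizer<String> {
        private final Iterator<String> source;

        public ListTokenizer(List<String> tokens) {
            this.source = tokens.iterator();
        }

        @Override
        protected String getNext() {
            return source.hasNext() ? source.next() : null;
        }
    }

    /** Peeks once, then drains the tokenizer; returns "peeked|tok1,tok2,...". */
    public static String demo(List<String> tokens) {
        ListTokenizer tok = new ListTokenizer(tokens);
        StringBuilder sb = new StringBuilder(tok.peek()).append('|');
        while (tok.hasNext()) {
            sb.append(tok.next());
            if (tok.hasNext()) {
                sb.append(',');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo(Arrays.asList("a", "b", "c"))); // prints a|a,b,c
    }
}
```

Because null doubles as the "empty buffer" marker in this design, a subclass can never emit null as a real token.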

segmentWords

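public Sentence<Word> segmentWords(java.lang.String s)
Segments the string s into words using the underlying WordSegmenter.

Parameters:
s - the text to segment
Returns:
the words of s, in order, as a Sentence<Word>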

factory

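public static TokenizerFactory<Word> factory(WordSegmenter wordSegmenter)
Returns a factory for tokenizers that use the given WordSegmenter.

Parameters:
wordSegmenter - the segmenter that the created tokenizers use
Returns:
a TokenizerFactory<Word> backed by wordSegmenter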
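
A factory of this shape typically captures the segmenter once and creates a fresh tokenizer for each reader. The sketch below illustrates that pattern with hypothetical stand-in types (Segmenter, ReaderTokenizerFactory, and a local factory method). It assumes the factory exposes a single method that takes a Reader, and it does not reproduce the TokenizerFactory interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TokenizerFactorySketch {

    /** Hypothetical stand-in for WordSegmenter. */
    public interface Segmenter {
        List<String> segmentWords(String s);
    }

    /** Hypothetical factory: one tokenizer (here a plain Iterator) per reader. */
    public interface ReaderTokenizerFactory {
        Iterator<String> getTokenizer(Reader r);
    }

    /** Captures the segmenter once; each getTokenizer call wraps a new reader. */
    public static ReaderTokenizerFactory factory(Segmenter segmenter) {
        return r -> segment(segmenter, r).iterator();
    }

    /** Reads every line eagerly (a simplification) and segments each one. */
    private static List<String> segment(Segmenter segmenter, Reader r) {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(r)) {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                words.addAll(segmenter.segmentWords(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    /** Reuses one factory for two inputs; the toy segmenter splits on '-'. */
    public static List<List<String>> demo() {
        ReaderTokenizerFactory f = factory(s -> Arrays.asList(s.split("-")));
        List<List<String>> results = new ArrayList<>();
        for (String text : new String[] {"a-b", "c-d-e"}) {
            List<String> tokens = new ArrayList<>();
            f.getTokenizer(new StringReader(text)).forEachRemaining(tokens::add);
            results.add(tokens);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[a, b], [c, d, e]]
    }
}
```

Keeping the segmenter inside the factory means callers that only know about readers never need to handle the segmenter themselves.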


Stanford NLP Group