edu.stanford.nlp.process
Class WordSegmentingTokenizer

java.lang.Object
  extended by edu.stanford.nlp.process.AbstractTokenizer<HasWord>
      extended by edu.stanford.nlp.process.WordSegmentingTokenizer
All Implemented Interfaces:
Tokenizer<HasWord>, Iterator<HasWord>

public class WordSegmentingTokenizer
extends AbstractTokenizer<HasWord>

A tokenizer that produces its tokens by delegating to a WordSegmenter. It is used for Chinese and Arabic, where whitespace alone does not yield the desired word boundaries.
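
The delegation idea can be illustrated with a minimal, self-contained sketch. It does not use the Stanford classes: SegmenterSketch and SegmentingTokenizerSketch are hypothetical stand-ins for WordSegmenter and this tokenizer, and the line-by-line buffering is an assumption made for illustration, not a description of the actual implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for WordSegmenter: splits unsegmented text into words. */
interface SegmenterSketch {
    List<String> segment(String text);
}

/** Hypothetical tokenizer that buffers segmenter output and serves one word per call. */
class SegmentingTokenizerSketch implements Iterator<String> {
    private final SegmenterSketch segmenter;
    private final BufferedReader reader;
    private final Deque<String> pending = new ArrayDeque<>();

    SegmentingTokenizerSketch(SegmenterSketch segmenter, Reader r) {
        this.segmenter = segmenter;
        this.reader = new BufferedReader(r);
    }

    /** Refills the buffer line by line; returns false once the input is exhausted. */
    private boolean fill() {
        try {
            while (pending.isEmpty()) {
                String line = reader.readLine();
                if (line == null) {
                    return false;
                }
                // Assumption: the segmenter never returns null words.
                pending.addAll(segmenter.segment(line));
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return fill();
    }

    @Override
    public String next() {
        if (!fill()) {
            throw new NoSuchElementException();
        }
        return pending.removeFirst();
    }
}
```

Any object that can turn a string into a list of words can be plugged in as the segmenter; the tokenizer itself only handles buffering and iteration.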

Author:
Galen Andrew, Spence Green

Field Summary
 
Fields inherited from class edu.stanford.nlp.process.AbstractTokenizer
nextToken
 
Constructor Summary
WordSegmentingTokenizer(WordSegmenter segmenter, Reader r)
           
WordSegmentingTokenizer(WordSegmenter segmenter, Tokenizer<CoreLabel> tokenizer)
           
 
Method Summary
static TokenizerFactory<HasWord> factory(WordSegmenter wordSegmenter)
           
protected  HasWord getNext()
          Internally fetches the next token.
 
Methods inherited from class edu.stanford.nlp.process.AbstractTokenizer
hasNext, next, peek, remove, tokenize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordSegmentingTokenizer

public WordSegmentingTokenizer(WordSegmenter segmenter,
                               Reader r)

WordSegmentingTokenizer

public WordSegmentingTokenizer(WordSegmenter segmenter,
                               Tokenizer<CoreLabel> tokenizer)

Method Detail

getNext

protected HasWord getNext()
Description copied from class: AbstractTokenizer
Internally fetches the next token.

Specified by:
getNext in class AbstractTokenizer<HasWord>
Returns:
the next token in the token stream, or null if none exists.
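
The null-on-exhaustion contract of getNext is what lets the inherited hasNext, next, peek and tokenize methods be written once in a base class. The following is a simplified sketch of that look-ahead pattern, assuming a single buffered token held in a nextToken field; LookaheadTokenizerSketch and ArrayTokenizerSketch are hypothetical names, and this is not the AbstractTokenizer source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical, simplified base class; not the actual AbstractTokenizer source. */
abstract class LookaheadTokenizerSketch<T> implements Iterator<T> {
    /** One-token look-ahead buffer; null means nothing is buffered. */
    protected T nextToken;

    /** Returns the next token, or null when the stream is exhausted. */
    protected abstract T getNext();

    @Override
    public boolean hasNext() {
        if (nextToken == null) {
            nextToken = getNext();
        }
        return nextToken != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = nextToken;
        nextToken = null;
        return result;
    }

    /** Returns the upcoming token without consuming it. */
    public T peek() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return nextToken;
    }

    /** Drains the remaining tokens into a list. */
    public List<T> tokenize() {
        List<T> result = new ArrayList<>();
        while (hasNext()) {
            result.add(next());
        }
        return result;
    }
}

/** Toy subclass that serves tokens from a fixed array. */
class ArrayTokenizerSketch extends LookaheadTokenizerSketch<String> {
    private final String[] words;
    private int position = 0;

    ArrayTokenizerSketch(String... words) {
        this.words = words;
    }

    @Override
    protected String getNext() {
        return position < words.length ? words[position++] : null;
    }
}
```

With this design a subclass only has to implement getNext; a consequence is that null can never be a legitimate token value.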

factory

public static TokenizerFactory<HasWord> factory(WordSegmenter wordSegmenter)
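
A factory of this kind typically captures the segmenter once and then builds a fresh tokenizer for each input. The sketch below shows that pattern under stated assumptions: TokenizerFactorySketch and FactorySketch are hypothetical stand-ins, a plain Function plays the role of the WordSegmenter, and the whole input is read eagerly for brevity.

```java
import java.io.Reader;
import java.util.Iterator;
import java.util.List;
import java.util.Scanner;
import java.util.function.Function;

/** Hypothetical stand-in for TokenizerFactory: builds one token iterator per Reader. */
interface TokenizerFactorySketch<T> {
    Iterator<T> getTokenizer(Reader r);
}

class FactorySketch {
    /**
     * Captures a segmentation function once and returns a factory that applies it
     * to each new input. The Function stands in for a WordSegmenter.
     */
    static TokenizerFactorySketch<String> factory(Function<String, List<String>> segmenter) {
        return reader -> {
            // Read the whole input eagerly; adequate for a sketch, not for large streams.
            Scanner scanner = new Scanner(reader).useDelimiter("\\A");
            String text = scanner.hasNext() ? scanner.next() : "";
            return segmenter.apply(text).iterator();
        };
    }
}
```

The same factory instance can then be handed to code that needs to tokenize many documents without knowing which segmenter is in use.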


Stanford NLP Group