edu.stanford.nlp.process
Class SimpleTokenizer

java.lang.Object
  |
  +--edu.stanford.nlp.process.AbstractTokenizer
        |
        +--edu.stanford.nlp.process.SimpleTokenizer
All Implemented Interfaces:
Iterator, Tokenizer

public class SimpleTokenizer
extends AbstractTokenizer

A simple tokenizer implementation that wraps a java.util.StringTokenizer. Spaces and tabs separate words and are not returned. A newline also ends a word, but it is itself returned as a token. Each token is returned as an edu.stanford.nlp.trees.Word object.
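The delimiter behavior described above can be illustrated with a small standalone sketch. This is not the Stanford source. It assumes the wrapped StringTokenizer is created with returnDelims set to true and that space and tab tokens are then skipped, which is one plausible way to get the documented result. The class name DelimiterSketch and the method tokenize are hypothetical, and plain Strings stand in for Word objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in, not the Stanford class: shows one way a
// StringTokenizer can split on space/tab while keeping newlines as tokens.
class DelimiterSketch {
    // Same characters as the documented delims constant.
    static final String DELIMS = " \t\n";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true makes StringTokenizer hand back each delimiter
        // character as its own one-character token.
        StringTokenizer st = new StringTokenizer(text, DELIMS, true);
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            // Assumption: spaces and tabs are dropped, words and newlines kept.
            if (tok.equals(" ") || tok.equals("\t")) {
                continue;
            }
            tokens.add(tok);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String tok : tokenize("the cat\tsat\non the mat")) {
            // Print newlines visibly so they can be seen among the tokens.
            System.out.println(tok.equals("\n") ? "<newline>" : tok);
        }
    }
}
```

Under these assumptions, a run of several spaces yields no empty tokens, while each newline in the input yields exactly one newline token.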

Author:
Joseph Smarr (jsmarr@stanford.edu), Teg Grenager (grenager@stanford.edu)

Field Summary
protected static String delims
          Word delimiter characters used to tokenize text: " \t\n"
 
Constructor Summary
SimpleTokenizer()
          Constructs a new SimpleTokenizer.
SimpleTokenizer(Reader r)
          Constructs a new SimpleTokenizer with the Reader r as its source.
 
Method Summary
 boolean hasNext()
          Returns true if this Tokenizer has more elements.
 Object next()
          Returns the next Word token, or null if there is none.
 void setSource(Reader r)
          Sets the source of this Tokenizer to be the Reader r.
 
Methods inherited from class edu.stanford.nlp.process.AbstractTokenizer
pushBack, remove, tokenize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

delims

protected static final String delims
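Word delimiter characters used to tokenize text: " \t\n" (space, tab, and newline). Newline appears here because it ends a word, but newlines are returned as tokens rather than dropped.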

See Also:
Constant Field Values

Constructor Detail

SimpleTokenizer

public SimpleTokenizer()
Constructs a new SimpleTokenizer. No source is specified, so hasNext() returns false until a source is supplied with setSource(Reader).


SimpleTokenizer

public SimpleTokenizer(Reader r)
Constructs a new SimpleTokenizer with the Reader r as its source.

Method Detail

hasNext

public boolean hasNext()
Returns true if this Tokenizer has more elements.

Specified by:
hasNext in interface Tokenizer
Specified by:
hasNext in class AbstractTokenizer

next

public Object next()
Returns the next Word token, or null if there is none.

Specified by:
next in interface Tokenizer
Specified by:
next in class AbstractTokenizer

setSource

public void setSource(Reader r)
Sets the source of this Tokenizer to be the Reader r.

Specified by:
setSource in interface Tokenizer
Specified by:
setSource in class AbstractTokenizer
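The contracts of the constructors, hasNext(), next(), and setSource(Reader) can be sketched together as follows. This is a hypothetical stand-in written for illustration, not the Stanford implementation. The class name TokenizerLifecycleSketch is invented, tokens are plain Strings rather than Word objects, and reading the whole Reader up front is an assumption made to keep the sketch short.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

// Hypothetical stand-in that mirrors the documented method contracts.
class TokenizerLifecycleSketch {
    private StringTokenizer st;   // null until a source is set
    private String pending;       // next token to hand out, or null

    // No source: hasNext() is false and next() returns null.
    TokenizerLifecycleSketch() { }

    TokenizerLifecycleSketch(Reader r) {
        setSource(r);
    }

    // Assumption: the whole Reader is read eagerly into memory.
    void setSource(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        st = new StringTokenizer(sb.toString(), " \t\n", true);
        advance();
    }

    boolean hasNext() {
        return pending != null;
    }

    // Returns the next token, or null if there is none.
    String next() {
        String tok = pending;
        if (tok != null) {
            advance();
        }
        return tok;
    }

    // Look ahead to the next token that is not a space or a tab.
    private void advance() {
        pending = null;
        while (st != null && st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (!tok.equals(" ") && !tok.equals("\t")) {
                pending = tok;
                return;
            }
        }
    }

    public static void main(String[] args) {
        TokenizerLifecycleSketch t = new TokenizerLifecycleSketch();
        System.out.println(t.hasNext());   // no source has been set yet
        t.setSource(new StringReader("hello world\n"));
        while (t.hasNext()) {
            String tok = t.next();
            System.out.println(tok.equals("\n") ? "<newline>" : tok);
        }
        System.out.println(t.next());      // source is exhausted
    }
}
```

With the real class, next() is declared to return Object, so callers would presumably cast the result to Word before use.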


Stanford NLP Group