edu.stanford.nlp.process
Interface Tokenizer

All Superinterfaces:
Iterator
All Known Implementing Classes:
AbstractTokenizer, LexerTokenizer

public interface Tokenizer
extends Iterator

A Tokenizer breaks text into individual tokens, each returned as an Object. These tokens may be Strings, Words, or other Objects. The source may be set at construction or reset with the setSource(Reader r) method, so a single Tokenizer instance can be used for one data source or for several.
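
The iteration contract described above can be sketched as follows. This is a minimal illustration, not the library's code: SketchTokenizer is a hypothetical stand-in for the interface, declared locally so the example compiles on its own, and WhitespaceSketchTokenizer is a hypothetical implementation that splits on whitespace. Throwing NoSuchElementException from peek() on an exhausted source is an assumption; the documentation does not specify that case.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the documented interface (not the library type).
interface SketchTokenizer extends Iterator<Object> {
    Object peek();
    List<Object> tokenize();
}

// Hypothetical implementation: tokens are whitespace-separated Strings.
class WhitespaceSketchTokenizer implements SketchTokenizer {
    private final String[] parts;
    private int pos = 0;

    WhitespaceSketchTokenizer(String text) {
        String trimmed = text.trim();
        // "".split(...) would yield one empty token, so handle empty input explicitly.
        parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    @Override public boolean hasNext() { return pos < parts.length; }

    @Override public Object next() {
        if (!hasNext()) throw new NoSuchElementException();
        return parts[pos++];
    }

    // Assumption: peek() on an exhausted source throws, mirroring next().
    @Override public Object peek() {
        if (!hasNext()) throw new NoSuchElementException();
        return parts[pos];
    }

    // Assumption: returns the tokens not yet consumed.
    @Override public List<Object> tokenize() {
        List<Object> rest = new ArrayList<>();
        while (hasNext()) rest.add(next());
        return rest;
    }

    // remove() is not overridden: since Java 8 the inherited Iterator default
    // throws UnsupportedOperationException.
}

class TokenizerIterationSketch {
    // Walks the tokenizer with the usual hasNext()/next() loop.
    static List<Object> collect(String text) {
        SketchTokenizer tok = new WhitespaceSketchTokenizer(text);
        List<Object> out = new ArrayList<>();
        while (tok.hasNext()) {
            out.add(tok.next());
        }
        return out;
    }

    public static void main(String[] args) {
        for (Object token : collect("  The quick  fox ")) {
            System.out.println(token);
        }
    }
}
```

Because tokens are returned as Object, callers typically cast each one to the concrete token type their tokenizer produces.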

Author:
Teg Grenager (grenager@stanford.edu)

Method Summary
 boolean hasNext()
          Returns true if this Tokenizer has more elements.
 Object next()
          Returns the next token from this Tokenizer.
 Object peek()
          Returns the next token without removing it from the Tokenizer, so that the same token is returned again by the next call to next() or peek().
 void remove()
          Removes from the underlying collection the last element returned by the iterator.
 List tokenize()
          Returns all tokens of this Tokenizer as a List for convenience.
 

Method Detail

next

public Object next()
Returns the next token from this Tokenizer.

Specified by:
next in interface Iterator

hasNext

public boolean hasNext()
Returns true if this Tokenizer has more elements.

Specified by:
hasNext in interface Iterator

remove

public void remove()
Removes from the underlying collection the last element returned by the iterator. This is an optional operation for Iterators, and a Tokenizer normally does not support it. This method can be called only once per call to next().

Specified by:
remove in interface Iterator
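
Since remove() is optional, callers should be prepared for it to fail. The sketch below shows one way an implementation might decline the operation and how a caller could detect that. ReadOnlyTokens is a hypothetical class written for this example, and throwing UnsupportedOperationException follows the general Iterator convention rather than anything stated for a particular Tokenizer.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical read-only token source; not one of the library's classes.
class ReadOnlyTokens implements Iterator<Object> {
    private final String[] parts;
    private int pos = 0;

    ReadOnlyTokens(String... parts) { this.parts = parts; }

    @Override public boolean hasNext() { return pos < parts.length; }

    @Override public Object next() {
        if (!hasNext()) throw new NoSuchElementException();
        return parts[pos++];
    }

    // Declines the optional operation, as the Iterator contract permits.
    @Override public void remove() {
        throw new UnsupportedOperationException("tokens cannot be removed");
    }
}

class RemoveSketch {
    // Returns true if the iterator rejects remove() after one call to next().
    static boolean removeIsRejected(Iterator<Object> it) {
        it.next();
        try {
            it.remove();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeIsRejected(new ReadOnlyTokens("a", "b")));
    }
}
```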

peek

public Object peek()
Returns the next token without removing it from the Tokenizer, so that the same token is returned again by the next call to next() or peek().
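
The difference between peek() and next() can be illustrated with a small sketch. PeekableTokens is a hypothetical array-backed class written for this example, not a library type, and its behaviour on an exhausted source (throwing NoSuchElementException) is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class PeekSketch {
    // Hypothetical array-backed token source with a peek() method.
    static class PeekableTokens implements Iterator<Object> {
        private final Object[] tokens;
        private int pos = 0;

        PeekableTokens(Object... tokens) { this.tokens = tokens; }

        @Override public boolean hasNext() { return pos < tokens.length; }

        // Returns the current token and advances past it.
        @Override public Object next() {
            if (!hasNext()) throw new NoSuchElementException();
            return tokens[pos++];
        }

        // Returns the current token without advancing.
        public Object peek() {
            if (!hasNext()) throw new NoSuchElementException();
            return tokens[pos];
        }
    }

    // Records what two peeks followed by two nexts return.
    static List<Object> demo() {
        PeekableTokens t = new PeekableTokens("New", "York");
        List<Object> seen = new ArrayList<>();
        seen.add(t.peek());
        seen.add(t.peek());
        seen.add(t.next());
        seen.add(t.next());
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [New, New, New, York]
    }
}
```

One-token lookahead of this kind lets a caller decide how to handle the current token based on what follows it, without consuming the following token.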


tokenize

public List tokenize()
Returns all tokens of this Tokenizer as a List for convenience.
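
A convenience method of this kind is roughly equivalent to draining the iterator into a list. The helper below is a sketch under that assumption. It is a hypothetical static method written for illustration, not the library's implementation, and it collects only the tokens that have not yet been consumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TokenizeSketch {
    // Hypothetical helper: drains any token iterator into a List.
    static List<Object> tokenize(Iterator<Object> tokens) {
        List<Object> all = new ArrayList<>();
        while (tokens.hasNext()) {
            all.add(tokens.next());
        }
        return all;
    }

    public static void main(String[] args) {
        // A plain list iterator stands in for a real Tokenizer here.
        Iterator<Object> source = Arrays.<Object>asList("The", "cat", "sat").iterator();
        System.out.println(tokenize(source)); // prints [The, cat, sat]
    }
}
```

After such a call the source is exhausted, so hasNext() returns false until a new source is supplied.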



Stanford NLP Group