edu.stanford.nlp.process
Interface Tokenizer

All Superinterfaces:
Iterator
All Known Implementing Classes:
AbstractTokenizer

public interface Tokenizer
extends Iterator

Tokenizers break up text into individual Objects. These objects may be Strings, Words, or other Objects. The source may be supplied when an implementing class is constructed, or reset later with the setSource(Reader r) method, so a single Tokenizer instance can be used for one data source or for several.
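To make the contract concrete, here is a minimal sketch of one possible implementation, assuming plain whitespace splitting. The interface is redeclared locally as a hypothetical generic `Tokenizer<T>` so the example compiles without the Stanford library (the documented interface uses raw Object tokens); `TokenizerSketch` and `WhitespaceTokenizer` are illustrative names, not classes from this package. The demo supplies a source at construction, then replaces it with `setSource`, so one instance serves two inputs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class TokenizerSketch {
    // Hypothetical stand-in for the documented interface, declared locally so the
    // sketch compiles on its own; the real interface uses raw Object tokens.
    interface Tokenizer<T> extends Iterator<T> {
        void pushBack();
        void setSource(Reader r);
        List<T> tokenize();
    }

    // Hypothetical implementation: a token is a maximal run of non-whitespace characters.
    static final class WhitespaceTokenizer implements Tokenizer<String> {
        private Reader in;
        private final Deque<String> pending = new ArrayDeque<>(); // read-ahead and pushed-back tokens
        private String last;                                      // last token returned by next()

        WhitespaceTokenizer(Reader r) { setSource(r); }   // source set at construction

        @Override public void setSource(Reader r) {       // ...or reset later
            in = r;
            pending.clear();
            last = null;
        }

        // Reads one token from the Reader; returns null at end of input.
        private String readToken() {
            StringBuilder sb = new StringBuilder();
            try {
                int c;
                while ((c = in.read()) != -1) {
                    if (!Character.isWhitespace(c)) {
                        sb.append((char) c);
                    } else if (sb.length() > 0) {
                        break;                             // whitespace ends a token
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return sb.length() > 0 ? sb.toString() : null;
        }

        @Override public boolean hasNext() {
            if (pending.isEmpty()) {
                String t = readToken();
                if (t != null) pending.addLast(t);
            }
            return !pending.isEmpty();
        }

        @Override public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            last = pending.removeFirst();
            return last;
        }

        @Override public void pushBack() {
            if (last == null) throw new IllegalStateException("no token to push back");
            pending.addFirst(last);
            last = null;                                   // one level of pushback in this sketch
        }

        @Override public void remove() {
            throw new UnsupportedOperationException("remove");  // no underlying collection
        }

        @Override public List<String> tokenize() {
            List<String> out = new ArrayList<>();
            while (hasNext()) out.add(next());
            return out;
        }
    }

    public static void main(String[] args) {
        WhitespaceTokenizer tok = new WhitespaceTokenizer(new StringReader("the quick  fox"));
        System.out.println(tok.tokenize());               // [the, quick, fox]
        tok.setSource(new StringReader("jumps\tover"));   // reuse the same instance
        System.out.println(tok.tokenize());               // [jumps, over]
    }
}
```

Reading one token ahead into a small deque lets `hasNext()` answer without losing input, and lets `pushBack()` work even after `hasNext()` has already peeked at the following token.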

Author:
Teg Grenager (grenager@stanford.edu)

Method Summary
 boolean hasNext()
          Returns true if this Tokenizer has more elements.
 Object next()
          Returns the next token from this Tokenizer.
 void pushBack()
          Pushes the last token returned back onto this Tokenizer, so that it is returned again by the next call to next().
 void remove()
          Removes from the underlying collection the last element returned by the iterator (optional operation).
 void setSource(Reader r)
          Sets the source for this Tokenizer.
 List tokenize()
          Returns all tokens of this Tokenizer as a List for convenience.
 

Method Detail

next

public Object next()
Returns the next token from this Tokenizer.

Specified by:
next in interface Iterator

hasNext

public boolean hasNext()
Returns true if this Tokenizer has more elements.

Specified by:
hasNext in interface Iterator
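Because next() and hasNext() come from java.util.Iterator, a caller checks hasNext() before each next(). The sketch below is a hypothetical adapter around the JDK's `java.util.StringTokenizer`, not a class in this package; it illustrates the usual contract: `hasNext()` consumes nothing, and `next()` past the end throws `NoSuchElementException`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

class IterationDemo {
    // Hypothetical adapter: exposes java.util.StringTokenizer through the Iterator contract.
    static final class StringTokenIterator implements Iterator<String> {
        private final StringTokenizer st;

        StringTokenIterator(String text) { st = new StringTokenizer(text); }

        // Only inspects state; calling it repeatedly consumes nothing.
        @Override public boolean hasNext() { return st.hasMoreTokens(); }

        // Advances by one token; past the end it throws NoSuchElementException.
        @Override public String next() {
            if (!st.hasMoreTokens()) throw new NoSuchElementException();
            return st.nextToken();
        }
    }

    public static void main(String[] args) {
        Iterator<String> tok = new StringTokenIterator("to be or not");
        // Standard consumption loop: test hasNext() before every next().
        while (tok.hasNext()) {
            System.out.println(tok.next());
        }
    }
}
```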

remove

public void remove()
Removes from the underlying collection the last element returned by the iterator (optional operation). This method can be called only once per call to next().

Specified by:
remove in interface Iterator
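A tokenizer reading from a Reader usually has no underlying collection to delete from, so an implementation may decline this optional operation by throwing `UnsupportedOperationException`, which the Iterator contract permits. Whether particular implementations of this interface do so is not stated here; the sketch below uses a hypothetical `ArrayTokenIterator` over already-split tokens.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class RemoveDemo {
    // Hypothetical token iterator over pre-split tokens; there is nothing to delete
    // from, so it opts out of the optional remove() operation.
    static final class ArrayTokenIterator implements Iterator<String> {
        private final String[] tokens;
        private int pos = 0;

        ArrayTokenIterator(String... tokens) { this.tokens = tokens; }

        @Override public boolean hasNext() { return pos < tokens.length; }

        @Override public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            return tokens[pos++];
        }

        // Declines the optional operation, as the Iterator contract allows.
        @Override public void remove() {
            throw new UnsupportedOperationException("remove");
        }
    }

    public static void main(String[] args) {
        Iterator<String> it = new ArrayTokenIterator("a", "b");
        it.next();
        try {
            it.remove();
        } catch (UnsupportedOperationException e) {
            System.out.println("remove() not supported");
        }
    }
}
```

Since Java 8, `Iterator.remove()` has a default implementation that throws `UnsupportedOperationException`, so the explicit override mainly documents intent.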

pushBack

public void pushBack()
Pushes the last token returned back onto this Tokenizer, so that it is returned again by the next call to next().
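pushBack() gives a parser one token of lookahead: read a token, and if it belongs to the next construct, push it back. The documentation does not say how many consecutive pushbacks are allowed, so the hypothetical `PushbackIterator` below assumes a single level; it wraps any Iterator and is not a Stanford class.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

class PushBackDemo {
    // Hypothetical wrapper adding one level of pushBack() to any Iterator.
    static final class PushbackIterator<T> implements Iterator<T> {
        private final Iterator<T> base;
        private T last;              // last token returned by next()
        private boolean haveLast;    // whether a token has been returned yet
        private boolean pushedBack;  // whether 'last' must be returned again

        PushbackIterator(Iterator<T> base) { this.base = base; }

        @Override public boolean hasNext() { return pushedBack || base.hasNext(); }

        @Override public T next() {
            if (pushedBack) {
                pushedBack = false;
                return last;                     // replay the pushed-back token
            }
            if (!base.hasNext()) throw new NoSuchElementException();
            last = base.next();
            haveLast = true;
            return last;
        }

        // Undoes the most recent next(); a second pushBack() in a row is rejected.
        public void pushBack() {
            if (!haveLast || pushedBack) throw new IllegalStateException("nothing to push back");
            pushedBack = true;
        }
    }

    public static void main(String[] args) {
        PushbackIterator<String> tok =
            new PushbackIterator<>(Arrays.asList("x", "=", "1").iterator());
        String first = tok.next();               // "x"
        String peek = tok.next();                // "=": one token of lookahead
        if (peek.equals("=")) tok.pushBack();    // not ours to consume yet
        System.out.println(first + " " + tok.next() + " " + tok.next());  // x = 1
    }
}
```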


tokenize

public List tokenize()
Returns all tokens of this Tokenizer as a List for convenience.
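A tokenize() convenience method can be written as a loop that drains the tokenizer into a list. This sketch assumes it collects only the tokens not yet returned by next(); the documentation says "all tokens" and does not settle what happens after partial consumption. `TokenizeDemo.tokenize` is a hypothetical static helper over any Iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TokenizeDemo {
    // Hypothetical helper: drains the remaining tokens into a List.
    // Tokens already consumed by next() are not included.
    static <T> List<T> tokenize(Iterator<T> tokens) {
        List<T> out = new ArrayList<>();
        while (tokens.hasNext()) {
            out.add(tokens.next());
        }
        return out;
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList("New", "York", "City").iterator();
        it.next();                          // consume "New" by hand
        System.out.println(tokenize(it));   // [York, City]
    }
}
```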


setSource

public void setSource(Reader r)
Sets the source for this Tokenizer.



Stanford NLP Group