edu.stanford.nlp.process
Class WhitespaceTokenizer
java.lang.Object
edu.stanford.nlp.process.AbstractTokenizer
edu.stanford.nlp.process.WhitespaceTokenizer
- All Implemented Interfaces:
- Tokenizer, java.util.Iterator
public class WhitespaceTokenizer
- extends AbstractTokenizer
Simple Tokenizer implementation that tokenizes on whitespace.
This implementation returns Word objects. It has a parameter that controls
whether end-of-line (EOL) is treated as a token. If it is, each EOL is
returned as a Word with String value "\n".
Constructor Summary
WhitespaceTokenizer(java.io.Reader r)
Constructs a new WhitespaceTokenizer with the Reader r as its source.
WhitespaceTokenizer(java.io.Reader r, boolean eolIsSignificant)
Constructs a new WhitespaceTokenizer with the Reader r as its source; eolIsSignificant controls whether EOL is returned as a token.
Method Summary
static TokenizerFactory factory()
static TokenizerFactory factory(boolean eolIsSignificant)
protected java.lang.Object getNext()
Internally fetches the next token.
static void main(java.lang.String[] args)
Reads a file from the argument and prints its tokens one per line.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
WhitespaceTokenizer
public WhitespaceTokenizer(java.io.Reader r)
- Constructs a new WhitespaceTokenizer with the Reader r as its source.
WhitespaceTokenizer
public WhitespaceTokenizer(java.io.Reader r,
boolean eolIsSignificant)
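- Constructs a new WhitespaceTokenizer with the Reader r as its source.
The eolIsSignificant parameter controls whether EOL is returned as a token (a Word with String value "\n").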
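The effect of the eolIsSignificant flag can be illustrated with a minimal sketch that uses only the JDK. It is not the library's implementation: the class name WhitespaceSketch and the method tokenize are hypothetical, and plain String values stand in for Word objects. It only mirrors the behavior described for this class, assuming that every whitespace character separates tokens and that "\n" marks EOL.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; NOT the edu.stanford.nlp.process.WhitespaceTokenizer source.
class WhitespaceSketch {

    // Splits the reader's text on whitespace. When eolIsSignificant is true,
    // each '\n' is emitted as its own "\n" token instead of being discarded.
    static List<String> tokenize(Reader r, boolean eolIsSignificant) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int c;
            while ((c = r.read()) != -1) {
                char ch = (char) c;
                if (Character.isWhitespace(ch)) {
                    // Whitespace ends the token collected so far, if any.
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);
                    }
                    if (ch == '\n' && eolIsSignificant) {
                        tokens.add("\n");
                    }
                } else {
                    current.append(ch);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Flush a trailing token that was not followed by whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "the quick  fox\njumps";
        // Prints the token counts without and with EOL tokens.
        System.out.println(tokenize(new StringReader(text), false).size());
        System.out.println(tokenize(new StringReader(text), true).size());
    }
}
```

With the flag set to false, the line break acts as an ordinary separator. With it set to true, the same input yields one extra "\n" token per line break.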
getNext
protected java.lang.Object getNext()
- Internally fetches the next token.
- Specified by:
getNext
in class AbstractTokenizer
- Returns:
- the next token in the token stream, or null if none exists.
factory
public static TokenizerFactory factory()
factory
public static TokenizerFactory factory(boolean eolIsSignificant)
main
public static void main(java.lang.String[] args)
throws java.io.IOException
- Reads a file from the argument and prints its tokens one per line.
This is mainly a testing aid, but it can also be quite useful
standalone to turn a corpus into a file with one token per line.
Usage:
java edu.stanford.nlp.process.WhitespaceTokenizer filename
- Parameters:
args
- Command line arguments
- Throws:
java.io.IOException
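The one-token-per-line output format described for main can be sketched with the JDK alone. This is an illustration of the format only, not the code behind main: the class OneTokenPerLineSketch and the method onePerLine are hypothetical, and java.util.Scanner (whose default delimiter is whitespace) stands in for the tokenizer.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

// Hypothetical illustration of the output format only; it uses java.util.Scanner,
// not the Stanford tokenizer classes.
class OneTokenPerLineSketch {

    // Returns the whitespace-separated tokens of the input, each followed by a newline.
    static String onePerLine(Reader r) {
        StringBuilder out = new StringBuilder();
        try (Scanner sc = new Scanner(r)) {
            while (sc.hasNext()) {
                out.append(sc.next()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A StringReader keeps the sketch self-contained; a FileReader would play
        // the role of the filename argument.
        System.out.print(onePerLine(new StringReader("A small  corpus\nof text")));
    }
}
```

Runs of spaces and line breaks in the input collapse into single separators, so the result has exactly one token on each line.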