edu.stanford.nlp.process
Class SimpleTokenizer

java.lang.Object
  extended by edu.stanford.nlp.process.AbstractTokenizer
      extended by edu.stanford.nlp.process.SimpleTokenizer
All Implemented Interfaces:
Iterator, Tokenizer

public class SimpleTokenizer
extends AbstractTokenizer

Simple Tokenizer implementation that wraps a StreamTokenizer.

Author:
Joseph Smarr (jsmarr@stanford.edu), Teg Grenager (grenager@stanford.edu)

Field Summary
 
Fields inherited from class edu.stanford.nlp.process.AbstractTokenizer
nextToken
 
Constructor Summary
SimpleTokenizer(Reader r)
          Constructs a new SimpleTokenizer with the Reader r as its source.
SimpleTokenizer(Reader r, boolean eolIsSignificant)
          Constructs a new SimpleTokenizer with the Reader r as its source.
 
Method Summary
static TokenizerFactory factory()
           
static TokenizerFactory factory(boolean eolIsSignificant)
           
protected  Object getNext()
           
static void main(String[] args)
          Reads a file from the argument and prints its tokens one per line.
 
Methods inherited from class edu.stanford.nlp.process.AbstractTokenizer
hasNext, next, peek, remove, tokenize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleTokenizer

public SimpleTokenizer(Reader r)
Constructs a new SimpleTokenizer with the Reader r as its source.


SimpleTokenizer

public SimpleTokenizer(Reader r,
                       boolean eolIsSignificant)
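Constructs a new SimpleTokenizer with the Reader r as its source. The eolIsSignificant flag is not described further here; since this class wraps a StreamTokenizer, it presumably controls whether line ends are returned as tokens, as StreamTokenizer.eolIsSignificant(boolean) does.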
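
The following sketch illustrates what an eolIsSignificant flag means for the underlying java.io.StreamTokenizer. It uses only the JDK and does not call SimpleTokenizer itself; the class name EolSketch, the helper tokens, and the "<EOL>" marker are hypothetical, and how SimpleTokenizer actually represents a line end is not stated in this documentation.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of edu.stanford.nlp.process.
class EolSketch {

    /** Tokenizes text with a StreamTokenizer, optionally reporting line ends. */
    static List<String> tokens(String text, boolean eolIsSignificant) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(eolIsSignificant);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_EOL) {
                    out.add("<EOL>");                 // assumed marker, for display only
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);                 // word tokens carry their text in sval
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval)); // numbers are parsed as doubles
                } else {
                    out.add(String.valueOf((char) st.ttype)); // single ordinary character
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("one two\nthree", false)); // [one, two, three]
        System.out.println(tokens("one two\nthree", true));  // [one, two, <EOL>, three]
    }
}
```

With the flag off, the newline is treated as ordinary whitespace and disappears; with it on, each line end shows up as its own token.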

Method Detail

getNext

protected Object getNext()
Specified by:
getNext in class AbstractTokenizer
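
The inherited nextToken field, together with hasNext, next, and peek, suggests a one-token lookahead design in which subclasses only supply getNext. The sketch below shows that pattern over a java.io.StreamTokenizer. It is an assumption about the general approach, not the actual AbstractTokenizer or SimpleTokenizer source; the class LookaheadTokenizerSketch and its null-at-end-of-input convention are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the lookahead pattern; not the Stanford class.
class LookaheadTokenizerSketch implements Iterator<Object> {
    private final StreamTokenizer st;
    private Object nextToken; // one-token buffer; null means "nothing buffered"

    LookaheadTokenizerSketch(Reader r) {
        this.st = new StreamTokenizer(r);
    }

    /** Fetches the next token from the wrapped StreamTokenizer, or null at end of input. */
    protected Object getNext() {
        try {
            switch (st.nextToken()) {
                case StreamTokenizer.TT_EOF:
                    return null;
                case StreamTokenizer.TT_WORD:
                    return st.sval;
                case StreamTokenizer.TT_NUMBER:
                    return Double.valueOf(st.nval);
                default:
                    return String.valueOf((char) st.ttype);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        if (nextToken == null) {
            nextToken = getNext(); // fill the buffer lazily
        }
        return nextToken != null;
    }

    @Override
    public Object next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Object result = nextToken;
        nextToken = null; // consume the buffered token
        return result;
    }

    /** Returns the upcoming token without consuming it. */
    public Object peek() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return nextToken;
    }

    public static void main(String[] args) {
        LookaheadTokenizerSketch t = new LookaheadTokenizerSketch(new StringReader("alpha beta"));
        System.out.println(t.peek()); // alpha (not consumed)
        while (t.hasNext()) {
            System.out.println(t.next());
        }
    }
}
```

Keeping all buffering in hasNext, next, and peek means a subclass only has to answer one question: what is the next token, if any.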

factory

public static TokenizerFactory factory()

factory

public static TokenizerFactory factory(boolean eolIsSignificant)

main

public static void main(String[] args)
                 throws IOException
Reads a file from the argument and prints its tokens one per line. This is mainly a testing aid, but it is also useful standalone for turning a corpus into a file with one token per line.

Usage: java edu.stanford.nlp.process.SimpleTokenizer filename

Parameters:
args - Command line arguments
Throws:
IOException
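
As a rough picture of what a one-token-per-line pass looks like, the sketch below does the same job with java.io.StreamTokenizer directly. It is not the source of SimpleTokenizer.main, and its token boundaries may differ from what SimpleTokenizer produces; the class OneTokenPerLineSketch and the helper printTokens are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of edu.stanford.nlp.process.
class OneTokenPerLineSketch {

    /** Writes each token read from in to out on its own line. */
    static void printTokens(Reader in, PrintWriter out) {
        StreamTokenizer st = new StreamTokenizer(in);
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.println(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.println(st.nval);
                } else {
                    out.println((char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java OneTokenPerLineSketch filename");
            return;
        }
        try (Reader in = new BufferedReader(new FileReader(args[0]))) {
            printTokens(in, new PrintWriter(System.out));
        }
    }
}
```

Redirecting standard output to a file yields the one-token-per-line corpus described above.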


Stanford NLP Group