edu.stanford.nlp.process
Class DummyTokenizer

java.lang.Object
  extended by edu.stanford.nlp.process.AbstractTokenizer
      extended by edu.stanford.nlp.process.DummyTokenizer
All Implemented Interfaces:
Iterator, Tokenizer

public class DummyTokenizer
extends AbstractTokenizer

A Tokenizer that splits only on whitespace (spaces, tabs, and carriage returns). In effect, it assumes that the input is already tokenized. It currently does not support peek() or remove(), although it might in the future.

Author:
Roger Levy (rog@stanford.edu)

Field Summary
 
Fields inherited from class edu.stanford.nlp.process.AbstractTokenizer
nextToken
 
Constructor Summary
DummyTokenizer(Reader r)
          Constructs a new DummyTokenizer that treats carriage returns as normal whitespace.
 
Method Summary
static TokenizerFactory factory()
           
protected  Object getNext()
          Gets the next valid token from the input stream.
static void main(String[] args)
          Reads the file named by the command-line argument and prints its tokens, one per line.
 
Methods inherited from class edu.stanford.nlp.process.AbstractTokenizer
hasNext, next, peek, remove, tokenize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DummyTokenizer

public DummyTokenizer(Reader r)
Constructs a new DummyTokenizer that treats carriage returns as normal whitespace.
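To make "treats carriage returns as normal whitespace" concrete, here is a minimal sketch built on java.io.StreamTokenizer. It is an assumption that DummyTokenizer works this way internally. The names EolHandlingSketch, tokenize, eolIsToken, and EOL_MARKER are hypothetical, and the eolIsToken = true mode exists only for contrast with the documented behavior.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: whitespace-only splitting with configurable end-of-line handling. */
final class EolHandlingSketch {

    /** Placeholder emitted for line ends when eolIsToken is true (illustration only). */
    static final String EOL_MARKER = "*EOL*";

    static List<String> tokenize(Reader in, boolean eolIsToken) {
        StreamTokenizer st = new StreamTokenizer(in);
        st.resetSyntax();                 // start with every character "ordinary"
        st.wordChars(0x21, 0xFF);         // visible characters build words (chars above 0xFF are word chars too)
        st.whitespaceChars(0x00, 0x20);   // space, tab, CR, LF and other control chars separate words
        st.eolIsSignificant(eolIsToken);  // false: CR/LF are just ordinary whitespace
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_EOL) {
                    out.add(EOL_MARKER);   // only reachable when eolIsToken is true
                } else {
                    out.add(st.sval);      // every other token is a word (TT_WORD)
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "a b\r\nc";
        System.out.println(tokenize(new StringReader(text), false)); // prints [a, b, c]
        System.out.println(tokenize(new StringReader(text), true));  // prints [a, b, *EOL*, c]
    }
}
```

StreamTokenizer collapses a CR LF pair into a single end-of-line event, so the contrast mode yields one marker per line break.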

Method Detail

getNext

protected Object getNext()
Gets the next valid token from the input stream.

Specified by:
getNext in class AbstractTokenizer
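A minimal sketch of how a getNext() hook typically plugs into an iterator-style base class. It assumes, without confirmation from the Stanford source, that the base class buffers one lookahead token in nextToken and that getNext() returns null at end of input. SketchAbstractTokenizer and SketchWhitespaceTokenizer are hypothetical names, not the real AbstractTokenizer or DummyTokenizer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical base class: subclasses only implement getNext(); null means "no more tokens". */
abstract class SketchAbstractTokenizer<T> implements Iterator<T> {
    protected T nextToken;  // one-token lookahead buffer, filled lazily

    protected abstract T getNext();

    @Override
    public boolean hasNext() {
        if (nextToken == null) {
            nextToken = getNext();  // pull a token only when someone asks
        }
        return nextToken != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = nextToken;
        nextToken = null;  // consume the buffered token
        return result;
    }
    // remove() keeps Iterator's default, which throws UnsupportedOperationException.
}

/** Hypothetical subclass: a token is a maximal run of non-whitespace characters. */
final class SketchWhitespaceTokenizer extends SketchAbstractTokenizer<String> {
    private final Reader in;

    SketchWhitespaceTokenizer(Reader in) {
        this.in = in;
    }

    @Override
    protected String getNext() {
        try {
            int c = in.read();
            while (c != -1 && Character.isWhitespace(c)) {
                c = in.read();  // skip separators before the token
            }
            if (c == -1) {
                return null;    // end of input
            }
            StringBuilder sb = new StringBuilder();
            while (c != -1 && !Character.isWhitespace(c)) {
                sb.append((char) c);
                c = in.read();  // the whitespace char that ends the token is discarded
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchWhitespaceTokenizer t = new SketchWhitespaceTokenizer(new StringReader(" a\tbb \r\n c "));
        while (t.hasNext()) {
            System.out.println(t.next());  // prints a, bb, c on separate lines
        }
    }
}
```

The design keeps all lookahead bookkeeping in the base class, so a subclass only has to say how to read one token.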

factory

public static TokenizerFactory factory()
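factory() carries no description here. The sketch below assumes it returns an object that creates a fresh tokenizer for each Reader. SketchTokenizerFactory, getTokenizer, and FactorySketch are hypothetical stand-ins for the real TokenizerFactory interface, and java.util.Scanner (whose default delimiter is whitespace) stands in for DummyTokenizer.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.Scanner;

/** Hypothetical factory interface: one independent tokenizer per input Reader. */
@FunctionalInterface
interface SketchTokenizerFactory {
    Iterator<String> getTokenizer(Reader r);
}

final class FactorySketch {

    /** Returns a factory whose tokenizers split on whitespace only. */
    static SketchTokenizerFactory factory() {
        return Scanner::new;  // Scanner(Readable) accepts any Reader and implements Iterator<String>
    }

    public static void main(String[] args) {
        SketchTokenizerFactory f = factory();
        // Each call yields a tokenizer with its own state over its own input.
        Iterator<String> first = f.getTokenizer(new StringReader("x  y"));
        Iterator<String> second = f.getTokenizer(new StringReader("z"));
        while (first.hasNext()) {
            System.out.println(first.next());
        }
        while (second.hasNext()) {
            System.out.println(second.next());
        }
    }
}
```

A factory is handy when calling code must tokenize many inputs without knowing which tokenizer class is in use.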

main

public static void main(String[] args)
                 throws IOException
Reads the file named by the command-line argument and prints its tokens, one per line. This is mainly intended as a testing aid, but it is also useful on its own for turning a corpus into a file with one token per line.

Usage: java edu.stanford.nlp.process.DummyTokenizer filename

Parameters:
args - Command line arguments
Throws:
IOException
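A standalone sketch of the same one-token-per-line conversion, assuming UTF-8 input and the whitespace definition used by String.split("\\s+"). OneTokenPerLine and render are hypothetical names, not part of edu.stanford.nlp.process.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class OneTokenPerLine {

    /** Returns the whitespace-separated tokens of the input, each followed by a newline. */
    static String render(Reader in) {
        StringBuilder out = new StringBuilder();
        BufferedReader br = new BufferedReader(in);  // caller owns and closes the Reader
        try {
            String line;
            while ((line = br.readLine()) != null) {
                for (String tok : line.trim().split("\\s+")) {
                    if (!tok.isEmpty()) {  // blank lines split to a single empty string
                        out.append(tok).append('\n');
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java OneTokenPerLine filename");
            System.exit(1);
        }
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.print(render(r));
        }
    }
}
```

For example, java OneTokenPerLine corpus.txt > corpus.tok writes each token of corpus.txt on its own line.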


Stanford NLP Group