edu.stanford.nlp.process
Class DummyTokenizer
java.lang.Object
edu.stanford.nlp.process.AbstractTokenizer
edu.stanford.nlp.process.DummyTokenizer
- All Implemented Interfaces:
- Iterator, Tokenizer
- public class DummyTokenizer
- extends AbstractTokenizer
A Tokenizer that splits only on whitespace (spaces, tabs, and
carriage returns). It essentially assumes that the input is
already tokenized. It currently doesn't support peek() or remove(),
although it might in the future.
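The splitting rule above can be sketched in self-contained Java. This is a hypothetical reimplementation, not the Stanford source. The class name `WhitespaceSplitter` is invented, and the sketch assumes "whitespace" means whatever `Character.isWhitespace` accepts (spaces, tabs, carriage returns, newlines, and a few other control and Unicode separator characters). To match the documented limitation, `remove()` throws `UnsupportedOperationException` and there is no `peek()`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: yields maximal runs of non-whitespace characters from a Reader. */
class WhitespaceSplitter implements Iterator<String> {
    private final Reader in;
    private String nextToken; // one-token lookahead; null if nothing is cached
    private boolean done;     // true once the input is exhausted

    WhitespaceSplitter(Reader in) {
        this.in = in;
    }

    @Override
    public boolean hasNext() {
        if (nextToken == null && !done) {
            nextToken = readToken();
            done = (nextToken == null);
        }
        return nextToken != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String token = nextToken;
        nextToken = null;
        return token;
    }

    @Override
    public void remove() {
        // Mirrors the documented limitation: removal is not supported.
        throw new UnsupportedOperationException("remove");
    }

    /** Skips leading whitespace, then collects characters up to the next whitespace or end of input. */
    private String readToken() {
        try {
            int c = in.read();
            while (c != -1 && Character.isWhitespace(c)) {
                c = in.read();
            }
            if (c == -1) {
                return null; // no more tokens
            }
            StringBuilder sb = new StringBuilder();
            while (c != -1 && !Character.isWhitespace(c)) {
                sb.append((char) c);
                c = in.read();
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading one character at a time keeps memory use flat on large inputs; wrapping the source in a `BufferedReader` before passing it in avoids paying for unbuffered I/O on every character.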
- Author:
- Roger Levy (rog@stanford.edu)
Constructor Summary
DummyTokenizer(Reader r)
Constructs a new DummyTokenizer that treats carriage returns as normal whitespace.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DummyTokenizer
public DummyTokenizer(Reader r)
- Constructs a new DummyTokenizer that treats carriage returns as normal whitespace.
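Treating carriage returns as ordinary whitespace means a line break carries no information of its own: text split across lines yields the same tokens as the same text on one line, and no end-of-line token is emitted. The sketch below illustrates that contract with a hypothetical `LineBreakDemo` helper built on `String.split`; it is not the Stanford implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating "carriage returns as normal whitespace". */
class LineBreakDemo {
    /** Splits on runs of whitespace; \r and \n are treated exactly like spaces and tabs. */
    static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // only whitespace: no tokens, and no end-of-line markers
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }
}
```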
getNext
protected Object getNext()
- Gets the next valid token from the input stream.
- Specified by:
getNext in class AbstractTokenizer
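Since `getNext()` is protected and specified by `AbstractTokenizer`, a plausible reading is a template-method design: the subclass only produces the next token, and the base class handles lookahead for `hasNext()`/`next()`. The sketch below assumes that design and assumes `getNext()` signals end of input by returning `null`; the class names `SketchAbstractTokenizer` and `ArrayBackedTokenizer` are hypothetical and do not reproduce the Stanford source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical base class: subclasses supply getNext(), which returns null at end of input. */
abstract class SketchAbstractTokenizer<T> implements Iterator<T> {
    private T nextToken; // cached lookahead token, or null if none is cached

    /** Returns the next valid token, or null when the input is exhausted (assumed convention). */
    protected abstract T getNext();

    @Override
    public boolean hasNext() {
        if (nextToken == null) {
            nextToken = getNext();
        }
        return nextToken != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = nextToken;
        nextToken = null;
        return result;
    }

    /** Drains all remaining tokens into a list. */
    public List<T> tokenize() {
        List<T> result = new ArrayList<>();
        while (hasNext()) {
            result.add(next());
        }
        return result;
    }
}

/** Hypothetical subclass that serves tokens from a pre-split array. */
class ArrayBackedTokenizer extends SketchAbstractTokenizer<String> {
    private final String[] tokens;
    private int pos = 0;

    ArrayBackedTokenizer(String... tokens) {
        this.tokens = tokens;
    }

    @Override
    protected String getNext() {
        return pos < tokens.length ? tokens[pos++] : null;
    }
}
```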
factory
public static TokenizerFactory factory()
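The page lists `factory()` without a description. A common reason for such a static method is to let code that accepts a tokenizer factory create a fresh tokenizer for each input without naming the concrete class. The sketch below assumes that purpose; `SketchTokenizerFactory`, `FactoryDemo`, and their methods are hypothetical and are not the Stanford `TokenizerFactory` API. For brevity, the example tokenizer reads its whole input eagerly.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical factory interface: builds a new tokenizer for each input Reader. */
interface SketchTokenizerFactory<T> {
    Iterator<T> getTokenizer(Reader r);
}

class FactoryDemo {
    /** Returns a factory whose tokenizers split on whitespace only. */
    static SketchTokenizerFactory<String> whitespaceFactory() {
        return r -> {
            // Read everything; line breaks are joined with '\n', which is ordinary whitespace here.
            String text = new BufferedReader(r).lines().collect(Collectors.joining("\n")).trim();
            List<String> tokens = text.isEmpty()
                    ? Collections.<String>emptyList()
                    : Arrays.asList(text.split("\\s+"));
            return tokens.iterator();
        };
    }

    /** Client code that depends only on the factory, not on a concrete tokenizer class. */
    static List<String> tokenizeWith(SketchTokenizerFactory<String> factory, String text) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = factory.getTokenizer(new StringReader(text));
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```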
main
public static void main(String[] args)
throws IOException
- Reads the file named by the first argument and prints its tokens, one per line.
This is mainly intended as a testing aid, but it can also be used
standalone to turn a corpus into a file with one token per line.
Usage: java edu.stanford.nlp.process.DummyTokenizer filename
- Parameters:
args - Command-line arguments
- Throws:
IOException
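The one-token-per-line conversion that `main` performs can be sketched with only the standard library. This is a hypothetical stand-in, not the Stanford code: the class `OneTokenPerLine` and its `convert` method are invented, the input is assumed to be UTF-8, and output lines end with `\n` regardless of platform.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical stand-in for the documented main(): prints one token per line. */
class OneTokenPerLine {
    /** Writes every whitespace-delimited token from in to out, each followed by '\n'. */
    static void convert(Reader in, Writer out) throws IOException {
        BufferedReader br = new BufferedReader(in);
        String line;
        // readLine() accepts \n, \r, or \r\n as a line terminator, so all are treated alike.
        while ((line = br.readLine()) != null) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) { // a blank line splits to a single empty string
                    out.write(token);
                    out.write('\n');
                }
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java OneTokenPerLine filename");
            System.exit(1);
        }
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            convert(in, new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        }
    }
}
```

Redirecting standard output (for example `java OneTokenPerLine corpus.txt > tokens.txt`) produces the one-token-per-line file described above.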
Stanford NLP Group