edu.stanford.nlp.trees.international
Class CHTBTokenizer

java.lang.Object
  |
  +--edu.stanford.nlp.trees.international.CHTBTokenizer
All Implemented Interfaces:
StreamTokenizer

public class CHTBTokenizer
extends Object
implements StreamTokenizer

A simple tokenizer for Penn Chinese Treebank files. A token is any parenthesis, node label, or terminal. All SGML content in the files is ignored.
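The token rules above can be illustrated with a small standalone sketch. It is not the Stanford implementation. The class name MiniChtbTokenizer and its tokenize method are hypothetical, and the sketch assumes that "SGML content" means markup tags of the form <...> (text between tags is kept), that node labels and terminals are separated by whitespace, and that each parenthesis is always a token of its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of CHTB-style tokenization; not the Stanford source. */
public class MiniChtbTokenizer {

    /** Splits text into parentheses, node labels and terminals, skipping <...> tags. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inTag = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inTag) {
                // Assumption: everything up to the closing '>' is SGML markup and is dropped.
                if (c == '>') {
                    inTag = false;
                }
                continue;
            }
            if (c == '<') {
                flush(current, tokens);
                inTag = true;
            } else if (c == '(' || c == ')') {
                // Parentheses are tokens even when glued to a label or terminal.
                flush(current, tokens);
                tokens.add(String.valueOf(c));
            } else if (Character.isWhitespace(c)) {
                flush(current, tokens);
            } else {
                current.append(c);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    /** Emits the pending label or terminal, if any, and resets the buffer. */
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```

For example, tokenize("<S ID=1>( (IP (NN x)) )</S>") yields the nine tokens (, (, IP, (, NN, x, ), ), ) with both SGML tags discarded.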

Author:
Roger Levy

Constructor Summary
CHTBTokenizer(Reader r)
          Constructs a new tokenizer from a Reader.
 
Method Summary
static void main(String[] args)
          The main() method tokenizes a file in the specified encoding and prints the tokens to standard output in that same encoding.
 String next()
          Returns the next token.
 void pushBack()
          Pushes the previous token back.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CHTBTokenizer

public CHTBTokenizer(Reader r)
Constructs a new tokenizer from a Reader. Converting the bytes that feed the Reader into Java's internal Unicode representation is not the tokenizer's job. This can be done either by converting the file beforehand with ConvertEncodingThread, or by specifying the file's encoding explicitly when constructing the Reader, for example with java.io.InputStreamReader.

Parameters:
r - the Reader to read tokens from
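A minimal sketch of the second option, assuming the Treebank files are stored in a known charset such as GB18030 or UTF-8 (check which one your release actually uses). The class ChtbReaders and its methods are hypothetical; only standard java.io and java.nio.charset calls are used.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

/** Hypothetical helpers for building a correctly decoded Reader. */
public class ChtbReaders {

    /** Wraps a byte stream in a Reader that decodes it with the named charset. */
    public static Reader openReader(InputStream in, String encoding) {
        // Charset.forName throws for illegal or unsupported names, so a misspelled
        // encoding fails fast instead of silently falling back to the platform default.
        return new BufferedReader(new InputStreamReader(in, Charset.forName(encoding)));
    }

    /** Opens a file as a decoded Reader; the caller is responsible for closing it. */
    public static Reader openFile(String path, String encoding) throws IOException {
        return openReader(new FileInputStream(path), encoding);
    }
}
```

The resulting Reader is what the constructor expects, e.g. new CHTBTokenizer(ChtbReaders.openFile(path, "GB18030")), where path and the charset name are placeholders for your own file.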
Method Detail

next

public String next()
            throws IOException
Returns the next token. Satisfies the edu.stanford.nlp.io.StreamTokenizer interface.

Specified by:
next in interface StreamTokenizer
Returns:
the next token as a String.
Throws:
IOException

pushBack

public void pushBack()
Pushes the previous token back. Satisfies the edu.stanford.nlp.io.StreamTokenizer interface.

Specified by:
pushBack in interface StreamTokenizer
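Together, next() and pushBack() give a caller one token of lookahead: read a token, inspect it, and un-read it if it belongs to another part of the parser. The sketch below illustrates that contract with a hypothetical list-backed class, PushBackTokens; it is not the edu.stanford.nlp.io.StreamTokenizer interface. Returning null at end of input and allowing several consecutive pushBack() calls are assumptions of this sketch, not documented behavior of CHTBTokenizer.

```java
import java.util.List;

/** Hypothetical stand-in mirroring a next()/pushBack() pair; not the Stanford interface. */
public class PushBackTokens {
    private final List<String> tokens;
    private int pos = 0; // index of the token the next call to next() returns

    public PushBackTokens(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Returns the next token, or null when input is exhausted (assumed convention). */
    public String next() {
        return pos < tokens.size() ? tokens.get(pos++) : null;
    }

    /** Un-reads the token most recently returned by next(). */
    public void pushBack() {
        if (pos > 0) {
            pos--;
        }
    }

    /** Looks at the upcoming token without consuming it, built from next() and pushBack(). */
    public String peek() {
        String t = next();
        if (t != null) {
            // Only push back when next() actually consumed something.
            pushBack();
        }
        return t;
    }
}
```

A tree reader might use peek() to decide whether the upcoming token is "(" (start of a subtree) or a terminal, without losing that token for the code that handles it.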

main

public static void main(String[] args)
                 throws IOException
The main() method tokenizes a file in the specified encoding and prints the tokens to standard output in that same encoding. Its arguments are (Infile, Encoding).

Throws:
IOException
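A rough sketch of what a driver with the documented (Infile, Encoding) arguments involves: decode the input file with the given charset, tokenize it, and re-encode standard output with the same charset so that non-ASCII terminals survive. The class TokenizeFileDemo and its simplified splitting (parentheses padded with spaces, then split on whitespace, with no SGML handling) are hypothetical; the real main() uses CHTBTokenizer, and printing one token per line is an assumption about its output layout.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical driver mirroring main(Infile, Encoding); not the Stanford source. */
public class TokenizeFileDemo {

    /** Prints one token per line; parentheses are split off, whitespace separates the rest. */
    public static void printTokens(String text, PrintWriter out) {
        // Pad each parenthesis with spaces so a plain whitespace split isolates it.
        String padded = text.replaceAll("[()]", " $0 ").trim();
        if (!padded.isEmpty()) {
            for (String tok : padded.split("\\s+")) {
                out.println(tok);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TokenizeFileDemo <infile> <encoding>");
            System.exit(1);
        }
        Charset cs = Charset.forName(args[1]);
        // Decode the input with the given charset...
        String text = new String(Files.readAllBytes(Paths.get(args[0])), cs);
        // ...and encode standard output with the same charset.
        PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, cs));
        printTokens(text, out);
    }
}
```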


Stanford NLP Group