edu.stanford.nlp.trees
Class PennTreeReader

java.lang.Object
  extended by edu.stanford.nlp.trees.PennTreeReader
All Implemented Interfaces:
TreeReader
Direct Known Subclasses:
FragDiscardingPennTreeReader

public class PennTreeReader
extends Object
implements TreeReader

Implements the TreeReader interface for reading Penn Treebank-style files. The reader is a pushdown automaton (PDA) that parses the Lisp-style bracketed format in which the trees are stored. It is compatible with both Penn Treebank (PTB) and Penn Arabic Treebank (PATB) trees.
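
The stack-based idea behind such a reader can be illustrated with a minimal, self-contained sketch. It is not the PennTreeReader source: BracketTreeSketch, Node, and readAll are hypothetical names introduced only for this illustration. The sketch assumes whitespace-separated tokens, and unlike PennTreeReader it uses no TreeFactory, TreeNormalizer, or Tokenizer and does not try to skip malformed trees.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustrative sketch only; this is not the PennTreeReader source. */
final class BracketTreeSketch {

    /** Hypothetical minimal tree node: a label plus ordered children. */
    static final class Node {
        String label;                               // null until the label token is seen
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        @Override
        public String toString() {
            if (children.isEmpty()) return label;   // leaf: just the word
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) sb.append(' ').append(child);
            return sb.append(')').toString();
        }
    }

    private final Reader in;

    BracketTreeSketch(Reader in) { this.in = in; }

    /** Returns the next complete tree, or null at end of input. */
    Node readTree() throws IOException {
        Deque<Node> stack = new ArrayDeque<>();     // the automaton's pushdown store
        StringBuilder token = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (ch == '(') {
                flush(token, stack);
                // "( (S ...))": an opening bracket with no label is an unnamed root.
                if (!stack.isEmpty() && stack.peek().label == null) stack.peek().label = "";
                stack.push(new Node(null));
            } else if (ch == ')') {
                flush(token, stack);
                if (stack.isEmpty()) throw new IOException("Unbalanced ')'");
                Node done = stack.pop();
                if (done.label == null) done.label = "";
                if (stack.isEmpty()) return done;   // outermost bracket closed
                stack.peek().children.add(done);
            } else if (Character.isWhitespace(ch)) {
                flush(token, stack);
            } else {
                token.append(ch);
            }
        }
        if (!stack.isEmpty()) throw new IOException("Input ended inside a tree");
        return null;
    }

    /** Ends the current token: the first token after '(' is the label, later ones are leaves. */
    private static void flush(StringBuilder token, Deque<Node> stack) {
        if (token.length() == 0) return;
        Node top = stack.peek();                    // null if the token is outside any tree
        if (top != null) {
            if (top.label == null) top.label = token.toString();
            else top.children.add(new Node(token.toString()));
        }
        token.setLength(0);
    }

    /** Reads every tree in the text and returns each one's bracketed form. */
    static List<String> readAll(String text) throws IOException {
        BracketTreeSketch reader = new BracketTreeSketch(new StringReader(text));
        List<String> out = new ArrayList<>();
        for (Node t = reader.readTree(); t != null; t = reader.readTree()) out.add(t.toString());
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String tree : readAll("(S (NP I) (VP ran)) ( (S (NP You)))")) {
            System.out.println(tree);
        }
    }
}
```

In this sketch each opening bracket pushes a node, each closing bracket pops one and attaches it to its parent, and a tree is complete when the stack empties. Input that ends while the stack is non-empty raises an IOException.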

Author:
Christopher Manning, Roger Levy, Spence Green

Constructor Summary
PennTreeReader(Reader in)
          Read parse trees from a Reader.
PennTreeReader(Reader in, TreeFactory tf)
          Read parse trees from a Reader.
PennTreeReader(Reader in, TreeFactory tf, TreeNormalizer tn)
          Read parse trees from a Reader.
PennTreeReader(Reader in, TreeFactory tf, TreeNormalizer tn, Tokenizer<String> st)
          Read parse trees from a Reader.
 
Method Summary
 void close()
          Closes the underlying Reader used to create this class.
static void main(String[] args)
          Loads treebank data from the file named by the first argument and prints it.
 Tree readTree()
          Reads a single tree in standard Penn Treebank format from the input stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PennTreeReader

public PennTreeReader(Reader in)
Read parse trees from a Reader. Arguments that are not supplied take their defaults: a SimpleTreeFactory, no TreeNormalizer, and a PennTreebankTokenizer.

Parameters:
in - the Reader

PennTreeReader

public PennTreeReader(Reader in,
                      TreeFactory tf)
Read parse trees from a Reader.

Parameters:
in - the Reader
tf - TreeFactory -- factory to create some kind of Tree

PennTreeReader

public PennTreeReader(Reader in,
                      TreeFactory tf,
                      TreeNormalizer tn)
Read parse trees from a Reader.

Parameters:
in - the Reader
tf - TreeFactory -- factory to create some kind of Tree
tn - the method of normalizing trees

PennTreeReader

public PennTreeReader(Reader in,
                      TreeFactory tf,
                      TreeNormalizer tn,
                      Tokenizer<String> st)
Read parse trees from a Reader.

Parameters:
in - the Reader
tf - TreeFactory -- factory to create some kind of Tree
tn - the method of normalizing trees
st - Tokenizer that divides the Reader's input into tokens

Method Detail

readTree

public Tree readTree()
              throws IOException
Reads a single tree in standard Penn Treebank format from the input stream. The method supports additional parentheses around the tree (an unnamed ROOT node) so long as they are balanced. If the token stream ends before the current tree is complete, then the method will throw an IOException.

Note that the method will skip malformed trees and attempt to read additional trees from the input stream. It is possible, however, that a malformed tree will corrupt the token stream. In this case, an IOException will eventually be thrown.

Specified by:
readTree in interface TreeReader
Returns:
A single tree, or null at end of token stream.
Throws:
IOException
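
The contract described above (one tree per call, null at end of the token stream, IOException on truncated input) suggests a simple read loop. The following is a hedged usage sketch, assuming the Stanford NLP classes are on the classpath; ReadAllTreesSketch, readAll, and the inline sample input are hypothetical and exist only for illustration.

```java
import edu.stanford.nlp.trees.PennTreeReader;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeReader;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Illustrative usage sketch; the class and method names are hypothetical. */
final class ReadAllTreesSketch {

    /** Reads every tree from the given Reader and closes the reader afterwards. */
    static List<Tree> readAll(Reader in) throws IOException {
        TreeReader reader = new PennTreeReader(in);
        List<Tree> trees = new ArrayList<>();
        try {
            Tree tree;
            // readTree() returns null once the token stream is exhausted.
            while ((tree = reader.readTree()) != null) {
                trees.add(tree);
            }
        } finally {
            // close() closes the underlying Reader as well.
            reader.close();
        }
        return trees;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample input (an assumption for illustration); a Reader over
        // a treebank file would be consumed the same way.
        String sample = "(S (NP (PRP I)) (VP (VBD ran)))\n"
                      + "( (S (NP (PRP You)) (VP (VBD slept))))";
        for (Tree tree : readAll(new StringReader(sample))) {
            System.out.println(tree);
        }
    }
}
```

The try/finally block is used because this page lists TreeReader as the only implemented interface, so the sketch does not assume the reader can be used in a try-with-resources statement.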

close

public void close()
           throws IOException
Closes the underlying Reader used to create this class.

Specified by:
close in interface TreeReader
Throws:
IOException

main

public static void main(String[] args)
Loads treebank data from the file named by the first argument and prints it.

Parameters:
args - command-line arguments; the first one specifies the treebank filename


Stanford NLP Group