edu.stanford.nlp.trees
Class PennTreeReader

java.lang.Object
  extended by edu.stanford.nlp.trees.PennTreeReader
All Implemented Interfaces:
TreeReader

public class PennTreeReader
extends Object
implements TreeReader

A PennTreeReader is a TreeReader that reads in Penn Treebank-style files. Example usage:
TreeReader tr = new PennTreeReader(new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8")), myTreeFactory);
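
The Reader-construction half of the usage line above can be tried with the JDK alone. The following is a minimal sketch, not part of the library: the class name Utf8ReaderSketch and the helpers openUtf8 and readAll are hypothetical, and readAll merely stands in for handing the Reader to a PennTreeReader, which would need the Stanford classes on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for illustration; not part of edu.stanford.nlp.trees.
class Utf8ReaderSketch {

    /** Opens a file as a buffered character stream, decoding it explicitly as UTF-8. */
    static BufferedReader openUtf8(Path file) throws IOException {
        return new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8));
    }

    /** Drains the Reader into a String; a stand-in for passing it to a tree reader. */
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small bracketed tree containing a non-ASCII character to a temp file.
        Path file = Files.createTempFile("sample", ".mrg");
        Files.write(file,
                "( (S (NP (NN caf\u00e9)) (VP (VBZ opens))) )".getBytes(StandardCharsets.UTF_8));
        try (BufferedReader in = openUtf8(file)) {
            // With the Stanford library available, `in` would instead be passed to
            // new PennTreeReader(in, myTreeFactory) at this point.
            System.out.println(readAll(in));
        } finally {
            Files.delete(file);
        }
    }
}
```

Naming the charset explicitly avoids depending on the platform default encoding, which is the reason the usage line passes "UTF-8" to InputStreamReader.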

Author:
Christopher Manning, Roger Levy

Constructor Summary
PennTreeReader(Reader in)
          Read parse trees from a Reader.
PennTreeReader(Reader in, Tokenizer<String> st)
          Read parse trees from a Reader.
PennTreeReader(Reader in, TreeFactory tf)
          Read parse trees from a Reader.
PennTreeReader(Reader in, TreeFactory tf, TreeNormalizer tn)
          Read parse trees from a Reader.
PennTreeReader(Reader in, TreeFactory tf, TreeNormalizer tn, Tokenizer<String> st)
          Read parse trees from a Reader.
 
Method Summary
 Iterator<Tree> asTreeIterator()
          Returns an iterator over Trees which is backed by this PennTreeReader.
 void close()
          Close the Reader behind this TreeReader.
static void main(String[] args)
          Loads treebank data from first argument and prints it.
 Tree readTree()
          Reads a single tree in standard Penn Treebank format, with or without an additional set of parens around it (an unnamed ROOT node).
static TokenizerFactory<Tree> tokenizerFactory(TreeFactory tf, TreeNormalizer tn, Tokenizer<String> stringTokenizer)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PennTreeReader

public PennTreeReader(Reader in)
Read parse trees from a Reader. The defaulted arguments are a SimpleTreeFactory, no TreeNormalizer, and a PennTreebankTokenizer.

Parameters:
in - The Reader

PennTreeReader

public PennTreeReader(Reader in,
                      TreeFactory tf)
Read parse trees from a Reader.

Parameters:
in - the Reader
tf - the TreeFactory that determines what kind of Tree is created

PennTreeReader

public PennTreeReader(Reader in,
                      Tokenizer<String> st)
Read parse trees from a Reader.

Parameters:
in - The Reader
st - The Tokenizer

PennTreeReader

public PennTreeReader(Reader in,
                      TreeFactory tf,
                      TreeNormalizer tn)
Read parse trees from a Reader.

Parameters:
in - the Reader
tf - the TreeFactory that determines what kind of Tree is created
tn - the TreeNormalizer used to normalize trees

PennTreeReader

public PennTreeReader(Reader in,
                      TreeFactory tf,
                      TreeNormalizer tn,
                      Tokenizer<String> st)
Read parse trees from a Reader.

Parameters:
in - the Reader
tf - the TreeFactory that determines what kind of Tree is created
tn - the TreeNormalizer used to normalize trees
st - the Tokenizer that splits the Reader's input into tokens

Method Detail

readTree

public Tree readTree()
              throws IOException
Reads a single tree in standard Penn Treebank format, with or without an additional set of parentheses around it (an unnamed ROOT node). If the token stream ends before the current tree is complete, a NoSuchElementException is thrown.

Specified by:
readTree in interface TreeReader
Returns:
A single tree, or null at end of token stream.
Throws:
IOException
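
The contract described for readTree (optional outer parentheses, null at end of the token stream, NoSuchElementException when input ends mid-tree) can be illustrated with a small standalone reader. This is a sketch under stated assumptions and not the PennTreeReader implementation: MiniBracketReader and its Node type are hypothetical, tokens are simply "(", ")" and runs of other non-whitespace characters, and labelling the unnamed outer node "ROOT" is a display choice made here.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical toy reader for bracketed trees; not the library's implementation.
class MiniBracketReader {

    /** Minimal tree node: a label plus children (no children means a leaf). */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        @Override public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    private final PushbackReader in;
    private String pushedBack; // one token of lookahead

    MiniBracketReader(Reader in) { this.in = new PushbackReader(in); }

    /** Returns "(", ")", or a run of other non-space characters; null at end of input. */
    private String nextToken() throws IOException {
        if (pushedBack != null) { String t = pushedBack; pushedBack = null; return t; }
        int c = in.read();
        while (c != -1 && Character.isWhitespace(c)) c = in.read();
        if (c == -1) return null;
        if (c == '(' || c == ')') return String.valueOf((char) c);
        StringBuilder sb = new StringBuilder();
        while (c != -1 && !Character.isWhitespace(c) && c != '(' && c != ')') {
            sb.append((char) c);
            c = in.read();
        }
        if (c != -1) in.unread(c); // give back the delimiter that ended the word
        return sb.toString();
    }

    /** Like nextToken, but running out of input inside a tree is an error. */
    private String require() throws IOException {
        String t = nextToken();
        if (t == null) throw new NoSuchElementException("input ended inside a tree");
        return t;
    }

    /** Reads one tree, or returns null if the input is exhausted between trees. */
    Node readTree() throws IOException {
        String t = nextToken();
        if (t == null) return null;
        if (!t.equals("(")) throw new IOException("expected '(' but found: " + t);
        return readSubtree();
    }

    /** Parses the remainder of a bracket whose "(" has already been consumed. */
    private Node readSubtree() throws IOException {
        String t = require();
        Node node;
        if (t.equals("(") || t.equals(")")) {
            node = new Node("ROOT"); // no label present: the unnamed outer node
            pushedBack = t;
        } else {
            node = new Node(t);
        }
        while (true) {
            t = require();
            if (t.equals(")")) return node;
            node.children.add(t.equals("(") ? readSubtree() : new Node(t));
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "( (S (NP we) (VP go)) ) (S (NP it) (VP works))";
        MiniBracketReader reader = new MiniBracketReader(new StringReader(text));
        // The usual read loop: keep reading until null signals end of input.
        for (Node tree; (tree = reader.readTree()) != null; ) {
            System.out.println(tree);
        }
    }
}
```

The loop in main mirrors the read-until-null pattern implied by the return value documented above; callers of the real class would additionally have to decide how to handle a NoSuchElementException from a truncated file.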

close

public void close()
           throws IOException
Close the Reader behind this TreeReader.

Specified by:
close in interface TreeReader
Throws:
IOException

tokenizerFactory

public static TokenizerFactory<Tree> tokenizerFactory(TreeFactory tf,
                                                      TreeNormalizer tn,
                                                      Tokenizer<String> stringTokenizer)

asTreeIterator

public Iterator<Tree> asTreeIterator()
Returns an iterator over Trees which is backed by this PennTreeReader. Warning: any IOException that would normally be thrown is rethrown as a RuntimeException.
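
The warning follows from the Iterator interface itself: hasNext() and next() cannot declare checked exceptions, so an adapter has to rethrow IOException in unchecked form. The sketch below shows one way such an adapter could look. It is an illustration under assumptions, not the library's code: ReaderBackedIteratorSketch, Source, and asIterator are hypothetical names, and reading one item ahead is a design choice made here.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical adapter for illustration; not the PennTreeReader implementation.
class ReaderBackedIteratorSketch {

    /** Stand-in for a tree reader: returns the next item, or null at end of input. */
    interface Source<T> {
        T read() throws IOException;
    }

    /** Wraps a Source as an Iterator, reading one item ahead so hasNext() can answer. */
    static <T> Iterator<T> asIterator(Source<T> source) {
        return new Iterator<T>() {
            private T lookahead = fetch();

            private T fetch() {
                try {
                    return source.read();
                } catch (IOException e) {
                    // Iterator methods cannot throw checked exceptions, so wrap it.
                    throw new RuntimeException(e);
                }
            }

            @Override public boolean hasNext() {
                return lookahead != null;
            }

            @Override public T next() {
                if (lookahead == null) throw new NoSuchElementException();
                T current = lookahead;
                lookahead = fetch();
                return current;
            }
        };
    }

    public static void main(String[] args) {
        // A fake source backed by a list; a real one would call readTree().
        Iterator<String> data = Arrays.asList("(S a)", "(S b)").iterator();
        Iterator<String> it = asIterator(() -> data.hasNext() ? data.next() : null);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Because the iterator is backed by the underlying source, items consumed through it are no longer available to direct reads, and a caller that needs to react to I/O failures has to catch RuntimeException and inspect its cause.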


main

public static void main(String[] args)
Loads treebank data from the first argument and prints it.

Parameters:
args - command-line arguments; the first specifies the name of the treebank file to load


Stanford NLP Group