public class PennTreeReader extends Object implements TreeReader
This class implements the TreeReader interface to read Penn Treebank-style
files. The reader is implemented as a push-down automaton (PDA) that parses the Lisp-style
format in which the trees are stored. This reader is compatible with both PTB
(Penn Treebank) and PATB (Penn Arabic Treebank) trees.
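
To make the push-down idea concrete, here is a minimal, stand-alone sketch of parsing Lisp-style bracketing with an explicit stack. It is not the PennTreeReader implementation: the class BracketParseSketch and its nested Node type are hypothetical names introduced for illustration, tokenization is naive whitespace splitting, and every opening parenthesis is assumed to be followed by a label.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BracketParseSketch {
    /** Hypothetical stand-in for a tree node; not the library's Tree class. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        /** Renders the node back into bracketed form. */
        @Override
        public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) sb.append(' ').append(child);
            return sb.append(')').toString();
        }
    }

    /** Parses one bracketed tree, using an explicit stack as the push-down store. */
    static Node parse(String text) {
        // Pad parentheses with spaces so that whitespace splitting yields tokens.
        String[] tokens = text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        Deque<Node> stack = new ArrayDeque<>();
        Node root = null;
        for (int i = 0; i < tokens.length; i++) {
            String tok = tokens[i];
            if (tok.equals("(")) {
                // Assumption: the token right after '(' is the constituent label.
                if (i + 1 >= tokens.length) throw new IllegalArgumentException("missing label after '('");
                Node node = new Node(tokens[++i]);
                if (!stack.isEmpty()) stack.peek().children.add(node);
                stack.push(node);
            } else if (tok.equals(")")) {
                if (stack.isEmpty()) throw new IllegalArgumentException("unbalanced ')'");
                root = stack.pop(); // the last node popped is the outermost one
            } else {
                if (stack.isEmpty()) throw new IllegalArgumentException("token outside a tree: " + tok);
                stack.peek().children.add(new Node(tok)); // a word becomes a leaf
            }
        }
        if (!stack.isEmpty()) throw new IllegalArgumentException("unbalanced '('");
        return root;
    }

    public static void main(String[] args) {
        System.out.println(parse("(S (NP (DT The) (NN dog)) (VP (VBD barked)))"));
    }
}
```

Opening a constituent pushes, closing one pops, and a word attaches to whatever is on top of the stack, which is all the state a bracketed format needs.
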
PennTreeReader silently replaces \* with * and \/ with /. Two possible designs
for this were to make the PennTreeReader always do this or to make the
TreeNormalizers do this. We decided to put it in the PennTreeReader class itself
to avoid the problem of people making new TreeNormalizers and forgetting to include the
unescaping.

Constructor and Description |
---|
PennTreeReader(Reader in): Read parse trees from a Reader. |
PennTreeReader(Reader in, TreeFactory tf): Read parse trees from a Reader. |
PennTreeReader(Reader in, TreeFactory tf, TreeNormalizer tn): Read parse trees from a Reader. |
PennTreeReader(Reader in, TreeFactory tf, TreeNormalizer tn, Tokenizer<String> st): Read parse trees from a Reader. |
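
The unescaping mentioned in the class description amounts to two literal replacements on a token. The following is a minimal sketch of that idea only; UnescapeSketch and unescape are hypothetical names, and which tokens the reader applies this to is not specified here.

```java
final class UnescapeSketch {
    /** Replaces the escaped forms \* and \/ with * and /, leaving other text alone. */
    static String unescape(String token) {
        // String.replace performs literal (non-regex) replacement.
        return token.replace("\\*", "*").replace("\\/", "/");
    }

    public static void main(String[] args) {
        System.out.println(unescape("1\\/2"));     // prints 1/2
        System.out.println(unescape("\\*T\\*-1")); // prints *T*-1
    }
}
```
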

Modifier and Type | Method and Description |
---|---|
void | close(): Closes the underlying Reader used to create this class. |
static void | main(String[] args): Loads treebank data from the first argument and prints it. |
Tree | readTree(): Reads a single tree in standard Penn Treebank format from the input stream. |

public PennTreeReader(Reader in)
Read parse trees from a Reader.
For the defaulted arguments, you get a SimpleTreeFactory, no TreeNormalizer, and
a PennTreebankTokenizer.
Parameters:
in - The Reader

public PennTreeReader(Reader in, TreeFactory tf)
Read parse trees from a Reader.
Parameters:
in - the Reader
tf - TreeFactory, a factory to create some kind of Tree

public PennTreeReader(Reader in, TreeFactory tf, TreeNormalizer tn)
Read parse trees from a Reader.
Parameters:
in - Reader
tf - TreeFactory, a factory to create some kind of Tree
tn - the method of normalizing trees

public PennTreeReader(Reader in, TreeFactory tf, TreeNormalizer tn, Tokenizer<String> st)
Read parse trees from a Reader.
Parameters:
in - Reader
tf - TreeFactory, a factory to create some kind of Tree
tn - the method of normalizing trees
st - Tokenizer that divides up the Reader

public Tree readTree() throws IOException
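Reads a single tree in standard Penn Treebank format from the input stream.
If the token stream ends before the current tree is complete, the method
throws an IOException.
Note that the method will skip malformed trees and attempt to
read additional trees from the input stream. It is possible, however,
that a malformed tree will corrupt the token stream. In this case,
an IOException will eventually be thrown.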
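
The contract described here (skip what is malformed, keep reading, signal the end with null, fail with an IOException when the stream cannot be recovered) can be illustrated with a small stand-alone sketch. This is an assumption-laden stand-in rather than the library's logic: TreeStreamSketch is a hypothetical class, trees are returned as plain strings, and the only recovery rule shown is ignoring tokens that appear outside any tree, such as a stray closing parenthesis.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in that mimics the readTree() contract on a token list. */
final class TreeStreamSketch {
    private final Iterator<String> tokens;

    TreeStreamSketch(String text) {
        // Pad parentheses with spaces so that whitespace splitting yields tokens.
        String padded = text.replace("(", " ( ").replace(")", " ) ").trim();
        List<String> list = new ArrayList<>();
        for (String t : padded.split("\\s+")) {
            if (!t.isEmpty()) list.add(t);
        }
        this.tokens = list.iterator();
    }

    /** Returns the next balanced tree as text, or null at end of the token stream. */
    String readTree() throws IOException {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        while (tokens.hasNext()) {
            String tok = tokens.next();
            if (tok.equals("(")) {
                depth++;
            } else if (depth == 0) {
                // Assumed recovery rule: a stray ')' or word outside any tree is skipped.
                continue;
            } else if (tok.equals(")")) {
                depth--;
            }
            if (sb.length() > 0) sb.append(' ');
            sb.append(tok);
            if (depth == 0) {
                // Tidy the spacing that tokenization introduced around parentheses.
                return sb.toString().replace("( ", "(").replace(" )", ")");
            }
        }
        if (depth > 0) throw new IOException("input ended inside an unfinished tree");
        return null;
    }

    public static void main(String[] args) throws IOException {
        TreeStreamSketch in = new TreeStreamSketch("(NP dog) ) (VP barked)");
        // Typical calling pattern: keep reading until null marks the end.
        for (String tree = in.readTree(); tree != null; tree = in.readTree()) {
            System.out.println(tree);
        }
    }
}
```

Callers of a reader with this contract usually loop until null and treat an IOException as a sign that the remaining input cannot be trusted.
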
Specified by:
readTree in interface TreeReader
Returns:
A single tree, or null at end of token stream.
Throws:
IOException - If I/O problem

public void close() throws IOException
Closes the underlying Reader used to create this class.
Specified by:
close in interface TreeReader
close in interface Closeable
close in interface AutoCloseable
Throws:
IOException
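
Because close() is specified by AutoCloseable, a reader of this kind can be managed with try-with-resources. The sketch below shows the pattern with a hypothetical wrapper, OwningReaderSketch, standing in for a tree reader that owns its Reader; the readChar and isClosed helpers are illustrative and not part of any library.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical wrapper, standing in for a tree reader that owns its Reader. */
final class OwningReaderSketch implements Closeable {
    private final Reader in;

    OwningReaderSketch(Reader in) { this.in = in; }

    /** Reads one character from the underlying Reader, or -1 at end of input. */
    int readChar() throws IOException { return in.read(); }

    /** Closes the underlying Reader, so the caller does not have to. */
    @Override
    public void close() throws IOException { in.close(); }

    /** Returns true if the given Reader rejects reads, i.e. has been closed. */
    static boolean isClosed(Reader r) {
        try {
            r.read();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Reader source = new StringReader("(NP dog)");
        // close() runs automatically when the try block exits, even on an exception.
        try (OwningReaderSketch trees = new OwningReaderSketch(source)) {
            System.out.println((char) trees.readChar()); // prints (
        }
        System.out.println(isClosed(source)); // prints true
    }
}
```
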

public static void main(String[] args)
Loads treebank data from the first argument and prints it.
Parameters:
args - Array of command-line arguments: specifies a filename