edu.stanford.nlp.trees
Class DiskTreebank

java.lang.Object
  extended by java.util.AbstractCollection<Tree>
      extended by edu.stanford.nlp.trees.Treebank
          extended by edu.stanford.nlp.trees.DiskTreebank
All Implemented Interfaces:
Iterable<Tree>, Collection<Tree>

public final class DiskTreebank
extends Treebank

A DiskTreebank is a Collection of Trees. A DiskTreebank object stores only the information needed to locate a corpus of trees that is stored on disk. Trees are usually accessed either by applying a TreeVisitor to each Tree in the Treebank with apply(), or by calling iterator() to iterate over the Trees.

If the root Label of the Tree objects built by the TreeReader implements HasIndex, then the filename and the index of each tree in the corpus will be inserted as the trees are read in.

Author:
Christopher Manning

Field Summary
 
Fields inherited from class edu.stanford.nlp.trees.Treebank
DEFAULT_TREE_FILE_SUFFIX
 
Constructor Summary
DiskTreebank()
          Create a new DiskTreebank.
DiskTreebank(int initialCapacity)
          Create a new DiskTreebank.
DiskTreebank(int initialCapacity, TreeReaderFactory trf)
          Create a new DiskTreebank.
DiskTreebank(String encoding)
          Create a new DiskTreebank and set the encoding for file access.
DiskTreebank(TreeReaderFactory trf)
          Create a new DiskTreebank.
DiskTreebank(TreeReaderFactory trf, String encoding)
          Create a new DiskTreebank.
 
Method Summary
 void apply(TreeVisitor tp)
          Applies the TreeVisitor to all trees in the Treebank.
 void clear()
          Empty a Treebank.
 File getCurrentFile()
          Return the File from which trees are currently being read by an Iterator or apply() and passed to a TreeVisitor.
 Iterator<Tree> iterator()
          Return an Iterator over Trees in the Treebank.
 void loadPath(File path, FileFilter filt)
          Load trees from the given file or directory.
static void main(String[] args)
          Loads treebank and prints it.
static void sentenceLengths(Treebank treebank, String name, String range, PrintWriter pw)
           
 
Methods inherited from class edu.stanford.nlp.trees.Treebank
decimate, encoding, loadPath, loadPath, loadPath, loadPath, loadPath, remove, size, textualSummary, textualSummary, toString, transform, treeReaderFactory
 
Methods inherited from class java.util.AbstractCollection
add, addAll, contains, containsAll, isEmpty, removeAll, retainAll, toArray, toArray
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.Collection
equals, hashCode
 

Constructor Detail

DiskTreebank

public DiskTreebank()
Create a new DiskTreebank. The trees are made with a LabeledScoredTreeReaderFactory.

Compatibility note: Until Sep 2004, this created a Treebank with a SimpleTreeReaderFactory. The default was changed because the old one wasn't very useful, especially to naive users.


DiskTreebank

public DiskTreebank(String encoding)
Create a new DiskTreebank and set the encoding for file access.

Parameters:
encoding - The charset encoding to use for treebank file decoding

DiskTreebank

public DiskTreebank(TreeReaderFactory trf)
Create a new DiskTreebank.

Parameters:
trf - the factory class to be called to create a new TreeReader

DiskTreebank

public DiskTreebank(TreeReaderFactory trf,
                    String encoding)
Create a new DiskTreebank.

Parameters:
trf - the factory class to be called to create a new TreeReader
encoding - The charset encoding to use for treebank file decoding

DiskTreebank

public DiskTreebank(int initialCapacity)
Create a new DiskTreebank. The trees are made with a LabeledScoredTreeReaderFactory.

Compatibility note: Until Sep 2004, this created a Treebank with a SimpleTreeReaderFactory. The default was changed because the old one wasn't very useful, especially to naive users.

Parameters:
initialCapacity - The initial size of the underlying Collection. For a DiskTreebank, this parameter is ignored.

DiskTreebank

public DiskTreebank(int initialCapacity,
                    TreeReaderFactory trf)
Create a new DiskTreebank.

Parameters:
initialCapacity - The initial size of the underlying Collection. For a DiskTreebank, this parameter is ignored.
trf - the factory class to be called to create a new TreeReader
Method Detail

clear

public void clear()
Empty a Treebank.

Specified by:
clear in interface Collection<Tree>
Specified by:
clear in class Treebank

loadPath

public void loadPath(File path,
                     FileFilter filt)
Load trees from the given file or directory. This version only records the paths to be processed; the files themselves are read later, when apply() is called.

Specified by:
loadPath in class Treebank
Parameters:
path - file or directory to load from
filt - a FileFilter that selects the files to load
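
The deferred behaviour described above (record paths now, read files at apply time) can be illustrated with a minimal, self-contained sketch. It is not the DiskTreebank implementation: the class name LazyPathStore is hypothetical, a Consumer<String> stands in for TreeVisitor, and each non-blank line of a file stands in for a Tree.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a disk-backed collection; NOT the Stanford class. */
class LazyPathStore {
  private final List<File> paths = new ArrayList<>();
  private final List<FileFilter> filters = new ArrayList<>();

  /** Only remembers the path and its filter; no file is opened here. */
  void loadPath(File path, FileFilter filt) {
    paths.add(path);
    filters.add(filt);
  }

  /** Walks every recorded path and passes each non-blank line to the visitor. */
  void apply(Consumer<String> visitor) throws IOException {
    for (int i = 0; i < paths.size(); i++) {
      visit(paths.get(i), filters.get(i), visitor);
    }
  }

  /** Forgets all recorded paths. */
  void clear() {
    paths.clear();
    filters.clear();
  }

  private void visit(File f, FileFilter filt, Consumer<String> visitor) throws IOException {
    if (f.isDirectory()) {
      File[] children = f.listFiles();
      if (children == null) {
        return; // directory could not be listed
      }
      Arrays.sort(children); // deterministic order for the example
      for (File child : children) {
        visit(child, filt, visitor);
      }
    } else if (f.isFile() && filt.accept(f)) {
      // In this sketch, one non-blank line plays the role of one tree.
      for (String line : Files.readAllLines(f.toPath(), StandardCharsets.UTF_8)) {
        if (!line.trim().isEmpty()) {
          visitor.accept(line);
        }
      }
    }
  }
}
```

Because nothing is read in loadPath, a file that appears under a recorded directory after the call but before apply() is still visited in this sketch. Whether that also holds for DiskTreebank depends on its actual implementation.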

apply

public void apply(TreeVisitor tp)
Applies the TreeVisitor to all trees in the Treebank.

Specified by:
apply in class Treebank
Parameters:
tp - A class that can process trees.

getCurrentFile

public File getCurrentFile()
Return the File from which trees are currently being read by an Iterator or apply() and passed to a TreeVisitor.

This is useful if one wants to map the original file and directory structure over to a set of modified trees. New code might prefer to build trees with labels that implement HasIndex.

Returns:
the file that trees are currently being read from, or null if no file is currently open

iterator

public Iterator<Tree> iterator()
Return an Iterator over Trees in the Treebank. This is implemented by building per-file MemoryTreebanks for the files in the DiskTreebank. As such, it isn't as efficient as using apply().

Specified by:
iterator in interface Iterable<Tree>
Specified by:
iterator in interface Collection<Tree>
Specified by:
iterator in class AbstractCollection<Tree>
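
The per-file buffering described above, together with the "current file" bookkeeping of getCurrentFile(), can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: PerFileIterator is a hypothetical name, each line of a file stands in for a Tree, and a plain in-memory list stands in for a per-file MemoryTreebank.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical per-file iterator: buffers one file's records at a time. */
class PerFileIterator implements Iterator<String> {
  private final Iterator<File> files;
  private Iterator<String> current = Collections.emptyIterator();
  private File currentFile; // null until a file is opened, and again once all files are exhausted

  PerFileIterator(List<File> files) {
    this.files = files.iterator();
  }

  /** The file currently being read from, or null if none is open. */
  File getCurrentFile() {
    return currentFile;
  }

  @Override
  public boolean hasNext() {
    // Advance to the next file whenever the current buffer is used up.
    while (!current.hasNext()) {
      if (!files.hasNext()) {
        currentFile = null;
        return false;
      }
      currentFile = files.next();
      try {
        // The whole file is loaded into memory before any record is returned.
        current = Files.readAllLines(currentFile.toPath(), StandardCharsets.UTF_8).iterator();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return true;
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}
```

In this sketch each file is buffered in full before its first record is returned, which is extra work that a streaming, visitor-style traversal avoids.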

main

public static void main(String[] args)
                 throws IOException
Loads a treebank and prints it. All files below the designated filePath are loaded, restricted to the given number ranges if any are specified. You can optionally normalize the trees (English-specific) and print trees one per line up to a certain length (for EVALB).

Usage: java edu.stanford.nlp.trees.DiskTreebank [-maxLength n|-normalize|-treeReaderFactory class] filePath [numberRanges]

Parameters:
args - Array of command-line arguments
Throws:
IOException - If there is a treebank file access problem

sentenceLengths

public static void sentenceLengths(Treebank treebank,
                                   String name,
                                   String range,
                                   PrintWriter pw)


Stanford NLP Group