Package edu.stanford.nlp.trees

A package for (NLP) trees, sentences, and similar things.

Interface Summary
ConstituentFactory A ConstituentFactory is a factory for creating objects of class Constituent, or some descendent class.
Dependency<G extends Label,D extends Label,N> An individual dependency between a governor and a dependent.
DependencyFactory A factory for dependencies of a certain type.
DependencyPrinter  
DependencyReader  
DependencyTyper<T> A generified interface for making some kind of dependency object between a head and dependent.
GrammaticalStructureFromDependenciesFactory An interface for a factory that builds a GrammaticalStructure from a list of TypedDependencies and a TreeGraphNode.
HeadFinder An interface for finding the "head" daughter of a phrase structure tree.
Labeled Interface for Objects which have a Label.
TreebankFactory An interface for treebank vendors.
TreebankLanguagePack This interface specifies language/treebank specific information for a Treebank, which a parser or other treebank user might need to know.
TreeFactory A TreeFactory acts as a factory for creating objects of class Tree, or some descendant class.
TreeReader A TreeReader adds functionality to another Reader by reading in Trees, or some descendant class.
TreeReaderFactory A TreeReaderFactory is a factory for creating objects of class TreeReader, or some descendant class.
TreeTransformer This is a simple interface for a function that alters a local Tree.
TreeVisitor This is a simple strategy-type interface for operations that are applied to Tree.
WordNetConnection Allows us to verify that a wordnet connection is available without compile time errors if the package is not found.
 

Class Summary
AbstractCollinsHeadFinder A base class for Head Finders similar to the one described in Michael Collins' 1999 thesis.
AbstractTreebankLanguagePack This provides an implementation of parts of the TreebankLanguagePack API to reduce the load on fresh implementations.
BobChrisTreeNormalizer Normalizes trees in the way used in Manning and Carpenter 1997.
BobChrisTreeNormalizer.AOverAFilter  
BobChrisTreeNormalizer.EmptyFilter  
CollinsHeadFinder Implements the HeadFinder found in Michael Collins' 1999 thesis.
CollocationFinder Finds WordNet collocations in parse trees.
CompositeTreebank  
CompositeTreeTransformer A TreeTransformer that applies component TreeTransformers in order.
Constituent A Constituent object defines a generic edge in a graph.
CoordinationTransformer Coordination transformer transforms a PennTreebank tree containing a coordination in a flat structure in order to get the dependencies right.
Dependencies Utilities for Dependency objects.
Dependencies.DependentPuncTagRejectFilter<G extends Label,D extends Label,N>  
Dependencies.DependentPuncWordRejectFilter<G extends Label,D extends Label,N>  
DependencyTreeTransformer Transforms an English structure parse tree in order to get the dependencies right: puts a ROOT node, removes NONE nodes, and retains only NP-TMP and NP-ADV tags. (Note [cdm]: A lot of this overlaps other existing functionality in trees.)
DiskTreebank A DiskTreebank is a Collection of Trees.
EnglishGrammaticalRelations EnglishGrammaticalRelations is a set of GrammaticalRelation objects for the English language.
EnglishGrammaticalRelations.AbbreviationModifierGRAnnotation  
EnglishGrammaticalRelations.AdjectivalComplementGRAnnotation  
EnglishGrammaticalRelations.AdjectivalModifierGRAnnotation  
EnglishGrammaticalRelations.AdvClauseModifierGRAnnotation  
EnglishGrammaticalRelations.AdverbialModifierGRAnnotation  
EnglishGrammaticalRelations.AgentGRAnnotation  
EnglishGrammaticalRelations.AppositionalModifierGRAnnotation  
EnglishGrammaticalRelations.ArgumentGRAnnotation  
EnglishGrammaticalRelations.AttributiveGRAnnotation  
EnglishGrammaticalRelations.AuxModifierGRAnnotation  
EnglishGrammaticalRelations.AuxPassiveGRAnnotation  
EnglishGrammaticalRelations.ClausalComplementGRAnnotation  
EnglishGrammaticalRelations.ClausalPassiveSubjectGRAnnotation  
EnglishGrammaticalRelations.ClausalSubjectGRAnnotation  
EnglishGrammaticalRelations.ComplementGRAnnotation  
EnglishGrammaticalRelations.ComplementizerGRAnnotation  
EnglishGrammaticalRelations.ConjunctGRAnnotation  
EnglishGrammaticalRelations.ControllingSubjectGRAnnotation  
EnglishGrammaticalRelations.CoordinationGRAnnotation  
EnglishGrammaticalRelations.CopulaGRAnnotation  
EnglishGrammaticalRelations.DeterminerGRAnnotation  
EnglishGrammaticalRelations.DirectObjectGRAnnotation  
EnglishGrammaticalRelations.ExpletiveGRAnnotation  
EnglishGrammaticalRelations.IndirectObjectGRAnnotation  
EnglishGrammaticalRelations.InfinitivalModifierGRAnnotation  
EnglishGrammaticalRelations.MarkerGRAnnotation  
EnglishGrammaticalRelations.ModifierGRAnnotation  
EnglishGrammaticalRelations.MultiWordExpressionGRAnnotation  
EnglishGrammaticalRelations.NegationModifierGRAnnotation  
EnglishGrammaticalRelations.NominalPassiveSubjectGRAnnotation  
EnglishGrammaticalRelations.NominalSubjectGRAnnotation  
EnglishGrammaticalRelations.NounCompoundModifierGRAnnotation  
EnglishGrammaticalRelations.NpAdverbialModifierGRAnnotation  
EnglishGrammaticalRelations.NumberModifierGRAnnotation  
EnglishGrammaticalRelations.NumericModifierGRAnnotation  
EnglishGrammaticalRelations.ObjectGRAnnotation  
EnglishGrammaticalRelations.ParataxisGRAnnotation  
EnglishGrammaticalRelations.ParticipialModifierGRAnnotation  
EnglishGrammaticalRelations.PhrasalVerbParticleGRAnnotation  
EnglishGrammaticalRelations.PossessionModifierGRAnnotation  
EnglishGrammaticalRelations.PossessiveModifierGRAnnotation  
EnglishGrammaticalRelations.PreconjunctGRAnnotation  
EnglishGrammaticalRelations.PredeterminerGRAnnotation  
EnglishGrammaticalRelations.PredicateGRAnnotation  
EnglishGrammaticalRelations.PrepositionalComplementGRAnnotation  
EnglishGrammaticalRelations.PrepositionalModifierGRAnnotation  
EnglishGrammaticalRelations.PrepositionalObjectGRAnnotation  
EnglishGrammaticalRelations.PunctuationGRAnnotation  
EnglishGrammaticalRelations.PurposeClauseModifierGRAnnotation  
EnglishGrammaticalRelations.QuantifierModifierGRAnnotation  
EnglishGrammaticalRelations.ReferentGRAnnotation  
EnglishGrammaticalRelations.RelativeClauseModifierGRAnnotation  
EnglishGrammaticalRelations.RelativeGRAnnotation  
EnglishGrammaticalRelations.SemanticDependentGRAnnotation  
EnglishGrammaticalRelations.SubjectGRAnnotation  
EnglishGrammaticalRelations.TemporalModifierGRAnnotation  
EnglishGrammaticalRelations.XClausalComplementGRAnnotation  
EnglishGrammaticalStructure A GrammaticalStructure for English.
EnglishGrammaticalStructure.FromDependenciesFactory  
GrammaticalRelation GrammaticalRelation is used to define a standardized, hierarchical set of grammatical relations, together with patterns for identifying them in parse trees.
GrammaticalRelation.DependentGRAnnotation  
GrammaticalRelation.GovernorGRAnnotation  
GrammaticalRelation.GrammaticalRelationAnnotation  
GrammaticalRelation.KillGRAnnotation  
GrammaticalRelation.RootGRAnnotation  
GrammaticalStructure A GrammaticalStructure is a TreeGraph (that is, a tree with additional labeled arcs between nodes) for representing the grammatical relations in a parse tree.
GrammaticalStructureFactory A general factory for GrammaticalStructure objects.
LabeledConstituent A LabeledConstituent object represents a single bracketing in a derivation, including start and end points and Label information, but excluding probabilistic information.
LabeledScoredTreeFactory A LabeledScoredTreeFactory acts as a factory for creating trees with labels and scores.
LabeledScoredTreeNode A LabeledScoredTreeNode represents a tree composed of a root label, a score, and an array of daughter parse trees.
LabeledScoredTreeReaderFactory This class implements a TreeReaderFactory that produces labeled, scored array-based Trees, which have been cleaned up to delete empties, etc.
MemoryTreebank A MemoryTreebank object stores a corpus of examples with given tree structures in memory (as a List).
ModCollinsHeadFinder Implements a variant on the HeadFinder found in Michael Collins' 1999 thesis.
NamedDependency An individual dependency between a head and a dependent.
NPTmpRetainingTreeNormalizer Same TreeNormalizer as BobChrisTreeNormalizer, but optionally provides four extras.
NPTmpRetainingTreeNormalizer.NPTmpRetainingTreeReaderFactory Implementation of TreeReaderFactory, mainly for convenience of constructing by reflection
PennTreebankLanguagePack Specifies the treebank/language specific components needed for parsing the English Penn Treebank.
PennTreebankTokenizer Builds a tokenizer for English PennTreebank (release 2) trees.
PennTreeReader This class implements the TreeReader interface to read Penn Treebank-style files.
PennTreeReaderFactory Vends PennTreeReader objects.
QPTreeTransformer Transforms an English structure parse tree in order to get the dependencies right: Adds an extra structure in QP phrases: (QP (RB well) (IN over) (CD 9)) becomes (QP (XS (RB well) (IN over)) (CD 9))
SemanticHeadFinder Implements a 'semantic head' variant of the HeadFinder found in Michael Collins' 1999 thesis.
SimpleConstituent A SimpleConstituent object defines a generic edge in a graph.
SimpleConstituentFactory A ConstituentFactory acts as a factory for creating objects of class Constituent, or some descendent class.
SimpleTree A SimpleTree is a minimal concrete implementation of an unlabeled, unscored Tree.
SimpleTreeFactory A SimpleTreeFactory acts as a factory for creating objects of class SimpleTree.
TransformingTreebank This class wraps another Treebank, and will vend trees that have been through a TreeTransformer.
Tree The abstract class Tree is used to collect all of the tree types, and acts as a generic extendable type.
Treebank A Treebank object provides access to a corpus of examples with given tree structures.
TreeCoreAnnotations Set of common annotations for CoreMaps that require classes from the trees package.
TreeCoreAnnotations.HeadTagAnnotation The standard key for storing a head tag in the map as a pointer to another node.
TreeCoreAnnotations.HeadWordAnnotation The standard key for storing a head word in the map as a pointer to another node.
TreeCoreAnnotations.TreeAnnotation The CoreMap key for getting the syntactic parse tree of a sentence.
TreeFunctions This is a utility class which vends tree transformers to translate trees from one factory type to trees of another.
TreeGraph A TreeGraph is a tree with additional directed, labeled arcs between arbitrary pairs of nodes.
TreeGraphNode A "treegraph" is a tree with additional directed, labeled arcs between arbitrary pairs of nodes.
TreeGraphNodeFactory A TreeGraphNodeFactory acts as a factory for creating nodes in a TreeGraph.
TreeNormalizer A class for tree normalization.
TreePrint A class for customizing the print method(s) for an edu.stanford.nlp.trees.Tree as the output of the parser.
Trees Various static utilities for the Tree class.
TreeTokenizerFactory Wrapper for TreeReaderFactory.
TypedDependency A TypedDependency is a relation between two words in a GrammaticalStructure.
UnnamedConcreteDependency An individual dependency between a head and a dependent.
UnnamedDependency An individual dependency between a head and a dependent.
WordStemmer Stems the Words in a Tree using Morphology.
 

Enum Summary
GrammaticalRelation.Language  
 

Package edu.stanford.nlp.trees Description

A package for (NLP) trees, sentences, and similar things. This package provides several key abstractions (via abstract classes) and a number of further classes for related objects. Most of these classes use a Factory pattern to instantiate objects.

A Label is something that can be the label of a Tree or a Constituent. The simplest label is a StringLabel. A Word or a TaggedWord is a Label. They can be constructed with a LabelFactory. A Label often implements various interfaces, such as HasWord.

A Constituent object defines a generic edge in a graph. It has a start and end, and usually a Label. A ConstituentFactory builds a Constituent.
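
As a rough illustration of the idea, a constituent can be modeled as a labeled span over word positions. This is only a sketch and not the package's own Constituent implementation: the class name SpanSketch, its methods, and the choice of an inclusive end index are all assumptions made for this example.

```java
import java.util.Objects;

/** Hypothetical stand-in for a labeled constituent: a span plus a label. */
final class SpanSketch {
    final int start;    // index of the first word covered
    final int end;      // index of the last word covered (inclusive, by assumption)
    final String label; // category such as "NP"; may be null for an unlabeled span

    SpanSketch(int start, int end, String label) {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        this.start = start;
        this.end = end;
        this.label = label;
    }

    /** Number of words covered by this span. */
    int size() {
        return end - start + 1;
    }

    /** True if the other span lies entirely inside this one. */
    boolean contains(SpanSketch other) {
        return start <= other.start && other.end <= end;
    }

    /** True if the two spans overlap without either containing the other. */
    boolean crosses(SpanSketch other) {
        return (start < other.start && other.start <= end && end < other.end)
            || (other.start < start && start <= other.end && other.end < end);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SpanSketch)) {
            return false;
        }
        SpanSketch s = (SpanSketch) o;
        return start == s.start && end == s.end && Objects.equals(label, s.label);
    }

    @Override
    public int hashCode() {
        return Objects.hash(start, end, label);
    }

    @Override
    public String toString() {
        return label + "(" + start + "," + end + ")";
    }
}

class SpanSketchDemo {
    public static void main(String[] args) {
        SpanSketch np = new SpanSketch(0, 1, "NP");
        SpanSketch s = new SpanSketch(0, 4, "S");
        System.out.println(s + " contains " + np + ": " + s.contains(np));
    }
}
```

Giving the span value-based equals and hashCode is what makes it usable as a key when counting constituents, as the ConstituentCounter illustration later in this description does with real Constituent objects.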

A Tree object provides generic facilities for manipulating NLP trees. A TreeFactory can build a Tree. A Treebank provides an interface to a collection of parsed sentences (normally found on disk as a corpus). A TreeReader reads trees from a Reader. A TreeReaderFactory builds a TreeReader. A TreeNormalizer canonicalizes a Tree on input from a File. A HeadFinder finds the head daughter of a Tree. The TreeVisitor interface is for general sequential processing of trees, and the TreeTransformer interface is for changing them.
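
To make the reading and traversal steps concrete, here is a minimal sketch of a tree node that can be read from a Penn-style bracketed string and asked for its words. It is not the package's Tree or TreeReader: NodeSketch and its methods are hypothetical names, and the reader assumes well-formed input in which every opening bracket is followed by a label.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal tree node; not the package's Tree class. */
final class NodeSketch {
    final String label;
    final List<NodeSketch> children = new ArrayList<>();

    NodeSketch(String label) {
        this.label = label;
    }

    boolean isLeaf() {
        return children.isEmpty();
    }

    /** A preterminal (tag) node has exactly one child, and that child is a leaf. */
    boolean isPreTerminal() {
        return children.size() == 1 && children.get(0).isLeaf();
    }

    /** Collects the leaf labels (the words) from left to right. */
    List<String> words() {
        List<String> out = new ArrayList<>();
        collect(this, out);
        return out;
    }

    private static void collect(NodeSketch n, List<String> out) {
        if (n.isLeaf()) {
            out.add(n.label);
            return;
        }
        for (NodeSketch kid : n.children) {
            collect(kid, out);
        }
    }

    /** Reads one tree from a string such as "(NP (DT the) (NN dog))". No error handling. */
    static NodeSketch parse(String s) {
        String[] tokens = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return read(tokens, new int[] {0});
    }

    private static NodeSketch read(String[] tokens, int[] pos) {
        String tok = tokens[pos[0]++];
        if (!tok.equals("(")) {
            return new NodeSketch(tok);                     // a bare token is a leaf
        }
        NodeSketch node = new NodeSketch(tokens[pos[0]++]); // token after "(" is the label
        while (!tokens[pos[0]].equals(")")) {
            node.children.add(read(tokens, pos));
        }
        pos[0]++;                                           // consume the closing ")"
        return node;
    }
}

class NodeSketchDemo {
    public static void main(String[] args) {
        NodeSketch t = NodeSketch.parse("(S (NP (DT This)) (VP (VBD was) (JJ good)))");
        System.out.println(t.words());
    }
}
```

The recursive descent in read mirrors the recursive shape of the tree itself, which is why the same pattern (leaf, preterminal, phrasal node) shows up again in user code that walks a Tree.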

A Sentence is a subclass of an ArrayList. A Sentencebank provides an interface to a large number of sentences (normally found on disk as a corpus). A SentenceReader reads sentences from an InputStream. A SentenceReaderFactory builds a SentenceReader. A SentenceNormalizer canonicalizes a Sentence on input from a File. The SentenceProcessor interface is for general sequential processing of sentences.

There are also various subclasses of StreamTokenizer. The class PairFinder should probably be removed to samples.

Design notes: This package is the result of several iterations of trying to come up with a reusable and extendable set of tree classes. It may still be nonoptimal, but some thought went into it! At any rate, there are several things that it is important to understand to use the package effectively. One is that a Label has a primary value() which is always a String, and this is the only thing that matters for fundamental Label operations, such as checking equality. While anything else (or nothing) can be stored in a Label, all other Label content is regarded as purely decorative. All Label implementations should implement a labelFactory() method that returns a LabelFactory for the appropriate kind of Label. Since this depends on the exact class, this method should always be overridden when a Label class is extended. The existing Label classes also provide a static factory() method which returns the same thing.
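
The two conventions described above (equality on value() alone, and a labelFactory() that each subclass overrides) can be sketched as follows. The names ValueLabelSketch, TaggedLabelSketch, and LabelFactorySketch are hypothetical and stand in for the real Label and LabelFactory types, whose actual signatures are not reproduced here.

```java
/** Hypothetical factory type standing in for a LabelFactory. */
interface LabelFactorySketch {
    ValueLabelSketch newLabel(String value);
}

/** Hypothetical label whose identity is its value() alone. */
class ValueLabelSketch {
    private final String value;

    ValueLabelSketch(String value) {
        this.value = value;
    }

    String value() {
        return value;
    }

    /** Factory for this exact class; subclasses are expected to override it. */
    LabelFactorySketch labelFactory() {
        return ValueLabelSketch::new;
    }

    // Equality looks only at value(); anything else a subclass stores is decorative.
    @Override
    public final boolean equals(Object o) {
        return o instanceof ValueLabelSketch && value.equals(((ValueLabelSketch) o).value);
    }

    @Override
    public final int hashCode() {
        return value.hashCode();
    }
}

/** A label that also carries a tag, which does not take part in equality. */
class TaggedLabelSketch extends ValueLabelSketch {
    private final String tag;

    TaggedLabelSketch(String value, String tag) {
        super(value);
        this.tag = tag;
    }

    String tag() {
        return tag;
    }

    // Overridden so that the factory produces this subclass, not the parent class.
    @Override
    LabelFactorySketch labelFactory() {
        return v -> new TaggedLabelSketch(v, null);
    }
}

class LabelSketchDemo {
    public static void main(String[] args) {
        ValueLabelSketch plain = new ValueLabelSketch("dog");
        ValueLabelSketch tagged = new TaggedLabelSketch("dog", "NN");
        System.out.println(plain.equals(tagged));
        System.out.println(tagged.labelFactory().newLabel("cat").getClass().getSimpleName());
    }
}
```

If TaggedLabelSketch did not override labelFactory(), code that copies a tree by asking each label for its factory would silently produce plain labels and lose the tag, which is the reason for the rule about always overriding.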

Road Map: There are some plans to change things. We plan to redo Label, so that all Label classes just inherit from AbstractLabel, and do a full equality test on all their fields. The default type of Treebank should be useful. TreeReader should be PennTreeReader. And there is probably more.

Illustrations of use of the trees package

Treebank and Tree

Here is some fairly straightforward code for loading trees from a treebank and iterating over the trees contained therein. It builds a histogram of sentence lengths.

import java.util.Iterator;

import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.io.NumberRangesFileFilter;
import edu.stanford.nlp.util.Timing;

/** This class just prints out sentences and their lengths.
 *  Use: java SentenceLengths /turing/corpora/Treebank2/combined/wsj/07 
 *              [fileRange]
 */
public class SentenceLengths {

    private static final int maxleng = 100;
    private static int[] lengthCounts = new int[maxleng+1];
    private static int numSents = 0;


    public static void main(String[] args) {
        Timing.startTime();
        Treebank treebank = new DiskTreebank(
                                     new LabeledScoredTreeReaderFactory());
        if (args.length > 1) {
            treebank.loadPath(args[0], new NumberRangesFileFilter(args[1],
                                                                  true));
        } else {
            treebank.loadPath(args[0]);
        }
        
        for (Iterator it = treebank.iterator(); it.hasNext(); ) {
            Tree t = (Tree) it.next();
            numSents++;
            int len = t.yield().length();
            if (len <= maxleng) {
                lengthCounts[len]++;
            }
        }
        System.out.print("Files " + args[0] + " ");
        if (args.length > 1) {
            System.out.print(args[1] + " ");
        }
        System.out.println("consists of " + numSents + " sentences");
        for (int i = 0; i <= maxleng; i++) {
            System.out.println("  " + lengthCounts[i] + " of length " + i);
        }
        Timing.endTime("Read/count all trees");
    }

}

Treebank, custom TreeReaderFactory, Tree, and Constituent

This example illustrates building a Treebank by hand, specifying a custom TreeReaderFactory, and illustrates more of the Tree package, and the notion of a Constituent. A Constituent has a start and end point and a Label.

import java.io.*;
import java.util.*;

import edu.stanford.nlp.ling.StringLabelFactory;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

/** This class counts how often each constituent appears
 *  Use: java ConstituentCounter /turing/corpora/Treebank2/combined/wsj/07
 */
public class ConstituentCounter {

    public static void main(String[] args) {
        Treebank treebank = new DiskTreebank(new TreeReaderFactory() {
                public TreeReader newTreeReader(Reader in) {
                    // TreeReader is an interface, so build the concrete PennTreeReader
                    return new PennTreeReader(in,
                        new LabeledScoredTreeFactory(new StringLabelFactory()),
                                  new BobChrisTreeNormalizer());
                }
            });

        treebank.loadPath(args[0]);
        Counter cnt = new Counter();
        
        ConstituentFactory confac = LabeledConstituent.factory();
        for (Iterator it = treebank.iterator(); it.hasNext(); ) {
            Tree t = (Tree) it.next();
            Set constituents = t.constituents(confac);
            for (Iterator it2 = constituents.iterator(); it2.hasNext(); ) {
                Constituent c = (Constituent) it2.next();
                cnt.increment(c);
            }
        }
        SortedSet ss = new TreeSet(cnt.seenSet());
        for (Iterator it = ss.iterator(); it.hasNext(); ) {
            Constituent c = (Constituent) it.next();
            System.out.println(c + "  " + cnt.countOf(c));
        }
    }

}

Tree and Label

Dealing with the Tree and Label classes is a central part of using this package. This code works out the set of tags (preterminal labels) used in a Treebank. It illustrates writing one's own code to recurse through a Tree, and getting a String value for a Label.

import java.util.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.Counter;

/** This class prints out trees from strings and counts their preterminals.
 *  Use: java TreesFromStrings '(S (NP (DT This)) (VP (VBD was) (JJ good)))'
 */
public class TreesFromStrings {

    private static void addTerminals(Tree t, Counter c) {
        if (t.isLeaf()) {
            // do nothing
        } else if (t.isPreTerminal()) {
            c.increment(t.label().value());
        } else {
            // phrasal node
            Tree[] kids = t.children();
            for (int i = 0; i < kids.length; i++) {
                addTerminals(kids[i], c);
            }
        }
    }

    public static void main(String[] args) {
       Treebank tb = new MemoryTreebank();
       for (int i = 0; i < args.length; i++) {
           try {
               Tree t = Tree.valueOf(args[i]);
               tb.add(t);
           } catch (Exception e) {
               e.printStackTrace();
           }
       }
       Counter c = new Counter();
       for (Iterator it = tb.iterator(); it.hasNext(); ) {
           Tree t = (Tree) it.next();
           addTerminals(t, c);
       }
       System.out.println(c);
   }

}

As well as the Treebank classes, there are corresponding Sentencebank classes (though they are not quite so extensively developed). This final example shows use of a Sentencebank. It also illustrates the Visitor pattern for examining sentences in a Sentencebank. This was actually the original visitation pattern for Treebank and Sentencebank, but these days, it's in general easier to use an Iterator. You can also get Sentences from a Treebank, by taking the yield() or taggedYield() of each Tree.

import java.io.*;

import edu.stanford.nlp.trees.*;

public class SentencePrinter {

    /** Loads SentenceBank from first argument and prints it out.
     *  Usage: java SentencePrinter sentencebankPath
     *  @param args Array of command-line arguments
     */
    public static void main(String[] args) {
        SentenceReaderFactory srf = new SentenceReaderFactory() {
                public SentenceReader newSentenceReader(Reader in) {
                    return new SentenceReader(in,
                                      new TaggedWordFactory(),
                                      new PennSentenceNormalizer(),
                                      new PennTagbankStreamTokenizer(in));
                }
            };
        Sentencebank sentencebank = new DiskSentencebank(srf);
        sentencebank.loadPath(args[0]);

        sentencebank.apply(new SentenceVisitor() {
                public void visitSentence(final Sentence s) {
                    // also print tag as well as word
                    System.out.println(s.toString(false));
                }
            });
    }

}
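
For comparison, the following sketch puts the Visitor style and the Iterator style side by side on a toy in-memory collection. It does not use the Sentencebank classes: SentenceStoreSketch and SentenceVisitorSketch are hypothetical names invented for this example, and sentences are represented simply as lists of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical visitor callback, invoked once per sentence. */
interface SentenceVisitorSketch {
    void visitSentence(List<String> sentence);
}

/** Hypothetical in-memory sentence collection offering both access styles. */
class SentenceStoreSketch implements Iterable<List<String>> {
    private final List<List<String>> sentences = new ArrayList<>();

    void add(String... words) {
        sentences.add(Arrays.asList(words));
    }

    /** Visitor style: the collection drives the loop and calls back. */
    void apply(SentenceVisitorSketch visitor) {
        for (List<String> s : sentences) {
            visitor.visitSentence(s);
        }
    }

    /** Iterator style: the caller drives the loop. */
    @Override
    public Iterator<List<String>> iterator() {
        return sentences.iterator();
    }
}

class VisitorVersusIteratorDemo {

    static int countWordsWithVisitor(SentenceStoreSketch store) {
        int[] total = {0}; // a holder is needed because the callback cannot assign a local int
        store.apply(s -> total[0] += s.size());
        return total[0];
    }

    static int countWordsWithIterator(SentenceStoreSketch store) {
        int total = 0;
        for (List<String> s : store) {
            total += s.size();
        }
        return total;
    }

    public static void main(String[] args) {
        SentenceStoreSketch store = new SentenceStoreSketch();
        store.add("This", "was", "good");
        store.add("So", "it", "goes");
        System.out.println(countWordsWithVisitor(store));
        System.out.println(countWordsWithIterator(store));
    }
}
```

The visitor version needs a mutable holder to accumulate a result, while the iterator version can use an ordinary local variable, which is one practical reason the Iterator style tends to be easier.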

Since:
1.2
Author:
Christopher Manning, Dan Klein


Stanford NLP Group