edu.stanford.nlp.trees (Stanford JavaNLP API)

Interface Summary
Interface	Description
ConstituentFactory	A `ConstituentFactory` is a factory for creating objects of class `Constituent`, or some descendant class.
CopulaHeadFinder	A mix-in interface for HeadFinders which support the makesCopulaHead method, which says how the HeadFinder in question handles "to be" verbs.
Dependency<G extends Label,D extends Label,N>	An individual dependency between a governor and a dependent.
DependencyFactory	A factory for dependencies of a certain type.
DependencyPrinter
DependencyReader
DependencyTyper<T>	A generified interface for making some kind of dependency object between a head and dependent.
GrammaticalStructureFactory	A general factory for `GrammaticalStructure` objects.
GrammaticalStructureFromDependenciesFactory	An interface for a factory that builds a GrammaticalStructure from a list of TypedDependencies and a TreeGraphNode.
HasParent	Only to be implemented by Tree subclasses that actually keep their parent pointers.
HeadFinder	An interface for finding the "head" daughter of a phrase structure tree.
Labeled	Interface for Objects which have a `Label`.
TreebankFactory	An interface for treebank vendors.
TreebankLanguagePack	This interface specifies language/treebank specific information for a Treebank, which a parser or other treebank user might need to know.
TreebankTransformer
TreeFactory	A `TreeFactory` acts as a factory for creating objects of class `Tree`, or some descendant class.
TreeReader	A `TreeReader` adds functionality to another `Reader` by reading in Trees, or some descendant class.
TreeReaderFactory	A `TreeReaderFactory` is a factory for creating objects of class `TreeReader`, or some descendant class.
TreeTransformer	This is a simple interface for a function that alters a local `Tree`.
TreeVisitor	This is a simple strategy-type interface for operations that are applied to `Tree`.
WordNetConnection	Allows us to verify that a wordnet connection is available without compile time errors if the package is not found.

Class Summary
Class	Description
AbstractCollinsHeadFinder	A base class for a HeadFinder similar to the one described in Michael Collins' 1999 thesis.
AbstractTreebankLanguagePack	This provides an implementation of parts of the TreebankLanguagePack API to reduce the load on fresh implementations.
BasicCategoryTreeTransformer	Transforms trees by turning the labels into their basic categories according to the `TreebankLanguagePack`
BobChrisTreeNormalizer	Normalizes trees in the way used in Manning and Carpenter 1997.
BobChrisTreeNormalizer.AOverAFilter
BobChrisTreeNormalizer.EmptyFilter
CollinsDependency	Extracts bilexical dependencies from Penn Treebank-style phrase structure trees as described in (Collins, 1999) and the later Comp.
CollinsHeadFinder	Implements the HeadFinder found in Michael Collins' 1999 thesis.
CollinsRelation	A relation 4-tuple for the dependency representation of Collins (1999; 2003).
CollocationFinder	Finds WordNet collocations in parse trees.
CompositeTreebank
CompositeTreeTransformer	A TreeTransformer that applies component TreeTransformers in order.
Constituent	A `Constituent` object defines a generic edge in a graph.
CoordinationTransformer	Coordination transformer transforms a PennTreebank tree containing a coordination in a flat structure in order to get the dependencies right.
DateTreeTransformer	Flattens the following two structures: (NP (NP (NNP Month) (CD Day) ) (, ,) (NP (CD Year) )) becomes (NP (NNP Month) (CD Day) (, ,) (CD Year) ), and (NP (NP (NNP Month) ) (NP (CD Year) )) becomes (NP (NNP Month) (CD Year)).
DeepTree	A tree combined with a map from subtree to SimpleMatrix vectors.
Dependencies	Utilities for Dependency objects.
Dependencies.DependentPuncTagRejectFilter<G extends Label,D extends Label,N>
Dependencies.DependentPuncWordRejectFilter<G extends Label,D extends Label,N>
DependencyScoring	Scoring of typed dependencies
DependencyScoring.Score
DependencyTreeTransformer	Transforms an English structure parse tree in order to get the dependencies right: puts in a ROOT node, removes NONE nodes, and retains only NP-TMP, NP-ADV, UCP-TMP tags. The UCP- tags will later be turned into NP- anyway. (Note [cdm]: A lot of this overlaps other existing functionality in trees.
DiskTreebank	A `DiskTreebank` is a `Collection` of `Tree`s.
EnglishGrammaticalRelations	`EnglishGrammaticalRelations` is a set of `GrammaticalRelation` objects for the English language.
EnglishGrammaticalStructure	A GrammaticalStructure for English.
EnglishGrammaticalStructure.FromDependenciesFactory
EnglishGrammaticalStructureFactory
EnglishPatterns	This class contains some English String or Tregex regular expression patterns.
FilteringTreebank	This class wraps another Treebank, and will vend trees that passed a Filter<Tree>.
FilteringTreeReader	A `FilteringTreeReader` filters the output of another TreeReader.
FindTreebankTree	This utility looks for a given sentence in a file or directory of tree files.
GenerateTrees	Generates trees based on simple grammars.
GrammaticalFunctionTreeNormalizer	Tree normalizer for cleaning up labels and preserving the whole node label, the grammatical function and category information from the label, or only the category information.
GrammaticalRelation	`GrammaticalRelation` is used to define a standardized, hierarchical set of grammatical relations, together with patterns for identifying them in parse trees.
GrammaticalStructure	A `GrammaticalStructure` stores dependency relations between nodes in a tree.
GrammaticalStructureConversionUtils	Contains several utility methods to convert constituency trees to dependency trees.
LabeledConstituent	A `LabeledConstituent` object represents a single bracketing in a derivation, including start and end points and `Label` information, but excluding probabilistic information.
LabeledScoredConstituent	A `LabeledScoredConstituent` object defines an edge in a graph with a label and a score.
LabeledScoredConstituentFactory	A `LabeledScoredConstituentFactory` acts as a factory for creating objects of class `LabeledScoredConstituent`.
LabeledScoredTreeFactory	A `LabeledScoredTreeFactory` acts as a factory for creating trees with labels and scores.
LabeledScoredTreeNode	A `LabeledScoredTreeNode` represents a tree composed of a root label, a score, and an array of daughter parse trees.
LabeledScoredTreeReaderFactory	This class implements a `TreeReaderFactory` that produces labeled, scored array-based Trees, which have been cleaned up to delete empties, etc.
LeftHeadFinder	HeadFinder that always returns the leftmost daughter as head.
LengthTreeFilter	Only accept trees that are short enough (less than or equal to length).
MemoryTreebank	A `MemoryTreebank` object stores a corpus of examples with given tree structures in memory (as a `List`).
ModCollinsHeadFinder	Implements a variant on the HeadFinder found in Michael Collins' 1999 thesis.
NamedDependency	An individual dependency between a head and a dependent.
NPTmpRetainingTreeNormalizer	Same TreeNormalizer as BobChrisTreeNormalizer, but optionally provides four extras.
NPTmpRetainingTreeNormalizer.NPTmpAdvRetainingTreeReaderFactory	Implementation of TreeReaderFactory, mainly for convenience of constructing by reflection.
NPTmpRetainingTreeNormalizer.NPTmpRetainingTreeReaderFactory	Implementation of TreeReaderFactory, mainly for convenience of constructing by reflection.
OrderedCombinationTreeNormalizer	This class combines multiple tree normalizers.
OutputSubtrees	Output a tree and all of its subtrees.
PennTreebankLanguagePack	Specifies the treebank/language specific components needed for parsing the English Penn Treebank.
PennTreebankTokenizer	Builds a tokenizer for English PennTreebank (release 2) trees.
PennTreeReader	This class implements the `TreeReader` interface to read Penn Treebank-style files.
PennTreeReaderFactory	Vends `PennTreeReader` objects.
ProcessDependencyConverterRequest
QPTreeTransformer	Transforms an English structure parse tree in order to get the dependencies right by adding extra structure in QP phrases: (QP (RB well) (IN over) (CD 9)) becomes (QP (XS (RB well) (IN over)) (CD 9)), and (QP (...) (CC ...) (...)) becomes (QP (NP ...) (CC ...) (NP ...)).
RecursiveTreeTransformer	A tool to recursively alter a tree in various ways.
RightHeadFinder	HeadFinder that always returns the rightmost daughter as head.
SemanticHeadFinder	Implements a 'semantic head' variant of the English HeadFinder found in Michael Collins' 1999 thesis.
SimpleConstituent	A `SimpleConstituent` object defines a generic edge in a graph.
SimpleConstituentFactory	A `ConstituentFactory` acts as a factory for creating objects of class `Constituent`, or some descendant class.
SimpleTree	A `SimpleTree` is a minimal concrete implementation of an unlabeled, unscored `Tree`.
SimpleTreeFactory	A `SimpleTreeFactory` acts as a factory for creating objects of class `SimpleTree`.
Span	A `Span` is an optimized `SimpleConstituent` object.
SplitTrainingSet	Given a list of trees, splits the trees into three separate files.
StringLabeledScoredTreeReaderFactory	This class implements a `TreeReaderFactory` that produces labeled, scored array-based Trees, which have been cleaned up to delete empties, etc.
SynchronizedTreeTransformer	If you have a TreeTransformer which is not threadsafe, and you need to call it from multiple threads, this will wrap it in a synchronized manner.
TransformingTreebank	This class wraps another Treebank, and will vend trees that have been through a TreeTransformer.
Tree	The abstract class `Tree` is used to collect all of the tree types, and acts as a generic extensible type.
Treebank	A `Treebank` object provides access to a corpus of examples with given tree structures.
Treebanks	This is just a main method and other static methods for command-line manipulation, statistics, and testing of Treebank objects.
TreebankTagUpdater	Class for automatically applying tags to a treebank
TreeCoreAnnotations	Set of common annotations for `CoreMap`s that require classes from the trees package.
TreeCoreAnnotations.BinarizedTreeAnnotation	The CoreMap key for getting the binarized version of the syntactic parse tree of a sentence.
TreeCoreAnnotations.HeadTagLabelAnnotation	The standard key for storing a head tag in the map as a pointer to the head label.
TreeCoreAnnotations.HeadWordLabelAnnotation	The standard key for storing a head word in the map as a pointer to the head label.
TreeCoreAnnotations.KBestTreesAnnotation	The standard key for storing a list of k-best parses.
TreeCoreAnnotations.TreeAnnotation	The CoreMap key for getting the syntactic parse tree of a sentence.
TreeFilters	A location for general implementations of Filter<Tree>.
TreeFilters.HasMatchingChild
TreeFunctions	This is a utility class which vends tree transformers to translate trees from one factory type to trees of another.
TreeGraphNode	A `TreeGraphNode` is simply a `Tree` with some additional functionality.
TreeGraphNodeFactory	A `TreeGraphNodeFactory` acts as a factory for creating tree nodes of type `TreeGraphNode`.
TreeLeafLabelTransformer	Applies a Function to the labels in a tree.
TreeLemmatizer
TreeLengthComparator	A `TreeLengthComparator` orders trees by their yield sentence lengths.
TreeNormalizer	A class for tree normalization.
TreePrint	A class for customizing the print method(s) for a `edu.stanford.nlp.trees.Tree` as the output of the parser.
Trees	Various static utilities for the `Tree` class.
TreeToBracketProcessor
TreeTokenizerFactory	Wrapper for TreeReaderFactory.
TypedDependency	A `TypedDependency` is a relation between two words in a `GrammaticalStructure`.
UniversalEnglishGrammaticalRelations	`UniversalEnglishGrammaticalRelations` is a set of `GrammaticalRelation` objects according to the Universal Dependencies standard.
UniversalEnglishGrammaticalStructure	A GrammaticalStructure for Universal Dependencies English.
UniversalEnglishGrammaticalStructure.FromDependenciesFactory
UniversalEnglishGrammaticalStructureFactory
UniversalPOSMapper	Helper class to perform a context-sensitive mapping of POS tags in a tree to universal POS tags.
UniversalSemanticHeadFinder	Implements a 'semantic head' variant of the HeadFinder found in Michael Collins' 1999 thesis.
UnnamedConcreteDependency	An individual dependency between a head and a dependent.
UnnamedDependency	An individual dependency between a head and a dependent.
WordCatConstituent	A class storing information about a constituent in a character-based tree.
WordCatEqualityChecker	An EqualityChecker for WordCatConstituents.
WordCatEquivalenceClasser	An EquivalenceClasser for WordCatConstituents.
WordStemmer	Stems the Words in a Tree using Morphology.

Enum Summary
Enum	Description
CollinsRelation.Direction
GrammaticalStructure.Extras	A specification for the types of extra edges to add to the dependency tree.
GrammaticalStructureConversionUtils.ConverterOptions	Enum to identify the different TokenizerTypes.

Package edu.stanford.nlp.trees Description

A package for (NLP) trees, sentences, and similar things. This package provides several key abstractions (via abstract classes) and a number of further classes for related objects. Most of these classes use a Factory pattern to instantiate objects.

A Label is something that can be the label of a Tree or a Constituent. The simplest label is a StringLabel. A Word or a TaggedWord is a Label. They can be constructed with a LabelFactory. A Label often implements various interfaces, such as HasWord.

A Constituent object defines a generic edge in a graph. It has a start and end, and usually a Label. A ConstituentFactory builds a Constituent.
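
As a rough illustration of the idea, and not the package's own implementation, the sketch below computes one labeled span per non-leaf node of a small hand-built tree. The `Node` and `Span` types and the `spans` method are hypothetical stand-ins written for this example, and the convention that start and end are inclusive word positions counted from 0 is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstituentSketch {

  /** Hypothetical stand-in for a tree node: a label plus zero or more children. */
  public static final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    public Node(String label, Node... kids) {
      this.label = label;
      for (Node k : kids) {
        children.add(k);
      }
    }

    boolean isLeaf() {
      return children.isEmpty();
    }
  }

  /** Hypothetical stand-in for a labeled constituent (assumed inclusive word indices). */
  public static final class Span {
    final int start;
    final int end;
    final String label;

    Span(int start, int end, String label) {
      this.start = start;
      this.end = end;
      this.label = label;
    }

    @Override
    public String toString() {
      return label + "(" + start + "," + end + ")";
    }
  }

  /** Adds one Span per non-leaf node below n; returns the index of the next unused word. */
  private static int collect(Node n, int left, List<Span> out) {
    if (n.isLeaf()) {
      return left + 1; // a word occupies exactly one position
    }
    int next = left;
    for (Node kid : n.children) {
      next = collect(kid, next, out);
    }
    out.add(new Span(left, next - 1, n.label));
    return next;
  }

  /** Returns the spans of all non-leaf nodes, children before parents. */
  public static List<Span> spans(Node root) {
    List<Span> out = new ArrayList<>();
    collect(root, 0, out);
    return out;
  }

  /** Builds (S (NP (DT This)) (VP (VBD was) (JJ good))) by hand. */
  public static Node example() {
    return new Node("S",
        new Node("NP", new Node("DT", new Node("This"))),
        new Node("VP",
            new Node("VBD", new Node("was")),
            new Node("JJ", new Node("good"))));
  }

  public static void main(String[] args) {
    System.out.println(spans(example()));
  }
}
```

In this sketch a constituent is identified only by its start, end, and label, which is why two different nodes covering the same words with the same label would produce equal-looking spans.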

A Tree object provides generic facilities for manipulating NLP trees. A TreeFactory can build a Tree. A Treebank provides an interface to a collection of parsed sentences (normally found on disk as a corpus). A TreeReader reads trees from an InputStream. A TreeReaderFactory builds a TreeReader. A TreeNormalizer canonicalizes a Tree on input from a File. A HeadFinder finds the head daughter of a Tree. The TreeProcessor interface is for general sequential processing of trees, and the TreeTransformer interface is for changing them.
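
To make the notion of a head daughter concrete, here is a minimal self-contained sketch. It does not use the package's `Tree` or `HeadFinder` types: the `Node` class, the `LEFT` and `RIGHT` rules, and the `headWord` method are hypothetical names chosen for this illustration, and the two rules only mimic the always-leftmost and always-rightmost strategies described for `LeftHeadFinder` and `RightHeadFinder`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class HeadFinderSketch {

  /** Hypothetical minimal tree node, not the package's Tree class. */
  public static final class Node {
    final String label;
    final List<Node> children;

    public Node(String label, Node... kids) {
      this.label = label;
      this.children = Arrays.asList(kids);
    }

    boolean isLeaf() {
      return children.isEmpty();
    }
  }

  /** Head rule that always picks the leftmost daughter. */
  public static final Function<Node, Node> LEFT = n -> n.children.get(0);

  /** Head rule that always picks the rightmost daughter. */
  public static final Function<Node, Node> RIGHT =
      n -> n.children.get(n.children.size() - 1);

  /** Follows head daughters from the given node down to a leaf and returns that word. */
  public static String headWord(Node n, Function<Node, Node> headRule) {
    Node current = n;
    while (!current.isLeaf()) {
      current = headRule.apply(current);
    }
    return current.label;
  }

  /** Builds (S (NP (DT This)) (VP (VBD was) (JJ good))) by hand. */
  public static Node example() {
    return new Node("S",
        new Node("NP", new Node("DT", new Node("This"))),
        new Node("VP",
            new Node("VBD", new Node("was")),
            new Node("JJ", new Node("good"))));
  }

  public static void main(String[] args) {
    Node s = example();
    System.out.println("left-headed:  " + headWord(s, LEFT));
    System.out.println("right-headed: " + headWord(s, RIGHT));
  }
}
```

A linguistically motivated head finder would choose the daughter by category rather than by position, but the percolation step from a phrase down to its head word has the same shape.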

A Sentence is a subclass of an ArrayList. A Sentencebank provides an interface to a large number of sentences (normally found on disk as a corpus). A SentenceReader reads sentences from an InputStream. A SentenceReaderFactory builds a SentenceReader. A SentenceNormalizer canonicalizes a Sentence on input from a File. The SentenceProcessor interface is for general sequential processing of sentences.

There are also various subclasses of StreamTokenizer. The class PairFinder should probably be removed to samples.

Design notes: This package is the result of several iterations of trying to come up with a reusable and extendable set of tree classes. It may still be nonoptimal, but some thought went into it! At any rate, there are several things that are important to understand in order to use the package effectively. One is that a Label has a primary value(), which is always a String, and this is the only thing that matters for fundamental Label operations, such as checking equality. While anything else (or nothing) can be stored in a Label, all other Label content is regarded as purely decorative. All Label implementations should implement a labelFactory() method that returns a LabelFactory for the appropriate kind of Label. Since this depends on the exact class, this method should always be overridden when a Label class is extended. The existing Label classes also provide a static factory() method which returns the same thing.
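
The design rule above can be sketched with a small stand-in class. Nothing here is the package's own code: `DemoLabel`, `DemoLabelFactory`, and `distinct` are hypothetical names, and the sketch simply assumes the convention stated in the notes, namely that equality looks only at value() and that every label can hand out a factory for its own kind.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class LabelSketch {

  /** Hypothetical factory interface, loosely modeled on the LabelFactory idea. */
  public interface DemoLabelFactory {
    DemoLabel newLabel(String value);
  }

  /** Hypothetical label: value() carries the identity, the tag is decoration. */
  public static class DemoLabel {
    private final String value;
    private final String tag; // decorative: ignored by equals and hashCode

    public DemoLabel(String value, String tag) {
      this.value = value;
      this.tag = tag;
    }

    public String value() {
      return value;
    }

    public String tag() {
      return tag;
    }

    /** Instance-level factory; a subclass would override this to build its own type. */
    public DemoLabelFactory labelFactory() {
      return factory();
    }

    /** Static factory that returns the same kind of factory as labelFactory(). */
    public static DemoLabelFactory factory() {
      return v -> new DemoLabel(v, null);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof DemoLabel && Objects.equals(value, ((DemoLabel) o).value);
    }

    @Override
    public int hashCode() {
      return Objects.hashCode(value);
    }
  }

  /** Counts distinct labels; labels that differ only in decoration collapse to one. */
  public static int distinct(DemoLabel... labels) {
    Set<DemoLabel> seen = new HashSet<>();
    for (DemoLabel l : labels) {
      seen.add(l);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    DemoLabel a = new DemoLabel("NP", "subject");
    DemoLabel b = DemoLabel.factory().newLabel("NP");
    System.out.println(a.equals(b));
    System.out.println(distinct(a, b, a.labelFactory().newLabel("VP")));
  }
}
```

Because `equals` and `hashCode` ignore the tag, the two "NP" labels behave as the same key in sets and maps, which is the practical consequence of treating everything except the value as decorative.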

Illustrations of use of the `trees` package

Treebank and Tree

Here is some fairly straightforward code for loading trees from a treebank and iterating over the trees contained therein. It builds a histogram of sentence lengths.

 import edu.stanford.nlp.trees.*;
 import edu.stanford.nlp.io.NumberRangesFileFilter;
 import edu.stanford.nlp.util.Timing;

 /** This class prints a histogram of sentence lengths.
  *  Use: java SentenceLengths /turing/corpora/Treebank2/combined/wsj/07
  *              [fileRange]
  */
 public class SentenceLengths {

   private static final int maxleng = 100;
   private static int[] lengthCounts = new int[maxleng+1];
   private static int numSents = 0;

   public static void main(String[] args) {
     Timing.startTime();
     Treebank treebank = new DiskTreebank(
       new LabeledScoredTreeReaderFactory());
     if (args.length > 1) {
       // restrict loading to the numbered files in the given range
       treebank.loadPath(args[0], new NumberRangesFileFilter(args[1],
         true));
     } else {
       treebank.loadPath(args[0]);
     }

     for (Tree t : treebank) {
       numSents++;
       // yield() returns the list of leaf labels, so its size is the sentence length
       int len = t.yield().size();
       if (len <= maxleng) {
         lengthCounts[len]++;
       }
     }
     System.out.print("Files " + args[0] + " ");
     if (args.length > 1) {
       System.out.print(args[1] + " ");
     }
     System.out.println("consists of " + numSents + " sentences");
     for (int i = 0; i <= maxleng; i++) {
       System.out.println("  " + lengthCounts[i] + " of length " + i);
     }
     Timing.endTime("Read/count all trees");
   }
 }

Treebank, custom TreeReaderFactory, Tree, and Constituent

This example illustrates building a Treebank by hand, specifying a custom TreeReaderFactory, and illustrates more of the Tree package, and the notion of a Constituent. A Constituent has a start and end point and a Label.

 import java.io.Reader;
 import java.util.ArrayList;
 import java.util.Comparator;
 import java.util.List;

 import edu.stanford.nlp.ling.StringLabelFactory;
 import edu.stanford.nlp.stats.ClassicCounter;
 import edu.stanford.nlp.stats.Counter;
 import edu.stanford.nlp.trees.*;

 /** This class counts how often each constituent appears.
  *  Use: java ConstituentCounter /turing/corpora/Treebank2/combined/wsj/07
  */
 public class ConstituentCounter {

   public static void main(String[] args) {
     // TreeReader is an interface, so the custom factory returns a PennTreeReader.
     Treebank treebank = new DiskTreebank(new TreeReaderFactory() {
       @Override
       public TreeReader newTreeReader(Reader in) {
         return new PennTreeReader(in,
           new LabeledScoredTreeFactory(new StringLabelFactory()),
           new BobChrisTreeNormalizer());
       }
     });

     treebank.loadPath(args[0]);
     Counter<Constituent> cnt = new ClassicCounter<>();

     ConstituentFactory confac = LabeledConstituent.factory();
     for (Tree t : treebank) {
       for (Constituent c : t.constituents(confac)) {
         cnt.incrementCount(c);
       }
     }
     // Print in a stable order, sorted by each constituent's string form.
     List<Constituent> sorted = new ArrayList<>(cnt.keySet());
     sorted.sort(Comparator.comparing((Constituent c) -> c.toString()));
     for (Constituent c : sorted) {
       System.out.println(c + "  " + cnt.getCount(c));
     }
   }
 }

Tree and Label

Dealing with the Tree and Label classes is a central part of using this package. This code works out the set of tags (preterminal labels) used in a Treebank. It illustrates writing one's own code to recurse through a Tree, and getting a String value for a Label.

 import edu.stanford.nlp.stats.ClassicCounter;
 import edu.stanford.nlp.stats.Counter;
 import edu.stanford.nlp.trees.*;

 /** This class reads trees from strings and counts their preterminals.
  *  Use: java TreesFromStrings '(S (NP (DT This)) (VP (VBD was) (JJ good)))'
  */
 public class TreesFromStrings {

   private static void addTerminals(Tree t, Counter<String> c) {
     if (t.isLeaf()) {
       // do nothing
     } else if (t.isPreTerminal()) {
       c.incrementCount(t.label().value());
     } else {
       // phrasal node: recurse into each daughter
       for (Tree kid : t.children()) {
         addTerminals(kid, c);
       }
     }
   }

   public static void main(String[] args) {
     Treebank tb = new MemoryTreebank();
     for (String arg : args) {
       try {
         tb.add(Tree.valueOf(arg));
       } catch (Exception e) {
         e.printStackTrace();
       }
     }
     Counter<String> c = new ClassicCounter<>();
     for (Tree t : tb) {
       addTerminals(t, c);
     }
     System.out.println(c);
   }

 }

As well as the Treebank classes, there are corresponding Sentencebank classes (though they are not quite so extensively developed). This final example shows use of a Sentencebank. It also illustrates the Visitor pattern for examining sentences in a Sentencebank. This was actually the original visitation pattern for Treebank and Sentencebank, but these days it is generally easier to use an Iterator. You can also get Sentences from a Treebank by taking the yield() or taggedYield() of each Tree.

 import java.io.*;

 import edu.stanford.nlp.trees.*;

 public class SentencePrinter {

 /** Loads a Sentencebank from the first argument and prints it out.
  *  Usage: java SentencePrinter sentencebankPath
  *  @param args Array of command-line arguments
  */
 public static void main(String[] args) {
  SentenceReaderFactory srf = new SentenceReaderFactory() {
    public SentenceReader newSentenceReader(Reader in) {
      return new SentenceReader(in, new TaggedWordFactory(),
          new PennSentenceNormalizer(),
          new PennTagbankStreamTokenizer(in));
    }
  };
  Sentencebank sentencebank = new DiskSentencebank(srf);
  sentencebank.loadPath(args[0]);

  sentencebank.apply(new SentenceVisitor() {
    public void visitSentence(final Sentence s) {
      // also print tag as well as word
      System.out.println(s.toString(false));
   }
  });
 }

 }

Since: 1.2
Author: Christopher Manning, Dan Klein
