edu.stanford.nlp.trees
Class GrammaticalStructure

java.lang.Object
  extended by edu.stanford.nlp.trees.TreeGraph
      extended by edu.stanford.nlp.trees.GrammaticalStructure
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
EnglishGrammaticalStructure

public abstract class GrammaticalStructure
extends TreeGraph

A GrammaticalStructure is a TreeGraph (that is, a tree with additional labeled arcs between nodes) for representing the grammatical relations in a parse tree. A new GrammaticalStructure is constructed from an existing parse tree with the help of GrammaticalRelation, which defines a hierarchy of grammatical relations, along with patterns for identifying them in parse trees. The constructor for GrammaticalStructure uses these definitions to populate the new GrammaticalStructure with as many labeled grammatical relations as it can. Once constructed, the new GrammaticalStructure can be printed in various formats, or interrogated using the interface methods in this class.

Caveat emptor! This is a work in progress. Nothing in here should be relied upon to function perfectly. Feedback welcome.

Author:
Bill MacCartney, Galen Andrew (refactoring English-specific stuff), Ilya Sherman (dependencies), Daniel Cer
See Also:
EnglishGrammaticalRelations, GrammaticalRelation, EnglishGrammaticalStructure, Serialized Form

Field Summary
protected  List<TypedDependency> allTypedDependencies
           
static int CoNLLX_FieldCount
           
static int CoNLLX_GovField
           
static int CoNLLX_POSField
           
static int CoNLLX_RelnField
           
static int CoNLLX_WordField
           
static String DEFAULT_PARSER_FILE
           
protected  Set<Dependency<Label,Label,Object>> dependencies
           
protected  List<TypedDependency> typedDependencies
           
 
Fields inherited from class edu.stanford.nlp.trees.TreeGraph
root
 
Constructor Summary
GrammaticalStructure(List<TypedDependency> projectiveDependencies, TreeGraphNode root)
           
GrammaticalStructure(Tree t, Collection<GrammaticalRelation> relations, HeadFinder hf, Filter<String> puncFilter)
           
GrammaticalStructure(Tree t, Collection<GrammaticalRelation> relations, Lock relationsLock, HeadFinder hf, Filter<String> puncFilter)
          Create a new GrammaticalStructure by analyzing the parse tree and populating the GrammaticalStructure with as many labeled grammatical relation arcs as possible.
 
Method Summary
 Collection<TypedDependency> allTypedDependencies()
          Returns all the typed dependencies of this grammatical structure.
static GrammaticalStructure buildCoNNLXGrammaticStructure(List<List<String>> tokenFields, Map<String,GrammaticalRelation> shortNameToGRel, GrammaticalStructureFromDependenciesFactory factory)
           
protected  void collapseDependencies(List<TypedDependency> list, boolean CCprocess)
          Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies.
protected  void collapseDependenciesTree(List<TypedDependency> list)
          Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies but keeping a tree structure.
protected  void correctDependencies(Collection<TypedDependency> list)
          Destructively modify the TypedDependencyGraph to correct language-dependent dependencies.
 Set<Dependency<Label,Label,Object>> dependencies()
          Returns the set of (governor, dependent) dependencies in this GrammaticalStructure.
static String dependenciesToString(GrammaticalStructure gs, Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep)
           
static GrammaticalStructure fromStringReps(List<String> tokens, List<String> posTags, List<String> deps)
          Create a grammatical structure from its string representation.
 List<String> getDependencyPath(int nodeIndex, int rootIndex)
          Returns the dependency path from node to root as a list of Strings; root is assumed to be an ancestor of node.
 Set<TreeGraphNode> getDependents(TreeGraphNode t)
          Tries to return a Set of leaf (terminal) nodes which are the DEPENDENTs of the given node t.
static TreeGraphNode getGovernor(TreeGraphNode t)
          Tries to return a leaf (terminal) node which is the GOVERNOR of the given node t.
 GrammaticalRelation getGrammaticalRelation(int govIndex, int depIndex)
          Returns the GrammaticalRelation between gov and dep, or null if gov is not the governor of dep.
static GrammaticalRelation getGrammaticalRelation(TreeGraphNode gov, TreeGraphNode dep)
          Returns the GrammaticalRelation between gov and dep, or null if gov is not the governor of dep.
static List<GrammaticalRelation> getListGrammaticalRelation(TreeGraphNode gov, TreeGraphNode dep)
          Get a list of GrammaticalRelation between gov and dep.
static TreeGraphNode getNodeInRelation(TreeGraphNode t, GrammaticalRelation r)
           
static Collection<TypedDependency> getRoots(Collection<TypedDependency> list)
          Return a list of TypedDependencies which are not dependent on any node from the list.
static boolean isConnected(Collection<TypedDependency> list)
          Checks whether all the typed dependencies are connected.
static void main(String[] args)
          Given sentences or trees, output the typed dependencies.
static void printDependencies(GrammaticalStructure gs, Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep)
          Print typed dependencies in either the Stanford dependency representation or in the conllx format.
static List<GrammaticalStructure> readCoNLLXGrammaticStructureCollection(String fileName, Map<String,GrammaticalRelation> shortNameToGRel, GrammaticalStructureFromDependenciesFactory factory)
          Read in a file containing a CoNLL-X dependency treebank and return a corresponding list of GrammaticalStructures.
 Collection<TypedDependency> typedDependencies()
          Returns the typed dependencies of this grammatical structure.
 List<TypedDependency> typedDependencies(boolean includeExtras)
          Returns the typed dependencies of this grammatical structure.
 List<TypedDependency> typedDependenciesCCprocessed()
          Get a list of the typed dependencies, including extras like control dependencies, collapsing them and distributing relations across coordination.
 List<TypedDependency> typedDependenciesCCprocessed(boolean includeExtras)
          Get the typed dependencies after collapsing them and processing any CC complements.
 Collection<TypedDependency> typedDependenciesCollapsed()
          Get the typed dependencies after collapsing them.
 List<TypedDependency> typedDependenciesCollapsed(boolean includeExtras)
          Get the typed dependencies after collapsing them.
 Collection<TypedDependency> typedDependenciesCollapsedTree()
          Get the typed dependencies after mostly collapsing them, but keep a tree structure.
 
Methods inherited from class edu.stanford.nlp.trees.TreeGraph
addNodeToIndexMap, getNodeByIndex, getNodes, root, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

dependencies

protected final Set<Dependency<Label,Label,Object>> dependencies

typedDependencies

protected final List<TypedDependency> typedDependencies

allTypedDependencies

protected final List<TypedDependency> allTypedDependencies

DEFAULT_PARSER_FILE

public static final String DEFAULT_PARSER_FILE
See Also:
Constant Field Values

CoNLLX_WordField

public static final int CoNLLX_WordField
See Also:
Constant Field Values

CoNLLX_POSField

public static final int CoNLLX_POSField
See Also:
Constant Field Values

CoNLLX_GovField

public static final int CoNLLX_GovField
See Also:
Constant Field Values

CoNLLX_RelnField

public static final int CoNLLX_RelnField
See Also:
Constant Field Values

CoNLLX_FieldCount

public static final int CoNLLX_FieldCount
See Also:
Constant Field Values
Constructor Detail

GrammaticalStructure

public GrammaticalStructure(Tree t,
                            Collection<GrammaticalRelation> relations,
                            Lock relationsLock,
                            HeadFinder hf,
                            Filter<String> puncFilter)
Create a new GrammaticalStructure by analyzing the parse tree and populating the GrammaticalStructure with as many labeled grammatical relation arcs as possible.

Parameters:
t - A Tree to analyze
relations - A set of GrammaticalRelations to consider
relationsLock - A Lock needed to make this thread-safe
hf - A HeadFinder for analysis
puncFilter - A Filter to reject punctuation. To delete punctuation dependencies, this filter should return false on punctuation word strings, and true otherwise. If punctuation dependencies should be kept, you should pass in a Filters.<String>acceptFilter().
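
The puncFilter contract described above can be illustrated with a minimal sketch. It uses java.util.function.Predicate<String> as a stand-in for Filter<String>, and the punctuation inventory is a hypothetical one chosen for the example; neither is taken from this class.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PuncFilterSketch {
    // Hypothetical punctuation inventory; a real filter would be language-specific.
    static final Set<String> PUNCTUATION =
            Set.of(".", ",", ";", ":", "!", "?", "``", "''", "-LRB-", "-RRB-");

    // Stand-in for Filter<String>: returns false on punctuation strings, true otherwise.
    static final Predicate<String> REJECT_PUNCTUATION = word -> !PUNCTUATION.contains(word);

    // Counterpart of an accept-everything filter: punctuation is kept.
    static final Predicate<String> ACCEPT_ALL = word -> true;

    // Keeps only the words the filter accepts.
    static List<String> keep(List<String> words, Predicate<String> puncFilter) {
        return words.stream().filter(puncFilter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("The", "dog", "barks", ".");
        System.out.println(keep(words, REJECT_PUNCTUATION));
        System.out.println(keep(words, ACCEPT_ALL));
    }
}
```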

GrammaticalStructure

public GrammaticalStructure(List<TypedDependency> projectiveDependencies,
                            TreeGraphNode root)

GrammaticalStructure

public GrammaticalStructure(Tree t,
                            Collection<GrammaticalRelation> relations,
                            HeadFinder hf,
                            Filter<String> puncFilter)
Method Detail

fromStringReps

public static GrammaticalStructure fromStringReps(List<String> tokens,
                                                  List<String> posTags,
                                                  List<String> deps)
Create a grammatical structure from its string representation. Like buildCoNNLXGrammaticStructure, this method fakes up the parts of the tree structure that are not used by the grammatical relation transformation operations. Note: added by Daniel Cer.

Parameters:
tokens -
posTags -
deps -

dependencies

public Set<Dependency<Label,Label,Object>> dependencies()
Returns the set of (governor, dependent) dependencies in this GrammaticalStructure.

Returns:
The set of (governor, dependent) dependencies in this GrammaticalStructure.

getDependents

public Set<TreeGraphNode> getDependents(TreeGraphNode t)
Tries to return a Set of leaf (terminal) nodes which are the DEPENDENTs of the given node t. Probably, t should be a leaf node as well.

Parameters:
t - a leaf node in this GrammaticalStructure
Returns:
a Set of nodes which are dependents of node t, or else null

getGovernor

public static TreeGraphNode getGovernor(TreeGraphNode t)
Tries to return a leaf (terminal) node which is the GOVERNOR of the given node t. Probably, t should be a leaf node as well.

Parameters:
t - a leaf node in this GrammaticalStructure
Returns:
a node which is the governor for node t, or else null

getNodeInRelation

public static TreeGraphNode getNodeInRelation(TreeGraphNode t,
                                              GrammaticalRelation r)

getGrammaticalRelation

public GrammaticalRelation getGrammaticalRelation(int govIndex,
                                                  int depIndex)
Returns the GrammaticalRelation between gov and dep, or null if gov is not the governor of dep.


getGrammaticalRelation

public static GrammaticalRelation getGrammaticalRelation(TreeGraphNode gov,
                                                         TreeGraphNode dep)
Returns the GrammaticalRelation between gov and dep, or null if gov is not the governor of dep.


getListGrammaticalRelation

public static List<GrammaticalRelation> getListGrammaticalRelation(TreeGraphNode gov,
                                                                   TreeGraphNode dep)
Get a list of GrammaticalRelation between gov and dep. Useful for getting extra dependencies, in which two nodes can be linked by multiple arcs.


typedDependencies

public Collection<TypedDependency> typedDependencies()
Returns the typed dependencies of this grammatical structure. These are basic word-level typed dependencies, where each word other than the root of the sentence is dependent on one other word, and the dependencies have a tree structure.

Returns:
The typed dependencies of this grammatical structure
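
The tree property stated above (every word other than the root has exactly one governor) can be checked with a short sketch. The Dep record is a hypothetical stand-in for TypedDependency that uses 1-based word indices, and wordCount is assumed to count only the words that take part in the dependencies.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BasicTreeCheckSketch {
    // Hypothetical stand-in for TypedDependency: relation name plus 1-based word indices.
    record Dep(String reln, int gov, int dep) {}

    // True if exactly one word has no governor, every other word has exactly one,
    // and following governors upward never loops.
    static boolean isTree(List<Dep> deps, int wordCount) {
        int[] governor = new int[wordCount + 1]; // 0 means "no governor seen"
        for (Dep d : deps) {
            if (d.dep() < 1 || d.dep() > wordCount || d.gov() < 1 || d.gov() > wordCount) {
                return false;
            }
            if (governor[d.dep()] != 0) {
                return false; // a second governor breaks the tree property
            }
            governor[d.dep()] = d.gov();
        }
        int roots = 0;
        for (int w = 1; w <= wordCount; w++) {
            if (governor[w] == 0) roots++;
        }
        if (roots != 1) return false;
        for (int w = 1; w <= wordCount; w++) {
            Set<Integer> seen = new HashSet<>();
            int current = w;
            while (governor[current] != 0) {
                if (!seen.add(current)) return false; // cycle
                current = governor[current];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "The dog barks": det(dog-2, The-1), nsubj(barks-3, dog-2)
        List<Dep> basic = List.of(new Dep("det", 2, 1), new Dep("nsubj", 3, 2));
        System.out.println(isTree(basic, 3));
    }
}
```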

allTypedDependencies

public Collection<TypedDependency> allTypedDependencies()
Returns all the typed dependencies of this grammatical structure. These are like the basic (uncollapsed) dependencies, but may include extra arcs for control relationships, etc.


typedDependencies

public List<TypedDependency> typedDependencies(boolean includeExtras)
Returns the typed dependencies of this grammatical structure.

If the boolean argument is true, the list of typed dependencies returned may include "extras", in which case it need not follow a tree structure.


typedDependenciesCollapsed

public Collection<TypedDependency> typedDependenciesCollapsed()
Get the typed dependencies after collapsing them. Collapsing dependencies refers to turning certain function words such as prepositions and conjunctions into arcs, so they disappear from the set of nodes. There is no guarantee that the dependencies are a tree. While the dependencies are normally tree-like, the collapsing may introduce not only re-entrancies but even small cycles.

Returns:
A set of collapsed dependencies
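
As a rough illustration of collapsing, the sketch below folds a preposition and its object into a single arc named after the preposition. It is a simplified stand-in, not the collapsing code of any subclass: the Dep record, the relation names prep and pobj, and the prep_<word> naming are assumptions made for the example, and conjunctions and multi-word prepositions are ignored.

```java
import java.util.ArrayList;
import java.util.List;

final class CollapsePrepSketch {
    // Hypothetical stand-in for TypedDependency with 1-based word indices.
    record Dep(String reln, int gov, int dep) {}

    // Replaces each prep(g, p) + pobj(p, o) pair with one prep_<p>(g, o) arc,
    // so the preposition disappears from the set of nodes.
    static List<Dep> collapsePrepositions(List<Dep> deps, List<String> words) {
        List<Dep> result = new ArrayList<>();
        for (Dep d : deps) {
            if (d.reln().equals("prep")) {
                boolean collapsed = false;
                for (Dep o : deps) {
                    if (o.reln().equals("pobj") && o.gov() == d.dep()) {
                        String name = "prep_" + words.get(d.dep() - 1).toLowerCase();
                        result.add(new Dep(name, d.gov(), o.dep()));
                        collapsed = true;
                    }
                }
                if (!collapsed) result.add(d); // preposition without an object: keep as is
            } else if (d.reln().equals("pobj") && isPrepDependent(deps, d.gov())) {
                continue; // already folded into the collapsed arc
            } else {
                result.add(d);
            }
        }
        return result;
    }

    static boolean isPrepDependent(List<Dep> deps, int word) {
        for (Dep d : deps) {
            if (d.reln().equals("prep") && d.dep() == word) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> words = List.of("dogs", "sit", "on", "mats");
        List<Dep> basic = List.of(new Dep("nsubj", 2, 1), new Dep("prep", 2, 3), new Dep("pobj", 3, 4));
        System.out.println(collapsePrepositions(basic, words));
    }
}
```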

typedDependenciesCollapsedTree

public Collection<TypedDependency> typedDependenciesCollapsedTree()
Get the typed dependencies after mostly collapsing them, but keep a tree structure. In order to do this, the code does:
  1. no relative clause processing
  2. no xsubj relations
  3. no propagation of conjuncts

Returns:
collapsed dependencies keeping a tree structure

typedDependenciesCollapsed

public List<TypedDependency> typedDependenciesCollapsed(boolean includeExtras)
Get the typed dependencies after collapsing them.

If the boolean argument is true, the list of typed dependencies returned may include "extras".

Returns:
collapsed dependencies

typedDependenciesCCprocessed

public List<TypedDependency> typedDependenciesCCprocessed(boolean includeExtras)
Get the typed dependencies after collapsing them and processing any CC complements. The effect of this step is to distribute conjoined arguments across relations or conjoined predicates across their arguments. This is generally useful, and we generally recommend using the output of this method with includeExtras set to true.

Parameters:
includeExtras - If true, the list of typed dependencies returned may include "extras", such as controlled subject links.
Returns:
collapsed dependencies with CC processed
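
The distribution of conjoined arguments can be sketched as follows. This covers only one half of the behaviour described above (copying a relation from the first conjunct to the other conjuncts), and the Dep record and the conj prefix test are assumptions for the example rather than the logic used by this class.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSketch {
    // Hypothetical stand-in for TypedDependency with 1-based word indices.
    record Dep(String reln, int gov, int dep) {}

    // For every conj*(a, b) and every r(g, a), adds r(g, b) so that the
    // second conjunct receives the same incoming relation as the first.
    static List<Dep> distributeConjuncts(List<Dep> deps) {
        List<Dep> result = new ArrayList<>(deps);
        for (Dep conj : deps) {
            if (!conj.reln().startsWith("conj")) continue;
            for (Dep d : deps) {
                if (d.dep() == conj.gov() && !d.reln().startsWith("conj")) {
                    Dep copy = new Dep(d.reln(), d.gov(), conj.dep());
                    if (!result.contains(copy)) result.add(copy);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "Bill and Mary left": nsubj(left-4, Bill-1), conj_and(Bill-1, Mary-3)
        List<Dep> collapsed = List.of(new Dep("nsubj", 4, 1), new Dep("conj_and", 1, 3));
        System.out.println(distributeConjuncts(collapsed));
    }
}
```

Note that the added arc gives the second conjunct an extra incoming relation, which is one way the output stops being a tree.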

typedDependenciesCCprocessed

public List<TypedDependency> typedDependenciesCCprocessed()
Get a list of the typed dependencies, including extras like control dependencies, collapsing them and distributing relations across coordination. This method is generally recommended for best representing the semantic and syntactic relations of a sentence. In general it returns a directed graph (i.e., the output may not be a tree and it may contain (small) cycles).

Returns:
collapsed dependencies with CC processed

collapseDependencies

protected void collapseDependencies(List<TypedDependency> list,
                                    boolean CCprocess)
Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies.

Default is no-op; to be overridden in subclasses.

Parameters:
list - A list of dependencies to process for possible collapsing
CCprocess - apply CC process?

collapseDependenciesTree

protected void collapseDependenciesTree(List<TypedDependency> list)
Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies but keeping a tree structure.

Default is no-op; to be overridden in subclasses.

Parameters:
list - A list of dependencies to process for possible collapsing

correctDependencies

protected void correctDependencies(Collection<TypedDependency> list)
Destructively modify the TypedDependencyGraph to correct language-dependent dependencies (e.g., nsubjpass in a relative clause).

Default is no-op; to be overridden in subclasses.


getDependencyPath

public List<String> getDependencyPath(int nodeIndex,
                                      int rootIndex)
Returns the dependency path from node to root as a list of Strings; root is assumed to be an ancestor of node.

Returns:
A list of dependency labels
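
A minimal sketch of such a path computation, assuming tree-shaped dependencies and a hypothetical Dep record with 1-based indices. Throwing when rootIndex is not an ancestor is a choice made for this sketch; the behaviour of the real method in that case is not documented here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DependencyPathSketch {
    // Hypothetical stand-in for TypedDependency with 1-based word indices.
    record Dep(String reln, int gov, int dep) {}

    // Collects relation labels while walking from nodeIndex up to rootIndex.
    static List<String> pathToAncestor(List<Dep> deps, int nodeIndex, int rootIndex) {
        Map<Integer, Dep> incoming = new HashMap<>();
        for (Dep d : deps) incoming.put(d.dep(), d);
        List<String> labels = new ArrayList<>();
        int current = nodeIndex;
        while (current != rootIndex) {
            Dep d = incoming.get(current);
            // A missing governor or an overlong walk means rootIndex is not an ancestor.
            if (d == null || labels.size() > deps.size()) {
                throw new IllegalArgumentException(rootIndex + " is not an ancestor of " + nodeIndex);
            }
            labels.add(d.reln());
            current = d.gov();
        }
        return labels;
    }

    public static void main(String[] args) {
        // "The dog barks": det(dog-2, The-1), nsubj(barks-3, dog-2)
        List<Dep> deps = List.of(new Dep("det", 2, 1), new Dep("nsubj", 3, 2));
        System.out.println(pathToAncestor(deps, 1, 3));
    }
}
```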

isConnected

public static boolean isConnected(Collection<TypedDependency> list)
Checks whether all the typed dependencies are connected.

Parameters:
list - a list of typedDependencies
Returns:
true if the list represents a connected graph, false otherwise
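
One way to test connectivity is an undirected breadth-first search over the arcs, sketched below. This names a generic graph technique and may differ from how this method decides connectivity; the Dep record is hypothetical, and treating an empty list as connected is an assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ConnectivitySketch {
    // Hypothetical stand-in for TypedDependency with 1-based word indices.
    record Dep(String reln, int gov, int dep) {}

    // True if every node mentioned in deps is reachable from the first governor
    // when arcs are followed in either direction.
    static boolean isConnected(List<Dep> deps) {
        if (deps.isEmpty()) return true; // assumption: nothing to disconnect
        Map<Integer, Set<Integer>> neighbours = new HashMap<>();
        for (Dep d : deps) {
            neighbours.computeIfAbsent(d.gov(), k -> new HashSet<>()).add(d.dep());
            neighbours.computeIfAbsent(d.dep(), k -> new HashSet<>()).add(d.gov());
        }
        Deque<Integer> queue = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        int start = deps.get(0).gov();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            for (int next : neighbours.get(queue.poll())) {
                if (seen.add(next)) queue.add(next);
            }
        }
        return seen.size() == neighbours.size();
    }

    public static void main(String[] args) {
        System.out.println(isConnected(List.of(new Dep("det", 2, 1), new Dep("nsubj", 3, 2))));
        System.out.println(isConnected(List.of(new Dep("det", 2, 1), new Dep("amod", 5, 4))));
    }
}
```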

getRoots

public static Collection<TypedDependency> getRoots(Collection<TypedDependency> list)
Return a list of TypedDependencies which are not dependent on any node from the list.

Parameters:
list - The list of TypedDependencies to check
Returns:
A list of TypedDependencies which are not dependent on any node from the list
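
Read literally, the description above selects the dependencies whose governor never appears as a dependent in the list. The sketch below implements that reading with a hypothetical Dep record; it is an interpretation of the documented contract, not a copy of the method.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RootsSketch {
    // Hypothetical stand-in for TypedDependency with 1-based word indices.
    record Dep(String reln, int gov, int dep) {}

    // Returns the dependencies whose governor is not the dependent of any arc in deps.
    static List<Dep> roots(List<Dep> deps) {
        Set<Integer> dependents = new HashSet<>();
        for (Dep d : deps) dependents.add(d.dep());
        List<Dep> roots = new ArrayList<>();
        for (Dep d : deps) {
            if (!dependents.contains(d.gov())) roots.add(d);
        }
        return roots;
    }

    public static void main(String[] args) {
        // "The dog chased the cat"
        List<Dep> deps = List.of(
                new Dep("det", 2, 1), new Dep("nsubj", 3, 2),
                new Dep("det", 5, 4), new Dep("dobj", 3, 5));
        System.out.println(roots(deps));
    }
}
```

Under this reading, a connected list yields arcs that all share one governor (the sentence root), while a disconnected list yields arcs with several distinct governors.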

printDependencies

public static void printDependencies(GrammaticalStructure gs,
                                     Collection<TypedDependency> deps,
                                     Tree tree,
                                     boolean conllx,
                                     boolean extraSep)
Print typed dependencies in either the Stanford dependency representation or in the conllx format.

Parameters:
deps - Typed dependencies to print
tree - Tree corresponding to typed dependencies (only necessary if conllx == true)
conllx - If true use conllx format, otherwise use Stanford representation
extraSep - If true, in the Stanford representation, the extra dependencies (which do not preserve the tree structure) are printed after the basic dependencies
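
The Stanford representation writes each dependency as relation(governor, dependent). The sketch below renders that form with word-index suffixes and, when extraSep is true, places a ====== separator line before the extra dependencies. The Dep record, the exact spacing, and the choice to omit the separator when there are no extras are assumptions of this sketch.

```java
import java.util.List;

final class PrintSketch {
    // Hypothetical stand-in for TypedDependency carrying words and 1-based indices.
    record Dep(String reln, String govWord, int gov, String depWord, int dep) {}

    // Renders one dependency as reln(govWord-govIndex, depWord-depIndex).
    static String format(Dep d) {
        return d.reln() + "(" + d.govWord() + "-" + d.gov() + ", " + d.depWord() + "-" + d.dep() + ")";
    }

    // Basic dependencies first; extras follow, optionally after a separator line.
    static String render(List<Dep> basic, List<Dep> extras, boolean extraSep) {
        StringBuilder sb = new StringBuilder();
        for (Dep d : basic) sb.append(format(d)).append('\n');
        if (extraSep && !extras.isEmpty()) sb.append("======\n");
        for (Dep d : extras) sb.append(format(d)).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Dep> basic = List.of(new Dep("det", "dog", 2, "The", 1), new Dep("nsubj", "barks", 3, "dog", 2));
        System.out.print(render(basic, List.of(), false));
    }
}
```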

dependenciesToString

public static String dependenciesToString(GrammaticalStructure gs,
                                          Collection<TypedDependency> deps,
                                          Tree tree,
                                          boolean conllx,
                                          boolean extraSep)

readCoNLLXGrammaticStructureCollection

public static List<GrammaticalStructure> readCoNLLXGrammaticStructureCollection(String fileName,
                                                                                Map<String,GrammaticalRelation> shortNameToGRel,
                                                                                GrammaticalStructureFromDependenciesFactory factory)
                                                                         throws IOException
Read in a file containing a CoNLL-X dependency treebank and return a corresponding list of GrammaticalStructures.

Throws:
IOException
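
For orientation, the following sketch splits CoNLL-X lines into sentences and picks out the word, part-of-speech, governor, and relation columns. It assumes the standard ten-column, tab-separated CoNLL-X layout with blank lines between sentences; the column constants are this sketch's own and are not claimed to equal the CoNLLX_* fields of this class. It takes lines already read into memory, so file reading is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

final class ConllxSketch {
    // 0-based column positions assumed from the CoNLL-X layout
    // (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL).
    static final int FORM = 1, POSTAG = 4, HEAD = 6, DEPREL = 7, FIELD_COUNT = 10;

    // Hypothetical per-token view of the columns a dependency reader needs.
    record Token(String word, String pos, int head, String reln) {}

    // Groups tokens into sentences; a blank line ends the current sentence.
    static List<List<Token>> parse(List<String> lines) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String[] f = line.split("\t", -1);
            if (f.length != FIELD_COUNT) {
                throw new IllegalArgumentException("expected " + FIELD_COUNT + " fields: " + line);
            }
            current.add(new Token(f[FORM], f[POSTAG], Integer.parseInt(f[HEAD]), f[DEPREL]));
        }
        if (!current.isEmpty()) sentences.add(current);
        return sentences;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "1\tThe\t_\tDT\tDT\t_\t2\tdet\t_\t_",
                "2\tdog\t_\tNN\tNN\t_\t0\troot\t_\t_",
                "");
        System.out.println(parse(lines));
    }
}
```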

buildCoNNLXGrammaticStructure

public static GrammaticalStructure buildCoNNLXGrammaticStructure(List<List<String>> tokenFields,
                                                                 Map<String,GrammaticalRelation> shortNameToGRel,
                                                                 GrammaticalStructureFromDependenciesFactory factory)

main

public static void main(String[] args)
Given sentences or trees, output the typed dependencies.

By default, the method outputs the collapsed typed dependencies with processing of conjuncts. The input can be given as plain text using the option -sentFile, or as trees using the option -treeFile. For -sentFile, the input has to be strictly one sentence per line. These options assume a file as input; you can also feed trees (only) via stdin by using the option -filter.

You can specify where to find a parser with -parserFile serializedParserPath. See LexicalizedParser for more flexible processing of text files (including with Stanford Dependencies output).

If one does not specify a -parserFile, one can specify which language pack to use with -tLPP. This option specifies a class which determines which GrammaticalStructure to use, which HeadFinder to use, etc. It will default to edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams, but any TreebankLangParserParams can be specified.

If trees can only be produced by running the LexicalizedParser and no parser is specified, the default English parser is used. You can specify options to load with the parser using the -parserOpts flag. If the default parser is used and no options are provided, the option -retainTmpSubcategories is used.

The following options can be used to specify the types of dependencies wanted:
-collapsed collapsed dependencies
-basic non-collapsed dependencies that preserve a tree structure
-nonCollapsed non-collapsed dependencies that do not preserve a tree structure (the basic dependencies plus the extra ones)
-CCprocessed collapsed dependencies and conjunctions processed (dependencies are added for each conjunct) -- this is the default if no options are passed
-collapsedTree collapsed dependencies retaining a tree structure
-makeCopulaHead make the verb 'to be' the head, not the predicate noun, adjective, etc. (contrary to the approach argued for in the SD papers)
The -conllx option outputs the dependencies in the CoNLL format instead of the standard Stanford format (relation(governor,dependent)), and retains punctuation by default (punctuation is attached to the root of the sentence with the "punct" relation). In the "collapsed" format, words such as prepositions and conjunctions, which are collapsed into the grammatical relations and are no longer part of the sentence per se, are annotated with "erased" as their grammatical relation and attached to the fake "ROOT" node with index 0.

There is also an option to retain dependencies involving punctuation: -keepPunct
The -extraSep option used with -nonCollapsed will print the basic dependencies first, then a separator ======, and then the extra dependencies that do not preserve the tree structure.

The -test option is used for debugging: it prints the grammatical structure, as well as the basic, collapsed and CCprocessed dependencies. It also checks the connectivity of the collapsed dependencies. If the collapsed dependencies list doesn't constitute a connected graph, it prints the possible offending nodes (one of them is the real root of the graph).

Using the -conllxFile option, you can pass a file containing Stanford dependencies in the CoNLL format (e.g., the basic dependencies), and obtain another representation using one of the representation options.

Usage:
java edu.stanford.nlp.trees.GrammaticalStructure [-treeFile FILE | -sentFile FILE | -conllxFile FILE | -filter]
[-collapsed -basic -CCprocessed -test]

Parameters:
args - Command-line arguments, as above


Stanford NLP Group