edu.stanford.nlp.trees
Class EnglishGrammaticalStructure

java.lang.Object
  extended by edu.stanford.nlp.trees.TreeGraph
      extended by edu.stanford.nlp.trees.GrammaticalStructure
          extended by edu.stanford.nlp.trees.EnglishGrammaticalStructure
All Implemented Interfaces:
Serializable

public class EnglishGrammaticalStructure
extends GrammaticalStructure

A GrammaticalStructure for English.

The parser should be run with the "-retainNPTmpSubcategories" option! Caveat emptor! This is a work in progress. Suggestions welcome.

Author:
Bill MacCartney, Marie-Catherine de Marneffe, Christopher Manning, Daniel Cer (CoNLLX format)
See Also:
Serialized Form

Field Summary
static String CONJ_MARKER
           
static int CoNLLX_FieldCount
           
static int CoNLLX_GovField
           
static int CoNLLX_POSField
           
static int CoNLLX_RelnField
           
static int CoNLLX_WordField
           
static String DEFAULT_PARSER_FILE
           
protected static Map<String,GrammaticalRelation> shortNameToGRel
           
 
Fields inherited from class edu.stanford.nlp.trees.GrammaticalStructure
allTypedDependencies, dependencies, typedDependencies
 
Fields inherited from class edu.stanford.nlp.trees.TreeGraph
root
 
Constructor Summary
EnglishGrammaticalStructure(List<TypedDependency> projectiveDependencies, TreeGraphNode root)
           
EnglishGrammaticalStructure(Tree t)
          Construct a new GrammaticalStructure from an existing parse tree.
EnglishGrammaticalStructure(Tree t, boolean threadSafe)
           
EnglishGrammaticalStructure(Tree t, Filter<String> puncFilter)
          This gets used by GrammaticalStructureFactory (by reflection).
EnglishGrammaticalStructure(Tree t, Filter<String> puncFilter, boolean threadSafe)
          Construct a new GrammaticalStructure from an existing parse tree.
EnglishGrammaticalStructure(Tree t, Filter<String> puncFilter, HeadFinder hf)
          This gets used by GrammaticalStructureFactory (by reflection).
EnglishGrammaticalStructure(Tree t, Filter<String> puncFilter, HeadFinder hf, boolean threadSafe)
           
EnglishGrammaticalStructure(Tree t, HeadFinder hf)
          This gets used by GrammaticalStructureFactory (by reflection).
 
Method Summary
protected  void collapseDependencies(List<TypedDependency> list, boolean CCprocess)
          Destructively modifies this Collection<TypedDependency> by collapsing several types of transitive pairs of dependencies.
protected  void collapseDependenciesTree(List<TypedDependency> list)
          Destructively modifies this Collection<TypedDependency> by collapsing several types of transitive pairs of dependencies, but keeping the tree structure.
protected static GrammaticalRelation conjValue(String conj)
          Hard-codes the handling of certain conjunction relations in CONJP.
protected  void correctDependencies(Collection<TypedDependency> list)
          Destructively modify the TypedDependencyGraph to correct language-dependent dependencies.
static String dependenciesToString(GrammaticalStructure gs, Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep)
           
static TreeGraphNode getSubject(TreeGraphNode t)
          Tries to return a node representing the SUBJECT (whether nominal or clausal) of the given node t.
static void main(String[] args)
          Given sentences or trees, output the typed dependencies.
static void printDependencies(GrammaticalStructure gs, Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep)
          Print typed dependencies in either the Stanford dependency representation or in the conllx format.
static List<GrammaticalStructure> readCoNLLXGrammaticStructureCollection(String fileName)
          Read in a file containing a CoNLL-X dependency treebank and return a corresponding list of GrammaticalStructures.
 
Methods inherited from class edu.stanford.nlp.trees.GrammaticalStructure
allTypedDependencies, dependencies, getDependencyPath, getDependents, getGovernor, getGrammaticalRelation, getGrammaticalRelation, getListGrammaticalRelation, getNodeInRelation, getRoots, isConnected, typedDependencies, typedDependencies, typedDependenciesCCprocessed, typedDependenciesCCprocessed, typedDependenciesCollapsed, typedDependenciesCollapsed, typedDependenciesCollapsedTree
 
Methods inherited from class edu.stanford.nlp.trees.TreeGraph
addNodeToIndexMap, getNodeByIndex, getNodes, root, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CONJ_MARKER

public static final String CONJ_MARKER
See Also:
Constant Field Values

DEFAULT_PARSER_FILE

public static final String DEFAULT_PARSER_FILE
See Also:
Constant Field Values

CoNLLX_WordField

public static final int CoNLLX_WordField
See Also:
Constant Field Values

CoNLLX_POSField

public static final int CoNLLX_POSField
See Also:
Constant Field Values

CoNLLX_GovField

public static final int CoNLLX_GovField
See Also:
Constant Field Values

CoNLLX_RelnField

public static final int CoNLLX_RelnField
See Also:
Constant Field Values

CoNLLX_FieldCount

public static final int CoNLLX_FieldCount
See Also:
Constant Field Values

shortNameToGRel

protected static final Map<String,GrammaticalRelation> shortNameToGRel
Constructor Detail

EnglishGrammaticalStructure

public EnglishGrammaticalStructure(Tree t)
Construct a new GrammaticalStructure from an existing parse tree. The new GrammaticalStructure has the same tree structure and label values as the given tree (but no shared storage). As part of construction, the parse tree is analyzed using definitions from GrammaticalRelation to populate the new GrammaticalStructure with as many labeled grammatical relations as it can.

Parameters:
t - Parse tree to make grammatical structure from

EnglishGrammaticalStructure

public EnglishGrammaticalStructure(List<TypedDependency> projectiveDependencies,
                                   TreeGraphNode root)

EnglishGrammaticalStructure

public EnglishGrammaticalStructure(Tree t,
                                   boolean threadSafe)

EnglishGrammaticalStructure

public EnglishGrammaticalStructure(Tree t,
                                   Filter<String> puncFilter)
This gets used by GrammaticalStructureFactory (by reflection).

Parameters:
t - Parse tree to make grammatical structure from
puncFilter - Filter to remove punctuation dependencies

EnglishGrammaticalStructure

public EnglishGrammaticalStructure(Tree t,
                                   Filter<String> puncFilter,
                                   boolean threadSafe)
Construct a new GrammaticalStructure from an existing parse tree. The new GrammaticalStructure has the same tree structure and label values as the given tree (but no shared storage). As part of construction, the parse tree is analyzed using definitions from GrammaticalRelation to populate the new GrammaticalStructure with as many labeled grammatical relations as it can.

Parameters:
t - Parse tree to make grammatical structure from
puncFilter - Filter for punctuation words
threadSafe - Whether or not to support simultaneous instances among multiple threads

EnglishGrammaticalStructure

public EnglishGrammaticalStructure(Tree t,
                                   HeadFinder hf)
This gets used by GrammaticalStructureFactory (by reflection).

Parameters:
t - Parse tree to make grammatical structure from
hf - HeadFinder to use when building it

EnglishGrammaticalStructure

public EnglishGrammaticalStructure(Tree t,
                                   Filter<String> puncFilter,
                                   HeadFinder hf)
This gets used by GrammaticalStructureFactory (by reflection).

Parameters:
t - Parse tree to make grammatical structure from
puncFilter - Filter to remove punctuation dependencies
hf - HeadFinder to use when building it

EnglishGrammaticalStructure

public EnglishGrammaticalStructure(Tree t,
                                   Filter<String> puncFilter,
                                   HeadFinder hf,
                                   boolean threadSafe)
Method Detail

getSubject

public static TreeGraphNode getSubject(TreeGraphNode t)
Tries to return a node representing the SUBJECT (whether nominal or clausal) of the given node t. Node t should normally represent a clause or verb phrase.

Parameters:
t - a node in this GrammaticalStructure
Returns:
a node which is the subject of node t, or else null

correctDependencies

protected void correctDependencies(Collection<TypedDependency> list)
Description copied from class: GrammaticalStructure
Destructively modify the TypedDependencyGraph to correct language-dependent dependencies. (e.g., nsubjpass in a relative clause)

The default is a no-op, to be overridden in subclasses.

Overrides:
correctDependencies in class GrammaticalStructure

collapseDependencies

protected void collapseDependencies(List<TypedDependency> list,
                                    boolean CCprocess)
Destructively modifies this Collection<TypedDependency> by collapsing several types of transitive pairs of dependencies.
prepositional object dependencies: pobj
prep(cat, in) and pobj(in, hat) are collapsed to prep_in(cat, hat)
prepositional complement dependencies: pcomp
prep(heard, of) and pcomp(of, attacking) are collapsed to prepc_of(heard, attacking)
conjunct dependencies
cc(investors, and) and conj(investors, regulators) are collapsed to conj_and(investors, regulators)
possessive dependencies: possessive
possessive(Montezuma, 's) will be erased. This is like a collapsing, but due to the flatness of NPs, two dependencies are not actually composed.
referent dependencies in relative clauses: ref
ref(man, that) and dobj(love, that) are collapsed to dobj(love, man)

Overrides:
collapseDependencies in class GrammaticalStructure
Parameters:
list - A list of dependencies to process for possible collapsing
CCprocess - Whether to also process conjunctions (dependencies are added for each conjunct)
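
The prep/pobj case listed above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: Dep is a hypothetical stand-in for TypedDependency, words are plain strings rather than indexed tree nodes, and the sketch returns a new list instead of modifying its argument destructively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: Dep is a hypothetical stand-in for TypedDependency,
// and words are plain strings rather than indexed tree nodes.
class PrepCollapseSketch {
    static final class Dep {
        final String reln, gov, dep;
        Dep(String reln, String gov, String dep) {
            this.reln = reln;
            this.gov = gov;
            this.dep = dep;
        }
        @Override public String toString() {
            return reln + "(" + gov + ", " + dep + ")";
        }
    }

    // Returns the first dependency with the given relation and governor, or null.
    static Dep find(List<Dep> deps, String reln, String gov) {
        for (Dep d : deps) {
            if (d.reln.equals(reln) && d.gov.equals(gov)) return d;
        }
        return null;
    }

    // True if some prep dependency has the given word as its dependent.
    static boolean isPrepDependent(List<Dep> deps, String word) {
        for (Dep d : deps) {
            if (d.reln.equals("prep") && d.dep.equals(word)) return true;
        }
        return false;
    }

    // Collapses prep(g, p) + pobj(p, o) into prep_p(g, o); other dependencies pass through.
    static List<Dep> collapsePrep(List<Dep> deps) {
        List<Dep> out = new ArrayList<>();
        for (Dep d : deps) {
            if (d.reln.equals("prep")) {
                Dep pobj = find(deps, "pobj", d.dep);
                if (pobj != null) {
                    out.add(new Dep("prep_" + d.dep, d.gov, pobj.dep));
                    continue;
                }
            }
            // A pobj whose governor is a collapsed preposition has been absorbed.
            if (d.reln.equals("pobj") && isPrepDependent(deps, d.gov)) continue;
            out.add(d);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Dep> deps = Arrays.asList(
            new Dep("prep", "cat", "in"),
            new Dep("pobj", "in", "hat"),
            new Dep("det", "hat", "the"));
        System.out.println(collapsePrep(deps)); // [prep_in(cat, hat), det(hat, the)]
    }
}
```

Identifying words by string is a simplification that breaks when a word occurs twice in a sentence; a real structure would key on node identity or token index.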

collapseDependenciesTree

protected void collapseDependenciesTree(List<TypedDependency> list)
Destructively modifies this Collection<TypedDependency> by collapsing several types of transitive pairs of dependencies, but keeping the tree structure.
prepositional object dependencies: pobj
prep(cat, in) and pobj(in, hat) are collapsed to prep_in(cat, hat)
prepositional complement dependencies: pcomp
prep(heard, of) and pcomp(of, attacking) are collapsed to prepc_of(heard, attacking)
conjunct dependencies
cc(investors, and) and conj(investors, regulators) are collapsed to conj_and(investors, regulators)
possessive dependencies: possessive
possessive(Montezuma, 's) will be erased. This is like a collapsing, but due to the flatness of NPs, two dependencies are not actually composed.

Overrides:
collapseDependenciesTree in class GrammaticalStructure
Parameters:
list - A list of dependencies to process for possible collapsing
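
The conjunct case listed above can be sketched in the same spirit. This is an illustration under stated assumptions, not the library's code: Dep is a hypothetical stand-in for TypedDependency, words are plain strings, and the result is returned as a new list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: Dep is a hypothetical stand-in for TypedDependency.
class ConjCollapseSketch {
    static final class Dep {
        final String reln, gov, dep;
        Dep(String reln, String gov, String dep) {
            this.reln = reln;
            this.gov = gov;
            this.dep = dep;
        }
        @Override public String toString() {
            return reln + "(" + gov + ", " + dep + ")";
        }
    }

    // Returns the conjunction marker attached to gov via cc, or null if there is none.
    static String markerFor(List<Dep> deps, String gov) {
        for (Dep d : deps) {
            if (d.reln.equals("cc") && d.gov.equals(gov)) return d.dep;
        }
        return null;
    }

    // True if gov has at least one conj dependent.
    static boolean hasConj(List<Dep> deps, String gov) {
        for (Dep d : deps) {
            if (d.reln.equals("conj") && d.gov.equals(gov)) return true;
        }
        return false;
    }

    // Collapses cc(g, m) + conj(g, c) into conj_m(g, c); other dependencies pass through.
    static List<Dep> collapseConj(List<Dep> deps) {
        List<Dep> out = new ArrayList<>();
        for (Dep d : deps) {
            // The cc marker is absorbed into the collapsed relation name.
            if (d.reln.equals("cc") && hasConj(deps, d.gov)) continue;
            if (d.reln.equals("conj")) {
                String marker = markerFor(deps, d.gov);
                if (marker != null) {
                    out.add(new Dep("conj_" + marker, d.gov, d.dep));
                    continue;
                }
            }
            out.add(d);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Dep> deps = Arrays.asList(
            new Dep("cc", "investors", "and"),
            new Dep("conj", "investors", "regulators"),
            new Dep("nsubj", "met", "investors"));
        System.out.println(collapseConj(deps));
    }
}
```

The sketch only renames the relation. It does not add dependencies for each conjunct, which is the separate conjunction-processing step.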

conjValue

protected static GrammaticalRelation conjValue(String conj)
Hard-codes the handling of certain conjunction relations in CONJP. For now we deal with "but not", "instead of", "rather than" and "but rather", which map to negcc, and with "as well as", "not to mention", "but also" and "&", which map to and.

Parameters:
conj - The head dependency of the conjunction marker
Returns:
A GrammaticalRelation made from a normalized form of that conjunction.
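
The mapping described above can be sketched as a simple lookup. This is a hedged illustration, not the method's actual body. It returns plain strings instead of GrammaticalRelation objects. Lowercasing the input, passing unlisted conjunctions through unchanged, and the conj_ prefix (borrowed from the conj_and example in collapseDependencies) are all assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustration only: returns strings, not GrammaticalRelation objects.
class ConjNormalizeSketch {
    private static final Map<String, String> NORMALIZED = new HashMap<>();
    static {
        for (String s : new String[] {"but not", "instead of", "rather than", "but rather"}) {
            NORMALIZED.put(s, "negcc");
        }
        for (String s : new String[] {"as well as", "not to mention", "but also", "&"}) {
            NORMALIZED.put(s, "and");
        }
    }

    // Maps a conjunction marker to its normalized form.
    // Assumption: markers not in the table are passed through unchanged.
    static String normalize(String conj) {
        String key = conj.trim().toLowerCase(Locale.ROOT);
        String mapped = NORMALIZED.get(key);
        return mapped != null ? mapped : key;
    }

    // Assumption: the collapsed relation is named conj_<normalized marker>.
    static String relationName(String conj) {
        return "conj_" + normalize(conj);
    }

    public static void main(String[] args) {
        System.out.println(relationName("but not"));    // conj_negcc
        System.out.println(relationName("as well as")); // conj_and
    }
}
```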

printDependencies

public static void printDependencies(GrammaticalStructure gs,
                                     Collection<TypedDependency> deps,
                                     Tree tree,
                                     boolean conllx,
                                     boolean extraSep)
Print typed dependencies in either the Stanford dependency representation or in the conllx format.

Parameters:
deps - Typed dependencies to print
tree - Tree corresponding to typed dependencies (only necessary if conllx == true)
conllx - If true use conllx format, otherwise use Stanford representation
extraSep - If true, in the Stanford representation, the extra dependencies (which do not preserve the tree structure) are printed after the basic dependencies

dependenciesToString

public static String dependenciesToString(GrammaticalStructure gs,
                                          Collection<TypedDependency> deps,
                                          Tree tree,
                                          boolean conllx,
                                          boolean extraSep)

readCoNLLXGrammaticStructureCollection

public static List<GrammaticalStructure> readCoNLLXGrammaticStructureCollection(String fileName)
                                                                         throws IOException
Read in a file containing a CoNLL-X dependency treebank and return a corresponding list of GrammaticalStructures.

Throws:
IOException
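
As a rough illustration of the input this method consumes, the sketch below turns tab-separated CoNLL-X rows for one sentence into relation(governor, dependent) strings. It is not the library's reader. The column positions follow the published ten-column CoNLL-X layout and are declared locally; the values of the class's own CoNLLX_* constants are not shown on this page and may differ. Rendering head 0 as ROOT is a convention of this sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a minimal reader for one sentence of CoNLL-X rows.
class ConllxSketch {
    // 0-based column positions in the ten-column CoNLL-X layout
    // (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL).
    static final int FORM = 1;
    static final int HEAD = 6;
    static final int DEPREL = 7;
    static final int FIELD_COUNT = 10;

    static List<String> toRelations(List<String> lines) {
        // First pass: split and validate every row so heads can be looked up by index.
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length != FIELD_COUNT) {
                throw new IllegalArgumentException("Expected " + FIELD_COUNT + " fields: " + line);
            }
            rows.add(fields);
        }
        // Second pass: resolve each HEAD index (1-based, 0 means no governor) to a word.
        List<String> out = new ArrayList<>();
        for (String[] fields : rows) {
            int head = Integer.parseInt(fields[HEAD]);
            String governor = head == 0 ? "ROOT" : rows.get(head - 1)[FORM];
            out.add(fields[DEPREL] + "(" + governor + ", " + fields[FORM] + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList(
            "1\tThe\t_\tDT\tDT\t_\t2\tdet\t_\t_",
            "2\tcat\t_\tNN\tNN\t_\t3\tnsubj\t_\t_",
            "3\tsleeps\t_\tVBZ\tVBZ\t_\t0\troot\t_\t_");
        System.out.println(toRelations(sentence));
    }
}
```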

main

public static void main(String[] args)
Given sentences or trees, output the typed dependencies.

By default, the method outputs the collapsed typed dependencies with processing of conjuncts. The input can be given as plain text using the option -sentFile, or as trees using the option -treeFile. For -sentFile, the input must contain strictly one sentence per line. You can specify where to find a parser with -parserFile serializedParserPath. See LexicalizedParser for more flexible processing of text files (including with Stanford Dependencies output). The above options assume a file as input. You can also feed trees (only) via stdin by using the option -filter.

The following options can be used to specify the types of dependencies wanted:
-collapsed collapsed dependencies
-basic non-collapsed dependencies that preserve a tree structure
-nonCollapsed non-collapsed dependencies that do not preserve a tree structure (the basic dependencies plus the extra ones)
-CCprocessed collapsed dependencies and conjunctions processed (dependencies are added for each conjunct) -- this is the default if no option is passed
-collapsedTree collapsed dependencies retaining a tree structure

The -conllx option will output the dependencies in the CoNLL-X format, instead of in the standard Stanford format (relation(governor,dependent)).
There is also an option to retain dependencies involving punctuation: -keepPunct.
The -extraSep option used with -nonCollapsed will print the basic dependencies first, then a separator ======, and then the extra dependencies that do not preserve the tree structure. The -test option is used for debugging: it prints the grammatical structure, as well as the basic, collapsed and CCprocessed dependencies. It also checks the connectivity of the collapsed dependencies. If the collapsed dependencies list doesn't constitute a connected graph, it prints the possible offending nodes (one of them is the real root of the graph).

Using the -conllxFile option, you can pass a file containing Stanford dependencies in the CoNLL-X format (e.g., the basic dependencies), and obtain another representation using one of the representation options.

Usage:
java edu.stanford.nlp.trees.EnglishGrammaticalStructure [-treeFile FILE | -sentFile FILE | -conllxFile FILE | -filter]
[-collapsed -basic -CCprocessed -test]

Parameters:
args - Command-line arguments, as above


Stanford NLP Group