public abstract class GrammaticalStructure extends TreeGraph
A GrammaticalStructure is a TreeGraph (that is, a tree with additional labeled arcs between nodes) for representing the grammatical relations in a parse tree. A new GrammaticalStructure is constructed from an existing parse tree with the help of GrammaticalRelation, which defines a hierarchy of grammatical relations, along with patterns for identifying them in parse trees. The constructor for GrammaticalStructure uses these definitions to populate the new GrammaticalStructure with as many labeled grammatical relations as it can. Once constructed, the new GrammaticalStructure can be printed in various formats, or interrogated using the interface methods in this class.
Caveat emptor! This is a work in progress. Nothing in here should be relied upon to function perfectly. Feedback welcome.
See Also: EnglishGrammaticalRelations, GrammaticalRelation, EnglishGrammaticalStructure, Serialized Form
Modifier and Type | Field and Description |
---|---|
protected java.util.List<TypedDependency> | allTypedDependencies |
static int | CoNLLX_FieldCount |
static int | CoNLLX_GovField |
static int | CoNLLX_POSField |
static int | CoNLLX_RelnField |
static int | CoNLLX_WordField |
static java.lang.String | DEFAULT_PARSER_FILE |
protected java.util.Set<Dependency<Label,Label,java.lang.Object>> | dependencies |
protected Filter<java.lang.String> | puncFilter |
protected java.util.List<TypedDependency> | typedDependencies |
Constructor and Description |
---|
GrammaticalStructure(java.util.List<TypedDependency> projectiveDependencies, TreeGraphNode root) |
GrammaticalStructure(Tree t, java.util.Collection<GrammaticalRelation> relations, HeadFinder hf, Filter<java.lang.String> puncFilter) |
GrammaticalStructure(Tree t, java.util.Collection<GrammaticalRelation> relations, java.util.concurrent.locks.Lock relationsLock, HeadFinder hf, Filter<java.lang.String> puncFilter) - Creates a new GrammaticalStructure by analyzing the parse tree and populating the GrammaticalStructure with as many labeled grammatical relation arcs as possible. |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<TypedDependency> | allTypedDependencies() - Returns all the typed dependencies of this grammatical structure. |
static GrammaticalStructure | buildCoNLLXGrammaticalStructure(java.util.List<java.util.List<java.lang.String>> tokenFields, java.util.Map<java.lang.String,GrammaticalRelation> shortNameToGRel, GrammaticalStructureFromDependenciesFactory factory) |
protected void | collapseDependencies(java.util.List<TypedDependency> list, boolean CCprocess, boolean includeExtras) - Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies. |
protected void | collapseDependenciesTree(java.util.List<TypedDependency> list) - Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies while keeping a tree structure. |
protected void | correctDependencies(java.util.Collection<TypedDependency> list) - Destructively modify the TypedDependencyGraph to correct language-dependent dependencies. |
java.util.Set<Dependency<Label,Label,java.lang.Object>> | dependencies() - Returns the set of (governor, dependent) dependencies in this GrammaticalStructure. |
static java.lang.String | dependenciesToString(GrammaticalStructure gs, java.util.Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep) |
protected Filter<TypedDependency> | extraTreeDepFilter() - Returns a Filter which checks dependencies for usefulness as extra tree-based dependencies. |
static GrammaticalStructure | fromStringReps(java.util.List<java.lang.String> tokens, java.util.List<java.lang.String> posTags, java.util.List<java.lang.String> deps) - Create a grammatical structure from its string representation. |
java.util.List<java.lang.String> | getDependencyPath(int nodeIndex, int rootIndex) - Returns the dependency path from node to root as a list of Strings; root is assumed to be an ancestor of node. |
static java.util.Set<TreeGraphNode> | getDependents(TreeGraphNode t) |
protected void | getExtras(java.util.List<TypedDependency> basicDep) - Get extra dependencies that do not depend on the tree structure, but rather only depend on the existing dependency structure. |
static TreeGraphNode | getGovernor(TreeGraphNode t) - Tries to return a leaf (terminal) node which is the GOVERNOR of the given node t. |
GrammaticalRelation | getGrammaticalRelation(int govIndex, int depIndex) - Get the GrammaticalRelation between gov and dep, or null if gov is not the governor of dep. |
static GrammaticalRelation | getGrammaticalRelation(TreeGraphNode gov, TreeGraphNode dep) - Get the GrammaticalRelation between gov and dep, or null if gov is not the governor of dep. |
static java.util.List<GrammaticalRelation> | getListGrammaticalRelation(TreeGraphNode gov, TreeGraphNode dep) - Get a list of GrammaticalRelations between gov and dep. |
static TreeGraphNode | getNodeInRelation(TreeGraphNode t, GrammaticalRelation r) |
static java.util.Collection<TypedDependency> | getRoots(java.util.Collection<TypedDependency> list) - Return a list of TypedDependencies which are not dependent on any node from the list. |
static boolean | isConnected(java.util.Collection<TypedDependency> list) - Checks if all the typed dependencies are connected. |
static void | main(java.lang.String[] args) - Given sentences or trees, output the typed dependencies. |
protected void | postProcessDependencies(java.util.List<TypedDependency> basicDep) - Post process the dependencies in whatever way this language requires. |
static void | printDependencies(GrammaticalStructure gs, java.util.Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep) - Print typed dependencies in either the Stanford dependency representation or in the conllx format. |
static java.util.List<GrammaticalStructure> | readCoNLLXGrammaticalStructureCollection(java.lang.String fileName, java.util.Map<java.lang.String,GrammaticalRelation> shortNameToGRel, GrammaticalStructureFromDependenciesFactory factory) - Read in a file containing a CoNLL-X dependency treebank and return a corresponding list of GrammaticalStructures. |
java.util.Collection<TypedDependency> | typedDependencies() - Returns the typed dependencies of this grammatical structure. |
java.util.List<TypedDependency> | typedDependencies(boolean includeExtras) - Returns the typed dependencies of this grammatical structure. |
java.util.List<TypedDependency> | typedDependenciesCCprocessed() - Get a list of the typed dependencies, including extras like control dependencies, collapsing them and distributing relations across coordination. |
java.util.List<TypedDependency> | typedDependenciesCCprocessed(boolean includeExtras) - Get the typed dependencies after collapsing them and processing any CC complements. |
java.util.Collection<TypedDependency> | typedDependenciesCollapsed() - Get the typed dependencies after collapsing them. |
java.util.List<TypedDependency> | typedDependenciesCollapsed(boolean includeExtras) - Get the typed dependencies after collapsing them. |
java.util.Collection<TypedDependency> | typedDependenciesCollapsedTree() - Get the typed dependencies after mostly collapsing them, but keep a tree structure. |
Methods inherited from class TreeGraph: addNodeToIndexMap, getNodeByIndex, getNodes, root, toString
protected final java.util.Set<Dependency<Label,Label,java.lang.Object>> dependencies
protected final java.util.List<TypedDependency> typedDependencies
protected final java.util.List<TypedDependency> allTypedDependencies
protected final Filter<java.lang.String> puncFilter
public static final java.lang.String DEFAULT_PARSER_FILE
public static final int CoNLLX_WordField
public static final int CoNLLX_POSField
public static final int CoNLLX_GovField
public static final int CoNLLX_RelnField
public static final int CoNLLX_FieldCount
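The CoNLLX_* fields above name column positions in a CoNLL-X row, but this page does not list their values. The sketch below therefore defines its own hypothetical indices, taken from the standard ten-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) counted from zero. The class name, constant names, and numbers are assumptions made for illustration, not the library's definitions.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: local stand-ins for column indices, not the library's constants. */
final class ConllxRowSketch {
    // Zero-based positions in the ten-column CoNLL-X layout (assumed values).
    static final int WORD_FIELD = 1;   // FORM
    static final int POS_FIELD = 4;    // POSTAG
    static final int GOV_FIELD = 6;    // HEAD (0 denotes the artificial root)
    static final int RELN_FIELD = 7;   // DEPREL
    static final int FIELD_COUNT = 10;

    /** Splits one tab-separated row and checks that it has the expected column count. */
    static List<String> parseRow(String line) {
        List<String> fields = Arrays.asList(line.split("\t"));
        if (fields.size() != FIELD_COUNT) {
            throw new IllegalArgumentException(
                "expected " + FIELD_COUNT + " fields, got " + fields.size());
        }
        return fields;
    }

    /** Renders a row as reln(governorIndex, word). */
    static String describe(List<String> fields) {
        return fields.get(RELN_FIELD) + "(" + fields.get(GOV_FIELD) + ", "
            + fields.get(WORD_FIELD) + ")";
    }

    public static void main(String[] args) {
        String row = "1\tSam\t_\tNNP\tNNP\t_\t2\tnsubj\t_\t_";
        System.out.println(describe(parseRow(row))); // prints nsubj(2, Sam)
    }
}
```

Naming the column positions once, as the class does with its constants, keeps row-handling code free of bare numbers.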
public GrammaticalStructure(Tree t, java.util.Collection<GrammaticalRelation> relations, java.util.concurrent.locks.Lock relationsLock, HeadFinder hf, Filter<java.lang.String> puncFilter)
t - A Tree to analyze
relations - A set of GrammaticalRelations to consider
relationsLock - A Lock needed to make use of the relations thread-safe
hf - A HeadFinder for analysis
puncFilter - A Filter to reject punctuation. To delete punctuation dependencies, this filter should return false on punctuation word strings, and true otherwise. If punctuation dependencies should be kept, you should pass in a Filters.<String>acceptFilter().
public GrammaticalStructure(java.util.List<TypedDependency> projectiveDependencies, TreeGraphNode root)
public GrammaticalStructure(Tree t, java.util.Collection<GrammaticalRelation> relations, HeadFinder hf, Filter<java.lang.String> puncFilter)
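The puncFilter contract described above (false for punctuation word strings, true otherwise) can be illustrated without the library. The sketch below uses java.util.function.Predicate<String> as a stand-in for the library's Filter<String>; the class name, the constant names, and the choice to treat only ASCII punctuation as punctuation are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Sketch only: Predicate<String> stands in for the library's Filter<String>. */
final class PuncFilterSketch {
    /** Returns false for strings made up only of ASCII punctuation, true otherwise. */
    static final Predicate<String> REJECT_PUNCTUATION =
        word -> !word.matches("\\p{Punct}+");

    /** Accepts everything, so punctuation is kept as well. */
    static final Predicate<String> ACCEPT_ALL = word -> true;

    /** Keeps the words that the filter accepts, preserving order. */
    static List<String> keep(List<String> words, Predicate<String> filter) {
        List<String> kept = new ArrayList<>();
        for (String w : words) {
            if (filter.test(w)) {
                kept.add(w);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Sam", "eats", "fish", ".");
        System.out.println(keep(words, REJECT_PUNCTUATION)); // prints [Sam, eats, fish]
        System.out.println(keep(words, ACCEPT_ALL));         // prints [Sam, eats, fish, .]
    }
}
```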
public static GrammaticalStructure fromStringReps(java.util.List<java.lang.String> tokens, java.util.List<java.lang.String> posTags, java.util.List<java.lang.String> deps)
tokens -
posTags -
deps -
protected Filter<TypedDependency> extraTreeDepFilter()
protected void postProcessDependencies(java.util.List<TypedDependency> basicDep)
protected void getExtras(java.util.List<TypedDependency> basicDep)
public java.util.Set<Dependency<Label,Label,java.lang.Object>> dependencies()
Returns the set of (governor, dependent) dependencies in this GrammaticalStructure.
Returns: the set of (governor, dependent) dependencies in this GrammaticalStructure
public static java.util.Set<TreeGraphNode> getDependents(TreeGraphNode t)
Returns the Set of leaf (terminal) nodes which are the DEPENDENTs of the given node t. Probably, t should be a leaf node as well.
t - a leaf node in this GrammaticalStructure
Returns: a Set of nodes which are dependents of node t, or else null
public static TreeGraphNode getGovernor(TreeGraphNode t)
Tries to return a leaf (terminal) node which is the GOVERNOR of the given node t. Probably, t should be a leaf node as well.
t - a leaf node in this GrammaticalStructure
Returns: the governor of node t, or else null
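To make the governor and dependent lookups concrete, here is a minimal sketch over plain token indices. It does not use TreeGraphNode; the Arc class and both method names are hypothetical, and unlike getDependents the sketch returns an empty set rather than null when a token has no dependents.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: arcs over token indices instead of the library's tree nodes. */
final class GovernorLookupSketch {
    /** Hypothetical arc from a governor token index to a dependent token index. */
    static final class Arc {
        final int gov;
        final int dep;
        Arc(int gov, int dep) { this.gov = gov; this.dep = dep; }
    }

    /** Returns the indices of all dependents of the given token, in arc order. */
    static Set<Integer> dependentsOf(List<Arc> arcs, int token) {
        Set<Integer> result = new LinkedHashSet<>();
        for (Arc a : arcs) {
            if (a.gov == token) {
                result.add(a.dep);
            }
        }
        return result;
    }

    /** Returns the governor of the given token, or null if it has none. */
    static Integer governorOf(List<Arc> arcs, int token) {
        for (Arc a : arcs) {
            if (a.dep == token) {
                return a.gov;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "Sam eats fish": eats (2) governs Sam (1) and fish (3).
        List<Arc> arcs = List.of(new Arc(2, 1), new Arc(2, 3));
        System.out.println(dependentsOf(arcs, 2)); // prints [1, 3]
        System.out.println(governorOf(arcs, 1));   // prints 2
        System.out.println(governorOf(arcs, 2));   // prints null
    }
}
```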
public static TreeGraphNode getNodeInRelation(TreeGraphNode t, GrammaticalRelation r)
public GrammaticalRelation getGrammaticalRelation(int govIndex, int depIndex)
public static GrammaticalRelation getGrammaticalRelation(TreeGraphNode gov, TreeGraphNode dep)
public static java.util.List<GrammaticalRelation> getListGrammaticalRelation(TreeGraphNode gov, TreeGraphNode dep)
public java.util.Collection<TypedDependency> typedDependencies()
public java.util.Collection<TypedDependency> allTypedDependencies()
public java.util.List<TypedDependency> typedDependencies(boolean includeExtras)
includeExtras - If true, the list of typed dependencies returned may include "extras", and does not follow a tree structure.
public java.util.Collection<TypedDependency> typedDependenciesCollapsed()
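As a rough picture of what "collapsing" means, the sketch below folds a preposition and its object into one arc, a pattern familiar from the collapsed English Stanford dependencies (prep plus pobj becoming prep_in, for example). It is a simplified illustration under stated assumptions, not the logic of any subclass: the Dep class and method names are hypothetical, words are identified by their strings (so repeated words would clash), and a new list is returned instead of modifying the input.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: one collapsing pattern over hypothetical string-based dependencies. */
final class CollapseSketch {
    static final class Dep {
        final String reln;
        final String gov;
        final String dep;
        Dep(String reln, String gov, String dep) {
            this.reln = reln;
            this.gov = gov;
            this.dep = dep;
        }
        @Override public String toString() { return reln + "(" + gov + ", " + dep + ")"; }
    }

    /** Folds each prep(g, p) + pobj(p, x) pair into a single prep_p(g, x) arc. */
    static List<Dep> collapsePreps(List<Dep> deps) {
        List<Dep> out = new ArrayList<>();
        for (Dep d : deps) {
            if (d.reln.equals("prep")) {
                Dep object = findPobj(deps, d.dep);
                if (object != null) {
                    out.add(new Dep("prep_" + d.dep, d.gov, object.dep));
                    continue;
                }
            }
            // Drop a pobj whose preposition has been folded into a collapsed arc.
            if (d.reln.equals("pobj") && hasPrepOver(deps, d.gov)) {
                continue;
            }
            out.add(d);
        }
        return out;
    }

    static Dep findPobj(List<Dep> deps, String prep) {
        for (Dep d : deps) {
            if (d.reln.equals("pobj") && d.gov.equals(prep)) {
                return d;
            }
        }
        return null;
    }

    static boolean hasPrepOver(List<Dep> deps, String prep) {
        for (Dep d : deps) {
            if (d.reln.equals("prep") && d.dep.equals(prep)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Dep> basic = List.of(
            new Dep("nsubj", "lives", "Sam"),
            new Dep("prep", "lives", "in"),
            new Dep("pobj", "in", "Paris"));
        System.out.println(collapsePreps(basic)); // prints [nsubj(lives, Sam), prep_in(lives, Paris)]
    }
}
```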
public java.util.Collection<TypedDependency> typedDependenciesCollapsedTree()
public java.util.List<TypedDependency> typedDependenciesCollapsed(boolean includeExtras)
Get the typed dependencies after collapsing them. The "collapsed" option corresponds to calling this method with an argument of true.
includeExtras - If true, the list of typed dependencies returned may include "extras", like controlling subjects
public java.util.List<TypedDependency> typedDependenciesCCprocessed(boolean includeExtras)
Get the typed dependencies after collapsing them and processing any CC complements. The "CCPropagated" option corresponds to calling this method with an argument of true.
includeExtras - If true, the list of typed dependencies returned may include "extras", such as controlled subject links.
public java.util.List<TypedDependency> typedDependenciesCCprocessed()
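"Distributing relations across coordination" can be pictured as copying a relation from the first conjunct to the others. The sketch below does this for subjects only. It is a deliberately simplified assumption about the idea, not the library's procedure: the Dep class and method name are hypothetical, relation names beginning with "conj" are taken to mark coordination, and no check is made for a conjunct that already has its own subject.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: propagates nsubj across conj arcs in a hypothetical dependency list. */
final class CcProcessSketch {
    static final class Dep {
        final String reln;
        final String gov;
        final String dep;
        Dep(String reln, String gov, String dep) {
            this.reln = reln;
            this.gov = gov;
            this.dep = dep;
        }
        @Override public String toString() { return reln + "(" + gov + ", " + dep + ")"; }
    }

    /** For each conj arc (head, conjunct), copies the head's nsubj onto the conjunct. */
    static List<Dep> propagateSubjects(List<Dep> deps) {
        List<Dep> out = new ArrayList<>(deps);
        for (Dep conj : deps) {
            if (!conj.reln.startsWith("conj")) {
                continue;
            }
            for (Dep d : deps) {
                if (d.reln.equals("nsubj") && d.gov.equals(conj.gov)) {
                    out.add(new Dep("nsubj", conj.dep, d.dep));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "Sam eats and drinks"
        List<Dep> collapsed = List.of(
            new Dep("nsubj", "eats", "Sam"),
            new Dep("conj_and", "eats", "drinks"));
        // prints [nsubj(eats, Sam), conj_and(eats, drinks), nsubj(drinks, Sam)]
        System.out.println(propagateSubjects(collapsed));
    }
}
```

The added nsubj(drinks, Sam) arc gives Sam two governors, which is why output of this kind no longer forms a tree.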
protected void collapseDependencies(java.util.List<TypedDependency> list, boolean CCprocess, boolean includeExtras)
Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies.
Default is no-op; to be overridden in subclasses.
list - A list of dependencies to process for possible collapsing
CCprocess - Whether to apply CC processing
protected void collapseDependenciesTree(java.util.List<TypedDependency> list)
Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies while keeping a tree structure.
Default is no-op; to be overridden in subclasses.
list - A list of dependencies to process for possible collapsing
protected void correctDependencies(java.util.Collection<TypedDependency> list)
Destructively modify the TypedDependencyGraph to correct language-dependent dependencies (e.g., nsubjpass in a relative clause).
Default is no-op; to be overridden in subclasses.
public java.util.List<java.lang.String> getDependencyPath(int nodeIndex, int rootIndex)
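The summary for getDependencyPath says the path runs from node to root and that root must be an ancestor of node. The sketch below walks governor links upward under that assumption. What each path element contains is not stated on this page; the sketch records one relation label per arc crossed, which is an assumption, as are the class name, the method name, and the map-based representation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: a node-to-root walk over hypothetical index maps. */
final class DependencyPathSketch {
    /**
     * Walks governor links from node up to root, collecting the label of each arc crossed.
     * Assumes the governor links form a tree, so the walk cannot loop.
     */
    static List<String> pathToRoot(Map<Integer, Integer> governorOf,
                                   Map<Integer, String> relationOf,
                                   int node, int root) {
        List<String> path = new ArrayList<>();
        int current = node;
        while (current != root) {
            Integer gov = governorOf.get(current);
            if (gov == null) {
                throw new IllegalArgumentException(
                    "root " + root + " is not an ancestor of " + node);
            }
            path.add(relationOf.get(current));
            current = gov;
        }
        return path;
    }

    public static void main(String[] args) {
        // "Sam eats red fish": eats (2) governs Sam (1) and fish (4); fish governs red (3).
        Map<Integer, Integer> governorOf = new HashMap<>();
        governorOf.put(1, 2);
        governorOf.put(4, 2);
        governorOf.put(3, 4);
        Map<Integer, String> relationOf = new HashMap<>();
        relationOf.put(1, "nsubj");
        relationOf.put(4, "dobj");
        relationOf.put(3, "amod");
        System.out.println(pathToRoot(governorOf, relationOf, 3, 2)); // prints [amod, dobj]
    }
}
```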
public static boolean isConnected(java.util.Collection<TypedDependency> list)
list - a list of typedDependencies
public static java.util.Collection<TypedDependency> getRoots(java.util.Collection<TypedDependency> list)
list - The list of TypedDependencies to check
public static void printDependencies(GrammaticalStructure gs, java.util.Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep)
deps - Typed dependencies to print
tree - Tree corresponding to typed dependencies (only necessary if conllx == true)
conllx - If true use conllx format, otherwise use Stanford representation
extraSep - If true, in the Stanford representation, the extra dependencies (which do not preserve the tree structure) are printed after the basic dependencies
public static java.lang.String dependenciesToString(GrammaticalStructure gs, java.util.Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep)
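The Stanford representation writes each dependency as relation(governor, dependent), and with extraSep the extra dependencies follow the basic ones. The sketch below renders a small list that way. It is an illustration only: the Dep class, its extra flag, the word-index token strings, and the decision to emit the ====== separator only when extras exist are assumptions, not the behavior of printDependencies or dependenciesToString.

```java
import java.util.List;

/** Sketch only: renders hypothetical dependencies in relation(governor, dependent) form. */
final class PrintSketch {
    static final class Dep {
        final String reln;
        final String gov;
        final String dep;
        final boolean extra; // true for arcs that do not preserve the tree structure
        Dep(String reln, String gov, String dep, boolean extra) {
            this.reln = reln;
            this.gov = gov;
            this.dep = dep;
            this.extra = extra;
        }
    }

    /** One dependency per line; with extraSep, extras follow a "======" separator line. */
    static String render(List<Dep> deps, boolean extraSep) {
        StringBuilder basic = new StringBuilder();
        StringBuilder extras = new StringBuilder();
        for (Dep d : deps) {
            StringBuilder target = (extraSep && d.extra) ? extras : basic;
            target.append(d.reln).append('(').append(d.gov).append(", ")
                  .append(d.dep).append(")\n");
        }
        if (extraSep && extras.length() > 0) {
            basic.append("======\n").append(extras);
        }
        return basic.toString();
    }

    public static void main(String[] args) {
        // "Sam wants to swim": the controlled subject of "swim" is an extra arc.
        List<Dep> deps = List.of(
            new Dep("nsubj", "wants-2", "Sam-1", false),
            new Dep("xcomp", "wants-2", "swim-4", false),
            new Dep("nsubj", "swim-4", "Sam-1", true));
        System.out.print(render(deps, true));
    }
}
```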
public static java.util.List<GrammaticalStructure> readCoNLLXGrammaticalStructureCollection(java.lang.String fileName, java.util.Map<java.lang.String,GrammaticalRelation> shortNameToGRel, GrammaticalStructureFromDependenciesFactory factory) throws java.io.IOException
Throws: java.io.IOException
public static GrammaticalStructure buildCoNLLXGrammaticalStructure(java.util.List<java.util.List<java.lang.String>> tokenFields, java.util.Map<java.lang.String,GrammaticalRelation> shortNameToGRel, GrammaticalStructureFromDependenciesFactory factory)
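The tokenFields parameter has type List<List<String>>, one inner list of fields per token. The sketch below shows one way such lists could be produced from CoNLL-X text, where rows are tab-separated and a blank line ends a sentence. It is not the code behind readCoNLLXGrammaticalStructureCollection; the class and method names are hypothetical, and the input is passed as a list of lines to keep the example free of file handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: groups CoNLL-X rows into per-sentence token field lists. */
final class ConllxReaderSketch {
    /** Returns one List<List<String>> per sentence; a blank line ends a sentence. */
    static List<List<List<String>>> groupRows(List<String> lines) {
        List<List<List<String>>> sentences = new ArrayList<>();
        List<List<String>> current = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(Arrays.asList(line.split("\t")));
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current); // the input may lack a trailing blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "1\tSam\t_\tNNP\tNNP\t_\t2\tnsubj\t_\t_",
            "2\tsleeps\t_\tVBZ\tVBZ\t_\t0\troot\t_\t_",
            "",
            "1\tGo\t_\tVB\tVB\t_\t0\troot\t_\t_");
        List<List<List<String>>> sentences = groupRows(lines);
        // prints 2 sentences, first has 2 tokens
        System.out.println(sentences.size() + " sentences, first has "
            + sentences.get(0).size() + " tokens");
    }
}
```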
public static void main(java.lang.String[] args)
By default, the method outputs the collapsed typed dependencies with processing of conjuncts. The input can be given as plain text (one sentence per line) using the option -sentFile, or as trees using the option -treeFile. For -sentFile, the input has to be strictly one sentence per line. You can specify where to find a parser with -parserFile serializedParserPath. See LexicalizedParser for more flexible processing of text files (including with Stanford Dependencies output).
The above options assume a file as input. You can also feed trees (only) via stdin by using the option -filter.
If one does not specify a -parserFile, one can specify which language pack to use with -tLPP. This option specifies a class which determines which GrammaticalStructure to use, which HeadFinder to use, etc. It will default to edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams, but any TreebankLangParserParams can be specified.
If the only way of producing trees is to use the LexicalizedParser and no parser is specified, the default parser, the English parser, is used. You can specify options to load with the parser using the -parserOpts flag. If the default parser is used, and no options are provided, the option -retainTmpSubcategories is used.
The following options can be used to specify the types of dependencies wanted:
The -conllx option will output the dependencies in the CoNLL format, instead of in the standard Stanford format (relation(governor,dependent)), and will retain punctuation by default. When used in the "collapsed" format, words such as prepositions and conjunctions, which get collapsed into the grammatical relations and are no longer part of the sentence per se, will be annotated with "erased" as grammatical relation and attached to the fake "ROOT" node with index 0.
There is also an option to retain dependencies involving punctuation: -keepPunct
The -extraSep option used with -nonCollapsed will print the basic dependencies first, then a separator ======, and then the extra dependencies that do not preserve the tree structure.
The -test option is used for debugging: it prints the grammatical structure, as well as the basic, collapsed and CCprocessed dependencies. It also checks the connectivity of the collapsed dependencies. If the collapsed dependencies list doesn't constitute a connected graph, it prints the possible offending nodes (one of them is the real root of the graph).
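The connectivity check and the "possible offending nodes" can be pictured with a small sketch: collect every governor that never appears as a dependent, and treat the list as connected when there is at most one such governor. This is one simple approximation, stated as an assumption rather than as the algorithm behind isConnected or getRoots; the Arc class and method names are hypothetical, and the sketch would not notice a cycle detached from the rest of the graph.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: root candidates and a simple connectivity test over index arcs. */
final class ConnectivitySketch {
    static final class Arc {
        final int gov;
        final int dep;
        Arc(int gov, int dep) { this.gov = gov; this.dep = dep; }
    }

    /** Governors that never occur as a dependent: the candidate roots. */
    static Set<Integer> rootCandidates(List<Arc> arcs) {
        Set<Integer> dependents = new HashSet<>();
        for (Arc a : arcs) {
            dependents.add(a.dep);
        }
        Set<Integer> roots = new LinkedHashSet<>();
        for (Arc a : arcs) {
            if (!dependents.contains(a.gov)) {
                roots.add(a.gov);
            }
        }
        return roots;
    }

    /** Treats the arcs as connected when they hang off at most one root candidate. */
    static boolean looksConnected(List<Arc> arcs) {
        return rootCandidates(arcs).size() <= 1;
    }

    public static void main(String[] args) {
        List<Arc> whole = List.of(new Arc(2, 1), new Arc(2, 3));
        List<Arc> broken = List.of(new Arc(2, 1), new Arc(5, 4));
        System.out.println(looksConnected(whole) + " " + rootCandidates(whole));   // prints true [2]
        System.out.println(looksConnected(broken) + " " + rootCandidates(broken)); // prints false [2, 5]
    }
}
```

In the broken example both 2 and 5 are reported, and one of them would be the real root of the graph.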
Using the -conllxFile option, you can pass a file containing Stanford dependencies in the CoNLL format (e.g., the basic dependencies), and obtain another representation using one of the representation options.
Usage:
java edu.stanford.nlp.trees.GrammaticalStructure [-treeFile FILE | -sentFile FILE | -conllxFile FILE | -filter]
[-collapsed -basic -CCprocessed -test]
args - Command-line arguments, as above