public abstract class GrammaticalStructure extends Object implements Serializable
A GrammaticalStructure stores dependency relations between nodes in a tree. A new GrammaticalStructure is constructed from an existing parse tree with the help of GrammaticalRelation, which defines a hierarchy of grammatical relations, along with patterns for identifying them in parse trees. The constructor for GrammaticalStructure uses these definitions to populate the new GrammaticalStructure with as many labeled grammatical relations as it can. Once constructed, the new GrammaticalStructure can be printed in various formats, or interrogated using the interface methods in this class. Internally, this uses a representation via a TreeGraphNode, that is, a tree with additional labeled arcs between nodes, for representing the grammatical relations in a parse tree.
Caveat emptor! This is a work in progress. Nothing in here should be relied upon to function perfectly. Feedback welcome.
See also: EnglishGrammaticalRelations, GrammaticalRelation, EnglishGrammaticalStructure, Serialized Form

Modifier and Type | Field and Description |
---|---|
protected List<TypedDependency> | allTypedDependencies |
static int | CoNLLX_FieldCount |
static int | CoNLLX_GovField |
static int | CoNLLX_POSField |
static int | CoNLLX_RelnField |
static int | CoNLLX_WordField |
static String | DEFAULT_PARSER_FILE |
protected java.util.function.Predicate<String> | puncFilter |
protected TreeGraphNode | root: The root Tree node for this GrammaticalStructure. |
protected List<TypedDependency> | typedDependencies |
Constructor and Description |
---|
GrammaticalStructure(List<TypedDependency> projectiveDependencies, TreeGraphNode root) |
GrammaticalStructure(Tree t, Collection<GrammaticalRelation> relations, HeadFinder hf, java.util.function.Predicate<String> puncFilter) |
GrammaticalStructure(Tree t, Collection<GrammaticalRelation> relations, Lock relationsLock, HeadFinder hf, java.util.function.Predicate<String> puncFilter): Create a new GrammaticalStructure, analyzing the parse tree and populating the GrammaticalStructure with as many labeled grammatical relation arcs as possible. |
Modifier and Type | Method and Description |
---|---|
Collection<TypedDependency> | allTypedDependencies(): Returns all the typed dependencies of this grammatical structure. |
static GrammaticalStructure | buildCoNLLXGrammaticalStructure(List<List<String>> tokenFields, Map<String,GrammaticalRelation> shortNameToGRel, GrammaticalStructureFromDependenciesFactory factory) |
protected void | collapseDependencies(List<TypedDependency> list, boolean CCprocess, boolean includeExtras): Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies. |
protected void | collapseDependenciesTree(List<TypedDependency> list): Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies while keeping a tree structure. |
protected void | correctDependencies(Collection<TypedDependency> list): Destructively modify the TypedDependencyGraph to correct language-dependent dependencies. |
static String | dependenciesToString(GrammaticalStructure gs, Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep) |
protected java.util.function.Predicate<TypedDependency> | extraTreeDepFilter(): Returns a Filter which checks dependencies for usefulness as extra tree-based dependencies. |
static GrammaticalStructure | fromStringReps(List<String> tokens, List<String> posTags, List<String> deps): Create a grammatical structure from its string representation. |
protected void | getExtras(List<TypedDependency> basicDep): Get extra dependencies that do not depend on the tree structure, but rather only depend on the existing dependency structure. |
GrammaticalRelation | getGrammaticalRelation(IndexedWord gov, IndexedWord dep): Get the GrammaticalRelation between gov and dep, or null if gov is not the governor of dep. |
GrammaticalRelation | getGrammaticalRelation(int govIndex, int depIndex): Get the GrammaticalRelation between gov and dep, or null if gov is not the governor of dep. |
static Collection<TypedDependency> | getRoots(Collection<TypedDependency> list): Return a list of TypedDependencies which are not dependent on any node from the list. |
static boolean | isConnected(Collection<TypedDependency> list): Checks whether all the typed dependencies are connected. |
static void | main(String[] args): Given sentences or trees, output the typed dependencies. |
protected void | postProcessDependencies(List<TypedDependency> basicDep): Post-process the dependencies in whatever way this language requires. |
static void | printDependencies(GrammaticalStructure gs, Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep): Print typed dependencies in either the Stanford dependency representation or in the conllx format. |
static List<GrammaticalStructure> | readCoNLLXGrammaticalStructureCollection(String fileName, Map<String,GrammaticalRelation> shortNameToGRel, GrammaticalStructureFromDependenciesFactory factory): Read in a file containing a CoNLL-X dependency treebank and return a corresponding list of GrammaticalStructures. |
TreeGraphNode | root(): Return the root Tree of this GrammaticalStructure. |
String | toString() |
Collection<TypedDependency> | typedDependencies(): Returns the typed dependencies of this grammatical structure. |
List<TypedDependency> | typedDependencies(boolean includeExtras): Returns the typed dependencies of this grammatical structure. |
List<TypedDependency> | typedDependenciesCCprocessed(): Get a list of the typed dependencies, including extras like control dependencies, collapsing them and distributing relations across coordination. |
List<TypedDependency> | typedDependenciesCCprocessed(boolean includeExtras): Get the typed dependencies after collapsing them and processing any CC complements. |
Collection<TypedDependency> | typedDependenciesCollapsed(): Get the typed dependencies after collapsing them. |
List<TypedDependency> | typedDependenciesCollapsed(boolean includeExtras): Get the typed dependencies after collapsing them. |
Collection<TypedDependency> | typedDependenciesCollapsedTree(): Get the typed dependencies after mostly collapsing them, but keep a tree structure. |
protected final List<TypedDependency> typedDependencies
protected final List<TypedDependency> allTypedDependencies
protected final java.util.function.Predicate<String> puncFilter
protected final TreeGraphNode root
public static final String DEFAULT_PARSER_FILE
public static final int CoNLLX_WordField
public static final int CoNLLX_POSField
public static final int CoNLLX_GovField
public static final int CoNLLX_RelnField
public static final int CoNLLX_FieldCount
public GrammaticalStructure(Tree t, Collection<GrammaticalRelation> relations, Lock relationsLock, HeadFinder hf, java.util.function.Predicate<String> puncFilter)
t - A Tree to analyze
relations - A set of GrammaticalRelations to consider
relationsLock - Something needed to make this thread-safe
hf - A HeadFinder for analysis
puncFilter - A Filter to reject punctuation. To delete punctuation dependencies, this filter should return false on punctuation word strings, and true otherwise. If punctuation dependencies should be kept, you should pass in a Filters.<String>acceptFilter().
public GrammaticalStructure(List<TypedDependency> projectiveDependencies, TreeGraphNode root)
public GrammaticalStructure(Tree t, Collection<GrammaticalRelation> relations, HeadFinder hf, java.util.function.Predicate<String> puncFilter)
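The puncFilter contract described for the constructor above (return false on punctuation word strings, true otherwise) can be illustrated with plain java.util.function.Predicate values. This is only a sketch: the class name PuncFilterSketch, the constants, and the regular expression are assumptions made for illustration, and the naive pattern does not cover treebank tokens such as -LRB-.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PuncFilterSketch {
    // Hypothetical filter: returns false for strings made up only of
    // ASCII punctuation characters, and true for everything else.
    static final Predicate<String> REJECT_PUNCTUATION =
        word -> !word.matches("\\p{Punct}+");

    // Accept-everything filter, for when punctuation dependencies should be kept.
    static final Predicate<String> ACCEPT_ALL = word -> true;

    // Keep only the words that the given filter accepts.
    static List<String> keep(List<String> words, Predicate<String> filter) {
        return words.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("Hello", ",", "world", "!");
        System.out.println(keep(words, REJECT_PUNCTUATION)); // [Hello, world]
        System.out.println(keep(words, ACCEPT_ALL));         // [Hello, ,, world, !]
    }
}
```

A predicate of the first kind would drop punctuation dependencies; one of the second kind plays the role of the accept filter mentioned in the parameter description.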
public TreeGraphNode root()
public static GrammaticalStructure fromStringReps(List<String> tokens, List<String> posTags, List<String> deps)
tokens -
posTags -
deps -
protected java.util.function.Predicate<TypedDependency> extraTreeDepFilter()
protected void postProcessDependencies(List<TypedDependency> basicDep)
protected void getExtras(List<TypedDependency> basicDep)
public GrammaticalRelation getGrammaticalRelation(int govIndex, int depIndex)
public GrammaticalRelation getGrammaticalRelation(IndexedWord gov, IndexedWord dep)
public Collection<TypedDependency> typedDependencies()
public Collection<TypedDependency> allTypedDependencies()
public List<TypedDependency> typedDependencies(boolean includeExtras)
includeExtras - If true, the list of typed dependencies returned may include "extras", and does not follow a tree structure.
public Collection<TypedDependency> typedDependenciesCollapsed()
public Collection<TypedDependency> typedDependenciesCollapsedTree()
public List<TypedDependency> typedDependenciesCollapsed(boolean includeExtras)
Get the typed dependencies after collapsing them.
includeExtras - If true, the list of typed dependencies returned may include "extras", like controlling subjects.
public List<TypedDependency> typedDependenciesCCprocessed(boolean includeExtras)
Get the typed dependencies after collapsing them and processing any CC complements. The "CCPropagated" option corresponds to calling this method with an argument of true.
includeExtras - If true, the list of typed dependencies returned may include "extras", such as controlled subject links.
public List<TypedDependency> typedDependenciesCCprocessed()
protected void collapseDependencies(List<TypedDependency> list, boolean CCprocess, boolean includeExtras)
Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies. Default is no-op; to be overridden in subclasses.
list - A list of dependencies to process for possible collapsing
CCprocess - apply CC process?
protected void collapseDependenciesTree(List<TypedDependency> list)
Destructively modify the Collection<TypedDependency> to collapse language-dependent transitive dependencies while keeping a tree structure. Default is no-op; to be overridden in subclasses.
list - A list of dependencies to process for possible collapsing
protected void correctDependencies(Collection<TypedDependency> list)
Destructively modify the TypedDependencyGraph to correct language-dependent dependencies (e.g., nsubjpass in a relative clause). Default is no-op; to be overridden in subclasses.
public static boolean isConnected(Collection<TypedDependency> list)
list - a list of typedDependencies
public static Collection<TypedDependency> getRoots(Collection<TypedDependency> list)
list - The list of TypedDependencies to check
public static void printDependencies(GrammaticalStructure gs, Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep)
deps - Typed dependencies to print
tree - Tree corresponding to typed dependencies (only necessary if conllx == true)
conllx - If true use conllx format, otherwise use Stanford representation
extraSep - If true, in the Stanford representation, the extra dependencies (which do not preserve the tree structure) are printed after the basic dependencies
public static String dependenciesToString(GrammaticalStructure gs, Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep)
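To make the two output styles selected by the conllx flag more concrete, here is a minimal, self-contained sketch. It does not call the library: the Dep record and both formatting methods are hypothetical stand-ins, and the exact columns the library fills in for CoNLL-X output are an assumption based on the general ten-column CoNLL-X layout.

```java
import java.util.List;

class DependencyFormatSketch {
    // Hypothetical stand-in for a typed dependency; index 0 denotes the fake ROOT node.
    record Dep(String reln, String gov, int govIndex, String dep, int depIndex, String depTag) {}

    // Stanford style: relation(governor-index, dependent-index).
    static String toStanford(Dep d) {
        return d.reln() + "(" + d.gov() + "-" + d.govIndex() + ", "
            + d.dep() + "-" + d.depIndex() + ")";
    }

    // CoNLL-X style: ten tab-separated columns, with "_" for unused columns.
    static String toCoNLLX(Dep d) {
        return String.join("\t",
            String.valueOf(d.depIndex()), d.dep(), "_", d.depTag(), d.depTag(), "_",
            String.valueOf(d.govIndex()), d.reln(), "_", "_");
    }

    public static void main(String[] args) {
        List<Dep> deps = List.of(
            new Dep("det", "dog", 2, "The", 1, "DT"),
            new Dep("nsubj", "sleeps", 3, "dog", 2, "NN"),
            new Dep("root", "ROOT", 0, "sleeps", 3, "VBZ"));
        deps.forEach(d -> System.out.println(toStanford(d)));
        deps.forEach(d -> System.out.println(toCoNLLX(d)));
    }
}
```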
public static List<GrammaticalStructure> readCoNLLXGrammaticalStructureCollection(String fileName, Map<String,GrammaticalRelation> shortNameToGRel, GrammaticalStructureFromDependenciesFactory factory) throws IOException
IOException
public static GrammaticalStructure buildCoNLLXGrammaticalStructure(List<List<String>> tokenFields, Map<String,GrammaticalRelation> shortNameToGRel, GrammaticalStructureFromDependenciesFactory factory)
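The tokenFields argument above is a list of per-token field lists. The sketch below shows, under stated assumptions, how one sentence of a CoNLL-X file could be split into such rows and turned into simple dependency records. The column positions are assumptions derived from the CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), not values read from the CoNLLX_* constants of this class, and the Dep record is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CoNLLXReadSketch {
    // Assumed zero-based column positions in a CoNLL-X line.
    static final int WORD_FIELD = 1;
    static final int POS_FIELD = 4;
    static final int GOV_FIELD = 6;
    static final int RELN_FIELD = 7;
    static final int FIELD_COUNT = 10;

    // Hypothetical stand-in for a typed dependency.
    record Dep(String reln, int govIndex, int depIndex, String word, String tag) {}

    // Split one sentence block (one token per line) into rows of fields.
    static List<List<String>> tokenFields(String sentenceBlock) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : sentenceBlock.split("\n")) {
            if (line.isBlank()) continue;
            List<String> fields = List.of(line.split("\t"));
            if (fields.size() != FIELD_COUNT) {
                throw new IllegalArgumentException("Expected " + FIELD_COUNT + " fields: " + line);
            }
            rows.add(fields);
        }
        return rows;
    }

    // Turn each row into a dependency; the token's own index is its 1-based row number.
    static List<Dep> toDependencies(List<List<String>> rows) {
        List<Dep> deps = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            List<String> f = rows.get(i);
            deps.add(new Dep(f.get(RELN_FIELD), Integer.parseInt(f.get(GOV_FIELD)),
                             i + 1, f.get(WORD_FIELD), f.get(POS_FIELD)));
        }
        return deps;
    }

    public static void main(String[] args) {
        String block = "1\tThe\t_\tDT\tDT\t_\t2\tdet\t_\t_\n"
                     + "2\tdog\t_\tNN\tNN\t_\t3\tnsubj\t_\t_\n"
                     + "3\tsleeps\t_\tVBZ\tVBZ\t_\t0\troot\t_\t_\n";
        toDependencies(tokenFields(block)).forEach(System.out::println);
    }
}
```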
public static void main(String[] args)
By default, the method outputs the collapsed typed dependencies with processing of conjuncts. The input can be given as plain text (one sentence per line) using the option -sentFile, or as trees using the option -treeFile. For -sentFile, the input has to be strictly one sentence per line. You can specify where to find a parser with -parserFile serializedParserPath. See LexicalizedParser for more flexible processing of text files (including with Stanford Dependencies output). The above options assume a file as input. You can also feed trees (only) via stdin by using the option -filter.
If one does not specify a -parserFile, one can specify which language pack to use with -tLPP. This option specifies a class which determines which GrammaticalStructure to use, which HeadFinder to use, etc. It will default to edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams, but any TreebankLangParserParams can be specified.
If the only way to produce trees is the LexicalizedParser and no parser is specified, the default English parser is used. You can specify options to load with the parser using the -parserOpts flag. If the default parser is used, and no options are provided, the option -retainTmpSubcategories is used.
The following options can be used to specify the types of dependencies wanted:
The -conllx option will output the dependencies in the CoNLL format, instead of in the standard Stanford format (relation(governor,dependent)), and will retain punctuation by default. When used in the "collapsed" format, words such as prepositions and conjunctions, which get collapsed into the grammatical relations and are no longer part of the sentence per se, will be annotated with "erased" as their grammatical relation and attached to the fake "ROOT" node with index 0.
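As a toy illustration of what "collapsed into the grammatical relations" means for a preposition, the sketch below folds a prep dependency and the pobj dependency under it into a single prep_<word> relation and records the index of the folded word, which is the kind of word the text above says would be marked "erased". This is not the library's algorithm: the Dep and Result records, the method name, and the restriction to the prep/pobj pattern are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CollapseSketch {
    // Hypothetical stand-in for a typed dependency.
    record Dep(String reln, String gov, int govIndex, String dep, int depIndex) {}

    // Collapsed dependencies plus the indices of words folded into relation names.
    record Result(List<Dep> deps, Set<Integer> erased) {}

    // Toy collapse: prep(g, p) + pobj(p, o) becomes prep_<p>(g, o).
    static Result collapsePrepositions(List<Dep> basic) {
        // Map each governor index of a pobj dependency to that dependency.
        Map<Integer, Dep> pobjByPrep = new HashMap<>();
        for (Dep d : basic) {
            if (d.reln().equals("pobj")) pobjByPrep.put(d.govIndex(), d);
        }
        // A preposition is folded only if it is a prep dependent with a pobj below it.
        Set<Integer> erased = new TreeSet<>();
        for (Dep d : basic) {
            if (d.reln().equals("prep") && pobjByPrep.containsKey(d.depIndex())) {
                erased.add(d.depIndex());
            }
        }
        List<Dep> out = new ArrayList<>();
        for (Dep d : basic) {
            if (d.reln().equals("prep") && erased.contains(d.depIndex())) {
                Dep obj = pobjByPrep.get(d.depIndex());
                out.add(new Dep("prep_" + d.dep().toLowerCase(),
                                d.gov(), d.govIndex(), obj.dep(), obj.depIndex()));
            } else if (d.reln().equals("pobj") && erased.contains(d.govIndex())) {
                continue; // already folded into the collapsed relation
            } else {
                out.add(d);
            }
        }
        return new Result(out, erased);
    }

    public static void main(String[] args) {
        List<Dep> basic = List.of(
            new Dep("nsubj", "sleep", 2, "Dogs", 1),
            new Dep("prep", "sleep", 2, "in", 3),
            new Dep("pobj", "in", 3, "beds", 4));
        Result r = collapsePrepositions(basic);
        r.deps().forEach(System.out::println);
        System.out.println("erased word indices: " + r.erased());
    }
}
```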
There is also an option to retain dependencies involving punctuation: -keepPunct.
The -extraSep option used with -nonCollapsed will print the basic dependencies first, then a separator ======, and then the extra dependencies that do not preserve the tree structure.
The -test option is used for debugging: it prints the grammatical structure, as well as the basic, collapsed and CCprocessed dependencies. It also checks the connectivity of the collapsed dependencies. If the collapsed dependencies list doesn't constitute a connected graph, it prints the possible offending nodes (one of them is the real root of the graph).
Using the -conllxFile option, you can pass a file containing Stanford dependencies in the CoNLL format (e.g., the basic dependencies), and obtain another representation using one of the representation options.
Usage:
java edu.stanford.nlp.trees.GrammaticalStructure [-treeFile FILE | -sentFile FILE | -conllxFile FILE | -filter]
[-collapsed -basic -CCprocessed -test]
args - Command-line arguments, as above
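The connectivity check that -test performs, and the notion of root dependencies used by getRoots and isConnected, can be sketched without the library as follows. This is one possible approach under stated assumptions (dependencies treated as undirected edges and checked with a breadth-first search); it is not necessarily how the class implements it, and the Dep record and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ConnectivitySketch {
    // Hypothetical stand-in for a typed dependency, identified by word indices.
    record Dep(String reln, int govIndex, int depIndex) {}

    // Dependencies whose governor is not the dependent of any dependency in the list.
    static List<Dep> roots(Collection<Dep> deps) {
        Set<Integer> dependents = new HashSet<>();
        for (Dep d : deps) dependents.add(d.depIndex());
        List<Dep> roots = new ArrayList<>();
        for (Dep d : deps) {
            if (!dependents.contains(d.govIndex())) roots.add(d);
        }
        return roots;
    }

    // True if every word can be reached from every other word via dependency arcs,
    // ignoring arc direction.
    static boolean isConnected(Collection<Dep> deps) {
        Map<Integer, Set<Integer>> adjacent = new HashMap<>();
        for (Dep d : deps) {
            adjacent.computeIfAbsent(d.govIndex(), k -> new HashSet<>()).add(d.depIndex());
            adjacent.computeIfAbsent(d.depIndex(), k -> new HashSet<>()).add(d.govIndex());
        }
        if (adjacent.isEmpty()) return true;
        Integer start = adjacent.keySet().iterator().next();
        Set<Integer> seen = new HashSet<>(Set.of(start));
        Deque<Integer> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            for (Integer next : adjacent.get(queue.poll())) {
                if (seen.add(next)) queue.add(next);
            }
        }
        return seen.size() == adjacent.size();
    }

    public static void main(String[] args) {
        List<Dep> connected = List.of(new Dep("nsubj", 2, 1), new Dep("dobj", 2, 3));
        List<Dep> broken = List.of(new Dep("nsubj", 2, 1), new Dep("det", 4, 3));
        System.out.println(isConnected(connected)); // true
        System.out.println(isConnected(broken));    // false
        System.out.println(roots(broken));          // candidate offending dependencies
    }
}
```

In the disconnected case, the governors of the returned root dependencies are the candidate offending nodes, in the spirit of the -test description.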