public abstract class SemgrexPattern extends Object implements Serializable
A SemgrexPattern is a tgrep-type pattern for matching node configurations in one of the SemanticGraph structures. Unlike tgrep, but like Unix grep, there is no pre-indexing of the data to be searched; rather, there is a linear scan through the graph where matches are sought.
SemgrexPattern instances can be matched against instances of the IndexedWord class.
A node is represented by a set of attributes and their values enclosed in curly braces: {attr1:value1;attr2:value2;...}. Therefore, {} represents any node in the graph. Attributes must be plain strings; values can be strings or regular expressions delimited by "/". (I think regular expressions must match the whole attribute value, so that /NN/ matches "NN" only, while /NN.*/ matches "NN", "NNS", "NNP", etc. --wcmac)
For example, {lemma:slice;tag:/VB.*/} represents any verb node with "slice" as its lemma. Attributes are extracted using edu.stanford.nlp.ling.AnnotationLookup.
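To make the node syntax concrete, here is a minimal Java sketch of how a description such as {lemma:slice;tag:/VB.*/} could be parsed and tested against a node. It is not CoreNLP's parser: the class NodeSpec is hypothetical, a node is modelled as a plain Map of attribute names to values instead of an IndexedWord, and it assumes (following the note above) that /regex/ values must match the whole value, which in Java is Matcher.matches() rather than find().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical toy model of a semgrex node description such as
// {lemma:slice;tag:/VB.*/}. Not the CoreNLP parser: no escaping (a ';'
// inside a regex breaks it), and {$} (the root marker) is not handled.
class NodeSpec {
    private final Map<String, Predicate<String>> constraints = new LinkedHashMap<>();

    static NodeSpec parse(String spec) {
        String body = spec.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("expected {...}: " + spec);
        }
        body = body.substring(1, body.length() - 1).trim();
        NodeSpec result = new NodeSpec();
        if (body.isEmpty()) {
            return result; // {} has no constraints, so it matches any node
        }
        for (String pair : body.split(";")) {
            int colon = pair.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected attr:value in " + pair);
            }
            String attr = pair.substring(0, colon).trim();
            String value = pair.substring(colon + 1).trim();
            Predicate<String> test;
            if (value.length() >= 2 && value.startsWith("/") && value.endsWith("/")) {
                // /regex/ value: assumed to require a whole-value match
                Pattern p = Pattern.compile(value.substring(1, value.length() - 1));
                test = v -> p.matcher(v).matches();
            } else {
                test = value::equals; // plain string value: exact match
            }
            result.constraints.put(attr, test);
        }
        return result;
    }

    // A node matches if every listed attribute is present and passes its test.
    boolean matches(Map<String, String> node) {
        for (Map.Entry<String, Predicate<String>> c : constraints.entrySet()) {
            String v = node.get(c.getKey());
            if (v == null || !c.getValue().test(v)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NodeSpec spec = NodeSpec.parse("{lemma:slice;tag:/VB.*/}");
        System.out.println(spec.matches(Map.of("lemma", "slice", "tag", "VBD"))); // true
        System.out.println(spec.matches(Map.of("lemma", "slice", "tag", "NN")));  // false
        System.out.println(NodeSpec.parse("{tag:/NN/}").matches(Map.of("tag", "NNS"))); // false
    }
}
```

Treating every attribute as a predicate keeps {} trivially true and makes each extra attribute one more conjunct.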
The root of the graph can be marked by the $ sign; that is, {$} represents the root node.
Relations are defined by a symbol representing the type of relationship and a string or regular expression representing the value of the relationship. A relationship string of % means any relationship. It is also OK simply to omit the relationship string altogether, as in {} > {}.
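The rule for the relation string can be summarised as a small predicate. This is a hypothetical helper, not a CoreNLP method: it models an omitted string as null or empty, and it assumes that /regex/ relation strings, like node attribute values, must match the whole relation name.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not CoreNLP) for the relation part of ">reln" / "<reln".
// "%" or an omitted string accepts any relation; "/.../" is a regular
// expression (whole-name match is an assumption); anything else must be equal.
class RelnSpec {
    static boolean relnMatches(String relnSpec, String actualReln) {
        if (relnSpec == null || relnSpec.isEmpty() || relnSpec.equals("%")) {
            return true;
        }
        if (relnSpec.length() >= 2 && relnSpec.startsWith("/") && relnSpec.endsWith("/")) {
            String regex = relnSpec.substring(1, relnSpec.length() - 1);
            return Pattern.compile(regex).matcher(actualReln).matches();
        }
        return relnSpec.equals(actualReln);
    }

    public static void main(String[] args) {
        System.out.println(relnMatches("%", "nsubj"));             // true: any relation
        System.out.println(relnMatches("", "dobj"));               // true: omitted string
        System.out.println(relnMatches("/nsubj.*/", "nsubjpass")); // true: regex
        System.out.println(relnMatches("nsubj", "nsubjpass"));     // false: exact string
    }
}
```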
Currently supported node relations and their symbols:
Symbol | Meaning |
---|---|
A <reln B | A is the dependent of a relation reln with B |
A >reln B | A is the governor of a relation reln with B |
A <<reln B | A is the dependent of a relation reln in a chain to B following dep->gov paths |
A >>reln B | A is the governor of a relation reln in a chain to B following gov->dep paths |
A x,y<<reln B | A is the dependent of a relation reln in a chain to B following dep->gov paths between distances of x and y |
A x,y>>reln B | A is the governor of a relation reln in a chain to B following gov->dep paths between distances of x and y |
A == B | A and B are the same nodes in the same graph |
A @ B | A is aligned to B |
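Several rows of the table can be sketched on a toy graph. Everything here is hypothetical: ToyRelations and its Edge record stand in for a SemanticGraph, nodes are bare words, the graph is assumed to be a tree (so each node has exactly one gov->dep path from an ancestor), the bounded chain is checked only with an unconstrained (%) relation type, and == and @ are not modelled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy dependency graph for illustration only (not CoreNLP's SemanticGraph).
// Nodes are plain words, and the graph is assumed to be a tree.
class ToyRelations {
    record Edge(String gov, String reln, String dep) {}

    // "Bill ate blueberry muffins"
    static final List<Edge> EDGES = List.of(
            new Edge("ate", "nsubj", "Bill"),
            new Edge("ate", "dobj", "muffins"),
            new Edge("muffins", "compound", "blueberry"));

    // A >reln B: A governs B through an edge labelled reln.
    static boolean governs(String a, String reln, String b) {
        return EDGES.contains(new Edge(a, reln, b));
    }

    // A <reln B: A depends on B through an edge labelled reln.
    static boolean dependentOf(String a, String reln, String b) {
        return governs(b, reln, a);
    }

    // A x,y>> B with any relation: B lies between x and y gov->dep steps
    // below A (1 <= x <= y). Breadth-first search records each node's depth.
    static boolean descendantWithin(String a, String b, int x, int y) {
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(a, 0);
        queue.add(a);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            int d = depth.get(cur);
            if (d >= y) {
                continue; // anything deeper is out of range
            }
            for (Edge e : EDGES) {
                if (e.gov().equals(cur) && !depth.containsKey(e.dep())) {
                    depth.put(e.dep(), d + 1);
                    queue.add(e.dep());
                }
            }
        }
        Integer dist = depth.get(b);
        return dist != null && dist >= x && dist <= y;
    }

    public static void main(String[] args) {
        System.out.println(governs("ate", "nsubj", "Bill"));                 // true
        System.out.println(dependentOf("blueberry", "compound", "muffins")); // true
        System.out.println(descendantWithin("ate", "blueberry", 2, 2));      // true
        System.out.println(descendantWithin("ate", "blueberry", 1, 1));      // false
    }
}
```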
Several relations can be attached to the same node. For example, "{} >nsubj {} >dobj {}" means "any node that is the governor of both a nsubj and a dobj relation". If instead what you want is a node that is the governor of a nsubj relation with a node that is itself the governor of a dobj relation, you should write "{} >nsubj ({} >dobj {})".
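The difference between the two patterns can be checked by brute force on a hand-made graph. This sketch is hypothetical (the class, the n1..n6 node names, and the edge list are all invented); it only enumerates which nodes each pattern would bind at its top level.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy contrast (not CoreNLP) between "{} >nsubj {} >dobj {}" and
// "{} >nsubj ({} >dobj {})" over an invented graph.
class ChainingDemo {
    record Edge(String gov, String reln, String dep) {}

    static final List<Edge> EDGES = List.of(
            new Edge("n1", "nsubj", "n2"), new Edge("n1", "dobj", "n3"),
            new Edge("n4", "nsubj", "n5"), new Edge("n5", "dobj", "n6"));

    static boolean hasDep(String gov, String reln) {
        return EDGES.stream().anyMatch(e -> e.gov().equals(gov) && e.reln().equals(reln));
    }

    // Both relations hang off the same top node.
    static Set<String> siblings() {
        Set<String> out = new TreeSet<>();
        for (Edge e : EDGES) {
            if (hasDep(e.gov(), "nsubj") && hasDep(e.gov(), "dobj")) {
                out.add(e.gov());
            }
        }
        return out;
    }

    // The nsubj dependent must itself govern a dobj.
    static Set<String> nested() {
        Set<String> out = new TreeSet<>();
        for (Edge e : EDGES) {
            if (e.reln().equals("nsubj") && hasDep(e.dep(), "dobj")) {
                out.add(e.gov());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(siblings()); // [n1]
        System.out.println(nested());   // [n4]
    }
}
```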
If a relation type is specified for the << relation, the
relation type is only used for the first relation in the sequence.
Therefore, if B depends on A with the relation type foo, the
pattern {} <<foo {}
will then match B and
everything that depends on B.
Similarly, if a relation type is specified for the >>
relation, the relation type is only used for the last relation in
the sequence. Therefore, if A governs B with the relation type
foo, the pattern {} >>foo {}
will then match A
and all of the nodes which have a sequence leading to A.
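The two examples above can be restated as a small Java sketch that, for every foo edge in a toy graph, lists the nodes that could fill the left-hand side of {} >>foo {} (the edge's governor plus its ancestors) and of {} <<foo {} (the edge's dependent plus its descendants). It encodes only what the text states; the class and graph are hypothetical, and the right-hand node is not tracked.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical, not CoreNLP) of typed chain relations as
// described above: only the foo edge at one end of the chain is checked.
class ChainTypeDemo {
    record Edge(String gov, String reln, String dep) {}

    // r -dep-> A -foo-> B -dep-> C
    static final List<Edge> EDGES = List.of(
            new Edge("r", "dep", "A"), new Edge("A", "foo", "B"), new Edge("B", "dep", "C"));

    // Adds node and every node with a gov->dep path down to it.
    static void addWithAncestors(String node, Set<String> out) {
        if (!out.add(node)) return; // already visited
        for (Edge e : EDGES) if (e.dep().equals(node)) addWithAncestors(e.gov(), out);
    }

    // Adds node and every node reachable from it along gov->dep edges.
    static void addWithDescendants(String node, Set<String> out) {
        if (!out.add(node)) return; // already visited
        for (Edge e : EDGES) if (e.gov().equals(node)) addWithDescendants(e.dep(), out);
    }

    // Left-hand nodes of {} >>reln {}: each reln edge's governor and its ancestors.
    static Set<String> leftOfDescendant(String reln) {
        Set<String> out = new TreeSet<>();
        for (Edge e : EDGES) if (e.reln().equals(reln)) addWithAncestors(e.gov(), out);
        return out;
    }

    // Left-hand nodes of {} <<reln {}: each reln edge's dependent and its descendants.
    static Set<String> leftOfAncestor(String reln) {
        Set<String> out = new TreeSet<>();
        for (Edge e : EDGES) if (e.reln().equals(reln)) addWithDescendants(e.dep(), out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(leftOfDescendant("foo")); // [A, r]
        System.out.println(leftOfAncestor("foo"));   // [B, C]
    }
}
```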
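Relations can be grouped with square brackets and combined with | (or) and & (and). For example, {} [<subj {} | <agent {}] & @ {} matches a node that is the dependent of either a subj or an agent relation and has an alignment to some other node.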
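One way to read the grouped expression is as boolean composition of per-node tests, which java.util.function.Predicate expresses directly: "|" becomes or() and "&" becomes and(). The Node record below is hypothetical; it reduces a node to the labels of the relations it depends on plus a flag for having an alignment.

```java
import java.util.Set;
import java.util.function.Predicate;

// Toy reading (hypothetical types, not CoreNLP) of
// {} [<subj {} | <agent {}] & @ {} as composed predicates.
class GroupingDemo {
    record Node(String word, Set<String> incomingRelns, boolean aligned) {}

    // <reln {}: the node is the dependent of some relation labelled reln.
    static Predicate<Node> dependentVia(String reln) {
        return n -> n.incomingRelns().contains(reln);
    }

    // @ {}: the node is aligned to some other node.
    static final Predicate<Node> HAS_ALIGNMENT = Node::aligned;

    // [<subj {} | <agent {}] & @ {}
    static final Predicate<Node> PATTERN =
            dependentVia("subj").or(dependentVia("agent")).and(HAS_ALIGNMENT);

    public static void main(String[] args) {
        Node[] nodes = {
            new Node("dog", Set.of("subj"), true),
            new Node("police", Set.of("agent"), true),
            new Node("cat", Set.of("subj"), false),
            new Node("bone", Set.of("obj"), true),
        };
        for (Node n : nodes) {
            System.out.println(n.word() + " -> " + PATTERN.test(n));
        }
    }
}
```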
Relations can be negated with the '!' operator, in which case the expression will match only if there is no node satisfying the relation.
Relations can be made optional with the '?' operator. This way the expression will match even if the optional relation is not satisfied.
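The effect of '!' and '?' on a single relation can be modelled over the list of candidate nodes that relation could reach (for example, all dobj dependents of the current node). This is a hypothetical sketch of the semantics described above, not CoreNLP code; in particular, returning an Optional for '?' is an illustrative choice.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Toy semantics (hypothetical, not CoreNLP) for '!' and '?', applied to the
// candidate nodes that one relation could reach from the current node.
class NegOptDemo {
    // Plain relation: satisfied if some candidate matches the node pattern.
    static boolean required(List<String> candidates, Predicate<String> spec) {
        return candidates.stream().anyMatch(spec);
    }

    // '!' relation: satisfied only if NO candidate matches.
    static boolean negated(List<String> candidates, Predicate<String> spec) {
        return candidates.stream().noneMatch(spec);
    }

    // '?' relation: never makes the match fail; yields a matching node if any.
    static Optional<String> optional(List<String> candidates, Predicate<String> spec) {
        return candidates.stream().filter(spec).findFirst();
    }

    public static void main(String[] args) {
        List<String> dobjDependents = List.of();            // e.g. an intransitive verb
        Predicate<String> any = w -> true;                  // plays the role of {}
        System.out.println(required(dobjDependents, any));  // false
        System.out.println(negated(dobjDependents, any));   // true
        System.out.println(optional(dobjDependents, any));  // Optional.empty
    }
}
```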
The operator ":" partitions a pattern into separate patterns, each of which must be matched. For example, the following is a pattern where the matched node must have both "foo" and "bar" as descendants:
{}=a >> {word:foo} : {}=a >> {word:bar}
This pattern could have been written
{}=a >> {word:foo} >> {word:bar}
However, for more complex examples, partitioning a pattern may make
it more readable.
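Because every part of a ":" pattern must match, and both parts here use the same name =a, the nodes that can be bound to a are those satisfying each part separately, i.e. the intersection of two candidate sets. The tree, the node ids, and the word table below are hypothetical; the sketch only illustrates that reading.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration (hypothetical, not CoreNLP) of
// {}=a >> {word:foo} : {}=a >> {word:bar} as an intersection of bindings.
class PartitionDemo {
    // Invented tree: parent -> children, plus the word carried by some nodes.
    static final Map<String, List<String>> CHILDREN = Map.of(
            "root", List.of("x", "y"),
            "x", List.of("f1", "b1"),
            "y", List.of("f2"));
    static final Map<String, String> WORD = Map.of("f1", "foo", "f2", "foo", "b1", "bar");

    static boolean hasDescendant(String node, String word) {
        for (String c : CHILDREN.getOrDefault(node, List.of())) {
            if (word.equals(WORD.get(c)) || hasDescendant(c, word)) return true;
        }
        return false;
    }

    static Set<String> allNodes() {
        Set<String> all = new TreeSet<>(CHILDREN.keySet());
        CHILDREN.values().forEach(all::addAll);
        return all;
    }

    // Nodes that {}=a >> {word:w} on its own could bind to a.
    static Set<String> bindings(String word) {
        Set<String> out = new TreeSet<>();
        for (String n : allNodes()) {
            if (hasDescendant(n, word)) out.add(n);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> a = bindings("foo"); // [root, x, y]
        a.retainAll(bindings("bar"));    // keep nodes satisfying both parts
        System.out.println(a);           // [root, x]
    }
}
```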
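Nodes can be given names using "=". For example, ({tag:NN}=noun) will match a singular noun node, and after a match is found, the map of named nodes can be queried with the name to retrieve the matched node, using SemgrexMatcher.getNode(String o) with the (String) argument "noun". […] a ParseException to be thrown.
Named nodes can also be reused: when the same name appears more than once, every occurrence must match the same node. For example, the pattern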
{} >dobj ({} > {}=foo) >mod ({} > {}=foo)
will match a graph in which there are two nodes, X and Y, for which X is the grandparent of Y and there are two paths to Y, one of which goes through a dobj and one of which goes through a mod.
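The consistency requirement on a reused name can be sketched with a name-to-node map, similar in spirit to the map that SemgrexMatcher.getNode(String) reads from. Everything below (BackrefDemo, bind, matchAt, the edge list) is hypothetical; it brute-forces the pattern above at one top node and keeps only assignments where both uses of foo land on the same node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy backreference check (hypothetical, not CoreNLP): a second use of a
// name succeeds only if it would bind the same node as the first.
class BackrefDemo {
    record Edge(String gov, String reln, String dep) {}

    // X -dobj-> D -> Y and X -mod-> M -> Y, plus a distractor M -> Z.
    static final List<Edge> EDGES = List.of(
            new Edge("X", "dobj", "D"), new Edge("X", "mod", "M"),
            new Edge("D", "dep", "Y"), new Edge("M", "dep", "Y"),
            new Edge("M", "dep", "Z"));

    // Dependents of gov; reln == null means any relation, as in "{} > {}".
    static List<String> deps(String gov, String reln) {
        List<String> out = new ArrayList<>();
        for (Edge e : EDGES) {
            if (e.gov().equals(gov) && (reln == null || e.reln().equals(reln))) {
                out.add(e.dep());
            }
        }
        return out;
    }

    // Binds name to node, or checks consistency if the name is already bound.
    static boolean bind(Map<String, String> env, String name, String node) {
        String previous = env.putIfAbsent(name, node);
        return previous == null || previous.equals(node);
    }

    // All matches of {} >dobj ({} > {}=foo) >mod ({} > {}=foo) with top node x.
    static List<Map<String, String>> matchAt(String x) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String d : deps(x, "dobj")) {
            for (String y1 : deps(d, null)) {
                for (String m : deps(x, "mod")) {
                    for (String y2 : deps(m, null)) {
                        Map<String, String> env = new HashMap<>();
                        if (bind(env, "foo", y1) && bind(env, "foo", y2)) {
                            results.add(env);
                        }
                    }
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(matchAt("X")); // [{foo=Y}]
    }
}
```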
Relations can also be named, as in {idx:1} >=reln {idx:2}. The name of the relation will then be stored in the matcher and can be extracted with getRelnName("reln").
At present, though, there is no backreferencing capability such as with the
named nodes; this is only useful when using the API to extract the name of the
relation used when making the match.
In the case of ancestor and descendant relations, the last
relation in the sequence of relations is the name used.
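A tiny sketch of the bookkeeping described above: after a match, the label of the edge used is stored under the relation's name, and for an ancestor or descendant relation the last edge of the path supplies it. The class and its recordReln method are hypothetical; the path is assumed to be listed in gov->dep order, and real code would read the value through getRelnName instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy (hypothetical) storage of named relations: name -> relation label,
// taken from the last edge of the path used in the match (gov->dep order).
class NamedRelnDemo {
    record Edge(String gov, String reln, String dep) {}

    static Map<String, String> recordReln(String name, List<Edge> pathUsed) {
        Map<String, String> relnNames = new HashMap<>();
        relnNames.put(name, pathUsed.get(pathUsed.size() - 1).reln());
        return relnNames;
    }

    public static void main(String[] args) {
        // {idx:1} >=reln {idx:2} matched through a single edge
        List<Edge> direct = List.of(new Edge("1", "nsubj", "2"));
        // a descendant match whose path runs through two edges
        List<Edge> chain = List.of(new Edge("1", "dobj", "3"), new Edge("3", "amod", "2"));
        System.out.println(recordReln("reln", direct).get("reln")); // nsubj
        System.out.println(recordReln("reln", chain).get("reln"));  // amod
    }
}
```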
Modifier and Type | Method and Description |
---|---|
static SemgrexPattern | compile(String semgrex): Creates a pattern from the given string. |
boolean | equals(Object o) |
int | hashCode() |
static void | help() |
static void | main(String[] args): Prints out all matches of a semgrex pattern on a file of dependencies. |
SemgrexMatcher | matcher(SemanticGraph sg): Get a SemgrexMatcher for this pattern in this graph. |
SemgrexMatcher | matcher(SemanticGraph hypGraph, Alignment alignment, SemanticGraph txtGraph) |
SemgrexMatcher | matcher(SemanticGraph hypGraph, Alignment alignment, SemanticGraph txtGraph, boolean ignoreCase) |
SemgrexMatcher | matcher(SemanticGraph sg, boolean ignoreCase): Get a SemgrexMatcher for this pattern in this graph. |
SemgrexMatcher | matcher(SemanticGraph sg, Map<String,IndexedWord> variables): Get a SemgrexMatcher for this pattern in this graph, with some initial conditions on the variable assignments. |
String | pattern() |
void | prettyPrint(): Print a multi-line representation of the pattern illustrating its syntax to System.out. |
void | prettyPrint(PrintStream ps): Print a multi-line representation of the pattern illustrating its syntax. |
void | prettyPrint(PrintWriter pw): Print a multi-line representation of the pattern illustrating its syntax. |
abstract String | toString() |
abstract String | toString(boolean hasPrecedence) |
public SemgrexMatcher matcher(SemanticGraph sg)
Get a SemgrexMatcher for this pattern in this graph.
Parameters:
sg - the SemanticGraph to match on

public SemgrexMatcher matcher(SemanticGraph sg, Map<String,IndexedWord> variables)
Get a SemgrexMatcher for this pattern in this graph, with some initial conditions on the variable assignments.

public SemgrexMatcher matcher(SemanticGraph sg, boolean ignoreCase)
Get a SemgrexMatcher for this pattern in this graph.
Parameters:
sg - the SemanticGraph to match on
ignoreCase - will ignore case for matching a pattern with a node; not implemented by Coordination Pattern

public SemgrexMatcher matcher(SemanticGraph hypGraph, Alignment alignment, SemanticGraph txtGraph)

public SemgrexMatcher matcher(SemanticGraph hypGraph, Alignment alignment, SemanticGraph txtGraph, boolean ignoreCase)

public static SemgrexPattern compile(String semgrex)
Creates a pattern from the given string.
Parameters:
semgrex - the pattern string

public String pattern()

public abstract String toString()

public abstract String toString(boolean hasPrecedence)
Parameters:
hasPrecedence - indicates that this pattern has precedence in terms of "order of operations", so there is no need to parenthesize the expression

public void prettyPrint(PrintWriter pw)
Print a multi-line representation of the pattern illustrating its syntax.

public void prettyPrint(PrintStream ps)
Print a multi-line representation of the pattern illustrating its syntax.

public void prettyPrint()
Print a multi-line representation of the pattern illustrating its syntax to System.out.

public static void help()

public static void main(String[] args)
Prints out all matches of a semgrex pattern on a file of dependencies.