edu.stanford.nlp.semgrex
Class SemgrexPattern

java.lang.Object
  extended by edu.stanford.nlp.semgrex.SemgrexPattern
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
CoordinationPattern, NodePattern

public abstract class SemgrexPattern
extends Object
implements Serializable

A SemgrexPattern is a tgrep-type pattern for matching node configurations in one of the SemanticGraph structures. Unlike tgrep but like Unix grep, there is no pre-indexing of the data to be searched. Rather, there is a linear scan through the graph in which matches are sought.

SemgrexPattern instances can be matched against instances of the IndexedWord class.

A node is represented by a set of attributes and their values enclosed in curly braces: {attr1:value1;attr2:value2;...}. Therefore, {} represents any node in the graph. Attributes must be plain strings; values can be strings or regular expressions delimited by "/". Regular expressions must match the whole attribute value, so /NN/ matches "NN" only, while /NN.*/ matches "NN", "NNS", "NNP", etc.

For example, {lemma:slice;tag:/VB.*/} represents any verb node with "slice" as its lemma.
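The whole-value matching described above corresponds to the difference between matches() and find() in java.util.regex. The following is a minimal sketch in plain Java, not Semgrex code: the class RegexWholeValueSketch and the helper matchesWholeValue are hypothetical names, and the sketch assumes that a "/"-delimited value is compared against the entire attribute string.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not part of the Semgrex API.
final class RegexWholeValueSketch {

    // True only if the regex covers the entire attribute value
    // (Matcher.matches(), as opposed to Matcher.find()).
    static boolean matchesWholeValue(String regex, String attributeValue) {
        return Pattern.compile(regex).matcher(attributeValue).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWholeValue("NN", "NN"));    // true
        System.out.println(matchesWholeValue("NN", "NNS"));   // false
        System.out.println(matchesWholeValue("NN.*", "NNS")); // true
    }
}
```

Under this assumption, a tag pattern intended to cover a family of tags needs a trailing wildcard such as .* rather than a bare prefix.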

The root of the graph can be marked by the $ sign, that is {$} represents the root node.

Relations are defined by a symbol representing the type of relationship and a string or regular expression representing the value of the relationship. A relationship string of % means any relationship. It is also OK to omit the relationship string altogether, as in {} > {}.

Currently supported node relations and their symbols:

Symbol            Meaning
A <reln B         A is the dependent of a relation reln with B
A >reln B         A is the governor of a relation reln with B
A <<reln B        A is the dependent of a relation reln in a chain to B following dep->gov paths
A >>reln B        A is the governor of a relation reln in a chain to B following gov->dep paths
A x,y<<reln B     A is the dependent of a relation reln in a chain to B following dep->gov paths between distances of x and y
A x,y>>reln B     A is the governor of a relation reln in a chain to B following gov->dep paths between distances of x and y
A @ B             A is aligned to B
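The distance-bounded chain relations can be pictured with a toy model. The sketch below is plain Java and does not use the Semgrex classes: ChainDistanceSketch, distanceUp, withinChain and toyGraph are hypothetical names. It assumes that each node has at most one governor, that the bounds x and y are inclusive, and it ignores the relation name entirely; none of these points is stated in this documentation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "A x,y<< B" on a toy single-governor graph.
final class ChainDistanceSketch {

    // Number of dep->gov steps from 'start' up to 'ancestor', or -1 if unreachable.
    static int distanceUp(Map<String, String> governorOf, String start, String ancestor) {
        int steps = 0;
        String current = start;
        while (governorOf.containsKey(current)) {
            current = governorOf.get(current);
            steps++;
            if (current.equals(ancestor)) {
                return steps;
            }
            if (steps > governorOf.size()) {
                return -1; // guard against cycles in malformed input
            }
        }
        return -1;
    }

    // True if b is reached from a in at least min and at most max dep->gov steps.
    static boolean withinChain(Map<String, String> governorOf, String a, String b, int min, int max) {
        int d = distanceUp(governorOf, a, b);
        return d >= 0 && d >= min && d <= max;
    }

    // Toy graph: blueberry -> muffins -> ate, Bill -> ate (dependent -> governor).
    static Map<String, String> toyGraph() {
        Map<String, String> g = new HashMap<>();
        g.put("blueberry", "muffins");
        g.put("muffins", "ate");
        g.put("Bill", "ate");
        return g;
    }

    public static void main(String[] args) {
        Map<String, String> g = toyGraph();
        System.out.println(withinChain(g, "blueberry", "ate", 2, 3)); // true
        System.out.println(withinChain(g, "blueberry", "ate", 1, 1)); // false
    }
}
```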

In a chain of relations, all relations are relative to the first node in the chain. For example, "{} >nsubj {} >dobj {}" means "any node that is the governor of both an nsubj and a dobj relation". If instead what you want is a node that is the governor of an nsubj relation with a node that is itself the governor of a dobj relation, you should write: "{} >nsubj ({} >dobj {})".
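The difference between the flat and the parenthesized form can be sketched on a toy edge list. This is plain Java rather than Semgrex code; ChainScopeSketch, Edge, flat, nested and toyEdges are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of flat versus nested relation chains.
final class ChainScopeSketch {

    // A labelled dependency edge from governor to dependent.
    static final class Edge {
        final String gov;
        final String reln;
        final String dep;

        Edge(String gov, String reln, String dep) {
            this.gov = gov;
            this.reln = reln;
            this.dep = dep;
        }
    }

    // True if 'node' is the governor of at least one edge labelled 'reln'.
    static boolean governs(List<Edge> edges, String node, String reln) {
        for (Edge e : edges) {
            if (e.gov.equals(node) && e.reln.equals(reln)) {
                return true;
            }
        }
        return false;
    }

    // Reading of "{} >nsubj {} >dobj {}": both relations hang off the first node.
    static boolean flat(List<Edge> edges, String node) {
        return governs(edges, node, "nsubj") && governs(edges, node, "dobj");
    }

    // Reading of "{} >nsubj ({} >dobj {})": the nsubj dependent itself governs a dobj.
    static boolean nested(List<Edge> edges, String node) {
        for (Edge e : edges) {
            if (e.gov.equals(node) && e.reln.equals("nsubj") && governs(edges, e.dep, "dobj")) {
                return true;
            }
        }
        return false;
    }

    // Toy graph: "ate" governs "Bill" (nsubj) and "muffins" (dobj).
    static List<Edge> toyEdges() {
        return Arrays.asList(
            new Edge("ate", "nsubj", "Bill"),
            new Edge("ate", "dobj", "muffins"));
    }

    public static void main(String[] args) {
        System.out.println(flat(toyEdges(), "ate"));   // true
        System.out.println(nested(toyEdges(), "ate")); // false
    }
}
```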

If a relation type is specified for the << relation, the relation type is only used for the first relation in the sequence. Therefore, if B depends on A with the relation type foo, the pattern {} <<foo {} will then match B and everything that depends on B.

Similarly, if a relation type is specified for the >> relation, the relation type is only used for the last relation in the sequence. Therefore, if A governs B with the relation type foo, the pattern {} >>foo {} will then match A and all of the nodes which have a sequence leading to A.

Boolean relational operators

Relations can be combined using the '&' and '|' operators, negated with the '!' operator, and made optional with the '?' operator.

Relations can be grouped using brackets '[' and ']'. So the expression

{} [<subj {} | <agent {}] & @ {}
matches a node that is the dependent of either a subj or an agent relationship and that also has an alignment to some other node.

Relations can be negated with the '!' operator, in which case the expression will match only if there is no node satisfying the relation.

Relations can be made optional with the '?' operator. This way the expression will match even if the optional relation is not satisfied.
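The way grouping, '&', '|' and '!' combine can be pictured with java.util.function.Predicate. The sketch below is plain Java and only an analogy, not how Semgrex evaluates patterns: BooleanRelationSketch, Node, dependentOf, ALIGNED and PATTERN are hypothetical names, and a node is reduced to the set of relations in which it is the dependent plus an alignment flag.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical analogy for "{} [<subj {} | <agent {}] & @ {}".
final class BooleanRelationSketch {

    // Toy node: the relations it is a dependent of, and whether it is aligned.
    static final class Node {
        final Set<String> incoming;
        final boolean aligned;

        Node(boolean aligned, String... incoming) {
            this.aligned = aligned;
            this.incoming = new HashSet<>(Arrays.asList(incoming));
        }
    }

    // Analogue of "<reln {}": the node is the dependent of some reln edge.
    static Predicate<Node> dependentOf(String reln) {
        return n -> n.incoming.contains(reln);
    }

    // Analogue of "@ {}": the node has an alignment.
    static final Predicate<Node> ALIGNED = n -> n.aligned;

    // (subj OR agent) AND aligned, mirroring the bracketed group followed by '&'.
    static final Predicate<Node> PATTERN =
        dependentOf("subj").or(dependentOf("agent")).and(ALIGNED);

    // Analogue of "!<subj {}": holds only when no such relation exists.
    static final Predicate<Node> NOT_SUBJ = dependentOf("subj").negate();

    public static void main(String[] args) {
        System.out.println(PATTERN.test(new Node(true, "agent"))); // true
        System.out.println(PATTERN.test(new Node(false, "subj"))); // false
        System.out.println(NOT_SUBJ.test(new Node(true, "obj")));  // true
    }
}
```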

The operator ":" partitions a pattern into separate patterns, each of which must be matched. For example, the following is a pattern where the matched node must have both "foo" and "bar" as descendants:

{}=a >> {word:foo} : {}=a >> {word:bar}
This pattern could have been written
{}=a >> {word:foo} >> {word:bar}
However, for more complex examples, partitioning a pattern may make it more readable.

Naming nodes

Nodes can be given names (a.k.a. handles) using '='. A named node will be stored in a map that maps names to nodes so that if a match is found, the node corresponding to the named node can be extracted from the map. For example, ({tag:NN}=noun) will match a singular noun node, and after a match is found the map can be queried with the name to retrieve the matched node, using SemgrexMatcher.getNode(String o) with the String argument "noun" (not "=noun"). Note that you are not allowed to name a node that is under the scope of a negation operator (the semantics would be unclear, since you can't store a node that never gets matched to). Trying to do so will cause a ParseException to be thrown. Named nodes can be put within the scope of an optionality operator.

Named nodes that refer back to previous named nodes need not have a node description -- this is known as "backreferencing". In this case, the expression will match only when all instances of the same name get matched to the same node. For example: the pattern {} >dobj ({} > {}=foo) >mod ({} > {}=foo) will match a graph in which there are two nodes, X and Y, for which X is the grandparent of Y and there are two paths to Y, one of which goes through a dobj and one of which goes through a mod.
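The consistency requirement behind backreferencing can be sketched as a name-to-node map that accepts a binding only if the name is unbound or already bound to the same node. This is plain Java, not the Semgrex implementation; BackreferenceSketch, bind and lookup are hypothetical names, and nodes are reduced to strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of name-to-node bindings with backreference checks.
final class BackreferenceSketch {

    private final Map<String, String> namesToNodes = new HashMap<>();

    // Binds 'name' to 'node' if unbound; otherwise succeeds only if the
    // existing binding is the same node (the backreference condition).
    boolean bind(String name, String node) {
        String existing = namesToNodes.putIfAbsent(name, node);
        return existing == null || existing.equals(node);
    }

    // Returns the node bound to 'name', or null if there is none.
    String lookup(String name) {
        return namesToNodes.get(name);
    }

    public static void main(String[] args) {
        BackreferenceSketch bindings = new BackreferenceSketch();
        System.out.println(bindings.bind("foo", "Y")); // true: first binding
        System.out.println(bindings.bind("foo", "Y")); // true: same node again
        System.out.println(bindings.bind("foo", "Z")); // false: different node
    }
}
```

In the pattern above, both occurrences of =foo would have to bind to the same node Y for the match to succeed, which is what the failed third call models.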

Author:
Chloe Kiddon
See Also:
Serialized Form

Method Summary
static SemgrexPattern compile(String semgrex)
          Creates a pattern from the given string.
 boolean equals(Object o)
           
 int hashCode()
           
 SemgrexMatcher matcher(SemanticGraph sg)
          Get a SemgrexMatcher for this pattern in this graph.
 SemgrexMatcher matcher(SemanticGraph hypGraph, Alignment alignment, SemanticGraph txtGraph)
           
 SemgrexMatcher matcher(SemanticGraph hypGraph, Alignment alignment, SemanticGraph txtGraph, boolean ignoreCase)
           
 SemgrexMatcher matcher(SemanticGraph sg, boolean ignoreCase)
          Get a SemgrexMatcher for this pattern in this graph.
 SemgrexMatcher matcher(SemanticGraph sg, Map<String,IndexedWord> variables)
          Get a SemgrexMatcher for this pattern in this graph, with some initial conditions on the variable assignments
 String pattern()
           
 void prettyPrint()
          Print a multi-line representation of the pattern illustrating its syntax to System.out.
 void prettyPrint(PrintStream ps)
          Print a multi-line representation of the pattern illustrating its syntax.
 void prettyPrint(PrintWriter pw)
          Print a multi-line representation of the pattern illustrating its syntax.
 void setPatternString(String patternString)
           
abstract  String toString()
           
abstract  String toString(boolean hasPrecedence)
          hasPrecedence indicates that this pattern has precedence in terms of "order of operations", so there is no need to parenthesize the expression
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Method Detail

matcher

public SemgrexMatcher matcher(SemanticGraph sg)
Get a SemgrexMatcher for this pattern in this graph.

Parameters:
sg - the SemanticGraph to match on
Returns:
a SemgrexMatcher

matcher

public SemgrexMatcher matcher(SemanticGraph sg,
                              Map<String,IndexedWord> variables)
Get a SemgrexMatcher for this pattern in this graph, with some initial conditions on the variable assignments


matcher

public SemgrexMatcher matcher(SemanticGraph sg,
                              boolean ignoreCase)
Get a SemgrexMatcher for this pattern in this graph.

Parameters:
sg - the SemanticGraph to match on
ignoreCase - if true, case is ignored when matching a pattern against a node; not implemented by CoordinationPattern
Returns:
a SemgrexMatcher

matcher

public SemgrexMatcher matcher(SemanticGraph hypGraph,
                              Alignment alignment,
                              SemanticGraph txtGraph)

matcher

public SemgrexMatcher matcher(SemanticGraph hypGraph,
                              Alignment alignment,
                              SemanticGraph txtGraph,
                              boolean ignoreCase)

compile

public static SemgrexPattern compile(String semgrex)
Creates a pattern from the given string.

Parameters:
semgrex - the pattern string
Returns:
a SemgrexPattern for the string.

pattern

public String pattern()

setPatternString

public void setPatternString(String patternString)

toString

public abstract String toString()
Overrides:
toString in class Object
Returns:
A single-line string representation of the pattern

toString

public abstract String toString(boolean hasPrecedence)
hasPrecedence indicates that this pattern has precedence in terms of "order of operations", so there is no need to parenthesize the expression


prettyPrint

public void prettyPrint(PrintWriter pw)
Print a multi-line representation of the pattern illustrating its syntax.


prettyPrint

public void prettyPrint(PrintStream ps)
Print a multi-line representation of the pattern illustrating its syntax.


prettyPrint

public void prettyPrint()
Print a multi-line representation of the pattern illustrating its syntax to System.out.


equals

public boolean equals(Object o)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object


Stanford NLP Group