SemgrexPattern (Stanford JavaNLP API)

java.lang.Object
- edu.stanford.nlp.semgraph.semgrex.SemgrexPattern

All Implemented Interfaces: java.io.Serializable

Direct Known Subclasses: CoordinationPattern, NodePattern

public abstract class SemgrexPattern
extends java.lang.Object
implements java.io.Serializable

A SemgrexPattern is a pattern for matching node and edge configurations in a dependency graph. Patterns are written in a similar style to tgrep or Tregex and operate over SemanticGraph objects, which contain IndexedWord nodes. Unlike tgrep but like Unix grep, there is no pre-indexing of the data to be searched. Rather, there is a linear scan through the graph where matches are sought.

Nodes

A node is represented by a set of attributes and their values contained by curly braces: {attr1:value1;attr2:value2;...}. Therefore, {} represents any node in the graph. Attributes must be plain strings; values can be strings or regular expressions blocked off by "/". Regular expressions must match the whole attribute value, so that /NN/ matches "NN" only, while /NN.*/ matches "NN", "NNS", "NNP", etc.

For example, {lemma:slice;tag:/VB.*/} represents any verb node with "slice" as its lemma. Attributes are extracted using AnnotationLookup.
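The whole-value rule for regular expressions corresponds to the difference between matching and searching in java.util.regex. The following is a minimal sketch that uses only the standard library and does not call Semgrex; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

// Toy illustration of the whole-value rule using java.util.regex only;
// it does not call Semgrex.
class WholeValueMatch {

  // Returns true only if the regex covers the entire attribute value.
  static boolean matchesWhole(String regex, String value) {
    return Pattern.compile(regex).matcher(value).matches();
  }

  public static void main(String[] args) {
    for (String tag : new String[] {"NN", "NNS", "NNP", "VB"}) {
      System.out.println(tag
          + "  /NN/: " + matchesWhole("NN", tag)
          + "  /NN.*/: " + matchesWhole("NN.*", tag));
    }
  }
}
```

Under this rule only the tag "NN" satisfies /NN/, while /NN.*/ accepts every tag that starts with "NN".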

The root of the graph can be marked by the $ sign, that is {$} represents the root node.

A node description can be negated with '!'. For example, !{lemma:boy} matches any token whose lemma isn't "boy".
Another way to negate a node description is with a negative lookahead regex, although this starts to look a little ugly. For example, {lemma:/^(?!boy).*$/} matches any token whose lemma does not begin with "boy"; anchoring the lookahead, as in {lemma:/^(?!boy$).*$/}, excludes only the exact lemma "boy". Note, however, that if you use this style, there needs to be some lemma attached to the token.
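How the lookahead treats lemmas that merely start with "boy" can be checked with java.util.regex alone. This is a sketch, not Semgrex code: the class name is illustrative, and the `$`-anchored variant is included here only for comparison.

```java
import java.util.regex.Pattern;

// Toy check of the negative-lookahead idiom; standard library only.
class NegativeLookahead {

  // Whole-value match, mirroring the rule for attribute regexes.
  static boolean matchesWhole(String regex, String lemma) {
    return Pattern.matches(regex, lemma);
  }

  public static void main(String[] args) {
    String unanchored = "^(?!boy).*$";  // rejects every lemma starting with "boy"
    String anchored = "^(?!boy$).*$";   // comparison variant: rejects only "boy"
    for (String lemma : new String[] {"boy", "girl", "boyhood"}) {
      System.out.println(lemma
          + "  unanchored=" + matchesWhole(unanchored, lemma)
          + "  anchored=" + matchesWhole(anchored, lemma));
    }
  }
}
```

The two forms differ only on lemmas such as "boyhood", which the unanchored lookahead rejects and the anchored one accepts.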

Relations

Relations are defined by a symbol representing the type of relationship and a string or regular expression representing the value of the relationship. A relationship string of % means any relationship. It is also OK simply to omit the relationship symbol altogether.

Currently supported node relations and their symbols:

Symbol	Meaning
A <reln B	A is the dependent of a relation reln with B
A >reln B	A is the governor of a relation reln with B
A <<reln B	A is the dependent of a relation reln in a chain to B following dep->gov paths
A >>reln B	A is the governor of a relation reln in a chain to B following gov->dep paths
A x,y<<reln B	A is the dependent of a relation reln in a chain to B following dep->gov paths between distances of x and y
A x,y>>reln B	A is the governor of a relation reln in a chain to B following gov->dep paths between distances of x and y
A == B	A and B are the same nodes in the same graph
A . B	A immediately precedes B, i.e. A.index() == B.index() - 1
A - B	A immediately succeeds B, i.e. A.index() == B.index() + 1
A .. B	A precedes B, i.e. A.index() < B.index()
A -- B	A succeeds B, i.e. A.index() > B.index()
A $+ B	B is a right immediate sibling of A, i.e. A and B have the same parent and A.index() == B.index() - 1
A $- B	B is a left immediate sibling of A, i.e. A and B have the same parent and A.index() == B.index() + 1
A $++ B	B is a right sibling of A, i.e. A and B have the same parent and A.index() < B.index()
A $-- B	B is a left sibling of A, i.e. A and B have the same parent and A.index() > B.index()
A @ B	A is aligned to B (this is only used when you have two dependency graphs which are aligned)
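The word-order and sibling rows of the table reduce to comparisons on token indices. The sketch below restates four of them as plain predicates over integer indices; it is a toy model with illustrative names, not the Semgrex implementation, and a node's parent is represented simply by the parent's index.

```java
// Toy restatement of the index conditions from the table; no Semgrex calls.
class OrderRelations {

  // A . B : A immediately precedes B.
  static boolean immediatelyPrecedes(int aIndex, int bIndex) {
    return aIndex == bIndex - 1;
  }

  // A .. B : A precedes B.
  static boolean precedes(int aIndex, int bIndex) {
    return aIndex < bIndex;
  }

  // A $+ B : same parent, and A immediately precedes B.
  static boolean rightImmediateSibling(int aIndex, int aParent, int bIndex, int bParent) {
    return aParent == bParent && immediatelyPrecedes(aIndex, bIndex);
  }

  // A $++ B : same parent, and A precedes B.
  static boolean rightSibling(int aIndex, int aParent, int bIndex, int bParent) {
    return aParent == bParent && precedes(aIndex, bIndex);
  }

  public static void main(String[] args) {
    // "the(1) big(2) dog(3)": both modifiers are assumed to attach to "dog".
    System.out.println(immediatelyPrecedes(1, 2));
    System.out.println(rightImmediateSibling(1, 3, 2, 3));
    System.out.println(rightSibling(1, 3, 2, 3));
  }
}
```

The mirrored relations (-, --, $-, $--) follow by swapping the two index arguments.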

In a chain of relations, all relations are relative to the first node in the chain. For example, "{} >nsubj {} >dobj {}" means "any node that is the governor of both a nsubj and a dobj relation". If instead what you want is a node that is the governor of a nsubj relation with a node that is itself the governor of dobj relation, you should use parentheses and write: "{} >nsubj ({} >dobj {})".
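The difference between the flat chain and the parenthesized form can be seen on a small hand-built graph. This is a toy model under the reading given above, using a plain edge list rather than SemanticGraph; the Edge class, method names and sample data are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model: evaluates the two example patterns over a list of labelled edges.
class ChainScope {

  // A labelled edge gov --reln--> dep.
  static final class Edge {
    final String gov, reln, dep;
    Edge(String gov, String reln, String dep) {
      this.gov = gov;
      this.reln = reln;
      this.dep = dep;
    }
  }

  // Dependents reached from gov over an edge labelled reln.
  static List<String> deps(List<Edge> graph, String gov, String reln) {
    List<String> out = new ArrayList<>();
    for (Edge e : graph) {
      if (e.gov.equals(gov) && e.reln.equals(reln)) {
        out.add(e.dep);
      }
    }
    return out;
  }

  static Set<String> nodes(List<Edge> graph) {
    Set<String> all = new TreeSet<>();
    for (Edge e : graph) {
      all.add(e.gov);
      all.add(e.dep);
    }
    return all;
  }

  // {} >nsubj {} >dobj {} : both relations hang off the first node.
  static Set<String> flat(List<Edge> graph) {
    Set<String> hits = new TreeSet<>();
    for (String n : nodes(graph)) {
      if (!deps(graph, n, "nsubj").isEmpty() && !deps(graph, n, "dobj").isEmpty()) {
        hits.add(n);
      }
    }
    return hits;
  }

  // {} >nsubj ({} >dobj {}) : dobj hangs off the nsubj dependent.
  static Set<String> nested(List<Edge> graph) {
    Set<String> hits = new TreeSet<>();
    for (String n : nodes(graph)) {
      for (String d : deps(graph, n, "nsubj")) {
        if (!deps(graph, d, "dobj").isEmpty()) {
          hits.add(n);
        }
      }
    }
    return hits;
  }

  // Two disconnected fragments, one for each shape.
  static List<Edge> sample() {
    return List.of(
        new Edge("ate", "nsubj", "She"), new Edge("ate", "dobj", "pizza"),
        new Edge("x", "nsubj", "y"), new Edge("y", "dobj", "z"));
  }

  public static void main(String[] args) {
    System.out.println("flat:   " + flat(sample()));
    System.out.println("nested: " + nested(sample()));
  }
}
```

On the sample data the flat pattern selects only "ate" and the nested pattern selects only "x".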

If a relation type is specified for the << relation, the relation type is only used for the first relation in the sequence. Therefore, if B depends on A with the relation type foo, the pattern {} <<foo {} will then match B and everything that depends on B.

Similarly, if a relation type is specified for the >> relation, the relation type is only used for the last relation in the sequence. Therefore, if A governs B with the relation type foo, the pattern {} >>foo {} will then match A and all of the nodes which have a sequence leading to A.
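The two paragraphs above can be read as a closure over the graph: start from the endpoints of the typed edge and follow untyped edges away from it. The sketch below follows that reading literally on a toy edge list; it is an assumption about the described behaviour, not the Semgrex implementation, and all names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of typed << and >> chains as described in the text.
class TypedChain {

  static final class Edge {
    final String gov, reln, dep;
    Edge(String gov, String reln, String dep) {
      this.gov = gov;
      this.reln = reln;
      this.dep = dep;
    }
  }

  // {} <<reln {} : dependents of a reln edge plus everything below them.
  static Set<String> dependentChain(List<Edge> graph, String reln) {
    Deque<String> todo = new ArrayDeque<>();
    for (Edge e : graph) {
      if (e.reln.equals(reln)) {
        todo.add(e.dep);
      }
    }
    Set<String> hits = new TreeSet<>();
    while (!todo.isEmpty()) {
      String n = todo.poll();
      if (!hits.add(n)) {
        continue; // already visited
      }
      for (Edge e : graph) {
        if (e.gov.equals(n)) {
          todo.add(e.dep);
        }
      }
    }
    return hits;
  }

  // {} >>reln {} : governors of a reln edge plus everything above them.
  static Set<String> governorChain(List<Edge> graph, String reln) {
    Deque<String> todo = new ArrayDeque<>();
    for (Edge e : graph) {
      if (e.reln.equals(reln)) {
        todo.add(e.gov);
      }
    }
    Set<String> hits = new TreeSet<>();
    while (!todo.isEmpty()) {
      String n = todo.poll();
      if (!hits.add(n)) {
        continue; // already visited
      }
      for (Edge e : graph) {
        if (e.dep.equals(n)) {
          todo.add(e.gov);
        }
      }
    }
    return hits;
  }

  // top --x--> A --foo--> B --y--> C
  static List<Edge> sample() {
    return List.of(
        new Edge("top", "x", "A"), new Edge("A", "foo", "B"), new Edge("B", "y", "C"));
  }

  public static void main(String[] args) {
    System.out.println("<<foo: " + dependentChain(sample(), "foo"));
    System.out.println(">>foo: " + governorChain(sample(), "foo"));
  }
}
```

On the sample chain, the << form selects B and C, and the >> form selects A and top.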

Boolean relational operators

Relations can be combined using the '&' and '|' operators, negated with the '!' operator, and made optional with the '?' operator.

Relations can be grouped using brackets '[' and ']'. So the expression

{} [<subj {} | <agent {}] & @ {}

matches a node that is the dependent of either a subj or an agent relation and is also aligned to some other node.

Relations can be negated with the '!' operator, in which case the expression will match only if there is no node satisfying the relation.

Relations can be made optional with the '?' operator. This way the expression will match even if the optional relation is not satisfied.

The operator ":" partitions a pattern into separate patterns, each of which must be matched. For example, the following is a pattern where the matched node must have both "foo" and "bar" as descendants:

{}=a >> {word:foo} : {}=a >> {word:bar}

This pattern could have been written

{}=a >> {word:foo} >> {word:bar}

However, for more complex examples, partitioning a pattern may make it more readable.
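Because both partitions bind the same name, the partitioned pattern behaves like an intersection of two independent conditions on one node. The sketch below models that on a toy child map; it does not call Semgrex, nodes are identified by their word for brevity, and the names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of "{}=a >> {word:foo} : {}=a >> {word:bar}".
class PartitionDemo {

  // All nodes reachable from node by following child links.
  static Set<String> descendants(Map<String, List<String>> children, String node) {
    Set<String> seen = new TreeSet<>();
    Deque<String> todo = new ArrayDeque<>(children.getOrDefault(node, List.of()));
    while (!todo.isEmpty()) {
      String n = todo.poll();
      if (seen.add(n)) {
        todo.addAll(children.getOrDefault(n, List.of()));
      }
    }
    return seen;
  }

  // One partition: nodes that have the given word as a descendant.
  static Set<String> nodesAbove(Map<String, List<String>> children, String word) {
    Set<String> hits = new TreeSet<>();
    for (String node : children.keySet()) {
      if (descendants(children, node).contains(word)) {
        hits.add(node);
      }
    }
    return hits;
  }

  // Both partitions must hold for the same node, so intersect the two sets.
  static Set<String> nodesAboveBoth(Map<String, List<String>> children,
                                    String first, String second) {
    Set<String> hits = nodesAbove(children, first);
    hits.retainAll(nodesAbove(children, second));
    return hits;
  }

  static Map<String, List<String>> sample() {
    return Map.of(
        "root", List.of("left", "right"),
        "left", List.of("foo"),
        "right", List.of("bar"));
  }

  public static void main(String[] args) {
    System.out.println(nodesAboveBoth(sample(), "foo", "bar"));
  }
}
```

In the sample, "left" dominates only "foo" and "right" only "bar", so "root" is the single node that satisfies both partitions.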

Naming nodes

Nodes can be given names (a.k.a. handles) using '='. A named node will be stored in a map that maps names to nodes so that if a match is found, the node corresponding to the named node can be extracted from the map. For example, ({tag:NN}=noun) will match a singular noun node; after a match is found, the matched node can be retrieved using SemgrexMatcher.getNode(String o) with the String argument "noun" (not "=noun"). Note that you are not allowed to name a node that is under the scope of a negation operator (the semantics would be unclear, since you can't store a node that never gets matched). Trying to do so will cause a ParseException to be thrown. Named nodes can be put within the scope of an optionality operator.

Named nodes that refer back to previously named nodes need not have a node description -- this is known as "backreferencing". In this case, the expression will match only when all instances of the same name get matched to the same node.

For example:

{} >dobj ({} > {}=foo) >mod ({} > {}=foo)

will match a graph in which there are two nodes, X and Y, for which X is the grandparent of Y and there are two paths to Y, one of which goes through a dobj and one of which goes through a mod.
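The backreference forces both two-step paths to end at the same node. The sketch below models that constraint on a toy edge list by intersecting the grandchildren reached through dobj with those reached through mod; it is not Semgrex code, and the names and sample graph are illustrative.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of "{} >dobj ({} > {}=foo) >mod ({} > {}=foo)".
class Backreference {

  static final class Edge {
    final String gov, reln, dep;
    Edge(String gov, String reln, String dep) {
      this.gov = gov;
      this.reln = reln;
      this.dep = dep;
    }
  }

  // Nodes two steps below x, where the first step is labelled reln.
  static Set<String> grandchildrenVia(List<Edge> graph, String x, String reln) {
    Set<String> out = new TreeSet<>();
    for (Edge first : graph) {
      if (first.gov.equals(x) && first.reln.equals(reln)) {
        for (Edge second : graph) {
          if (second.gov.equals(first.dep)) {
            out.add(second.dep);
          }
        }
      }
    }
    return out;
  }

  // Possible bindings of "foo" for x: reachable through both dobj and mod.
  static Set<String> bindingsForFoo(List<Edge> graph, String x) {
    Set<String> shared = grandchildrenVia(graph, x, "dobj");
    shared.retainAll(grandchildrenVia(graph, x, "mod"));
    return shared;
  }

  // X reaches Y through both D (dobj) and M (mod); Z is reachable through M only.
  static List<Edge> sample() {
    return List.of(
        new Edge("X", "dobj", "D"), new Edge("X", "mod", "M"),
        new Edge("D", "r1", "Y"), new Edge("M", "r2", "Y"), new Edge("M", "r3", "Z"));
  }

  public static void main(String[] args) {
    System.out.println(bindingsForFoo(sample(), "X"));
  }
}
```

Z is excluded in the sample because only the mod path reaches it, so the two occurrences of the name cannot be bound to the same node there.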

Naming relations

It is also possible to name relations. For example, you can write the pattern {idx:1} >=reln {idx:2}. The name of the relation will then be stored in the matcher and can be extracted with getRelnName("reln"). If the relation is later referenced a second time, the type of relation must be the same, or the potential match will not be accepted.

In the case of ancestor and descendant relations, the last relation in the sequence of relations is the name used.

Naming edges

It is also possible to name edges themselves. The following pattern will iterate through the edges from the root: {$} >~edge {}. The edge itself is now stored with the matcher and can be extracted with getEdgeName("edge"). If the edge is later referenced a second time, the exact edge must be the same, or the potential match will not be accepted.
This is only legal on relations with only one link between the two endpoints. Other relations (such as grandparent) will throw a parse exception.

TODO

At present a Semgrex pattern will match only once at a root node, even if there is more than one way of satisfying it under the root node. Probably its semantics should be changed, or at least the option should be given, to return all matches, as is the case for Tregex. (Is this still true? It seems to match multiple times from root.)

Author: Chloe Kiddon
See Also: Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`SemgrexPattern.OutputFormat`

Field Summary

Fields
Modifier and Type	Field and Description
`protected Env`	`env`

Method Summary

Modifier and Type	Method and Description
`static SemgrexPattern`	`compile(java.lang.String semgrex)`
`static SemgrexPattern`	`compile(java.lang.String semgrex, Env env)` Creates a pattern from the given string.
`boolean`	`equals(java.lang.Object o)`
`int`	`hashCode()`
`static void`	`help()`
`static void`	`main(java.lang.String[] args)` Prints out all matches of a semgrex pattern on a file of dependencies.
`SemgrexMatcher`	`matcher(SemanticGraph sg)` Get a `SemgrexMatcher` for this pattern in this graph.
`SemgrexMatcher`	`matcher(SemanticGraph hypGraph, Alignment alignment, SemanticGraph txtGraph)`
`SemgrexMatcher`	`matcher(SemanticGraph hypGraph, Alignment alignment, SemanticGraph txtGraph, boolean ignoreCase)`
`SemgrexMatcher`	`matcher(SemanticGraph sg, boolean ignoreCase)` Get a `SemgrexMatcher` for this pattern in this graph.
`SemgrexMatcher`	`matcher(SemanticGraph sg, IndexedWord root)` Get a `SemgrexMatcher` for this pattern in this graph.
`SemgrexMatcher`	`matcher(SemanticGraph sg, java.util.Map<java.lang.String,IndexedWord> variables)` Get a `SemgrexMatcher` for this pattern in this graph, with some initial conditions on the variable assignments
`java.lang.String`	`pattern()`
`void`	`prettyPrint()` Print a multi-line representation of the pattern illustrating its syntax to `System.out`.
`void`	`prettyPrint(java.io.PrintStream ps)` Print a multi-line representation of the pattern illustrating its syntax.
`void`	`prettyPrint(java.io.PrintWriter pw)` Print a multi-line representation of the pattern illustrating its syntax.
`void`	`setEnv(Env env)` Recursively sets the env variable on this pattern and on all of its children
`abstract java.lang.String`	`toString()` The goal is to return a string which will be compiled to the same pattern
`abstract java.lang.String`	`toString(boolean hasPrecedence)`

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Field Detail
  - env
```
protected Env env
```
- Method Detail
  - matcher
```
public SemgrexMatcher matcher(SemanticGraph sg)
```
    Get a SemgrexMatcher for this pattern in this graph.
    
    Parameters:
    
    sg - The SemanticGraph to match on
    
    Returns:
    
    a SemgrexMatcher
  - matcher
```
public SemgrexMatcher matcher(SemanticGraph sg,
                              IndexedWord root)
```
    Get a SemgrexMatcher for this pattern in this graph.
    
    Parameters:
    
    sg - The SemanticGraph to match on
    
    root - The IndexedWord from which to start the search
    
    Returns:
    
    a SemgrexMatcher
  - matcher
```
public SemgrexMatcher matcher(SemanticGraph sg,
                              java.util.Map<java.lang.String,IndexedWord> variables)
```
    Get a SemgrexMatcher for this pattern in this graph, with some initial conditions on the variable assignments
  - matcher
```
public SemgrexMatcher matcher(SemanticGraph sg,
                              boolean ignoreCase)
```
    Get a SemgrexMatcher for this pattern in this graph.
    
    Parameters:
    
    sg - The SemanticGraph to match on
    
    ignoreCase - If true, ignores case when matching a pattern against a node; not implemented by CoordinationPattern
    
    Returns:
    
    a SemgrexMatcher
  - matcher
```
public SemgrexMatcher matcher(SemanticGraph hypGraph,
                              Alignment alignment,
                              SemanticGraph txtGraph)
```
  - matcher
```
public SemgrexMatcher matcher(SemanticGraph hypGraph,
                              Alignment alignment,
                              SemanticGraph txtGraph,
                              boolean ignoreCase)
```
  - compile
```
public static SemgrexPattern compile(java.lang.String semgrex,
                                     Env env)
```
    Creates a pattern from the given string.
    
    Parameters:
    
    semgrex - The pattern string
    
    Returns:
    
    A SemgrexPattern for the string.
  - compile
```
public static SemgrexPattern compile(java.lang.String semgrex)
```
  - pattern
```
public java.lang.String pattern()
```
  - setEnv
```
public void setEnv(Env env)
```
    Recursively sets the env variable on this pattern and on all of its children
    
    Parameters:
    
    env - An Env
  - toString
```
public abstract java.lang.String toString()
```
    The goal is to return a string which will be compiled to the same pattern
    
    Overrides:
    
    toString in class java.lang.Object
    
    Returns:
    
    A single-line string representation of the pattern
  - toString
```
public abstract java.lang.String toString(boolean hasPrecedence)
```
    Parameters:
    
    hasPrecedence - indicates that this pattern has precedence in terms of "order of operations", so there is no need to parenthesize the expression
  - prettyPrint
```
public void prettyPrint(java.io.PrintWriter pw)
```
    Print a multi-line representation of the pattern illustrating its syntax.
  - prettyPrint
```
public void prettyPrint(java.io.PrintStream ps)
```
    Print a multi-line representation of the pattern illustrating its syntax.
  - prettyPrint
```
public void prettyPrint()
```
    Print a multi-line representation of the pattern illustrating its syntax to System.out.
  - equals
```
public boolean equals(java.lang.Object o)
```
    Overrides:
    
    equals in class java.lang.Object
  - hashCode
```
public int hashCode()
```
    Overrides:
    
    hashCode in class java.lang.Object
  - help
```
public static void help()
```
  - main
```
public static void main(java.lang.String[] args)
                 throws java.io.IOException
```
    Prints out all matches of a semgrex pattern on a file of dependencies.
    Usage:
    java edu.stanford.nlp.semgraph.semgrex.SemgrexPattern [args]
    See the help() function for a list of possible arguments to provide.
    
    Throws:
    
    java.io.IOException
