public class Ssurgeon
extends java.lang.Object
<ssurgeon-pattern-list>
<ssurgeon-pattern>
<uid>...</uid>
<notes>...</notes>
<semgrex>...</semgrex>
<language>...</language>
<edit-list>...</edit-list>
</ssurgeon-pattern>
</ssurgeon-pattern-list>
uid is the id of the Ssurgeon operation.
notes are comments on the Ssurgeon.
semgrex is a Semgrex pattern to use when matching for this operation.
edit-list is the actual Ssurgeon operation to execute.
language is an optional field to determine what language formalism to use when making new dependencies. By default it will be English for SD when using the Java API, although most people probably want UniversalEnglish for UD (including non-English UD datasets).
Available operations and their arguments include:
addEdge -gov node1 -dep node2 -reln depType -weight 0.5
relabelNamedEdge -edge edgename -reln depType
removeEdge -gov node1 -dep node2 -reln depType
removeNamedEdge -edge edgename
reattachNamedEdge -edge edgename -gov gov -dep dep
addDep -gov node1 -reln depType -position where ...attributes...
editNode -node node ...attributes...
setRoots n1 (n2 n3 ...)
mergeNodes n1 n2
killAllIncomingEdges -node node
delete -node node
killNonRootedNodes
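To make the XML layout above concrete, here is a minimal sketch that assembles one pattern as a string and reads its fields back with the JDK's DOM parser. It uses only the standard library and does not call Ssurgeon itself; the uid and notes values, the class name, and the helper names are made up for illustration, and the Semgrex and edit are the csubj relabeling pair used later in this description. A string like this is presumably the kind of input readFromString expects, but that is an assumption, not something verified here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PatternXmlSketch {
    // One pattern in the layout described above. The uid and notes values are
    // illustrative; '>' in the Semgrex is written as &gt; to keep the XML tidy.
    public static final String XML = String.join("\n",
        "<ssurgeon-pattern-list>",
        "  <ssurgeon-pattern>",
        "    <uid>1</uid>",
        "    <notes>Relabel csubj to advcl when the head also has an nsubj</notes>",
        "    <semgrex>{}=source &gt;nsubj {} &gt;csubj=bad {}</semgrex>",
        "    <language>UniversalEnglish</language>",
        "    <edit-list>relabelNamedEdge -edge bad -reln advcl</edit-list>",
        "  </ssurgeon-pattern>",
        "</ssurgeon-pattern-list>");

    /** Parses the XML and returns the first ssurgeon-pattern element. */
    public static Element firstPattern(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("ssurgeon-pattern").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("pattern XML could not be parsed", e);
        }
    }

    /** Text of the first descendant element with the given tag, or null if absent. */
    public static String tagText(Element parent, String tag) {
        NodeList found = parent.getElementsByTagName(tag);
        return found.getLength() == 0 ? null : found.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element pattern = firstPattern(XML);
        for (String tag : new String[] {"uid", "notes", "semgrex", "language", "edit-list"}) {
            System.out.println(tag + ": " + tagText(pattern, tag));
        }
    }
}
```

Escaping matters mostly for Semgrex relations that use `<`, which cannot appear unescaped in XML text; the parser hands back the unescaped pattern either way.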
addEdge adds a new edge between two existing nodes.
-gov and -dep will be nodes matched by the Semgrex pattern.
-reln is the name of the dependency type to add.
relabelNamedEdge changes the dependency type of a named edge.
-edge is the name of the edge in the Semgrex pattern.
-reln is the name of the dependency type to use.
removeEdge deletes an edge based on its description.
-gov is the governor of the edge to delete, a named node from the Semgrex pattern.
-dep is the dependent of the edge to delete, a named node from the Semgrex pattern.
-reln is the name of the dependency to delete.
If -gov or -dep is left empty, then all (matching) edges to or from the remaining argument will be deleted.
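The wildcard behavior of removeEdge can be pictured with a toy edge list. The sketch below is not Ssurgeon's implementation: the Edge class and removeEdge method are hypothetical stand-ins, a null governor or dependent plays the role of an argument left empty, and it assumes that "matching" means the relation name still has to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveEdgeSketch {
    /** A toy dependency edge: governor, dependent, relation name. */
    public static final class Edge {
        public final String gov;
        public final String dep;
        public final String reln;

        public Edge(String gov, String dep, String reln) {
            this.gov = gov;
            this.dep = dep;
            this.reln = reln;
        }

        @Override
        public String toString() {
            return gov + " -" + reln + "-> " + dep;
        }
    }

    /**
     * Returns the edges that survive the removal. A null gov or dep is treated
     * as "left empty" and matches any node (assumption of this sketch).
     */
    public static List<Edge> removeEdge(List<Edge> edges, String gov, String dep, String reln) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            boolean govMatches = gov == null || gov.equals(e.gov);
            boolean depMatches = dep == null || dep.equals(e.dep);
            boolean relnMatches = reln.equals(e.reln);
            if (!(govMatches && depMatches && relnMatches)) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
            new Edge("easy", "Hers", "nsubj"),
            new Edge("easy", "is", "cop"),
            new Edge("easy", "clean", "csubj"),
            new Edge("clean", "to", "mark"));
        // Fully specified: removes exactly one edge.
        System.out.println(removeEdge(edges, "easy", "clean", "csubj"));
        // Governor left empty: removes every mark edge pointing at "to".
        System.out.println(removeEdge(edges, null, "to", "mark"));
    }
}
```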
removeNamedEdge deletes an edge based on its name.
-edge is the name of the edge in the Semgrex pattern.
reattachNamedEdge changes an edge's gov and/or dep based on its name.
-edge is the name of the edge in the Semgrex pattern.
-gov is the governor to attach to, a named node from the Semgrex pattern. If left blank, the governor is not changed.
-dep is the dependent to attach to, a named node from the Semgrex pattern. If left blank, the dependent is not changed.
At least one of -gov or -dep must be set.
addDep adds a word and a dependency arc to the dependency graph.
-gov is the governor to attach to, a named node from the Semgrex pattern.
-reln is the name of the dependency type to use.
-position is where in the sentence the word should go. - will be the first word of the sentence, + will be the last word of the sentence, and -node or +node will be before or after the named node.
...attributes... means any attributes which can be set from a string or numerical value, e.g. -text ... sets the text of the word, -pos ... sets the xpos of the word, -cpos ... sets the upos of the word, etc.
You cannot set the index of a word this way; an exception will be thrown.
To put whitespace in an attribute, you can quote it. So, for example, a Vietnamese word can be set as -word "xin chào"
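The four forms of -position can be illustrated on a plain list of words. The following is a toy model only, assuming that named nodes resolve to 0-based positions in the sentence; the class, method, and map are hypothetical and say nothing about how Ssurgeon reindexes a real SemanticGraph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionSketch {
    /**
     * Returns a copy of words with newWord placed according to an addDep-style
     * position: "-" (first), "+" (last), "-name" (before the named node) or
     * "+name" (after the named node). namedNodes maps names to 0-based indices.
     */
    public static List<String> insert(List<String> words, Map<String, Integer> namedNodes,
                                      String position, String newWord) {
        if (position.isEmpty() || (position.charAt(0) != '-' && position.charAt(0) != '+')) {
            throw new IllegalArgumentException("position must start with - or +: " + position);
        }
        boolean before = position.charAt(0) == '-';
        List<String> result = new ArrayList<>(words);
        int index;
        if (position.length() == 1) {
            // Bare "-" or "+": start or end of the sentence.
            index = before ? 0 : result.size();
        } else {
            Integer anchor = namedNodes.get(position.substring(1));
            if (anchor == null) {
                throw new IllegalArgumentException("unknown node: " + position.substring(1));
            }
            index = before ? anchor : anchor + 1;
        }
        result.add(index, newWord);
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Jennifer", "has", "lovely", "antennae");
        Map<String, Integer> named = new HashMap<>();
        named.put("antennae", 3);
        System.out.println(insert(words, named, "-antennae", "blue"));
        System.out.println(insert(words, named, "+", "today"));
    }
}
```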
editNode will edit the attributes of a word.
-node is the node to edit.
...attributes... are the attributes to change, same as with addDep
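Since attribute values may be quoted to carry whitespace, an edit line cannot simply be split on spaces. The sketch below shows one way such a line could be tokenized; it is an illustration written for this description, not the parser Ssurgeon uses, and the node name greeting is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EditLineTokenizer {
    /** Splits on whitespace, except inside double quotes; the quotes are dropped. */
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                hasToken = true; // a quoted empty string still counts as a token
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated quote in: " + line);
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // \u00e0 is the letter a with grave accent, escaped to keep the source ASCII-only.
        String line = "editNode -node greeting -word \"xin ch\u00e0o\"";
        System.out.println(tokenize(line));
    }
}
```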
combineMWT will add MWT attributes to a sequence of two or more words.
-node (repeated) gives the nodes to edit.
-word is the optional text to use for the new MWT. If not set, the words will be concatenated.
setRoots sets the roots of the sentence to a new root.
n1, n2, ... are the names of the nodes from the Semgrex to use as the root(s).
This is best done in conjunction with other operations which actually manipulate the structure of the graph, or the new root will weirdly have dependents and the graph will be incorrect.
mergeNodes will merge n1 and n2, assuming they are mergeable.
The nodes can be merged if one of the nodes is the head of a phrase and the other node depends on the head.
TODO: can make it process more than two nodes at once.
killAllIncomingEdges deletes all edges to a node.
-node is the node to edit.
Note that this is the same as removeEdge with only the dependent set.
delete deletes all nodes reachable from a specific node.
-node is the node to delete.
You will only want to do this after separating the node from the parts of the graph you want to keep.
killNonRootedNodes searches the graph and deletes all nodes which have no path to a root.
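The idea behind killNonRootedNodes can be sketched as a reachability check. The toy graph below is a plain map from governor to dependents, and the sketch assumes that "a path to a root" means the node can be reached from a root by following governor-to-dependent edges; all names are hypothetical and the code does not touch SemanticGraph.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RootedNodesSketch {
    /** Nodes reachable from any root by following governor -> dependent edges. */
    public static Set<String> reachable(Map<String, List<String>> dependents,
                                        Collection<String> roots) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String node = queue.removeFirst();
            for (String child : dependents.getOrDefault(node, Collections.emptyList())) {
                if (seen.add(child)) {
                    queue.addLast(child);
                }
            }
        }
        return seen;
    }

    /** Nodes that a killNonRootedNodes-style cleanup would drop, in sorted order. */
    public static Set<String> nonRooted(Collection<String> allNodes,
                                        Map<String, List<String>> dependents,
                                        Collection<String> roots) {
        Set<String> orphans = new TreeSet<>(allNodes);
        orphans.removeAll(reachable(dependents, roots));
        return orphans;
    }

    public static void main(String[] args) {
        // "easy" no longer governs "clean", so "clean" and its subtree are cut off.
        Map<String, List<String>> dependents = new HashMap<>();
        dependents.put("easy", Arrays.asList("Hers", "is"));
        dependents.put("clean", Arrays.asList("to", "."));
        List<String> nodes = Arrays.asList("Hers", "is", "easy", "to", "clean", ".");
        System.out.println(nonRooted(nodes, dependents, Arrays.asList("easy")));
    }
}
```

The same traversal, started from a single node instead of the roots, gives the set of nodes that the delete operation is described as removing.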
A practical example comes from the UD_English-Pronouns dataset, where some words had both nsubj and csubj dependencies:
1	Hers	hers	PRON	PRP	Gender=Fem|Number=Sing|Person=3|Poss=Yes|PronType=Prs	3	nsubj	_	_
2	is	be	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	3	cop	_	_
3	easy	easy	ADJ	JJ	Degree=Pos	0	root	_	_
4	to	to	PART	TO	_	5	mark	_	_
5	clean	clean	VERB	VB	VerbForm=Inf	3	csubj	_	SpaceAfter=No
6	.	.	PUNCT	.	_	5	punct	_	_
We can update this with the following Semgrex/Ssurgeon pair:
{}=source >nsubj {} >csubj=bad {}
relabelNamedEdge -edge bad -reln advcl
The result will be the csubj updated to advcl.
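The effect of this pair on the sentence above can be traced by hand with a small stand-alone model. The sketch below keeps only the HEAD and DEPREL columns of the six tokens and relabels any csubj whose head also has an nsubj dependent, which is what the Semgrex describes. It is a simulation written for this description, with hypothetical names, and does not run Semgrex or Ssurgeon.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RelabelSketch {
    /**
     * heads[i] and deprels[i] are the HEAD and DEPREL columns of token i + 1.
     * Returns a copy of deprels in which every csubj whose head also has an
     * nsubj dependent is relabeled advcl.
     */
    public static String[] relabel(int[] heads, String[] deprels) {
        // First pass: collect heads that have at least one nsubj dependent.
        Set<Integer> headsWithNsubj = new HashSet<>();
        for (int i = 0; i < heads.length; i++) {
            if (deprels[i].equals("nsubj")) {
                headsWithNsubj.add(heads[i]);
            }
        }
        // Second pass: relabel the csubj edges hanging off those heads.
        String[] result = deprels.clone();
        for (int i = 0; i < heads.length; i++) {
            if (deprels[i].equals("csubj") && headsWithNsubj.contains(heads[i])) {
                result[i] = "advcl";
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hers is easy to clean .
        int[] heads = {3, 3, 0, 5, 3, 5};
        String[] deprels = {"nsubj", "cop", "root", "mark", "csubj", "punct"};
        System.out.println(Arrays.toString(relabel(heads, deprels)));
    }
}
```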
For the most part, each of these operations is already bomb-proof, i.e. the pattern will execute once and not repeat on the same part of the same dependency graph.
However, in the case of addDep, it is not possible to automatically bomb-proof the command, as certain sentences may legitimately have multiple words with the same attributes as dependents of the same governor. In this case, it is necessary to make the Semgrex pattern itself bomb-proof.
As an example, if the intent is to change "Jennifer has lovely antennae" to "Jennifer has lovely blue antennae", the following command would "bomb":
{word:antennae}=antennae
addDep -gov antennae -reln dep -word blue
The following would not:
{word:antennae}=antennae !> {word:blue}
addDep -gov antennae -reln dep -word blue
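Why the first pattern "bombs" and the second does not can be seen in a toy loop that keeps applying the edit for as long as the pattern matches. This is a simplified model under stated assumptions: the dependents of antennae are a plain list of words, the guard stands in for the !> {word:blue} clause, and the cap exists only so the unguarded run terminates. The names are hypothetical and nothing here reflects Ssurgeon's actual iteration code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BombProofSketch {
    /**
     * Repeatedly "adds a dependent blue to antennae" while the pattern matches.
     * Unguarded, the pattern always matches; guarded, it matches only while
     * antennae has no dependent "blue". Returns the number of edits applied,
     * stopping at cap.
     */
    public static int run(List<String> dependentsOfAntennae, boolean guarded, int cap) {
        int applied = 0;
        while (applied < cap) {
            boolean matches = !guarded || !dependentsOfAntennae.contains("blue");
            if (!matches) {
                break; // fixed point reached: nothing left to edit
            }
            dependentsOfAntennae.add("blue");
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        int unguarded = run(new ArrayList<>(Arrays.asList("lovely")), false, 5);
        int guardedRun = run(new ArrayList<>(Arrays.asList("lovely")), true, 5);
        System.out.println("unguarded edits before hitting the cap: " + unguarded);
        System.out.println("guarded edits: " + guardedRun);
    }
}
```

Without the guard the loop only stops because of the cap; with it, the edit applies once and the pattern no longer matches.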
Modifier and Type | Class and Description |
---|---|
static class | Ssurgeon.ArgsBox |
static class | Ssurgeon.RUNTYPE |
protected static class | Ssurgeon.SsurgeonArgs |
Modifier and Type | Field and Description |
---|---|
protected static Ssurgeon.ArgsBox | argsBox |
static java.lang.String | DEP_NODENAME_ARG |
static java.lang.String | EDGE_NAME_ARG |
static java.lang.String | GOV_NODENAME_ARG |
static java.lang.String | NAME_ARG |
static java.lang.String | NODE_PROTO_ARG |
static java.lang.String | NODENAME_ARG |
static java.lang.String | POSITION_ARG |
static java.lang.String | RELN_ARG |
static java.lang.String | WEIGHT_ARG |
Modifier and Type | Method and Description |
---|---|
static SsurgPred | assemblePredFromXML(org.w3c.dom.Element elt): Constructs a SsurgPred structure from file, given the root element. |
java.util.Collection<SemanticGraph> | exhaustFromPatterns(java.util.List<SsurgeonPattern> patternList, SemanticGraph sg): Similar to expandFromPatterns, but performs an exhaustive search, performing simplifications on the graphs until exhausted. |
java.util.List<SemanticGraph> | expandFromPatterns(java.util.List<SsurgeonPattern> patternList, SemanticGraph sg): Given a list of SsurgeonPattern edit scripts, and a SemanticGraph to operate over, returns a list of expansions of that graph, with the result of each edit applied against a copy of the graph. |
static java.lang.String | getEltText(org.w3c.dom.Element element): For a given Element, treats the first child as a text element and returns its value. |
static SsurgeonPattern | getOperationFromFile(java.lang.String path): Given a path to a file, converts it into a SsurgeonPattern. TODO: finish implementing this stub. |
SsurgeonWordlist | getResource(java.lang.String id): Returns the resource with the given id. |
java.util.Collection<SsurgeonWordlist> | getResources() |
static java.lang.String | getTagText(org.w3c.dom.Element element, java.lang.String tag): For the given element, returns the text for the first child Element with the given tag. |
void | initLog(java.io.File logFilePath) |
static Ssurgeon | inst() |
static void | main(java.lang.String[] args): Performs a simple test and print of a given file. |
static SsurgeonEdit | parseEditLine(java.lang.String editLine, java.util.Map<java.lang.String,java.lang.String> attributeArgs, Language language): Given a string entry, converts it into a SsurgeonEdit object. |
java.util.List<SsurgeonPattern> | readFromDirectory(java.io.File dir): Reads all Ssurgeon patterns from file. |
java.util.List<SsurgeonPattern> | readFromDocument(org.w3c.dom.Document doc) |
java.util.List<SsurgeonPattern> | readFromFile(java.io.File file): Given a path to a file containing a list of SsurgeonPatterns, returns the patterns. TODO: deal with resources |
java.util.List<SsurgeonPattern> | readFromString(java.lang.String text) |
void | setLogPrefix(java.lang.String logPrefix) |
static SsurgeonPattern | ssurgeonPatternFromXML(org.w3c.dom.Element elt): Given the root Element for a SsurgeonPattern (SSURGEON_ELEM_TAG), converts it into its corresponding SsurgeonPattern object. |
void | testRead(java.io.File tgtDirPath): Reads in the test file and prints it in readable string form (for debugging). |
static void | writeToFile(java.io.File tgtFile, java.util.List<SsurgeonPattern> patterns): Given a target filepath and a list of Ssurgeon patterns, writes them out as XML forms. |
static java.lang.String | writeToString(SsurgeonPattern pattern) |
public static final java.lang.String GOV_NODENAME_ARG
public static final java.lang.String DEP_NODENAME_ARG
public static final java.lang.String EDGE_NAME_ARG
public static final java.lang.String NODENAME_ARG
public static final java.lang.String RELN_ARG
public static final java.lang.String NODE_PROTO_ARG
public static final java.lang.String WEIGHT_ARG
public static final java.lang.String NAME_ARG
public static final java.lang.String POSITION_ARG
protected static Ssurgeon.ArgsBox argsBox
public static Ssurgeon inst()
public void initLog(java.io.File logFilePath) throws java.io.IOException
Throws: java.io.IOException
public void setLogPrefix(java.lang.String logPrefix)
public java.util.List<SemanticGraph> expandFromPatterns(java.util.List<SsurgeonPattern> patternList, SemanticGraph sg) throws java.lang.Exception
Throws: java.lang.Exception
public java.util.Collection<SemanticGraph> exhaustFromPatterns(java.util.List<SsurgeonPattern> patternList, SemanticGraph sg) throws java.lang.Exception
Throws: java.lang.Exception
public static SsurgeonPattern getOperationFromFile(java.lang.String path)
public SsurgeonWordlist getResource(java.lang.String id)
public java.util.Collection<SsurgeonWordlist> getResources()
public static SsurgeonEdit parseEditLine(java.lang.String editLine, java.util.Map<java.lang.String,java.lang.String> attributeArgs, Language language)
public static void writeToFile(java.io.File tgtFile, java.util.List<SsurgeonPattern> patterns)
public static java.lang.String writeToString(SsurgeonPattern pattern)
public java.util.List<SsurgeonPattern> readFromString(java.lang.String text)
public java.util.List<SsurgeonPattern> readFromFile(java.io.File file)
public java.util.List<SsurgeonPattern> readFromDocument(org.w3c.dom.Document doc)
public java.util.List<SsurgeonPattern> readFromDirectory(java.io.File dir) throws java.lang.Exception
Throws: java.lang.Exception
public static SsurgeonPattern ssurgeonPatternFromXML(org.w3c.dom.Element elt)
Throws: java.lang.Exception
public static SsurgPred assemblePredFromXML(org.w3c.dom.Element elt)
Constructs a SsurgPred structure from file, given the root element.
Throws: java.lang.Exception
public void testRead(java.io.File tgtDirPath) throws java.lang.Exception
Throws: java.lang.Exception
public static java.lang.String getTagText(org.w3c.dom.Element element, java.lang.String tag)
public static java.lang.String getEltText(org.w3c.dom.Element element)
public static void main(java.lang.String[] args)