public class Tsurgeon
extends java.lang.Object
java edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon -treeFile aTree exciseNP renameVerb
The file aTree has Penn Treebank (S-expression) format trees.
The other (here, two) files contain Tsurgeon operations. Each file is a list of pairs: a tregex expression on one or more lines, a blank line, some number of lines of Tsurgeon operations, and then another blank line.
Tsurgeon uses the Tregex engine to match tree patterns on trees; for more information on Tregex's tree-matching functionality, syntax, and semantics, please see the documentation for the TregexPattern class.
If you want to use Tsurgeon as an API, the relevant method is processPattern(edu.stanford.nlp.trees.tregex.TregexPattern, edu.stanford.nlp.trees.tregex.tsurgeon.TsurgeonPattern, edu.stanford.nlp.trees.Tree). You will also need to look at the TsurgeonPattern class and the parseOperation(java.lang.String) method.
Here's the simplest form of invocation on a single Tree:
Tree t = Tree.valueOf("(ROOT (S (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP America)))) (VP (VBD called)) (. .)))");
TregexPattern pat = TregexPattern.compile("NP <1 (NP << Bank) <2 PP=remove");
TsurgeonPattern surgery = Tsurgeon.parseOperation("excise remove remove");
Tsurgeon.processPattern(pat, surgery, t).pennPrint();
Here is another sample invocation:
TregexPattern matchPattern = TregexPattern.compile("SQ=sq < (/^WH/ $++ VP)");
List<TsurgeonPattern> ps = new ArrayList<TsurgeonPattern>();
TsurgeonPattern p = Tsurgeon.parseOperation("relabel sq S");
ps.add(p);
Treebank lTrees; // must be assigned a loaded Treebank before the call below
List<Tree> result = Tsurgeon.processPatternOnTrees(matchPattern, Tsurgeon.collectOperations(ps), lTrees);
Note: If you want to apply multiple surgery patterns, do not call processPatternOnTrees once for each individual pattern. Instead, either call processPatternsOnTree and loop through the trees yourself or, as above, use collectOperations to collect all the surgery patterns into one TsurgeonPattern and then call processPatternOnTrees. Either of these two methods is much faster.
The parser can also collect multiple TsurgeonPatterns into one pattern by itself if each pattern is enclosed in [ ... ]. For example:
Tsurgeon.parseOperation("[relabel foo BAR] [prune bar]")
For more information on using Tsurgeon from the command line, see the main(java.lang.String[]) method and the package Javadoc.
Modifier and Type | Method | Description |
---|---|---|
static TsurgeonPattern | collectOperations(java.util.List<TsurgeonPattern> patterns) | Collects a list of operation patterns into a sequence of operations to be applied. |
static Pair<TregexPattern,TsurgeonPattern> | getOperationFromReader(java.io.BufferedReader reader, TregexPatternCompiler compiler) | Parses a tsurgeon script text input and compiles a tregex pattern and a list of tsurgeon operations into a pair. |
static java.util.List<Pair<TregexPattern,TsurgeonPattern>> | getOperationsFromFile(java.lang.String filename, java.lang.String encoding, TregexPatternCompiler compiler) | Parses a tsurgeon script file and compiles all operations in the file into a list of pairs of tregex and tsurgeon patterns. |
static java.util.List<Pair<TregexPattern,TsurgeonPattern>> | getOperationsFromReader(java.io.BufferedReader reader, TregexPatternCompiler compiler) | Parses and compiles all operations from a BufferedReader into a list of pairs of tregex and tsurgeon patterns. |
static java.lang.String | getTregexPatternFromReader(java.io.BufferedReader reader) | Assumes that we are at the beginning of a tsurgeon script file and gets the string for the tregex pattern leading the file. |
static TsurgeonPattern | getTsurgeonOperationsFromReader(java.io.BufferedReader reader) | Assumes the given reader has only tsurgeon operations (not a tregex pattern), and parses these out, collecting them into one operation. |
static java.lang.String | getTsurgeonTextFromReader(java.io.BufferedReader reader) | Assumes the given reader has only tsurgeon operations (not a tregex pattern), and returns them as a String, mirroring the way the strings appear in the file. |
static void | main(java.lang.String[] args) | Usage: java edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon [-s] -treeFile file-with-trees [-po matching-pattern operation] operation-file-1 operation-file-2 ... |
static TsurgeonPattern | parseOperation(java.lang.String operationString) | Parses an operation string into a TsurgeonPattern. |
static Tree | processPattern(TregexPattern matchPattern, TsurgeonPattern p, Tree t) | Tries to match a pattern against a tree. |
static java.util.List<Tree> | processPatternOnTrees(TregexPattern matchPattern, TsurgeonPattern p, java.util.Collection<Tree> inputTrees) | Applies processPattern to a collection of trees. |
static Tree | processPatternsOnTree(java.util.List<Pair<TregexPattern,TsurgeonPattern>> ops, Tree t) | |
public static void main(java.lang.String[] args) throws java.lang.Exception
Each transformation file contains a sequence of pairs: a TregexPattern pattern on one or more lines, then a blank line (empty or whitespace), then a list of transformation operations, one per line (as specified by Legal operation syntax below), to apply when the pattern is matched, and then another blank line (empty or whitespace).
Note the need for blank lines: The code crashes if they are not present as separators
(although the blank line at the end of the file can be omitted).
The script file can include comment lines, either whole comment lines or
trailing comments introduced by %, which extend to the end of line. A needed percent
mark can be escaped by a preceding backslash.
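The comment rule above can be illustrated with plain java.util.regex. This is only a sketch, not Tsurgeon's own parser: the class and method names are hypothetical, and it assumes that an escaped percent mark should come out as a literal % with the backslash dropped.

```java
import java.util.regex.Pattern;

public class CommentStripSketch {
    // A '%' that is not preceded by a backslash starts a comment
    // that runs to the end of the line.
    private static final Pattern COMMENT = Pattern.compile("(?<!\\\\)%.*$");

    static String stripComment(String line) {
        String withoutComment = COMMENT.matcher(line).replaceFirst("");
        // Assumption: an escaped "\%" becomes a literal "%".
        return withoutComment.replace("\\%", "%");
    }

    public static void main(String[] args) {
        // Trailing comment is removed (the space before it stays).
        System.out.println("[" + stripComment("relabel n2 S % rename the clause") + "]");
        // Whole-line comment becomes an empty line.
        System.out.println("[" + stripComment("% just a comment") + "]");
        // Escaped percent survives; the later unescaped one starts the comment.
        System.out.println("[" + stripComment("relabel n2 100\\% % keep the percent") + "]");
    }
}
```

Note that a whole-line comment turns into an empty line, so a real parser has to decide whether such a line also counts as a blank separator; this sketch leaves that question open.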
For example, if you want to excise an SBARQ node whenever it is the parent of an SQ node, and relabel the SQ node to S, your transformation file would look like this:
SBARQ=n1 < SQ=n2

excise n1 n1
relabel n2 S
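To make the blank-line layout concrete, here is a small stdlib-only sketch that splits a script into pattern/operations pairs. It is not the library's parser (use getOperationsFromFile or getOperationsFromReader for that); the class name, the Entry holder, and the choice to treat several consecutive blank lines as one separator are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ScriptBlockSketch {
    /** Hypothetical holder for one pattern/operations pair. */
    static final class Entry {
        final String pattern;
        final List<String> operations;

        Entry(String pattern, List<String> operations) {
            this.pattern = pattern;
            this.operations = operations;
        }
    }

    static List<Entry> parse(String script) {
        // Group non-blank lines into blocks separated by blank lines.
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : script.split("\n", -1)) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    blocks.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line.trim());
            }
        }
        if (!current.isEmpty()) {
            blocks.add(current); // the final blank line may be omitted
        }
        // Blocks alternate: pattern lines, then operation lines.
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i + 1 < blocks.size(); i += 2) {
            entries.add(new Entry(String.join(" ", blocks.get(i)), blocks.get(i + 1)));
        }
        return entries;
    }

    public static void main(String[] args) {
        String script = "SBARQ=n1 < SQ=n2\n\nexcise n1 n1\nrelabel n2 S\n";
        for (Entry e : parse(script)) {
            System.out.println(e.pattern + " -> " + e.operations);
        }
    }
}
```

Without the blank line after the pattern, the three lines would collapse into a single block and no pair would be produced, which is why the separators matter.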
-treeFile <filename>
specify the name of the file that has the trees you want to transform.
-po <matchPattern> <operation>
Apply a single operation to every tree using the specified match pattern and the specified operation. Use this option
when you want to quickly try the effect of one pattern/surgery combination, and are too lazy to write a transformation file.
-s
Print each output tree on one line (default is pretty-printing).
-m
For every tree that had a matching pattern, print "before" (prepended as "Operated on:") and "after" (prepended as "Result:"). Unoperated on trees just pass through the transducer as usual.
-encoding X
Uses character set X for input and output of trees.
-macros <filename>
A file of macros to use on the tregex pattern. Macros should be one per line, with original and replacement separated by tabs.
-hf <headFinder-class-name>
use the specified HeadFinder class to determine headship relations.
-hfArg <string>
pass a string argument in to the HeadFinder class's constructor. -hfArg can be used multiple times to pass in multiple arguments.
-trf <TreeReaderFactory-class-name>
use the specified TreeReaderFactory class to read trees from files.
delete <name>
deletes the node and everything below it.
prune <name>
Like delete, but if, after the pruning, the parent has no children anymore, the parent is pruned too. Pruning continues to affect all ancestors until one is found with remaining children. This may result in a null tree.
excise <name1> <name2>
The name1 node should either dominate or be the same as the name2 node. This excises out everything from
name1 to name2. All the children of name2 go into the parent of name1, where name1 was.
relabel <name> <new-label>
Relabels the node to have the new label. The possible forms are:
relabel nodeX VP
- for changing a node label to an alphanumeric string.
relabel nodeX /<new-label>/
- for relabeling a node to something that isn't a valid identifier without quoting. relabel nodeX /{/ works, but you need to do relabel nodeX /\\]/ in order to get a single close bracket.
relabel nodeX /^VB(.*)$/verb\\/$1/
- for regular expression based relabeling. In this case, all matches of the regular expression against the node label are replaced with the replacement String. This has the semantics of Java/Perl's replaceAll: you may use capturing groups and put them in replacements with $n. For example, if the pattern is /foo/bar/ and the node matched is "foofoo", the replaceAll semantics result in "barbar". If the pattern is /^foo(.*)$/bar$1/ and the node matched is "foofoo", relabel will result in "barfoo".
insert <name> <position> or insert <tree> <position>
inserts the named node or tree into the position specified.
move <name> <position>
moves the named node into the specified position.
Right now the only ways to specify position are:
$+ <name>
the left sister of the named node
$- <name>
the right sister of the named node
>i <name>
the i-th daughter of the named node
>-i <name>
the i-th daughter, counting from the right, of the named node.
moveprune <name> <position>
moves the named node into
the specified position, then prunes the original position if it
became a node with no children.
replace <name1> <name2>
deletes name1 and inserts a copy of name2 in its place.
replace <name> <tree> <tree2>...
deletes name and inserts the new tree(s) in its place. If
more than one replacement tree is given, each of the new
subtrees will be added in order where the old tree was.
Multiple subtrees at the root is an illegal operation and
will throw an exception.
createSubtree <auxiliary-tree-or-label> <name1> [<name2>]
Create a subtree out of all the nodes from <name1> through <name2>. The subtree is moved to the foot of the given auxiliary tree, and the tree is inserted where the nodes of the subtree used to reside. If a simple label is provided as the first argument, the subtree is given a single parent with a name corresponding to the label. To limit the operation to just one node, omit <name2>.
adjoin <auxiliary_tree> <name>
Adjoins the specified auxiliary tree into the named node.
The daughters of the target node will become the daughters of the foot of the auxiliary tree.
adjoinH <auxiliary_tree> <name>
Similar to adjoin, but preserves the target node and makes it the root of <tree>. (It is still accessible as name. The root of the auxiliary tree is ignored.)
adjoinF <auxiliary_tree> <name>
Similar to adjoin, but preserves the target node and makes it the foot of <tree>. (It is still accessible as name, and retains its status as parent of its children. The root of the auxiliary tree is ignored.)
coindex <name1> <name2> ... <nameM>
Puts a (Penn Treebank style) coindexation suffix of the form "-N" on each of nodes name1 through nameM. The value of N will be automatically generated in reference to the existing coindexations in the tree, so that there is never an accidental clash of indices across things that are not meant to be coindexed.
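The replaceAll semantics that the regular-expression form of relabel is documented to follow can be checked directly with java.lang.String. The sketch below only mimics the documented behavior on plain label strings; the class and helper names are hypothetical and no Tsurgeon code is involved.

```java
public class RelabelRegexSketch {
    // Mimics the documented /regex/replacement/ form on a bare label string.
    static String relabel(String label, String regex, String replacement) {
        return label.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        // Every match is replaced, so two occurrences give two replacements.
        System.out.println(relabel("foofoo", "foo", "bar"));         // barbar
        // Anchored pattern with a capturing group reused as $1.
        System.out.println(relabel("foofoo", "^foo(.*)$", "bar$1")); // barfoo
        // The VB example: the captured suffix is kept after "verb/".
        System.out.println(relabel("VBD", "^VB(.*)$", "verb/$1"));   // verb/D
    }
}
```

In a Tsurgeon operation string the slash inside the replacement has to be escaped, as in the verb\\/$1 form shown above, because / delimits the pattern and the replacement; in a bare Java replacement string no such escape is needed.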
In the context of adjoin, adjoinH, adjoinF, and createSubtree, an auxiliary tree is a tree in Penn Treebank format with @ on exactly one of the leaves denoting the foot of the tree. The operations which use the foot use the labeled node.
For example:
Tsurgeon: adjoin (FOO (BAR@)) foo
Tregex: B=foo
Input: (A (B 1 2))
Output: (A (FOO (BAR 1 2)))
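The adjoin example can be traced by hand with a toy tree type. The following is a minimal model of the documented behavior (the target's daughters become the daughters of the foot, and the auxiliary tree takes the target's place), not CoreNLP's Tree class or its adjoin implementation; Node and adjoin are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class AdjoinSketch {
    /** Toy tree node; leaves are nodes without children. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }

        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) {
                sb.append(' ').append(c);
            }
            return sb.append(')').toString();
        }
    }

    /** Puts auxRoot where target was; target's daughters move under foot. */
    static void adjoin(Node parent, Node target, Node auxRoot, Node foot) {
        foot.children.addAll(target.children);
        int position = parent.children.indexOf(target);
        parent.children.set(position, auxRoot);
    }

    public static void main(String[] args) {
        Node b = new Node("B", new Node("1"), new Node("2")); // matched as B=foo
        Node a = new Node("A", b);                            // input: (A (B 1 2))
        Node foot = new Node("BAR");                          // the leaf written BAR@
        Node aux = new Node("FOO", foot);                     // auxiliary tree (FOO (BAR@))
        adjoin(a, b, aux, foot);
        System.out.println(a); // (A (FOO (BAR 1 2)))
    }
}
```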
Tsurgeon applies the same operation to the same tree for as long as the given tregex pattern matches. This means that infinite loops are very easy to cause. One common situation where this comes up is an insert operation, which repeats indefinitely unless you add an expression to the tregex that matches against the inserted pattern. For example, this pattern will loop forever:
TregexPattern tregex = TregexPattern.compile("S=node << NP");
TsurgeonPattern tsurgeon = Tsurgeon.parseOperation("insert (NP foo) >-1 node");
This pattern, though, will terminate:
TregexPattern tregex = TregexPattern.compile("S=node << NP !<< foo");
TsurgeonPattern tsurgeon = Tsurgeon.parseOperation("insert (NP foo) >-1 node");
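Why the extra !<< foo clause makes the difference can be seen in a toy model of the "apply while the pattern matches" loop. This sketch is an assumption-laden stand-in: the tree is reduced to a flat list of labels, the insert is modeled as appending "foo", and the maxSteps cap is hypothetical and exists only so the demo halts (nothing in the description above suggests Tsurgeon has such a cap).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FixpointSketch {
    /**
     * Applies the toy insert while the predicate matches.
     * Returns the number of insertions, or -1 if maxSteps was reached
     * without the match ever failing.
     */
    static int applyUntilNoMatch(List<String> labels, Predicate<List<String>> matches, int maxSteps) {
        int steps = 0;
        while (matches.test(labels)) {
            if (steps == maxSteps) {
                return -1; // still matching: the real loop would never end
            }
            labels.add("foo"); // stands in for "insert (NP foo) >-1 node"
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        // Models "S=node << NP": inserting never removes the NP, so it always matches.
        List<String> unguarded = new ArrayList<>(Arrays.asList("S", "NP", "VP"));
        System.out.println(applyUntilNoMatch(unguarded, l -> l.contains("NP"), 50)); // -1

        // Models "S=node << NP !<< foo": the inserted foo blocks the next match.
        List<String> guarded = new ArrayList<>(Arrays.asList("S", "NP", "VP"));
        System.out.println(applyUntilNoMatch(guarded, l -> l.contains("NP") && !l.contains("foo"), 50)); // 1
    }
}
```

The general rule the model suggests is that an operation should change the tree in a way that falsifies its own match pattern.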
Tsurgeon has (very) limited support for conditional statements. If a pattern is prefaced with if exists <name>, the rest of the pattern will only execute if the named node was found in the corresponding TregexMatcher.
args
- a list of names of files, each of which contains a single tregex matching pattern plus a list, one per line, of transformation operations to apply to the matched pattern.
java.lang.Exception
- If an I/O or pattern syntax error occurs
public static Pair<TregexPattern,TsurgeonPattern> getOperationFromReader(java.io.BufferedReader reader, TregexPatternCompiler compiler) throws java.io.IOException
reader
- Reader to read patterns from
Returns null when the operations present in the Reader have been exhausted
java.io.IOException
- If there is any IO problem
public static java.lang.String getTregexPatternFromReader(java.io.BufferedReader reader) throws java.io.IOException
java.io.IOException
- If the usual kinds of IO errors occur
public static TsurgeonPattern getTsurgeonOperationsFromReader(java.io.BufferedReader reader) throws java.io.IOException
java.io.IOException
- If the usual kinds of IO errors occur
public static java.lang.String getTsurgeonTextFromReader(java.io.BufferedReader reader) throws java.io.IOException
java.io.IOException
public static java.util.List<Pair<TregexPattern,TsurgeonPattern>> getOperationsFromFile(java.lang.String filename, java.lang.String encoding, TregexPatternCompiler compiler) throws java.io.IOException
filename
- A file, classpath resource or URL (perhaps gzipped) containing the tsurgeon script
java.io.IOException
- If there is any I/O problem
public static java.util.List<Pair<TregexPattern,TsurgeonPattern>> getOperationsFromReader(java.io.BufferedReader reader, TregexPatternCompiler compiler) throws java.io.IOException
reader
- A BufferedReader to read the operations
java.io.IOException
- If there is any I/O problem
public static java.util.List<Tree> processPatternOnTrees(TregexPattern matchPattern, TsurgeonPattern p, java.util.Collection<Tree> inputTrees)
matchPattern
- A TregexPattern to be matched against a Tree.
p
- A TsurgeonPattern to apply.
inputTrees
- The input trees to be processed
public static Tree processPattern(TregexPattern matchPattern, TsurgeonPattern p, Tree t)
Tries to match a pattern against a tree and, if it matches, applies the surgical operations contained in a TsurgeonPattern.
matchPattern
- A TregexPattern to be matched against a Tree.
p
- A TsurgeonPattern to apply.
t
- the Tree to match against and perform surgery on.
public static Tree processPatternsOnTree(java.util.List<Pair<TregexPattern,TsurgeonPattern>> ops, Tree t)
public static TsurgeonPattern parseOperation(java.lang.String operationString)
Parses an operation string into a TsurgeonPattern. Throws a TsurgeonParseException if the operation string is ill-formed.
Example of use:
TsurgeonPattern p = Tsurgeon.parseOperation("prune ed");
operationString
- The operation to perform, as a text string
public static TsurgeonPattern collectOperations(java.util.List<TsurgeonPattern> patterns)
patterns
- a list of TsurgeonPattern operations that you want to collect together into a single compound operation
Returns a TsurgeonPattern that performs all the operations in the sequence of the patterns argument