edu.stanford.nlp.trees.tregex.tsurgeon
Class Tsurgeon

java.lang.Object
  extended by edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon

public class Tsurgeon
extends Object

Tsurgeon provides a way of editing trees based on a set of operations that are applied to tree locations matching a tregex pattern. A simple example from the command-line:

java edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon -treeFile atree exciseNP renameVerb
The file atree contains trees in Penn Treebank (S-expression) format. The remaining files (here, two) contain Tsurgeon operations. Each consists of a tregex expression on one line, a blank line, and then some number of lines of Tsurgeon operations. Note that, at present, each file can hold only one pattern and one group of operations. (This should probably be changed!)

Tsurgeon uses the tregex engine to match tree patterns on trees; for more information on tregex's tree-matching functionality, syntax, and semantics, please see the documentation for the TregexPattern class.

If you want to use Tsurgeon as an API, the relevant method is processPattern(edu.stanford.nlp.trees.tregex.TregexPattern, edu.stanford.nlp.trees.tregex.tsurgeon.TsurgeonPattern, edu.stanford.nlp.trees.Tree). You will also need to look at the TsurgeonPattern class and the parseOperation(java.lang.String) method.

Here is a sample invocation:

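 // Match an SQ node (named "sq") that has a WH-phrase daughter followed by a VP sister.
 TregexPattern matchPattern = TregexPattern.compile("SQ=sq < (/^WH/ $++ VP)");
 List<TsurgeonPattern> ps = new ArrayList<TsurgeonPattern>();

 // Relabel the node that the match pattern named "sq" to S.
 TsurgeonPattern p = Tsurgeon.parseOperation("relabel sq S");

 ps.add(p);

 Treebank lTrees;  // must be loaded with the trees to transform before it is used
 List<Tree> result = Tsurgeon.processPatternOnTrees(matchPattern, Tsurgeon.collectOperations(ps), lTrees);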
 

Note: If you want to apply multiple surgery patterns, you will not want to call processPatternOnTrees, but rather to call processPatternsOnTree, and to loop through the trees yourself. This is much faster.

For more information on using Tsurgeon from the command line, see the main(java.lang.String[]) method and the package Javadoc.

Author:
Roger Levy

Method Summary
static TsurgeonPattern collectOperations(List<TsurgeonPattern> patterns)
          Collects a list of operation patterns into a sequence of operations to be applied.
static Pair<TregexPattern,TsurgeonPattern> getOperationFromFile(String filename)
          Parses a tsurgeon script file and compiles all operations in the file into one tsurgeon pattern.
static Pair<TregexPattern,TsurgeonPattern> getOperationFromReader(BufferedReader reader)
          Parses a tsurgeon script text input and compiles all operations in the file into one tsurgeon pattern.
static String getPatternFromFile(BufferedReader reader)
          Assumes that we are at the beginning of a tsurgeon script file and gets the string for the tregex pattern leading the file.
static TsurgeonPattern getTsurgeonOperationsFromReader(BufferedReader reader)
          Assumes the given reader has only tsurgeon operations (not a tregex pattern), and parses these out, collecting them into one operation.
static String getTsurgeonTextFromReader(BufferedReader reader)
          Assumes the given reader has only tsurgeon operations (not a tregex pattern), and returns them as a single string that mirrors the way the operations appear in the file. This is helpful for lazy evaluation of the operations, as in a GUI, because the operations are not parsed on load.
static void main(String[] args)
          Runs Tsurgeon from the command line; see the method detail for the arguments and options.
static TsurgeonPattern parseOperation(String operationString)
          Parses an operation string into a TsurgeonPattern.
static Tree processPattern(TregexPattern matchPattern, TsurgeonPattern p, Tree t)
          Tries to match a pattern against a tree.
static List<Tree> processPatternOnTrees(TregexPattern matchPattern, TsurgeonPattern p, Collection<Tree> inputTrees)
          Applies processPattern to each tree in a collection of trees.
static Tree processPatternsOnTree(List<Pair<TregexPattern,TsurgeonPattern>> ops, Tree t)
          Applies a list of (TregexPattern, TsurgeonPattern) pairs to a single tree.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(String[] args)
                 throws Exception

Arguments:

Each argument should be the name of a transformation file that contains a TregexPattern pattern on the first line, then a blank line, then a list of transformation operations (as specified by Legal operation syntax below) to apply when the pattern is matched. Note the bit about the blank line: currently the code crashes if it isn't present! For example, if you want to excise an SBARQ node whenever it is the parent of an SQ node, and rename the SQ node to S, your transformation file would look like this:
SBARQ=n1 < SQ=n2

excise n1 n1
relabel n2 S
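
The file layout described above (a pattern on the first line, a blank line, then the operations) can be sketched as a small standalone splitter. This is an illustration only: the class ScriptSketch and its parse method are hypothetical names and are not part of the Tsurgeon API. The sketch assumes a one-line pattern, does no comment handling, and, unlike the behavior noted above, reports a missing blank line with an explicit exception.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Tsurgeon: splits a transformation script into its parts. */
final class ScriptSketch {
  final String pattern;           // tregex pattern text taken from the first line
  final List<String> operations;  // operation lines, in file order

  private ScriptSketch(String pattern, List<String> operations) {
    this.pattern = pattern;
    this.operations = operations;
  }

  /** Expects: pattern line, blank line, then one operation per line. */
  static ScriptSketch parse(String scriptText) {
    String[] lines = scriptText.split("\\R");  // split on any line terminator
    if (lines.length == 0 || lines[0].trim().isEmpty()) {
      throw new IllegalArgumentException("script must start with a tregex pattern");
    }
    if (lines.length < 2 || !lines[1].trim().isEmpty()) {
      throw new IllegalArgumentException("expected a blank line after the pattern");
    }
    List<String> ops = new ArrayList<>();
    for (int i = 2; i < lines.length; i++) {
      String line = lines[i].trim();
      if (!line.isEmpty()) {
        ops.add(line);  // keep the raw operation text; no parsing is attempted here
      }
    }
    return new ScriptSketch(lines[0].trim(), ops);
  }

  public static void main(String[] args) {
    ScriptSketch s = parse("SBARQ=n1 < SQ=n2\n\nexcise n1 n1\nrelabel n2 S\n");
    System.out.println(s.pattern);     // SBARQ=n1 < SQ=n2
    System.out.println(s.operations);  // [excise n1 n1, relabel n2 S]
  }
}
```

In real use, each operation string would presumably be handed to parseOperation and the results combined with collectOperations; the sketch stops at the raw text so that it stays independent of the library.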

Options:

  • -treeFile <filename> Specify the name of the file that has the trees you want to transform.
  • -po <matchPattern> <operation> Apply a single operation to every tree using the specified match pattern and the specified operation. Use this option when you want to quickly try the effect of one pattern/surgery combination, and are too lazy to write a transformation file.
  • -s Print each output tree on one line (default is pretty-printing).
  • -m For every tree that had a matching pattern, print "before" (prepended as "Operated on:") and "after" (prepended as "Result:"). Unoperated trees just pass through the transducer as usual.
  • -encoding X Use character set X for input and output of trees.

Legal operation syntax:

  • delete <name> deletes the node and everything below it.
  • prune <name> Like delete, but if, after the pruning, the parent has no children anymore, the parent is pruned too.
  • excise <name1> <name2> The name1 node should either dominate or be the same as the name2 node. This excises out everything from name1 to name2. All the children of name2 go into the parent of name1, where name1 was.
  • relabel <name> <new-label> relabels the node to have the new label.
  • insert <name> <position> inserts the named node into the position specified.
  • move <name> <position> moves the named node into the specified position.

    Right now the only ways to specify position are:

    $+ <name> the left sister of the named node
    $- <name> the right sister of the named node
    >i <name> the i_th daughter of the named node
    >-i <name> the i_th daughter, counting from the right, of the named node

  • replace <name1> <name2> deletes name1 and inserts a copy of name2 in its place.
  • adjoin <auxiliary_tree> <name> Adjoins the specified auxiliary tree into the named node. The daughters of the target node will become the daughters of the foot of the auxiliary tree.
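
To make the tree effects of delete, prune, and excise concrete, here is a toy model of those three operations on a minimal tree class. It is a sketch under stated assumptions, not Tsurgeon's implementation: ToyNode and its methods are hypothetical names, nodes are passed directly rather than bound by a tregex pattern, and the operated node is assumed not to be the root.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy tree node for illustration only; this is not the Stanford Tree class. */
final class ToyNode {
  final String label;
  ToyNode parent;
  final List<ToyNode> children = new ArrayList<>();

  ToyNode(String label, ToyNode... kids) {
    this.label = label;
    for (ToyNode k : kids) {
      k.parent = this;
      children.add(k);
    }
  }

  /** delete: removes this node and everything below it (assumes a non-root node). */
  void delete() {
    parent.children.remove(this);
    parent = null;
  }

  /** prune: like delete, but a parent left with no children is pruned too. */
  void prune() {
    ToyNode p = parent;
    delete();
    if (p.children.isEmpty() && p.parent != null) {
      p.prune();  // stops at the root in this toy model
    }
  }

  /** excise: the children of bottom take the place of top in top's parent. */
  static void excise(ToyNode top, ToyNode bottom) {
    ToyNode p = top.parent;
    int index = p.children.indexOf(top);
    p.children.remove(index);
    top.parent = null;
    List<ToyNode> moved = new ArrayList<>(bottom.children);
    bottom.children.clear();
    for (ToyNode c : moved) {
      c.parent = p;
    }
    p.children.addAll(index, moved);
  }

  /** Prints the tree in bracketed form; a childless node prints as its bare label. */
  @Override
  public String toString() {
    if (children.isEmpty()) {
      return label;
    }
    StringBuilder sb = new StringBuilder("(").append(label);
    for (ToyNode c : children) {
      sb.append(' ').append(c);
    }
    return sb.append(')').toString();
  }

  public static void main(String[] args) {
    ToyNode np = new ToyNode("NP",
        new ToyNode("DT", new ToyNode("the")),
        new ToyNode("NN", new ToyNode("dog")));
    ToyNode vbd = new ToyNode("VBD", new ToyNode("barked"));
    ToyNode s = new ToyNode("S", np, new ToyNode("VP", vbd));
    System.out.println(s);   // (S (NP (DT the) (NN dog)) (VP (VBD barked)))
    ToyNode.excise(np, np);  // same node twice: remove NP, keep its children
    System.out.println(s);   // (S (DT the) (NN dog) (VP (VBD barked)))
    vbd.prune();             // VP is left empty, so it is pruned as well
    System.out.println(s);   // (S (DT the) (NN dog))
  }
}
```

Calling excise with the same node for both arguments mirrors the "excise n1 n1" form used in the transformation file example: only that one node is removed and its children move up into its place.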

Parameters:
args - a list of names of files each of which contains a single tregex matching pattern plus a list, one per line, of transformation operations to apply to the matched pattern.
Throws:
Exception - If an I/O or pattern syntax error occurs

getOperationFromReader

public static Pair<TregexPattern,TsurgeonPattern> getOperationFromReader(BufferedReader reader)
                                                                  throws IOException
Parses a tsurgeon script text input and compiles all operations in the file into one tsurgeon pattern.

Parameters:
reader - Reader over the tsurgeon script to read the patterns from
Returns:
A pair of a tregex and tsurgeon pattern read from a file
Throws:
IOException - If any IO problem

getPatternFromFile

public static String getPatternFromFile(BufferedReader reader)
                                 throws IOException
Assumes that we are at the beginning of a tsurgeon script file and gets the string for the tregex pattern leading the file.

Parameters:
reader - a reader positioned at the beginning of a tsurgeon script
Returns:
tregex pattern string
Throws:
IOException

getTsurgeonOperationsFromReader

public static TsurgeonPattern getTsurgeonOperationsFromReader(BufferedReader reader)
                                                       throws IOException
Assumes the given reader has only tsurgeon operations (not a tregex pattern), and parses these out, collecting them into one operation.

Parameters:
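reader - a reader containing only tsurgeon operations
Returns:
a single TsurgeonPattern that collects the operations read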
Throws:
IOException

getTsurgeonTextFromReader

public static String getTsurgeonTextFromReader(BufferedReader reader)
                                        throws IOException
Assumes the given reader has only tsurgeon operations (not a tregex pattern), and returns them as a single string that mirrors the way the operations appear in the file. This is helpful for lazy evaluation of the operations, as in a GUI, because the operations are not parsed on load. Comments are still excised.

Parameters:
reader - a reader containing only tsurgeon operations
Returns:
the text of the operations, with comments removed
Throws:
IOException

getOperationFromFile

public static Pair<TregexPattern,TsurgeonPattern> getOperationFromFile(String filename)
                                                                throws IOException
Parses a tsurgeon script file and compiles all operations in the file into one tsurgeon pattern.

Parameters:
filename - file containing the tsurgeon script
Returns:
A pair of a tregex and tsurgeon pattern read from a file
Throws:
IOException - If there is any I/O problem

processPatternOnTrees

public static List<Tree> processPatternOnTrees(TregexPattern matchPattern,
                                               TsurgeonPattern p,
                                               Collection<Tree> inputTrees)
Applies processPattern to each tree in a collection of trees.

Parameters:
matchPattern - A TregexPattern to be matched against a Tree.
p - A TsurgeonPattern to apply.
inputTrees - The input trees to be processed
Returns:
A List of the transformed trees

processPattern

public static Tree processPattern(TregexPattern matchPattern,
                                  TsurgeonPattern p,
                                  Tree t)
Tries to match a pattern against a tree. If it succeeds, apply the surgical operations contained in a TsurgeonPattern.

Parameters:
matchPattern - A TregexPattern to be matched against a Tree.
p - A TsurgeonPattern to apply.
t - the Tree to match against and perform surgery on.
Returns:
t, which has been surgically modified.

processPatternsOnTree

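public static Tree processPatternsOnTree(List<Pair<TregexPattern,TsurgeonPattern>> ops,
                                         Tree t)
Applies each (TregexPattern, TsurgeonPattern) pair in ops to the tree t and returns the resulting tree. Use this method, looping through the trees yourself, when you have multiple surgery patterns to apply.

Parameters:
ops - the list of pairs of a TregexPattern to match and a TsurgeonPattern to apply
t - the Tree to match against and perform surgery on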

parseOperation

public static TsurgeonPattern parseOperation(String operationString)
Parses an operation string into a TsurgeonPattern. Throws an IllegalArgumentException if the operation string is ill-formed.

Example of use:

TsurgeonPattern p = Tsurgeon.parseOperation("prune ed");

Parameters:
operationString - The operation to perform, as a text string
Returns:
the operation pattern.

collectOperations

public static TsurgeonPattern collectOperations(List<TsurgeonPattern> patterns)
Collects a list of operation patterns into a sequence of operations to be applied. Required to keep track of global properties across a sequence of operations. For example, if you want to insert a named node and then coindex it with another node, you will need to collect the insertion and coindexation operations into a single TsurgeonPattern so that tsurgeon is aware of the name of the new node and coindexation becomes possible.

Parameters:
patterns - a list of TsurgeonPattern operations that you want to collect together into a single compound operation
Returns:
a new TsurgeonPattern that performs all the operations in the sequence of the patterns argument


Stanford NLP Group