edu.stanford.nlp.trees.tregex.tsurgeon
Class Tsurgeon

java.lang.Object
  extended by edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon

public class Tsurgeon
extends Object

Tsurgeon provides a way of editing trees based on a set of operations that are applied to tree locations matching a tregex pattern. A simple example from the command-line:

java edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon -treeFile atree exciseNP renameVerb
The file atree contains trees in Penn Treebank (S-expression) format. The other files (two, in this example) contain Tsurgeon operations. Each of these files consists of a sequence of pairs: a tregex expression on one or more lines, a blank line, then a list of Tsurgeon operations, one per line, and then another blank line.

Tsurgeon uses the Tregex engine to match tree patterns on trees; for more information on Tregex's tree-matching functionality, syntax, and semantics, please see the documentation for the TregexPattern class.

If you want to use Tsurgeon as an API, the relevant method is processPattern(edu.stanford.nlp.trees.tregex.TregexPattern, edu.stanford.nlp.trees.tregex.tsurgeon.TsurgeonPattern, edu.stanford.nlp.trees.Tree). You will also need to look at the TsurgeonPattern class and the parseOperation(java.lang.String) method.

Here's the simplest form of invocation on a single Tree:

 Tree t = Tree.valueOf("(ROOT (S (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP America)))) (VP (VBD called)) (. .)))");
 TregexPattern pat = TregexPattern.compile("NP <1 (NP << Bank) <2 PP=remove");
 TsurgeonPattern surgery = Tsurgeon.parseOperation("excise remove remove");
 Tsurgeon.processPattern(pat, surgery, t).pennPrint();
 

Here is another sample invocation:

 TregexPattern matchPattern = TregexPattern.compile("SQ=sq < (/^WH/ $++ VP)");
 List<TsurgeonPattern> ps = new ArrayList<TsurgeonPattern>();

 TsurgeonPattern p = Tsurgeon.parseOperation("relabel sq S");

 ps.add(p);

 Treebank lTrees = new MemoryTreebank();  // populate with the trees to process before use
 List<Tree> result = Tsurgeon.processPatternOnTrees(matchPattern, Tsurgeon.collectOperations(ps), lTrees);
 

Note: If you want to apply multiple surgery patterns, you will not want to call processPatternOnTrees for each individual pattern. Rather, you should either call processPatternsOnTree and loop through the trees yourself, or, as above, collect all the surgery patterns into one TsurgeonPattern and then call processPatternOnTrees. Either of these latter methods is much faster.

For more information on using Tsurgeon from the command line, see the main(java.lang.String[]) method and the package Javadoc.

Author:
Roger Levy

Method Summary
static TsurgeonPattern collectOperations(List<TsurgeonPattern> patterns)
          Collects a list of operation patterns into a sequence of operations to be applied.
static Pair<TregexPattern,TsurgeonPattern> getOperationFromReader(BufferedReader reader, TregexPatternCompiler compiler)
          Parses a tsurgeon script text input and compiles a tregex pattern and a list of tsurgeon operations into a pair.
static List<Pair<TregexPattern,TsurgeonPattern>> getOperationsFromFile(String filename, String encoding, TregexPatternCompiler compiler)
          Parses a tsurgeon script file and compiles all operations in the file into a list of pairs of tregex and tsurgeon patterns.
static String getPatternFromFile(BufferedReader reader)
          Assumes that we are at the beginning of a tsurgeon script file and gets the string for the tregex pattern leading the file.
static TsurgeonPattern getTsurgeonOperationsFromReader(BufferedReader reader)
          Assumes the given reader has only tsurgeon operations (not a tregex pattern), and parses these out, collecting them into one operation.
static String getTsurgeonTextFromReader(BufferedReader reader)
          Assumes the given reader has only tsurgeon operations (not a tregex pattern), and returns them as a String, mirroring the way the strings appear in the file.
static void main(String[] args)
          Usage: java edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon [-s] -treeFile file-with-trees [-po matching-pattern operation] operation-file-1 operation-file-2 ...
static TsurgeonPattern parseOperation(String operationString)
          Parses an operation string into a TsurgeonPattern.
static Tree processPattern(TregexPattern matchPattern, TsurgeonPattern p, Tree t)
          Tries to match a pattern against a tree.
static List<Tree> processPatternOnTrees(TregexPattern matchPattern, TsurgeonPattern p, Collection<Tree> inputTrees)
          Applies processPattern to a collection of trees.
static Tree processPatternsOnTree(List<Pair<TregexPattern,TsurgeonPattern>> ops, Tree t)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(String[] args)
                 throws Exception
Usage: java edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon [-s] -treeFile file-with-trees [-po matching-pattern operation] operation-file-1 operation-file-2 ... operation-file-n

Arguments:

Each argument should be the name of a transformation file containing a list of pairs, each consisting of a pattern and a list of transformation operations. Specifically, each pair is a TregexPattern pattern on one or more lines, then a blank line (empty or whitespace only), then a list of transformation operations, one per line (as specified by Legal operation syntax below), to apply when the pattern is matched, and then another blank line (empty or whitespace only). The blank lines are required: the code crashes if they are not present as separators, although the blank line at the end of the file can be omitted. The script file can include comments introduced by %, either as whole comment lines or as trailing comments; a comment extends to the end of the line. A literal percent sign can be escaped with a preceding backslash.
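
The comment rule above can be illustrated with a small standalone sketch. It is not Tsurgeon's own code: the class name TsurgeonCommentSketch, the method stripComment, and the two regular expressions are assumptions chosen for illustration, and it uses only the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tsurgeon: shows one way the '%' comment
// rule described above could be applied to a single script line.
class TsurgeonCommentSketch {

  // A '%' that is not preceded by a backslash starts a comment.
  private static final Pattern COMMENT = Pattern.compile("(?<!\\\\)%.*$");

  // An escaped percent sign: a backslash immediately followed by '%'.
  private static final Pattern ESCAPED_PERCENT = Pattern.compile("\\\\%");

  /** Removes a trailing or whole-line comment, then unescapes "\%" to "%". */
  static String stripComment(String line) {
    String withoutComment = COMMENT.matcher(line).replaceFirst("");
    return ESCAPED_PERCENT.matcher(withoutComment).replaceAll("%");
  }

  public static void main(String[] args) {
    // Trailing comment: everything from the unescaped '%' onward is dropped.
    System.out.println(stripComment("relabel n2 S   % rename the SQ node"));
    // Whole-line comment: nothing is left.
    System.out.println(stripComment("% a whole-line comment").isEmpty());
    // Escaped percent sign: kept as a literal '%'.
    System.out.println(stripComment("NP < /^50\\%$/"));
  }
}
```

The sketch leaves any whitespace before the comment in place; whether that whitespace matters depends on how the remaining text is parsed afterwards.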

For example, if you want to excise an SBARQ node whenever it is the parent of an SQ node, and relabel the SQ node to S, your transformation file would look like this:

SBARQ=n1 < SQ=n2

excise n1 n1
relabel n2 S
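
To make the role of the blank lines concrete, here is a minimal sketch that splits the lines of a transformation file into (pattern, operations) entries. It only illustrates the layout described above and is not the parser Tsurgeon uses. The names TsurgeonScriptSplitter, Entry, and split are hypothetical, and joining a multi-line pattern with single spaces is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the file layout; not Tsurgeon's parser.
class TsurgeonScriptSplitter {

  /** One entry of a script: a pattern string and its operation lines. */
  static final class Entry {
    final String pattern;
    final List<String> operations;

    Entry(String pattern, List<String> operations) {
      this.pattern = pattern;
      this.operations = operations;
    }
  }

  static boolean isBlank(String line) {
    return line.trim().isEmpty();
  }

  /** Splits script lines into entries, using blank lines as separators. */
  static List<Entry> split(List<String> lines) {
    List<Entry> entries = new ArrayList<>();
    int i = 0;
    while (i < lines.size()) {
      // Skip blank lines before the next entry.
      while (i < lines.size() && isBlank(lines.get(i))) i++;
      if (i >= lines.size()) break;

      // The pattern is the run of non-blank lines up to the first blank line.
      // Assumption: multi-line patterns are joined with a single space.
      StringBuilder pattern = new StringBuilder();
      while (i < lines.size() && !isBlank(lines.get(i))) {
        if (pattern.length() > 0) pattern.append(' ');
        pattern.append(lines.get(i).trim());
        i++;
      }

      // Skip the blank separator between the pattern and its operations.
      while (i < lines.size() && isBlank(lines.get(i))) i++;

      // Operations: one per line, up to the next blank line or end of input.
      List<String> operations = new ArrayList<>();
      while (i < lines.size() && !isBlank(lines.get(i))) {
        operations.add(lines.get(i).trim());
        i++;
      }
      entries.add(new Entry(pattern.toString(), operations));
    }
    return entries;
  }

  public static void main(String[] args) {
    List<String> script = Arrays.asList(
        "SBARQ=n1 < SQ=n2", "", "excise n1 n1", "relabel n2 S", "");
    for (Entry e : split(script)) {
      // prints: SBARQ=n1 < SQ=n2 -> [excise n1 n1, relabel n2 S]
      System.out.println(e.pattern + " -> " + e.operations);
    }
  }
}
```

Without the blank line after the pattern, this sketch would read the operation lines as a continuation of the pattern, which shows why the separators cannot be omitted.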

Options:

Legal operation syntax:

Parameters:
args - a list of names of files each of which contains a single tregex matching pattern plus a list, one per line, of transformation operations to apply to the matched pattern.
Throws:
Exception - If an I/O or pattern syntax error occurs

getOperationFromReader

public static Pair<TregexPattern,TsurgeonPattern> getOperationFromReader(BufferedReader reader,
                                                                         TregexPatternCompiler compiler)
                                                                  throws IOException
Parses a tsurgeon script text input and compiles a tregex pattern and a list of tsurgeon operations into a pair.

Parameters:
reader - Reader to read patterns from
Returns:
A pair of a tregex and tsurgeon pattern read from a file, or null when the operations in the Reader have been exhausted
Throws:
IOException - If there is any I/O problem

getPatternFromFile

public static String getPatternFromFile(BufferedReader reader)
                                 throws IOException
Assumes that we are at the beginning of a tsurgeon script file and gets the string for the tregex pattern leading the file.

Returns:
tregex pattern string
Throws:
IOException

getTsurgeonOperationsFromReader

public static TsurgeonPattern getTsurgeonOperationsFromReader(BufferedReader reader)
                                                       throws IOException
Assumes the given reader has only tsurgeon operations (not a tregex pattern), and parses these out, collecting them into one operation. Stops on a whitespace line.

Throws:
IOException

getTsurgeonTextFromReader

public static String getTsurgeonTextFromReader(BufferedReader reader)
                                        throws IOException
Assumes the given reader has only tsurgeon operations (not a tregex pattern), and returns them as a String, mirroring the way the strings appear in the file. This is helpful for lazy evaluation of the operations, as in a GUI, because you do not parse the operations on load. Comments are still excised.

Throws:
IOException

getOperationsFromFile

public static List<Pair<TregexPattern,TsurgeonPattern>> getOperationsFromFile(String filename,
                                                                              String encoding,
                                                                              TregexPatternCompiler compiler)
                                                                       throws IOException
Parses a tsurgeon script file and compiles all operations in the file into a list of pairs of tregex and tsurgeon patterns.

Parameters:
filename - file containing the tsurgeon script
Returns:
A list of pairs of tregex and tsurgeon patterns read from the file
Throws:
IOException - If there is any I/O problem

processPatternOnTrees

public static List<Tree> processPatternOnTrees(TregexPattern matchPattern,
                                               TsurgeonPattern p,
                                               Collection<Tree> inputTrees)
Applies processPattern to a collection of trees.

Parameters:
matchPattern - A TregexPattern to be matched against a Tree.
p - A TsurgeonPattern to apply.
inputTrees - The input trees to be processed
Returns:
A List of the transformed trees

processPattern

public static Tree processPattern(TregexPattern matchPattern,
                                  TsurgeonPattern p,
                                  Tree t)
Tries to match a pattern against a tree. If it succeeds, applies the surgical operations contained in a TsurgeonPattern.

Parameters:
matchPattern - A TregexPattern to be matched against a Tree.
p - A TsurgeonPattern to apply.
t - the Tree to match against and perform surgery on.
Returns:
t, which has been surgically modified.

processPatternsOnTree

public static Tree processPatternsOnTree(List<Pair<TregexPattern,TsurgeonPattern>> ops,
                                         Tree t)
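A minimal sketch of how this method might be called, assuming that each pair's operation is applied wherever its pattern matches. That behavior is inferred from the signature and from the note in the class description, not stated for this method. The class name ProcessPatternsOnTreeSketch and the method apply are hypothetical; the patterns, operations, and tree are reused from the class-level examples. It requires Stanford CoreNLP on the classpath.

```java
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexPattern;
import edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon;
import edu.stanford.nlp.trees.tregex.tsurgeon.TsurgeonPattern;
import edu.stanford.nlp.util.Pair;

import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class for illustration only.
class ProcessPatternsOnTreeSketch {

  /** Builds two (pattern, operation) pairs and applies them to one tree. */
  static Tree apply(Tree t) {
    List<Pair<TregexPattern, TsurgeonPattern>> ops = new ArrayList<>();

    // Pair 1: relabel an SQ node that dominates a WH-phrase followed by a VP.
    ops.add(new Pair<>(
        TregexPattern.compile("SQ=sq < (/^WH/ $++ VP)"),
        Tsurgeon.parseOperation("relabel sq S")));

    // Pair 2: excise the PP named "remove" from the matched NP.
    ops.add(new Pair<>(
        TregexPattern.compile("NP <1 (NP << Bank) <2 PP=remove"),
        Tsurgeon.parseOperation("excise remove remove")));

    return Tsurgeon.processPatternsOnTree(ops, t);
  }

  public static void main(String[] args) {
    Tree t = Tree.valueOf(
        "(ROOT (S (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP America)))) (VP (VBD called)) (. .)))");
    apply(t).pennPrint();
  }
}
```

To process many trees this way, call apply (or processPatternsOnTree directly) once per tree in your own loop, as the note in the class description suggests.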

parseOperation

public static TsurgeonPattern parseOperation(String operationString)
Parses an operation string into a TsurgeonPattern. Throws a TsurgeonParseException if the operation string is ill-formed.

Example of use:

TsurgeonPattern p = Tsurgeon.parseOperation("prune ed");

Parameters:
operationString - The operation to perform, as a text string
Returns:
the operation pattern.

collectOperations

public static TsurgeonPattern collectOperations(List<TsurgeonPattern> patterns)
Collects a list of operation patterns into a sequence of operations to be applied. Required to keep track of global properties across a sequence of operations. For example, if you want to insert a named node and then coindex it with another node, you will need to collect the insertion and coindexation operations into a single TsurgeonPattern so that tsurgeon is aware of the name of the new node and coindexation becomes possible.

Parameters:
patterns - a list of TsurgeonPattern operations that you want to collect together into a single compound operation
Returns:
a new TsurgeonPattern that performs all the operations in the sequence of the patterns argument


Stanford NLP Group