java.lang.Object
  edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon
public class Tsurgeon
Tsurgeon provides a way of editing trees based on a set of operations that are applied to tree locations matching a tregex pattern. A simple example from the command-line:
java edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon -treeFile atree exciseNP renameVerb
The file atree has Penn Treebank (S-expression) format trees.
The other (here, two) files have Tsurgeon operations. These consist of
a tregex expression on one
line, a blank line, and then some number of lines of Tsurgeon operations.
Note that (at present) you can only have one pattern and group of operations
per file. (This should probably be changed!)
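The file layout described above (one tregex pattern line, a blank line, then one operation per line) can be sketched with plain Java string handling. This is only an illustration of the layout: the class `TsurgeonScriptSketch` and its helpers `buildScript` and `splitScript` are hypothetical names, not part of the Tsurgeon API, and the sketch does not validate the pattern or the operations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TsurgeonScriptSketch {

    /** Builds script text: the pattern line, a blank line, then one operation per line. */
    static String buildScript(String tregexPattern, List<String> operations) {
        StringBuilder sb = new StringBuilder();
        sb.append(tregexPattern).append('\n');
        sb.append('\n'); // blank separator line between the pattern and the operations
        for (String op : operations) {
            sb.append(op).append('\n');
        }
        return sb.toString();
    }

    /** Splits script text into [pattern, op1, op2, ...]; rejects a missing blank line. */
    static List<String> splitScript(String script) {
        String[] lines = script.split("\n", -1);
        if (lines.length < 2 || !lines[1].trim().isEmpty()) {
            throw new IllegalArgumentException("expected a blank line after the pattern");
        }
        List<String> parts = new ArrayList<>();
        parts.add(lines[0].trim());
        for (int i = 2; i < lines.length; i++) {
            if (!lines[i].trim().isEmpty()) {
                parts.add(lines[i].trim());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String script = buildScript("SBARQ=n1 < SQ=n2",
                Arrays.asList("excise n1 n1", "relabel n2 S"));
        System.out.print(script);
        System.out.println(splitScript(script));
    }
}
```

Writing the returned text to a file gives something that could be passed as one of the operation-file arguments on the command line.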
Tsurgeon uses the tregex engine to match tree patterns on trees;
for more information on tregex's tree-matching functionality,
syntax, and semantics, please see the documentation for the
TregexPattern
class.
If you want to use Tsurgeon as an API, the relevant method is
processPattern(edu.stanford.nlp.trees.tregex.TregexPattern, edu.stanford.nlp.trees.tregex.tsurgeon.TsurgeonPattern, edu.stanford.nlp.trees.Tree).
You will also need to look at the TsurgeonPattern class and the parseOperation(java.lang.String) method.
Here is a sample invocation:
TregexPattern matchPattern = TregexPattern.compile("SQ=sq < (/^WH/ $++ VP)");
List<TsurgeonPattern> ps = new ArrayList<TsurgeonPattern>();
TsurgeonPattern p = Tsurgeon.parseOperation("relabel sq S");
ps.add(p);
Treebank lTrees; // must be set to the trees you want to process
List<Tree> result = Tsurgeon.processPatternOnTrees(matchPattern, Tsurgeon.collectOperations(ps), lTrees);
Note: If you want to apply multiple surgery patterns, do not call processPatternOnTrees for each pattern; instead, call processPatternsOnTree and loop through the trees yourself. This is much faster.
For more information on using Tsurgeon from the command line,
see the main(java.lang.String[])
method and the package Javadoc.
Method Summary | |
---|---|
static TsurgeonPattern |
collectOperations(List<TsurgeonPattern> patterns)
Collects a list of operation patterns into a sequence of operations to be applied. |
static Pair<TregexPattern,TsurgeonPattern> |
getOperationFromFile(String filename)
Parses a tsurgeon script file and compiles all operations in the file into one tsurgeon pattern. |
static Pair<TregexPattern,TsurgeonPattern> |
getOperationFromReader(BufferedReader reader)
Parses a tsurgeon script text input and compiles all operations in the file into one tsurgeon pattern. |
static String |
getPatternFromFile(BufferedReader reader)
Assumes that we are at the beginning of a tsurgeon script file and gets the string for the tregex pattern leading the file. |
static TsurgeonPattern |
getTsurgeonOperationsFromReader(BufferedReader reader)
Assumes the given reader has only tsurgeon operations (not a tregex pattern), and parses these out, collecting them into one operation. |
static String |
getTsurgeonTextFromReader(BufferedReader reader)
Assumes the given reader has only tsurgeon operations (not a tregex pattern), and returns them as a string buffer, mirroring the way the strings appear in the file - this is helpful for lazy evaluation of the operations, as in a GUI, because you do not parse the operations on load. |
static void |
main(String[] args)
Arguments: |
static TsurgeonPattern |
parseOperation(String operationString)
Parses an operation string into a TsurgeonPattern. |
static Tree |
processPattern(TregexPattern matchPattern,
TsurgeonPattern p,
Tree t)
Tries to match a pattern against a tree. |
static List<Tree> |
processPatternOnTrees(TregexPattern matchPattern,
TsurgeonPattern p,
Collection<Tree> inputTrees)
Applies processPattern(TregexPattern, TsurgeonPattern, Tree) to a collection of trees. |
static Tree |
processPatternsOnTree(List<Pair<TregexPattern,TsurgeonPattern>> ops,
Tree t)
|
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static void main(String[] args) throws Exception
Each transformation file should contain a TregexPattern pattern on the first line, then a blank line, then a list of transformation operations (as specified by Legal operation syntax below) to apply when the pattern is matched.
Note the bit about the blank line: currently the code crashes if it
isn't present!
For example, if you want to excise an SBARQ node whenever it is the parent of an SQ node, and relabel the SQ node as S, your transformation file would look like this:
SBARQ=n1 < SQ=n2

excise n1 n1
relabel n2 S
-treeFile <filename>
specify the name of the file that has the trees you want to transform.
-po <matchPattern> <operation>
Apply a single operation to every tree using the specified match pattern and the specified operation. Use this option
when you want to quickly try the effect of one pattern/surgery combination, and are too lazy to write a transformation file.
-s
Print each output tree on one line (default is pretty-printing).
-m
For every tree that had a matching pattern, print "before" (prepended as "Operated on:") and "after" (prepended as "Result:"). Unoperated trees just pass through the transducer as usual.
-encoding X
Uses character set X for input and output of trees.
Legal operation syntax:
delete <name>
deletes the node and everything below it.
prune <name>
Like delete, but if, after the pruning, the parent has no children anymore, the parent is pruned too.
excise <name1> <name2>
The name1 node should either dominate or be the same as the name2 node. This excises out everything from
name1 to name2. All the children of name2 go into the parent of name1, where name1 was.
relabel <name> <new-label>
relabels the node to have the new label.
insert <name> <position>
inserts the named node into the position specified.
move <name> <position>
moves the named node into the specified position
Right now the only ways to specify position are:
$+ <name>
the left sister of the named node
$- <name>
the right sister of the named node
>i <name>
the i_th daughter of the named node
>-i <name>
the i_th daughter, counting from the right, of the named node.
replace <name1> <name2>
deletes name1 and inserts a copy of name2 in its place.
adjoin <auxiliary_tree> <name>
Adjoins the specified auxiliary tree into the named node. The daughters of the target node will become the daughters of the foot of the auxiliary tree.
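To make the difference between delete, prune, and excise concrete, here is a toy model on a minimal tree class. It is a sketch under stated assumptions, not Tsurgeon's implementation: `Node` is a hypothetical stand-in for the Stanford `Tree` class, the three static methods only mirror the descriptions above, and none of them handles the root node being the target.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeSurgerySketch {

    /** Minimal labeled tree node; a stand-in for illustration, not the Stanford Tree class. */
    static class Node {
        final String label;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                k.parent = this;
                children.add(k);
            }
        }

        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(" + label);
            for (Node c : children) {
                sb.append(' ').append(c);
            }
            return sb.append(')').toString();
        }
    }

    /** delete: removes the node and everything below it (assumes n is not the root). */
    static void delete(Node n) {
        n.parent.children.remove(n);
        n.parent = null;
    }

    /** prune: like delete, but also removes each ancestor that is left with no children. */
    static void prune(Node n) {
        Node p = n.parent;
        delete(n);
        while (p != null && p.children.isEmpty() && p.parent != null) {
            Node next = p.parent;
            delete(p);
            p = next;
        }
    }

    /** excise: the children of bottom move into top's parent, at the slot top occupied. */
    static void excise(Node top, Node bottom) {
        Node p = top.parent;
        int at = p.children.indexOf(top);
        p.children.remove(at);
        top.parent = null;
        List<Node> moved = new ArrayList<>(bottom.children);
        bottom.children.clear();
        for (Node c : moved) {
            c.parent = p;
        }
        p.children.addAll(at, moved);
    }

    public static void main(String[] args) {
        Node vbd = new Node("VBD");
        Node s = new Node("S", new Node("NP", new Node("DT")), new Node("VP", vbd));
        System.out.println(s);   // (S (NP DT) (VP VBD))
        prune(vbd);              // VP is left empty, so it is pruned as well
        System.out.println(s);   // (S (NP DT))

        Node sbarq = new Node("SBARQ", new Node("SQ", new Node("VBZ"), new Node("NP")));
        Node root = new Node("ROOT", sbarq);
        excise(sbarq, sbarq);    // same node twice: drop SBARQ, keep its children
        System.out.println(root); // (ROOT (SQ VBZ NP))
    }
}
```

With plain delete instead of prune in the first example, the empty VP node would remain in the tree.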
args
- a list of names of files, each of which contains a single tregex matching pattern plus a list, one per line, of transformation operations to apply to the matched pattern.
Exception
- If there is an I/O or pattern syntax error
public static Pair<TregexPattern,TsurgeonPattern> getOperationFromReader(BufferedReader reader) throws IOException
reader
- File to read patterns from
IOException
- If there is any I/O problem
public static String getPatternFromFile(BufferedReader reader) throws IOException
reader
-
IOException
public static TsurgeonPattern getTsurgeonOperationsFromReader(BufferedReader reader) throws IOException
reader
-
IOException
public static String getTsurgeonTextFromReader(BufferedReader reader) throws IOException
reader
-
IOException
public static Pair<TregexPattern,TsurgeonPattern> getOperationFromFile(String filename) throws IOException
filename
- file containing the tsurgeon script
IOException
- If there is any I/O problem
public static List<Tree> processPatternOnTrees(TregexPattern matchPattern, TsurgeonPattern p, Collection<Tree> inputTrees)
matchPattern
- A TregexPattern to be matched against a Tree.
p
- A TsurgeonPattern to apply.
inputTrees
- The input trees to be processed
public static Tree processPattern(TregexPattern matchPattern, TsurgeonPattern p, Tree t)
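Tries to match a pattern against a tree. If it succeeds, applies the surgical operations contained in the TsurgeonPattern.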
matchPattern
- A TregexPattern to be matched against a Tree.
p
- A TsurgeonPattern to apply.
t
- the Tree to match against and perform surgery on.
public static Tree processPatternsOnTree(List<Pair<TregexPattern,TsurgeonPattern>> ops, Tree t)
public static TsurgeonPattern parseOperation(String operationString)
Parses an operation string into a TsurgeonPattern. Throws an IllegalArgumentException if the operation string is ill-formed.
Example of use:
TsurgeonPattern p = Tsurgeon.parseOperation("prune ed");
operationString
- The operation to perform, as a text string
public static TsurgeonPattern collectOperations(List<TsurgeonPattern> patterns)
patterns
- a list of TsurgeonPattern operations that you want to collect together into a single compound operation
Returns a TsurgeonPattern that performs all the operations in the sequence of the patterns argument.