java.lang.Object
    edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon
public class Tsurgeon
Tsurgeon provides a way of editing trees based on a set of operations that are applied to tree locations matching a tregex pattern. A simple example from the command line:

    java edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon -treeFile atree exciseNP renameVerb

The file atree has Penn Treebank (S-expression) format trees. The other (here, two) files have Tsurgeon operations. These consist of a list of pairs, each made up of a tregex expression on one or more lines, a blank line, some number of lines of Tsurgeon operations, and then another blank line.
Tsurgeon uses the Tregex engine to match tree patterns on trees; for more information on Tregex's tree-matching functionality, syntax, and semantics, please see the documentation for the TregexPattern class.
If you want to use Tsurgeon as an API, the relevant method is processPattern(edu.stanford.nlp.trees.tregex.TregexPattern, edu.stanford.nlp.trees.tregex.tsurgeon.TsurgeonPattern, edu.stanford.nlp.trees.Tree). You will also need to look at the TsurgeonPattern class and the parseOperation(java.lang.String) method.
Here's the simplest form of invocation on a single Tree:
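
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.tregex.TregexPattern;
    import edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon;
    import edu.stanford.nlp.trees.tregex.tsurgeon.TsurgeonPattern;

    Tree t = Tree.valueOf("(ROOT (S (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP America)))) (VP (VBD called)) (. .)))");
    TregexPattern pat = TregexPattern.compile("NP <1 (NP << Bank) <2 PP=remove");
    // Removes the PP node but keeps its children in its place.
    TsurgeonPattern surgery = Tsurgeon.parseOperation("excise remove remove");
    Tsurgeon.processPattern(pat, surgery, t).pennPrint();
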
Here is another sample invocation:

    TregexPattern matchPattern = TregexPattern.compile("SQ=sq < (/^WH/ $++ VP)");
    List<TsurgeonPattern> ps = new ArrayList<TsurgeonPattern>();
    TsurgeonPattern p = Tsurgeon.parseOperation("relabel sq S");
    ps.add(p);
    Treebank lTrees = new MemoryTreebank();
    lTrees.loadPath("trees.mrg"); // hypothetical path: load your own trees here
    List<Tree> result = Tsurgeon.processPatternOnTrees(matchPattern, Tsurgeon.collectOperations(ps), lTrees);

Note: If you want to apply multiple surgery patterns, you will not want to call processPatternOnTrees for each individual pattern. Rather, you should either call processPatternsOnTree and loop through the trees yourself, or, as above, collect all the surgery patterns into one TsurgeonPattern and then call processPatternOnTrees. Either of these latter methods is much faster.
For more information on using Tsurgeon from the command line, see the main(java.lang.String[]) method and the package Javadoc.
Method Summary
Modifier and Type | Method and Description |
---|---|
static TsurgeonPattern | collectOperations(List<TsurgeonPattern> patterns): Collects a list of operation patterns into a sequence of operations to be applied. |
static Pair<TregexPattern,TsurgeonPattern> | getOperationFromReader(BufferedReader reader): Parses a tsurgeon script text input and compiles a tregex pattern and a list of tsurgeon operations into a pair. |
static List<Pair<TregexPattern,TsurgeonPattern>> | getOperationsFromFile(String filename, String encoding): Parses a tsurgeon script file and compiles all operations in the file into a list of pairs of tregex and tsurgeon patterns. |
static String | getPatternFromFile(BufferedReader reader): Assumes that we are at the beginning of a tsurgeon script file and gets the string for the tregex pattern leading the file. |
static TsurgeonPattern | getTsurgeonOperationsFromReader(BufferedReader reader): Assumes the given reader has only tsurgeon operations (not a tregex pattern), and parses these out, collecting them into one operation. |
static String | getTsurgeonTextFromReader(BufferedReader reader): Assumes the given reader has only tsurgeon operations (not a tregex pattern), and returns them as a String, mirroring the way the strings appear in the file. |
static void | main(String[] args): Usage: java edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon [-s] -treeFile file-with-trees [-po matching-pattern operation] operation-file-1 operation-file-2 ... |
static TsurgeonPattern | parseOperation(String operationString): Parses an operation string into a TsurgeonPattern. |
static Tree | processPattern(TregexPattern matchPattern, TsurgeonPattern p, Tree t): Tries to match a pattern against a tree. |
static List<Tree> | processPatternOnTrees(TregexPattern matchPattern, TsurgeonPattern p, Collection<Tree> inputTrees): Applies processPattern to a collection of trees. |
static Tree | processPatternsOnTree(List<Pair<TregexPattern,TsurgeonPattern>> ops, Tree t) | |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static void main(String[] args) throws Exception
Each operation file is a sequence of pairs, each consisting of a TregexPattern pattern on one or more lines, then a blank line (empty or whitespace), then a list of transformation operations, one per line (as specified by Legal operation syntax below), to apply when the pattern is matched, and then another blank line (empty or whitespace).
Note the need for blank lines: the code crashes if they are not present as separators (although the blank line at the end of the file can be omitted).
The script file can include comment lines, either whole comment lines or trailing comments introduced by %, which extend to the end of the line. A literal percent sign can be escaped with a preceding backslash.
For example, if you want to excise an SBARQ node whenever it is the parent of an SQ node, and relabel the SQ node to S, your transformation file would look like this:

    SBARQ=n1 < SQ=n2

    excise n1 n1
    relabel n2 S

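The script layout described above can be sketched with the Java standard library alone. This is a hypothetical illustration, not CoreNLP's own reader (which is reached through getOperationsFromFile and getOperationFromReader): the names ScriptSketch and PatternOps are invented here, and joining a multi-line pattern with single spaces is an assumption. In this sketch a raw blank line acts as a separator, a line that is empty only after comment stripping is skipped, a trailing % comment is removed, and \% becomes a literal percent sign.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical holder for one pattern/operations pair (not CoreNLP's Pair). */
final class PatternOps {
  final String pattern;
  final List<String> operations;

  PatternOps(String pattern, List<String> operations) {
    this.pattern = pattern;
    this.operations = operations;
  }
}

/** Minimal sketch of the script layout; not CoreNLP's implementation. */
final class ScriptSketch {
  // A '%' that is not preceded by a backslash starts a comment running to end of line.
  private static final Pattern COMMENT = Pattern.compile("(?<!\\\\)%.*$");

  /** Drops a trailing comment, turns each escaped "\%" into "%", and trims whitespace. */
  static String stripComment(String line) {
    return COMMENT.matcher(line).replaceFirst("").replace("\\%", "%").trim();
  }

  /** Groups lines into (pattern, operations) pairs separated by blank lines. */
  static List<PatternOps> parse(List<String> lines) {
    List<PatternOps> result = new ArrayList<>();
    StringBuilder pattern = new StringBuilder();
    List<String> ops = new ArrayList<>();
    boolean readingOps = false;
    for (String raw : lines) {
      if (raw.trim().isEmpty()) {            // separator line (empty or whitespace)
        if (readingOps && !ops.isEmpty()) {  // second blank line: the pair is complete
          result.add(new PatternOps(pattern.toString(), ops));
          pattern = new StringBuilder();
          ops = new ArrayList<>();
          readingOps = false;
        } else if (pattern.length() > 0) {   // first blank line: operations follow
          readingOps = true;
        }
        continue;
      }
      String line = stripComment(raw);
      if (line.isEmpty()) {
        continue;                            // whole-line comment
      }
      if (readingOps) {
        ops.add(line);
      } else {
        if (pattern.length() > 0) {
          pattern.append(' ');               // assumption: join pattern lines with a space
        }
        pattern.append(line);
      }
    }
    if (readingOps && !ops.isEmpty()) {      // the final blank line may be omitted
      result.add(new PatternOps(pattern.toString(), ops));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> script = Arrays.asList(
        "SBARQ=n1 < SQ=n2  % an SBARQ directly above an SQ",
        "",
        "excise n1 n1",
        "relabel n2 S");
    for (PatternOps p : parse(script)) {
      // prints: SBARQ=n1 < SQ=n2 -> [excise n1 n1, relabel n2 S]
      System.out.println(p.pattern + " -> " + p.operations);
    }
  }
}
```

Treating only a raw blank line as a separator keeps a comment-only line from accidentally ending a pattern or an operation list.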
Legal command-line options:
-treeFile <filename>
Specifies the name of the file that has the trees you want to transform.
-po <matchPattern> <operation>
Applies a single operation to every tree using the specified match pattern and the specified operation. Use this option when you want to quickly try the effect of one pattern/surgery combination, and are too lazy to write a transformation file.
-s
Prints each output tree on one line (default is pretty-printing).
-m
For every tree that had a matching pattern, prints "before" (prepended as "Operated on:") and "after" (prepended as "Result:"). Unoperated trees just pass through the transducer as usual.
-encoding X
Uses character set X for input and output of trees.
Legal operation syntax:
delete <name>
Deletes the node and everything below it.
prune <name>
Like delete, but if, after the pruning, the parent has no children anymore, the parent is pruned too.
excise <name1> <name2>
The name1 node should either dominate or be the same as the name2 node. This excises out everything from name1 to name2. All the children of name2 go into the parent of name1, where name1 was.
relabel <name> <new-label>
Relabels the node to have the new label. There are three possible forms:
relabel nodeX VP - for changing a node label to an alphanumeric string,
relabel nodeX /''/ - for relabeling a node to something that isn't a valid identifier without quoting, and
relabel nodeX /^VB(.*)$/verb\\/$1/ - for regular expression based relabeling. In the last case, all matches of the regular expression against the node label are replaced with the replacement String. This has the semantics of Java/Perl's replaceAll: you may use capturing groups and put them in replacements with $n. Also, as in the example, you can escape a slash in the middle of the second and third forms with \\/ and \\\\.
insert <name> <position>
or insert <tree> <position>
Inserts the named node or tree into the position specified.
move <name> <position>
Moves the named node into the specified position.
Right now the only ways to specify position are:
$+ <name> - the left sister of the named node
$- <name> - the right sister of the named node
>i - the i-th daughter of the named node
>-i - the i-th daughter, counting from the right, of the named node
replace <name1> <name2>
or replace <name1> <tree>
Deletes name1 and inserts tree or a copy of name2 in its place.
adjoin <auxiliary_tree> <name>
Adjoins the specified auxiliary tree into the named node. The daughters of the target node will become the daughters of the foot of the auxiliary tree.
adjoinH <auxiliary_tree> <name>
Similar to adjoin, but preserves the target node and makes it the root of <auxiliary_tree>. (It is still accessible as name. The root of the auxiliary tree is ignored.)
adjoinF <auxiliary_tree> <name>
Similar to adjoin, but preserves the target node and makes it the foot of <auxiliary_tree>. (It is still accessible as name, and retains its status as parent of its children. The root of the auxiliary tree is ignored.)
coindex <name1> <name2> ... <nameM>
Puts a (Penn Treebank style) coindexation suffix of the form "-N" on each of nodes name1 through nameM. The value of N will be automatically generated in reference to the existing coindexations in the tree, so that there is never an accidental clash of indices across things that are not meant to be coindexed.
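The tree-editing semantics of a few of these operations can be modeled on a toy tree. This is a hypothetical sketch, not CoreNLP's Tree or TsurgeonPattern: the Node and SurgerySketch classes and their methods are invented for illustration, nodes are passed directly instead of by name, and the coindex strategy (one more than the largest -N suffix already in the tree) is an assumption about how fresh indices could be chosen, not a description of the library's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical mutable tree node; not edu.stanford.nlp.trees.Tree. */
final class Node {
  String label;
  Node parent;
  final List<Node> kids = new ArrayList<>();

  Node(String label, Node... children) {
    this.label = label;
    for (Node c : children) {
      c.parent = this;
      kids.add(c);
    }
  }

  /** Penn-style bracketing; a node without children prints as its bare label. */
  @Override
  public String toString() {
    if (kids.isEmpty()) return label;
    StringBuilder sb = new StringBuilder("(").append(label);
    for (Node k : kids) sb.append(' ').append(k);
    return sb.append(')').toString();
  }
}

/** Toy versions of relabel, delete, prune, excise and coindex. */
final class SurgerySketch {
  private static final Pattern INDEX = Pattern.compile("-(\\d+)$");

  /** relabel with /regex/replacement/: replaceAll semantics, so $1 inserts capturing group 1. */
  static void relabel(Node n, String regex, String replacement) {
    n.label = n.label.replaceAll(regex, replacement);
  }

  /** delete: detach the node together with everything below it. */
  static void delete(Node n) {
    if (n.parent != null) {
      n.parent.kids.remove(n);
      n.parent = null;
    }
  }

  /** prune: delete, then keep deleting ancestors that are left with no children. */
  static void prune(Node n) {
    Node parent = n.parent;
    delete(n);
    if (parent != null && parent.kids.isEmpty()) prune(parent);
  }

  /** excise top bottom: top dominates or equals bottom; bottom's children take top's place. */
  static void excise(Node top, Node bottom) {
    Node parent = top.parent;
    if (parent == null) throw new IllegalArgumentException("this sketch cannot excise the root");
    int at = parent.kids.indexOf(top);
    parent.kids.remove(at);
    top.parent = null;
    for (Node k : new ArrayList<>(bottom.kids)) {
      k.parent = parent;
      parent.kids.add(at++, k);
    }
  }

  /** coindex (assumed strategy): suffix -N on each target, N above any -N already in the tree. */
  static void coindex(Node root, Node... targets) {
    int n = maxIndex(root) + 1;
    for (Node t : targets) t.label = t.label + "-" + n;
  }

  private static int maxIndex(Node node) {
    Matcher m = INDEX.matcher(node.label);
    int max = m.find() ? Integer.parseInt(m.group(1)) : 0;
    for (Node k : node.kids) max = Math.max(max, maxIndex(k));
    return max;
  }

  public static void main(String[] args) {
    // The PP from the "Bank of America" example, excised in place (excise remove remove).
    Node pp = new Node("PP", new Node("IN", new Node("of")),
        new Node("NP", new Node("NNP", new Node("America"))));
    Node np = new Node("NP", new Node("NP", new Node("NNP", new Node("Bank"))), pp);
    excise(pp, pp);
    System.out.println(np); // prints: (NP (NP (NNP Bank)) (IN of) (NP (NNP America)))
  }
}
```

Passing the same node twice to excise removes just that node and splices its children into its place, which is why "excise remove remove" in the earlier example flattens the PP.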
Parameters:
args - a list of names of files, each of which contains one or more tregex matching patterns, each followed by a list, one per line, of transformation operations to apply to the matched pattern.
Throws:
Exception - If an I/O or pattern syntax error occurs.
public static Pair<TregexPattern,TsurgeonPattern> getOperationFromReader(BufferedReader reader) throws IOException
Parameters:
reader - Reader to read patterns from
Returns:
a pair of a tregex and a tsurgeon pattern, or null when the operations in the Reader have been exhausted
Throws:
IOException - If any IO problem
public static String getPatternFromFile(BufferedReader reader) throws IOException
Throws:
IOException
public static TsurgeonPattern getTsurgeonOperationsFromReader(BufferedReader reader) throws IOException
Throws:
IOException
public static String getTsurgeonTextFromReader(BufferedReader reader) throws IOException
Throws:
IOException
public static List<Pair<TregexPattern,TsurgeonPattern>> getOperationsFromFile(String filename, String encoding) throws IOException
Parameters:
filename - file containing the tsurgeon script
Throws:
IOException - If there is any I/O problem
public static List<Tree> processPatternOnTrees(TregexPattern matchPattern, TsurgeonPattern p, Collection<Tree> inputTrees)
Parameters:
matchPattern - A TregexPattern to be matched against a Tree.
p - A TsurgeonPattern to apply.
inputTrees - The input trees to be processed.
public static Tree processPattern(TregexPattern matchPattern, TsurgeonPattern p, Tree t)
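Tries to match a pattern against a tree. If it succeeds, applies the surgical operations contained in a TsurgeonPattern.
Parameters:
matchPattern - A TregexPattern to be matched against a Tree.
p - A TsurgeonPattern to apply.
t - the Tree to match against and perform surgery on.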
public static Tree processPatternsOnTree(List<Pair<TregexPattern,TsurgeonPattern>> ops, Tree t)
public static TsurgeonPattern parseOperation(String operationString)
Parses an operation string into a TsurgeonPattern. Throws an IllegalArgumentException if the operation string is ill-formed.
Example of use:

    TsurgeonPattern p = Tsurgeon.parseOperation("prune ed");

Parameters:
operationString - The operation to perform, as a text string
public static TsurgeonPattern collectOperations(List<TsurgeonPattern> patterns)
Collects a list of operation patterns into a sequence of operations to be applied.
Parameters:
patterns - a list of TsurgeonPattern operations that you want to collect together into a single compound operation
Returns:
a TsurgeonPattern that performs all the operations in the sequence of the patterns argument
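The idea behind collectOperations, one compound operation that runs its member operations in list order, is the classic composite pattern. The sketch below is a hypothetical stand-in, not TsurgeonPattern: the CompoundSurgery class and its processAll helper are invented here, and a plain UnaryOperator<T> plays the role of a single tsurgeon operation so that the example runs without CoreNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical compound operation: runs each member operation in list order. */
final class CompoundSurgery<T> implements UnaryOperator<T> {
  private final List<UnaryOperator<T>> steps;

  CompoundSurgery(List<? extends UnaryOperator<T>> steps) {
    this.steps = new ArrayList<>(steps); // copy, so later changes to the caller's list have no effect
  }

  @Override
  public T apply(T tree) {
    T current = tree;
    for (UnaryOperator<T> step : steps) {
      current = step.apply(current); // each step sees the previous step's result
    }
    return current;
  }

  /** Applies one (possibly compound) operation to every input in a single loop. */
  static <E> List<E> processAll(UnaryOperator<E> op, List<E> trees) {
    List<E> out = new ArrayList<>(trees.size());
    for (E t : trees) {
      out.add(op.apply(t));
    }
    return out;
  }

  public static void main(String[] args) {
    // Toy "trees" as bracketed strings; the two steps only compose correctly in this order.
    List<UnaryOperator<String>> steps = Arrays.asList(
        s -> s.replace("(SQ ", "(S "),
        s -> s.replace("(S ", "(CLAUSE "));
    UnaryOperator<String> all = new CompoundSurgery<>(steps);
    // prints: [(CLAUSE (VBD left)), (NP (NN dog))]
    System.out.println(processAll(all, Arrays.asList("(SQ (VBD left))", "(NP (NN dog))")));
  }
}
```

Because the steps run in sequence on the same tree, their order matters: swapping the two steps above would leave "(SQ (VBD left))" as "(S (VBD left))".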