public class BobChrisTreeNormalizer extends TreeNormalizer implements TreeTransformer
The normalizations of the original (Prolog) BobChrisNormalize were:
1. Remap the root node to be called 'ROOT'
2. Truncate all nonterminal labels before characters introducing annotations according to TreebankLanguagePack (traditionally -, =, | or #, the last for BLLIP)
3. Remap the representation of certain leaf symbols (brackets etc.)
4. Map to lowercase all leaf nodes
5. Delete empty/trace nodes (ones marked '-NONE-')
6. Recursively delete any nodes that do not dominate any words
7. Delete A over A nodes where the top A dominates nothing else
8. Remove backslashes from lexical items (the Treebank inserts them to escape slashes (/) and stars (*))
Step 4 is deliberately omitted, and a few things are purely aesthetic.
14 June 2002: It now deletes unary A over A if both nodes' labels are equal (7), and (6) was always part of the Tree.prune() functionality...
30 June 2005: Also splice out an EDITED node, just in case you're parsing the Brown corpus.
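Steps 5 to 7 above can be illustrated with a minimal, self-contained sketch. It does not use the CoreNLP classes: `PruneSketch`, `Node`, `KEEP_NON_EMPTY`, `prune` and `spliceAOverA` are hypothetical names introduced only for this example, and keeping the lower node when collapsing A over A is an assumption of the sketch, not a statement about the library's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative sketch only; Node is a hypothetical stand-in, not the CoreNLP Tree. */
class PruneSketch {

  static final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
      this.label = label;
      for (Node k : kids) children.add(k);
    }

    boolean isLeaf() { return children.isEmpty(); }

    @Override public String toString() {
      if (isLeaf()) return label;
      StringBuilder sb = new StringBuilder("(").append(label);
      for (Node c : children) sb.append(' ').append(c);
      return sb.append(')').toString();
    }
  }

  /** Accepts every node except those labeled -NONE- (step 5). */
  static final Predicate<Node> KEEP_NON_EMPTY = n -> !"-NONE-".equals(n.label);

  /**
   * Returns a pruned copy, or null when the node is rejected by the filter
   * or is a nonterminal left with no children, i.e. dominates no words (step 6).
   */
  static Node prune(Node n, Predicate<Node> keep) {
    if (!keep.test(n)) return null;
    if (n.isLeaf()) return new Node(n.label);
    Node copy = new Node(n.label);
    for (Node c : n.children) {
      Node pruned = prune(c, keep);
      if (pruned != null) copy.children.add(pruned);
    }
    return copy.children.isEmpty() ? null : copy;
  }

  /** Collapses a unary A-over-A configuration, keeping the lower A (step 7; an assumption). */
  static Node spliceAOverA(Node n) {
    if (n.isLeaf()) return n;
    Node copy = new Node(n.label);
    for (Node c : n.children) copy.children.add(spliceAOverA(c));
    if (copy.children.size() == 1) {
      Node only = copy.children.get(0);
      if (!only.isLeaf() && only.label.equals(copy.label)) return only;
    }
    return copy;
  }

  public static void main(String[] args) {
    Node tree = new Node("ROOT",
        new Node("S",
            new Node("NP", new Node("-NONE-", new Node("*"))),
            new Node("VP", new Node("VP", new Node("VB", new Node("Go"))))));
    // The empty NP disappears and the doubled VP is collapsed.
    System.out.println(spliceAOverA(prune(tree, KEEP_NON_EMPTY)));
    // prints (ROOT (S (VP (VB Go))))
  }
}
```

Filtering first and collapsing afterwards matters in this sketch: a node may only become a unary A over A once its empty siblings have been removed.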
Modifier and Type | Class and Description |
---|---|
static class | BobChrisTreeNormalizer.AOverAFilter |
static class | BobChrisTreeNormalizer.EmptyFilter |
Modifier and Type | Field and Description |
---|---|
protected java.util.function.Predicate<Tree> | aOverAFilter |
protected java.util.function.Predicate<Tree> | emptyFilter |
protected TreebankLanguagePack | tlp |
Constructor and Description |
---|
BobChrisTreeNormalizer() |
BobChrisTreeNormalizer(TreebankLanguagePack tlp) |
Modifier and Type | Method and Description |
---|---|
protected java.lang.String | cleanUpLabel(java.lang.String label): Remove things like hyphenated functional tags and equals signs from the end of a node label. |
java.lang.String | normalizeNonterminal(java.lang.String category): Normalizes a nonterminal's contents. |
java.lang.String | normalizeTerminal(java.lang.String leaf): Normalizes a leaf's contents. |
Tree | normalizeWholeTree(Tree tree, TreeFactory tf): Normalize a whole tree; one can assume that this is the root. |
Tree | transformTree(Tree tree): Does whatever one needs to do to a particular tree. |
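The string-level methods summarized above can be pictured with a small sketch of label truncation (step 2 of the class description) and backslash removal (step 8). This is not the library's implementation: `LabelSketch` and `ANNOTATION_CHARS` are hypothetical, the character set is taken from the "traditional" list in the class description rather than from a real TreebankLanguagePack, and both the `ROOT` fallback for a null or empty label and the handling of labels that begin with an annotation character are assumptions of the sketch.

```java
/** Illustrative sketch only; not the BobChrisTreeNormalizer implementation. */
class LabelSketch {

  /** Assumed annotation-introducing characters (the traditional -, =, | and #). */
  static final String ANNOTATION_CHARS = "-=|#";

  /** Truncates a nonterminal label before the first annotation character. */
  static String cleanUpLabel(String label) {
    // Assumption of this sketch: a missing label is treated as the root.
    if (label == null || label.isEmpty()) return "ROOT";
    // Labels such as -NONE- or -LRB- begin with an annotation character;
    // this sketch simply leaves them untouched.
    if (ANNOTATION_CHARS.indexOf(label.charAt(0)) >= 0) return label;
    for (int i = 1; i < label.length(); i++) {
      if (ANNOTATION_CHARS.indexOf(label.charAt(i)) >= 0) {
        return label.substring(0, i);
      }
    }
    return label;
  }

  /** Removes the backslashes that escape slashes and stars in lexical items. */
  static String normalizeTerminal(String leaf) {
    return leaf.replace("\\/", "/").replace("\\*", "*");
  }

  public static void main(String[] args) {
    System.out.println(cleanUpLabel("NP-SBJ=2"));      // prints NP
    System.out.println(cleanUpLabel("-NONE-"));        // prints -NONE-
    System.out.println(normalizeTerminal("1\\/2"));    // prints 1/2
  }
}
```

Note that the sketch leaves the case of leaf strings alone, in line with step 4 being deliberately omitted.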
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface TreeTransformer: apply
protected final TreebankLanguagePack tlp
protected java.util.function.Predicate<Tree> emptyFilter
protected java.util.function.Predicate<Tree> aOverAFilter
public BobChrisTreeNormalizer()
public BobChrisTreeNormalizer(TreebankLanguagePack tlp)
public java.lang.String normalizeTerminal(java.lang.String leaf)
Overrides: normalizeTerminal in class TreeNormalizer
Parameters:
leaf - The String that decorates the leaf
public java.lang.String normalizeNonterminal(java.lang.String category)
Overrides: normalizeNonterminal in class TreeNormalizer
Parameters:
category - The String that decorates this nonterminal node
protected java.lang.String cleanUpLabel(java.lang.String label)
[...] null.
Parameters:
label - The label from the treebank
public Tree normalizeWholeTree(Tree tree, TreeFactory tf)
Overrides: normalizeWholeTree in class TreeNormalizer
Parameters:
tree - The tree to be normalized
tf - The TreeFactory to create new nodes (if needed)
public Tree transformTree(Tree tree)
Description copied from interface: TreeTransformer
Does whatever one needs to do to a particular tree. This method is passed a Tree, and could itself work recursively, but the canonical usage is to invoke this method via the Tree.transform() method, which will apply the transformer in a bottom-up manner to each local Tree, and hence the implementation of TreeTransformer should merely examine and change a local (one-level) Tree.
Specified by: transformTree in interface TreeTransformer
Parameters:
tree - A tree. Classes implementing this interface can assume that the tree passed in is not null.
Returns: a Tree
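The bottom-up contract described for transformTree can be sketched without the library as follows. `TransformSketch`, `Node`, `transform` and `stripFunctionTag` are hypothetical names used only here; the real Tree.transform() may differ in detail. The point of the sketch is that the driver recurses over the tree, while the transformer itself only inspects and changes one local node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Illustrative sketch only; Node is a hypothetical stand-in, not the CoreNLP Tree. */
class TransformSketch {

  static final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
      this.label = label;
      for (Node k : kids) children.add(k);
    }

    @Override public String toString() {
      if (children.isEmpty()) return label;
      StringBuilder sb = new StringBuilder("(").append(label);
      for (Node c : children) sb.append(' ').append(c);
      return sb.append(')').toString();
    }
  }

  /** Driver: transforms the children first, then hands the rebuilt node to the local transformer. */
  static Node transform(Node n, Function<Node, Node> local) {
    Node copy = new Node(n.label);
    for (Node c : n.children) copy.children.add(transform(c, local));
    return local.apply(copy);
  }

  /** Local (one-level) transformer: drops a hyphenated function tag from a nonterminal label. */
  static Node stripFunctionTag(Node n) {
    if (n.children.isEmpty()) return n;        // leaves are left alone
    if (n.label.startsWith("-")) return n;     // e.g. -NONE-, -LRB-
    int dash = n.label.indexOf('-');
    if (dash < 0) return n;
    Node renamed = new Node(n.label.substring(0, dash));
    renamed.children.addAll(n.children);       // children were already transformed
    return renamed;
  }

  public static void main(String[] args) {
    Node tree = new Node("S",
        new Node("NP-SBJ", new Node("PRP", new Node("I"))),
        new Node("VP", new Node("VBP", new Node("agree"))));
    System.out.println(transform(tree, TransformSketch::stripFunctionTag));
    // prints (S (NP (PRP I)) (VP (VBP agree)))
  }
}
```

Because the recursion lives in the driver, the local transformer never needs to walk the tree itself, which is the division of labor the description above recommends.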