java.lang.Object
  |
  +--edu.stanford.nlp.trees.TreeNormalizer
        |
        +--edu.stanford.nlp.trees.BobChrisTreeNormalizer
Normalizes trees in the way used in Manning and Carpenter 1997.
NB: This implementation is still incomplete!
The normalizations performed are: (i) terminals are interned, (ii)
nonterminals are stripped of alternants, functional tags and
cross-reference codes, and then interned, (iii) empty
elements (ones with nonterminal label "-NONE-") are deleted from the
tree, (iv) the null label at the root node is replaced with the label
"ROOT".
17 Apr 2001: This was fixed to work with different kinds of labels,
by making proper use of the Label interface, after it was moved into
the trees module.
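Normalizations (ii) and (iv) can be illustrated with a minimal standalone sketch. It is not the class's actual implementation: the name LabelCleanupSketch, the hard-coded character set (taken from the traditional -, =, | and # convention), and the rule that keeps labels such as -NONE- whole are all assumptions made for illustration. The real class reads the annotation-introducing characters from its TreebankLanguagePack.

```java
final class LabelCleanupSketch {
    // Assumed annotation-introducing characters: -, =, | and # (the last for BLLIP).
    private static final String ANNOTATION_CHARS = "-=|#";

    /** Truncates a nonterminal label before its first annotation character and interns it. */
    static String cleanUp(String label) {
        // A null or empty label is taken to be the root label (normalization iv).
        if (label == null || label.isEmpty()) {
            return "ROOT";
        }
        // Assumption: labels wrapped in hyphens, such as -NONE- or -LRB-, are kept whole.
        if (label.length() > 2 && label.startsWith("-") && label.endsWith("-")) {
            return label.intern();
        }
        // Start at index 1 so that a leading annotation character is never cut to an empty label.
        for (int i = 1; i < label.length(); i++) {
            if (ANNOTATION_CHARS.indexOf(label.charAt(i)) >= 0) {
                return label.substring(0, i).intern();
            }
        }
        return label.intern();
    }

    public static void main(String[] args) {
        System.out.println(cleanUp("NP-SBJ-1")); // NP
        System.out.println(cleanUp("ADVP|PRT")); // ADVP
        System.out.println(cleanUp("-NONE-"));   // -NONE-
        System.out.println(cleanUp(null));       // ROOT
    }
}
```

Interning the result means that equal labels share one String instance, which saves memory when many trees repeat the same small set of categories.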
The normalizations of the original (Prolog) BobChrisNormalize were:
1. Remap the root node to be called 'ROOT'
2. Truncate all nonterminal labels before characters introducing annotations according to TreebankLanguagePack (traditionally, -, =, | or #; the last is for BLLIP)
3. Remap the representation of certain leaf symbols (brackets etc.)
4. Map to lowercase all leaf nodes
5. Delete empty/trace nodes (ones marked '-NONE-')
6. Recursively delete any nodes that do not dominate any words
7. Delete A over A nodes where the top A dominates nothing else
8. Remove backslashes from lexical items (the Treebank inserts them to escape slashes (/) and stars (*))
Item 4 is deliberately omitted, and a few things are purely aesthetic.
14 June 2002: It now deletes unary A over A if both nodes' labels are equal (7), and (6) was always part of the Tree.prune() functionality...
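Steps 5, 6 and 7 can be sketched on a toy tree type. This is a hedged illustration only: PruneSketch and its nested Node class are hypothetical stand-ins, not edu.stanford.nlp.trees.Tree or Tree.prune(), and the sketch assumes that any node without children in the input is a word.

```java
import java.util.ArrayList;
import java.util.List;

final class PruneSketch {
    /** Hypothetical minimal tree node; a node with no children is treated as a word. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        @Override
        public String toString() {
            if (isLeaf()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) {
                sb.append(' ').append(c);
            }
            return sb.append(')').toString();
        }
    }

    /** Returns a normalized copy of n, or null if the subtree dominates no words. */
    static Node normalize(Node n) {
        if (n.isLeaf()) {
            return new Node(n.label);           // a word: always kept
        }
        if ("-NONE-".equals(n.label)) {
            return null;                        // step 5: delete empty/trace nodes
        }
        Node copy = new Node(n.label);
        for (Node c : n.children) {
            Node kept = normalize(c);
            if (kept != null) {
                copy.children.add(kept);
            }
        }
        if (copy.children.isEmpty()) {
            return null;                        // step 6: nothing left below this node
        }
        if (copy.children.size() == 1) {
            Node only = copy.children.get(0);
            if (!only.isLeaf() && only.label.equals(copy.label)) {
                return only;                    // step 7: collapse unary A over A
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Node tree = new Node("S",
                new Node("NP", new Node("-NONE-", new Node("*"))),
                new Node("VP", new Node("VP", new Node("VBD", new Node("left")))));
        System.out.println(normalize(tree));    // (S (VP (VBD left)))
    }
}
```

Returning null for a subtree that dominates no words lets the deletion in step 6 propagate upward in a single recursive pass, so no separate pruning traversal is needed.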
Field Summary
protected TreebankLanguagePack tlp
Constructor Summary
BobChrisTreeNormalizer()
BobChrisTreeNormalizer(TreebankLanguagePack tlp)
Method Summary
protected String cleanUpLabel(String label)
    Removes suffixes such as hyphenated functional tags and equals-sign annotations from the end of a node label.
String normalizeNonterminal(String category)
    Normalizes a nonterminal's contents.
String normalizeTerminal(String leaf)
    Normalizes a leaf's contents.
Tree normalizeWholeTree(Tree tree, TreeFactory tf)
    Normalizes a whole tree; one can assume that this is the root.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
protected final TreebankLanguagePack tlp
Constructor Detail
public BobChrisTreeNormalizer()
public BobChrisTreeNormalizer(TreebankLanguagePack tlp)
Method Detail
public String normalizeTerminal(String leaf)
    Overrides: normalizeTerminal in class TreeNormalizer
    Parameters: leaf - The String that decorates the leaf
public String normalizeNonterminal(String category)
    Overrides: normalizeNonterminal in class TreeNormalizer
    Parameters: category - The String that decorates this nonterminal node
public Tree normalizeWholeTree(Tree tree, TreeFactory tf)
    Overrides: normalizeWholeTree in class TreeNormalizer
    Parameters: tree - The tree to be normalized
                tf - The TreeFactory to create new nodes (if needed)
protected String cleanUpLabel(String label)