edu.stanford.nlp.trees
Class BobChrisTreeNormalizer
java.lang.Object
edu.stanford.nlp.trees.TreeNormalizer
edu.stanford.nlp.trees.BobChrisTreeNormalizer
- All Implemented Interfaces:
- Serializable
- Direct Known Subclasses:
- ArabicTreeNormalizer, CTBErrorCorrectingTreeNormalizer, FrenchTreeNormalizer, NPTmpRetainingTreeNormalizer
public class BobChrisTreeNormalizer
- extends TreeNormalizer
Normalizes trees in the way used in Manning and Carpenter 1997.
NB: This implementation is still incomplete!
The normalizations performed are: (i) terminals are interned, (ii)
nonterminals are stripped of alternants, functional tags and
cross-reference codes, and then interned, (iii) empty
elements (ones with nonterminal label "-NONE-") are deleted from the
tree, (iv) the null label at the root node is replaced with the label
"ROOT".
17 Apr 2001: This was fixed to work with different kinds of labels,
by making proper use of the Label interface, after it was moved into
the trees module.
The normalizations of the original (Prolog) BobChrisNormalize were:
1. Remap the root node to be called 'ROOT'
2. Truncate all nonterminal labels before characters introducing
annotations according to TreebankLanguagePack
(traditionally, -, =, | or # (last for BLLIP))
3. Remap the representation of certain leaf symbols (brackets etc.)
4. Map to lowercase all leaf nodes
5. Delete empty/trace nodes (ones marked '-NONE-')
6. Recursively delete any nodes that do not dominate any words
7. Delete A over A nodes where the top A dominates nothing else
8. Remove backslashes from lexical items
(the Treebank inserts them to escape slashes (/) and stars (*)).
Step 4 is deliberately omitted, and a few of the other steps are purely aesthetic.
14 June 2002: It now deletes unary A over A if both nodes' labels are equal
(7), and (6) was always part of the Tree.prune() functionality...
30 June 2005: Also splice out an EDITED node, just in case you're parsing
the Brown corpus.
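Step 8 of the original Prolog list can be illustrated with a minimal sketch. It assumes that only the two escapes named above (backslash before a slash or a star) need undoing. The class and method names are hypothetical and are not part of this API, and the summary above does not list this step among the normalizations this class performs.

```java
/** Hypothetical illustration of step 8; not part of edu.stanford.nlp.trees. */
class UnescapeSketch {

    /** Removes the backslash the Treebank puts in front of '/' and '*'. */
    static String unescape(String word) {
        // String.replace works on literal text, so no regex escaping is needed.
        return word.replace("\\/", "/").replace("\\*", "*");
    }

    public static void main(String[] args) {
        System.out.println(unescape("1\\/2")); // prints 1/2
    }
}
```

Other backslashes are left alone in this sketch; removing every backslash would be an equally plausible reading of the step.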
- Author:
- Christopher Manning
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
tlp
protected final TreebankLanguagePack tlp
emptyFilter
protected Filter<Tree> emptyFilter
aOverAFilter
protected Filter<Tree> aOverAFilter
BobChrisTreeNormalizer
public BobChrisTreeNormalizer()
BobChrisTreeNormalizer
public BobChrisTreeNormalizer(TreebankLanguagePack tlp)
normalizeTerminal
public String normalizeTerminal(String leaf)
- Normalizes a leaf's contents.
This implementation interns the leaf.
- Overrides:
normalizeTerminal
in class TreeNormalizer
- Parameters:
leaf
- The String that decorates the leaf
- Returns:
- The normalized form of this leaf String
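The effect of interning can be shown with plain JDK strings. The sketch below is only an illustration of `String.intern()`; `InternSketch` and its method are hypothetical stand-ins, not the actual implementation of this class.

```java
/** Hypothetical stand-in showing what interning a leaf String achieves. */
class InternSketch {

    /** Returns the canonical (interned) copy of the leaf string. */
    static String normalizeTerminal(String leaf) {
        return leaf.intern();
    }

    public static void main(String[] args) {
        // Two distinct String objects with the same characters.
        String a = new String("dog");
        String b = new String("dog");
        System.out.println(a == b);                                       // prints false
        System.out.println(normalizeTerminal(a) == normalizeTerminal(b)); // prints true
    }
}
```

After interning, equal leaf strings share one object, so repeated words in a large treebank do not each hold their own copy.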
normalizeNonterminal
public String normalizeNonterminal(String category)
- Normalizes a nonterminal's contents.
This implementation strips functional tags, etc. and interns the
nonterminal.
- Overrides:
normalizeNonterminal
in class TreeNormalizer
- Parameters:
category
- The String that decorates this nonterminal node
- Returns:
- The normalized form of this nonterminal String
cleanUpLabel
protected String cleanUpLabel(String label)
- Removes things like hyphenated functional tags and equals signs from the
end of a node label. This version always just returns the phrase
structure category, or "ROOT" if the label was null.
- Parameters:
label
- The label from the treebank
- Returns:
- The cleaned up label (phrase structure category)
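The behaviour described here can be approximated in a few lines. This is a sketch under stated assumptions, not the library's code: it hard-codes the traditional annotation characters (-, =, | and #) instead of asking a TreebankLanguagePack, and it leaves labels that begin with one of those characters (such as -NONE-) unchanged. `LabelCleanupSketch` is a hypothetical name.

```java
/** Hypothetical approximation of label cleanup; not the library implementation. */
class LabelCleanupSketch {

    // Assumption: the traditional annotation-introducing characters.
    private static final String ANNOTATION_CHARS = "-=|#";

    static String cleanUpLabel(String label) {
        if (label == null) {
            return "ROOT"; // a null root label becomes "ROOT"
        }
        // Assumption: labels such as -NONE- that start with an annotation
        // character are returned untouched.
        if (label.isEmpty() || ANNOTATION_CHARS.indexOf(label.charAt(0)) >= 0) {
            return label;
        }
        // Truncate just before the first annotation-introducing character.
        for (int i = 1; i < label.length(); i++) {
            if (ANNOTATION_CHARS.indexOf(label.charAt(i)) >= 0) {
                return label.substring(0, i);
            }
        }
        return label;
    }

    public static void main(String[] args) {
        System.out.println(cleanUpLabel("NP-SBJ-1")); // prints NP
        System.out.println(cleanUpLabel(null));       // prints ROOT
    }
}
```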
normalizeWholeTree
public Tree normalizeWholeTree(Tree tree,
TreeFactory tf)
- Normalize a whole tree -- one can assume that this is the
root. This implementation deletes empty elements (ones with
nonterminal tag label '-NONE-') from the tree, and splices out
unary A over A nodes. It does work for a null tree.
- Overrides:
normalizeWholeTree
in class TreeNormalizer
- Parameters:
tree
- The tree to be normalized
tf
- The TreeFactory used to create new nodes (if needed)
- Returns:
- The normalized tree
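The tree-level steps (deleting '-NONE-' elements, dropping nodes that no longer dominate any word, and splicing out unary A over A) can be sketched on a toy tree type. Everything below is an assumption made for illustration: `Node` is a hypothetical minimal class, not the library's Tree, labels are taken to be already cleaned, and no TreeFactory is involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of whole-tree normalization on a toy tree type. */
class WholeTreeSketch {

    /** Minimal tree node: a label plus children; a node without children is a word. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        @Override
        public String toString() {
            if (isLeaf()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) {
                sb.append(' ').append(c);
            }
            return sb.append(')').toString();
        }
    }

    /** Returns the normalized subtree, or null if nothing is left of it. */
    static Node normalize(Node n) {
        if (n == null) {
            return null;              // a null tree is accepted
        }
        if (n.isLeaf()) {
            return n;                 // words are kept as they are
        }
        if ("-NONE-".equals(n.label)) {
            return null;              // delete empty elements
        }
        Node out = new Node(n.label);
        for (Node c : n.children) {
            Node kept = normalize(c);
            if (kept != null) {
                out.children.add(kept);
            }
        }
        if (out.children.isEmpty()) {
            return null;              // node no longer dominates any word
        }
        if (out.children.size() == 1) {
            Node only = out.children.get(0);
            if (!only.isLeaf() && only.label.equals(out.label)) {
                return only;          // splice out unary A over A
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node tree = new Node("S",
                new Node("NP", new Node("-NONE-", new Node("*T*"))),
                new Node("VP", new Node("VP", new Node("VBD", new Node("slept")))));
        System.out.println(normalize(tree)); // prints (S (VP (VBD slept)))
    }
}
```

The sketch builds a fresh tree rather than editing in place, which keeps the input unchanged.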
Stanford NLP Group