edu.stanford.nlp.trees
Class BobChrisTreeNormalizer

java.lang.Object
  extended by edu.stanford.nlp.trees.TreeNormalizer
      extended by edu.stanford.nlp.trees.BobChrisTreeNormalizer
All Implemented Interfaces:
TreeTransformer, Serializable
Direct Known Subclasses:
ArabicTreeNormalizer, CTBErrorCorrectingTreeNormalizer, FrenchTreeNormalizer, NPTmpRetainingTreeNormalizer

public class BobChrisTreeNormalizer
extends TreeNormalizer
implements TreeTransformer

Normalizes trees in the way used in Manning and Carpenter 1997. NB: This implementation is still incomplete! The normalizations performed are: (i) terminals are interned, (ii) nonterminals are stripped of alternants, functional tags and cross-reference codes, and then interned, (iii) empty elements (ones with nonterminal label "-NONE-") are deleted from the tree, (iv) the null label at the root node is replaced with the label "ROOT".
17 Apr 2001: This was fixed to work with different kinds of labels, by making proper use of the Label interface, after it was moved into the trees module.

The normalizations of the original (Prolog) BobChrisNormalize were:
1. Remap the root node to be called 'ROOT'.
2. Truncate all nonterminal labels before characters introducing annotations according to TreebankLanguagePack (traditionally, -, =, | or #; the last is for BLLIP).
3. Remap the representation of certain leaf symbols (brackets etc.).
4. Map all leaf nodes to lowercase.
5. Delete empty/trace nodes (ones marked '-NONE-').
6. Recursively delete any nodes that do not dominate any words.
7. Delete A over A nodes where the top A dominates nothing else.
8. Remove backslashes from lexical items (the Treebank inserts them to escape slashes (/) and stars (*)).
Step 4 is deliberately omitted, and a few things are purely aesthetic.
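Step 8 of the original normalizations can be illustrated with a short sketch. It is not taken from this class, whose normalizeTerminal is documented only as interning the leaf; the class name UnescapeSketch and the method unescape are hypothetical names chosen for the example, and it assumes the escape is a single backslash placed directly before '/' or '*'.

```java
/** Hypothetical helper illustrating step 8; not part of the Stanford API. */
final class UnescapeSketch {
    private UnescapeSketch() {}

    /**
     * Drops a backslash only when it directly precedes '/' or '*'.
     * All other characters, including other backslashes, are copied unchanged.
     */
    static String unescape(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            boolean isEscape = c == '\\'
                    && i + 1 < word.length()
                    && (word.charAt(i + 1) == '/' || word.charAt(i + 1) == '*');
            if (!isEscape) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Under these assumptions, a Treebank token written as 1\/2 would come out as 1/2, while a backslash before any other character is left in place.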

14 June 2002: It now deletes unary A over A if both nodes' labels are equal (7), and (6) was always part of the Tree.prune() functionality...
30 June 2005: Also splice out an EDITED node, just in case you're parsing the Brown corpus.

Author:
Christopher Manning
See Also:
Serialized Form

Nested Class Summary
static class BobChrisTreeNormalizer.AOverAFilter
           
static class BobChrisTreeNormalizer.EmptyFilter
           
 
Field Summary
protected  Filter<Tree> aOverAFilter
           
protected  Filter<Tree> emptyFilter
           
protected  TreebankLanguagePack tlp
           
 
Constructor Summary
BobChrisTreeNormalizer()
           
BobChrisTreeNormalizer(TreebankLanguagePack tlp)
           
 
Method Summary
protected  String cleanUpLabel(String label)
          Remove things like hyphenated functional tags and equals signs from the end of a node label.
 String normalizeNonterminal(String category)
          Normalizes the contents of a nonterminal.
 String normalizeTerminal(String leaf)
          Normalizes the contents of a leaf.
 Tree normalizeWholeTree(Tree tree, TreeFactory tf)
          Normalize a whole tree -- one can assume that this is the root.
 Tree transformTree(Tree tree)
          Does whatever one needs to do to a particular tree.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tlp

protected final TreebankLanguagePack tlp

emptyFilter

protected Filter<Tree> emptyFilter

aOverAFilter

protected Filter<Tree> aOverAFilter
Constructor Detail

BobChrisTreeNormalizer

public BobChrisTreeNormalizer()

BobChrisTreeNormalizer

public BobChrisTreeNormalizer(TreebankLanguagePack tlp)
Method Detail

normalizeTerminal

public String normalizeTerminal(String leaf)
Normalizes the contents of a leaf. This implementation interns the leaf.

Overrides:
normalizeTerminal in class TreeNormalizer
Parameters:
leaf - The String that decorates the leaf
Returns:
The normalized form of this leaf String
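
As a rough illustration of what interning the leaf buys, the sketch below uses only java.lang.String.intern(). The class InternSketch is a hypothetical stand-in, not this class's source: equal leaf strings read from different places in a treebank end up sharing one String instance.

```java
/** Hypothetical stand-in showing the effect of interning a leaf String. */
final class InternSketch {
    private InternSketch() {}

    /** Returns the canonical (interned) instance for the given leaf text. */
    static String normalizeTerminal(String leaf) {
        return leaf.intern();
    }

    public static void main(String[] args) {
        // Two distinct String objects with the same contents.
        String first = new String("dog");
        String second = new String("dog");
        // After interning, both calls return the same shared instance.
        boolean shared = normalizeTerminal(first) == normalizeTerminal(second);
        System.out.println("shared instance after interning: " + shared);
    }
}
```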

normalizeNonterminal

public String normalizeNonterminal(String category)
Normalizes the contents of a nonterminal. This implementation strips functional tags, etc. and interns the nonterminal.

Overrides:
normalizeNonterminal in class TreeNormalizer
Parameters:
category - The String that decorates this nonterminal node
Returns:
The normalized form of this nonterminal String

cleanUpLabel

protected String cleanUpLabel(String label)
Remove things like hyphenated functional tags and equals signs from the end of a node label. This version always just returns the phrase structure category, or "ROOT" if the label was null.

Parameters:
label - The label from the treebank
Returns:
The cleaned up label (phrase structure category)
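
The behaviour described for cleanUpLabel can be approximated by the following sketch. It is an assumption-laden illustration, not the library's code: the class LabelCleanupSketch is hypothetical, the annotation-introducing characters are hard-coded to the traditional set named in the class description (-, =, | and #) rather than obtained from a TreebankLanguagePack, an empty label is treated like a null one, and a label that starts with one of those characters (such as "-NONE-") is assumed to be bracketed by it and left whole.

```java
/** Hypothetical sketch of label clean-up; the real class delegates to a TreebankLanguagePack. */
final class LabelCleanupSketch {
    // Assumption: the traditional annotation-introducing characters from the class description.
    private static final String ANNOTATION_CHARS = "-=|#";

    private LabelCleanupSketch() {}

    static String cleanUpLabel(String label) {
        // Null maps to "ROOT" as documented; treating "" the same way is an assumption.
        if (label == null || label.isEmpty()) {
            return "ROOT";
        }
        boolean openedAtZero = false;
        char opener = 0;
        for (int i = 0; i < label.length(); i++) {
            char ch = label.charAt(i);
            if (ANNOTATION_CHARS.indexOf(ch) < 0) {
                continue; // ordinary category character
            }
            if (i == 0) {
                // A leading annotation character opens a bracketed label like "-NONE-".
                openedAtZero = true;
                opener = ch;
            } else if (openedAtZero && ch == opener) {
                // The matching closing character belongs to the label itself.
                openedAtZero = false;
            } else {
                // First real annotation: keep only the phrase structure category.
                return label.substring(0, i);
            }
        }
        return label;
    }
}
```

With these assumptions, "NP-SBJ" and "NP-SBJ=2" both reduce to "NP", "-NONE-" is returned unchanged, and a null label becomes "ROOT".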

normalizeWholeTree

public Tree normalizeWholeTree(Tree tree,
                               TreeFactory tf)
Normalize a whole tree -- one can assume that this is the root. This implementation deletes empty elements (nodes whose nonterminal label is '-NONE-') from the tree, and splices out unary A over A nodes. It also works when the tree is null.

Overrides:
normalizeWholeTree in class TreeNormalizer
Parameters:
tree - The tree to be normalized
tf - the TreeFactory to create new nodes (if needed)
Returns:
The normalized tree
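
The pruning and splicing described above can be sketched on a minimal tree type. Everything in the block is hypothetical (WholeTreeSketch, its nested Node class, and normalize are not Stanford classes or methods), labels are assumed to be cleaned already, and the later EDITED-node splicing is not covered. It only shows the shape of the operation: drop '-NONE-' subtrees, drop nonterminals left with no words, and collapse a unary A over A.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of whole-tree normalization on a minimal tree type. */
final class WholeTreeSketch {
    private WholeTreeSketch() {}

    /** Minimal tree node: a node with no children is a leaf (a word). */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node kid : kids) {
                children.add(kid);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        @Override
        public String toString() {
            if (isLeaf()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) {
                sb.append(' ').append(child);
            }
            return sb.append(')').toString();
        }
    }

    /** Returns the normalized subtree, or null if the input is null or nothing survives. */
    static Node normalize(Node node) {
        if (node == null) {
            return null; // a null tree is tolerated
        }
        if (node.isLeaf()) {
            return node; // words are kept as they are
        }
        if ("-NONE-".equals(node.label)) {
            return null; // empty element: drop it together with its trace leaf
        }
        List<Node> kept = new ArrayList<>();
        for (Node child : node.children) {
            Node normalized = normalize(child);
            if (normalized != null) {
                kept.add(normalized);
            }
        }
        if (kept.isEmpty()) {
            return null; // this nonterminal no longer dominates any words
        }
        Node only = kept.get(0);
        if (kept.size() == 1 && !only.isLeaf() && only.label.equals(node.label)) {
            return only; // unary A over A: splice out the upper node
        }
        return new Node(node.label, kept.toArray(new Node[0]));
    }
}
```

For a tree built as (S (NP (-NONE- *)) (VP (VP (VBD ran)))), this sketch removes the NP, which dominates only an empty element, and collapses the doubled VP.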

transformTree

public Tree transformTree(Tree tree)
Description copied from interface: TreeTransformer
Does whatever one needs to do to a particular tree. This routine is passed a whole Tree, and could itself work recursively, but the canonical usage is to invoke this method via the Tree.transform() method, which will apply the transformer in a bottom-up manner to each local Tree, and hence the implementation of TreeTransformer should merely examine and change a local (one-level) Tree.

Specified by:
transformTree in interface TreeTransformer
Parameters:
tree - A tree. Classes implementing this interface can assume that the tree passed in is not null.
Returns:
the transformed Tree
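
The bottom-up application described for Tree.transform() can be pictured with the following sketch. It uses a hypothetical minimal Node type and a java.util.function.UnaryOperator in place of the real Tree and TreeTransformer types, and it assumes the local transformer is applied to every node, leaves included, after that node's children have been transformed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of applying a local transformer bottom-up. */
final class BottomUpSketch {
    private BottomUpSketch() {}

    /** Minimal immutable-by-convention tree node used only for this example. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, List<Node> children) {
            this.label = label;
            this.children = children;
        }

        Node(String label, Node... kids) {
            this(label, new ArrayList<>(Arrays.asList(kids)));
        }
    }

    /**
     * Transforms the children first, rebuilds the node around the results,
     * then hands that one-level tree to the local transformer.
     */
    static Node transform(Node node, UnaryOperator<Node> local) {
        List<Node> newKids = new ArrayList<>();
        for (Node kid : node.children) {
            newKids.add(transform(kid, local));
        }
        return local.apply(new Node(node.label, newKids));
    }
}
```

Because each call sees a node whose children are already transformed, the local transformer only needs to inspect and change one level of the tree, which is the contract the description above asks implementations to follow.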


Stanford NLP Group