edu.stanford.nlp.trees
Class AbstractCollinsHeadFinder

java.lang.Object
  extended by edu.stanford.nlp.trees.AbstractCollinsHeadFinder
All Implemented Interfaces:
HeadFinder, java.io.Serializable
Direct Known Subclasses:
ArabicHeadFinder, BikelChineseHeadFinder, ChineseHeadFinder, CollinsHeadFinder, DybroFrenchHeadFinder, FrenchHeadFinder, NegraHeadFinder, SunJurafskyChineseHeadFinder, TigerHeadFinder, TueBaDZHeadFinder

public abstract class AbstractCollinsHeadFinder
extends java.lang.Object
implements HeadFinder

A base class for Head Finders similar to the one described in Michael Collins' 1999 thesis. For a given constituent we perform

for categoryList in categoryLists
  for index = 1 to n [or n to 1 if R->L]
    for category in categoryList
      if category equals daughter[index] choose it.

with a final default that goes with the direction (L->R or R->L). For most constituents, there will be only one category in the list, the exception being, in Collins' original version, NP.

It is up to each overriding subclass to initialize the map from constituent type to categoryLists, "nonTerminalInfo", in its constructor. Entries are presumed to be of type String[][]. Each String[] is a list of categories, except for its first element, which specifies the direction of traversal and must be one of "left", "leftdis", "leftexcept", "right", "rightdis" or "rightexcept".

"left" means search left-to-right by category and then by position
"leftdis" means search left-to-right by position and then by category
"right" means search right-to-left by category and then by position
"rightdis" means search right-to-left by position and then by category
"leftexcept" means to take the first thing from the left that isn't in the list
"rightexcept" means to take the first thing from the right that isn't in the list

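The rule layout and the six traversal keywords described above can be sketched in a few lines of stand-alone Java. This is only an illustration under stated assumptions: the class HeadRuleSketch, its methods, and the rules in buildRules() are hypothetical and are not taken from any real head finder. Daughters are represented as plain category strings rather than Tree objects, and the final default is assumed to follow the direction of the last category list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch over plain category strings; not the library's code.
final class HeadRuleSketch {

    // Shaped like nonTerminalInfo: constituent type -> category lists.
    // The rules below are invented for illustration.
    static Map<String, String[][]> buildRules() {
        Map<String, String[][]> rules = new HashMap<>();
        rules.put("VP", new String[][] {{"left", "VBD", "VBZ", "VB", "VP"}});
        rules.put("NP", new String[][] {
            {"rightdis", "NN", "NNS", "NNP"},
            {"left", "NP"},
            {"rightexcept", "DT", "CC"}
        });
        return rules;
    }

    // Applies one category list: how[0] is the direction keyword, how[1..] the categories.
    // Returns the index of the chosen daughter, or -1 if the list selects none.
    static int locate(String[] daughters, String[] how) {
        String mode = how[0];
        List<String> cats = Arrays.asList(how).subList(1, how.length);
        boolean fromLeft = mode.startsWith("left");
        int n = daughters.length;
        switch (mode) {
            case "left":
            case "right":
                // By category first, then by position.
                for (String cat : cats) {
                    for (int k = 0; k < n; k++) {
                        int i = fromLeft ? k : n - 1 - k;
                        if (cat.equals(daughters[i])) return i;
                    }
                }
                return -1;
            case "leftdis":
            case "rightdis":
                // By position first: any listed category matches.
                for (int k = 0; k < n; k++) {
                    int i = fromLeft ? k : n - 1 - k;
                    if (cats.contains(daughters[i])) return i;
                }
                return -1;
            case "leftexcept":
            case "rightexcept":
                // First daughter whose category is NOT in the list.
                for (int k = 0; k < n; k++) {
                    int i = fromLeft ? k : n - 1 - k;
                    if (!cats.contains(daughters[i])) return i;
                }
                return -1;
            default:
                throw new IllegalArgumentException("Unknown direction: " + mode);
        }
    }

    // Tries each category list in order. If none selects a daughter, defaults to the
    // leftmost or rightmost daughter, following the last list's direction (an assumption).
    static int headIndex(String mother, String[] daughters, Map<String, String[][]> rules) {
        String[][] lists = rules.get(mother);
        if (lists == null || lists.length == 0 || daughters.length == 0) return -1;
        for (String[] how : lists) {
            int i = locate(daughters, how);
            if (i >= 0) return i;
        }
        String lastMode = lists[lists.length - 1][0];
        return lastMode.startsWith("left") ? 0 : daughters.length - 1;
    }

    public static void main(String[] args) {
        String[] daughters = {"DT", "JJ", "NN", "CC", "NNS"};
        System.out.println(locate(daughters, new String[] {"left", "NNS", "NN"}));    // prints 4
        System.out.println(locate(daughters, new String[] {"leftdis", "NNS", "NN"})); // prints 2
        System.out.println(headIndex("NP", daughters, buildRules()));                 // prints 4
    }
}
```

With daughters DT JJ NN CC NNS, the "left" rule picks NNS (index 4) because NNS is listed before NN, while "leftdis" picks NN (index 2) because it is the first daughter, by position, that matches any listed category.
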
2002/10/28 -- Category label identity checking now uses the equals() method instead of ==, so not interning category labels shouldn't break things anymore. (Roger Levy)
2003/02/10 -- Changed to use TreebankLanguagePack and to cut on characters that set off annotations, so this should work even if functional tags are still on nodes.
2004/03/30 -- Made abstract base class and subclasses for CollinsHeadFinder, ModCollinsHeadFinder, SemanticHeadFinder, ChineseHeadFinder (and trees.icegb.ICEGBHeadFinder, trees.international.negra.NegraHeadFinder, and movetrees.EnglishPennMaxProjectionHeadFinder)
2011/01/13 -- Add support for categoriesToAvoid (which can be set to ensure that punctuation is not the head if there are other options)

Author:
Christopher Manning, Galen Andrew
See Also:
Serialized Form

Field Summary
protected  java.lang.String[] defaultRule
          Default direction if no rule is found for category.
protected  java.util.Map<java.lang.String,java.lang.String[][]> nonTerminalInfo
           
protected  TreebankLanguagePack tlp
           
 
Constructor Summary
protected AbstractCollinsHeadFinder(TreebankLanguagePack tlp)
           
 
Method Summary
 Tree determineHead(Tree t)
          Determine which daughter of the current parse tree is the head.
 Tree determineHead(Tree t, Tree parent)
          Determine which daughter of the current parse tree is the head.
protected  Tree determineNonTrivialHead(Tree t, Tree parent)
          Called by determineHead and may be overridden in subclasses if special treatment is necessary for particular categories.
protected  Tree findMarkedHead(Tree t)
          A way for subclasses for corpora with explicit head markings to return the explicitly marked head.
protected  int postOperationFix(int headIdx, Tree[] daughterTrees)
          A way for subclasses to fix any heads under special conditions. The default does nothing.
protected  void setCategoriesToAvoid(java.lang.String[] categoriesToAvoid)
          Set categories which, if it comes to last resort processing (i.e. none of the rules matched), will be avoided as heads.
protected  Tree traverseLocate(Tree[] daughterTrees, java.lang.String[] how, boolean lastResort)
          Attempt to locate head daughter tree from among daughters.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tlp

protected final TreebankLanguagePack tlp

nonTerminalInfo

protected java.util.Map<java.lang.String,java.lang.String[][]> nonTerminalInfo

defaultRule

protected java.lang.String[] defaultRule
Default direction if no rule is found for category. Subclasses may set it if they like. If they leave it unset, it is an error if no rule is defined for a category (null is returned).

Constructor Detail

AbstractCollinsHeadFinder

protected AbstractCollinsHeadFinder(TreebankLanguagePack tlp)
Method Detail

setCategoriesToAvoid

protected void setCategoriesToAvoid(java.lang.String[] categoriesToAvoid)
Set categories which, if it comes to last resort processing (i.e. none of the rules matched), will be avoided as heads. In last resort processing, it will attempt to match the leftmost or rightmost constituent not in this set, but will fall back to the leftmost or rightmost constituent if necessary.

Parameters:
categoriesToAvoid - list of constituent types to avoid
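
The fallback described for last resort processing can be sketched as follows. This is a minimal illustration under assumptions, not the library's implementation: the class LastResortSketch and its method are hypothetical, daughters are plain category strings, and the direction is passed as a boolean.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone sketch of the documented fallback; not the library's code.
final class LastResortSketch {

    // Picks a last-resort head index among daughter categories.
    // fromLeft chooses the scan direction; avoid holds the categories to avoid.
    static int lastResort(String[] daughters, boolean fromLeft, Set<String> avoid) {
        int n = daughters.length;
        if (n == 0) throw new IllegalArgumentException("no daughters");
        // First pass: nearest daughter whose category is not avoided.
        for (int k = 0; k < n; k++) {
            int i = fromLeft ? k : n - 1 - k;
            if (!avoid.contains(daughters[i])) return i;
        }
        // Every daughter is avoided: fall back to the plain leftmost or rightmost one.
        return fromLeft ? 0 : n - 1;
    }

    public static void main(String[] args) {
        Set<String> punct = new HashSet<>(Arrays.asList(",", ".", "``", "''"));
        System.out.println(lastResort(new String[] {"``", "FW", "."}, true, punct)); // prints 1
        System.out.println(lastResort(new String[] {",", "."}, false, punct));       // prints 1
    }
}
```

In the first call the opening quote is skipped and FW is chosen; in the second every daughter is punctuation, so the rightmost one is returned anyway.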

findMarkedHead

protected Tree findMarkedHead(Tree t)
A way for subclasses for corpora with explicit head markings to return the explicitly marked head.

Parameters:
t - a tree to find the head of
Returns:
the marked head, or null if there is no marked head

determineHead

public Tree determineHead(Tree t)
Determine which daughter of the current parse tree is the head.

Specified by:
determineHead in interface HeadFinder
Parameters:
t - The parse tree to examine the daughters of. If this is a leaf, null is returned.
Returns:
The daughter parse tree that is the head of t
See Also:
for a routine to call this and spread heads throughout a tree

determineHead

public Tree determineHead(Tree t,
                          Tree parent)
Determine which daughter of the current parse tree is the head.

Specified by:
determineHead in interface HeadFinder
Parameters:
t - The parse tree to examine the daughters of. If this is a leaf, null is returned.
parent - The parent of t
Returns:
The daughter parse tree that is the head of t. Returns null for leaf nodes.
See Also:
for a routine to call this and spread heads throughout a tree

determineNonTrivialHead

protected Tree determineNonTrivialHead(Tree t,
                                       Tree parent)
Called by determineHead and may be overridden in subclasses if special treatment is necessary for particular categories.


traverseLocate

protected Tree traverseLocate(Tree[] daughterTrees,
                              java.lang.String[] how,
                              boolean lastResort)
Attempt to locate head daughter tree from among daughters. Goes through daughterTrees looking for a daughter whose category is in the set found by looking up the motherkey specifier in a hash map. If none is found, takes the leftmost or rightmost daughter if and only if lastResort is true; otherwise returns null.


postOperationFix

protected int postOperationFix(int headIdx,
                               Tree[] daughterTrees)
A way for subclasses to fix any heads under special conditions. The default does nothing.

Parameters:
headIdx - the index of the proposed head
daughterTrees - the array of daughter trees
Returns:
the new headIndex
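
As an illustration of the kind of adjustment such a hook allows, the sketch below moves the head off a daughter that directly follows a coordinating conjunction. Everything here is an assumption made for the example: the class PostFixSketch is hypothetical, the "CC" rule is invented rather than taken from this class, and daughters are plain category strings where a real override would receive Tree[].

```java
// Hypothetical stand-alone sketch over category strings; not the library's code.
final class PostFixSketch {

    // Returns a possibly adjusted head index. Rule assumed for illustration only:
    // if the proposed head directly follows a "CC", prefer the daughter before that "CC".
    static int postOperationFix(int headIdx, String[] daughters) {
        if (headIdx >= 2 && "CC".equals(daughters[headIdx - 1])) {
            return headIdx - 2;
        }
        return headIdx; // default: leave the proposed head unchanged
    }

    public static void main(String[] args) {
        String[] daughters = {"NN", "CC", "NN"};
        System.out.println(postOperationFix(2, daughters)); // prints 0
        System.out.println(postOperationFix(0, daughters)); // prints 0
    }
}
```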


Stanford NLP Group