Implements a variant on the HeadFinder found in Michael Collins' 1999
thesis. This starts with Collins' head finder, to which a head rule for
NX has been added, and then makes the following changes:
- The PRN rule used to just take the leftmost thing; we now have it
choose the leftmost lexical category (not the common punctuation etc.)
- Delete IN as a possible head of S, and add FRAG (low priority)
- Place NN before QP in ADJP head rules (more to do for ADJP!)
- Place PDT before RB and after CD in QP rules. Also prefer CD to
DT or RB, and DT to RB.
- Add DT, WDT as low priority choice for head of NP. Add PRP before PRN.
Add RBR as low priority choice of head for NP.
- Prefer NP or NX as head of NX, and otherwise default to rightmost, not
leftmost (NP-like headedness)
- VP: add JJ and NNP as low priority heads (many tagging errors)
Place JJ above NP in priority, as it is to be preferred to NP object.
- PP: add PP as a possible head (rare conjunctions)
- Added rule for POSSP (can be introduced by parser)
- Added a sensible-ish rule for X.
- Added NML head rules, which are the same as for NP.
- NP head rule: NP and NML are treated almost identically (NP has precedence)
- NAC head rule: NML comes after NN/NNS but before NNP/NNPS
- PP head rule: JJ added
- Added JJP (appearing in David Vadas's annotation), which seems to play
the same role as ADJP.
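
The changes above all adjust per-category priority lists and search directions. The following is a minimal Python sketch of how such a rule is applied, not the actual implementation: the `HEAD_RULES` table and the `find_head` function are hypothetical names, and the two entries are shortened illustrations of the QP and NX bullets above rather than the full rule set.

```python
# Hypothetical, abbreviated head-rule table:
# parent category -> (search direction, categories in priority order).
HEAD_RULES = {
    "QP": ("left", ["CD", "DT", "RB"]),  # prefer CD to DT or RB, and DT to RB
    "NX": ("right", ["NP", "NX"]),       # prefer NP or NX, else default to rightmost
}


def find_head(parent, children):
    """Return the index of the head among `children`, a list of category labels."""
    direction, priorities = HEAD_RULES[parent]
    indices = list(range(len(children)))
    if direction == "right":
        indices.reverse()
    # Try each category in priority order; within one category,
    # scan the children in the rule's direction and take the first match.
    for category in priorities:
        for i in indices:
            if children[i] == category:
                return i
    # Nothing in the priority list matched: default to the first child
    # in scan order (leftmost for "left", rightmost for "right").
    return indices[0]


print(find_head("QP", ["RB", "DT", "CD"]))
print(find_head("NX", ["NN", "CC", "NN"]))
```

In this sketch a higher-priority category wins regardless of position, so moving a category earlier or later in a list (as several of the bullets do) changes which child is selected without changing the search procedure.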
These rules are suitable for the Penn Treebank.
A case that you apparently just can't handle well in this framework is
(NP (NP ... NP)). If this is a conjunction, apposition or similar, then
the leftmost NP is the head, but if the first is a measure phrase like
(NP $ 38) (NP a share) then the second should probably be the head.
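
The difficulty can be seen in a small Python sketch. It assumes a rule that looks only at the category labels of the children; `head_by_label` is a hypothetical helper, and the apposition phrases are invented for illustration. Both constructions present the same label sequence, so the rule must return the same answer for both.

```python
def head_by_label(children):
    """Pick the leftmost NP child as head, looking only at category labels."""
    labels = [label for label, _words in children]
    return labels.index("NP")


# Invented apposition example: the leftmost NP is the desired head.
apposition = [("NP", "the chairman"), ("NP", "a former senator")]
# Measure phrase from the text above: the second NP is probably the better head.
measure = [("NP", "$ 38"), ("NP", "a share")]

# The rule sees ["NP", "NP"] in both cases and cannot tell them apart.
print(head_by_label(apposition))
print(head_by_label(measure))
```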