edu.stanford.nlp.tagger.maxent
Class ExtractorFramesRare

java.lang.Object
  extended by edu.stanford.nlp.tagger.maxent.ExtractorFramesRare

public class ExtractorFramesRare
extends Object

This class contains feature extractors for the MaxentTagger that are only applied to rare (low frequency/unknown) words. The following options are supported:

Name                     Args                   Effect
wordshapes               left, right            Word shape features, e.g. transform Foo5 into Xxx# (not exactly like that, but that general idea). Creates individual features for each word left ... right.
unicodeshapes            left, right            Same thing, but works for some Unicode characters, too.
unicodeshapeconjunction  left, right            Instead of individual word shape features, combines several word shapes into one feature.
suffix                   length, position       Features for suffixes of the word at position. One feature for each suffix of length 1 ... length.
prefix                   length, position       Features for prefixes of the word at position. One feature for each prefix of length 1 ... length.
prefixsuffix             length                 Features for concatenated prefix and suffix. One feature for each length 1 ... length.
capitalizationsuffix     length                 Current word only. Combines the suffix with a binary value for whether the word contains any capital letters.
distsim                  filename, left, right  Individual features for each position left ... right. Compares that word with the dictionary in filename.
distsimconjunction       filename, left, right  A concatenation of distsim features from left ... right.
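
As a rough illustration of the word shape idea described in the table, the following minimal sketch maps each character to a coarse class. It is not the tagger's own implementation: the class name WordShapeSketch and the method shape are hypothetical, and the real extractors use a more elaborate mapping.

```java
// Hypothetical sketch of a word-shape transform; NOT the tagger's actual code.
final class WordShapeSketch {

    /**
     * Maps each character to a coarse class:
     * uppercase -> 'X', lowercase -> 'x', digit -> '#', anything else is kept.
     */
    static String shape(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isUpperCase(c)) {
                sb.append('X');
            } else if (Character.isLowerCase(c)) {
                sb.append('x');
            } else if (Character.isDigit(c)) {
                sb.append('#');
            } else {
                sb.append(c); // punctuation and other symbols pass through
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(shape("Foo5")); // prints "Xxx#"
    }
}
```

Because Character.isUpperCase and Character.isLowerCase are Unicode-aware, this sketch also handles many non-ASCII letters, which is loosely the distinction the unicodeshapes option draws.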
Also available are the macros "naacl2003unknowns", "lnaacl2003unknowns", and "naacl2003conjunctions". naacl2003unknowns and lnaacl2003unknowns include suffix extractors and extractors for specific word shape features, such as containing or not containing a digit.
The macro "frenchunknowns" expands to five extractors specific to French, which test whether the end of the word matches common suffixes for various POS classes and plural words. Adding these extractors did not improve accuracy over the regular naacl2003unknowns macro, though.
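
To make the suffix, prefix, and capitalizationsuffix options more concrete, here is a minimal sketch of how such features might be enumerated for a single word. The class AffixFeatureSketch, its method names, the "|" separator, and the choice to stop at the word's own length are all assumptions made for illustration; the tagger's internal feature encoding may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of affix-style features; NOT the tagger's actual code.
final class AffixFeatureSketch {

    /** One suffix for each length 1 ... maxLen (assumption: capped at the word length). */
    static List<String> suffixes(String word, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int len = 1; len <= maxLen && len <= word.length(); len++) {
            out.add(word.substring(word.length() - len));
        }
        return out;
    }

    /** One prefix for each length 1 ... maxLen (assumption: capped at the word length). */
    static List<String> prefixes(String word, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int len = 1; len <= maxLen && len <= word.length(); len++) {
            out.add(word.substring(0, len));
        }
        return out;
    }

    /** True if the word contains at least one uppercase letter. */
    static boolean hasCapital(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (Character.isUpperCase(word.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    /** Pairs each suffix with the capitalization flag, e.g. "ee|true". */
    static List<String> capitalizationSuffixes(String word, int maxLen) {
        boolean cap = hasCapital(word);
        List<String> out = new ArrayList<>();
        for (String suffix : suffixes(word, maxLen)) {
            out.add(suffix + "|" + cap); // "|" separator is an illustrative choice
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(suffixes("running", 3));             // prints [g, ng, ing]
        System.out.println(prefixes("running", 2));             // prints [r, ru]
        System.out.println(capitalizationSuffixes("McGee", 2)); // prints [e|true, ee|true]
    }
}
```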

Author:
Kristina Toutanova, Christopher Manning, Michel Galley

Method Summary
protected static Extractor[] getExtractorFramesRare(String identifier, TTags ttags)
          Get an array of rare-word feature Extractors identified by a name.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getExtractorFramesRare

protected static Extractor[] getExtractorFramesRare(String identifier,
                                                    TTags ttags)
Get an array of rare-word feature Extractors identified by a name. Note: names used here must also be known in getExtractorFrames, so that it can report errors appropriately. If you add a keyword here, also add it there as one to be ignored. (In the next iteration, this class and ExtractorFrames should probably just be combined.)

Parameters:
identifier - Describes a set of extractors for rare word features
Returns:
A set of extractors for rare word features


Stanford NLP Group