public class ExtractorFramesRare
- extends java.lang.Object
This class contains feature extractors for the MaxentTagger that are only
applied to rare (low frequency/unknown) words.
The following options are supported:
||Word shape features, e.g. transform Foo5 into Xxx# (not exactly like that, but that general idea). Creates individual features for each word left ... right.|
||Same thing, but works for some Unicode characters, too.|
||Instead of individual word shape features, combines several word shapes into one feature.|
||Features for suffixes of the word at the given position. One feature for each suffix of length 1 ... length.|
||Features for prefixes of the word at the given position. One feature for each prefix of length 1 ... length.|
||Features for concatenated prefix and suffix. One feature for each of length 1 ... length.|
||Current word only. Combines the suffix with a binary value for whether the word contains any capital letters.|
|distsim||filename, left, right||Individual features for each position left ... right. Compares that word with the dictionary in filename.|
|distsimconjunction||filename, left, right||A concatenation of distsim features from left ... right.|
Also available are the macros "naacl2003unknowns", "lnaacl2003unknowns", and "naacl2003conjunctions". naacl2003unknowns and lnaacl2003unknowns include suffix extractors and extractors for specific word shape features, such as containing or not containing a digit.
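The word shape, suffix, prefix, and capitalization rows in the table above can be illustrated with a minimal sketch. This is not the tagger's actual code: the class name RareWordFeatureSketch, its method names, and the "|" separator are hypothetical, and the shape mapping is the simplest one that reproduces the Foo5 to Xxx# example (which the description itself calls only approximate).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and details are assumptions, not the tagger's API.
class RareWordFeatureSketch {

    // Word shape: upper case -> 'X', lower case -> 'x', digit -> '#',
    // any other character is kept unchanged.
    static String shape(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isUpperCase(c)) {
                sb.append('X');
            } else if (Character.isLowerCase(c)) {
                sb.append('x');
            } else if (Character.isDigit(c)) {
                sb.append('#');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // One feature value per suffix of length 1 ... maxLength
    // (capped at the length of the word).
    static List<String> suffixes(String word, int maxLength) {
        List<String> out = new ArrayList<>();
        int limit = Math.min(maxLength, word.length());
        for (int len = 1; len <= limit; len++) {
            out.add(word.substring(word.length() - len));
        }
        return out;
    }

    // One feature value per prefix of length 1 ... maxLength.
    static List<String> prefixes(String word, int maxLength) {
        List<String> out = new ArrayList<>();
        int limit = Math.min(maxLength, word.length());
        for (int len = 1; len <= limit; len++) {
            out.add(word.substring(0, len));
        }
        return out;
    }

    // Suffix combined with a flag for whether the word contains any capital letter.
    static String capitalizationSuffix(String word, int length) {
        boolean hasCapital = false;
        for (int i = 0; i < word.length(); i++) {
            if (Character.isUpperCase(word.charAt(i))) {
                hasCapital = true;
                break;
            }
        }
        int len = Math.min(length, word.length());
        return word.substring(word.length() - len) + "|" + hasCapital;
    }

    public static void main(String[] args) {
        System.out.println(shape("Foo5"));                        // Xxx#
        System.out.println(suffixes("running", 3));               // [g, ng, ing]
        System.out.println(prefixes("running", 3));               // [r, ru, run]
        System.out.println(capitalizationSuffix("McDonald", 2));  // ld|true
    }
}
```

Each returned string would serve as one feature value for a rare word; how the real extractors encode and combine these values is not specified here.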
The macro "frenchunknowns" expands to five extractors specific
to French, which test the end of the word to see if it matches
common suffixes for various POS classes and plural words. In
experiments, though, adding these extractors did not improve accuracy
over the regular naacl2003unknowns extractor macro.
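As a rough illustration of what "testing the end of the word" against suffix lists could look like, here is a minimal sketch. The class name FrenchSuffixSketch, the class labels, and the suffix lists are all hypothetical placeholders chosen for illustration; they are not the lists used by the frenchunknowns extractors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; labels and suffix lists are assumptions.
class FrenchSuffixSketch {

    // Hypothetical suffix lists, one per coarse class. Insertion order is kept
    // so that the returned labels come out in a stable order.
    static final Map<String, String[]> SUFFIXES = new LinkedHashMap<>();
    static {
        SUFFIXES.put("NOUN_LIKE", new String[] {"tion", "isme", "eur"});
        SUFFIXES.put("ADVERB_LIKE", new String[] {"ment"});
        SUFFIXES.put("PLURAL_LIKE", new String[] {"s", "x"});
    }

    // Returns the label of every class for which the word ends in one of
    // that class's suffixes. Each label would act as one binary feature.
    static List<String> matchingClasses(String word) {
        String lower = word.toLowerCase(Locale.FRENCH);
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : SUFFIXES.entrySet()) {
            for (String suffix : entry.getValue()) {
                if (lower.endsWith(suffix)) {
                    matches.add(entry.getKey());
                    break; // one match per class is enough
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(matchingClasses("rapidement")); // [ADVERB_LIKE]
        System.out.println(matchingClasses("nation"));     // [NOUN_LIKE]
        System.out.println(matchingClasses("chevaux"));    // [PLURAL_LIKE]
    }
}
```

Suffix tests like these overlap heavily with plain suffix features, which may be one reason such extractors add little on top of a general suffix extractor.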
- Kristina Toutanova, Christopher Manning, Michel Galley
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
protected static Extractor getExtractorFramesRare(java.lang.String identifier,
- Get an array of rare word feature Extractors identified by a name.
Note: Names used here must also be known in getExtractorFrames, so that
it can report errors appropriately. So if you add a keyword here,
add it there as one to be ignored, too. (In the next iteration, this
class and ExtractorFrames should probably just be combined.)
Parameters:
identifier - Describes a set of extractors for rare word features
Returns:
- A set of extractors for rare word features
Stanford NLP Group