This class contains the basic feature extractors used for all words and
tag sequences (and interaction terms) for the MaxentTagger, but not the
feature extractors explicitly targeting generalization for rare or unknown
words.
The following options are supported:
Name | Args | Effect |
words | begin, end | Individual features for words begin ... end |
tags | begin, end | Individual features for tags begin ... end |
biword | w1, w2 | One feature for the pair of words w1, w2 |
biwords | begin, end | One feature for each sequential pair of words from begin to end |
twoTags | t1, t2 | One feature for the pair of tags t1, t2 |
lowercasewords | begin, end | One feature for each word begin ... end, lowercased |
order | left, right | A feature for tags left through 0 and a feature for tags 0 through right. Lower-order left and right features are also added. This gets very expensive for higher-order terms. |
wordTag | w, t | A feature combining word w and tag t. |
wordTwoTags | w, t1, t2 | A feature combining word w and tags t1, t2. |
threeTags | t1, t2, t3 | A feature combining tags t1, t2, t3. |
vbn | length | A feature that looks at the length words to the left for something that appears to be a VBN (in English) without looking at the actual tags. It is zeroth order, as it does not look at the tag predictions. It is also never used, since it doesn't seem to help. |
allwordshapes | left, right | Word shape features, e.g., transform Foo5 into Xxx# (not exactly like that, but that is the general idea). Creates individual features for each word left ... right. Compare with the feature "wordshapes" in ExtractorFramesRare, which is only applied to rare words. |
allunicodeshapes | left, right | Same thing, but works for some Unicode characters, too. |
allunicodeshapeconjunction | left, right | Instead of individual word shape features, combines several word shapes into one feature. |
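
To make the range-style arguments and the word shape idea more concrete, here is a minimal sketch. It is not the tagger's implementation: the class ExtractorOptionSketch, its method names, and the feature-name strings are all hypothetical, the ranges are assumed to be inclusive at both ends, and the shape mapping is a simplification in the spirit of the Foo5 to Xxx# example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the tagger's source.
final class ExtractorOptionSketch {

    // words(begin, end) / tags(begin, end): one individual feature per position.
    // Assumption: both ends of the range are inclusive.
    static List<String> individual(String name, int begin, int end) {
        List<String> features = new ArrayList<>();
        for (int pos = begin; pos <= end; pos++) {
            features.add(name + "[" + pos + "]");
        }
        return features;
    }

    // biwords(begin, end): one feature per sequential pair of positions in the range.
    static List<String> sequentialPairs(int begin, int end) {
        List<String> features = new ArrayList<>();
        for (int pos = begin; pos < end; pos++) {
            features.add("biword[" + pos + "," + (pos + 1) + "]");
        }
        return features;
    }

    // Simplified word shape: uppercase -> X, lowercase -> x, digit -> #,
    // anything else is kept unchanged. The real shape function differs in detail.
    static String shape(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isUpperCase(c)) {
                sb.append('X');
            } else if (Character.isLowerCase(c)) {
                sb.append('x');
            } else if (Character.isDigit(c)) {
                sb.append('#');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(individual("word", -1, 1));
        System.out.println(sequentialPairs(-1, 1));
        System.out.println(shape("Foo5"));
    }
}
```

Under these assumptions, a range of -1 to 1 yields three individual features but only two sequential-pair features, which is why the pair-based options grow more slowly than the window width might suggest.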
See