edu.stanford.nlp.tagger.maxent
Class ExtractorFrames

java.lang.Object
  extended by edu.stanford.nlp.tagger.maxent.ExtractorFrames

public class ExtractorFrames
extends java.lang.Object

This class contains the basic feature extractors used for all words and tag sequences (and interaction terms) for the MaxentTagger, but not the feature extractors explicitly targeting generalization for rare or unknown words. The following options are supported:

Name            Args          Effect
words           begin, end    Individual features for words begin ... end
tags            begin, end    Individual features for tags begin ... end
biword          w1, w2        One feature for the pair of words w1, w2
biwords         begin, end    One feature for each sequential pair of words from begin to end
twoTags         t1, t2        One feature for the pair of tags t1, t2
lowercasewords  begin, end    One feature for each word begin ... end, lowercased
order           left, right   A feature for tags left through 0 and a feature for tags 0 through right. Lower order left and right features are also added. This gets very expensive for higher order terms.
wordTag         w, t          A feature combining word w and tag t.
wordTwoTags     w, t1, t2     A feature combining word w and tags t1, t2.
threeTags       t1, t2, t3    A feature combining tags t1, t2, t3.
vbn             length        A feature that looks at the length words to the left for something that appears to be a VBN (in English) without looking at the actual tags. It is zeroth order, as it does not look at the tag predictions. It is also never used, since it doesn't seem to help.
See ExtractorFramesRare for more options.
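As an illustration of the option syntax described above, the following is a minimal sketch that splits an architecture string into option names and integer arguments. The class ArchSpecSketch and its methods are hypothetical helpers written for this example; they are not part of the tagger API, and the parsing done inside getExtractorFrames may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Stanford tagger API. */
final class ArchSpecSketch {

  /** Splits an arch string on commas that are not inside parentheses. */
  static List<String> splitOptions(String arch) {
    List<String> options = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int depth = 0;
    for (char c : arch.toCharArray()) {
      if (c == '(') {
        depth++;
      } else if (c == ')') {
        depth--;
      }
      if (c == ',' && depth == 0) {
        addIfNonEmpty(options, current);
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    addIfNonEmpty(options, current);
    return options;
  }

  private static void addIfNonEmpty(List<String> options, StringBuilder sb) {
    String s = sb.toString().trim();
    if (!s.isEmpty()) {
      options.add(s);
    }
  }

  /** Returns the option name, i.e. the text before the first '('. */
  static String name(String option) {
    int open = option.indexOf('(');
    return open < 0 ? option : option.substring(0, open);
  }

  /** Returns the integer arguments between the parentheses (empty if there are none). */
  static int[] args(String option) {
    int open = option.indexOf('(');
    int close = option.lastIndexOf(')');
    if (open < 0 || close < open) {
      return new int[0];
    }
    String inner = option.substring(open + 1, close).trim();
    if (inner.isEmpty()) {
      return new int[0];
    }
    String[] parts = inner.split(",");
    int[] result = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
      result[i] = Integer.parseInt(parts[i].trim());
    }
    return result;
  }

  public static void main(String[] unused) {
    // Prints each option name followed by its parsed arguments.
    for (String option : splitOptions("words(-1,1),order(2),biwords(-1,0),wordTag(0,-1)")) {
      System.out.println(name(option) + " " + Arrays.toString(args(option)));
    }
  }
}
```

The sketch only handles integer arguments, which is all the options in the table above take.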
There are also macro features:
left3words = words(-1,1),order(2)
left5words = words(-2,2),order(2)
generic = words(-1,1),order(2),biwords(-1,0),wordTag(0,-1)
bidirectional5words = words(-2,2),order(-2,2),twoTags(-1,1),wordTag(0,-1),wordTag(0,1),biwords(-1,1)
bidirectional = words(-1,1),order(-2,2),twoTags(-1,1),wordTag(0,-1),wordTag(0,1),biwords(-1,1)
german = some random stuff
sighan2005 = some other random stuff
The left3words architectures are faster, but slightly less accurate, than the bidirectional architectures. 'naacl2003unknowns' was our traditional set of unknown word features, but you can now specify features more flexibly via the various other supported keywords. The 'shapes' options map words to equivalence classes, which slightly increases accuracy.
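To make the macro features listed above concrete, here is a minimal sketch that expands macro names in an architecture string into their documented definitions. The class ArchMacroSketch and its expand method are hypothetical and exist only for this example; how getExtractorFrames actually expands macros is not documented here and may differ. The german and sighan2005 macros are omitted because their definitions are not given, so the sketch passes them through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Stanford tagger API. */
final class ArchMacroSketch {

  /** Macro definitions copied from the documentation above. */
  static final Map<String, String> MACROS = new LinkedHashMap<>();

  static {
    MACROS.put("left3words", "words(-1,1),order(2)");
    MACROS.put("left5words", "words(-2,2),order(2)");
    MACROS.put("generic", "words(-1,1),order(2),biwords(-1,0),wordTag(0,-1)");
    MACROS.put("bidirectional5words",
        "words(-2,2),order(-2,2),twoTags(-1,1),wordTag(0,-1),wordTag(0,1),biwords(-1,1)");
    MACROS.put("bidirectional",
        "words(-1,1),order(-2,2),twoTags(-1,1),wordTag(0,-1),wordTag(0,1),biwords(-1,1)");
  }

  /**
   * Replaces every macro name in a comma-separated arch string with its definition.
   * Splitting on every comma also cuts inside argument lists such as "words(-1,1)",
   * but those fragments never match a macro name, so joining the pieces back with
   * commas restores them unchanged.
   */
  static String expand(String arch) {
    StringBuilder out = new StringBuilder();
    for (String piece : arch.split(",")) {
      String trimmed = piece.trim();
      String expansion = MACROS.get(trimmed);
      if (out.length() > 0) {
        out.append(',');
      }
      out.append(expansion != null ? expansion : trimmed);
    }
    return out.toString();
  }

  public static void main(String[] unused) {
    // A macro combined with another keyword mentioned in the documentation.
    System.out.println(expand("left3words,naacl2003unknowns"));
  }
}
```

Unknown names are deliberately left as they are, so keywords handled elsewhere (for example by ExtractorFramesRare) survive the expansion.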

Author:
Kristina Toutanova, Michel Galley

Method Summary
protected static Extractor[] getExtractorFrames(java.lang.String arch)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getExtractorFrames

protected static Extractor[] getExtractorFrames(java.lang.String arch)


Stanford NLP Group