java.lang.Object
  edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
    edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModel
public class EnglishUnknownWordModel extends BaseUnknownWordModel
This is a basic unknown word model for English. It supports 5 different
types of feature modeling; see getSignature(String, int).
Implementation note: the contents of this class tend to overlap somewhat
with ArabicUnknownWordModel and were originally included in BaseLexicon.
Field Summary | |
---|---|
protected boolean | smartMutation |
protected int | unknownPrefixSize |
protected int | unknownSuffixSize |
Fields inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel |
---|
NULL_ITW, nullTag, nullWord, tagHash, tagIndex, trainOptions, unknown, unknownLevel, unSeenCounter, useFirst, useGT, VERBOSE, wordIndex |
Constructor Summary | |
---|---|
EnglishUnknownWordModel(Options op, Lexicon lex, Index<String> wordIndex, Index<String> tagIndex) | |
Method Summary | |
---|---|
String | getSignature(String word, int loc): This routine returns a String that is the "signature" of the class of a word. |
int | getSignatureIndex(int index, int sentencePosition, String word): Returns the index of the signature of the word numbered wordIndex, where the signature is the String representation of unknown word features. |
protected List<IntTaggedWord> | listOfLabeledWordsToEvents(List<LabeledWord> taggedWords) |
protected List<IntTaggedWord> | listToEvents(List<TaggedWord> taggedWords) |
float | score(IntTaggedWord iTW, int loc, double c_Tseen, double total, double smooth, String word): Currently we don't consider loc or the other parameters in determining score in the default implementation; only English uses them. |
double | scoreProbTagGivenWordSignature(IntTaggedWord iTW, int loc, double smooth, String word): Calculate P(Tag|Signature) with Bayesian smoothing via just P(Tag|Unknown). |
void | train(Collection<Tree> trees): Trains this lexicon on the Collection of trees. |
void | train(Collection<Tree> trees, boolean keepTagsAsLabels): Trains this lexicon on the Collection of trees. |
void | train(Collection<Tree> trees, double weight) |
void | train(Collection<Tree> trees, double weight, boolean keepTagsAsLabels) |
protected List<IntTaggedWord> | treeToEvents(Tree tree) |
protected List<IntTaggedWord> | treeToEvents(Tree tree, boolean keepTagsAsLabels) |
Methods inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel |
---|
addTagging, getLexicon, getUnknownLevel, score, scoreGT, setUnknownLevel, trainUnknownGT, unSeenCounter |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected final boolean smartMutation
protected final int unknownSuffixSize
protected final int unknownPrefixSize
Constructor Detail |
---|
public EnglishUnknownWordModel(Options op, Lexicon lex, Index<String> wordIndex, Index<String> tagIndex)
Method Detail |
---|
public void train(Collection<Tree> trees)
Specified by: train in interface UnknownWordModel
Overrides: train in class BaseUnknownWordModel
Parameters:
trees - the collection of trees to be trained over
public void train(Collection<Tree> trees, boolean keepTagsAsLabels)
public void train(Collection<Tree> trees, double weight)
public void train(Collection<Tree> trees, double weight, boolean keepTagsAsLabels)
protected List<IntTaggedWord> treeToEvents(Tree tree, boolean keepTagsAsLabels)
protected List<IntTaggedWord> treeToEvents(Tree tree)
protected List<IntTaggedWord> listToEvents(List<TaggedWord> taggedWords)
protected List<IntTaggedWord> listOfLabeledWordsToEvents(List<LabeledWord> taggedWords)
public float score(IntTaggedWord iTW, int loc, double c_Tseen, double total, double smooth, String word)
Description copied from class: BaseUnknownWordModel
Currently we don't consider loc or the other parameters in determining score in the default implementation; only English uses them.
Specified by: score in interface UnknownWordModel
Overrides: score in class BaseUnknownWordModel
Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial. Now, a negative value
c_Tseen - Total count of this tag (on seen words) in training
total - Total count of word tokens in training
smooth - Weighting on prior P(T|U) in estimate
word - The word itself; useful so we don't look it up in the index
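As a rough illustration of how these parameters could fit together, the sketch below uses Bayes' rule to turn a smoothed P(T|S) into a log P(W|T) estimate, taking P(T) = c_Tseen / total and P(W) = 1 / total for an unseen word. The class ScoreSketch, the helper logProbWordGivenTag, and the choice of P(W) are assumptions made for illustration; this page does not confirm that EnglishUnknownWordModel computes its score this way.

```java
final class ScoreSketch {
    /**
     * Hypothetical helper, not part of the Stanford API.
     * Applies Bayes' rule: P(W|T) = P(T|S) * P(W) / P(T), returned as a log.
     *
     * @param pTagGivenSig smoothed P(T|S) for the word's signature
     * @param cTseen       total count of this tag on seen words in training
     * @param total        total count of word tokens in training
     */
    static float logProbWordGivenTag(double pTagGivenSig, double cTseen, double total) {
        double pTag = cTseen / total;   // P(T), estimated from seen words
        double pWord = 1.0 / total;     // assumed P(W) for a word never seen in training
        return (float) Math.log(pTagGivenSig * pWord / pTag);
    }

    public static void main(String[] args) {
        // Example values are made up for demonstration only.
        System.out.println(logProbWordGivenTag(0.5, 100.0, 1000.0));
    }
}
```

Returning a log probability as a float mirrors the float return type of score; whether the real method works in log space is likewise an assumption here.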
public double scoreProbTagGivenWordSignature(IntTaggedWord iTW, int loc, double smooth, String word)
Specified by: scoreProbTagGivenWordSignature in interface UnknownWordModel
Overrides: scoreProbTagGivenWordSignature in class BaseUnknownWordModel
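The summary for this method describes P(Tag|Signature) with Bayesian smoothing via P(Tag|Unknown), and smooth is documented as the weighting on the prior P(T|U). A minimal sketch of one common form of such an estimate follows. The class SmoothingSketch, its parameter names, and the exact formula are assumptions for illustration and are not taken from the library source.

```java
final class SmoothingSketch {
    /**
     * Hypothetical helper, not part of the Stanford API.
     * Smoothed estimate: (c(T,S) + smooth * P(T|U)) / (c(S) + smooth).
     *
     * @param cTagSig          count of this tag with this signature
     * @param cSig             count of this signature over all tags
     * @param pTagGivenUnknown prior P(T|U) for unknown words in general
     * @param smooth           weight on the prior; must be positive
     */
    static double probTagGivenSignature(double cTagSig, double cSig,
                                        double pTagGivenUnknown, double smooth) {
        return (cTagSig + smooth * pTagGivenUnknown) / (cSig + smooth);
    }

    public static void main(String[] args) {
        // With no observations for the signature, the estimate falls back to the prior.
        System.out.println(probTagGivenSignature(0.0, 0.0, 0.25, 1.0));
        // With observations, the counts pull the estimate away from the prior.
        System.out.println(probTagGivenSignature(3.0, 4.0, 0.5, 1.0));
    }
}
```

In this form a larger smooth keeps the estimate closer to P(T|U), while a signature with many observations is dominated by its own counts.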
public int getSignatureIndex(int index, int sentencePosition, String word)
Specified by: getSignatureIndex in interface UnknownWordModel
Overrides: getSignatureIndex in class BaseUnknownWordModel
public String getSignature(String word, int loc)
Specified by: getSignature in interface UnknownWordModel
Overrides: getSignature in class BaseUnknownWordModel
Parameters:
word - The word to make a signature for
loc - Its position in the sentence (mainly so sentence-initial capitalized words can be treated differently)
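To make the idea of a word "signature" concrete, here is a deliberately simplified sketch that maps a word to a feature string and treats a sentence-initial capital differently from a capital elsewhere. The class SignatureSketch, the feature names (UNK, -INITC, -CAPS, -LC, -NUM, -DASH, -s), and the rules themselves are hypothetical; they do not reproduce any of the 5 feature-modeling types the real getSignature supports.

```java
final class SignatureSketch {
    /**
     * Hypothetical, simplified signature function (not the Stanford implementation).
     *
     * @param word the word to make a signature for
     * @param loc  its position in the sentence (0 means sentence-initial)
     */
    static String signature(String word, int loc) {
        StringBuilder sb = new StringBuilder("UNK");
        if (word.isEmpty()) {
            return sb.toString();
        }
        boolean hasDigit = false;
        boolean hasDash = false;
        boolean hasLower = false;
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (Character.isDigit(ch)) {
                hasDigit = true;
            } else if (ch == '-') {
                hasDash = true;
            } else if (Character.isLowerCase(ch)) {
                hasLower = true;
            }
        }
        if (Character.isUpperCase(word.charAt(0))) {
            // A capital at position 0 is weaker evidence of a proper noun.
            sb.append(loc == 0 ? "-INITC" : "-CAPS");
        } else if (hasLower) {
            sb.append("-LC");
        }
        if (hasDigit) {
            sb.append("-NUM");
        }
        if (hasDash) {
            sb.append("-DASH");
        }
        if (word.length() > 3 && word.endsWith("s")) {
            sb.append("-s");   // crude suffix feature
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(signature("Apple", 0));
        System.out.println(signature("Apple", 3));
        System.out.println(signature("well-known", 2));
    }
}
```

Words that share a signature are pooled into one class, so tag statistics gathered for that class in training can stand in for a word that was never seen.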