public interface UnknownWordModel
Method Summary

Modifier and Type | Method | Description
---|---|---
void | addTagging(boolean seen, IntTaggedWord itw, double count) | Adds the tagging with count to the data structures in this Lexicon.
Lexicon | getLexicon() | Returns the lexicon used by this unknown word model; the lexicon is used to check whether words have been seen or unseen.
String | getSignature(String word, int loc) | Returns a String that is the "signature" of the class of a word.
int | getSignatureIndex(int wordIndex, int sentencePosition) |
int | getUnknownLevel() | Gets the level of equivalence classing for the model.
float | score(IntTaggedWord iTW, int loc, double c_Tseen, double total, double smooth) | Gets the score of this word with this tag (as an IntTaggedWord) at position loc.
double | scoreProbTagGivenWordSignature(IntTaggedWord iTW, int loc, double smooth) | Calculates P(Tag\|Signature) with Bayesian smoothing via just P(Tag\|Unknown).
void | setUnknownLevel(int unknownLevel) | One unknown word model may allow different options to be set; for example, several models of unknown words for a given language could be included in one class.
void | train(Collection<Tree> trees) | Trains this unknown word model on the Collection of trees.
Counter<IntTaggedWord> | unSeenCounter() |
Method Detail

void setUnknownLevel(int unknownLevel)
unknownLevel - Provides a choice between different unknown word processing schemes.

int getUnknownLevel()
Lexicon getLexicon()

void train(Collection<Tree> trees)
trees - The trees to train on.

float score(IntTaggedWord iTW, int loc, double c_Tseen, double total, double smooth)
iTW - An IntTaggedWord pairing a word and POS tag.
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial. Now, a negative value
c_Tseen - Total count of this tag (on seen words) in training.
total - Total count of word tokens in training.
smooth - Weighting on the prior P(T|U) in the estimate.
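The c_Tseen and total parameters suggest how an implementation can turn a tag-given-signature estimate into a word-given-tag score. The sketch below is a hypothetical illustration, not CoreNLP's code: it assumes Bayes' rule with P(T) estimated as c_Tseen / total, P(W) estimated as 1 / total (treating the unknown word as a single occurrence), P(T|W) approximated by a precomputed P(T|S), and a log-probability return value. The class name `UnknownScoreSketch` is invented for this example.

```java
// Hypothetical sketch: converts a smoothed P(T|S) into a log P(W|T) score via Bayes' rule.
public class UnknownScoreSketch {

    /**
     * @param pTagGivenSig smoothed estimate of P(T | signature), assumed precomputed
     * @param cTSeen       total count of this tag on seen words in training
     * @param total        total count of word tokens in training
     * @return log P(W | T), or negative infinity when there is no evidence for the tag
     */
    static float score(double pTagGivenSig, double cTSeen, double total) {
        if (cTSeen <= 0.0 || total <= 0.0) {
            return Float.NEGATIVE_INFINITY; // tag never seen, or empty training data
        }
        double pT = cTSeen / total; // P(T), estimated from seen words
        double pW = 1.0 / total;    // P(W), assuming the unknown word occurs once
        // Bayes' rule: P(W|T) = P(T|W) * P(W) / P(T), with P(T|W) approximated by P(T|S)
        return (float) Math.log(pTagGivenSig * pW / pT);
    }

    public static void main(String[] args) {
        // P(NN | signature) = 0.4; NN seen 200 times among 1000 training tokens
        System.out.println(score(0.4, 200.0, 1000.0));
    }
}
```

Working in log space keeps the many small probabilities a parser multiplies from underflowing.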
double scoreProbTagGivenWordSignature(IntTaggedWord iTW, int loc, double smooth)
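One reading of "Bayesian smoothing via just P(Tag|Unknown)" is to treat P(T|U) as a prior worth `smooth` pseudo-counts, giving P(T|S) = (c(T,S) + smooth · P(T|U)) / (c(S) + smooth). The sketch below assumes that form; the class, its plain string-keyed maps, and the `observe` helper are hypothetical stand-ins for the model's real counters, and the signature is assumed to have already been computed from the word and its position.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of P(T|S) smoothed toward the prior P(T|U).
public class SignatureSmoothingSketch {

    // counts of (signature, tag) pairs observed on unknown-like training words
    final Map<String, Map<String, Double>> tagCountsBySig = new HashMap<>();
    // counts of tags on unknown-like words, used for the prior P(T|U)
    final Map<String, Double> unseenTagCounts = new HashMap<>();
    double unseenTotal = 0.0;

    void observe(String sig, String tag) {
        tagCountsBySig.computeIfAbsent(sig, k -> new HashMap<>()).merge(tag, 1.0, Double::sum);
        unseenTagCounts.merge(tag, 1.0, Double::sum);
        unseenTotal += 1.0;
    }

    /** P(T|S) = (c(T,S) + smooth * P(T|U)) / (c(S) + smooth) */
    double probTagGivenSignature(String tag, String sig, double smooth) {
        double pTU = unseenTotal > 0 ? unseenTagCounts.getOrDefault(tag, 0.0) / unseenTotal : 0.0;
        Map<String, Double> byTag = tagCountsBySig.getOrDefault(sig, Map.of());
        double cTS = byTag.getOrDefault(tag, 0.0);
        double cS = 0.0;
        for (double c : byTag.values()) {
            cS += c; // total count of this signature over all tags
        }
        return (cTS + smooth * pTU) / (cS + smooth);
    }

    public static void main(String[] args) {
        SignatureSmoothingSketch m = new SignatureSmoothingSketch();
        m.observe("UNK-ed", "VBD");
        m.observe("UNK-ed", "VBD");
        m.observe("UNK-ed", "VBN");
        m.observe("UNK-s", "NNS");
        System.out.println(m.probTagGivenSignature("VBD", "UNK-ed", 1.0));
    }
}
```

For a signature never seen in training, c(T,S) and c(S) are both zero, so the estimate falls back exactly to the prior P(T|U); larger `smooth` values pull frequent signatures toward that prior more strongly.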

String getSignature(String word, int loc)
word - The word to make a signature for.
loc - Its position in the sentence (mainly so that sentence-initial capitalized words can be treated differently).
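To make the idea of a word-class signature concrete, the sketch below maps a word and its position to a coarse class string. The feature set (capitalization, digits, dashes, a few English suffixes) and the `UNK-...` naming are hypothetical choices for illustration, not the scheme of any particular UnknownWordModel implementation; only the use of `loc` to separate sentence-initial capitals follows the parameter description above.

```java
// Hypothetical word-class signature; the features are illustrative only.
public class SignatureSketch {

    static String getSignature(String word, int loc) {
        StringBuilder sb = new StringBuilder("UNK");
        if (word.isEmpty()) {
            return sb.toString();
        }
        boolean hasDigit = false;
        boolean hasDash = false;
        boolean hasLower = false;
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (Character.isDigit(ch)) hasDigit = true;
            else if (ch == '-') hasDash = true;
            else if (Character.isLowerCase(ch)) hasLower = true;
        }
        if (Character.isUpperCase(word.charAt(0))) {
            // separate sentence-initial capitals from capitals elsewhere
            sb.append(loc == 0 ? "-INITC" : "-CAPS");
        } else if (hasLower) {
            sb.append("-LC");
        }
        if (hasDigit) sb.append("-NUM");
        if (hasDash) sb.append("-DASH");
        // first matching suffix from a small illustrative list
        String lower = word.toLowerCase();
        for (String suffix : new String[] {"ing", "ed", "ly", "s"}) {
            if (lower.length() > suffix.length() + 2 && lower.endsWith(suffix)) {
                sb.append('-').append(suffix);
                break;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(getSignature("Blorfed", 0));
        System.out.println(getSignature("Blorfed", 3));
        System.out.println(getSignature("re-zorp", 5));
    }
}
```

A coarser or finer feature set is the kind of choice an unknown level could switch between: fewer features give larger classes with more reliable counts, more features give sharper but sparser distributions.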
int getSignatureIndex(int wordIndex, int sentencePosition)

void addTagging(boolean seen, IntTaggedWord itw, double count)
seen - Whether the tagging is seen.
itw - The tagging.
count - Its weight.

Counter<IntTaggedWord> unSeenCounter()
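The seen flag on addTagging implies that seen and unseen taggings are counted separately, with unSeenCounter exposing the unseen counts. The sketch below is a hypothetical illustration of that bookkeeping: a `TaggedWord` record stands in for IntTaggedWord, tagged sentences stand in for trees, and it assumes (as one simple heuristic, not necessarily CoreNLP's) that a word counts as unseen the first time it appears during training. It uses a record, so it needs Java 16 or later.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of seen/unseen tagging counts, loosely mirroring
// addTagging(seen, itw, count), train(trees), and unSeenCounter().
public class TaggingCountsSketch {

    record TaggedWord(String word, String tag) {}

    final Map<TaggedWord, Double> seenCounter = new HashMap<>();
    final Map<TaggedWord, Double> unSeenCounter = new HashMap<>();
    private final Set<String> wordsSoFar = new HashSet<>();

    void addTagging(boolean seen, TaggedWord tw, double count) {
        Map<TaggedWord, Double> target = seen ? seenCounter : unSeenCounter;
        target.merge(tw, count, Double::sum); // accumulate weighted counts
    }

    // Assumption: a word is "unseen" the first time it occurs in training.
    void train(List<List<TaggedWord>> sentences) {
        for (List<TaggedWord> sentence : sentences) {
            for (TaggedWord tw : sentence) {
                boolean seen = !wordsSoFar.add(tw.word()); // add() is true only for new words
                addTagging(seen, tw, 1.0);
            }
        }
    }

    Map<TaggedWord, Double> unSeenCounter() {
        return unSeenCounter;
    }

    public static void main(String[] args) {
        TaggingCountsSketch m = new TaggingCountsSketch();
        m.train(List.of(
                List.of(new TaggedWord("the", "DT"), new TaggedWord("dog", "NN")),
                List.of(new TaggedWord("the", "DT"), new TaggedWord("cat", "NN"))));
        System.out.println("unseen: " + m.unSeenCounter());
        System.out.println("seen: " + m.seenCounter);
    }
}
```

Tag counts collected this way on first occurrences are one plausible source for the prior P(T|U) used in the smoothing step, since first occurrences behave somewhat like words the parser will meet unseen at test time.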