public interface UnknownWordModel
extends java.io.Serializable
Modifier and Type | Method and Description |
---|---|
void | addTagging(boolean seen, IntTaggedWord itw, double count): Adds the tagging, weighted by count, to the data structures in this Lexicon. |
Lexicon | getLexicon(): Returns the lexicon used by this unknown word model. |
java.lang.String | getSignature(java.lang.String word, int loc): Returns a String that is the "signature" of the class of a word. |
int | getSignatureIndex(int wordIndex, int sentencePosition, java.lang.String word): Returns an unknown-word signature as an integer index rather than as a String. |
int | getUnknownLevel(): Gets the level of equivalence classing for the model. |
float | score(IntTaggedWord iTW, int loc, double c_Tseen, double total, double smooth, java.lang.String word): Gets the score of this word with this tag (as an IntTaggedWord) at location loc in a sentence. |
double | scoreProbTagGivenWordSignature(IntTaggedWord iTW, int loc, double smooth, java.lang.String word): Calculates P(Tag\|Signature) with Bayesian smoothing, using only P(Tag\|Unknown) as the prior. |
Counter<IntTaggedWord> | unSeenCounter(): Returns a Counter mapping each IntTaggedWord to how often it has been seen. |
int getUnknownLevel()
Lexicon getLexicon()
float score(IntTaggedWord iTW, int loc, double c_Tseen, double total, double smooth, java.lang.String word)
Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words, to change their probability distribution when sentence-initial. Now, a negative value
c_Tseen - Total count of this tag (on seen words) in training
total - Total count of word tokens in training
smooth - Weighting on the prior P(T|U) in the estimate
word - The word itself; useful so we don't look it up in the index
double scoreProbTagGivenWordSignature(IntTaggedWord iTW, int loc, double smooth, java.lang.String word)
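A minimal sketch of how score and scoreProbTagGivenWordSignature might fit together, resting on two assumptions this page does not state: that P(T|S) is smoothed toward the prior P(T|U) in the additive form (c(T,S) + smooth · P(T|U)) / (c(S) + smooth), and that score then applies Bayes' rule, P(W|T) = P(T|S) · P(W) / P(T), with P(W) = 1/total and P(T) = c_Tseen/total, returning a log probability. Plain String-keyed maps stand in for IntTaggedWord and Counter; the class UnknownWordScoreSketch and its observe and setTagGivenUnknown helpers are hypothetical, and loc and word are dropped because the sketch does not use them.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: estimates P(T|S) smoothed toward P(T|U), then turns it
 * into log P(W|T) with Bayes' rule. String keys stand in for IntTaggedWord.
 */
class UnknownWordScoreSketch {
  // c(T,S): weighted count of tag T on training tokens with signature S
  // (tag and signature joined with a space as a simple composite key)
  private final Map<String, Double> tagSigCounts = new HashMap<>();
  // c(S): weighted count of signature S
  private final Map<String, Double> sigCounts = new HashMap<>();
  // P(T|U): prior probability of tag T for an unknown word
  private final Map<String, Double> tagGivenUnknown = new HashMap<>();

  void setTagGivenUnknown(String tag, double p) {
    tagGivenUnknown.put(tag, p);
  }

  /** Records `count` occurrences of tag T on a token whose signature is S. */
  void observe(String tag, String sig, double count) {
    tagSigCounts.merge(tag + " " + sig, count, Double::sum);
    sigCounts.merge(sig, count, Double::sum);
  }

  /** P(T|S) = (c(T,S) + smooth * P(T|U)) / (c(S) + smooth). */
  double scoreProbTagGivenWordSignature(String tag, String sig, double smooth) {
    double cTS = tagSigCounts.getOrDefault(tag + " " + sig, 0.0);
    double cS = sigCounts.getOrDefault(sig, 0.0);
    double pTU = tagGivenUnknown.getOrDefault(tag, 0.0);
    return (cTS + smooth * pTU) / (cS + smooth);
  }

  /** log P(W|T) = log(P(T|S) * P(W) / P(T)), with P(W) = 1/total and P(T) = c_Tseen/total. */
  float score(String tag, String sig, double cTseen, double total, double smooth) {
    double pTS = scoreProbTagGivenWordSignature(tag, sig, smooth);
    double pW = 1.0 / total;
    double pT = cTseen / total;
    return (float) Math.log(pTS * pW / pT);
  }
}
```

With smooth = 0 the estimate reduces to the raw relative frequency c(T,S)/c(S); as smooth grows it moves toward P(T|U), which keeps rarely observed signatures from getting extreme estimates.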
java.lang.String getSignature(java.lang.String word, int loc)
Parameters:
word - The word to make a signature for
loc - Its position in the sentence (mainly so sentence-initial capitalized words can be treated differently)
int getSignatureIndex(int wordIndex, int sentencePosition, java.lang.String word)
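One way to picture getSignature and getSignatureIndex, sketched under stated assumptions: the signature classes below ("UNK" plus capitalization, digit, hyphen and one-letter suffix markers) are a simplified illustration rather than the classes any real implementation uses; the constructor's unknownLevel argument is a hypothetical stand-in for the equivalence-classing level that getUnknownLevel reports, with higher levels giving finer classes; and getSignatureIndex is assumed to intern each signature string into a dense integer index. SignatureSketch and its signatureAt helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical signature scheme; higher unknownLevel values give finer classes. */
class SignatureSketch {
  private final int unknownLevel;
  private final Map<String, Integer> sigToIndex = new HashMap<>();
  private final List<String> indexToSig = new ArrayList<>();

  SignatureSketch(int unknownLevel) {
    this.unknownLevel = unknownLevel;
  }

  int getUnknownLevel() {
    return unknownLevel;
  }

  String getSignature(String word, int loc) {
    StringBuilder sig = new StringBuilder("UNK");
    if (unknownLevel == 0 || word.isEmpty()) {
      return sig.toString(); // coarsest level: one class for every unknown word
    }
    char first = word.charAt(0);
    if (Character.isUpperCase(first)) {
      // A capital at position 0 may only mark the sentence start, so keep it apart.
      sig.append(loc == 0 ? "-INITC" : "-CAPS");
    } else if (Character.isLowerCase(first)) {
      sig.append("-LC");
    }
    if (word.chars().anyMatch(Character::isDigit)) {
      sig.append("-NUM");
    }
    if (word.indexOf('-') >= 0) {
      sig.append("-DASH");
    }
    char last = word.charAt(word.length() - 1);
    if (unknownLevel >= 2 && Character.isLetter(last)) {
      sig.append('-').append(Character.toLowerCase(last)); // crude one-letter suffix
    }
    return sig.toString();
  }

  /** Interns the signature into a dense index; wordIndex is unused in this sketch. */
  int getSignatureIndex(int wordIndex, int sentencePosition, String word) {
    String sig = getSignature(word, sentencePosition);
    Integer index = sigToIndex.get(sig);
    if (index == null) {
      index = indexToSig.size();
      sigToIndex.put(sig, index);
      indexToSig.add(sig);
    }
    return index;
  }

  /** Maps an index back to its signature string. */
  String signatureAt(int index) {
    return indexToSig.get(index);
  }
}
```

Because each signature is interned once, its integer index stays stable for the lifetime of the object, so counts can be keyed by int instead of by String.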
void addTagging(boolean seen, IntTaggedWord itw, double count)
Parameters:
seen - Whether the tagging is seen
itw - The tagging
count - Its weight
Counter<IntTaggedWord> unSeenCounter()
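A minimal sketch of addTagging and unSeenCounter, assuming (as the name unSeenCounter suggests, though this page does not say so) that the model accumulates only unseen taggings and leaves seen ones to the main lexicon. The Tagging class is a hypothetical stand-in for IntTaggedWord, a HashMap plays the role of Counter, and UnseenTaggingCounts is hypothetical; storing a signature string in the word slot is likewise an illustrative choice.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for IntTaggedWord: an immutable (word, tag) pair. */
final class Tagging {
  final String word;
  final String tag;

  Tagging(String word, String tag) {
    this.word = word;
    this.tag = tag;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof Tagging)) {
      return false;
    }
    Tagging t = (Tagging) o;
    return word.equals(t.word) && tag.equals(t.tag);
  }

  @Override
  public int hashCode() {
    return Objects.hash(word, tag);
  }
}

/** Hypothetical sketch of addTagging/unSeenCounter using a plain Map as the counter. */
class UnseenTaggingCounts {
  private final Map<Tagging, Double> unSeen = new HashMap<>();

  void addTagging(boolean seen, Tagging itw, double count) {
    if (seen) {
      // Assumption: taggings of seen words belong to the main lexicon, not this model.
      return;
    }
    unSeen.merge(itw, count, Double::sum); // accumulate the weight of each unseen tagging
  }

  /** Read-only view of the accumulated weights. */
  Map<Tagging, Double> unSeenCounter() {
    return Collections.unmodifiableMap(unSeen);
  }
}
```

Using a weighted count rather than an integer lets fractional counts (for example from a partially trusted training source) be added through the same call.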