Class ArabicUnknownWordModel

    • Field Detail

      • smartMutation

        protected final boolean smartMutation
      • unknownSuffixSize

        protected final int unknownSuffixSize
      • unknownPrefixSize

        protected final int unknownPrefixSize
    • Method Detail

      • score

        public float score(IntTaggedWord iTW,
                           int loc,
                           double c_Tseen,
                           double total,
                           double smooth,
                           String word)
        Description copied from class: BaseUnknownWordModel
        Currently we don't consider loc or the other parameters in determining score in the default implementation; only English uses them.
        Specified by:
        score in interface UnknownWordModel
        score in class BaseUnknownWordModel
        Parameters:
        iTW - An IntTaggedWord pairing a word and POS tag
        loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial. Now, a negative value
        c_Tseen - Total count of this tag (on seen words) in training
        total - Total count of word tokens in training
        smooth - Weighting on prior P(T|U) in estimate
        word - The word itself; useful so we don't look it up in the index
        Returns:
        A float-valued score, usually - log P(word|tag)
      • getSignature

        public String getSignature(String word,
                                   int loc)
        Unknown word levels 6-9 were added for Arabic. Level 6 looks for the prefix Al- (and knows that Buckwalter transliteration uses various symbols as letters), while level 7 looks only for numbers and the last letter. Level 8 looks for Al-, looks for several useful suffixes, and tracks the first letter of the word. (Note that the first letter seems somewhat more informative than the last letter overall.) Level 9 tries to build on 8 while avoiding some of its perceived flaws: in practice it was using both the first AND last letter.
        Specified by:
        getSignature in interface UnknownWordModel
        getSignature in class BaseUnknownWordModel
        Parameters:
        word - The word to make a signature for
        loc - Its position in the sentence (mainly so sentence-initial capitalized words can be treated differently)
        Returns:
        A String that is its signature (equivalence class)
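        The kind of signature described above for level 8 (Al- prefix, a few suffixes, first letter) might be sketched as follows. This is a minimal illustration, not the library's implementation: the class name ArabicSignatureSketch, the "UNK" tag format, the "NUM" marker, and the suffix list are all assumptions made for the example, and the Buckwalter symbol set shown is only a subset.

```java
// Hypothetical sketch; not part of the Stanford NLP API.
class ArabicSignatureSketch {

    // A subset of the ASCII symbols that Buckwalter transliteration uses as letters or marks.
    private static final String BUCKWALTER_SYMBOLS = "'|>&<}*$~`{_";

    // Illustrative suffixes only (an assumption, not the list used by the library).
    private static final String[] SUFFIXES = {"wn", "yn", "At", "p"};

    // Treats ordinary letters and Buckwalter symbol characters alike as letters.
    static boolean isBuckwalterLetter(char c) {
        return Character.isLetter(c) || BUCKWALTER_SYMBOLS.indexOf(c) >= 0;
    }

    // Builds an equivalence-class string from the prefix, suffix, digits, and first letter.
    static String signature(String word) {
        StringBuilder sb = new StringBuilder("UNK");
        String stem = word;

        // Record and strip the definite-article prefix Al-.
        if (word.length() > 2 && word.startsWith("Al")) {
            sb.append("-Al");
            stem = word.substring(2);
        }

        // Record the first matching suffix, if any.
        for (String suffix : SUFFIXES) {
            if (stem.length() > suffix.length() && stem.endsWith(suffix)) {
                sb.append('-').append(suffix);
                break;
            }
        }

        // Record whether the word contains a digit.
        for (int i = 0; i < word.length(); i++) {
            if (Character.isDigit(word.charAt(i))) {
                sb.append("-NUM");
                break;
            }
        }

        // Track the first letter of what remains after the prefix.
        if (!stem.isEmpty() && isBuckwalterLetter(stem.charAt(0))) {
            sb.append('-').append(stem.charAt(0));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(signature("AlmElmwn")); // UNK-Al-wn-m
        System.out.println(signature("ktAb"));     // UNK-k
        System.out.println(signature("$ms"));      // UNK-$
    }
}
```

        Words that share a prefix, suffix, and first letter fall into the same equivalence class, so statistics gathered for one unseen word can be shared with similar ones. The loc parameter is omitted from the sketch because the description above does not say how the Arabic levels use it.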
      • getUnknownLevel

        public int getUnknownLevel()
        Description copied from interface: UnknownWordModel
        Get the level of equivalence classing for the model. One unknown word model may allow different options to be set; for example, several models of unknown words for a given language could be included in one class. The unknown level can be queried with this method.
        Specified by:
        getUnknownLevel in interface UnknownWordModel
        getUnknownLevel in class BaseUnknownWordModel
        Returns:
        The current level of unknown word equivalence classing

Stanford NLP Group