edu.stanford.nlp.parser.lexparser
Interface UnknownWordModel

All Superinterfaces:
Serializable
All Known Implementing Classes:
BaseUnknownWordModel, EnglishUnknownWordModel

public interface UnknownWordModel
extends Serializable


Method Summary
 Lexicon getLexicon()
          Returns the lexicon used by this unknown word model; the lexicon is used to check whether a word has been seen or is unseen.
 String getSignature(String word, int loc)
          This routine returns a String that is the "signature" of the class of a word.
 int getSignatureIndex(int wordIndex, int sentencePosition)
           
 int getUnknownLevel()
          Get the level of equivalence classing for the model.
 void readData(BufferedReader in)
           
 double score(IntTaggedWord iTW, int loc)
          Get the score of this word with this tag (as an IntTaggedWord) at this loc.
 void setLexicon(Lexicon l)
          Connect the unknown word model to a specific lexicon; often required to set a lexicon prior to using the model.
 void setUnknownLevel(int unknownLevel)
          One unknown word model may allow different options to be set; for example, several models of unknown words for a given language could be included in one class.
 void train(Collection<Tree> trees)
          Trains this unknown word model on the Collection of trees.
 

Method Detail

setUnknownLevel

void setUnknownLevel(int unknownLevel)
One unknown word model may allow different options to be set; for example, several models of unknown words for a given language could be included in one class. The unknown level selects which of those models to use. The effect of the level varies with the implementing class. If a given class includes only one model, setting the unknown level should have no effect.


getUnknownLevel

int getUnknownLevel()
Get the level of equivalence classing for the model.

Returns:
The level of equivalence classing for the model

getLexicon

Lexicon getLexicon()
Returns the lexicon used by this unknown word model; the lexicon is used to check whether a word has been seen or is unseen.

Returns:
The lexicon used by this unknown word model

setLexicon

void setLexicon(Lexicon l)
Connect the unknown word model to a specific lexicon; often required to set a lexicon prior to using the model.

Parameters:
l - The lexicon to connect to this unknown word model
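
The note that a lexicon often has to be set before the model is used can be illustrated with a small guard. This is only a sketch under stated assumptions: SeenWordLookup and LexiconGuardSketch are hypothetical stand-ins written for this example, not types from this package, and the real Lexicon interface is not used here.

```java
import java.util.Set;

/** Hypothetical stand-in for the single lexicon capability this sketch needs. */
interface SeenWordLookup {
    boolean isKnown(String word);
}

/** Hypothetical model that refuses to run until a lexicon is attached. */
final class LexiconGuardSketch {
    private SeenWordLookup lexicon; // null until setLexicon is called

    void setLexicon(SeenWordLookup l) {
        this.lexicon = l;
    }

    /** Returns true when the word was not seen and needs unknown-word handling. */
    boolean needsUnknownHandling(String word) {
        if (lexicon == null) {
            throw new IllegalStateException("setLexicon must be called before use");
        }
        return !lexicon.isKnown(word);
    }

    public static void main(String[] args) {
        Set<String> seen = Set.of("the", "dog"); // assumed training vocabulary
        LexiconGuardSketch model = new LexiconGuardSketch();
        model.setLexicon(seen::contains);
        System.out.println(model.needsUnknownHandling("dog"));
        System.out.println(model.needsUnknownHandling("wug"));
    }
}
```

Failing fast with an exception is one design choice among several; an implementing class could equally treat a missing lexicon as "every word is unseen".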

train

void train(Collection<Tree> trees)
Trains this unknown word model on the Collection of trees.


score

double score(IntTaggedWord iTW,
             int loc)
Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).) Assumes the word is unknown.

Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial.
Returns:
A double-valued score, usually - log P(word | tag)
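
As a rough illustration of such a score, the sketch below estimates P(signature | tag) from counts with add-one smoothing and returns its negative logarithm, following the sign convention as written in the Returns line. Everything here is an assumption made for the example: UnknownScoreSketch, observe, and the use of plain strings instead of an IntTaggedWord are hypothetical, the loc argument is ignored, and the implementing classes may use a different estimator or sign.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not the estimator used by any implementing class. */
final class UnknownScoreSketch {
    private final Map<String, Integer> tagCounts = new HashMap<>();
    private final Map<String, Integer> jointCounts = new HashMap<>();
    private final Set<String> signatures = new HashSet<>();

    /** Records one training observation of a word signature under a tag. */
    void observe(String signature, String tag) {
        tagCounts.merge(tag, 1, Integer::sum);
        jointCounts.merge(key(signature, tag), 1, Integer::sum);
        signatures.add(signature);
    }

    /** Returns -log of the add-one smoothed estimate of P(signature | tag). */
    double score(String signature, String tag) {
        int joint = jointCounts.getOrDefault(key(signature, tag), 0);
        int tagTotal = tagCounts.getOrDefault(tag, 0);
        int classes = signatures.size() + 1; // one extra class for unseen signatures
        double p = (joint + 1.0) / (tagTotal + classes);
        return -Math.log(p);
    }

    private static String key(String signature, String tag) {
        return signature + '\t' + tag;
    }

    public static void main(String[] args) {
        UnknownScoreSketch model = new UnknownScoreSketch();
        model.observe("UNK-s", "NNS");
        model.observe("UNK-s", "NNS");
        model.observe("UNK-s", "NNS");
        model.observe("UNK", "NN");
        System.out.println(model.score("UNK-s", "NNS")); // frequent pairing, lower cost
        System.out.println(model.score("UNK", "NNS"));   // unseen pairing, higher cost
    }
}
```

With a negative-log score, a smaller value means a more probable pairing, so any caller combining these scores needs to know which sign the implementing class actually returns.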

getSignature

String getSignature(String word,
                    int loc)
This routine returns a String that is the "signature" of the class of a word. For example, it might represent whether the word is a number or ends in -s. By convention, the returned strings match the pattern UNK or UNK-.*, which is assumed not to match any real word. Behavior depends on the unknownLevel (-uwm flag) passed in to the class.

Parameters:
word - The word to make a signature for
loc - Its position in the sentence (mainly so sentence-initial capitalized words can be treated differently)
Returns:
A String that is its signature (equivalence class)
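
To make the idea of a signature concrete, here is a minimal sketch of a function that maps a word to an UNK or UNK-.* string using the features mentioned above (capitalization, sentence position, digits, a final -s). It is not the logic of BaseUnknownWordModel or EnglishUnknownWordModel: the class SignatureSketch, the suffix names, and the meaning of each level are invented for illustration, and the level is passed as an argument here rather than held as model state.

```java
import java.util.Locale;

/** Hypothetical sketch; the feature names and levels are illustrative only. */
final class SignatureSketch {

    /** Maps a word to a coarse equivalence-class string of the form UNK or UNK-*. */
    static String signature(String word, int loc, int level) {
        StringBuilder sb = new StringBuilder("UNK");
        if (level <= 0 || word.isEmpty()) {
            return sb.toString(); // level 0: one class for every unknown word
        }
        if (Character.isUpperCase(word.charAt(0))) {
            // A sentence-initial capital is weaker evidence of a proper noun,
            // so it gets its own class.
            sb.append(loc == 0 ? "-INITC" : "-CAPS");
        }
        if (word.chars().anyMatch(Character::isDigit)) {
            sb.append("-NUM");
        }
        if (level >= 2 && word.length() > 2
                && word.toLowerCase(Locale.ROOT).endsWith("s")) {
            sb.append("-s");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(signature("Dogs", 0, 2));  // UNK-INITC-s
        System.out.println(signature("Dogs", 3, 2));  // UNK-CAPS-s
        System.out.println(signature("1980s", 1, 2)); // UNK-NUM-s
        System.out.println(signature("xyzzy", 4, 2)); // UNK
    }
}
```

Because every result starts with UNK, the signatures can share a vocabulary with real words without colliding, which is the point of the naming convention described above.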

getSignatureIndex

int getSignatureIndex(int wordIndex,
                      int sentencePosition)

readData

void readData(BufferedReader in)
              throws IOException
Throws:
IOException


Stanford NLP Group