edu.stanford.nlp.parser.lexparser
Class BaseUnknownWordModel

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
All Implemented Interfaces:
UnknownWordModel, Serializable
Direct Known Subclasses:
EnglishUnknownWordModel

public class BaseUnknownWordModel
extends Object
implements UnknownWordModel, Serializable

An unknown word model for a generic language. It was originally designed for German and has been changed only to remove German-specific numeric features. It models unknown words based on their prefixes and suffixes, as well as capital letters.
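To make the idea of classing unknown words by affixes and capitalization concrete, here is a minimal sketch. The class SignatureSketch, the method signature(String, int), and the "UNK-CAP-..." string format are hypothetical and chosen only for illustration; they are not the features or format that BaseUnknownWordModel actually uses.

```java
import java.util.Locale;

// Hypothetical illustration only; not the Stanford implementation.
final class SignatureSketch {
    private SignatureSketch() {}

    /**
     * Builds a coarse equivalence-class label from the capitalization of the
     * first character and the last suffixLen characters of the word.
     */
    static String signature(String word, int suffixLen) {
        if (word.isEmpty()) {
            return "UNK";
        }
        StringBuilder sb = new StringBuilder("UNK");
        if (Character.isUpperCase(word.charAt(0))) {
            sb.append("-CAP");
        }
        // Clamp the start index so short words contribute their whole form.
        int start = Math.max(0, word.length() - suffixLen);
        sb.append('-').append(word.substring(start).toLowerCase(Locale.ROOT));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(signature("Freiheit", 3)); // prints "UNK-CAP-eit"
        System.out.println(signature("laufen", 2));   // prints "UNK-en"
    }
}
```

Words that share a label are treated as one class, so statistics gathered for seen words can be reused for unseen words with the same label.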

Author:
Roger Levy, Greg Donaker (corrections and modeling improvements), Christopher Manning (generalized and improved what Greg did), Anna Rafferty
See Also:
Serialized Form

Constructor Summary
BaseUnknownWordModel()
           
BaseUnknownWordModel(Options.LexOptions op)
           
 
Method Summary
 Lexicon getLexicon()
          Get the lexicon associated with this unknown word model; usually not used, but might be useful to tell you if a related word is known or unknown, for example.
 String getSignature(String word, int loc)
          Signature for a specific German word; loc parameter is ignored.
 int getSignatureIndex(int wordIndex, int sentencePosition)
           
 int getUnknownLevel()
          Get the level of equivalence classing for the model.
 void readData(BufferedReader in)
          This operation is not supported by this model.
 double score(IntTaggedWord itw)
           
 double score(IntTaggedWord itw, int loc)
          Currently we don't consider loc in determining score.
 double score(TaggedWord tw)
          Calculate the log-prob score of a particular TaggedWord in the unknown word model.
 void setLexicon(Lexicon l)
          Connect the unknown word model to a specific lexicon; often required to set a lexicon prior to using the model.
 void setUnknownLevel(int unknownLevel)
          One unknown word model may allow different options to be set; for example, several models of unknown words for a given language could be included in one class.
 void train(Collection<Tree> trees)
          Trains the end-character-based unknown word model.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BaseUnknownWordModel

public BaseUnknownWordModel()

BaseUnknownWordModel

public BaseUnknownWordModel(Options.LexOptions op)
Method Detail

score

public double score(IntTaggedWord itw,
                    int loc)
Currently we don't consider loc in determining score.

Specified by:
score in interface UnknownWordModel
Parameters:
itw - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial.
Returns:
A double-valued score, usually - log P(word|tag)

score

public double score(IntTaggedWord itw)

score

public double score(TaggedWord tw)
Calculate the log-prob score of a particular TaggedWord in the unknown word model.

Parameters:
tw - the tag->word production in TaggedWord form
Returns:
The log-prob score of a particular TaggedWord.
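As a rough illustration of what a log-probability score of this kind can look like, the sketch below estimates log P(signature | tag) from counts with add-one smoothing. The class LogProbSketch, its methods, and the smoothing scheme are assumptions made for this example; they are not the formula used by BaseUnknownWordModel.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Stanford implementation.
final class LogProbSketch {
    private final Map<String, Integer> tagCounts = new HashMap<>();
    private final Map<String, Integer> pairCounts = new HashMap<>();

    /** Records one occurrence of a word class (signature) under a POS tag. */
    void observe(String tag, String signature) {
        tagCounts.merge(tag, 1, Integer::sum);
        pairCounts.merge(tag + "\t" + signature, 1, Integer::sum);
    }

    /**
     * Returns log P(signature | tag) with add-one smoothing, where
     * numSignatures is the assumed number of distinct signatures.
     */
    double score(String tag, String signature, int numSignatures) {
        if (numSignatures <= 0) {
            throw new IllegalArgumentException("numSignatures must be positive");
        }
        int pair = pairCounts.getOrDefault(tag + "\t" + signature, 0);
        int total = tagCounts.getOrDefault(tag, 0);
        return Math.log((pair + 1.0) / (total + numSignatures));
    }
}
```

Because the value is the logarithm of a probability, it is never positive, and unseen tag and signature pairs still receive a finite score thanks to the smoothing term.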

getSignature

public String getSignature(String word,
                           int loc)
Signature for a specific German word; loc parameter is ignored.

Specified by:
getSignature in interface UnknownWordModel
Parameters:
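word - The word to compute a signature for
loc - The position of the word in the sentence; ignored by this model
Returns:
The signature of the word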

getSignatureIndex

public int getSignatureIndex(int wordIndex,
                             int sentencePosition)
Specified by:
getSignatureIndex in interface UnknownWordModel

train

public void train(Collection<Tree> trees)
Trains the end-character-based unknown word model.

Specified by:
train in interface UnknownWordModel
Parameters:
trees - the collection of trees to be trained over
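The following sketch shows one way that end-character statistics could be gathered from a collection of trees. The Node class is a hypothetical stand-in for a parse tree and is not the Stanford Tree class; the counting scheme (last character of each word, grouped by tag) is an assumption about what "end-character based" might mean, not a description of the actual training code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Stanford implementation.
final class EndCharTrainingSketch {
    /** Minimal stand-in for a parse tree node (assumption, not edu.stanford.nlp.trees.Tree). */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        boolean isPreTerminal() {
            return children.size() == 1 && children.get(0).isLeaf();
        }
    }

    /** Maps each tag to counts of the final character of the words seen with it. */
    final Map<String, Map<Character, Integer>> endCharCounts = new HashMap<>();

    void train(Collection<Node> trees) {
        for (Node root : trees) {
            visit(root);
        }
    }

    private void visit(Node node) {
        if (node.isPreTerminal()) {
            // A preterminal pairs a POS tag (its label) with a word (its only child).
            String word = node.children.get(0).label;
            if (!word.isEmpty()) {
                char last = word.charAt(word.length() - 1);
                endCharCounts.computeIfAbsent(node.label, t -> new HashMap<>())
                             .merge(last, 1, Integer::sum);
            }
            return;
        }
        for (Node child : node.children) {
            visit(child);
        }
    }
}
```

Counts of this shape are the kind of input a smoothed estimate of P(end character | tag) would need at scoring time.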

getLexicon

public Lexicon getLexicon()
Get the lexicon associated with this unknown word model; usually not used, but might be useful to tell you if a related word is known or unknown, for example.

Specified by:
getLexicon in interface UnknownWordModel
Returns:
The lexicon associated with this unknown word model

readData

public void readData(BufferedReader in)
              throws IOException
This operation is not supported by this model.

Specified by:
readData in interface UnknownWordModel
Throws:
IOException

setLexicon

public void setLexicon(Lexicon l)
Description copied from interface: UnknownWordModel
Connect the unknown word model to a specific lexicon; often required to set a lexicon prior to using the model.

Specified by:
setLexicon in interface UnknownWordModel

getUnknownLevel

public int getUnknownLevel()
Description copied from interface: UnknownWordModel
Get the level of equivalence classing for the model.

Specified by:
getUnknownLevel in interface UnknownWordModel
Returns:
The level of equivalence classing for the model

setUnknownLevel

public void setUnknownLevel(int unknownLevel)
Description copied from interface: UnknownWordModel
One unknown word model may allow different options to be set; for example, several models of unknown words for a given language could be included in one class. The unknown level can be used to set the model one would like. Effects of the level will vary based on the implementing class. If a given class only includes one model, setting the unknown level should have no effect.
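The contract described above can be sketched as follows. The LeveledModel interface, both implementing classes, and the meaning attached to each level are hypothetical and exist only to illustrate the two cases: a class that bundles several models and switches between them by level, and a class with a single model where the level has no effect.

```java
// Hypothetical illustration only; not the Stanford implementation.
final class UnknownLevelSketch {
    private UnknownLevelSketch() {}

    interface LeveledModel {
        int getUnknownLevel();
        void setUnknownLevel(int unknownLevel);
        String signature(String word);
    }

    /** Bundles several models; the level selects how fine-grained the classes are. */
    static final class MultiModel implements LeveledModel {
        private int unknownLevel = 0;

        @Override
        public int getUnknownLevel() {
            return unknownLevel;
        }

        @Override
        public void setUnknownLevel(int unknownLevel) {
            this.unknownLevel = unknownLevel;
        }

        @Override
        public String signature(String word) {
            if (unknownLevel <= 0 || word.isEmpty()) {
                return "UNK"; // level 0: all unknown words share one class
            }
            String caps = Character.isUpperCase(word.charAt(0)) ? "UNK-CAP" : "UNK-LC";
            if (unknownLevel == 1) {
                return caps; // level 1: capitalization only
            }
            // level 2 and above: capitalization plus the final character
            return caps + "-" + Character.toLowerCase(word.charAt(word.length() - 1));
        }
    }

    /** Contains only one model, so setting the level changes nothing. */
    static final class SingleModel implements LeveledModel {
        @Override
        public int getUnknownLevel() {
            return 0;
        }

        @Override
        public void setUnknownLevel(int unknownLevel) {
            // Intentionally a no-op: there is only one model to choose from.
        }

        @Override
        public String signature(String word) {
            return "UNK";
        }
    }
}
```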

Specified by:
setUnknownLevel in interface UnknownWordModel


Stanford NLP Group