edu.stanford.nlp.parser.lexparser
Interface Lexicon

All Superinterfaces:
java.io.Serializable
All Known Implementing Classes:
BaseLexicon, ChineseCharacterBasedLexicon, ChineseLexicon, ChineseLexiconAndWordSegmenter, FactoredLexicon, PetrovLexicon

public interface Lexicon
extends java.io.Serializable

An interface for lexicons used by lexparser. Its primary responsibility is to provide the conditional probability P(word|tag), which the score method supplies (usually as a log probability). Inside the lexparser, Strings are interned, and tags and words are usually represented as integers.
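
To illustrate what representing words and tags as integers can look like, here is a minimal sketch of a string-to-integer index. The class ToyIndex and its methods are hypothetical stand-ins written for this example; they are not the numbering classes the parser actually uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical string-to-int index; an illustration only, not a Stanford class. */
class ToyIndex {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> strings = new ArrayList<>();

    /** Returns the id already assigned to s, or assigns the next free id. */
    int indexOf(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = strings.size();
            ids.put(s, id);
            strings.add(s);
        }
        return id;
    }

    /** Maps an id back to its string. */
    String get(int id) {
        return strings.get(id);
    }

    /** True if s has already been assigned an id. */
    boolean contains(String s) {
        return ids.containsKey(s);
    }
}
```

With an index like this, a word and a tag each become an int, and a (word, tag) pair can be stored compactly as two ints.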

Author:
Galen Andrew

Field Summary
static java.lang.String BOUNDARY
           
static java.lang.String BOUNDARY_TAG
           
static java.lang.String UNKNOWN_WORD
           
 
Method Summary
 UnknownWordModel getUnknownWordModel()
           
 boolean isKnown(int word)
          Checks whether a word is in the lexicon.
 boolean isKnown(java.lang.String word)
          Checks whether a word is in the lexicon.
 int numRules()
          Returns the number of rules (tag rewrites as word) in the Lexicon.
 void readData(java.io.BufferedReader in)
          Read the lexicon from the BufferedReader in the format written by writeData.
 java.util.Iterator<IntTaggedWord> ruleIteratorByWord(int word, int loc, java.lang.String featureSpec)
          Get an iterator over all rules (pairs of (word, POS)) for this word.
 float score(IntTaggedWord iTW, int loc)
          Get the score of this word with this tag (as an IntTaggedWord) at this loc.
 void setUnknownWordModel(UnknownWordModel uwm)
           
 void train(java.util.Collection<Tree> trees)
          Trains this lexicon on the Collection of trees.
 void writeData(java.io.Writer w)
          Write the lexicon in human-readable format to the Writer.
 

Field Detail

UNKNOWN_WORD

static final java.lang.String UNKNOWN_WORD
See Also:
Constant Field Values

BOUNDARY

static final java.lang.String BOUNDARY
See Also:
Constant Field Values

BOUNDARY_TAG

static final java.lang.String BOUNDARY_TAG
See Also:
Constant Field Values
Method Detail

isKnown

boolean isKnown(int word)
Checks whether a word is in the lexicon.

Parameters:
word - The word as an int
Returns:
Whether the word is in the lexicon

isKnown

boolean isKnown(java.lang.String word)
Checks whether a word is in the lexicon.

Parameters:
word - The word as a String
Returns:
Whether the word is in the lexicon

ruleIteratorByWord

java.util.Iterator<IntTaggedWord> ruleIteratorByWord(int word,
                                                     int loc,
                                                     java.lang.String featureSpec)
Get an iterator over all rules (pairs of (word, POS)) for this word.

Parameters:
word - The word, represented as an integer in Numberer
loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
featureSpec - Additional word features like morphosyntactic information.
Returns:
An Iterator over a List of IntTaggedWords, which pair the word with possible taggings as integer pairs. (Each can be thought of as a tag -> word rule.)
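
The lookup described above can be sketched with a map from word ids to lists of (word, tag) pairs. ToyRuleTable and its nested WordTag class are hypothetical; WordTag merely stands in for IntTaggedWord. Ignoring loc follows the note about BaseLexicon, while ignoring featureSpec is an assumption made to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical rule table; an illustration only, not a Stanford class. */
class ToyRuleTable {
    /** Hypothetical (word, tag) pair standing in for IntTaggedWord. */
    static final class WordTag {
        final int word;
        final int tag;

        WordTag(int word, int tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    private final Map<Integer, List<WordTag>> rulesByWord = new HashMap<>();

    /** Records that this tag can rewrite as this word. */
    void addRule(int word, int tag) {
        rulesByWord.computeIfAbsent(word, k -> new ArrayList<>())
                   .add(new WordTag(word, tag));
    }

    /** Iterates over all recorded taggings of word; loc and featureSpec are ignored here. */
    Iterator<WordTag> ruleIteratorByWord(int word, int loc, String featureSpec) {
        List<WordTag> rules =
            rulesByWord.getOrDefault(word, Collections.<WordTag>emptyList());
        return rules.iterator();
    }
}
```

In this sketch a word with no recorded rules yields an empty iterator; how a real lexicon treats unseen words depends on its unknown word model.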

numRules

int numRules()
Returns the number of rules (tag rewrites as word) in the Lexicon. This method assumes that the lexicon has been initialized.

Returns:
The number of rules (tag rewrites as word) in the Lexicon.

train

void train(java.util.Collection<Tree> trees)
Trains this lexicon on the Collection of trees.

Parameters:
trees - Trees to train on

score

float score(IntTaggedWord iTW,
            int loc)
Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).)

Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial.
Returns:
A score, usually log P(word|tag)
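
As a rough illustration of a log P(word|tag) score, the sketch below uses an unsmoothed relative-frequency estimate, log(count(word, tag) / count(tag)). ToyLexiconScorer is hypothetical, it ignores loc, and it returns negative infinity for unseen pairs; the lexicons listed above may well use smoothing and an unknown word model instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical scorer using plain relative frequencies; not a Stanford class. */
class ToyLexiconScorer {
    private final Map<Long, Integer> pairCounts = new HashMap<>();
    private final Map<Integer, Integer> tagCounts = new HashMap<>();

    /** Packs a (word, tag) pair of ints into a single long map key. */
    private static long key(int word, int tag) {
        return ((long) word << 32) | (tag & 0xFFFFFFFFL);
    }

    /** Counts one occurrence of word tagged with tag. */
    void observe(int word, int tag) {
        pairCounts.merge(key(word, tag), 1, Integer::sum);
        tagCounts.merge(tag, 1, Integer::sum);
    }

    /** Returns log(count(word, tag) / count(tag)), or negative infinity if unseen. */
    float score(int word, int tag) {
        int pair = pairCounts.getOrDefault(key(word, tag), 0);
        int total = tagCounts.getOrDefault(tag, 0);
        if (pair == 0 || total == 0) {
            return Float.NEGATIVE_INFINITY;
        }
        return (float) Math.log((double) pair / total);
    }
}
```

Working in log space lets a parser add scores rather than multiply small probabilities, which is a common reason for returning a log value.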

writeData

void writeData(java.io.Writer w)
               throws java.io.IOException
Write the lexicon in human-readable format to the Writer. (An optional operation.)

Parameters:
w - The writer to output to
Throws:
java.io.IOException - If an I/O problem occurs

readData

void readData(java.io.BufferedReader in)
              throws java.io.IOException
Read the lexicon from the BufferedReader in the format written by writeData. (An optional operation.)

Parameters:
in - The BufferedReader to read from
Throws:
java.io.IOException - If an I/O problem occurs

getUnknownWordModel

UnknownWordModel getUnknownWordModel()

setUnknownWordModel

void setUnknownWordModel(UnknownWordModel uwm)


Stanford NLP Group