edu.stanford.nlp.parser.lexparser
Interface Lexicon

All Superinterfaces:
Serializable
All Known Implementing Classes:
BaseLexicon, ChineseCharacterBasedLexicon, ChineseLexicon, ChineseLexiconAndWordSegmenter

public interface Lexicon
extends Serializable

An interface for lexicons used by lexparser. Its primary responsibility is to provide the conditional probability P(word|tag), which the score method supplies. Inside lexparser, Strings are interned, and tags and words are usually represented as integers.
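
To illustrate what representing words and tags as integers can look like, here is a minimal sketch of a string-to-int index. ToyIndex and its methods are hypothetical names made up for this example; they are not the parser's own numbering classes, and the real implementation may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of lexparser): gives each distinct String a stable int id.
class ToyIndex {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> strings = new ArrayList<>();

    // Returns the existing id for s, or assigns the next free id.
    int indexOf(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = strings.size();
            ids.put(s, id);
            strings.add(s);
        }
        return id;
    }

    // Reverse lookup from an id to the original String.
    String get(int id) {
        return strings.get(id);
    }

    public static void main(String[] args) {
        ToyIndex words = new ToyIndex();
        int the = words.indexOf("the");
        int dog = words.indexOf("dog");
        System.out.println(the + " " + dog + " " + words.indexOf("the")); // prints "0 1 0"
        System.out.println(words.get(dog)); // prints "dog"
    }
}
```

With such an index, a caller can look a word up once and pass the resulting int to the int-based methods of a lexicon.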

Author:
Galen Andrew

Field Summary
static String BOUNDARY
           
static String BOUNDARY_TAG
           
static String UNKNOWN_WORD
           
 
Method Summary
 boolean isKnown(int word)
          Checks whether a word is in the lexicon.
 boolean isKnown(String word)
          Checks whether a word is in the lexicon.
 void readData(BufferedReader in)
          Read the lexicon from the BufferedReader in the format written by writeData.
 Iterator<IntTaggedWord> ruleIteratorByWord(int word, int loc)
          Get an iterator over all rules (pairs of (word, POS)) for this word.
 float score(IntTaggedWord iTW, int loc)
          Get the score of this word with this tag (as an IntTaggedWord) at this loc.
 void train(Collection<Tree> trees)
          Trains this lexicon on the Collection of trees.
 void writeData(Writer w)
          Write the lexicon in human-readable format to the Writer.
 

Field Detail

UNKNOWN_WORD

static final String UNKNOWN_WORD
See Also:
Constant Field Values

BOUNDARY

static final String BOUNDARY
See Also:
Constant Field Values

BOUNDARY_TAG

static final String BOUNDARY_TAG
See Also:
Constant Field Values
Method Detail

isKnown

boolean isKnown(int word)
Checks whether a word is in the lexicon.

Parameters:
word - The word as an int
Returns:
Whether the word is in the lexicon

isKnown

boolean isKnown(String word)
Checks whether a word is in the lexicon.

Parameters:
word - The word as a String
Returns:
Whether the word is in the lexicon

ruleIteratorByWord

Iterator<IntTaggedWord> ruleIteratorByWord(int word,
                                           int loc)
Get an iterator over all rules (pairs of (word, POS)) for this word.

Parameters:
word - The word, represented as an integer in Numberer
loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
Returns:
An Iterator over a List of IntTaggedWords, which pair the word with its possible taggings as integer pairs. (Each can be thought of as a tag -> word rule.)
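
The following sketch shows one way an iterator over (word, tag) pairs could be backed by a per-word list. ToyTaggedWord and ToyRuleTable are hypothetical stand-ins written for this example, not the parser's IntTaggedWord or BaseLexicon; as with BaseLexicon, the loc argument is accepted but not used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for IntTaggedWord: a (word id, tag id) pair.
final class ToyTaggedWord {
    final int word;
    final int tag;

    ToyTaggedWord(int word, int tag) {
        this.word = word;
        this.tag = tag;
    }
}

// Hypothetical table of the tags each word has been seen with.
class ToyRuleTable {
    private final Map<Integer, List<ToyTaggedWord>> rulesByWord = new HashMap<>();

    // Records that word was seen with tag; a repeated pair is ignored.
    void add(int word, int tag) {
        List<ToyTaggedWord> rules = rulesByWord.computeIfAbsent(word, k -> new ArrayList<>());
        for (ToyTaggedWord r : rules) {
            if (r.tag == tag) {
                return;
            }
        }
        rules.add(new ToyTaggedWord(word, tag));
    }

    // Same shape as ruleIteratorByWord; loc is ignored in this sketch.
    Iterator<ToyTaggedWord> ruleIteratorByWord(int word, int loc) {
        List<ToyTaggedWord> rules = rulesByWord.getOrDefault(word, Collections.emptyList());
        return rules.iterator();
    }

    public static void main(String[] args) {
        ToyRuleTable table = new ToyRuleTable();
        table.add(3, 7);
        table.add(3, 9);
        Iterator<ToyTaggedWord> it = table.ruleIteratorByWord(3, 0);
        while (it.hasNext()) {
            ToyTaggedWord r = it.next();
            System.out.println("word " + r.word + " -> tag " + r.tag);
        }
    }
}
```

A word with no recorded tags yields an empty iterator here; how a real lexicon treats unseen words is up to the implementation.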

train

void train(Collection<Tree> trees)
Trains this lexicon on the Collection of trees.


score

float score(IntTaggedWord iTW,
            int loc)
Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).)

Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial.
Returns:
A float-valued score, usually - log P(word|tag)
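
As a rough illustration of estimating P(word | tag), the sketch below computes an unsmoothed relative-frequency estimate from (word, tag) counts and returns its natural log. ToyLexiconScorer is a hypothetical class written for this example. It ignores loc, does no unknown-word handling, and is not how BaseLexicon computes its score; a real implementation may also use a different sign convention.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scorer (not part of lexparser): relative-frequency estimate of P(word | tag).
class ToyLexiconScorer {
    private final Map<String, Integer> tagCounts = new HashMap<>();
    private final Map<String, Integer> pairCounts = new HashMap<>();

    // Counts one observation of word tagged with tag.
    void observe(String word, String tag) {
        tagCounts.merge(tag, 1, Integer::sum);
        pairCounts.merge(tag + "\t" + word, 1, Integer::sum);
    }

    // Returns log(count(word, tag) / count(tag)), or negative infinity for an unseen pair.
    float logProb(String word, String tag) {
        int pair = pairCounts.getOrDefault(tag + "\t" + word, 0);
        int tagTotal = tagCounts.getOrDefault(tag, 0);
        if (pair == 0 || tagTotal == 0) {
            return Float.NEGATIVE_INFINITY;
        }
        return (float) Math.log((double) pair / tagTotal);
    }

    public static void main(String[] args) {
        ToyLexiconScorer scorer = new ToyLexiconScorer();
        scorer.observe("dog", "NN");
        scorer.observe("cat", "NN");
        scorer.observe("the", "DT");
        System.out.println(scorer.logProb("dog", "NN")); // log(1/2), about -0.693
        System.out.println(scorer.logProb("the", "DT")); // log(1) = 0.0
    }
}
```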

writeData

void writeData(Writer w)
               throws IOException
Write the lexicon in human-readable format to the Writer. (An optional operation.)

Parameters:
w - The writer to output to
Throws:
IOException

readData

void readData(BufferedReader in)
              throws IOException
Read the lexicon from the BufferedReader in the format written by writeData. (An optional operation.)

Parameters:
in - The BufferedReader to read from
Throws:
IOException
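
The contract that readData accepts whatever writeData produced can be checked with a round trip. The sketch below does this for a toy table of counts, using a tab-separated line format that is an assumption of this example and not the format the parser actually writes. ToyLexiconIO and roundTrip are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of lexparser): writes and re-reads a table of counts.
class ToyLexiconIO {
    // Writes one "key<TAB>count" line per entry (assumed format, for illustration only).
    static void writeData(Map<String, Integer> counts, Writer w) throws IOException {
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        w.flush();
    }

    // Reads lines in the format produced by writeData above; malformed lines are skipped.
    static Map<String, Integer> readData(BufferedReader in) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            int cut = line.lastIndexOf('\t');
            if (cut < 0) {
                continue;
            }
            counts.put(line.substring(0, cut), Integer.parseInt(line.substring(cut + 1)));
        }
        return counts;
    }

    // Writes to an in-memory buffer and reads the result back.
    static Map<String, Integer> roundTrip(Map<String, Integer> counts) {
        try {
            StringWriter buffer = new StringWriter();
            writeData(counts, buffer);
            return readData(new BufferedReader(new StringReader(buffer.toString())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("dog\tNN", 2);
        counts.put("the\tDT", 5);
        System.out.println(roundTrip(counts).equals(counts)); // prints "true"
    }
}
```

Because both operations are optional in this interface, a caller should be prepared for an implementation that does not support them.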


Stanford NLP Group