edu.stanford.nlp.parser.lexparser
Class FactoredLexicon

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.BaseLexicon
      extended by edu.stanford.nlp.parser.lexparser.FactoredLexicon
All Implemented Interfaces:
Lexicon, java.io.Serializable

public class FactoredLexicon
extends BaseLexicon

Author:
Spence Green
See Also:
Serialized Form

Field Summary
 
Fields inherited from class edu.stanford.nlp.parser.lexparser.BaseLexicon
DEBUG_LEXICON, DEBUG_LEXICON_SCORE, flexiTag, NULL_ITW, nullTag, nullWord, op, rulesWithWord, seenCounter, smartMutation, smoothInUnknownsThreshold, tagIndex, tags, testOptions, trainOptions, useSignatureForKnownSmoothing, uwModel, uwModelTrainer, uwModelTrainerClass, wordIndex, words
 
Fields inherited from interface edu.stanford.nlp.parser.lexparser.Lexicon
BOUNDARY, BOUNDARY_TAG, UNKNOWN_WORD
 
Constructor Summary
FactoredLexicon(MorphoFeatureSpecification morphoSpec, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)
           
FactoredLexicon(Options op, MorphoFeatureSpecification morphoSpec, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)
           
 
Method Summary
protected  void initRulesWithWord()
          The rule table is indexed by lemmas.
static void main(java.lang.String[] args)
           
 java.util.Iterator<IntTaggedWord> ruleIteratorByWord(int word, int loc, java.lang.String featureSpec)
          The rule table is indexed by lemmas, so isKnown() is slightly trickier.
 float score(IntTaggedWord iTW, int loc, java.lang.String word, java.lang.String featureSpec)
          Get the score of this word with this tag (as an IntTaggedWord) at this location.
 void train(java.util.Collection<Tree> trees, java.util.Collection<Tree> rawTrees)
          This method should populate wordIndex, tagIndex, and morphIndex.
 
Methods inherited from class edu.stanford.nlp.parser.lexparser.BaseLexicon
addAll, addAll, addTagging, evaluateCoverage, examineIntersection, finishTraining, getBaseTag, getUnknownWordModel, incrementTreesRead, initializeTraining, isKnown, isKnown, listToEvents, numRules, printLexStats, readData, ruleIteratorByWord, ruleIteratorByWord, setUnknownWordModel, train, train, train, train, train, trainUnannotated, trainWithExpansion, treeToEvents, tune, writeData
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FactoredLexicon

public FactoredLexicon(MorphoFeatureSpecification morphoSpec,
                       Index<java.lang.String> wordIndex,
                       Index<java.lang.String> tagIndex)

FactoredLexicon

public FactoredLexicon(Options op,
                       MorphoFeatureSpecification morphoSpec,
                       Index<java.lang.String> wordIndex,
                       Index<java.lang.String> tagIndex)
Method Detail

ruleIteratorByWord

public java.util.Iterator<IntTaggedWord> ruleIteratorByWord(int word,
                                                            int loc,
                                                            java.lang.String featureSpec)
The rule table is indexed by lemmas, so isKnown() is slightly trickier.

Specified by:
ruleIteratorByWord in interface Lexicon
Overrides:
ruleIteratorByWord in class BaseLexicon
Parameters:
word - The word (as an int)
loc - Its index in the sentence (usually only relevant for unknown words)
featureSpec - Additional word features like morphosyntactic information.
Returns:
An iterator over the possible taggings of the word

score

public float score(IntTaggedWord iTW,
                   int loc,
                   java.lang.String word,
                   java.lang.String featureSpec)
Description copied from class: BaseLexicon
Get the score of this word with this tag (as an IntTaggedWord) at this location. (Presumably an estimate of P(word | tag).)

Implementation documentation:

Seen:
  c_W       = count(W)
  c_TW      = count(T,W)
  c_T       = count(T)
  c_Tunseen = count(T) among new words in 2nd half
  total        = count(seen words)
  totalUnseen  = count("unseen" words)
  p_T_U  = Pmle(T|"unseen")
  pb_T_W = P(T|W):
    if (c_W > smoothInUnknownsThreshold):  pb_T_W = c_TW/c_W
    else (if not smart mutation):          pb_T_W = Bayes prior smooth[1] with p_T_U
  p_T = Pmle(T)
  p_W = Pmle(W)
  pb_W_T = log(pb_T_W * p_W / p_T)   [Bayes rule]

Note that this doesn't properly reserve probability mass for unknowns.

Unseen:
  c_TS = count(T,Sig|Unseen)
  c_S  = count(Sig)
  c_T  = count(T|Unseen)
  c_U  = totalUnseen above
  p_T_U  = Pmle(T|Unseen)
  pb_T_S = Bayes smooth of Pmle(T|S) with P(T|Unseen)   [smooth[0]]
  pb_W_T = log(P(W|T)) inverted
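The seen-word arithmetic above can be sketched in plain Java. This is an illustrative sketch only: the class name, method names, and the concrete counts below are hypothetical, not actual fields or methods of BaseLexicon; only the two formulas (the Bayes-prior smoothing of P(T|W) and the Bayes-rule inversion to log P(W|T)) follow the documentation.

```java
// Illustrative sketch of the seen-word scoring math documented above.
// Class/method names and the sample counts are hypothetical.
public class LexiconScoreSketch {

  // Bayes-smoothed estimate of P(T|W): interpolate the MLE c_TW/c_W
  // with P(T|"unseen"), using `smooth` as the prior strength
  // (smooth[1] in the documentation above).
  static double smoothedTagGivenWord(double cTW, double cW,
                                     double pTUnseen, double smooth) {
    return (cTW + smooth * pTUnseen) / (cW + smooth);
  }

  // Invert P(T|W) into log P(W|T) via Bayes' rule:
  // P(W|T) = P(T|W) * P(W) / P(T).
  static double logWordGivenTag(double pbTW, double pW, double pT) {
    return Math.log(pbTW * pW / pT);
  }

  public static void main(String[] args) {
    double cTW = 3.0, cW = 4.0;    // hypothetical count(T,W), count(W)
    double pTUnseen = 0.10;        // hypothetical Pmle(T|"unseen")
    double smooth = 1.0;           // hypothetical prior strength
    double pT = 0.05, pW = 0.001;  // hypothetical Pmle(T), Pmle(W)

    double pbTW = smoothedTagGivenWord(cTW, cW, pTUnseen, smooth);
    double score = logWordGivenTag(pbTW, pW, pT);
    System.out.println(pbTW + " " + score);
  }
}
```

With these sample counts the smoothed P(T|W) is (3 + 1*0.1)/(4 + 1) = 0.62, which is then inverted into a log P(W|T) score.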

Specified by:
score in interface Lexicon
Overrides:
score in class BaseLexicon
Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial
word - The word itself; useful so we don't have to look it up in an index
featureSpec - TODO
Returns:
A float score, usually log P(word|tag)

train

public void train(java.util.Collection<Tree> trees,
                  java.util.Collection<Tree> rawTrees)
This method should populate wordIndex, tagIndex, and morphIndex.

Specified by:
train in interface Lexicon
Overrides:
train in class BaseLexicon
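What "populating an index" means can be illustrated with a minimal map-backed index standing in for edu.stanford.nlp.util.Index. This is a toy sketch, not the library's implementation; the class name, method names, and sample tokens are hypothetical. Real training would walk the parse trees and register every observed word, tag, and morphological analysis.

```java
// Toy stand-in for an Index<String>: maps items to stable integer ids
// and back. Not the Stanford NLP implementation; for illustration only.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexSketch {
  private final Map<String, Integer> toId = new HashMap<>();
  private final List<String> items = new ArrayList<>();

  // Add the item if unseen and return its integer id.
  public int addToIndex(String item) {
    Integer id = toId.get(item);
    if (id == null) {
      id = items.size();
      toId.put(item, id);
      items.add(item);
    }
    return id;
  }

  // Recover the item from its id.
  public String get(int id) {
    return items.get(id);
  }

  public static void main(String[] args) {
    IndexSketch wordIndex = new IndexSketch();
    // Hypothetical tokens observed while walking training trees.
    for (String w : new String[] {"the", "cat", "the"}) {
      wordIndex.addToIndex(w);
    }
    System.out.println(wordIndex.addToIndex("cat")); // prints 1
  }
}
```

The same id-assignment pattern would apply to wordIndex, tagIndex, and morphIndex alike: each distinct observation gets a stable integer id that the parser's integer-based data structures can use.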

initRulesWithWord

protected void initRulesWithWord()
The rule table is indexed by lemmas.

Overrides:
initRulesWithWord in class BaseLexicon

main

public static void main(java.lang.String[] args)
Parameters:
args -


Stanford NLP Group