edu.stanford.nlp.parser.lexparser
Class FrenchUnknownWordModel
java.lang.Object
edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
edu.stanford.nlp.parser.lexparser.FrenchUnknownWordModel
- All Implemented Interfaces:
- UnknownWordModel, java.io.Serializable
public class FrenchUnknownWordModel
- extends BaseUnknownWordModel
- See Also:
- Serialized Form
Method Summary
java.lang.String getSignature(java.lang.String word, int loc)
- TODO Can add various signatures, setting the signature via Options.
int getSignatureIndex(int wordIndex, int sentencePosition)
- Returns the index of the signature of the word numbered wordIndex, where
the signature is the String representation of unknown word features.
protected java.util.List<IntTaggedWord> listOfLabeledWordsToEvents(java.util.List<LabeledWord> taggedWords)
protected java.util.List<IntTaggedWord> listToEvents(java.util.List<TaggedWord> taggedWords)
float score(IntTaggedWord iTW, int loc, double c_Tseen, double total, double smooth)
- Currently we don't consider loc or the other parameters in determining
score in the default implementation; only English uses them.
void train(java.util.Collection<Tree> trees)
- Trains this lexicon on the Collection of trees.
protected java.util.List<IntTaggedWord> treeToEvents(Tree tree)
protected java.util.List<IntTaggedWord> treeToEvents(Tree tree, boolean keepTagsAsLabels)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
smartMutation
protected boolean smartMutation
lastSignatureIndex
protected transient int lastSignatureIndex
- We cache the last signature looked up, because the same one is requested
many times when an unknown word is encountered. (Note that under the
current scheme, an unknown word seen both sentence-initially and
non-initially will be parsed with two different signatures.)
lastSentencePosition
protected transient int lastSentencePosition
lastWordToSignaturize
protected transient int lastWordToSignaturize
unknownSuffixSize
protected int unknownSuffixSize
unknownPrefixSize
protected int unknownPrefixSize
FrenchUnknownWordModel
public FrenchUnknownWordModel(Options.LexOptions op,
Lexicon lex)
train
public void train(java.util.Collection<Tree> trees)
- Trains this lexicon on the Collection of trees.
- Specified by:
train
in interface UnknownWordModel
- Overrides:
train
in class BaseUnknownWordModel
- Parameters:
trees
- the collection of trees to be trained over
treeToEvents
protected java.util.List<IntTaggedWord> treeToEvents(Tree tree,
boolean keepTagsAsLabels)
treeToEvents
protected java.util.List<IntTaggedWord> treeToEvents(Tree tree)
listToEvents
protected java.util.List<IntTaggedWord> listToEvents(java.util.List<TaggedWord> taggedWords)
listOfLabeledWordsToEvents
protected java.util.List<IntTaggedWord> listOfLabeledWordsToEvents(java.util.List<LabeledWord> taggedWords)
score
public float score(IntTaggedWord iTW,
int loc,
double c_Tseen,
double total,
double smooth)
- Description copied from class:
BaseUnknownWordModel
- Currently we don't consider loc or the other parameters in determining
score in the default implementation; only English uses them.
- Specified by:
score
in interface UnknownWordModel
- Overrides:
score
in class BaseUnknownWordModel
- Parameters:
iTW
- An IntTaggedWord pairing a word and POS tag
loc
- The position in the sentence. In the default implementation
this is used only for unknown words to change their
probability distribution when sentence initial. Now,
a negative value
c_Tseen
- Total count of this tag (on seen words) in training
total
- Total count of word tokens in training
smooth
- Weighting on prior P(T|U) in estimate
- Returns:
- A float-valued score, usually - log P(word|tag)
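The parameter descriptions suggest an estimate in which smooth weights the prior P(T|U). This page does not give the formula, so the following is only a minimal sketch assuming a common additive-smoothing form. The class name, method name, and count arguments are hypothetical and are not part of the documented API.

```java
public class SmoothedTagEstimateSketch {

    /**
     * Hypothetical helper (an assumption, not the library's method):
     * estimates P(tag | signature) from counts, backing off to the
     * prior P(tag | unknown) with weight `smooth`.
     */
    public static double tagGivenSignature(double countTagAndSignature,
                                           double countSignature,
                                           double priorTagGivenUnknown,
                                           double smooth) {
        return (countTagAndSignature + smooth * priorTagGivenUnknown)
                / (countSignature + smooth);
    }

    public static void main(String[] args) {
        // Illustrative counts only; they are not taken from any treebank.
        double p = tagGivenSignature(3.0, 10.0, 0.5, 2.0);
        System.out.println("P(T|S) = " + p);
        // Scores are usually reported in log space and narrowed to float.
        System.out.println("log P(T|S) = " + (float) Math.log(p));
    }
}
```

With smooth set to 0 the estimate reduces to the raw relative frequency; larger values pull it toward the prior, which matters most for signatures seen only a few times.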
getSignatureIndex
public int getSignatureIndex(int wordIndex,
int sentencePosition)
- Returns the index of the signature of the word numbered wordIndex, where
the signature is the String representation of unknown word features.
Caches the last signature index returned.
- Specified by:
getSignatureIndex
in interface UnknownWordModel
- Overrides:
getSignatureIndex
in class BaseUnknownWordModel
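The caching behaviour described here can be sketched as a single-entry cache keyed on the word index and sentence position. This is a minimal illustration, not the class's actual code: the field names mirror lastWordToSignaturize, lastSentencePosition and lastSignatureIndex from this page, while the class name, the computeIndex function, and the computations counter are assumptions made for the example.

```java
import java.util.function.IntBinaryOperator;

public class LastSignatureCacheSketch {

    // Single-entry cache: remembers only the most recent lookup.
    private int lastWordToSignaturize = -1;
    private int lastSentencePosition = -1;
    private int lastSignatureIndex = -1;

    // Hypothetical stand-in for the real (expensive) signature computation.
    private final IntBinaryOperator computeIndex;

    // Counts cache misses, for demonstration only.
    private int computations = 0;

    public LastSignatureCacheSketch(IntBinaryOperator computeIndex) {
        this.computeIndex = computeIndex;
    }

    public int getSignatureIndex(int wordIndex, int sentencePosition) {
        // Hit: same word at the same position as the previous call.
        if (wordIndex == lastWordToSignaturize
                && sentencePosition == lastSentencePosition) {
            return lastSignatureIndex;
        }
        // Miss: recompute and overwrite the cached entry.
        computations++;
        int index = computeIndex.applyAsInt(wordIndex, sentencePosition);
        lastWordToSignaturize = wordIndex;
        lastSentencePosition = sentencePosition;
        lastSignatureIndex = index;
        return index;
    }

    public int computations() {
        return computations;
    }

    public static void main(String[] args) {
        LastSignatureCacheSketch cache =
                new LastSignatureCacheSketch((word, pos) -> word * 100 + pos);
        cache.getSignatureIndex(7, 0);
        cache.getSignatureIndex(7, 0); // served from the cache
        System.out.println("computations = " + cache.computations());
    }
}
```

Because the sentence position is part of the key, the same word looked up at a different position is recomputed, which is consistent with a word receiving different signatures sentence-initially and elsewhere.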
getSignature
public java.lang.String getSignature(java.lang.String word,
int loc)
- TODO: Various signatures can be added, with the signature selected via Options.
- Specified by:
getSignature
in interface UnknownWordModel
- Overrides:
getSignature
in class BaseUnknownWordModel
- Parameters:
word
- The word to make a signature for
loc
- Its position in the sentence (mainly so sentence-initial
capitalized words can be treated differently)
- Returns:
- A String that is its signature (equivalence class)
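To illustrate what a signature as an equivalence class might look like, here is a minimal sketch. It is not the French model's actual logic, which this page does not describe: the class name, the "UNK", "-INITC" and "-CAPS" labels, and the suffixSize argument (loosely modelled on the unknownSuffixSize field) are all assumptions made for the example.

```java
import java.util.Locale;

public class SignatureSketch {

    /**
     * Hypothetical signature function: maps a word to a coarse class built
     * from its capitalization, its sentence position, and its final letters.
     */
    public static String getSignature(String word, int loc, int suffixSize) {
        StringBuilder sb = new StringBuilder("UNK");

        // Sentence-initial capitals carry less information than
        // capitals elsewhere, so they get a separate label.
        if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
            sb.append(loc == 0 ? "-INITC" : "-CAPS");
        }

        // Append a lowercased suffix when the word is long enough.
        if (suffixSize > 0 && word.length() > suffixSize) {
            String suffix = word.substring(word.length() - suffixSize);
            sb.append('-').append(suffix.toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(getSignature("Parlons", 0, 3));
        System.out.println(getSignature("Parlons", 4, 3));
        System.out.println(getSignature("parlons", 4, 3));
    }
}
```

All unknown words that share a signature are then treated alike when scoring, so the choice of features controls how finely unknown words are distinguished.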
Stanford NLP Group