edu.stanford.nlp.parser.lexparser
Class FrenchUnknownWordModel
java.lang.Object
edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
edu.stanford.nlp.parser.lexparser.FrenchUnknownWordModel
- All Implemented Interfaces:
- UnknownWordModel, Serializable
public class FrenchUnknownWordModel
- extends BaseUnknownWordModel
- See Also:
- Serialized Form
Fields inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
NULL_ITW, nullTag, nullWord, tagHash, tagIndex, trainOptions, unknown, unknownLevel, unSeenCounter, useFirst, useGT, VERBOSE, wordIndex
Method Summary
String getSignature(String word, int loc)
- TODO Can add various signatures, setting the signature via Options.
int getSignatureIndex(int index, int sentencePosition, String word)
- Returns the index of the signature of the word numbered wordIndex, where the signature is the String representation of unknown word features.
protected List<IntTaggedWord> listOfLabeledWordsToEvents(List<LabeledWord> taggedWords)
protected List<IntTaggedWord> listToEvents(List<TaggedWord> taggedWords)
float score(IntTaggedWord iTW, int loc, double c_Tseen, double total, double smooth, String word)
- Currently we don't consider loc or the other parameters in determining score in the default implementation; only English uses them.
void train(Collection<Tree> trees)
- Trains this lexicon on the Collection of trees.
protected List<IntTaggedWord> treeToEvents(Tree tree)
protected List<IntTaggedWord> treeToEvents(Tree tree, boolean keepTagsAsLabels)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
smartMutation
protected boolean smartMutation
unknownSuffixSize
protected int unknownSuffixSize
unknownPrefixSize
protected int unknownPrefixSize
FrenchUnknownWordModel
public FrenchUnknownWordModel(Options op,
Lexicon lex,
Index<String> wordIndex,
Index<String> tagIndex)
train
public void train(Collection<Tree> trees)
- Trains this lexicon on the Collection of trees.
- Specified by:
train
in interface UnknownWordModel
- Overrides:
train
in class BaseUnknownWordModel
- Parameters:
trees
- the collection of trees to be trained over
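One common way to train an unknown-word model is to count how often each tag co-occurs with each word signature in the training data. The following is a minimal, self-contained sketch of that idea, assuming a toy signature (the last two letters of the lower-cased word) and plain (word, tag) pairs in place of Tree objects. SignatureTagCounter, Tagged, signature, and count are hypothetical names, not part of the Stanford API, and this is not how FrenchUnknownWordModel.train is implemented.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Stanford implementation: counts how often each
// tag co-occurs with each (toy) unknown-word signature in tagged training data.
public final class SignatureTagCounter {
    /** A (word, tag) pair standing in for a preterminal/leaf of a training tree. */
    record Tagged(String word, String tag) {}

    /** Assumed toy signature: "UNK-" plus the last two letters of the lower-cased word. */
    static String signature(String word) {
        String w = word.toLowerCase();
        return "UNK-" + w.substring(Math.max(0, w.length() - 2));
    }

    /** Maps signature -> (tag -> count). */
    static Map<String, Map<String, Integer>> count(List<Tagged> data) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Tagged t : data) {
            counts.computeIfAbsent(signature(t.word()), k -> new HashMap<>())
                  .merge(t.tag(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Tagged> data = List.of(
            new Tagged("rapidement", "ADV"),
            new Tagged("lentement", "ADV"),
            new Tagged("document", "N"));
        // All three words share the signature "UNK-nt".
        System.out.println(count(data).get("UNK-nt"));
    }
}
```

In a real model the counts would typically be restricted to rare words, since those best resemble words never seen in training.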
treeToEvents
protected List<IntTaggedWord> treeToEvents(Tree tree,
boolean keepTagsAsLabels)
treeToEvents
protected List<IntTaggedWord> treeToEvents(Tree tree)
listToEvents
protected List<IntTaggedWord> listToEvents(List<TaggedWord> taggedWords)
listOfLabeledWordsToEvents
protected List<IntTaggedWord> listOfLabeledWordsToEvents(List<LabeledWord> taggedWords)
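The treeToEvents, listToEvents, and listOfLabeledWordsToEvents methods carry no description here; judging only from their signatures, they turn tagged words into integer-coded IntTaggedWord events. The sketch below illustrates that kind of conversion without the Stanford classes. EventSketch, SimpleIndex, IntEvent, idOf, and toEvents are hypothetical stand-ins for Index<String> and IntTaggedWord, not their real APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: mimics, without the Stanford classes, turning
// (word, tag) pairs into integer-coded events in the spirit of IntTaggedWord.
public final class EventSketch {
    /** Assigns each distinct string a stable integer id, in order of first appearance. */
    static final class SimpleIndex {
        private final Map<String, Integer> ids = new HashMap<>();
        private final List<String> items = new ArrayList<>();

        int idOf(String s) {
            Integer id = ids.get(s);
            if (id == null) {
                id = items.size();
                ids.put(s, id);
                items.add(s);
            }
            return id;
        }

        String get(int i) {
            return items.get(i);
        }
    }

    /** An integer-coded (word, tag) event. */
    record IntEvent(int word, int tag) {}

    /** Converts each {word, tag} pair to an IntEvent, growing both indices as needed. */
    static List<IntEvent> toEvents(List<String[]> tagged, SimpleIndex words, SimpleIndex tags) {
        List<IntEvent> events = new ArrayList<>();
        for (String[] wt : tagged) {
            events.add(new IntEvent(words.idOf(wt[0]), tags.idOf(wt[1])));
        }
        return events;
    }

    public static void main(String[] args) {
        SimpleIndex words = new SimpleIndex();
        SimpleIndex tags = new SimpleIndex();
        List<String[]> sentence = List.of(
            new String[] {"le", "DET"},
            new String[] {"chat", "N"},
            new String[] {"le", "DET"});
        System.out.println(toEvents(sentence, words, tags));
        // prints [IntEvent[word=0, tag=0], IntEvent[word=1, tag=1], IntEvent[word=0, tag=0]]
    }
}
```

Working with integer ids rather than strings keeps the count tables compact and makes lookups cheap during parsing.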
score
public float score(IntTaggedWord iTW,
int loc,
double c_Tseen,
double total,
double smooth,
String word)
- Description copied from class:
BaseUnknownWordModel
- Currently we don't consider loc or the other parameters in determining
score in the default implementation; only English uses them.
- Specified by:
score
in interface UnknownWordModel
- Overrides:
score
in class BaseUnknownWordModel
- Parameters:
iTW
- An IntTaggedWord pairing a word and POS tag
loc
- The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial. Now, a negative value
c_Tseen
- Total count of this tag (on seen words) in training
total
- Total count of word tokens in training
smooth
- Weighting on prior P(T|U) in estimate
word
- The word itself; useful so we don't look it up in the index
- Returns:
- A float-valued score, usually - log P(word|tag)
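The parameter descriptions above suggest a smoothed estimate in which the prior P(T|U) is weighted by smooth and seen-word tag counts give P(T). A minimal sketch of one such estimate follows, assuming that the tag counts for the word's signature are available and that an unknown word is treated as a singleton, so P(W) = 1/total. The formula, the class UnknownScoreSketch, and the extra inputs cTagSig, cSig, and pTagUnk are assumptions for illustration; the actual scoring in BaseUnknownWordModel or FrenchUnknownWordModel may differ.

```java
// Hypothetical sketch of a smoothed unknown-word score; an assumption based on
// the parameter descriptions, not the code of BaseUnknownWordModel.
public final class UnknownScoreSketch {
    /**
     * @param cTagSig count of this tag with the word's signature (assumed input)
     * @param cSig    count of the signature over all tags (assumed input)
     * @param pTagUnk prior P(T|U) of the tag among unknown words (assumed input)
     * @param cTSeen  total count of the tag on seen words in training
     * @param total   total count of word tokens in training
     * @param smooth  weight on the prior P(T|U)
     * @return log P(word|tag) under the assumptions above
     */
    static float score(double cTagSig, double cSig, double pTagUnk,
                       double cTSeen, double total, double smooth) {
        // Smoothed P(T|S): interpolate signature counts with the prior P(T|U).
        double pTagGivenSig = (cTagSig + smooth * pTagUnk) / (cSig + smooth);
        double pTag = cTSeen / total;  // P(T) estimated from seen words
        double pWord = 1.0 / total;    // assume the unknown word is a singleton
        // Bayes inversion: P(W|T) = P(T|W) P(W) / P(T), with P(T|S) standing in for P(T|W).
        double pWordGivenTag = pTagGivenSig * pWord / pTag;
        return (float) Math.log(pWordGivenTag);
    }

    public static void main(String[] args) {
        System.out.println(score(3, 10, 0.2, 500, 10000, 1.0));
    }
}
```

When a signature was never observed (cTagSig = cSig = 0), the estimate falls back entirely on the prior P(T|U), which is the role the smooth weighting plays.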
getSignatureIndex
public int getSignatureIndex(int index,
int sentencePosition,
String word)
- Returns the index of the signature of the word numbered wordIndex, where
the signature is the String representation of unknown word features.
- Specified by:
getSignatureIndex
in interface UnknownWordModel
- Overrides:
getSignatureIndex
in class BaseUnknownWordModel
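Conceptually, a signature index combines two steps: compute the word's signature, then look that string up in an index to obtain an integer id. The hypothetical SignatureIndexSketch below shows both steps with a HashMap and a caller-supplied signature function. It omits the real method's int index argument and is not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: compute a word's signature, then intern the
// signature string to get a stable integer id.
public final class SignatureIndexSketch {
    private final Map<String, Integer> ids = new HashMap<>();
    private final BiFunction<String, Integer, String> signatureFn;

    SignatureIndexSketch(BiFunction<String, Integer, String> signatureFn) {
        this.signatureFn = signatureFn;
    }

    /** Returns the id of the signature of word at sentencePosition, adding it if new. */
    int signatureIndex(String word, int sentencePosition) {
        String sig = signatureFn.apply(word, sentencePosition);
        // New signatures get the next free id (0, 1, 2, ...).
        return ids.computeIfAbsent(sig, k -> ids.size());
    }

    public static void main(String[] args) {
        // Toy signature function: only distinguishes sentence-initial words.
        SignatureIndexSketch idx =
            new SignatureIndexSketch((w, pos) -> pos == 0 ? "UNK-INIT" : "UNK");
        System.out.println(idx.signatureIndex("Bonjour", 0)); // prints 0
        System.out.println(idx.signatureIndex("chat", 2));    // prints 1
        System.out.println(idx.signatureIndex("maison", 5));  // prints 1
    }
}
```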
getSignature
public String getSignature(String word,
int loc)
- TODO Can add various signatures, setting the signature via Options.
- Specified by:
getSignature
in interface UnknownWordModel
- Overrides:
getSignature
in class BaseUnknownWordModel
- Parameters:
word
- The word to make a signature for
loc
- Its position in the sentence (mainly so sentence-initial capitalized words can be treated differently)
- Returns:
- A String that is its signature (equivalence class)
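A signature maps an unseen word to an equivalence class of word shapes. The hypothetical FrenchSignatureSketch below builds one from capitalization (split by whether the word is sentence-initial, as the loc description suggests), digits, hyphens, and a lower-cased suffix. The feature set, the class name, and the suffixSize parameter are assumptions for illustration, not what FrenchUnknownWordModel actually computes.

```java
// Hypothetical French-flavoured signature function; the feature set is an
// assumption for illustration, not what FrenchUnknownWordModel computes.
public final class FrenchSignatureSketch {
    static String signature(String word, int loc, int suffixSize) {
        StringBuilder sb = new StringBuilder("UNK");
        if (word.isEmpty()) {
            return sb.toString();
        }
        boolean hasDigit = word.chars().anyMatch(Character::isDigit);
        boolean hasHyphen = word.indexOf('-') >= 0;
        boolean capitalized = Character.isUpperCase(word.charAt(0));
        if (capitalized) {
            // Sentence-initial capitals are marked separately, since every
            // sentence-initial word is capitalized regardless of its class.
            sb.append(loc == 0 ? "-INITC" : "-CAPS");
        }
        if (hasDigit) sb.append("-NUM");
        if (hasHyphen) sb.append("-DASH");
        String lower = word.toLowerCase(java.util.Locale.FRENCH);
        // Endings such as "-ment" or "-aient" are useful cues for French POS.
        if (!hasDigit && lower.length() > suffixSize) {
            sb.append('-').append(lower.substring(lower.length() - suffixSize));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(signature("Lentement", 0, 4)); // prints UNK-INITC-ment
        System.out.println(signature("Grenoble", 3, 2));  // prints UNK-CAPS-le
        System.out.println(signature("1995", 5, 2));      // prints UNK-NUM
    }
}
```

Coarser signatures pool more training evidence per class, while longer suffixes separate classes more finely at the cost of sparser counts.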
Stanford NLP Group