java.lang.Object
  edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
    edu.stanford.nlp.parser.lexparser.ArabicUnknownWordModel
public class ArabicUnknownWordModel
This is a basic unknown word model for Arabic. It supports four different types of feature modeling; see getSignature(String, int).
Implementation note: the contents of this class tend to overlap somewhat with EnglishUnknownWordModel and were originally included in BaseLexicon.
Field Summary | |
---|---|
protected int | lastSentencePosition |
protected int | lastSignatureIndex: We cache the last signature looked up, because the parser asks for the same one many times when it encounters an unknown word. (Note that under the current scheme, an unknown word seen both sentence-initially and non-initially will be parsed with two different signatures.) |
protected int | lastWordToSignaturize |
protected boolean | smartMutation |
protected int | unknownPrefixSize |
protected int | unknownSuffixSize |
Fields inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel |
---|
nullTag, nullWord, tagHash, unknown, unknownLevel, unSeenCounter, useFirst, useGT, VERBOSE |
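A minimal sketch of the one-entry cache described for lastSignatureIndex, assuming signatures are computed from a word and its sentence position and then mapped to dense integer ids. The class SignatureCache, its constructor, and the pluggable signature function are hypothetical illustrations, not the Stanford implementation; its three `last*` fields play the roles of lastWordToSignaturize, lastSentencePosition and lastSignatureIndex.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch of a one-entry signature cache; not the actual
// ArabicUnknownWordModel code.
class SignatureCache {
    private final List<String> words;                         // word index -> word string
    private final BiFunction<String, Integer, String> signer; // (word, position) -> signature
    private final Map<String, Integer> signatureIds = new HashMap<>();

    // Same roles as lastWordToSignaturize, lastSentencePosition, lastSignatureIndex.
    private int lastWord = -1;
    private int lastPosition = -1;
    private int lastIndex = -1;

    int computations = 0; // how many times the signature function actually ran

    SignatureCache(List<String> words, BiFunction<String, Integer, String> signer) {
        this.words = words;
        this.signer = signer;
    }

    int getSignatureIndex(int wordIndex, int sentencePosition) {
        if (wordIndex == lastWord && sentencePosition == lastPosition) {
            return lastIndex; // cache hit: same word at the same position as the last call
        }
        computations++;
        String signature = signer.apply(words.get(wordIndex), sentencePosition);
        // Give each distinct signature a dense integer id the first time it is seen.
        Integer id = signatureIds.get(signature);
        if (id == null) {
            id = signatureIds.size();
            signatureIds.put(signature, id);
        }
        lastWord = wordIndex;
        lastPosition = sentencePosition;
        lastIndex = id;
        return id;
    }
}
```

Repeated lookups for the same word at the same position skip the signature computation. The same word at a different position is recomputed and, with a position-sensitive signature function, can receive a different signature, which is the behavior the lastSignatureIndex note describes.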
Constructor Summary | |
---|---|
ArabicUnknownWordModel(Options.LexOptions op, Lexicon lex) | |
Method Summary | |
---|---|
java.lang.String | getSignature(java.lang.String word, int loc): Returns the signature of word at sentence position loc. Unknown-word levels 6 through 9 were added for Arabic. |
int | getSignatureIndex(int wordIndex, int sentencePosition): Returns the index of the signature of the word numbered wordIndex, where the signature is the String representation of unknown word features. |
int | getUnknownLevel(): Gets the level of equivalence classing for the model. |
protected java.util.List<IntTaggedWord> | listOfLabeledWordsToEvents(java.util.List<LabeledWord> taggedWords) |
protected java.util.List<IntTaggedWord> | listToEvents(java.util.List<TaggedWord> taggedWords) |
float | score(IntTaggedWord iTW, int loc, double c_Tseen, double total, double smooth): Currently, the default implementation does not use loc or the other parameters in determining the score; only English uses them. |
void | setUnknownLevel(int unknownLevel): One unknown word model may allow different options to be set; for example, several models of unknown words for a given language could be included in one class. |
void | train(java.util.Collection<Tree> trees): Trains this lexicon on the Collection of trees. |
void | train(java.util.Collection<Tree> trees, boolean keepTagsAsLabels): Trains this lexicon on the Collection of trees. |
void | train(java.util.Collection<Tree> trees, double weight) |
void | train(java.util.Collection<Tree> trees, double weight, boolean keepTagsAsLabels) |
protected java.util.List<IntTaggedWord> | treeToEvents(Tree tree) |
protected java.util.List<IntTaggedWord> | treeToEvents(Tree tree, boolean keepTagsAsLabels) |
Methods inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel |
---|
addTagging, getLexicon, score, scoreGT, scoreProbTagGivenWordSignature, trainUnknownGT, unSeenCounter |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected boolean smartMutation
protected transient int lastSignatureIndex
protected transient int lastSentencePosition
protected transient int lastWordToSignaturize
protected int unknownSuffixSize
protected int unknownPrefixSize
Constructor Detail |
---|
public ArabicUnknownWordModel(Options.LexOptions op, Lexicon lex)
Method Detail |
---|
public void train(java.util.Collection<Tree> trees)
Specified by: train in interface UnknownWordModel
Overrides: train in class BaseUnknownWordModel
Parameters:
trees - the collection of trees to be trained over
public void train(java.util.Collection<Tree> trees, boolean keepTagsAsLabels)
Parameters:
trees - The trees to build a lexicon from
keepTagsAsLabels - Whether tags should be represented as Labels or Strings in the lexicon
public void train(java.util.Collection<Tree> trees, double weight)
public void train(java.util.Collection<Tree> trees, double weight, boolean keepTagsAsLabels)
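The weight argument of these train overloads scales each tree's contribution. As a hypothetical sketch (the class WeightedSignatureCounts is illustrative and not part of the Stanford API), weighted training of unknown-word statistics amounts to adding weight, rather than 1, to each observed (signature, tag) count, from which a relative-frequency estimate of P(tag | signature) follows:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: accumulating weighted (signature, tag) counts,
// as a weighted train(...) call might do for unknown-word statistics.
class WeightedSignatureCounts {
    private final Map<String, Map<String, Double>> counts = new HashMap<>();
    private final Map<String, Double> signatureTotals = new HashMap<>();

    // Record one observed (signature, tag) event with the given weight.
    void add(String signature, String tag, double weight) {
        counts.computeIfAbsent(signature, s -> new HashMap<>())
              .merge(tag, weight, Double::sum);
        signatureTotals.merge(signature, weight, Double::sum);
    }

    // Weighted count of tag with signature; 0 if never observed.
    double count(String signature, String tag) {
        Map<String, Double> tags = counts.get(signature);
        return tags == null ? 0.0 : tags.getOrDefault(tag, 0.0);
    }

    // Relative-frequency estimate of P(tag | signature); 0 if the signature is unseen.
    double probTagGivenSignature(String signature, String tag) {
        double total = signatureTotals.getOrDefault(signature, 0.0);
        return total == 0.0 ? 0.0 : count(signature, tag) / total;
    }
}
```

With weight 1.0 this reduces to ordinary counting; a weight below 1.0 lets a secondary treebank contribute without dominating the estimates.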
protected java.util.List<IntTaggedWord> treeToEvents(Tree tree, boolean keepTagsAsLabels)
protected java.util.List<IntTaggedWord> treeToEvents(Tree tree)
protected java.util.List<IntTaggedWord> listToEvents(java.util.List<TaggedWord> taggedWords)
protected java.util.List<IntTaggedWord> listOfLabeledWordsToEvents(java.util.List<LabeledWord> taggedWords)
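treeToEvents, listToEvents and listOfLabeledWordsToEvents all return lists of IntTaggedWord, that is, word/tag pairs encoded as integers. The sketch below shows that kind of encoding for plain (word, tag) string lists; IntEvent and EventEncoder are hypothetical stand-ins for IntTaggedWord and the parser's word and tag indices, not their actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for IntTaggedWord: a (word id, tag id) pair.
final class IntEvent {
    final int word;
    final int tag;

    IntEvent(int word, int tag) {
        this.word = word;
        this.tag = tag;
    }
}

// Hypothetical sketch of turning tagged words into integer-coded events.
class EventEncoder {
    private final Map<String, Integer> wordIds = new HashMap<>();
    private final Map<String, Integer> tagIds = new HashMap<>();

    // Return the id for key, assigning the next free id on first sight.
    private static int intern(Map<String, Integer> ids, String key) {
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size();
            ids.put(key, id);
        }
        return id;
    }

    // words.get(i) carries tag tags.get(i); both lists must have equal length.
    List<IntEvent> encode(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags differ in length");
        }
        List<IntEvent> events = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            events.add(new IntEvent(intern(wordIds, words.get(i)), intern(tagIds, tags.get(i))));
        }
        return events;
    }
}
```

Because the indices are shared across calls, the same word or tag always maps to the same integer, which is what makes the events countable.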
public float score(IntTaggedWord iTW, int loc, double c_Tseen, double total, double smooth)
Description copied from class: BaseUnknownWordModel
Specified by: score in interface UnknownWordModel
Overrides: score in class BaseUnknownWordModel
Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial. Now, a negative value
c_Tseen - Total count of this tag (on seen words) in training
total - Total count of word tokens in training
smooth - Weighting on prior P(T|U) in estimate
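The smooth parameter weights a prior P(T|U) in the estimate. One standard way to apply such a weight, given here as an assumption rather than the exact computation in BaseUnknownWordModel, is to treat smooth as pseudo-counts that interpolate a signature's observed tag counts with the prior, and then to divide by P(T) = c_Tseen / total, which by Bayes' rule gives P(W|T) up to the tag-independent factor P(W). The class and method names below are hypothetical.

```java
// Hypothetical sketch of prior-smoothed scoring for an unknown word's signature;
// not the exact formula used by BaseUnknownWordModel.
final class SmoothedUnknownScore {
    private SmoothedUnknownScore() {}

    /**
     * Smoothed P(T|S): observed counts interpolated with the prior P(T|U).
     *
     * @param cTagSig    weighted count of tag T with signature S
     * @param cSig       total weighted count of signature S
     * @param pTagUnseen prior P(T|U) for tag T on unseen words
     * @param smooth     weight on the prior, acting like pseudo-counts
     */
    static double smoothedTagGivenSignature(double cTagSig, double cSig,
                                            double pTagUnseen, double smooth) {
        return (cTagSig + smooth * pTagUnseen) / (cSig + smooth);
    }

    /**
     * log(P(T|S) / P(T)) with P(T) = cTseen / total; this equals
     * log P(W|T) minus the constant log P(W).
     */
    static double logScore(double pTagGivenSig, double cTseen, double total) {
        double pTag = cTseen / total;
        return Math.log(pTagGivenSig) - Math.log(pTag);
    }
}
```

When a signature was never observed (cSig = 0), the smoothed estimate falls back exactly to the prior P(T|U); larger smooth values pull observed estimates toward it.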
public int getSignatureIndex(int wordIndex, int sentencePosition)
Specified by: getSignatureIndex in interface UnknownWordModel
Overrides: getSignatureIndex in class BaseUnknownWordModel
public java.lang.String getSignature(java.lang.String word, int loc)
Specified by: getSignature in interface UnknownWordModel
Overrides: getSignature in class BaseUnknownWordModel
Parameters:
word - The word to make a signature for
loc - Its position in the sentence (mainly so sentence-initial capitalized words can be treated differently)
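getSignature turns a word and its position into a class label for unknown-word statistics. The following is a hypothetical affix-based signature, assuming (only from the field names unknownPrefixSize and unknownSuffixSize) that leading and trailing characters are useful features; it does not reproduce the class's actual signature levels. The class AffixSignature is illustrative, and the example words use Buckwalter-style ASCII transliteration.

```java
// Hypothetical affix-based signature; not ArabicUnknownWordModel's real scheme.
final class AffixSignature {
    private final int prefixSize;
    private final int suffixSize;

    AffixSignature(int prefixSize, int suffixSize) {
        this.prefixSize = prefixSize;
        this.suffixSize = suffixSize;
    }

    String signature(String word, int loc) {
        StringBuilder sb = new StringBuilder("UNK");
        if (loc == 0) {
            sb.append("-INIT"); // sentence-initial words get their own class
        }
        if (word.chars().anyMatch(Character::isDigit)) {
            sb.append("-NUM"); // any digit marks a numeric-looking token
        }
        int n = word.length();
        // Only add an affix when the word is longer than the affix itself.
        if (prefixSize > 0 && n > prefixSize) {
            sb.append("-p").append(word, 0, prefixSize);
        }
        if (suffixSize > 0 && n > suffixSize) {
            sb.append("-s").append(word, n - suffixSize, n);
        }
        return sb.toString();
    }
}
```

For example, `new AffixSignature(2, 1).signature("Alktb", 0)` yields "UNK-INIT-pAl-sb", so a definite-article prefix and the final letter become shared evidence across unseen words.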
public void setUnknownLevel(int unknownLevel)
Description copied from interface: UnknownWordModel
Specified by: setUnknownLevel in interface UnknownWordModel
Overrides: setUnknownLevel in class BaseUnknownWordModel
Parameters:
unknownLevel - Provides a choice between different unknown word processing schemes
public int getUnknownLevel()
Description copied from interface: UnknownWordModel
Specified by: getUnknownLevel in interface UnknownWordModel
Overrides: getUnknownLevel in class BaseUnknownWordModel
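setUnknownLevel and getUnknownLevel let one class host several unknown-word schemes. A hypothetical sketch of that design maps each level to its own signature function and rejects unsupported levels; the levels 0 and 1 and both schemes are invented for illustration and do not correspond to the class's real levels.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: several unknown-word schemes selected by level.
class LeveledSignatures {
    private final Map<Integer, BiFunction<String, Integer, String>> schemes = new HashMap<>();
    private int unknownLevel;

    LeveledSignatures() {
        // Level 0: a single class for every unknown word.
        schemes.put(0, (word, loc) -> "UNK");
        // Level 1: additionally distinguish the sentence-initial position.
        schemes.put(1, (word, loc) -> loc == 0 ? "UNK-INIT" : "UNK");
        unknownLevel = 1;
    }

    void setUnknownLevel(int level) {
        if (!schemes.containsKey(level)) {
            throw new IllegalArgumentException("unsupported unknown level: " + level);
        }
        unknownLevel = level;
    }

    int getUnknownLevel() {
        return unknownLevel;
    }

    String getSignature(String word, int loc) {
        return schemes.get(unknownLevel).apply(word, loc);
    }
}
```

Keeping the schemes in a map keeps getSignature free of level-specific branching; the real class may well organize its levels differently.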