java.lang.Object
edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModel
public class EnglishUnknownWordModel extends BaseUnknownWordModel
This is a basic unknown word model for English. It supports 5 different types of feature modeling; see getSignature(String, int).
Implementation note: the contents of this class tend to overlap somewhat with ArabicUnknownWordModel and were originally included in BaseLexicon.
Field Summary | |
---|---|
protected int |
lastSentencePosition
|
protected int |
lastSignatureIndex
We cache the last signature looked up, because the same signature is requested many times when an unknown word is encountered. (Note that under the current scheme, a single unknown word seen both sentence-initially and non-initially will be parsed with two different signatures.) |
protected int |
lastWordToSignaturize
|
protected static short |
nullTag
|
protected static int |
nullWord
|
ClassicCounter<IntTaggedWord> |
seenCounter
Records the number of times word/tag pair was seen in training data. |
protected boolean |
smartMutation
|
protected Set<IntTaggedWord> |
tags
Set of all tags as IntTaggedWord. |
protected int |
unknownLevel
The type of equivalence classing done in getSignature. |
protected int |
unknownPrefixSize
|
protected int |
unknownSuffixSize
|
protected ClassicCounter<IntTaggedWord> |
unSeenCounter
Has counts for taggings in terms of unseen signatures. |
protected Set<IntTaggedWord> |
words
|
Constructor Summary | |
---|---|
EnglishUnknownWordModel()
|
|
EnglishUnknownWordModel(Options.LexOptions op)
|
Method Summary | |
---|---|
protected void |
addTagging(boolean seen,
IntTaggedWord itw,
double count)
Adds the tagging with count to the data structures in this Lexicon. |
String |
getSignature(String word,
int loc)
This routine returns a String that is the "signature" of the class of a word. |
int |
getSignatureIndex(int wordIndex,
int sentencePosition)
Returns the index of the signature of the word numbered wordIndex, where the signature is the String representation of unknown word features. |
protected List<IntTaggedWord> |
listOfLabeledWordsToEvents(List<LabeledWord> taggedWords)
|
protected List<IntTaggedWord> |
listToEvents(List<TaggedWord> taggedWords)
|
void |
readData(BufferedReader in)
Populates data in this Lexicon from the character stream given by the BufferedReader in. |
double |
score(IntTaggedWord iTW,
int loc)
Currently we don't consider loc in determining score. |
void |
train(Collection<Tree> trees)
Trains this lexicon on the Collection of trees. |
void |
train(Collection<Tree> trees,
boolean keepTagsAsLabels)
Trains this lexicon on the Collection of trees. |
void |
train(Collection<Tree> trees,
double weight)
|
void |
train(Collection<Tree> trees,
double weight,
boolean keepTagsAsLabels)
|
protected List<IntTaggedWord> |
treeToEvents(Tree tree)
|
protected List<IntTaggedWord> |
treeToEvents(Tree tree,
boolean keepTagsAsLabels)
|
void |
tune(Collection<Tree> trees)
|
Methods inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel |
---|
getLexicon, getUnknownLevel, score, score, setLexicon, setUnknownLevel |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected boolean smartMutation
protected transient Set<IntTaggedWord> tags
protected transient Set<IntTaggedWord> words
protected transient int lastSignatureIndex
protected transient int lastSentencePosition
protected transient int lastWordToSignaturize
protected static final int nullWord
protected static final short nullTag
protected int unknownLevel
protected int unknownSuffixSize
protected int unknownPrefixSize
public ClassicCounter<IntTaggedWord> seenCounter
protected ClassicCounter<IntTaggedWord> unSeenCounter
Constructor Detail |
---|
public EnglishUnknownWordModel()
public EnglishUnknownWordModel(Options.LexOptions op)
Method Detail |
---|
public void train(Collection<Tree> trees)
Specified by: train in interface UnknownWordModel
Overrides: train in class BaseUnknownWordModel
Parameters:
trees - the collection of trees to be trained over
public void train(Collection<Tree> trees, boolean keepTagsAsLabels)
public void train(Collection<Tree> trees, double weight)
public void train(Collection<Tree> trees, double weight, boolean keepTagsAsLabels)
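This page does not say what the weight argument of the train overloads does. One plausible reading, offered here only as an assumption, is that each word/tag event contributes weight rather than 1 to a counter such as seenCounter. The sketch below illustrates that idea with plain JDK types; WeightedTagCounts and its methods are hypothetical names and are not part of the Stanford API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of weighted word/tag counting; not the Stanford implementation. */
class WeightedTagCounts {
    // Maps "word/tag" to the accumulated (possibly fractional) count.
    private final Map<String, Double> counts = new HashMap<>();

    /** Adds one observation of the word/tag pair, scaled by weight (assumed semantics). */
    void add(String word, String tag, double weight) {
        counts.merge(word + "/" + tag, weight, Double::sum);
    }

    /** Returns the accumulated count, or 0.0 if the pair was never added. */
    double count(String word, String tag) {
        return counts.getOrDefault(word + "/" + tag, 0.0);
    }
}
```

Under this reading, adding a pair once with weight 1.0 and once with weight 0.5 would leave it with a count of 1.5.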
public void tune(Collection<Tree> trees)
protected List<IntTaggedWord> treeToEvents(Tree tree, boolean keepTagsAsLabels)
protected List<IntTaggedWord> treeToEvents(Tree tree)
protected List<IntTaggedWord> listToEvents(List<TaggedWord> taggedWords)
protected List<IntTaggedWord> listOfLabeledWordsToEvents(List<LabeledWord> taggedWords)
public double score(IntTaggedWord iTW, int loc)
Description copied from class: BaseUnknownWordModel
Currently we don't consider loc in determining score.
Specified by: score in interface UnknownWordModel
Overrides: score in class BaseUnknownWordModel
Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence-initial.
public int getSignatureIndex(int wordIndex, int sentencePosition)
Specified by: getSignatureIndex in interface UnknownWordModel
Overrides: getSignatureIndex in class BaseUnknownWordModel
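The fields lastWordToSignaturize, lastSentencePosition and lastSignatureIndex suggest a one-entry cache: if the same word index and sentence position are requested again, the previously computed signature index can be returned without recomputation. The following is a minimal sketch of that pattern under this assumption. LastLookupCache, its members, and the compute function are hypothetical and do not reproduce the actual implementation.

```java
import java.util.function.IntBinaryOperator;

/** Hypothetical one-entry cache keyed on (wordIndex, sentencePosition). */
class LastLookupCache {
    private boolean hasCached = false;   // nothing cached until the first lookup
    private int lastWord;
    private int lastPosition;
    private int lastResult;
    private int computeCalls = 0;        // exposed only to make the caching observable

    private final IntBinaryOperator compute;

    LastLookupCache(IntBinaryOperator compute) {
        this.compute = compute;
    }

    /** Returns the cached value when both keys match the previous call. */
    int get(int wordIndex, int sentencePosition) {
        if (hasCached && wordIndex == lastWord && sentencePosition == lastPosition) {
            return lastResult;
        }
        computeCalls++;
        lastResult = compute.applyAsInt(wordIndex, sentencePosition);
        lastWord = wordIndex;
        lastPosition = sentencePosition;
        hasCached = true;
        return lastResult;
    }

    int computeCalls() {
        return computeCalls;
    }
}
```

Because the position is part of the key, the same word looked up at a different sentence position misses the cache, which is consistent with the note that one word can receive two different signatures.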
public String getSignature(String word, int loc)
Specified by: getSignature in interface UnknownWordModel
Overrides: getSignature in class BaseUnknownWordModel
Parameters:
word - The word to make a signature for
loc - Its position in the sentence (mainly so sentence-initial capitalized words can be treated differently)
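To make the idea of a word "signature" concrete, the sketch below maps a word to a coarse equivalence class built from surface features, and treats a capitalized word differently when loc is 0. It is an illustration only: the feature set, the label strings such as "-INITC", and the class name SignatureSketch are all invented for this example and do not correspond to any of the unknownLevel schemes in this class.

```java
/** Hypothetical signature function; not one of the real unknownLevel schemes. */
class SignatureSketch {
    static String signature(String word, int loc) {
        StringBuilder sb = new StringBuilder("UNK");
        if (word.isEmpty()) {
            return sb.toString();
        }
        boolean hasDigit = false;
        boolean hasDash = false;
        boolean hasLower = false;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isDigit(c)) {
                hasDigit = true;
            } else if (c == '-') {
                hasDash = true;
            } else if (Character.isLowerCase(c)) {
                hasLower = true;
            }
        }
        // Capitalization is less informative at the start of a sentence,
        // so a sentence-initial capital gets its own label.
        if (Character.isUpperCase(word.charAt(0))) {
            sb.append(loc == 0 ? "-INITC" : "-CAPS");
        } else if (hasLower) {
            sb.append("-LC");
        }
        if (hasDigit) {
            sb.append("-NUM");
        }
        if (hasDash) {
            sb.append("-DASH");
        }
        // A crude suffix feature: a final "s" on words longer than three characters.
        if (word.length() > 3 && word.endsWith("s")) {
            sb.append("-s");
        }
        return sb.toString();
    }
}
```

With these made-up rules, signature("Dogs", 0) yields "UNK-INITC-s" while signature("Dogs", 2) yields "UNK-CAPS-s", showing how one word can fall into two classes depending on its position.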
public void readData(BufferedReader in) throws IOException
Specified by: readData in interface UnknownWordModel
Overrides: readData in class BaseUnknownWordModel
Throws:
IOException
protected void addTagging(boolean seen, IntTaggedWord itw, double count)