public interface Lexicon
An interface for lexicons used by the lexparser. Its primary responsibility is to provide the conditional probability P(word|tag), which the score method supplies. Inside the lexparser, Strings are interned, and tags and words are usually represented as integers.
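To make the P(word|tag) contract concrete, here is a minimal sketch of a relative-frequency estimate over integer-indexed words and tags. The class `ToyLexiconScore` and its `observe` method are hypothetical and not part of the lexparser. The sketch also assumes that a score is a log probability, which this page does not state.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy estimator for P(word|tag); a hypothetical illustration, not the lexparser implementation. */
final class ToyLexiconScore {
    // Words and tags are mapped to integer ids, mirroring the integer representation described above.
    private final Map<String, Integer> wordIndex = new HashMap<>();
    private final Map<String, Integer> tagIndex = new HashMap<>();
    private final Map<Long, Double> pairCounts = new HashMap<>();
    private final Map<Integer, Double> tagCounts = new HashMap<>();

    private static int indexOf(Map<String, Integer> index, String s) {
        Integer id = index.get(s);
        if (id == null) {
            id = index.size();
            index.put(s, id);
        }
        return id;
    }

    // Packs a (word, tag) id pair into a single map key.
    private static long key(int word, int tag) {
        return ((long) word << 32) | (tag & 0xffffffffL);
    }

    /** Records one weighted observation of a word with a tag. */
    void observe(String word, String tag, double weight) {
        int w = indexOf(wordIndex, word);
        int t = indexOf(tagIndex, tag);
        pairCounts.merge(key(w, t), weight, Double::sum);
        tagCounts.merge(t, weight, Double::sum);
    }

    /** Returns log(count(word, tag) / count(tag)), or negative infinity for an unseen pair. */
    float score(String word, String tag) {
        Integer w = wordIndex.get(word);
        Integer t = tagIndex.get(tag);
        if (w == null || t == null) {
            return Float.NEGATIVE_INFINITY;
        }
        Double pair = pairCounts.get(key(w, t));
        if (pair == null) {
            return Float.NEGATIVE_INFINITY;
        }
        return (float) Math.log(pair / tagCounts.get(t));
    }

    public static void main(String[] args) {
        ToyLexiconScore lex = new ToyLexiconScore();
        lex.observe("dog", "NN", 1.0);
        lex.observe("cat", "NN", 1.0);
        System.out.println(Math.exp(lex.score("dog", "NN")));
    }
}
```

A real lexicon would also smooth these counts and delegate unseen words to an unknown word model; the sketch omits both.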
Field Summary | |
---|---|
static String | BOUNDARY |
static String | BOUNDARY_TAG |
static String | UNKNOWN_WORD |
Method Summary | |
---|---|
void | finishTraining() - Done collecting statistics for the lexicon. |
UnknownWordModel | getUnknownWordModel() |
void | initializeTraining(double numTrees) - Start training this lexicon on the expected number of trees. |
boolean | isKnown(int word) - Checks whether a word is in the lexicon. |
boolean | isKnown(String word) - Checks whether a word is in the lexicon. |
int | numRules() - Returns the number of rules (tag rewrites as word) in the Lexicon. |
void | readData(BufferedReader in) - Read the lexicon from the BufferedReader in the format written by writeData. |
Iterator<IntTaggedWord> | ruleIteratorByWord(int word, int loc, String featureSpec) - Get an iterator over all rules (pairs of (word, POS)) for this word. |
Iterator<IntTaggedWord> | ruleIteratorByWord(String word, int loc, String featureSpec) - Same as ruleIteratorByWord(int, int, String), but takes the word as a String, which the lexicon's word index translates to an integer. |
float | score(IntTaggedWord iTW, int loc, String word, String featureSpec) - Get the score of this word with this tag (as an IntTaggedWord) at this loc. |
void | setUnknownWordModel(UnknownWordModel uwm) |
void | train(Collection<Tree> trees) - Trains this lexicon on the Collection of trees. |
void | train(Collection<Tree> trees, Collection<Tree> rawTrees) |
void | train(Collection<Tree> trees, double weight) |
void | train(List<TaggedWord> sentence, double weight) - Not all subclasses support this particular method. |
void | train(TaggedWord tw, int loc, double weight) - Not all subclasses support this particular method. |
void | train(Tree tree, double weight) |
void | trainUnannotated(List<TaggedWord> sentence, double weight) - Adds a sentence of tagged words to the lexicon when those words were not part of a binarized, markovized, or otherwise annotated tree. |
void | writeData(Writer w) - Write the lexicon in human-readable format to the Writer. |
Field Detail |
---|
static final String UNKNOWN_WORD
static final String BOUNDARY
static final String BOUNDARY_TAG
Method Detail |
---|
boolean isKnown(int word)
word - The word as an int
boolean isKnown(String word)
word - The word as a String
Iterator<IntTaggedWord> ruleIteratorByWord(int word, int loc, String featureSpec)
word - The word, represented as an integer in Index
loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
featureSpec - Additional word features like morphosyntactic information.
tag -> word rule.)
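As an illustration of iterating over the rules for one word, the following sketch stores (word, tag) integer pairs in a map keyed by word id. `ToyRuleTable` and its nested `Rule` class are hypothetical stand-ins (`Rule` plays the role of IntTaggedWord) and do not reproduce the real classes. Like the BaseLexicon behavior noted above, the sketch accepts `loc` but ignores it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy rule table; a hypothetical illustration, not the lexparser implementation. */
final class ToyRuleTable {
    /** Hypothetical stand-in for IntTaggedWord: a (word, tag) pair of integer ids. */
    static final class Rule {
        final int word;
        final int tag;

        Rule(int word, int tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    private final Map<Integer, List<Rule>> rulesByWord = new HashMap<>();

    /** Registers one tag -> word rule. */
    void addRule(int word, int tag) {
        rulesByWord.computeIfAbsent(word, k -> new ArrayList<>()).add(new Rule(word, tag));
    }

    /** Iterates over the rules for a word; loc and featureSpec are accepted but ignored in this sketch. */
    Iterator<Rule> ruleIteratorByWord(int word, int loc, String featureSpec) {
        return rulesByWord.getOrDefault(word, Collections.emptyList()).iterator();
    }

    /** Counts all stored rules across all words. */
    int numRules() {
        int total = 0;
        for (List<Rule> rules : rulesByWord.values()) {
            total += rules.size();
        }
        return total;
    }

    public static void main(String[] args) {
        ToyRuleTable table = new ToyRuleTable();
        table.addRule(7, 0);
        table.addRule(7, 1);
        Iterator<Rule> it = table.ruleIteratorByWord(7, 0, null);
        while (it.hasNext()) {
            Rule r = it.next();
            System.out.println("tag " + r.tag + " -> word " + r.word);
        }
    }
}
```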
Iterator<IntTaggedWord> ruleIteratorByWord(String word, int loc, String featureSpec)
int numRules()
void initializeTraining(double numTrees)
void train(Collection<Tree> trees)
trees - Trees to train on
void train(Collection<Tree> trees, double weight)
void train(Collection<Tree> trees, Collection<Tree> rawTrees)
void train(Tree tree, double weight)
void train(List<TaggedWord> sentence, double weight)
void train(TaggedWord tw, int loc, double weight)
void trainUnannotated(List<TaggedWord> sentence, double weight)
void finishTraining()
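The training methods above suggest a lifecycle of initializeTraining, one or more train calls, then finishTraining. The sketch below shows that call order on a toy class. `ToyTrainingLifecycle`, its nested `Tagged` class, and the `count` accessor are hypothetical. Treating `weight` as a multiplier on counts and rejecting out-of-order calls are both assumptions of this sketch, not documented behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy trainer showing the assumed call order; a hypothetical illustration only. */
final class ToyTrainingLifecycle {
    /** Hypothetical stand-in for TaggedWord. */
    static final class Tagged {
        final String word;
        final String tag;

        Tagged(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    private enum State { NEW, TRAINING, FINISHED }

    private State state = State.NEW;
    private final Map<String, Double> counts = new HashMap<>();

    /** Resets the statistics; numTrees is only a size hint and is unused here. */
    void initializeTraining(double numTrees) {
        counts.clear();
        state = State.TRAINING;
    }

    /** Adds each tagged word to the counts, scaled by weight (an assumption of this sketch). */
    void train(List<Tagged> sentence, double weight) {
        if (state != State.TRAINING) {
            throw new IllegalStateException("train called outside initializeTraining/finishTraining");
        }
        for (Tagged tw : sentence) {
            counts.merge(tw.word + "/" + tw.tag, weight, Double::sum);
        }
    }

    /** Marks the statistics as complete. */
    void finishTraining() {
        if (state != State.TRAINING) {
            throw new IllegalStateException("finishTraining called before initializeTraining");
        }
        state = State.FINISHED;
    }

    double count(String word, String tag) {
        return counts.getOrDefault(word + "/" + tag, 0.0);
    }

    public static void main(String[] args) {
        ToyTrainingLifecycle lex = new ToyTrainingLifecycle();
        List<Tagged> sentence = Arrays.asList(new Tagged("dog", "NN"), new Tagged("barks", "VBZ"));
        lex.initializeTraining(1.0);
        lex.train(sentence, 1.0);
        lex.train(sentence, 0.5);
        lex.finishTraining();
        System.out.println(lex.count("dog", "NN"));
    }
}
```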
float score(IntTaggedWord iTW, int loc, String word, String featureSpec)
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial.
word - The word itself; useful so we don't have to look it up in an index
featureSpec - TODO
void writeData(Writer w) throws IOException
w - The writer to output to
IOException - If an I/O problem occurs
void readData(BufferedReader in) throws IOException
in - The BufferedReader to read from
IOException - If an I/O problem occurs
UnknownWordModel getUnknownWordModel()
void setUnknownWordModel(UnknownWordModel uwm)
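writeData and readData are meant to round-trip: readData consumes the format that writeData produces. This page does not specify that format, so the sketch below invents a tab-separated one purely for illustration. `ToyLexiconIo` and its `counts` field are hypothetical; only the `java.io` classes are real.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

/** Toy round-trip of lexicon counts; the line format is invented for this sketch. */
final class ToyLexiconIo {
    // Keys are "word/tag" strings; a TreeMap keeps the written output in a stable order.
    final Map<String, Double> counts = new TreeMap<>();

    /** Writes one "key TAB count" line per entry. */
    void writeData(Writer w) throws IOException {
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        w.flush();
    }

    /** Replaces the current counts with the entries read from the reader. */
    void readData(BufferedReader in) throws IOException {
        counts.clear();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IOException("Malformed line: " + line);
            }
            counts.put(line.substring(0, tab), Double.parseDouble(line.substring(tab + 1)));
        }
    }

    public static void main(String[] args) throws IOException {
        ToyLexiconIo original = new ToyLexiconIo();
        original.counts.put("dog/NN", 2.0);
        StringWriter buffer = new StringWriter();
        original.writeData(buffer);

        ToyLexiconIo copy = new ToyLexiconIo();
        copy.readData(new BufferedReader(new StringReader(buffer.toString())));
        System.out.println(copy.counts.equals(original.counts));
    }
}
```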