Lexicon (Stanford JavaNLP API)

All Superinterfaces:

java.io.Serializable

All Known Implementing Classes:

BaseLexicon, ChineseCharacterBasedLexicon, ChineseLexicon, ChineseLexiconAndWordSegmenter, ChineseMaxentLexicon, FactoredLexicon
```
public interface Lexicon
extends java.io.Serializable
```
An interface for lexicons that interface with the lexparser. Its primary responsibility is to provide a conditional probability P(word|tag), which is fulfilled by the score method. Inside the lexparser, Strings are interned, and tags and words are usually represented as integers.

Author:

Galen Andrew

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`BOUNDARY`
`static java.lang.String`	`BOUNDARY_TAG`
`static java.lang.String`	`UNKNOWN_WORD`

Method Summary

Modifier and Type	Method and Description
`void`	`finishTraining()` Done collecting statistics for the lexicon.
`UnknownWordModel`	`getUnknownWordModel()`
`void`	`incrementTreesRead(double weight)` If training on a per-word basis instead of on a per-tree basis, we will want to increment the tree count as this happens.
`void`	`initializeTraining(double numTrees)` Start training this lexicon on the expected number of trees.
`boolean`	`isKnown(int word)` Checks whether a word is in the lexicon.
`boolean`	`isKnown(java.lang.String word)` Checks whether a word is in the lexicon.
`int`	`numRules()` Returns the number of rules (tag rewrites as word) in the Lexicon.
`void`	`readData(java.io.BufferedReader in)` Read the lexicon from the BufferedReader in the format written by writeData.
`java.util.Iterator<IntTaggedWord>`	`ruleIteratorByWord(int word, int loc, java.lang.String featureSpec)` Get an iterator over all rules (pairs of (word, POS)) for this word.
`java.util.Iterator<IntTaggedWord>`	`ruleIteratorByWord(java.lang.String word, int loc, java.lang.String featureSpec)` Same as the int version, but takes the word as a String that the lexicon translates through its word index.
`float`	`score(IntTaggedWord iTW, int loc, java.lang.String word, java.lang.String featureSpec)` Get the score of this word with this tag (as an IntTaggedWord) at this loc.
`void`	`setUnknownWordModel(UnknownWordModel uwm)`
`java.util.Set<java.lang.String>`	`tagSet(java.util.function.Function<java.lang.String,java.lang.String> basicCategoryFunction)` Return the Set of tags used by this tagger (available after training the tagger).
`void`	`train(java.util.Collection<Tree> trees)` Trains this lexicon on the Collection of trees.
`void`	`train(java.util.Collection<Tree> trees, java.util.Collection<Tree> rawTrees)`
`void`	`train(java.util.Collection<Tree> trees, double weight)`
`void`	`train(java.util.List<TaggedWord> sentence, double weight)` Not all subclasses support this particular method.
`void`	`train(TaggedWord tw, int loc, double weight)` Not all subclasses support this particular method.
`void`	`train(Tree tree, double weight)`
`void`	`trainUnannotated(java.util.List<TaggedWord> sentence, double weight)` Sometimes we might have a sentence of tagged words which we would like to add to the lexicon, but they weren't part of a binarized, markovized, or otherwise annotated tree.
`void`	`writeData(java.io.Writer w)` Write the lexicon in human-readable format to the Writer.

- Field Detail
  - UNKNOWN_WORD
```
static final java.lang.String UNKNOWN_WORD
```
    See Also:
    
    Constant Field Values
  - BOUNDARY
```
static final java.lang.String BOUNDARY
```
    See Also:
    
    Constant Field Values
  - BOUNDARY_TAG
```
static final java.lang.String BOUNDARY_TAG
```
    See Also:
    
    Constant Field Values
- Method Detail
  - isKnown
```
boolean isKnown(int word)
```
    Checks whether a word is in the lexicon.
    
    Parameters:
    
    word - The word as an int
    
    Returns:
    
    Whether the word is in the lexicon
  - isKnown
```
boolean isKnown(java.lang.String word)
```
    Checks whether a word is in the lexicon.
    
    Parameters:
    
    word - The word as a String
    
    Returns:
    
    Whether the word is in the lexicon
  - tagSet
```
java.util.Set<java.lang.String> tagSet(java.util.function.Function<java.lang.String,java.lang.String> basicCategoryFunction)
```
    Return the Set of tags used by this tagger (available after training the tagger).
    
    Returns:
    
    The Set of tags used by this tagger
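    
    The role of basicCategoryFunction can be illustrated with a small standalone sketch. It assumes the function is applied to each tag before the tag is added to the result, and that annotated tags carry a suffix after a `^` character. The TagSetSketch class and the stripping rule are hypothetical and are not taken from the parser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class TagSetSketch {
    // Hypothetical rule: keep only the text before the first '^' annotation marker.
    static final Function<String, String> BASIC_CATEGORY = tag -> {
        int cut = tag.indexOf('^');
        return cut > 0 ? tag.substring(0, cut) : tag;
    };

    // Maps every tag through the function and collects the distinct results (sorted).
    static Set<String> tagSet(List<String> tags, Function<String, String> basicCategoryFunction) {
        Set<String> result = new TreeSet<>();
        for (String tag : tags) {
            result.add(basicCategoryFunction.apply(tag));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> annotated = Arrays.asList("NN^NP", "NN^PP", "VBD^VP", "DT");
        System.out.println(tagSet(annotated, BASIC_CATEGORY)); // prints [DT, NN, VBD]
    }
}
```

    Passing `Function.identity()` instead would return the annotated tags unchanged.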
  - ruleIteratorByWord
```
java.util.Iterator<IntTaggedWord> ruleIteratorByWord(int word,
                                                     int loc,
                                                     java.lang.String featureSpec)
```
    Get an iterator over all rules (pairs of (word, POS)) for this word.
    
    Parameters:
    
    word - The word, represented as an integer in Index
    
    loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
    
    featureSpec - Additional word features like morphosyntactic information.
    
    Returns:
    
    An Iterator over a List of IntTaggedWords, which pair the word with possible taggings as integer pairs. (Each can be thought of as a tag -> word rule.)
  - ruleIteratorByWord
```
java.util.Iterator<IntTaggedWord> ruleIteratorByWord(java.lang.String word,
                                                     int loc,
                                                     java.lang.String featureSpec)
```
    Same as the int version above, but takes the word as a String that the lexicon translates through its word index.
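    
    The relationship between the two overloads can be sketched without the parser classes. In this hypothetical RuleIteratorSketch, a plain `int[] {wordId, tagId}` pair stands in for IntTaggedWord and a HashMap stands in for the word index. The loc and featureSpec arguments are accepted but ignored, which is an assumption of this sketch only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class RuleIteratorSketch {
    // Hypothetical word index: String -> int id, assigned in order of first use.
    private final Map<String, Integer> wordIndex = new HashMap<>();
    // Rules per word id; each rule is an int pair {wordId, tagId}.
    private final Map<Integer, List<int[]>> rulesByWord = new HashMap<>();

    void addRule(String word, int tagId) {
        Integer wordId = wordIndex.get(word);
        if (wordId == null) {
            wordId = wordIndex.size();
            wordIndex.put(word, wordId);
        }
        rulesByWord.computeIfAbsent(wordId, k -> new ArrayList<>())
                   .add(new int[] {wordId, tagId});
    }

    // Integer overload: looks the rules up directly by word id.
    Iterator<int[]> ruleIteratorByWord(int word, int loc, String featureSpec) {
        return rulesByWord.getOrDefault(word, Collections.<int[]>emptyList()).iterator();
    }

    // String overload: translates through the word index, then delegates.
    Iterator<int[]> ruleIteratorByWord(String word, int loc, String featureSpec) {
        Integer id = wordIndex.get(word);
        if (id == null) {
            return Collections.<int[]>emptyIterator();
        }
        return ruleIteratorByWord(id.intValue(), loc, featureSpec);
    }

    // Helper for the demo: counts the rules an iterator yields.
    static int countRules(Iterator<int[]> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        RuleIteratorSketch lex = new RuleIteratorSketch();
        lex.addRule("run", 0); // e.g. tag id 0 for a noun reading
        lex.addRule("run", 1); // e.g. tag id 1 for a verb reading
        System.out.println(countRules(lex.ruleIteratorByWord("run", 0, null)));
    }
}
```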
  - numRules
```
int numRules()
```
    Returns the number of rules (tag rewrites as word) in the Lexicon. This method assumes that the lexicon has been initialized.
    
    Returns:
    
    The number of rules (tag rewrites as word) in the Lexicon.
  - initializeTraining
```
void initializeTraining(double numTrees)
```
    Start training this lexicon on the expected number of trees. (Some UnknownWordModels use the number of trees to know when to start counting statistics.)
  - train
```
void train(java.util.Collection<Tree> trees)
```
    Trains this lexicon on the Collection of trees. Can be called more than once with different collections of trees.
    
    Parameters:
    
    trees - Trees to train on
  - train
```
void train(java.util.Collection<Tree> trees,
           double weight)
```
  - train
```
void train(java.util.Collection<Tree> trees,
           java.util.Collection<Tree> rawTrees)
```
  - train
```
void train(Tree tree,
           double weight)
```
  - train
```
void train(java.util.List<TaggedWord> sentence,
           double weight)
```
    Not all subclasses support this particular method. Those that do not will fail when it is called.
  - train
```
void train(TaggedWord tw,
           int loc,
           double weight)
```
    Not all subclasses support this particular method. Those that do not will fail when it is called.
  - incrementTreesRead
```
void incrementTreesRead(double weight)
```
    If training on a per-word basis instead of on a per-tree basis, we will want to increment the tree count as this happens.
  - trainUnannotated
```
void trainUnannotated(java.util.List<TaggedWord> sentence,
                      double weight)
```
    Sometimes we might have a sentence of tagged words which we would like to add to the lexicon, but they weren't part of a binarized, markovized, or otherwise annotated tree.
  - finishTraining
```
void finishTraining()
```
    Done collecting statistics for the lexicon.
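    
    The training methods suggest a lifecycle of initializeTraining, one or more train calls, then finishTraining. The sketch below walks through that order with a hypothetical TrainingLifecycleSketch class that only accumulates weighted counts. Sentences are lists of `{word, tag}` String pairs rather than TaggedWord objects, and the call order is inferred from the method descriptions rather than confirmed by the source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TrainingLifecycleSketch {
    private final Map<String, Double> counts = new HashMap<>();
    private double treesRead = 0.0;
    private boolean training = false;

    // Resets state; numTrees is unused here but kept to mirror the signature.
    void initializeTraining(double numTrees) {
        counts.clear();
        treesRead = 0.0;
        training = true;
    }

    // Adds `weight` to the count of every (word, tag) pair in the sentence.
    void train(List<String[]> sentence, double weight) {
        if (!training) {
            throw new IllegalStateException("call initializeTraining first");
        }
        for (String[] wordAndTag : sentence) {
            counts.merge(wordAndTag[0] + "/" + wordAndTag[1], weight, Double::sum);
        }
    }

    // Per-sentence training does not see trees, so the caller bumps the tree count.
    void incrementTreesRead(double weight) {
        treesRead += weight;
    }

    void finishTraining() {
        training = false;
    }

    double count(String word, String tag) {
        return counts.getOrDefault(word + "/" + tag, 0.0);
    }

    double getTreesRead() {
        return treesRead;
    }

    public static void main(String[] args) {
        TrainingLifecycleSketch lex = new TrainingLifecycleSketch();
        lex.initializeTraining(2.0);
        lex.train(Arrays.asList(new String[] {"the", "DT"}, new String[] {"dog", "NN"}), 1.0);
        lex.incrementTreesRead(1.0);
        lex.train(Arrays.asList(new String[] {"the", "DT"}, new String[] {"cat", "NN"}), 0.5);
        lex.incrementTreesRead(0.5);
        lex.finishTraining();
        System.out.println(lex.count("the", "DT") + " " + lex.getTreesRead());
    }
}
```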
  - score
```
float score(IntTaggedWord iTW,
            int loc,
            java.lang.String word,
            java.lang.String featureSpec)
```
    Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).)
    
    Parameters:
    
    iTW - An IntTaggedWord pairing a word and POS tag
    
    loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial.
    
    word - The word itself; useful so we don't have to look it up in an index
    
    featureSpec - TODO
    
    Returns:
    
    A score, usually log P(word|tag)
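    
    To make the returned quantity concrete, here is a minimal sketch of a relative-frequency estimate of log P(word|tag), computed as log(count(word, tag) / count(tag)). The ScoreSketch class is hypothetical: it uses Strings instead of IntTaggedWord, applies no smoothing, has no unknown word model, and ignores loc and featureSpec, so it should not be read as how any implementing class actually scores.

```java
import java.util.HashMap;
import java.util.Map;

class ScoreSketch {
    private final Map<String, Double> wordTagCounts = new HashMap<>();
    private final Map<String, Double> tagCounts = new HashMap<>();

    // Records one weighted observation of `word` tagged as `tag`.
    void observe(String word, String tag, double weight) {
        wordTagCounts.merge(word + "\t" + tag, weight, Double::sum);
        tagCounts.merge(tag, weight, Double::sum);
    }

    // Unsmoothed estimate of log P(word | tag); unseen pairs get negative infinity.
    float score(String word, String tag) {
        double joint = wordTagCounts.getOrDefault(word + "\t" + tag, 0.0);
        double tagTotal = tagCounts.getOrDefault(tag, 0.0);
        if (joint == 0.0 || tagTotal == 0.0) {
            return Float.NEGATIVE_INFINITY;
        }
        return (float) Math.log(joint / tagTotal);
    }

    public static void main(String[] args) {
        ScoreSketch lex = new ScoreSketch();
        lex.observe("dog", "NN", 1.0);
        lex.observe("cat", "NN", 1.0);
        System.out.println(lex.score("dog", "NN")); // log(1/2)
        System.out.println(lex.score("run", "NN")); // unseen pair
    }
}
```

    A real lexicon would need to assign finite scores to unseen words, which is presumably where the UnknownWordModel comes in.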
  - writeData
```
void writeData(java.io.Writer w)
        throws java.io.IOException
```
    Write the lexicon in human-readable format to the Writer. (An optional operation.)
    
    Parameters:
    
    w - The writer to output to
    
    Throws:
    
    java.io.IOException - If any I/O problem occurs
  - readData
```
void readData(java.io.BufferedReader in)
       throws java.io.IOException
```
    Read the lexicon from the BufferedReader in the format written by writeData. (An optional operation.)
    
    Parameters:
    
    in - The BufferedReader to read from
    
    Throws:
    
    java.io.IOException - If any I/O problem occurs
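    
    The contract between writeData and readData is that one reads what the other writes. The sketch below shows such a round trip through an in-memory buffer. The LexiconIoSketch class and its tab-separated `tag<TAB>word<TAB>count` line format are invented for illustration; the actual on-disk format is not described here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

class LexiconIoSketch {
    // Key is "tag<TAB>word"; value is the accumulated count.
    final Map<String, Double> counts = new TreeMap<>();

    // Writes one "tag<TAB>word<TAB>count" line per entry (hypothetical format).
    void writeData(Writer w) throws IOException {
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        w.flush();
    }

    // Reads the format produced by writeData, replacing any existing counts.
    void readData(BufferedReader in) throws IOException {
        counts.clear();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t");
            counts.put(parts[0] + "\t" + parts[1], Double.parseDouble(parts[2]));
        }
    }

    // Serializes to an in-memory buffer and reads the result into a fresh instance.
    static LexiconIoSketch roundTrip(LexiconIoSketch original) {
        try {
            StringWriter buffer = new StringWriter();
            original.writeData(buffer);
            LexiconIoSketch copy = new LexiconIoSketch();
            copy.readData(new BufferedReader(new StringReader(buffer.toString())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LexiconIoSketch lex = new LexiconIoSketch();
        lex.counts.put("NN\tdog", 2.0);
        lex.counts.put("DT\tthe", 3.5);
        System.out.println(roundTrip(lex).counts.equals(lex.counts));
    }
}
```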
  - getUnknownWordModel
```
UnknownWordModel getUnknownWordModel()
```
  - setUnknownWordModel
```
void setUnknownWordModel(UnknownWordModel uwm)
```
