ChineseMaxentLexicon (Stanford JavaNLP API)

java.lang.Object
- edu.stanford.nlp.parser.lexparser.ChineseMaxentLexicon

All Implemented Interfaces:

Lexicon, java.io.Serializable
```
public class ChineseMaxentLexicon
extends java.lang.Object
implements Lexicon
```
A Lexicon class that computes the score of word|tag from a maxent model of tag|word, divided by the MLE estimate of P(tag).
It would be nice to factor out a superclass MaxentLexicon that takes a WordFeatureExtractor.
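
The scoring rule described above can be sketched as follows. This is only an illustration of the arithmetic, assuming the score is kept in log space; the class name `MaxentScoreSketch` and the probabilities are made up and are not part of the Stanford API. By Bayes' rule, P(tag|word) / P(tag) equals P(word|tag) / P(word), so for a fixed word the ratio ranks tags in the same order as P(word|tag).

```java
// Illustrative sketch only: inputs are made-up probabilities, not output of the real model.
final class MaxentScoreSketch {

    /**
     * Returns log(P(tag|word) / P(tag)).
     * pTagGivenWord stands in for the maxent model's estimate,
     * pTag for the MLE estimate of the tag's prior probability.
     */
    static double score(double pTagGivenWord, double pTag) {
        return Math.log(pTagGivenWord) - Math.log(pTag);
    }

    public static void main(String[] args) {
        // A tag that the model favours for this word more than its prior suggests.
        System.out.println(score(0.9, 0.3));
        // A tag that the model favours less than its prior suggests.
        System.out.println(score(0.05, 0.3));
    }
}
```

A positive value means the word makes the tag more likely than the tag's prior alone; a negative value means the opposite.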

Author:

Galen Andrew

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`static boolean`	`fixUnkFunctionWords`
`static boolean`	`seenTagsOnly`
`CollectionValuedMap<java.lang.String,java.lang.String>`	`tagsForWord`
- Fields inherited from interface edu.stanford.nlp.parser.lexparser.Lexicon
  BOUNDARY, BOUNDARY_TAG, UNKNOWN_WORD

Constructor Summary

Constructors
Constructor and Description
`ChineseMaxentLexicon(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex, int featureLevel)`

Method Summary

Modifier and Type	Method and Description
`void`	`finishTraining()` Done collecting statistics for the lexicon.
`UnknownWordModel`	`getUnknownWordModel()`
`void`	`incrementTreesRead(double weight)` If training on a per-word basis instead of on a per-tree basis, we will want to increment the tree count as this happens.
`void`	`initializeTraining(double numTrees)` Start training this lexicon on the expected number of trees.
`boolean`	`isKnown(int word)` Checks whether a word is in the lexicon.
`boolean`	`isKnown(java.lang.String word)` Checks whether a word is in the lexicon.
`static void`	`main(java.lang.String[] args)`
`int`	`numRules()` Returns the number of rules (tag rewrites as word) in the Lexicon.
`void`	`readData(java.io.BufferedReader in)` Read the lexicon from the BufferedReader in the format written by writeData.
`java.util.Iterator<IntTaggedWord>`	`ruleIteratorByWord(int word, int loc, java.lang.String featureSpec)` Get an iterator over all rules (pairs of (word, POS)) for this word.
`java.util.Iterator<IntTaggedWord>`	`ruleIteratorByWord(java.lang.String word, int loc, java.lang.String featureSpec)` Same as the int version, but takes the word as a String that is translated by the lexicon's word index.
`float`	`score(IntTaggedWord iTW, int loc, java.lang.String word, java.lang.String featureSpec)` Get the score of this word with this tag (as an IntTaggedWord) at this loc.
`void`	`setUnknownWordModel(UnknownWordModel uwm)`
`java.util.Set<java.lang.String>`	`tagSet(java.util.function.Function<java.lang.String,java.lang.String> basicCategoryFunction)` Return the Set of tags used by this tagger (available after training the tagger).
`void`	`train(java.util.Collection<Tree> trees)` Add the given collection of trees to the statistics counted.
`void`	`train(java.util.Collection<Tree> trees, java.util.Collection<Tree> rawTrees)`
`void`	`train(java.util.Collection<Tree> trees, double weight)` Add the given collection of trees to the statistics counted.
`void`	`train(java.util.List<TaggedWord> sentence, double weight)` Add the given sentence to the statistics counted.
`void`	`train(TaggedWord tw, int loc, double weight)` Not all subclasses support this particular method.
`void`	`train(Tree tree, double weight)` Add the given tree to the statistics counted.
`void`	`trainUnannotated(java.util.List<TaggedWord> sentence, double weight)` Sometimes we might have a sentence of tagged words which we would like to add to the lexicon, but they weren't part of a binarized, markovized, or otherwise annotated tree.
`void`	`writeData(java.io.Writer w)` Write the lexicon in human-readable format to the Writer.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - seenTagsOnly
```
public static final boolean seenTagsOnly
```
    See Also:
    
    Constant Field Values
  - fixUnkFunctionWords
```
public static final boolean fixUnkFunctionWords
```
    See Also:
    
    Constant Field Values
  - tagsForWord
```
public CollectionValuedMap<java.lang.String,java.lang.String> tagsForWord
```
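    A map from one key to a collection of values can be pictured with the standard library alone. The sketch below assumes, from the field name, that keys are words and values are their tags; `TagsForWordSketch` is a hypothetical stand-in, not the real `CollectionValuedMap`, and the choice of a sorted set for the values is also an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a word -> tags multimap; not the Stanford class.
final class TagsForWordSketch {
    private final Map<String, Set<String>> tagsForWord = new HashMap<>();

    /** Records that the word was seen with the tag; repeats are ignored. */
    void add(String word, String tag) {
        tagsForWord.computeIfAbsent(word, k -> new TreeSet<>()).add(tag);
    }

    /** Returns the tags seen for the word, or an empty set if the word is unseen. */
    Set<String> get(String word) {
        return tagsForWord.getOrDefault(word, Collections.emptySet());
    }

    public static void main(String[] args) {
        TagsForWordSketch map = new TagsForWordSketch();
        map.add("de", "DEC");   // romanized placeholder word
        map.add("de", "DEG");
        map.add("de", "DEC");   // duplicate, no effect
        System.out.println(map.get("de"));
        System.out.println(map.get("unseen"));
    }
}
```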
- Constructor Detail
  - ChineseMaxentLexicon
```
public ChineseMaxentLexicon(Options op,
                            Index<java.lang.String> wordIndex,
                            Index<java.lang.String> tagIndex,
                            int featureLevel)
```
- Method Detail
  - isKnown
```
public boolean isKnown(int word)
```
    Description copied from interface: Lexicon
    
    Checks whether a word is in the lexicon.
    
    Specified by:
    
    isKnown in interface Lexicon
    
    Parameters:
    
    word - The word as an int
    
    Returns:
    
    Whether the word is in the lexicon
  - isKnown
```
public boolean isKnown(java.lang.String word)
```
    Description copied from interface: Lexicon
    
    Checks whether a word is in the lexicon.
    
    Specified by:
    
    isKnown in interface Lexicon
    
    Parameters:
    
    word - The word as a String
    
    Returns:
    
    Whether the word is in the lexicon
  - tagSet
```
public java.util.Set<java.lang.String> tagSet(java.util.function.Function<java.lang.String,java.lang.String> basicCategoryFunction)
```
    Return the Set of tags used by this tagger (available after training the tagger).
    
    Specified by:
    
    tagSet in interface Lexicon
    
    Returns:
    
    The Set of tags used by this tagger
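
    How the `basicCategoryFunction` argument shapes the result can be sketched as below. The sketch assumes, from the parameter name, that the function maps an annotated tag to its basic category and is applied to every tag seen in training; `TagSetSketch` and the `-` separator are illustrative choices, not the behaviour of the real class.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Illustrative sketch: collect the distinct basic categories of a set of tags.
final class TagSetSketch {

    static Set<String> tagSet(Collection<String> seenTags,
                              Function<String, String> basicCategoryFunction) {
        Set<String> result = new TreeSet<>();
        for (String tag : seenTags) {
            // Annotated variants of one tag collapse to a single entry.
            result.add(basicCategoryFunction.apply(tag));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical annotation scheme: everything after the first '-' is dropped.
        Function<String, String> stripAnnotation = tag -> tag.split("-")[0];
        System.out.println(tagSet(Arrays.asList("NN-SUBJ", "NN", "VV-OBJ"), stripAnnotation));
    }
}
```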
  - ruleIteratorByWord
```
public java.util.Iterator<IntTaggedWord> ruleIteratorByWord(int word,
                                                            int loc,
                                                            java.lang.String featureSpec)
```
    Description copied from interface: Lexicon
    
    Get an iterator over all rules (pairs of (word, POS)) for this word.
    
    Specified by:
    
    ruleIteratorByWord in interface Lexicon
    
    Parameters:
    
    word - The word, represented as an integer in Index
    
    loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
    
    featureSpec - Additional word features like morphosyntactic information.
    
    Returns:
    
    An Iterator over a List of IntTaggedWords, which pair the word with possible taggings as integer pairs. (Each can be thought of as a tag -> word rule.)
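
    The shape of the returned iterator can be sketched with plain collections. `Rule` below is a hypothetical stand-in for `IntTaggedWord`, and the sketch ignores `loc` and `featureSpec`; how the real class chooses candidate tags is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative sketch: enumerate (word, tag) integer pairs for one word.
final class RuleIteratorSketch {

    /** Hypothetical stand-in for IntTaggedWord: a pair of index ids. */
    static final class Rule {
        final int word;
        final int tag;

        Rule(int word, int tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    private final Map<Integer, List<Integer>> tagsByWord = new HashMap<>();

    void addRule(int word, int tag) {
        tagsByWord.computeIfAbsent(word, k -> new ArrayList<>()).add(tag);
    }

    /** Returns one Rule per tag recorded for the word, in insertion order. */
    Iterator<Rule> ruleIteratorByWord(int word) {
        List<Rule> rules = new ArrayList<>();
        for (int tag : tagsByWord.getOrDefault(word, Collections.emptyList())) {
            rules.add(new Rule(word, tag));
        }
        return rules.iterator();
    }

    public static void main(String[] args) {
        RuleIteratorSketch lexicon = new RuleIteratorSketch();
        lexicon.addRule(5, 1);
        lexicon.addRule(5, 2);
        for (Iterator<Rule> it = lexicon.ruleIteratorByWord(5); it.hasNext(); ) {
            Rule r = it.next();
            System.out.println("tag " + r.tag + " -> word " + r.word);
        }
    }
}
```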
  - ruleIteratorByWord
```
public java.util.Iterator<IntTaggedWord> ruleIteratorByWord(java.lang.String word,
                                                            int loc,
                                                            java.lang.String featureSpec)
```
    Description copied from interface: Lexicon
    
    Same as ruleIteratorByWord(int, int, String), but takes the word as a String that is translated by the lexicon's word index.
    
    Specified by:
    
    ruleIteratorByWord in interface Lexicon
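
    The relationship between the String and int overloads can be sketched as a lookup followed by delegation. Everything below is hypothetical: the map-based index stands in for `Index<String>`, and mapping an unseen word to `-1` is an assumption, since this page does not say how the real class treats words missing from the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a String overload that delegates to an int overload.
final class StringOverloadSketch {
    private final Map<String, Integer> wordIndex = new HashMap<>();
    private final Map<Integer, List<String>> tagsByWordId = new HashMap<>();

    void add(String word, String tag) {
        Integer id = wordIndex.get(word);
        if (id == null) {
            id = wordIndex.size();      // next free id
            wordIndex.put(word, id);
        }
        tagsByWordId.computeIfAbsent(id, k -> new ArrayList<>()).add(tag);
    }

    /** The int version does the actual lookup. */
    Iterator<String> tagsByWord(int wordId) {
        return tagsByWordId.getOrDefault(wordId, Collections.emptyList()).iterator();
    }

    /** The String version only translates the word and delegates. */
    Iterator<String> tagsByWord(String word) {
        Integer id = wordIndex.get(word);
        return tagsByWord(id == null ? -1 : id);   // -1 for unseen words is an assumption
    }

    public static void main(String[] args) {
        StringOverloadSketch lexicon = new StringOverloadSketch();
        lexicon.add("shu", "NN");   // romanized placeholder word
        System.out.println(lexicon.tagsByWord("shu").next());
        System.out.println(lexicon.tagsByWord("unseen").hasNext());
    }
}
```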
  - numRules
```
public int numRules()
```
    Returns the number of rules (tag rewrites as word) in the Lexicon. This method isn't yet implemented in this class. It currently just returns 0, which may or may not be helpful.
    
    Specified by:
    
    numRules in interface Lexicon
    
    Returns:
    
    The number of rules (tag rewrites as word) in the Lexicon.
  - initializeTraining
```
public void initializeTraining(double numTrees)
```
    Description copied from interface: Lexicon
    
    Start training this lexicon on the expected number of trees. (Some UnknownWordModels use the number of trees to know when to start counting statistics.)
    
    Specified by:
    
    initializeTraining in interface Lexicon
  - train
```
public final void train(java.util.Collection<Tree> trees)
```
    Add the given collection of trees to the statistics counted. Can be called multiple times with different trees.
    
    Specified by:
    
    train in interface Lexicon
    
    Parameters:
    
    trees - Trees to train on
  - train
```
public void train(java.util.Collection<Tree> trees,
                  double weight)
```
    Add the given collection of trees to the statistics counted. Can be called multiple times with different trees.
    
    Specified by:
    
    train in interface Lexicon
  - train
```
public void train(Tree tree,
                  double weight)
```
    Add the given tree to the statistics counted. Can be called multiple times with different trees.
    
    Specified by:
    
    train in interface Lexicon
  - train
```
public void train(java.util.List<TaggedWord> sentence,
                  double weight)
```
    Add the given sentence to the statistics counted. Can be called multiple times with different sentences.
    
    Specified by:
    
    train in interface Lexicon
  - trainUnannotated
```
public void trainUnannotated(java.util.List<TaggedWord> sentence,
                             double weight)
```
    Description copied from interface: Lexicon
    
    Sometimes we might have a sentence of tagged words which we would like to add to the lexicon, but they weren't part of a binarized, markovized, or otherwise annotated tree.
    
    Specified by:
    
    trainUnannotated in interface Lexicon
  - incrementTreesRead
```
public void incrementTreesRead(double weight)
```
    Description copied from interface: Lexicon
    
    If training on a per-word basis instead of on a per-tree basis, we will want to increment the tree count as this happens.
    
    Specified by:
    
    incrementTreesRead in interface Lexicon
  - train
```
public void train(TaggedWord tw,
                  int loc,
                  double weight)
```
    Description copied from interface: Lexicon
    
    Not all subclasses support this particular method. Those that don't will fail when it is called.
    
    Specified by:
    
    train in interface Lexicon
  - finishTraining
```
public void finishTraining()
```
    Description copied from interface: Lexicon
    
    Done collecting statistics for the lexicon.
    
    Specified by:
    
    finishTraining in interface Lexicon
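
    The training methods above are meant to be called in a fixed order: `initializeTraining`, any number of `train` calls, then `finishTraining`. The sketch below shows that lifecycle on a toy counter that only computes the MLE estimate of P(tag). It is an assumption-laden illustration: `TrainingLifecycleSketch` is hypothetical, sentences are plain `{word, tag}` arrays rather than `TaggedWord` lists, and the maxent fitting that the real class presumably performs is omitted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the initialize / train / finish lifecycle.
final class TrainingLifecycleSketch {
    private final Map<String, Double> tagCounts = new HashMap<>();
    private final Map<String, Double> tagProbs = new HashMap<>();
    private double treesRead = 0.0;
    private boolean training = false;

    void initializeTraining(double numTrees) {
        // numTrees is accepted for symmetry with the interface; this toy ignores it.
        tagCounts.clear();
        tagProbs.clear();
        treesRead = 0.0;
        training = true;
    }

    /** Adds weighted tag counts; may be called many times. */
    void train(List<String[]> sentence, double weight) {
        if (!training) {
            throw new IllegalStateException("call initializeTraining first");
        }
        for (String[] wordAndTag : sentence) {
            tagCounts.merge(wordAndTag[1], weight, Double::sum);
        }
    }

    /** Per-sentence training does not see trees, so the caller counts them. */
    void incrementTreesRead(double weight) {
        treesRead += weight;
    }

    /** Turns the accumulated counts into relative frequencies. */
    void finishTraining() {
        double total = 0.0;
        for (double count : tagCounts.values()) {
            total += count;
        }
        for (Map.Entry<String, Double> e : tagCounts.entrySet()) {
            tagProbs.put(e.getKey(), e.getValue() / total);
        }
        training = false;
    }

    double tagProbability(String tag) {
        return tagProbs.getOrDefault(tag, 0.0);
    }

    double treesRead() {
        return treesRead;
    }

    public static void main(String[] args) {
        TrainingLifecycleSketch lexicon = new TrainingLifecycleSketch();
        lexicon.initializeTraining(2);
        lexicon.train(Arrays.asList(new String[] {"gou", "NN"}, new String[] {"pao", "VV"}), 1.0);
        lexicon.incrementTreesRead(1.0);
        lexicon.train(Arrays.asList(new String[][] {{"mao", "NN"}}), 1.0);
        lexicon.incrementTreesRead(1.0);
        lexicon.finishTraining();
        System.out.println(lexicon.tagProbability("NN"));
    }
}
```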
  - main
```
public static void main(java.lang.String[] args)
```
  - score
```
public float score(IntTaggedWord iTW,
                   int loc,
                   java.lang.String word,
                   java.lang.String featureSpec)
```
    Description copied from interface: Lexicon
    
    Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).)
    
    Specified by:
    
    score in interface Lexicon
    
    Parameters:
    
    iTW - An IntTaggedWord pairing a word and POS tag
    
    loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial.
    
    word - The word itself; useful so we don't have to look it up in an index
    
    featureSpec - Additional word features like morphosyntactic information.
    
    Returns:
    
    A score, usually log P(word|tag)
  - writeData
```
public void writeData(java.io.Writer w)
               throws java.io.IOException
```
    Description copied from interface: Lexicon
    
    Write the lexicon in human-readable format to the Writer. (An optional operation.)
    
    Specified by:
    
    writeData in interface Lexicon
    
    Parameters:
    
    w - The writer to output to
    
    Throws:
    
    java.io.IOException - If any I/O problem occurs
  - readData
```
public void readData(java.io.BufferedReader in)
              throws java.io.IOException
```
    Description copied from interface: Lexicon
    
    Read the lexicon from the BufferedReader in the format written by writeData. (An optional operation.)
    
    Specified by:
    
    readData in interface Lexicon
    
    Parameters:
    
    in - The BufferedReader to read from
    
    Throws:
    
    java.io.IOException - If any I/O problem occurs
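
    The contract between `writeData` and `readData` is that one reads back what the other wrote. The sketch below shows such a round trip with a made-up text format (one word per line, a tab, then space-separated tags). The actual format used by this class is not described on this page, so the format and the class `LexiconIoSketch` are purely hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch of a symmetric write/read pair over a made-up format.
final class LexiconIoSketch {

    /** Writes one "word<TAB>tag tag ..." line per entry. */
    static void writeData(Map<String, Set<String>> tagsForWord, Writer w) throws IOException {
        for (Map.Entry<String, Set<String>> e : tagsForWord.entrySet()) {
            w.write(e.getKey() + "\t" + String.join(" ", e.getValue()) + "\n");
        }
        w.flush();
    }

    /** Reads the format produced by writeData. */
    static Map<String, Set<String>> readData(BufferedReader in) throws IOException {
        Map<String, Set<String>> result = new TreeMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            if (parts.length < 2) {
                throw new IOException("malformed line: " + line);
            }
            result.put(parts[0], new TreeSet<>(Arrays.asList(parts[1].split(" "))));
        }
        return result;
    }

    /** Writes to memory and reads back, wrapping the checked exception. */
    static Map<String, Set<String>> roundTrip(Map<String, Set<String>> tagsForWord) {
        try {
            StringWriter out = new StringWriter();
            writeData(tagsForWord, out);
            return readData(new BufferedReader(new StringReader(out.toString())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> lexicon = new TreeMap<>();
        lexicon.put("shu", new TreeSet<>(Arrays.asList("NN", "VV")));
        System.out.println(roundTrip(lexicon).equals(lexicon));
    }
}
```

    This toy format breaks if a word contains a tab or a tag contains a space; a real format would need escaping or a different delimiter.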
  - getUnknownWordModel
```
public UnknownWordModel getUnknownWordModel()
```
    Specified by:
    
    getUnknownWordModel in interface Lexicon
  - setUnknownWordModel
```
public void setUnknownWordModel(UnknownWordModel uwm)
```
    Specified by:
    
    setUnknownWordModel in interface Lexicon
  - train
```
public void train(java.util.Collection<Tree> trees,
                  java.util.Collection<Tree> rawTrees)
```
    Specified by:
    
    train in interface Lexicon
