ChineseCharacterBasedLexicon (Stanford CoreNLP API)

java.lang.Object
- edu.stanford.nlp.parser.lexparser.ChineseCharacterBasedLexicon

All Implemented Interfaces:

Lexicon, Serializable
```
public class ChineseCharacterBasedLexicon
extends Object
implements Lexicon
```
Author:

Galen Andrew

See Also:

Serialized Form

Field Summary
- Fields inherited from interface edu.stanford.nlp.parser.lexparser.Lexicon
  BOUNDARY, BOUNDARY_TAG, UNKNOWN_WORD

Constructor Summary

Constructors
Constructor and Description
`ChineseCharacterBasedLexicon(ChineseTreebankParserParams params, Index<String> wordIndex, Index<String> tagIndex)`

Method Summary

Modifier and Type	Method and Description
`void`	`finishTraining()` Done collecting statistics for the lexicon.
`Distribution<String>`	`getPOSDistribution()`
`UnknownWordModel`	`getUnknownWordModel()`
`void`	`incrementTreesRead(double weight)` If training on a per-word basis instead of on a per-tree basis, we will want to increment the tree count as this happens.
`void`	`initializeTraining(double numTrees)` Start training this lexicon on the expected number of trees.
`static boolean`	`isForeign(String s)`
`boolean`	`isKnown(int word)` Checks whether a word is in the lexicon.
`boolean`	`isKnown(String word)` Checks whether a word is in the lexicon.
`int`	`numRules()` Returns the number of rules (tag rewrites as word) in the Lexicon.
`void`	`readData(BufferedReader in)` Read the lexicon from the BufferedReader in the format written by writeData.
`Iterator<IntTaggedWord>`	`ruleIteratorByWord(int word, int loc, String featureSpec)` Get an iterator over all rules (pairs of (word, POS)) for this word.
`Iterator<IntTaggedWord>`	`ruleIteratorByWord(String word, int loc, String featureSpec)` Same thing, but with a string that needs to be translated by the lexicon's word index
`String`	`sampleFrom()` Samples over words regardless of POS: first samples POS, then samples word according to that POS
`String`	`sampleFrom(String tag)` Samples from the distribution over words with this POS according to the lexicon.
`float`	`score(IntTaggedWord iTW, int loc, String word, String featureSpec)` Get the score of this word with this tag (as an IntTaggedWord) at this loc.
`void`	`setUnknownWordModel(UnknownWordModel uwm)`
`Set<String>`	`tagSet(java.util.function.Function<String,String> basicCategoryFunction)` Return the Set of tags used by this tagger (available after training the tagger).
`void`	`train(Collection<Tree> trees)` Train this lexicon on the given set of trees.
`void`	`train(Collection<Tree> trees, Collection<Tree> rawTrees)`
`void`	`train(Collection<Tree> trees, double weight)` Train this lexicon on the given set of trees.
`void`	`train(List<TaggedWord> sentence, double weight)` Not all subclasses support this particular method.
`void`	`train(TaggedWord tw, int loc, double weight)` Not all subclasses support this particular method.
`void`	`train(Tree tree, double weight)` TODO: make this method do something with the weight
`void`	`trainUnannotated(List<TaggedWord> sentence, double weight)` Sometimes we might have a sentence of tagged words which we would like to add to the lexicon, but they weren't part of a binarized, markovized, or otherwise annotated tree.
`void`	`writeData(Writer w)` Write the lexicon in human-readable format to the Writer.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ChineseCharacterBasedLexicon
```
public ChineseCharacterBasedLexicon(ChineseTreebankParserParams params,
                                    Index<String> wordIndex,
                                    Index<String> tagIndex)
```
- Method Detail
  - initializeTraining
```
public void initializeTraining(double numTrees)
```
    Description copied from interface: Lexicon
    
    Start training this lexicon on the expected number of trees. (Some UnknownWordModels use the number of trees to know when to start counting statistics.)
    
    Specified by:
    
    initializeTraining in interface Lexicon
  - train
```
public void train(Collection<Tree> trees)
```
    Train this lexicon on the given set of trees.
    
    Specified by:
    
    train in interface Lexicon
    
    Parameters:
    
    trees - Trees to train on
  - train
```
public void train(Collection<Tree> trees,
                  double weight)
```
    Train this lexicon on the given set of trees.
    
    Specified by:
    
    train in interface Lexicon
  - train
```
public void train(Tree tree,
                  double weight)
```
    TODO: make this method do something with the weight
    
    Specified by:
    
    train in interface Lexicon
  - trainUnannotated
```
public void trainUnannotated(List<TaggedWord> sentence,
                             double weight)
```
    Description copied from interface: Lexicon
    
    Sometimes we might have a sentence of tagged words which we would like to add to the lexicon, but they weren't part of a binarized, markovized, or otherwise annotated tree.
    
    Specified by:
    
    trainUnannotated in interface Lexicon
  - incrementTreesRead
```
public void incrementTreesRead(double weight)
```
    Description copied from interface: Lexicon
    
    If training on a per-word basis instead of on a per-tree basis, we will want to increment the tree count as this happens.
    
    Specified by:
    
    incrementTreesRead in interface Lexicon
  - train
```
public void train(TaggedWord tw,
                  int loc,
                  double weight)
```
    Description copied from interface: Lexicon
    
    Not all subclasses support this particular method. Those that don't will fail when it is called.
    
    Specified by:
    
    train in interface Lexicon
  - train
```
public void train(List<TaggedWord> sentence,
                  double weight)
```
    Description copied from interface: Lexicon
    
    Not all subclasses support this particular method. Those that don't will fail when it is called.
    
    Specified by:
    
    train in interface Lexicon
  - finishTraining
```
public void finishTraining()
```
    Description copied from interface: Lexicon
    
    Done collecting statistics for the lexicon.
    
    Specified by:
    
    finishTraining in interface Lexicon
  - getPOSDistribution
```
public Distribution<String> getPOSDistribution()
```
  - isForeign
```
public static boolean isForeign(String s)
```
  - score
```
public float score(IntTaggedWord iTW,
                   int loc,
                   String word,
                   String featureSpec)
```
    Description copied from interface: Lexicon
    
    Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).)
    
    Specified by:
    
    score in interface Lexicon
    
    Parameters:
    
    iTW - An IntTaggedWord pairing a word and POS tag
    
    loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence initial.
    
    word - The word itself; useful so we don't have to look it up in an index
    
    featureSpec - TODO
    
    Returns:
    
    A score, usually log P(word|tag)
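    
    To make the "log P(word|tag)" return value concrete, here is a minimal sketch of a relative-frequency estimate computed from weighted (word, tag) counts. It is an illustration only: the class `RelativeFrequencyScoreSketch` and its methods `observe` and `score` are hypothetical names, not part of Stanford CoreNLP, and the real ChineseCharacterBasedLexicon may compute its score quite differently (the page above does not describe its model).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; this class is NOT part of Stanford CoreNLP. */
class RelativeFrequencyScoreSketch {
    // Weighted count of each tag, and of each (tag, word) pair.
    private final Map<String, Double> tagCounts = new HashMap<>();
    private final Map<String, Double> pairCounts = new HashMap<>();

    private static String key(String word, String tag) {
        return tag + '\t' + word;
    }

    /** Record one weighted observation of a word with a tag. */
    void observe(String word, String tag, double weight) {
        tagCounts.merge(tag, weight, Double::sum);
        pairCounts.merge(key(word, tag), weight, Double::sum);
    }

    /** Returns log(count(word, tag) / count(tag)), or negative infinity if unseen. */
    float score(String word, String tag) {
        double tagCount = tagCounts.getOrDefault(tag, 0.0);
        double pairCount = pairCounts.getOrDefault(key(word, tag), 0.0);
        if (tagCount == 0.0 || pairCount == 0.0) {
            return Float.NEGATIVE_INFINITY;
        }
        return (float) Math.log(pairCount / tagCount);
    }

    public static void main(String[] args) {
        RelativeFrequencyScoreSketch lex = new RelativeFrequencyScoreSketch();
        lex.observe("\u72d7", "NN", 1.0); // "dog"
        lex.observe("\u732b", "NN", 1.0); // "cat"
        lex.observe("\u8dd1", "VV", 1.0); // "run"
        // "dog" accounts for 1 of the 2 NN observations, so this is log(1/2).
        System.out.println(lex.score("\u72d7", "NN"));
        // Never seen as a verb, so this is negative infinity.
        System.out.println(lex.score("\u72d7", "VV"));
    }
}
```

    This unsmoothed estimate gives unseen pairs a score of negative infinity; a production lexicon would normally hand such cases to an unknown word model rather than rule them out.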
  - sampleFrom
```
public String sampleFrom(String tag)
```
    Samples from the distribution over words with this POS according to the lexicon.
    
    Parameters:
    
    tag - the POS of the word to sample
    
    Returns:
    
    a sampled word
  - sampleFrom
```
public String sampleFrom()
```
    Samples over words regardless of POS: first samples a POS, then samples a word according to that POS.
    
    Returns:
    
    a sampled word
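    
    The two sampleFrom variants describe a two-stage procedure: draw a POS tag, then draw a word conditioned on that tag. The sketch below shows that idea with plain weighted counts. It is an assumption-laden illustration: `TwoStageSamplerSketch`, `add`, and `draw` are hypothetical names, not CoreNLP API, and the actual class may generate words in some other way.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical illustration; this class is NOT part of Stanford CoreNLP. */
class TwoStageSamplerSketch {
    // Insertion-ordered maps keep sampling reproducible for a fixed seed.
    private final Map<String, Double> tagWeights = new LinkedHashMap<>();
    private final Map<String, Map<String, Double>> wordWeightsByTag = new LinkedHashMap<>();
    private final Random random;

    TwoStageSamplerSketch(long seed) {
        this.random = new Random(seed);
    }

    /** Record a weighted observation of a word with a tag. */
    void add(String tag, String word, double weight) {
        tagWeights.merge(tag, weight, Double::sum);
        wordWeightsByTag.computeIfAbsent(tag, t -> new LinkedHashMap<>())
                        .merge(word, weight, Double::sum);
    }

    /** Draw a key with probability proportional to its weight. */
    private static String draw(Map<String, Double> weights, Random rng) {
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }
        double r = rng.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            last = e.getKey();
            r -= e.getValue();
            if (r < 0.0) {
                return last;
            }
        }
        return last; // reached only through floating-point round-off
    }

    /** Sample a word from the distribution over words with this tag. */
    String sampleFrom(String tag) {
        Map<String, Double> words = wordWeightsByTag.get(tag);
        if (words == null) {
            throw new IllegalArgumentException("Unknown tag: " + tag);
        }
        return draw(words, random);
    }

    /** Sample a tag first, then a word given that tag. */
    String sampleFrom() {
        return sampleFrom(draw(tagWeights, random));
    }

    public static void main(String[] args) {
        TwoStageSamplerSketch sampler = new TwoStageSamplerSketch(42L);
        sampler.add("NN", "\u72d7", 3.0); // "dog"
        sampler.add("NN", "\u732b", 1.0); // "cat"
        sampler.add("VV", "\u8dd1", 2.0); // "run"
        System.out.println(sampler.sampleFrom("NN")); // a noun
        System.out.println(sampler.sampleFrom());     // any word
    }
}
```

    Because the tag weights are the sums of the word weights under each tag, sampling a tag and then a word is equivalent to sampling directly from the joint (tag, word) counts.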
  - ruleIteratorByWord
```
public Iterator<IntTaggedWord> ruleIteratorByWord(int word,
                                                  int loc,
                                                  String featureSpec)
```
    Description copied from interface: Lexicon
    
    Get an iterator over all rules (pairs of (word, POS)) for this word.
    
    Specified by:
    
    ruleIteratorByWord in interface Lexicon
    
    Parameters:
    
    word - The word, represented as an integer in Index
    
    loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
    
    featureSpec - Additional word features like morphosyntactic information.
    
    Returns:
    
    An Iterator over a List of IntTaggedWords, which pair the word with possible taggings as integer pairs. (Each can be thought of as a tag -> word rule.)
  - ruleIteratorByWord
    
    `public Iterator<IntTaggedWord> ruleIteratorByWord(String word, int loc, String featureSpec)`
    
    Description copied from interface: Lexicon
    
    Same thing, but with a string that needs to be translated by the lexicon's word index
    
    Specified by:
    
    ruleIteratorByWord in interface Lexicon
  - numRules
    
    `public int numRules()`
    
    Returns the number of rules (tag rewrites as word) in the Lexicon. This method isn't yet implemented in this class. It currently just returns 0, which may or may not be helpful.
    
    Specified by:
    
    numRules in interface Lexicon
    
    Returns:
    
    The number of rules (tag rewrites as word) in the Lexicon.
  - readData
    
    `public void readData(BufferedReader in) throws IOException`
    
    Description copied from interface: Lexicon
    
    Read the lexicon from the BufferedReader in the format written by writeData. (An optional operation.)
    
    Specified by:
    
    readData in interface Lexicon
    
    Parameters:
    
    in - The BufferedReader to read from
    
    Throws:
    
    IOException - If any I/O problem
  - writeData
    
    `public void writeData(Writer w) throws IOException`
    
    Description copied from interface: Lexicon
    
    Write the lexicon in human-readable format to the Writer. (An optional operation.)
    
    Specified by:
    
    writeData in interface Lexicon
    
    Parameters:
    
    w - The writer to output to
    
    Throws:
    
    IOException - If any I/O problem
  - isKnown
    
    `public boolean isKnown(int word)`
    
    Description copied from interface: Lexicon
    
    Checks whether a word is in the lexicon.
    
    Specified by:
    
    isKnown in interface Lexicon
    
    Parameters:
    
    word - The word as an int
    
    Returns:
    
    Whether the word is in the lexicon
  - isKnown
    
    `public boolean isKnown(String word)`
    
    Description copied from interface: Lexicon
    
    Checks whether a word is in the lexicon.
    
    Specified by:
    
    isKnown in interface Lexicon
    
    Parameters:
    
    word - The word as a String
    
    Returns:
    
    Whether the word is in the lexicon
  - tagSet
    
    `public Set<String> tagSet(java.util.function.Function<String,String> basicCategoryFunction)`
    
    Return the Set of tags used by this tagger (available after training the tagger).
    
    Specified by:
    
    tagSet in interface Lexicon
    
    Returns:
    
    The Set of tags used by this tagger
  - getUnknownWordModel
    
    `public UnknownWordModel getUnknownWordModel()`
    
    Specified by:
    
    getUnknownWordModel in interface Lexicon
  - setUnknownWordModel
    
    `public void setUnknownWordModel(UnknownWordModel uwm)`
    
    Specified by:
    
    setUnknownWordModel in interface Lexicon
  - train
    
    `public void train(Collection<Tree> trees, Collection<Tree> rawTrees)`
    
    Specified by:
    
    train in interface Lexicon
