edu.stanford.nlp.parser.lexparser
Class ChineseUnknownWordModel

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
      extended by edu.stanford.nlp.parser.lexparser.ChineseUnknownWordModel
All Implemented Interfaces:
UnknownWordModel, Serializable

public class ChineseUnknownWordModel
extends BaseUnknownWordModel

Stores, trains, and scores with an unknown word model. A couple of filters deterministically force rewrites for certain proper nouns, dates, and cardinal and ordinal numbers; when none of these filters applies, the model falls back either to the distribution of terminals that share the word's first character or to Good-Turing smoothing. Although the model was developed for Chinese, its training and storage methods could be used cross-linguistically.
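
As a rough illustration of the Good-Turing fallback mentioned above, the sketch below computes the textbook Good-Turing estimate of the probability mass reserved for unseen events: the number of types observed exactly once divided by the total number of observations. The class GoodTuringSketch and its method unseenMass are hypothetical names used only for this example; they are not part of the parser, and the sketch does not claim to reproduce how this class applies the smoothing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Stanford parser API. */
final class GoodTuringSketch {

    private GoodTuringSketch() {}

    /**
     * Textbook Good-Turing estimate of the total probability mass of unseen
     * events: N1 / N, where N1 is the number of types seen exactly once and
     * N is the total number of observed tokens.
     */
    static double unseenMass(List<String> tokens) {
        if (tokens.isEmpty()) {
            return 0.0; // nothing observed, so nothing to estimate from
        }
        // Count how often each type occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        // N1: types that occur exactly once.
        long singletons = counts.values().stream().filter(c -> c == 1).count();
        return (double) singletons / tokens.size();
    }
}
```

Under this estimate, a training sample with many singleton words reserves more mass for unknown words than one dominated by frequent words.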

Author:
Roger Levy
See Also:
Serialized Form

Field Summary
 
Fields inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
nullTag, nullWord, tagHash, unknown, unknownLevel, unSeenCounter, useFirst, useGT, VERBOSE
 
Constructor Summary
ChineseUnknownWordModel(Options.LexOptions op, Lexicon lex)
           
 
Method Summary
 String getSignature(String word, int loc)
          Signature for a specific word; loc parameter is ignored.
static void main(String[] args)
           
 float score(IntTaggedWord itw)
           
 void train(Collection<Tree> trees)
          Trains the first-character-based unknown word model.
 
Methods inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
addTagging, getLexicon, getSignatureIndex, getUnknownLevel, score, scoreGT, setUnknownLevel, trainUnknownGT, unSeenCounter
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ChineseUnknownWordModel

public ChineseUnknownWordModel(Options.LexOptions op,
                               Lexicon lex)
Method Detail

score

public float score(IntTaggedWord itw)
Overrides:
score in class BaseUnknownWordModel

train

public void train(Collection<Tree> trees)
Trains the first-character-based unknown word model.

Specified by:
train in interface UnknownWordModel
Overrides:
train in class BaseUnknownWordModel
Parameters:
trees - the collection of trees to train on
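
The idea of a first-character-based model can be sketched as follows. This is a minimal illustration, assuming training data is supplied as (word, tag) pairs rather than as a Collection<Tree>; the class FirstCharTagSketch and its methods observe and probability are hypothetical and do not mirror the internals of this class. It estimates P(tag | first character) by relative frequency.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not the Stanford implementation. */
final class FirstCharTagSketch {

    // first code point -> (tag -> count)
    private final Map<Integer, Map<String, Integer>> counts = new HashMap<>();
    // first code point -> total observations
    private final Map<Integer, Integer> totals = new HashMap<>();

    /** Records one tagged word from the training data. */
    void observe(String word, String tag) {
        if (word.isEmpty()) {
            return;
        }
        int first = word.codePointAt(0);
        counts.computeIfAbsent(first, k -> new HashMap<>()).merge(tag, 1, Integer::sum);
        totals.merge(first, 1, Integer::sum);
    }

    /** Relative frequency of tag among training words sharing the first character. */
    double probability(String word, String tag) {
        if (word.isEmpty()) {
            return 0.0;
        }
        int first = word.codePointAt(0);
        Integer total = totals.get(first);
        if (total == null) {
            return 0.0; // first character never seen in training
        }
        int count = counts.get(first).getOrDefault(tag, 0);
        return (double) count / total;
    }

    public static void main(String[] args) {
        FirstCharTagSketch model = new FirstCharTagSketch();
        model.observe("\u5317\u4eac", "NR"); // "Beijing", tagged as a proper noun
        model.observe("\u5317\u65b9", "NN"); // "north", tagged as a common noun
        model.observe("\u5317\u90e8", "NN"); // "northern part", tagged as a common noun
        // Score an unseen word that shares the first character U+5317.
        System.out.println(model.probability("\u5317\u6781", "NN"));
    }
}
```

Keying on the first code point rather than the first char keeps the sketch correct for characters outside the Basic Multilingual Plane.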

main

public static void main(String[] args)

getSignature

public String getSignature(String word,
                           int loc)
Description copied from class: BaseUnknownWordModel
Signature for a specific word; loc parameter is ignored.

Specified by:
getSignature in interface UnknownWordModel
Overrides:
getSignature in class BaseUnknownWordModel
Parameters:
word - The word
loc - Its sentence position
Returns:
A "signature" (which represents an equivalence class of Strings), e.g., a suffix of the string
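
To make the notion of a signature as an equivalence class concrete, here is a minimal sketch of the suffix-based example mentioned in the return description. The class SuffixSignatureSketch, the method signature, and the "UNK-" prefix are all hypothetical choices for this illustration; the signatures this class actually produces may differ.

```java
/** Hypothetical illustration of a suffix-based signature; not the parser's own logic. */
final class SuffixSignatureSketch {

    private SuffixSignatureSketch() {}

    /**
     * Maps a word to a signature built from its last suffixLength code points,
     * so that all words sharing that suffix fall into one equivalence class.
     */
    static String signature(String word, int suffixLength) {
        if (suffixLength < 0) {
            throw new IllegalArgumentException("suffixLength must be non-negative");
        }
        int codePoints = word.codePointCount(0, word.length());
        int keep = Math.min(suffixLength, codePoints);
        // Step back 'keep' code points from the end of the word.
        int start = word.offsetByCodePoints(word.length(), -keep);
        return "UNK-" + word.substring(start);
    }
}
```

With a suffix length of 3, "running" and "walking" would both map to the signature "UNK-ing", so statistics gathered for one can be shared with the other.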


Stanford NLP Group