edu.stanford.nlp.parser.lexparser
Class ChineseUnknownWordModel
java.lang.Object
edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
edu.stanford.nlp.parser.lexparser.ChineseUnknownWordModel
- All Implemented Interfaces:
- UnknownWordModel, Serializable
public class ChineseUnknownWordModel
- extends BaseUnknownWordModel
Stores, trains, and scores with an unknown word model. A couple
of filters deterministically force rewrites for certain proper
nouns, dates, and cardinal and ordinal numbers; when none of these
filters applies, the model uses either the distribution of terminals
that share the word's first character or Good-Turing smoothing.
Although this model was developed for Chinese, its training and
storage methods could be used cross-linguistically.
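The scoring order described above (deterministic filters first, then the first-character distribution, then a smoothed fallback) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Stanford implementation: the class UnknownWordSketch, its methods, the all-digit filter, and the use of the tag "CD" for cardinal numbers are assumptions made for the example, and the fallback value stands in for whatever a smoothing method such as Good-Turing would supply.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and filter rules are assumptions, not the Stanford code.
class UnknownWordSketch {
    // tag -> (first character -> number of training words with that tag and first character)
    private final Map<String, Map<Character, Integer>> counts = new HashMap<>();
    // tag -> total number of training words observed with that tag
    private final Map<String, Integer> totals = new HashMap<>();

    // Record one (word, tag) pair from the training data.
    void observe(String word, String tag) {
        if (word.isEmpty()) {
            return;
        }
        counts.computeIfAbsent(tag, t -> new HashMap<>())
              .merge(word.charAt(0), 1, Integer::sum);
        totals.merge(tag, 1, Integer::sum);
    }

    // Assumed filter: an all-digit token is treated as a number.
    static boolean looksLikeNumber(String word) {
        return !word.isEmpty() && word.chars().allMatch(Character::isDigit);
    }

    // Log-probability of the word's first character given the tag.
    // The filter fires first; an unseen (tag, first character) pair gets the fallback.
    double logProb(String word, String tag, double fallback) {
        if (looksLikeNumber(word)) {
            // Deterministic rewrite: only the assumed number tag is allowed.
            return tag.equals("CD") ? 0.0 : Double.NEGATIVE_INFINITY;
        }
        Map<Character, Integer> byChar = counts.get(tag);
        Integer c = (byChar == null || word.isEmpty()) ? null : byChar.get(word.charAt(0));
        if (c == null) {
            return fallback;
        }
        return Math.log((double) c / totals.get(tag));
    }
}
```

In this sketch a word never seen in training still receives a tag-specific score as long as some training word with that tag began with the same character, which is the property that makes a first-character model attractive for Chinese.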
- Author:
- Roger Levy
- See Also:
- Serialized Form
Fields inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
NULL_ITW, nullTag, nullWord, tagHash, tagIndex, trainOptions, unknown, unknownLevel, unSeenCounter, useFirst, useGT, VERBOSE, wordIndex
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ChineseUnknownWordModel
public ChineseUnknownWordModel(Options op,
Lexicon lex,
Index<String> wordIndex,
Index<String> tagIndex)
score
public float score(IntTaggedWord itw,
String word)
- Overrides:
score
in class BaseUnknownWordModel
train
public void train(Collection<Tree> trees)
- Trains the first-character-based unknown word model.
- Specified by:
train
in interface UnknownWordModel
- Overrides:
train
in class BaseUnknownWordModel
- Parameters:
trees
- the collection of trees to be trained over
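The class description names Good-Turing smoothing as the alternative to the first-character model. As a rough illustration of what training could collect for that alternative, the sketch below computes the simplest Good-Turing estimate of unseen-word mass per tag: the number of word types seen exactly once with the tag, divided by the number of tokens seen with the tag. The class GoodTuringSketch, the method unseenMass, and the {word, tag} array input are hypothetical; the real method takes a Collection<Tree>, and nothing here is taken from its implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a simple Good-Turing estimate; not the Stanford code.
class GoodTuringSketch {
    /**
     * Estimates, for each tag, the probability mass reserved for unseen words
     * as N1 / N, where N1 is the number of word types seen exactly once with
     * the tag and N is the number of tokens seen with the tag.
     * Each element of taggedWords is a two-element array: {word, tag}.
     */
    static Map<String, Double> unseenMass(List<String[]> taggedWords) {
        // tag -> (word -> count)
        Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
        for (String[] wt : taggedWords) {
            wordCounts.computeIfAbsent(wt[1], t -> new HashMap<>())
                      .merge(wt[0], 1, Integer::sum);
        }
        Map<String, Double> mass = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : wordCounts.entrySet()) {
            int tokens = 0;
            int singletons = 0;
            for (int c : e.getValue().values()) {
                tokens += c;
                if (c == 1) {
                    singletons++;
                }
            }
            mass.put(e.getKey(), (double) singletons / tokens);
        }
        return mass;
    }
}
```

Open-class tags, which keep producing new word types, tend to have many singletons and therefore a larger share of mass reserved for unknown words than closed-class tags do.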
main
public static void main(String[] args)
getSignature
public String getSignature(String word,
int loc)
- Description copied from class:
BaseUnknownWordModel
- Signature for a specific word; loc parameter is ignored.
- Specified by:
getSignature
in interface UnknownWordModel
- Overrides:
getSignature
in class BaseUnknownWordModel
- Parameters:
word
- The word
loc
- Its sentence position
- Returns:
- A "signature" (which represents an equivalence class of Strings), e.g., a suffix of the string
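To make the idea of a signature as an equivalence class concrete, here is a small hypothetical sketch in which all words sharing a first character map to the same signature and loc is accepted but ignored, as documented. The class SignatureSketch, the "UNK" prefix, and the "UNK-NUM" class for digit-initial words are assumptions for illustration only and do not describe the strings this method actually returns.

```java
// Hypothetical sketch; the signature strings and class name are assumptions.
class SignatureSketch {
    // Maps a word to a coarse equivalence class based on its first character.
    // The loc argument is kept for signature compatibility but is not used.
    static String signature(String word, int loc) {
        if (word.isEmpty()) {
            return "UNK";
        }
        // Use a code point so that characters outside the BMP are kept whole.
        int first = word.codePointAt(0);
        if (Character.isDigit(first)) {
            return "UNK-NUM";
        }
        return "UNK-" + new String(Character.toChars(first));
    }
}
```

Two unknown words with the same signature are then scored identically by whatever statistics were gathered for that signature during training.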
Stanford NLP Group