Stores, trains, and scores with an unknown word model. A couple
of filters deterministically force rewrites for certain proper
nouns, dates, and cardinal and ordinal numbers. When none of these
filters applies, the model uses either the distribution of terminals
that share the word's first character or Good-Turing smoothing.
Although this model was developed for Chinese, the training and
storage methods could be used cross-linguistically.
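The control flow described above (deterministic filters first, then a first-character distribution) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not this class's implementation: the class name FirstCharUnknownWordSketch, the number pattern, the "CD" tag for cardinals, and the add-one smoothing (a simple stand-in where the real model may use Good-Turing) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of a first-character unknown word model. */
class FirstCharUnknownWordSketch {
  // Assumed number filter: ASCII digits plus the Chinese numerals for
  // one through ten, hundred, thousand, and ten-thousand (written as
  // Unicode escapes so the file compiles under any source encoding).
  private static final Pattern NUMBER = Pattern.compile(
      "[0-9\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d\u5341\u767e\u5343\u4e07]+");

  // tag -> (first character -> count), and tag -> total count
  private final Map<String, Map<Character, Integer>> firstCharCounts = new HashMap<>();
  private final Map<String, Integer> tagCounts = new HashMap<>();

  /** Records one (word, tag) observation, keyed on the word's first character. */
  void train(String word, String tag) {
    if (word.isEmpty()) {
      return;
    }
    firstCharCounts.computeIfAbsent(tag, t -> new HashMap<>())
        .merge(word.charAt(0), 1, Integer::sum);
    tagCounts.merge(tag, 1, Integer::sum);
  }

  /** Returns a log score for rewriting the given tag as the given unknown word. */
  double score(String word, String tag) {
    if (word.isEmpty()) {
      return Double.NEGATIVE_INFINITY;
    }
    // Filter step: a word that looks like a number is forced to the
    // assumed cardinal tag and ruled out for every other tag.
    if (NUMBER.matcher(word).matches()) {
      return tag.equals("CD") ? 0.0 : Double.NEGATIVE_INFINITY;
    }
    Integer total = tagCounts.get(tag);
    if (total == null) {
      return Double.NEGATIVE_INFINITY; // tag never seen in training
    }
    // Fallback step: log P(first character | tag) with add-one smoothing,
    // reserving one extra slot for first characters never seen with this tag.
    Map<Character, Integer> counts = firstCharCounts.get(tag);
    int seen = counts.getOrDefault(word.charAt(0), 0);
    int slots = counts.size() + 1;
    return Math.log((seen + 1.0) / (total + slots));
  }
}
```

In this sketch an unseen word whose first character was frequent under a tag scores higher for that tag than a word with a previously unseen first character, which is the intuition behind backing off to the first-character distribution.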
public ChineseUnknownWordModel(Options op,
This constructor creates a UWM (unknown word model) with empty data
structures. Use it only when the data is loaded separately, for
example by reading in text lines that contain the data.
TODO: would need to set useGT correctly if you saved a model with
useGT and then wanted to recover it from text.