Stores, trains, and scores with an unknown word model. A couple
of filters deterministically force rewrites for certain proper
nouns, dates, and cardinal and ordinal numbers; when none of these
filters are met, either the distribution of terminals with the same
first character is used, or Good-Turing smoothing is used. Although
this is developed for Chinese, the training and storage methods
could be used cross-linguistically.
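The scoring order described above (filters first, then either the first-character distribution or Good-Turing smoothing) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the model's actual implementation. The class name `UnknownWordSketch`, the regular expressions standing in for the filters, the Penn Chinese Treebank style tags (`OD`, `NT`, `CD`, `NR`), the add-one smoothing of the first-character distribution, and the fallback to Good-Turing when a first character never occurred in training are all hypothetical choices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: deterministic filters first, then either a
// first-character tag distribution or a Good-Turing estimate.
class UnknownWordSketch {
  // Illustrative filter patterns (assumptions); each forces exactly one tag.
  private static final Pattern ORDINAL = Pattern.compile("第.+");
  private static final Pattern DATE = Pattern.compile(".*[0-9〇一二三四五六七八九十]+[年月日]");
  private static final Pattern NUMBER = Pattern.compile("[0-9〇一二三四五六七八九十百千万亿.]+");
  private static final Pattern PROPER_NOUN = Pattern.compile(".+·.+");

  private final boolean useFirst; // true: first-character distribution; false: Good-Turing
  private final Map<String, Integer> wordTagCounts = new HashMap<>();
  private final Map<String, Integer> tagCounts = new HashMap<>();
  private final Map<String, Integer> firstCharTagCounts = new HashMap<>();
  private final Map<Character, Integer> firstCharCounts = new HashMap<>();

  UnknownWordSketch(boolean useFirst) {
    this.useFirst = useFirst;
  }

  /** Returns the tag forced by a matching filter, or null if no filter applies. */
  static String forcedTag(String word) {
    if (ORDINAL.matcher(word).matches()) return "OD";
    if (DATE.matcher(word).matches()) return "NT";
    if (NUMBER.matcher(word).matches()) return "CD";
    if (PROPER_NOUN.matcher(word).matches()) return "NR";
    return null;
  }

  /** Counts one training token; words are assumed non-empty and BMP-only. */
  void train(String word, String tag) {
    char first = word.charAt(0);
    wordTagCounts.merge(word + "\t" + tag, 1, Integer::sum);
    tagCounts.merge(tag, 1, Integer::sum);
    firstCharTagCounts.merge(first + "\t" + tag, 1, Integer::sum);
    firstCharCounts.merge(first, 1, Integer::sum);
  }

  /** Log-probability score of an unknown word under a tag (higher is better). */
  double score(String word, String tag) {
    String forced = forcedTag(word);
    if (forced != null) {
      // A matching filter is deterministic: its tag gets log(1), all others log(0).
      return forced.equals(tag) ? 0.0 : Double.NEGATIVE_INFINITY;
    }
    char first = word.charAt(0);
    Integer firstTotal = firstCharCounts.get(first);
    if (useFirst && firstTotal != null) {
      // P(tag | first char) with add-one smoothing over seen tags (an assumption).
      int joint = firstCharTagCounts.getOrDefault(first + "\t" + tag, 0);
      return Math.log((joint + 1.0) / (firstTotal + tagCounts.size()));
    }
    // Good-Turing; also used here when the first character is unseen (an assumption).
    return goodTuringLogProb(tag);
  }

  /** Good-Turing: P(unseen | tag) ~ (word types seen once with tag) / (tokens of tag). */
  private double goodTuringLogProb(String tag) {
    int tokens = tagCounts.getOrDefault(tag, 0);
    long singletons = wordTagCounts.entrySet().stream()
        .filter(e -> e.getKey().endsWith("\t" + tag) && e.getValue() == 1)
        .count();
    return tokens == 0 ? Double.NEGATIVE_INFINITY : Math.log((double) singletons / tokens);
  }
}
```

With `useFirst` set to `false`, every word that passes through the filters is scored by the Good-Turing estimate alone: the share of a tag's tokens that belong to word types seen exactly once with that tag.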
This constructor creates an unknown word model (UWM) with empty data
structures. Use it only when the data will be loaded separately, for
example by reading in text lines that contain the data.
TODO: useGT would need to be set correctly when a model saved with
useGT is later recovered from text.
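As a sketch of the empty-then-load pattern this constructor is meant for, the hypothetical `UnknownWordStore` below starts with empty maps and is filled one text line at a time. The class, its methods, and the `key<TAB>logProb` line format are assumptions for illustration only. Because this format carries no `useGT` field, the caller has to set the flag explicitly after loading, which mirrors the gap the TODO describes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: constructed empty, then filled from text lines.
class UnknownWordStore {
  private final Map<String, Double> logProbs = new HashMap<>();
  private boolean useGT; // not recorded in the (assumed) text format; defaults to false

  /** Creates a store with empty data structures; fill it with readLine. */
  UnknownWordStore() {
  }

  void setUseGT(boolean useGT) {
    this.useGT = useGT;
  }

  boolean useGT() {
    return useGT;
  }

  /** Parses one assumed "key<TAB>logProb" line into the store. */
  void readLine(String line) {
    String[] fields = line.split("\t");
    if (fields.length != 2) {
      throw new IllegalArgumentException("malformed line: " + line);
    }
    logProbs.put(fields[0], Double.parseDouble(fields[1]));
  }

  /** Returns the stored log-probability for a key, or null if absent. */
  Double logProb(String key) {
    return logProbs.get(key);
  }

  int size() {
    return logProbs.size();
  }
}
```

A caller would construct the store, pass each saved line to `readLine`, and then call `setUseGT(true)` if the original model used Good-Turing; forgetting that last step silently leaves the flag at its default.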