Package mark.nlp.features

Interface Summary
Orderer Orders a set of potential features.
 

Class Summary
BagCorpusCounter A corpus counter implemented as a collection of bags.
ChiSquared A SimpleOrderer that orders according to Pearson's Chi-Squared test.
ClassificationOrderer An Orderer that orders according to usefulness during classification.
CorpusCounter Assume a sampling process, where each sample determines two discrete random variables, C and W.
Method0 A TwoLevelOrderer that combines the first level score with the second level scores according to the following formula: firstLevelScore / Sum_c_in_categories (secondLevelScore_c). This orderer ignores its parameter.
Method3 A TwoLevelOrderer that combines the first level score with the second level scores according to the following formula: <> This orderer uses its double parameter in its calculation.
MutualInformation A SimpleOrderer that orders according to the following formula: score (w) = sum_c_in_C_and_w_in_W [p(c,w) * log [p(c,w) / p(c) / p(w)]] where C is the set of categories and W is {^w, w}.
NegativeMutualInformation A SimpleOrderer that orders according to the following formula: score (w) = -1 * sum_c_in_C_and_w_in_W [p(c,w) * log [p(c,w) / p(c) / p(w)]] where C is the set of categories and W is {^w, w}.
None An Orderer that does not change the order of the potential features.
PointwiseChiSquared A SimpleOrderer that orders according to the maximum of the constituent addends in Pearson's Chi-Squared test.
PointwiseMutualInformation A SimpleOrderer that orders according to the following formula: score (w) = max_c_in_C_and_w_in_W [log [p(c,w)/p(c)/p(w)]] where C is the set of categories and W is {^w, w}.
Random A SimpleOrderer that orders randomly.
Reducer Provides a routine that eliminates from a set of potential features those potential features that occur too seldom or too often.
Selector Provides a routine that, given an ordered set of potential features, selects the best.
SimpleOrderer A ClassificationOrderer that orders from the corpus counter alone.
TwoLevelOrderer A ClassificationOrderer that orders from the corpus counter and the category counters.
TwoLevelScore.ReverseScore1 A Comparator that orders TwoLevelScores in reverse order of their first level scores (not their combined scores, as the comparator in DoubleWrap would).
Util Provides some utility routines.
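
The MutualInformation and PointwiseMutualInformation formulas above can be illustrated with a minimal sketch. It does not use this package's API: the class FeatureScoreSketch, its methods, and the counts[c][k] layout (k = 0 for ^w, k = 1 for w) are hypothetical names chosen for illustration. It also assumes natural logarithms and skips cells with a zero count, neither of which the summaries above specify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of mark.nlp.features. */
final class FeatureScoreSketch {
    private FeatureScoreSketch() {}

    /** score(w) = sum over c in C and w in W of p(c,w) * log[p(c,w) / p(c) / p(w)]. */
    static double mutualInformation(long[][] counts) {
        double score = 0.0;
        for (double[] cell : cells(counts)) {
            score += cell[0] * cell[1]; // p(c,w) * log ratio
        }
        return score;
    }

    /** score(w) = max over c in C and w in W of log[p(c,w) / p(c) / p(w)]. */
    static double pointwiseMutualInformation(long[][] counts) {
        double best = Double.NEGATIVE_INFINITY;
        for (double[] cell : cells(counts)) {
            best = Math.max(best, cell[1]);
        }
        return best;
    }

    /**
     * Returns one {p(c,w), log[p(c,w) / p(c) / p(w)]} pair per non-empty cell.
     * counts[c][0] counts samples of category c without the word (^w),
     * counts[c][1] counts samples of category c with the word (w).
     */
    private static List<double[]> cells(long[][] counts) {
        double total = 0.0;
        double[] categorySum = new double[counts.length];
        double[] wordSum = new double[2];
        for (int c = 0; c < counts.length; c++) {
            for (int k = 0; k < 2; k++) {
                categorySum[c] += counts[c][k];
                wordSum[k] += counts[c][k];
                total += counts[c][k];
            }
        }
        List<double[]> result = new ArrayList<>();
        for (int c = 0; c < counts.length; c++) {
            for (int k = 0; k < 2; k++) {
                if (counts[c][k] == 0) {
                    continue; // assumption: empty cells contribute nothing
                }
                double joint = counts[c][k] / total;
                double pCategory = categorySum[c] / total;
                double pWord = wordSum[k] / total;
                result.add(new double[] {joint, Math.log(joint / pCategory / pWord)});
            }
        }
        return result;
    }
}
```

Under these assumptions, a word whose presence is independent of the category scores 0 on both measures, and the NegativeMutualInformation score is the mutualInformation result multiplied by -1.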