Despite the differences between the two methods, the classification accuracy of feature sets selected with and MI does not seem to differ systematically. In most text classification problems, there are a few strong indicators and many weak indicators. As long as all strong indicators and a large number of weak indicators are selected, accuracy is expected to be good. Both methods do this.
Figure 13.8 compares MI and feature selection for the multinomial model. Peak effectiveness is virtually the same for both methods. reaches this peak later, at 300 features, probably because the rare, but highly significant features it selects initially do not cover all documents in the class. However, features selected later (in the range of 100-300) are of better quality than those selected by MI.
All three methods - MI, and frequency based - are greedy methods. They may select features that contribute no incremental information over previously selected features. In Figure 13.7 , kong is selected as the seventh term even though it is highly correlated with previously selected hong and therefore redundant. Although such redundancy can negatively impact accuracy, non-greedy methods (see Section 13.7 for references) are rarely used in text classification due to their computational cost.
Exercises.
Select two of these four terms based on (i) , (ii) mutual information, (iii) frequency .
term brazil 98,012 102 1835 51 council 96,322 133 3525 20 producers 98,524 119 1118 34 roasted 99,824 143 23 10