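Another popular feature selection method is $\chi^2$. In statistics, the $\chi^2$ test is applied to test the independence of two events, where two events $A$ and $B$ are defined to be independent if $P(AB) = P(A)P(B)$ or, equivalently, $P(A|B) = P(A)$ and $P(B|A) = P(B)$. In feature selection, the two events are occurrence of the term and occurrence of the class. We then rank terms with respect to the following quantity:

$$X^2(D,t,c) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}} \frac{(N_{e_t e_c} - E_{e_t e_c})^2}{E_{e_t e_c}} \tag{133}$$

where $e_t = 1$ if the document contains term $t$ (and 0 otherwise), $e_c = 1$ if the document is in class $c$ (and 0 otherwise), $N_{e_t e_c}$ is the observed frequency in $D$, and $E_{e_t e_c}$ is the expected frequency under the assumption that term and class are independent.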
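Worked example. We first compute $E_{11}$, the expected number of documents that contain the term and are in the class, for the data in Example 13.5.1:

$$E_{11} = N \times P(t) \times P(c) = N \times \frac{N_{11}+N_{10}}{N} \times \frac{N_{11}+N_{01}}{N} \tag{134}$$

$$= N \times \frac{49+27{,}652}{N} \times \frac{49+141}{N} \approx 6.6 \tag{135}$$

where $N$ is the total number of documents.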
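We compute the other $E_{e_t e_c}$ in the same way:

 | $e_c = 1$ (poultry) | $e_c = 0$ |
$e_t = 1$ (export) | $N_{11} = 49$, $E_{11} \approx 6.6$ | $N_{10} = 27{,}652$, $E_{10} \approx 27{,}694.4$ |
$e_t = 0$ | $N_{01} = 141$, $E_{01} \approx 183.4$ | $N_{00} = 774{,}106$, $E_{00} \approx 774{,}063.6$ |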
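Plugging these values into Equation 133, we get a $X^2$ value of 284:

$$X^2(D,t,c) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}} \frac{(N_{e_t e_c} - E_{e_t e_c})^2}{E_{e_t e_c}} \approx 284 \tag{136}$$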
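The worked example can be reproduced with a short Python sketch. It assumes the four counts of Example 13.5.1 are $N_{11} = 49$, $N_{10} = 27{,}652$, $N_{01} = 141$ and $N_{00} = 774{,}106$ (term export, class poultry); the function names `expected_counts` and `chi_square` are illustrative, not part of any library.

```python
def expected_counts(n11, n10, n01, n00):
    """Expected cell counts if term and class were independent."""
    n = n11 + n10 + n01 + n00
    term = {1: n11 + n10, 0: n01 + n00}  # documents with / without the term
    cls = {1: n11 + n01, 0: n10 + n00}   # documents in / not in the class
    # E[e_t, e_c] = N * P(e_t) * P(e_c) = term[e_t] * cls[e_c] / N
    return {(et, ec): term[et] * cls[ec] / n for et in (0, 1) for ec in (0, 1)}


def chi_square(n11, n10, n01, n00):
    """Sum of (observed - expected)^2 / expected over the four cells."""
    observed = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    expected = expected_counts(n11, n10, n01, n00)
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)


# Assumed counts for term "export" and class "poultry" (see lead-in).
COUNTS = dict(n11=49, n10=27_652, n01=141, n00=774_106)

print(round(expected_counts(**COUNTS)[(1, 1)], 1))  # 6.6
print(round(chi_square(**COUNTS)))                  # 284
```

Almost all of the total comes from the $(1,1)$ cell, where 49 documents are observed against roughly 6.6 expected.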
Table 13.6: Critical values of the $\chi^2$ distribution with one degree of freedom.
$p$ | $\chi^2$ critical value |
0.1 | 2.71 |
0.05 | 3.84 |
0.01 | 6.63 |
0.005 | 7.88 |
0.001 | 10.83 |
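The critical values in the table can be checked without a statistics package. The sketch below relies on the fact that a $\chi^2$ variable with one degree of freedom is the square of a standard normal variable, so its tail probability is $\operatorname{erfc}(\sqrt{x/2})$; the helper name `chi2_tail_1dof` is an illustrative choice.

```python
import math


def chi2_tail_1dof(x):
    """P(X^2 > x) for a chi-square variable with one degree of freedom."""
    # X^2 = Z^2 with Z standard normal, so P(Z^2 > x) = P(|Z| > sqrt(x)).
    return math.erfc(math.sqrt(x / 2.0))


# p -> critical value, copied from the table above.
CRITICAL = {0.1: 2.71, 0.05: 3.84, 0.01: 6.63, 0.005: 7.88, 0.001: 10.83}

for p, crit in CRITICAL.items():
    # Each tail probability should be close to the tabulated p.
    print(f"p={p}: P(X^2 > {crit}) = {chi2_tail_1dof(crit):.4f}")
```

The printed probabilities differ slightly from the tabulated $p$ values because the critical values are rounded to two decimals.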
$X^2$ is a measure of how much expected counts $E$ and observed counts $N$ deviate from each other. A high value of $X^2$ indicates that the hypothesis of independence, which implies that expected and observed counts are similar, is incorrect. In our example, $X^2 \approx 284 > 10.83$. Based on Table 13.6, we can reject the hypothesis that poultry and export are independent with only a 0.001 chance of being wrong. Equivalently, we say that the outcome $X^2 \approx 284 > 10.83$ is statistically significant at the 0.001 level. If the two events are dependent, then the occurrence of the term makes the occurrence of the class more likely (or less likely), so it should be helpful as a feature. This is the rationale of $\chi^2$ feature selection.
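As a rough illustration of ranking terms by this statistic, the sketch below scores a few terms for one class and keeps the top $k$. Only the counts for export are assumed to come from the worked example; the counts for "chicken" and "the" are made up for illustration, and `select_features` is a hypothetical helper name.

```python
def chi_square(n11, n10, n01, n00):
    """X^2 for one term/class pair from its 2x2 table of document counts.

    Assumes every row and column total is nonzero; otherwise an expected
    count is zero and the division below fails.
    """
    n = n11 + n10 + n01 + n00
    observed = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    term = {1: n11 + n10, 0: n01 + n00}
    cls = {1: n11 + n01, 0: n10 + n00}
    total = 0.0
    for (et, ec), n_obs in observed.items():
        expected = term[et] * cls[ec] / n
        total += (n_obs - expected) ** 2 / expected
    return total


def select_features(term_counts, k):
    """Return the k terms with the highest X^2 score."""
    ranked = sorted(term_counts, key=lambda t: chi_square(*term_counts[t]),
                    reverse=True)
    return ranked[:k]


# (n11, n10, n01, n00) per term for the class "poultry".
TERM_COUNTS = {
    "export": (49, 27_652, 141, 774_106),  # assumed from the worked example
    "chicken": (120, 500, 70, 801_258),    # hypothetical counts
    "the": (188, 790_000, 2, 11_758),      # hypothetical counts
}

print(select_features(TERM_COUNTS, k=2))  # ['chicken', 'export']
```

A term such as "the" occurs in nearly every document regardless of class, so its observed counts stay close to the expected ones and it ranks last.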
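An arithmetically simpler way of computing $X^2$ is the following:

$$X^2(D,t,c) = \frac{(N_{11}+N_{10}+N_{01}+N_{00}) \times (N_{11}N_{00} - N_{10}N_{01})^2}{(N_{11}+N_{01}) \times (N_{11}+N_{10}) \times (N_{10}+N_{00}) \times (N_{01}+N_{00})} \tag{137}$$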
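The simpler computation can be sketched in Python, assuming the standard closed form for a 2×2 table: $N (N_{11}N_{00} - N_{10}N_{01})^2$ divided by the product of the four row and column totals. The counts are again assumed to be those of Example 13.5.1, and `chi_square_shortcut` is an illustrative name.

```python
def chi_square_shortcut(n11, n10, n01, n00):
    """X^2 for a 2x2 table computed directly from the four observed counts."""
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    # Python integers are arbitrary precision, so the large products are exact;
    # only the final division produces a float.
    return numerator / denominator


# Assumed counts for term "export" and class "poultry".
print(round(chi_square_shortcut(49, 27_652, 141, 774_106)))  # 284
```

This form needs no expected counts. With collection-sized counts the numerator here is on the order of $10^{21}$, which would overflow a 64-bit integer, so a port to a fixed-width language would need floating-point or wider arithmetic.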