where and are defined as in Equation 130. is the

**Worked example.** We first
compute for the
data in Example 13.5.1:

(134) | |||

(135) |

where is the total number of documents as before.

We compute the other in the same way:

Plugging these values into
Equation 133, we get a value of 284:

(136) |

is a measure of how much expected counts and observed
counts deviate from each other. A high value of
indicates that the hypothesis of independence, which implies
that expected and observed counts are similar, is
incorrect. In our example,
. Based
on Table 13.6 , we can reject the hypothesis that
poultry and export are independent with only
a 0.001 chance of being wrong.^{}Equivalently, we say that the outcome
is *statistically
significant* at the 0.001 level. If the two events are
dependent, then the occurrence of the term makes the occurrence
of the class more likely (or less likely), so it should be
helpful as a feature. This is the rationale of
feature selection.

critical value | |||

0.1 | 2.71 | ||

0.05 | 3.84 | ||

0.01 | 6.63 | ||

0.005 | 7.88 | ||

0.001 | 10.83 |

An arithmetically simpler way of computing
is the
following:

This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.

2009-04-07