
Chi-square feature selection

Another popular feature selection method is $\chi ^2$ . In statistics, the $\chi ^2$ test is applied to test the independence of two events, where two events A and B are defined to be independent if $P(AB) = P(A)P(B)$ or, equivalently, $P(A\vert B)=P(A)$ and $P(B\vert A)=P(B)$. In feature selection, the two events are occurrence of the term and occurrence of the class. We then rank terms with respect to the following quantity:

$\displaystyle X^2(\docsetlabeled,\tcword,c) =
\sum_{e_\tcword \in \{ 0,1 \} } \sum_{e_c \in \{ 0,1 \} }
\frac{(\observationo_{e_\tcword e_c}-E_{e_\tcword e_c})^2}{E_{e_\tcword e_c}}$     (133)

where $e_\tcword$ and $e_c$ are defined as in Equation 130. $\observationo$ is the observed frequency in $\docsetlabeled$ and $E$ the expected frequency. For example, $E_{11}$ is the expected frequency of $\tcword$ and $c$ occurring together in a document assuming that term and class are independent.

Worked example. We first compute $E_{11}$ for the data in Example 13.5.1:

$\displaystyle E_{11}$ $\textstyle =$ $\displaystyle N\times P(\tcword) \times P(c) = N\times \frac{\observationo_{11}+\observationo_{10}}{N} \times
\frac{\observationo_{11}+\observationo_{01}}{N}$ (134)
  $\textstyle =$ $\displaystyle N \times \frac{49+27{,}652}{N}
\times \frac{49+141}{N}\approx 6.6$ (135)

where $N$ is the total number of documents as before.

We compute the other $E_{e_\tcword e_c}$ in the same way:

  $e_{\class{poultry}}=1$   $e_{\class{poultry}}=0$
$e_{\term{export}} = 1$   $\observationo_{11}=49$   $E_{11}\approx 6.6$   $\observationo_{10}=27{,}652$   $E_{10}\approx 27{,}694.4$
$e_{\term{export}} = 0$   $\observationo_{01}=141$   $E_{01}\approx 183.4$   $\observationo_{00}=774{,}106$   $E_{00}\approx 774{,}063.6$

Plugging these values into Equation 133, we get an $X^2$ value of approximately 284:

$\displaystyle X^2(\docsetlabeled,\tcword,c) = \sum_{e_\tcword \in \{ 0,1 \} } \sum_{e_c \in \{ 0,1 \} }
\frac{(\observationo_{e_\tcword e_c}-E_{e_\tcword e_c})^2}{E_{e_\tcword e_c}}
\approx 284$     (136)

End worked example.
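The computation in the worked example can be sketched in Python. This is a minimal illustration, not the book's code; the function name `chi_square` and the dictionary layout are my own choices. The counts are the observed values from the table above, indexed as $(e_\tcword, e_c)$:

```python
# Observed counts for term "export" and class "poultry" from the worked example.
# Key is (e_t, e_c): first index = term present (1) or absent (0), second = class.
O = {(1, 1): 49, (1, 0): 27652, (0, 1): 141, (0, 0): 774106}

def chi_square(O):
    """X^2 statistic (Equation 133) for a 2x2 term/class table of observed counts."""
    N = sum(O.values())  # total number of documents
    chi2 = 0.0
    for e_t in (0, 1):
        for e_c in (0, 1):
            # Marginal counts for this term indicator and this class indicator.
            n_t = O[(e_t, 0)] + O[(e_t, 1)]
            n_c = O[(0, e_c)] + O[(1, e_c)]
            # Expected count under the independence assumption: N * P(t) * P(c).
            E = N * (n_t / N) * (n_c / N)
            chi2 += (O[(e_t, e_c)] - E) ** 2 / E
    return chi2

print(chi_square(O))  # roughly 284, as in Equation 136
```

Note that the expected count for the (1, 1) cell comes out to about 6.6, matching Equation 135.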

$p$ $\chi ^2$ critical value
0.1 2.71
0.05 3.84
0.01 6.63
0.005 7.88
0.001 10.83
Table 13.6: Critical values of the $\chi ^2$ distribution with one degree of freedom. For example, if the two events are independent, then $\mbox{P}(X^2>6.63)<0.01$. So for $X^2>6.63$ the assumption of independence can be rejected with 99% confidence.

$X^2$ is a measure of how much expected counts $E$ and observed counts $\observationo$ deviate from each other. A high value of $X^2$ indicates that the hypothesis of independence, which implies that expected and observed counts are similar, is incorrect. In our example, $X^2 \approx 284 > 10.83$. Based on Table 13.6, we can reject the hypothesis that poultry and export are independent with only a 0.001 chance of being wrong. Equivalently, we say that the outcome $X^2 \approx 284 > 10.83$ is statistically significant at the 0.001 level. If the two events are dependent, then the occurrence of the term makes the occurrence of the class more likely (or less likely), so it should be helpful as a feature. This is the rationale of $\chi ^2$ feature selection.
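The critical values in Table 13.6 can be recovered with only the standard library, since a $\chi^2$ variate with one degree of freedom is the square of a standard normal, so its survival function is $\mathrm{erfc}(\sqrt{x/2})$. A small sketch (the function name is my own):

```python
import math

def chi2_sf_1df(x):
    """P(X^2 > x) for the chi-square distribution with one degree of freedom.

    A chi-square variate with 1 df is the square of a standard normal Z,
    so P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))

# Each critical value from Table 13.6 should give back (approximately) its p.
for p, crit in [(0.1, 2.71), (0.05, 3.84), (0.01, 6.63),
                (0.005, 7.88), (0.001, 10.83)]:
    print(p, round(chi2_sf_1df(crit), 4))

# The worked example's statistic is far beyond the 0.001 critical value.
print(chi2_sf_1df(284) < 0.001)  # True
```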

An arithmetically simpler way of computing $X^2$ is the following:

\begin{displaymath}
X^2(\docsetlabeled,\tcword,c) =
\frac{(\observationo_{11}+\observationo_{10}+\observationo_{01}+\observationo_{00})
\times (\observationo_{11}\observationo_{00}-\observationo_{10}\observationo_{01})^2}
{(\observationo_{11}+\observationo_{01})\times(\observationo_{11}+\observationo_{10})
\times(\observationo_{10}+\observationo_{00})\times(\observationo_{01}+\observationo_{00})}
\end{displaymath} (137)

This is equivalent to Equation 133 (Exercise 13.8 ).
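The equivalence can also be checked numerically. The sketch below evaluates both forms with Python's exact rational arithmetic on the counts from the worked example, so the comparison is exact rather than subject to floating-point rounding (variable names are mine):

```python
from fractions import Fraction

# Observed counts from the worked example (first index: term, second: class).
O11, O10, O01, O00 = 49, 27652, 141, 774106
N = O11 + O10 + O01 + O00

# Shortcut form (Equation 137).
shortcut = Fraction(N * (O11 * O00 - O10 * O01) ** 2,
                    (O11 + O01) * (O11 + O10) * (O10 + O00) * (O01 + O00))

# Definitional form (Equation 133): for each cell, (observed - expected)^2 / expected,
# where expected = (row marginal * column marginal) / N.
total = Fraction(0)
for o, row, col in [(O11, O11 + O10, O11 + O01), (O10, O11 + O10, O10 + O00),
                    (O01, O01 + O00, O11 + O01), (O00, O01 + O00, O10 + O00)]:
    E = Fraction(row * col, N)
    total += (o - E) ** 2 / E

print(float(shortcut))   # roughly 284
print(shortcut == total) # True: the two forms agree exactly
```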

© 2008 Cambridge University Press