

Feature selection for multiple classifiers

In an operational system with a large number of classifiers, it is desirable to select a single set of features instead of a different one for each classifier. One way of doing this is to compute the $X^2$ statistic for an $n \times 2$ table where the columns are occurrence and nonoccurrence of the term and each row corresponds to one of the classes. We can then select the $k$ terms with the highest $X^2$ statistic as before.
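The following is a minimal Python sketch of this approach, assuming per-term contingency counts have already been collected from the training set; the function names (chi2_n_by_2, select_common_features) and the input layout are illustrative assumptions, not from the text.

    def chi2_n_by_2(table):
        # X^2 statistic for an n x 2 contingency table.
        # table[i] = (number of documents in class i that contain the term,
        #             number of documents in class i that do not).
        grand_total = sum(a + b for a, b in table)
        col_totals = [sum(row[j] for row in table) for j in (0, 1)]
        x2 = 0.0
        for row in table:
            row_total = sum(row)
            for j in (0, 1):
                expected = row_total * col_totals[j] / grand_total
                if expected > 0:
                    x2 += (row[j] - expected) ** 2 / expected
        return x2

    def select_common_features(tables_by_term, k):
        # Rank all terms by their X^2 value over the n x 2 table and
        # keep the k highest-scoring terms as the single global feature set.
        ranked = sorted(tables_by_term,
                        key=lambda t: chi2_n_by_2(tables_by_term[t]),
                        reverse=True)
        return ranked[:k]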

More commonly, feature selection statistics are first computed separately for each class on the two-class classification task $c$ versus $\overline{c}$ and then combined. One combination method computes a single figure of merit for each feature, for example by averaging the values $A(t,c)$ for feature $t$, and then selects the $k$ features with the highest figures of merit. Another frequently used combination method selects the top $k/n$ features for each of $n$ classifiers and then combines these $n$ sets into one global feature set.
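Both combination methods can be sketched in a few lines of Python; here scores[t][c] stands for the per-class figure of merit $A(t,c)$ (e.g., mutual information or $X^2$), and all names are illustrative assumptions.

    def select_by_average(scores, k):
        # Average the per-class values A(t, c) for each feature t and
        # keep the k features with the highest averages.
        avg = {t: sum(per_class.values()) / len(per_class)
               for t, per_class in scores.items()}
        return sorted(avg, key=avg.get, reverse=True)[:k]

    def select_by_union(scores, k, classes):
        # Take the top k/n features for each of the n classes and
        # combine the n per-class sets into one global feature set.
        per_class_k = k // len(classes)
        selected = set()
        for c in classes:
            ranked = sorted(scores,
                            key=lambda t: scores[t].get(c, 0.0),
                            reverse=True)
            selected.update(ranked[:per_class_k])
        return selected

Note that the union computed by select_by_union can contain fewer than $k$ features when the per-class top lists overlap.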

Classification accuracy often decreases when selecting $k$ common features for a system with $n$ classifiers, as opposed to $n$ different sets of size $k$. But even if it does, the gain in efficiency owing to a common document representation may be worth the loss in accuracy.

