Table 15.2: SVM classifier break-even F1 from (Joachims, 2002a), for the ten largest categories and for microaveraged performance over all 90 categories on the Reuters-21578 data set.

|           | NB   | Rocchio | Dec. Trees | kNN  | linear SVM (C=0.5) | linear SVM (C=1.0) | rbf-SVM (σ≈7) |
|-----------|------|---------|------------|------|--------------------|--------------------|---------------|
| earn      | 96.0 | 96.1    | 96.1       | 97.8 | 98.0               | 98.2               | 98.1          |
| acq       | 90.7 | 92.1    | 85.3       | 91.8 | 95.5               | 95.6               | 94.7          |
| money-fx  | 59.6 | 67.6    | 69.4       | 75.4 | 78.8               | 78.5               | 74.3          |
| grain     | 69.8 | 79.5    | 89.1       | 82.6 | 91.9               | 93.1               | 93.4          |
| crude     | 81.2 | 81.5    | 75.5       | 85.8 | 89.4               | 89.4               | 88.7          |
| trade     | 52.2 | 77.4    | 59.2       | 77.9 | 79.2               | 79.2               | 76.6          |
| interest  | 57.6 | 72.5    | 49.1       | 76.7 | 75.6               | 74.8               | 69.1          |
| ship      | 80.9 | 83.1    | 80.9       | 79.8 | 87.4               | 86.5               | 85.8          |
| wheat     | 63.4 | 79.4    | 85.5       | 72.9 | 86.6               | 86.8               | 82.4          |
| corn      | 45.2 | 62.2    | 87.7       | 71.4 | 87.5               | 87.8               | 84.6          |
| microavg. | 72.3 | 79.9    | 79.4       | 82.6 | 86.7               | 87.5               | 86.4          |
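The microavg. row pools the binary decisions over all (document, category) pairs before scoring, rather than averaging per-category scores. As a minimal sketch of the distinction, assuming scikit-learn's `f1_score` and made-up decisions for two categories:

```python
# Sketch: microaveraged F1 pools per-(document, category) decisions across
# categories before computing F1. The decisions below are illustrative.
from sklearn.metrics import f1_score

# Binary decisions from two one-vs-rest classifiers over the same documents.
cat_a_true, cat_a_pred = [1, 1, 0, 0], [1, 0, 0, 0]
cat_b_true, cat_b_pred = [0, 1, 1, 0], [0, 1, 1, 1]

micro = f1_score(cat_a_true + cat_b_true, cat_a_pred + cat_b_pred)
macro = (f1_score(cat_a_true, cat_a_pred) + f1_score(cat_b_true, cat_b_pred)) / 2
print(micro, macro)  # 0.75 vs. 0.733...: pooling weights frequent decisions more
```

Because it pools decisions, microaveraging is dominated by large categories such as earn and acq.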
We presented results in Section 13.6 showing that
an SVM is a very effective text classifier.
The results of Dumais et al. (1998) given in Table 13.9
show SVMs clearly performing the best.
This was one of several pieces of
work from this time that established the strong reputation of SVMs for text
classification. Another pioneering work on scaling and evaluating SVMs for
text classification was (Joachims, 1998). We present some of his results from (Joachims, 2002a) in Table 15.2. Joachims used a large number of term
features
in contrast to Dumais et al. (1998),
who used MI feature selection (Section 13.5.1)
to build classifiers with a much more limited number of features.
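Such a selection step is easy to approximate with current libraries. A minimal sketch in the spirit of Dumais et al. (1998), assuming scikit-learn's `SelectKBest` with the `mutual_info_classif` scorer; the documents, labels, and cutoff `k` are toy assumptions, not the paper's setup:

```python
# Sketch: keep only the k terms with highest mutual information with the class.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

docs = ["wheat prices rose sharply",
        "corn futures fell",
        "bank raises interest rates",
        "central bank rate decision"]
labels = [1, 1, 0, 0]  # illustrative: grain-related vs. not

# Binary term presence, as is common for MI-style selection.
X = CountVectorizer(binary=True).fit_transform(docs)

selector = SelectKBest(mutual_info_classif, k=3)
X_reduced = selector.fit_transform(X, labels)
print(X_reduced.shape)  # (4, 3): four documents, three surviving terms
```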
The success of the linear SVM mirrors the results discussed
in Section 14.6
on other linear approaches like Naive Bayes. It seems that
working with simple term features can get one a long way.
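To illustrate how little machinery this requires, here is a minimal sketch of a linear SVM over plain tf-idf term features, assuming scikit-learn's `TfidfVectorizer` and `LinearSVC` with toy training data (Joachims' own experiments used his SVMlight system):

```python
# Sketch: a linear SVM text classifier over simple term features.
# The documents and labels are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = ["grain exports up", "wheat harvest strong",
              "bank cuts rates", "interest rates rise"]
train_labels = ["grain", "grain", "interest", "interest"]

# One weight per term; no feature selection beyond what tf-idf provides.
clf = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))
clf.fit(train_docs, train_labels)

print(clf.predict(["corn and wheat prices"]))  # likely ['grain'], via "wheat"
```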
It is again noticeable how much results for the same machine learning
methods differ across papers.
In particular, based on replications by other researchers, the Naive Bayes
results of (Joachims, 1998) appear too weak, and
the results in Table 13.9 should be taken as representative.