We presented results in Section 13.6 showing that an SVM is a very effective text classifier. The results of Dumais et al. (1998) given in Table 13.9 show SVMs clearly performing the best. This was one of several pieces of work from this time that established the strong reputation of SVMs for text classification. Another pioneering work on scaling and evaluating SVMs for text classification was (Joachims, 1998). We present some of his results from (Joachims, 2002a) in Table 15.2 .Joachims used a large number of term features in contrast to Dumais et al. (1998), who used MI feature selection (Section 13.5.1 , page 13.5.1 ) to build classifiers with a much more limited number of features. The success of the linear SVM mirrors the results discussed in Section 14.6 (page ) on other linear approaches like Naive Bayes. It seems that working with simple term features can get one a long way. It is again noticeable the extent to which different papers' results for the same machine learning methods differ. In particular, based on replications by other researchers, the Naive Bayes results of (Joachims, 1998) appear too weak, and the results in Table 13.9 should be taken as representative.