Classification for classes that are not mutually exclusive is called any-of, multilabel, or multivalue classification. In this case, a document can belong to several classes simultaneously, or to a single class, or to none of the classes. A decision on one class leaves all options open for the others. It is sometimes said that the classes are independent of each other, but this is misleading since the classes are rarely statistically independent in the sense defined in Section 13.5.2. In terms of the formal definition of the classification problem in Equation 112, we learn $J$ different classifiers $\gamma_j$ in any-of classification, each returning either $c_j$ or its complement $\overline{c_j}$: $\gamma_j(d) \in \{c_j, \overline{c_j}\}$.
Solving an any-of classification task with linear classifiers is straightforward (a code sketch follows the list):

1. Build a classifier for each class, where the training set consists of the set of documents in the class (positive labels) and its complement (negative labels).
2. Given the test document, apply each classifier separately. The decision of one classifier has no influence on the decisions of the other classifiers.
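The following minimal sketch illustrates the independence of the per-class decisions. The class names, weight vectors, and thresholds are invented for illustration, not trained on real data; each binary decision $\vec{w}_j \cdot \vec{d} > b_j$ is made separately, so a document can receive several labels, one label, or none.

```python
import numpy as np

# Toy per-class linear classifiers: one weight vector w_j and one
# threshold b_j per class.  These values are illustrative only.
classes = ["wheat", "corn", "grain"]
W = np.array([[ 2.0, -1.0, 0.5],   # weights for "wheat"
              [-0.5,  1.5, 0.2],   # weights for "corn"
              [ 1.0,  1.0, 1.0]])  # weights for "grain"
b = np.array([0.5, 0.3, 2.0])      # per-class thresholds

def any_of_classify(d):
    """Apply each binary classifier separately: assign class c_j
    iff w_j . d > b_j.  No decision influences any other, so the
    result may contain zero, one, or several classes."""
    scores = W @ d
    return [c for c, s, t in zip(classes, scores, b) if s > t]

doc = np.array([1.0, 1.0, 0.0])    # toy document vector
print(any_of_classify(doc))        # -> ['wheat', 'corn'] here
```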
The second type of classification with more than two classes is one-of classification. Here, the classes are mutually exclusive. Each document must belong to exactly one of the classes. One-of classification is also called multinomial, polytomous, multiclass, or single-label classification. Formally, there is a single classification function $\gamma$ in one-of classification whose range is $\mathbb{C}$, i.e., $\gamma(d) \in \{c_1, \ldots, c_J\}$. kNN is a (nonlinear) one-of classifier.
True one-of problems are less common in text classification than any-of problems. With classes like UK, China, poultry, or coffee, a document can be relevant to many topics simultaneously, as when the prime minister of the UK visits China to talk about the coffee and poultry trade.
Nevertheless, we will often make a one-of assumption, as we did in Figure 14.1, even if classes are not really mutually exclusive. For the classification problem of identifying the language of a document, the one-of assumption is a good approximation as most text is written in only one language. In such cases, imposing a one-of constraint can increase the classifier's effectiveness because it eliminates the errors that arise when any-of classifiers assign a document to no class or to more than one class.
$J$ hyperplanes do not divide $\mathbb{R}^{|V|}$ into $J$ distinct regions as illustrated in Figure 14.12. Thus, we must use a combination method when using two-class linear classifiers for one-of classification. The simplest method is to rank classes and then select the top-ranked class. Geometrically, the ranking can be with respect to the distances from the $J$ linear separators. Documents close to a class's separator are more likely to be misclassified, so the greater the distance from the separator, the more plausible it is that a positive classification decision is correct. Alternatively, we can use a direct measure of confidence to rank classes, e.g., probability of class membership. We can state this algorithm for one-of classification with linear classifiers as follows (a code sketch follows the list):

1. Build a classifier for each class, where the training set consists of the set of documents in the class (positive labels) and its complement (negative labels).
2. Given the test document, apply each classifier separately.
3. Assign the document to the class with the maximum score, the maximum confidence value, or the maximum probability.
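Continuing the toy setup from the any-of sketch above, this sketch implements the ranking geometrically: the signed distance of $\vec{d}$ to the hyperplane $\vec{w}_j \cdot \vec{x} = b_j$ is $(\vec{w}_j \cdot \vec{d} - b_j)/|\vec{w}_j|$, and the document is assigned to the class whose separator it is farthest from on the positive side. The weights and thresholds are again illustrative assumptions, not trained values.

```python
import numpy as np

classes = ["wheat", "corn", "grain"]
W = np.array([[ 2.0, -1.0, 0.5],
              [-0.5,  1.5, 0.2],
              [ 1.0,  1.0, 1.0]])
b = np.array([0.5, 0.3, 2.0])

def one_of_classify(d):
    """Rank classes by signed distance from each class's separating
    hyperplane and return only the top-ranked class, so that every
    document receives exactly one label."""
    margins = (W @ d - b) / np.linalg.norm(W, axis=1)
    return classes[int(np.argmax(margins))]

doc = np.array([1.0, 1.0, 0.0])
print(one_of_classify(doc))        # -> 'corn' for these toy values
```

Ranking by a direct confidence measure, the alternative in step 3, would simply replace the margin computation with each classifier's estimated probability of class membership.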
| true class \ assigned class | money-fx | trade | interest | wheat | corn | grain |
|-----------------------------|----------|-------|----------|-------|------|-------|
| money-fx                    | 95       | 0     | 10       | 0     | 0    | 0     |
| trade                       | 1        | 1     | 90       | 0     | 1    | 0     |
| interest                    | 13       | 0     | 0        | 0     | 0    | 0     |
| wheat                       | 0        | 0     | 1        | 34    | 3    | 7     |
| corn                        | 1        | 0     | 2        | 13    | 26   | 5     |
| grain                       | 0        | 0     | 2        | 14    | 5    | 10    |
An important tool for analyzing the performance of a classifier for $J$ classes is the confusion matrix. The confusion matrix shows for each pair of classes $\langle c_1, c_2 \rangle$ how many documents from $c_1$ were incorrectly assigned to $c_2$. In Table 14.5, the classifier manages to distinguish the three financial classes money-fx, trade, and interest from the three agricultural classes wheat, corn, and grain, but makes many errors within these two groups. The confusion matrix can help pinpoint opportunities for improving the accuracy of the system. For example, to address the second largest error in Table 14.5 (14 in the row grain), one could attempt to introduce features that distinguish wheat documents from grain documents.
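A confusion matrix is easy to accumulate from paired true/assigned labels. In this sketch the label lists are fabricated for illustration; entry $(i, j)$ counts documents whose true class is classes[i] but which the classifier assigned to classes[j].

```python
import numpy as np

classes = ["money-fx", "trade", "interest", "wheat", "corn", "grain"]
index = {c: i for i, c in enumerate(classes)}

# Fabricated example labels; in practice these come from running the
# classifier on a labeled test set.
true_labels     = ["wheat", "grain", "grain", "corn", "money-fx"]
assigned_labels = ["wheat", "wheat", "grain", "wheat", "interest"]

confusion = np.zeros((len(classes), len(classes)), dtype=int)
for t, a in zip(true_labels, assigned_labels):
    confusion[index[t], index[a]] += 1

# Diagonal entries are correct decisions; large off-diagonal entries
# (such as grain -> wheat in Table 14.5) point to class pairs that
# would benefit from additional discriminating features.
print(confusion)
```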
Exercises.