SVMs are inherently two-class
classifiers. The traditional way to do
multiclass classification with SVMs is to use one of the methods discussed in
Section 14.5 (page 14.5 ).
In particular, the most common technique in practice has been to build
one-versus-rest classifiers (commonly referred to as
``one-versus-all'' or OVA classification), and to choose the class
which classifies the test datum with greatest margin. Another
strategy is to build a set of one-versus-one classifiers, and to
choose the class that is selected by the most classifiers. While this
involves building
classifiers, the time for
training classifiers may actually decrease, since the
training data set for each classifier is much smaller.
However, these are not very elegant approaches to solving multiclass
problems.
A better alternative is provided by the construction of multiclass SVMs,
where we build a two-class classifier over a feature vector
derived
from the pair consisting of the input features and the class of the datum.
At test time, the classifier chooses the class
. The margin
during training is the gap between this value for the
correct class and for the nearest other class, and so the quadratic
program formulation will require that
.
This general method can be extended to give a multiclass formulation of
various kinds of linear classifiers. It is also a simple instance
of a generalization of classification where the classes are not just
a set of independent, categorical labels, but may be arbitrary
structured objects with relationships defined between them. In the
SVM world, such work comes under the label of structural SVMs .
We mention them again in Section 15.4.2 .