Unlike Rocchio, nearest neighbor or kNN classification determines the decision boundary locally. For 1NN we assign each document to the class of its closest neighbor. For kNN we assign each document to the majority class of its $k$ closest neighbors, where $k$ is a parameter. The rationale of kNN classification is that, based on the contiguity hypothesis, we expect a test document $d$ to have the same label as the training documents located in the local region surrounding $d$.
Decision boundaries in 1NN are concatenated segments of the Voronoi tessellation as shown in Figure 14.6. The Voronoi tessellation of a set of objects decomposes space into Voronoi cells, where each object's cell consists of all points that are closer to the object than to other objects. In our case, the objects are documents. The Voronoi tessellation then partitions the plane into as many convex polygons as there are documents, each containing its corresponding document (and no other) as shown in Figure 14.6, where a convex polygon is a convex region in 2-dimensional space bounded by lines.
For general $k$ in kNN, consider the region in the space for which the set of $k$ nearest neighbors is the same. This again is a convex polygon and the space is partitioned into convex polygons, within each of which the set of $k$ nearest neighbors is invariant (Exercise 14.8).
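For readers who want to see a concrete tessellation, the following Python sketch computes the Voronoi cells of a few hypothetical 2-dimensional document vectors; the language, the scipy library, and the sample points are choices of this illustration, not of the text. Finding a test point's nearest document is the same as finding the Voronoi cell it falls into.

    # Minimal sketch: Voronoi tessellation of hypothetical 2-D document vectors.
    import numpy as np
    from scipy.spatial import Voronoi

    docs = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0], [3.0, 2.5]])  # hypothetical documents
    vor = Voronoi(docs)
    print(vor.vertices)       # corners of the Voronoi cells
    print(vor.point_region)   # which cell belongs to which document

    # 1NN classification of a test point is equivalent to locating its cell:
    test = np.array([1.1, 0.4])
    nearest = int(np.argmin(np.linalg.norm(docs - test, axis=1)))
    print("test point lies in the Voronoi cell of document", nearest)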
1NN is not very robust. The classification decision of each test document relies on the class of a single training document, which may be incorrectly labeled or atypical. kNN for $k > 1$ is more robust. It assigns documents to the majority class of their $k$ closest neighbors, with ties broken randomly.
There is a probabilistic version of this kNN classification algorithm. We can estimate the probability of membership of a document $d$ in class $c$ as the proportion of the $k$ nearest neighbors in $c$. Figure 14.6 gives an example for $k=3$. Probability estimates for class membership of the star are $\hat{P}(\text{circle class}|\text{star}) = 1/3$, $\hat{P}(\text{X class}|\text{star}) = 2/3$, and $\hat{P}(\text{diamond class}|\text{star}) = 0$. The 3NN estimate ($\hat{P}(\text{circle class}|\text{star}) = 1/3$) and the 1NN estimate ($\hat{P}(\text{circle class}|\text{star}) = 1$) differ, with 3NN preferring the X class and 1NN preferring the circle class.
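The estimate is simply a ratio of neighbor counts. A minimal Python sketch makes the computation explicit; the function name and the 2-dimensional points below are illustrative stand-ins loosely patterned after Figure 14.6, not data from the text.

    # Minimal sketch: P(c|d) estimated as the fraction of the k nearest
    # training documents that belong to class c. Toy data is hypothetical.
    import numpy as np
    from collections import Counter

    def knn_class_probabilities(X_train, y_train, x_test, k):
        """Return {class: fraction of the k nearest neighbors in that class}."""
        distances = np.linalg.norm(X_train - x_test, axis=1)  # Euclidean distances
        nearest = np.argsort(distances)[:k]                   # indices of the k nearest docs
        counts = Counter(y_train[i] for i in nearest)
        return {c: counts.get(c, 0) / k for c in set(y_train)}

    X_train = np.array([[1.0, 1.0], [1.2, 0.9], [0.0, 0.2], [3.0, 3.0]])
    y_train = ["X", "X", "circle", "diamond"]
    x_test  = np.array([0.3, 0.4])
    print(knn_class_probabilities(X_train, y_train, x_test, k=3))
    print(knn_class_probabilities(X_train, y_train, x_test, k=1))

With these hypothetical points, the $k=3$ estimates prefer the X class while the $k=1$ estimate prefers the circle class, mirroring the contrast described above.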
The parameter $k$ in kNN is often chosen based on experience or knowledge about the classification problem at hand. It is desirable for $k$ to be odd to make ties less likely. $k = 3$ and $k = 5$ are common choices, but much larger values between 50 and 100 are also used. An alternative way of setting the parameter is to select the $k$ that gives the best results on a held-out portion of the training set.
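The held-out procedure can be sketched in a few lines of Python. The helper names (knn_predict, select_k) and the candidate set of odd values are this illustration's own assumptions.

    # Minimal sketch: pick the k with the best accuracy on a held-out split.
    import numpy as np

    def knn_predict(X_train, y_train, x, k):
        nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
        labels, counts = np.unique([y_train[i] for i in nearest], return_counts=True)
        return labels[np.argmax(counts)]       # majority class (ties broken arbitrarily)

    def select_k(X_train, y_train, X_heldout, y_heldout, candidates=(1, 3, 5, 51, 101)):
        def accuracy(k):
            preds = [knn_predict(X_train, y_train, x, k) for x in X_heldout]
            return np.mean([p == y for p, y in zip(preds, y_heldout)])
        return max(candidates, key=accuracy)   # k with the highest held-out accuracy

    # Hypothetical tiny split purely for illustration:
    X_tr = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
    y_tr = ["a", "a", "b", "b"]
    X_ho = np.array([[0.0, 0.8], [0.8, 0.0]])
    y_ho = ["a", "b"]
    print(select_k(X_tr, y_tr, X_ho, y_ho, candidates=(1, 3)))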
We can also weight the ``votes'' of the $k$ nearest
neighbors by their cosine similarity. In this scheme, a class's
score is computed as:
\[ \text{score}(c, d) = \sum_{d' \in S_k(d)} I_c(d') \cos(\vec{v}(d'), \vec{v}(d)) \qquad (143) \]
where $S_k(d)$ is the set of $d$'s $k$ nearest neighbors and $I_c(d') = 1$ iff $d'$ is in class $c$ and $0$ otherwise. We then assign the document to the class with the highest score.
Figure 14.7 summarizes the kNN algorithm.
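A compact Python sketch of kNN classification with the cosine-weighted voting of Equation (143) follows; the function names are illustrative, not taken from Figure 14.7, and document vectors are assumed to be dense arrays.

    # Minimal sketch of Equation (143): each of the k nearest neighbors d'
    # adds cos(v(d'), v(d)) to the score of its own class; the document is
    # assigned to the class with the highest score.
    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def knn_weighted_classify(X_train, y_train, x, k):
        sims = np.array([cosine(d_prime, x) for d_prime in X_train])
        s_k = np.argsort(-sims)[:k]            # S_k(d): the k most similar training docs
        scores = {}
        for i in s_k:
            # I_c(d') = 1 only for d''s own class, so each neighbor contributes
            # its cosine similarity to exactly one class score.
            scores[y_train[i]] = scores.get(y_train[i], 0.0) + sims[i]
        return max(scores, key=scores.get)     # class with the highest score

Within the $k$-neighborhood, more similar neighbors receive larger votes than less similar ones, rather than all counting equally.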
Worked example. The distances of the test document $\vec{d}_5$ from the four training documents in Table 14.1 are $|\vec{d}_1 - \vec{d}_5| = |\vec{d}_2 - \vec{d}_5| = |\vec{d}_3 - \vec{d}_5| \approx 1.41$ and $|\vec{d}_4 - \vec{d}_5| = 0.0$. $d_5$'s nearest neighbor is therefore $d_4$, and 1NN assigns $d_5$ to $d_4$'s class, $\overline{c}$. End worked example.
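As a check on the mechanics only (the vectors below are hypothetical unit-length stand-ins, not the actual tf-idf weights of Table 14.1), the same 1NN computation can be run as follows.

    # Minimal sketch of the 1NN step in the worked example: Euclidean distances
    # from the test vector to each training vector, then assignment to the
    # class of the single closest training document. Data is hypothetical.
    import numpy as np

    train_vectors = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])
    train_classes = ["c", "c", "not-c", "not-c"]
    test_vector   = np.array([0.7, 0.71])

    distances = np.linalg.norm(train_vectors - test_vector, axis=1)
    nearest = int(np.argmin(distances))        # index of the closest training document
    print(distances.round(2), "->", train_classes[nearest])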