Exercises

$\begin{figure} % latex2html id marker 20737 \psset{unit=0.4cm} \begin{pspicture}... ...2 \ \ 2)^T$, $\vec{b} = (4 \ \ 4)^T$, and $\vec{c} = (8 \ \ 6)^T$.} \end{figure}$

Exercises.

In Figure 14.13 , which of the three vectors $\vec{a}$ , $\vec{b}$ , and $\vec{c}$ is (i) most similar to $\vec{x}$ according to dot product similarity, (ii) most similar to $\vec{x}$ according to cosine similarity, (iii) closest to $\vec{x}$ according to Euclidean distance?

Download Reuters-21578 and train and test Rocchio and kNN classifiers for the classes acquisitions, corn, crude, earn, grain, interest, money-fx, ship, trade, and wheat. Use the ModApte split. You may want to use one of a number of software packages that implement Rocchio classification and kNN classification, for example, the Bow toolkit (McCallum, 1996).

Download 20 Newgroups (page 8.2 ) and train and test Rocchio and kNN classifiers for its 20 classes.

Show that the decision boundaries in Rocchio classification are, as in kNN, given by the Voronoi tessellation.

Computing the distance between a dense centroid and a sparse vector is $\Theta(M)$ for a naive implementation that iterates over all

dimensions. Based on the equality $\sum (x_i-\mu_i)^2 = 1.0+\sum \mu_i^2 - 2\sum x_i \mu_i$ and assuming that $\sum \mu_i^2$ has been precomputed, write down an algorithm that is $\Theta( M_{a})$ instead, where $M_{a}$ is the number of distinct terms in the test document.

Prove that the region of the plane consisting of all points with the same

nearest neighbors is a convex polygon.

Design an algorithm that performs an efficient 1NN search in 1 dimension (where efficiency is with respect to the number of documents

). What is the time complexity of the algorithm?

Design an algorithm that performs an efficient 1NN search in 2 dimensions with at most polynomial (in

) preprocessing time.

Can one design an exact efficient algorithm for 1NN for very large

along the ideas you used to solve the last exercise?

Show that Equation 145 defines a hyperplane with $\vec{w}= \vec{\mu}(c_1)-\vec{\mu}(c_2)$ and $b=0.5*(\vert\vec{\mu}(c_1) \vert^2-\vert\vec{\mu}(c_2) \vert^2)$ .

**Figure 14.14:** A simple non-separable set of points.

We can easily construct non-separable data sets in high dimensions by embedding a non-separable set like the one shown in Figure 14.14 . Consider embedding Figure 14.14 in 3D and then perturbing the 4 points slightly (i.e., moving them a small distance in a random direction). Why would you expect the resulting configuration to be linearly separable? How likely is then a non-separable set of $m \ll M$ points in

-dimensional space?

Assuming two classes, show that the percentage of non-separable assignments of the vertices of a hypercube decreases with dimensionality

for

. For example, for

the proportion of non-separable assignments is 0, for

, it is

. One of the two non-separable cases for

is shown in Figure 14.14 , the other is its mirror image. Solve the exercise either analytically or by simulation.

Although we point out the similarities of Naive Bayes with linear vector space classifiers, it does not make sense to represent count vectors (the document representations in NB) in a continuous vector space. There is however a formalization of NB that is analogous to Rocchio. Show that NB assigns a document to the class (represented as a parameter vector) whose Kullback-Leibler (KL) divergence (Section 12.4 , page 12.4 ) to the document (represented as a count vector as in Section 13.4.1 (page

), normalized to sum to 1) is smallest.

Support vector machines and

Up:

Vector space classification

References and further reading

© 2008 Cambridge University Press
This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.
2009-04-07