

Low-rank approximations

We next state a matrix approximation problem that at first seems to have little to do with information retrieval. We describe a solution to this matrix problem using singular-value decompositions, then develop its application to information retrieval.

Given an $M\times N$ matrix $C$ and a positive integer $k$, we wish to find an $M\times N$ matrix $C_k$ of rank at most $k$, so as to minimize the Frobenius norm of the matrix difference $X=C-C_k$, defined to be

\begin{displaymath}
\Vert X \Vert _F = \sqrt{\sum_{i=1}^M \sum_{j=1}^N X_{ij}^2}.
\end{displaymath} (238)

Thus, the Frobenius norm of $X$ measures the discrepancy between $C_k$ and $C$; our goal is to find a matrix $C_k$ that minimizes this discrepancy, while constraining $C_k$ to have rank at most $k$. If $r$ is the rank of $C$, clearly $C_r=C$ and the Frobenius norm of the discrepancy is zero in this case. When $k$ is far smaller than $r$, we refer to $C_k$ as a low-rank approximation.
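For concreteness, here is a minimal Python sketch (using NumPy; the $2\times 2$ matrix below is an arbitrary example, not drawn from the text) evaluating (238) directly and via NumPy's built-in norm:

\begin{verbatim}
import numpy as np

# An arbitrary 2x2 example matrix.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Frobenius norm per (238): square root of the sum of squared entries.
print(np.sqrt(np.sum(X ** 2)))    # 5.4772...
print(np.linalg.norm(X, 'fro'))   # same value, via NumPy's built-in norm
\end{verbatim}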

The singular value decomposition can be used to solve the low-rank matrix approximation problem; from this solution we then derive an application to approximating term-document matrices. We invoke the following three-step procedure to this end:

  1. Given $C$, construct its SVD in the form shown in (232); thus, $C=U\Sigma V^T$.
  2. Derive from $\Sigma$ the matrix $\Sigma_k$ formed by replacing by zeros the $r-k$ smallest singular values on the diagonal of $\Sigma$.
  3. Compute and output $C_k=U\Sigma_k V^T$ as the rank-$k$ approximation to $C$.
The rank of $C_k$ is at most $k$: this follows from the fact that $\Sigma_k$ has at most $k$ non-zero values. Next, we recall the intuition of Example 18.1: the effect of small eigenvalues on matrix products is small. Thus, it seems plausible that replacing the small singular values by zeros will not substantially alter the product, leaving it ``close'' to $C$. The following theorem, due to Eckart and Young, tells us that, in fact, this procedure yields the matrix of rank $k$ with the lowest possible Frobenius error.
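The three-step procedure translates directly into a few lines of code. The sketch below uses NumPy's numpy.linalg.svd; the function name low_rank_approx is ours, for illustration:

\begin{verbatim}
import numpy as np

def low_rank_approx(C, k):
    # Step 1: construct the SVD, C = U diag(s) V^T;
    # the singular values in s are sorted in non-increasing order.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Step 2: replace the r - k smallest singular values by zeros.
    s_k = s.copy()
    s_k[k:] = 0.0
    # Step 3: recompose; the product has rank at most k.
    return U @ np.diag(s_k) @ Vt
\end{verbatim}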

Theorem.

\begin{displaymath}
\min_{Z \vert \mbox{ rank}(Z)=k} \Vert C-Z\Vert _F = \Vert C-C_k\Vert _F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}.
\end{displaymath} (239)

End theorem.

Recalling that the singular values are in non-increasing order $\sigma_1\geq \sigma_2 \geq \cdots$, we learn from Theorem 18.3 that $C_k$ is the best rank-$k$ approximation to $C$, incurring an error (measured by the Frobenius norm of $C-C_k$) of $\sqrt{\sum_{i=k+1}^{r} \sigma_i^2}$, determined entirely by the singular values that were zeroed out. Thus the larger $k$ is, the smaller this error; in particular, for $k=r$ the sum is empty and the error is zero, since $\Sigma_r=\Sigma$ and thus $C_r=C$.
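As a quick numerical check of Theorem 18.3 (a sketch with a small random matrix standing in for a term-document matrix; any $C$ would do):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((6, 4))   # stand-in for a small term-document matrix

U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
C_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius error equals the root of the sum of the squared
# singular values that were zeroed out; the two printed values agree.
print(np.linalg.norm(C - C_k, 'fro'))
print(np.sqrt(np.sum(s[k:] ** 2)))
\end{verbatim}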

[Figure: the matrix entries affected by ``zeroing out'' the smallest singular values.]

To derive further insight into why the process of truncating the smallest $r-k$ singular values in $\Sigma$ helps generate a rank-$k$ approximation of low error, we examine the form of $C_k$:

\begin{eqnarray*}
C_k & = & U\Sigma_k V^T \quad (240) \\
    & = & U \left(
          \begin{array}{ccccc}
          \sigma_1 & 0 & 0 & 0 & 0 \\
          0 & \ddots & 0 & 0 & 0 \\
          0 & 0 & \sigma_k & 0 & 0 \\
          0 & 0 & 0 & 0 & 0 \\
          0 & 0 & 0 & 0 & \ddots \\
          \end{array}
          \right) V^T \quad (241) \\
    & = & \sum_{i=1}^k \sigma_i \vec{u}_i \vec{v}_i^T, \quad (242)
\end{eqnarray*}

where $\vec{u}_i$ and $\vec{v}_i$ are the $i$th columns of $U$ and $V$, respectively. Thus, $\vec{u}_i \vec{v}_i^T$ is a rank-1 matrix, so that we have just expressed $C_k$ as the sum of $k$ rank-1 matrices, each weighted by a singular value. Because the singular values are non-increasing, the contribution of the rank-1 matrix $\vec{u}_i \vec{v}_i^T$ shrinks as $i$ increases.
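Equation (242) can likewise be checked numerically; the sketch below (again with an arbitrary random matrix) accumulates the $k$ singular-value-weighted outer products and compares the result with the truncated SVD of (241):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
C = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(C, full_matrices=False)

k = 2
# Sum of k rank-1 outer products u_i v_i^T, each weighted by sigma_i.
C_k = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

# Matches zeroing the smallest singular values, as in (241).
assert np.allclose(C_k, U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :])
\end{verbatim}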
