

Low-rank approximations

We next state a matrix approximation problem that at first seems to have little to do with information retrieval. We describe a solution to this matrix problem using singular-value decompositions, then develop its application to information retrieval.

Given an $M\times N$ matrix $C$ and a positive integer $k$, we wish to find an $M\times N$ matrix $C_k$ of rank at most $k$, so as to minimize the Frobenius norm of the matrix difference $X=C-C_k$, defined to be

\begin{displaymath}
\Vert X \Vert _F = \sqrt{\sum_{i=1}^M \sum_{j=1}^N X_{ij}^2}.
\end{displaymath} (238)

Thus, the Frobenius norm of $X$ measures the discrepancy between $C_k$ and $C$; our goal is to find a matrix $C_k$ that minimizes this discrepancy, while constraining $C_k$ to have rank at most $k$. If $r$ is the rank of $C$, clearly $C_r=C$ and the Frobenius norm of the discrepancy is zero in this case. When $k$ is far smaller than $r$, we refer to $C_k$ as a low-rank approximation.
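For concreteness, here is a minimal Python sketch (using NumPy; the $2\times 2$ matrix below is an arbitrary example, not drawn from the text) evaluating (238) directly and via NumPy's built-in norm:

\begin{verbatim}
import numpy as np

# An arbitrary 2x2 example matrix.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Frobenius norm per (238): square root of the sum of squared entries.
print(np.sqrt(np.sum(X ** 2)))    # 5.4772...
print(np.linalg.norm(X, 'fro'))   # same value, via NumPy's built-in norm
\end{verbatim}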

The singular value decomposition can be used to solve the low-rank matrix approximation problem; from this solution we then derive an application to approximating term-document matrices. We invoke the following three-step procedure to this end:

  1. Given $C$, construct its SVD in the form shown in (232); thus, $C=U\Sigma V^T$.
  2. Derive from $\Sigma$ the matrix $\Sigma_k$ formed by replacing by zeros the $r-k$ smallest singular values on the diagonal of $\Sigma$.
  3. Compute and output $C_k=U\Sigma_k V^T$ as the rank-$k$ approximation to $C$.
The rank of $C_k$ is at most $k$: this follows from the fact that $\Sigma_k$ has at most $k$ non-zero values. Next, we recall the intuition of Example 18.1: the effect of small eigenvalues on matrix products is small. Thus, it seems plausible that replacing the small singular values by zeros will not substantially alter the product, leaving it ``close'' to $C$. The following theorem, due to Eckart and Young, tells us that, in fact, this procedure yields the matrix of rank $k$ with the lowest possible Frobenius error.
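The three-step procedure translates directly into a few lines of code. The sketch below uses NumPy's numpy.linalg.svd; the function name low_rank_approx is ours, for illustration:

\begin{verbatim}
import numpy as np

def low_rank_approx(C, k):
    # Step 1: construct the SVD, C = U diag(s) V^T;
    # the singular values in s are sorted in non-increasing order.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Step 2: replace the r - k smallest singular values by zeros.
    s_k = s.copy()
    s_k[k:] = 0.0
    # Step 3: recompose; the product has rank at most k.
    return U @ np.diag(s_k) @ Vt
\end{verbatim}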

Theorem.

\begin{displaymath}
\min_{Z \vert \mbox{ rank}(Z)=k} \Vert C-Z\Vert _F = \Vert C-C_k\Vert _F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}.
\end{displaymath} (239)

End theorem.

Recalling that the singular values are in non-increasing order $\sigma_1\geq \sigma_2 \geq \cdots$, we learn from Theorem 18.3 that $C_k$ is the best rank-$k$ approximation to $C$, incurring an error (measured by the Frobenius norm of $C-C_k$) of $\sqrt{\sum_{i=k+1}^{r} \sigma_i^2}$, determined entirely by the singular values that were zeroed out. Thus the larger $k$ is, the smaller this error; in particular, for $k=r$ the sum is empty and the error is zero, since $\Sigma_r=\Sigma$ and thus $C_r=C$.
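As a quick numerical check of Theorem 18.3 (a sketch with a small random matrix standing in for a term-document matrix; any $C$ would do):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((6, 4))   # stand-in for a small term-document matrix

U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
C_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius error equals the root of the sum of the squared
# singular values that were zeroed out; the two printed values agree.
print(np.linalg.norm(C - C_k, 'fro'))
print(np.sqrt(np.sum(s[k:] ** 2)))
\end{verbatim}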

[Figure: the matrix entries affected by ``zeroing out'' the smallest singular values.]

To derive further insight into why the process of truncating the smallest $r-k$ singular values in $\Sigma$ helps generate a rank-$k$ approximation of low error, we examine the form of $C_k$:

\begin{eqnarray*}
C_k & = & U\Sigma_k V^T \quad (240) \\
    & = & U \left(
          \begin{array}{ccccc}
          \sigma_1 & 0 & 0 & 0 & 0 \\
          0 & \ddots & 0 & 0 & 0 \\
          0 & 0 & \sigma_k & 0 & 0 \\
          0 & 0 & 0 & 0 & 0 \\
          0 & 0 & 0 & 0 & \ddots \\
          \end{array}
          \right) V^T \quad (241) \\
    & = & \sum_{i=1}^k \sigma_i \vec{u}_i \vec{v}_i^T, \quad (242)
\end{eqnarray*}

where $\vec{u}_i$ and $\vec{v}_i$ are the $i$th columns of $U$ and $V$, respectively. Thus, $\vec{u}_i \vec{v}_i^T$ is a rank-1 matrix, so that we have just expressed $C_k$ as the sum of $k$ rank-1 matrices, each weighted by a singular value. Because the singular values are non-increasing, the contribution of the rank-1 matrix $\vec{u}_i \vec{v}_i^T$ shrinks as $i$ increases.
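Equation (242) can likewise be checked numerically; the sketch below (again with an arbitrary random matrix) accumulates the $k$ singular-value-weighted outer products and compares the result with the truncated SVD of (241):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
C = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(C, full_matrices=False)

k = 2
# Sum of k rank-1 outer products u_i v_i^T, each weighted by sigma_i.
C_k = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

# Matches zeroing the smallest singular values, as in (241).
assert np.allclose(C_k, U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :])
\end{verbatim}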
