
# Low-rank approximations

We next state a matrix approximation problem that at first seems to have little to do with information retrieval. We describe a solution to this matrix problem using singular-value decompositions, then develop its application to information retrieval.

Given an $M \times N$ matrix $C$ and a positive integer $k$, we wish to find an $M \times N$ matrix $C_k$ of rank at most $k$, so as to minimize the Frobenius norm of the matrix difference $X = C - C_k$, defined to be

$$
\|X\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} X_{ij}^2} \qquad (238)
$$

Thus, the Frobenius norm of $X$ measures the discrepancy between $C_k$ and $C$; our goal is to find a matrix $C_k$ that minimizes this discrepancy, while constraining $C_k$ to have rank at most $k$. If $r$ is the rank of $C$, clearly $C_r = C$ and the Frobenius norm of the discrepancy is zero in this case. When $k$ is far smaller than $r$, we refer to $C_k$ as a *low-rank approximation*.
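
As a quick illustration of the definition in (238), the sketch below computes the Frobenius norm of a small example matrix (our own choice, purely illustrative) and checks it against NumPy's built-in routine.

```python
import numpy as np

# A small example matrix X (an illustrative placeholder).
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])

# Frobenius norm from the definition in (238): the square root of
# the sum of the squares of all entries of X.
frob_from_definition = np.sqrt((X ** 2).sum())

# NumPy's built-in Frobenius norm, for comparison.
frob_builtin = np.linalg.norm(X, "fro")

assert np.isclose(frob_from_definition, frob_builtin)  # both are sqrt(15)
```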

The singular value decomposition can be used to solve the low-rank matrix approximation problem. We then derive from it an application to approximating term-document matrices. We invoke the following three-step procedure to this end:

1. Given $C$, construct its SVD in the form shown in (232); thus, $C = U \Sigma V^T$.
2. Derive from $\Sigma$ the matrix $\Sigma_k$ formed by replacing by zeros the $r - k$ smallest singular values on the diagonal of $\Sigma$.
3. Compute and output $C_k = U \Sigma_k V^T$ as the rank-$k$ approximation to $C$.
The rank of $C_k$ is at most $k$: this follows from the fact that $\Sigma_k$ has at most $k$ non-zero values. Next, we recall the intuition of Example 18.1: the effect of small eigenvalues on matrix products is small. Thus, it seems plausible that replacing these small eigenvalues by zero will not substantially alter the product, leaving it "close" to $C$. The following theorem, due to Eckart and Young, tells us that, in fact, this procedure yields the matrix of rank $k$ with the lowest possible Frobenius error.
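
In code, the three-step procedure is short. Below is a minimal sketch, assuming NumPy; the test matrix and the choice of $k$ are illustrative placeholders.

```python
import numpy as np

def rank_k_approximation(C: np.ndarray, k: int) -> np.ndarray:
    """Return C_k, the rank-(at most k) approximation of C."""
    # Step 1: construct the SVD, C = U Sigma V^T.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Step 2: zero out all but the k largest singular values.
    s_k = s.copy()
    s_k[k:] = 0.0
    # Step 3: compute C_k = U Sigma_k V^T.
    return U @ np.diag(s_k) @ Vt

# Illustrative usage on an arbitrary 3 x 3 matrix.
C = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
C1 = rank_k_approximation(C, k=1)
print(np.linalg.matrix_rank(C1))  # prints 1
```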

Theorem.

$$
\min_{Z \,:\, \operatorname{rank}(Z) \le k} \|C - Z\|_F \;=\; \|C - C_k\|_F \;=\; \sqrt{\sum_{i=k+1}^{r} \sigma_i^2} \qquad (239)
$$

End theorem.

Recalling that the singular values are in decreasing order $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$, we learn from Theorem 18.3 that $C_k$ is the best rank-$k$ approximation to $C$, incurring an error (measured by the Frobenius norm of $C - C_k$) equal to $\sqrt{\sum_{i=k+1}^{r} \sigma_i^2}$. Thus the larger $k$ is, the smaller this error (and in particular, for $k = r$, the error is zero since $\Sigma_r = \Sigma$; provided $r < \min(M, N)$, then $\sigma_{r+1} = 0$ and thus $C_r = C$).
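
The relationship between the approximation error and the discarded singular values can be verified numerically. A minimal check, assuming NumPy and an arbitrary randomly generated matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((5, 4))  # an arbitrary full-rank 5 x 4 matrix
k = 2

U, s, Vt = np.linalg.svd(C, full_matrices=False)
s_k = s.copy()
s_k[k:] = 0.0
C_k = U @ np.diag(s_k) @ Vt

# The Frobenius error of the rank-k approximation equals the
# root-sum-of-squares of the discarded singular values, as in (239).
error = np.linalg.norm(C - C_k, "fro")
assert np.isclose(error, np.sqrt(np.sum(s[k:] ** 2)))
```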

To derive further insight into why the process of truncating the smallest $r - k$ singular values in $\Sigma$ helps generate a rank-$k$ approximation of low error, we examine the form of $C_k$:

$$
\begin{aligned}
C_k \;&=\; U \Sigma_k V^T & (240)\\
&=\; U \operatorname{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0)\, V^T & (241)\\
&=\; \sum_{i=1}^{k} \sigma_i \vec{u}_i \vec{v}_i^{\,T} & (242)
\end{aligned}
$$

where $\vec{u}_i$ and $\vec{v}_i$ are the $i$th columns of $U$ and $V$, respectively. Thus, $\vec{u}_i \vec{v}_i^{\,T}$ is a rank-1 matrix, so that we have just expressed $C_k$ as the sum of $k$ rank-1 matrices, each weighted by a singular value. As $i$ increases, the contribution of the rank-1 matrix $\vec{u}_i \vec{v}_i^{\,T}$ is weighted by a sequence of shrinking singular values $\sigma_i$.
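
The expansion in (240)-(242) can be checked directly: summing the first $k$ weighted outer products $\sigma_i \vec{u}_i \vec{v}_i^{\,T}$ reproduces $C_k$. A sketch, assuming NumPy, with an illustrative matrix:

```python
import numpy as np

C = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])
k = 1

U, s, Vt = np.linalg.svd(C, full_matrices=False)

# Accumulate the first k weighted rank-1 matrices sigma_i * u_i v_i^T.
C_k = np.zeros_like(C)
for i in range(k):
    C_k += s[i] * np.outer(U[:, i], Vt[i, :])

# This agrees with zeroing the trailing singular values in Sigma.
s_k = s.copy()
s_k[k:] = 0.0
assert np.allclose(C_k, U @ np.diag(s_k) @ Vt)
```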

Exercises.

• Compute a rank 1 approximation $C_1$ to the matrix $C$ in Example 235, using the SVD as in Exercise 236. What is the Frobenius norm of the error of this approximation?

• Consider now the computation in Exercise 18.3. Following the schematic in Figure 18.2, notice that for a rank 1 approximation we have $\Sigma_1$ being a scalar. Denote by $U_1$ the first column of $U$ and by $V_1$ the first column of $V$. Show that the rank-1 approximation to $C$ can then be written as $C_1 = U_1 \Sigma_1 V_1^T = \sigma_1 U_1 V_1^T$.

• The rank-1 decomposition of the previous exercise can be generalized to rank-$k$ approximations: we let $U'_k$ and $V'_k$ denote the "reduced" matrices formed by retaining only the first $k$ columns of $U$ and $V$, respectively. Thus $U'_k$ is an $M \times k$ matrix while $V'^{T}_k$ is a $k \times N$ matrix. Then, we have

$$
C_k = U'_k \Sigma'_k V'^{T}_k \qquad (243)
$$

where $\Sigma'_k$ is the square $k \times k$ submatrix of $\Sigma_k$ with the singular values $\sigma_1, \ldots, \sigma_k$ on the diagonal. The primary advantage of using (243) is to eliminate a lot of redundant columns of zeros in $U$ and $V$, thereby explicitly eliminating multiplication by columns that do not affect the low-rank approximation; this version of the SVD is sometimes known as the reduced SVD or truncated SVD, and it is a computationally simpler representation from which to compute the low-rank approximation (a code sketch of this reduced form appears after these exercises).

For the matrix $C$ in Example 18.2, write down both $\Sigma'_2$ and $U'_2$.
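
The reduced (truncated) SVD of (243) amounts to slicing: retain only the first $k$ columns of $U$ and $V$ and the leading $k \times k$ block of $\Sigma$. A minimal sketch, assuming NumPy, with an illustrative matrix:

```python
import numpy as np

C = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
k = 2

U, s, Vt = np.linalg.svd(C, full_matrices=False)

# Reduced matrices: U'_k is M x k, Sigma'_k is k x k, V'^T_k is k x N.
U_k = U[:, :k]
Sigma_k = np.diag(s[:k])
Vt_k = Vt[:k, :]

# C_k via (243): no multiplications by columns of zeros are performed.
C_k = U_k @ Sigma_k @ Vt_k

# Identical (up to rounding) to truncating the full SVD.
s_full = s.copy()
s_full[k:] = 0.0
assert np.allclose(C_k, U @ np.diag(s_full) @ Vt)
```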
