

Champion lists

The idea of champion lists (sometimes also called fancy lists or top docs) is to precompute, for each term $t$ in the dictionary, the set of the $r$ documents with the highest weights for $t$; the value of $r$ is chosen in advance. For tf-idf weighting, these would be the $r$ documents with the highest tf values for term $t$, because the idf factor is the same for every document in the postings of $t$. We call this set of $r$ documents the champion list for term $t$.
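
As a minimal sketch of this precomputation, assume the postings fit in memory as a mapping from each term to a dictionary of integer document IDs and tf values. The names `build_champion_lists` and `postings` are illustrative and do not come from the text, and the tie-breaking rule (smaller document ID first) is an arbitrary choice made only to keep the output deterministic.

```python
import heapq


def build_champion_lists(postings, r):
    """Return {term: [doc_id, ...]} with the r highest-tf documents per term.

    postings: hypothetical in-memory index of the form {term: {doc_id: tf}},
              with integer doc_ids (an assumption of this sketch).
    r:        champion list size, fixed at index construction time.
    """
    champions = {}
    for term, doc_tfs in postings.items():
        # Keep the r (doc_id, tf) pairs with the highest tf.
        # Ties on tf are broken in favour of the smaller doc_id.
        top = heapq.nlargest(
            r, doc_tfs.items(), key=lambda kv: (kv[1], -kv[0])
        )
        champions[term] = [doc_id for doc_id, _ in top]
    return champions
```

A term whose postings list holds fewer than $r$ documents simply gets its whole postings list as its champion list. A per-term value of $r$ could be supported by passing a function of the term instead of a single integer.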

Now, given a query $q$, we create a set $A$ by taking the union of the champion lists for each of the terms comprising $q$, and we restrict cosine computation to only the documents in $A$. A critical parameter in this scheme is the value of $r$, which is highly application dependent. Intuitively, $r$ should be large compared with $K$, especially if we use any form of the index elimination described in Section 7.1.2. One issue here is that the value of $r$ is set at the time of index construction, whereas $K$ is application dependent and may not be available until the query is received; as a result we may (as in the case of index elimination) find ourselves with a set $A$ that has fewer than $K$ documents. There is no reason to have the same value of $r$ for all terms in the dictionary; it could, for instance, be set higher for rarer terms.
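
The query-time step can be sketched as follows, under several simplifying assumptions that are not part of the text: champion lists, postings, idf values and document length norms are all held in plain dictionaries; every query term has weight 1; and the query norm is dropped because it is the same for all documents and so does not affect the ranking. The function name `champion_top_k` and all parameter names are hypothetical.

```python
import heapq


def champion_top_k(query_terms, champions, postings, idf, doc_norm, k):
    """Score only documents in A, the union of the query terms' champion lists.

    champions: {term: [doc_id, ...]}   precomputed champion lists
    postings:  {term: {doc_id: tf}}    full postings, used to look up tf
    idf:       {term: idf value}
    doc_norm:  {doc_id: Euclidean length of the document vector}
    Returns up to k (doc_id, score) pairs, best first. The result may be
    shorter than k when A contains fewer than k documents.
    """
    # Build A as the union of the champion lists of the query terms.
    candidates = set()
    for term in query_terms:
        candidates.update(champions.get(term, []))

    # Simplified cosine score: sum of tf * idf over the query terms,
    # divided by the document norm.
    scores = {}
    for doc_id in candidates:
        total = 0.0
        for term in query_terms:
            tf = postings.get(term, {}).get(doc_id, 0)
            if tf:
                total += tf * idf[term]
        scores[doc_id] = total / doc_norm[doc_id]

    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
```

Note that a document in $A$ is scored against all query terms, not only the term whose champion list contributed it. When the function returns fewer than $k$ results, a caller might fall back to the full postings lists; that choice is left open here.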


© 2008 Cambridge University Press
2009-04-07