Champion lists

Next: Static quality scores and Up: Efficient scoring and ranking Previous: Index elimination Contents Index

Champion lists

The idea of champion lists (sometimes also called fancy lists or top docs) is to precompute, for each term

in the dictionary, the set of the

documents with the highest weights for

; the value of

is chosen in advance. For tf-idf weighting, these would be the

documents with the highest tf values for term

. We call this set of

documents the champion list for term

Now, given a query we create a set as follows: we take the union of the champion lists for each of the terms comprising . We now restrict cosine computation to only the documents in . A critical parameter in this scheme is the value , which is highly application dependent. Intuitively, should be large compared with , especially if we use any form of the index elimination described in Section 7.1.2 . One issue here is that the value is set at the time of index construction, whereas is application dependent and may not be available until the query is received; as a result we may (as in the case of index elimination) find ourselves with a set that has fewer than documents. There is no reason to have the same value of for all terms in the dictionary; it could for instance be set to be higher for rarer terms.

Next: Static quality scores and Up: Efficient scoring and ranking Previous: Index elimination Contents Index

© 2008 Cambridge University Press
This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.
2009-04-07