next up previous contents index
Next: Cluster pruning Up: Efficient scoring and ranking Previous: Static quality scores and   Contents   Index


Impact ordering

In all the postings lists described thus far, we order the documents consistently by some common ordering: typically by document ID but in Section 7.1.4 by static quality scores. As noted at the end of Section 6.3.3 , such a common ordering supports the concurrent traversal of all of the query terms' postings lists, computing the score for each document as we encounter it. Computing scores in this manner is sometimes referred to as document-at-a-time scoring. We will now introduce a technique for inexact top-$K$ retrieval in which the postings are not all ordered by a common ordering, thereby precluding such a concurrent traversal. We will therefore require scores to be ``accumulated'' one term at a time as in the scheme of Figure 6.14 , so that we have term-at-a-time scoring.

The idea is to order the documents $d$ in the postings list of term $t$ by decreasing order of $\mbox{tf}_{t,d}$. Thus, the ordering of documents will vary from one postings list to another, and we cannot compute scores by a concurrent traversal of the postings lists of all query terms. Given postings lists ordered by decreasing order of $\mbox{tf}_{t,d}$, two ideas have been found to significantly lower the number of documents for which we accumulate scores: (1) when traversing the postings list for a query term $t$, we stop after considering a prefix of the postings list - either after a fixed number of documents $r$ have been seen, or after the value of $\mbox{tf}_{t,d}$ has dropped below a threshold; (2) when accumulating scores in the outer loop of Figure 6.14 , we consider the query terms in decreasing order of idf, so that the query terms likely to contribute the most to the final scores are considered first. This latter idea too can be adaptive at the time of processing a query: as we get to query terms with lower idf, we can determine whether to proceed based on the changes in document scores from processing the previous query term. If these changes are minimal, we may omit accumulation from the remaining query terms, or alternatively process shorter prefixes of their postings lists.

These ideas form a common generalization of the methods introduced in Sections 7.1.2 -7.1.4 . We may also implement a version of static ordering in which each postings list is ordered by an additive combination of static and query-dependent scores. We would again lose the consistency of ordering across postings, thereby having to process query terms one at time accumulating scores for all documents as we go along. Depending on the particular scoring function, the postings list for a document may be ordered by other quantities than term frequency; under this more general setting, this idea is known as impact ordering.


next up previous contents index
Next: Cluster pruning Up: Efficient scoring and ranking Previous: Static quality scores and   Contents   Index
© 2008 Cambridge University Press
This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.
2009-04-07