Computing scores in a complete search system

Chapter 6 developed the theory underlying term weighting in documents for the purposes of scoring, leading up to vector space models and the basic cosine scoring algorithm of Section 6.3.3. In this chapter we begin in Section 7.1 with heuristics for speeding up this computation; many of these heuristics achieve their speed at the risk of not finding quite the top K documents matching the query. Some of these heuristics generalize beyond cosine scoring. With Section 7.1 in place, we have essentially all the components needed for a complete search engine. We therefore take a step back from cosine scoring to the more general problem of computing scores in a search engine. In Section 7.2 we outline a complete search engine, including indexes and structures to support not only cosine scoring but also more general ranking factors such as query term proximity. We describe how all of the various pieces fit together in Section 7.2.4. We conclude this chapter with Section 7.3, where we discuss how the vector space model for free text queries interacts with common query operators.

- Efficient scoring and ranking
  - Inexact top K document retrieval
  - Index elimination
  - Champion lists
  - Static quality scores and ordering
  - Impact ordering
  - Cluster pruning

- Components of an information retrieval system

- Vector space scoring and query operator interaction

- References and further reading

2009-04-07