next up previous contents index
Next: Biword indexes Up: The term vocabulary and Previous: Faster postings list intersection   Contents   Index


Positional postings and phrase queries

Many complex or technical concepts and many organization and product names are multiword compounds or phrases. We would like to be able to pose a query such as Stanford University by treating it as a phrase so that a sentence in a document like The inventor Stanford Ovshinsky never went to university. is not a match. Most recent search engines support a double quotes syntax (``stanford university'') for phrase queries , which has proven to be very easily understood and successfully used by users. As many as 10% of web queries are phrase queries, and many more are implicit phrase queries (such as person names), entered without use of double quotes. To be able to support such queries, it is no longer sufficient for postings lists to be simply lists of documents that contain individual terms. In this section we consider two approaches to supporting phrase queries and their combination. A search engine should not only support phrase queries, but implement them efficiently. A related but distinct concept is term proximity weighting, where a document is preferred to the extent that the query terms appear close to each other in the text. This technique is covered in Section 7.2.2 (page [*]) in the context of ranked retrieval.



Subsections
next up previous contents index
Next: Biword indexes Up: The term vocabulary and Previous: Faster postings list intersection   Contents   Index
© 2008 Cambridge University Press
This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.
2009-04-07