A common suggestion to users for coming up with good queries is to think of words that would likely appear in a relevant document, and to use those words as the query. The language modeling approach to IR directly models that idea: a document is a good match to a query if the document model is likely to generate the query, which will in turn happen if the document contains the query words often. This approach thus provides a different realization of some of the basic ideas for document ranking which we saw in Section 6.2 (page ). Instead of overtly modeling the probability of relevance of a document to a query , as in the traditional probabilistic approach to IR (Chapter 11 ), the basic language modeling approach instead builds a probabilistic language model from each document , and ranks documents based on the probability of the model generating the query: .
In this chapter, we first introduce the concept of language models (Section 12.1 ) and then describe the basic and most commonly used language modeling approach to IR, the Query Likelihood Model (Section 12.2 ). After some comparisons between the language modeling approach and other approaches to IR (Section 12.3 ), we finish by briefly describing various extensions to the language modeling approach (Section 12.4 ).