For the reasons given, a biword index is not the standard solution. Rather, a positional index is most commonly employed. Here, for each term in the vocabulary, we store postings of the form docID: position1, position2, ..., as shown in Figure 2.11 , where each position is a token index in the document. Each posting will also usually record the term frequency, for reasons discussed in Chapter 6 .
To process a phrase query, you still need to access the inverted index entries for each distinct term. As before, you would start with the least frequent term and then work to further restrict the list of possible candidates. In the merge operation, the same general technique is used as before, but rather than simply checking that both terms are in a document, you also need to check that their positions of appearance in the document are compatible with the phrase query being evaluated. This requires working out offsets between the words.
Worked example. Satisfying phrase queries.phrasequery Suppose the postings lists for to and be are as in Figure 2.11 , and the query is ``to be or not to be''. The postings lists to access are: to, be, or, not. We will examine intersecting the postings lists for to and be. We first look for documents that contain both terms. Then, we look for places in the lists where there is an occurrence of be with a token index one higher than a position of to, and then we look for another occurrence of each word with token index 4 higher than the first occurrence. In the above lists, the pattern of occurrences that is a possible match is:
to: ...; 4:...,429,433; ...End worked example.
be: ...; 4:...,430,434; ...
The same general method is applied for within word proximity searches, of the sort we saw in westlaw:
employment /3 placeHere, / means ``within words of (on either side)''. Clearly, positional indexes can be used for such queries; biword indexes cannot. We show in Figure 2.12 an algorithm for satisfying within word proximity searches; it is further discussed in Exercise 2.4.3 .