In the indexes we have considered so far, postings lists are ordered with respect to docID. As we will see in the next chapter, this is advantageous for compression - instead of docIDs we can compress smaller gaps between IDs, thus reducing space requirements for the index. However, this structure for the index is not optimal when we build ranked (Chapters 6 7 ) - as opposed to Boolean - retrieval systems. In ranked retrieval, postings are often ordered according to weight or impact , with the highest-weighted postings occurring first. With this organization, scanning of long postings lists during query processing can usually be terminated early when weights have become so small that any further documents can be predicted to be of low similarity to the query (see Chapter 6 ). In a docID-sorted index, new documents are always inserted at the end of postings lists. In an impact-sorted index impactordered, the insertion can occur anywhere, thus complicating the update of the inverted index.
Security is an important consideration for retrieval systems in corporations. A low-level employee should not be able to find the salary roster of the corporation, but authorized managers need to be able to search for it. Users' results lists must not contain documents they are barred from opening since the very existence of a document can be sensitive information.
A user-document matrix for
access control lists.
Element User authorization is often mediated through access control lists or ACLs. ACLs can be dealt with in an information retrieval system by representing each document as the set of users that can access them (Figure 4.8 ) and then inverting the resulting user-document matrix. The inverted ACL index has, for each user, a ``postings list'' of documents they can access - the user's access list. Search results are then intersected with this list. However, such an index is difficult to maintain when access permissions change - we discussed these difficulties in the context of incremental indexing for regular postings lists in the last section. It also requires the processing of very long postings lists for users with access to large document subsets. User membership is therefore often verified by retrieving access information directly from the file system at query time - even though this slows down retrieval.
We discussed indexes for storing and retrieving terms (as opposed to documents) in Chapter 3 .
Exercises.