The practical pursuit of computerized information retrieval began in the late 1940s (Cleverdon, 1991, Liddy, 2005). A great increase in the production of scientific literature, much in the form of less formal technical reports rather than traditional journal articles, coupled with the availability of computers, led to interest in automatic document retrieval. However, in those days, document retrieval was always based on author, title, and keywords; full-text search came much later.
The article of Bush (1945) provided lasting inspiration for the new field:
``Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, `memex' will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.''The term Information Retrieval was coined by Calvin Mooers in 1948/1950 (Mooers, 1950).
In 1958, much newspaper attention was paid to demonstrations at a conference (see Taube and Wooster, 1958) of IBM ``auto-indexing'' machines, based primarily on the work of H. P. Luhn. Commercial interest quickly gravitated towards Boolean retrieval systems, but the early years saw a heady debate over various disparate technologies for retrieval systems. For example Mooers (1961) dissented:
``It is a common fallacy, underwritten at this date by the investment of several million dollars in a variety of retrieval hardware, that the algebra of George Boole (1847) is the appropriate formalism for retrieval system design. This view is as widely and uncritically accepted as it is wrong.''The observation of AND vs. OR giving you opposite extremes in a precisionrecall tradeoff, but not the middle ground comes from (Lee and Fox, 1988).
The book (Witten et al., 1999) is the standard reference for an in-depth comparison of the space and time efficiency of the inverted index versus other possible data structures; a more succinct and up-to-date presentation appears in Zobel and Moffat (2006). We further discuss several approaches in Chapter 5 .
Friedl (2006) covers the practical usage of regular expressions for searching. The underlying computer science appears in (Hopcroft et al., 2000).