Work in information retrieval quickly confronted the problem of variant expression which meant that the words in a query might not appear in a document, despite it being relevant to the query. An early experiment about 1960 cited by Swanson (1988) found that only 11 out of 23 documents properly indexed under the subject toxicity had any use of a word containing the stem toxi. There is also the issue of translation, of users knowing what terms a document will use. Blair and Maron (1985) conclude that ``it is impossibly difficult for users to predict the exact words, word combinations, and phrases that are used by all (or most) relevant documents and only (or primarily) by those documents''.
The main initial papers on relevance feedback using vector space models all appear in Salton (1971b), including the presentation of the Rocchio algorithm (Rocchio, 1971) and the Ide dec-hi variant along with evaluation of several variants (Ide, 1971). Another variant is to regard all documents in the collection apart from those judged relevant as nonrelevant, rather than only ones that are explicitly judged nonrelevant. However, Schütze et al. (1995) and Singhal et al. (1997) show that better results are obtained for routing by using only documents close to the query of interest rather than all documents. Other later work includes Salton and Buckley (1990), Riezler et al. (2007) (a statistical NLP approach to RF) and the recent survey paper Ruthven and Lalmas (2003).
The effectiveness of interactive relevance feedback systems is discussed in (Harman, 1992, Buckley et al., 1994b, Salton, 1989). Koenemann and Belkin (1996) do user studies of the effectiveness of relevance feedback.
Traditionally Roget's thesaurus has been the best known English language thesaurus (Roget, 1946). In recent computational work, people almost always use WordNet (Fellbaum, 1998), not only because it is free, but also because of its rich link structure. It is available at: http://wordnet.princeton.edu.
Qiu and Frei (1993) and Schütze (1998) discuss automatic thesaurus generation. Xu and Croft (1996) explore using both local and global query expansion.