Query expansion
Figure 9.6: An example of query expansion in the interface of the
Yahoo! web search engine in 2006. The expanded query suggestions
appear just below the "Search Results" bar.
In relevance feedback, users give additional input on documents (by marking documents in the results set as relevant or not), and this input is used to reweight the terms in the query. In query expansion, on the other hand, users give additional input on query words or phrases, possibly suggesting additional query terms. Some search engines (especially on the web) suggest related queries in response to a query; the users then opt to use one of these alternative query suggestions. Figure 9.6 shows an example of query suggestion options being presented in the Yahoo! web search engine. The central question in this form of query expansion is how to generate alternative or expanded queries for the user.
The most common form of query expansion is global analysis, using some form of thesaurus. For each term in a query, the query can be automatically expanded with synonyms and related words from the thesaurus. Use of a thesaurus can be combined with ideas of term weighting: for instance, one might weight added terms less than original query terms.
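The expansion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy thesaurus, the function name expand_query, and the weight of 0.5 for added terms are all hypothetical choices, not values taken from any particular system.

```python
from typing import Dict, List

# Hypothetical toy thesaurus (an assumption for illustration only).
# A real system would load one of the thesaurus types described in this section.
THESAURUS: Dict[str, List[str]] = {
    "cancer": ["neoplasms", "tumor"],
    "car": ["automobile"],
}


def expand_query(
    terms: List[str],
    thesaurus: Dict[str, List[str]],
    added_weight: float = 0.5,  # assumed downweighting factor for added terms
) -> Dict[str, float]:
    """Map each query term to a weight.

    Original query terms get weight 1.0; terms added from the thesaurus
    get the lower weight `added_weight`.
    """
    weights: Dict[str, float] = {term: 1.0 for term in terms}
    for term in terms:
        for related in thesaurus.get(term, []):
            # Do not downweight a term the user typed themselves.
            if related not in weights:
                weights[related] = added_weight
    return weights


expanded = expand_query(["cancer", "treatment"], THESAURUS)
```

Here the weighted term map would then be handed to whatever scoring function the retrieval system uses; terms with no thesaurus entry, such as "treatment" above, are simply passed through unchanged.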
Methods for building a thesaurus for query expansion include:
- Use of a controlled vocabulary that is maintained by
human editors. Here, there is a canonical term for each concept. The subject headings
of traditional library subject indexes, such as the Library of
Congress Subject Headings or the Dewey Decimal system, are examples of
controlled vocabularies. Use of a controlled vocabulary
is quite common for well-resourced domains. A well-known example is
the Unified Medical Language System (UMLS) used with MedLine for
querying the biomedical research literature.
For example, in Figure 9.7, neoplasms
was added to a search for cancer. This MedLine query expansion also contrasts with the Yahoo! example: the Yahoo! interface is a case of interactive query expansion, whereas PubMed performs automatic query expansion. Unless users choose to examine the submitted query, they may not even realize that query expansion has occurred.
- A manual thesaurus. Here, human editors have built up sets of synonymous
names for concepts, without designating a canonical term. The UMLS
metathesaurus is one example of a
thesaurus. Statistics Canada maintains a thesaurus
of preferred terms, synonyms, broader terms, and narrower terms for
matters on which the government collects statistics, such as goods and services.
This thesaurus is also bilingual, covering English and French.
- An automatically derived thesaurus. Here, word co-occurrence
statistics over a collection of documents in a domain are used to
automatically induce a thesaurus; see Section 9.2.3.
- Query reformulations based on query log mining. Here, we exploit
the manual query reformulations of other users to make suggestions to
a new user. This requires a huge query volume, and is thus
particularly appropriate to web search.
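To make the third method in the list above more concrete, the following is a minimal sketch of inducing a thesaurus from word co-occurrence statistics. It assumes documents are already tokenized into lists of terms, and it ranks neighbours by raw document co-occurrence counts; the function name and the top_k parameter are hypothetical. Raw counts tend to favour frequent terms, so a practical system would likely normalize them in some way.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List


def cooccurrence_thesaurus(
    docs: List[List[str]], top_k: int = 2
) -> Dict[str, List[str]]:
    """For each term, list the top_k terms sharing the most documents with it."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            counts[a][b] += 1
            counts[b][a] += 1

    thesaurus: Dict[str, List[str]] = {}
    for term, neighbours in counts.items():
        # Highest count first; ties are broken alphabetically for determinism.
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
        thesaurus[term] = [t for t, _ in ranked[:top_k]]
    return thesaurus


# Hypothetical toy collection, for illustration only.
docs = [
    ["cancer", "tumor", "therapy"],
    ["cancer", "tumor"],
    ["car", "engine"],
]
related = cooccurrence_thesaurus(docs, top_k=1)
```

On this toy collection, "cancer" and "tumor" become each other's nearest neighbour because they share two documents, which is the kind of association the automatically derived thesaurus is meant to capture.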
Thesaurus-based query expansion has the advantage of not requiring any
user input. Query expansion
generally increases recall and is widely used in many science and
engineering fields.
As well as such global analysis techniques, it is also possible to do
query expansion by local analysis, for instance, by
analyzing the documents in the result set. User input is now usually
required, but a distinction remains as to
whether the user is giving feedback on documents or on query terms.
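As a rough illustration of local analysis, the sketch below proposes candidate expansion terms by looking only at the documents in the result set. It is one simple possibility, not the method of any particular system: the function name, the ranking by document frequency within the results, and the optional stopword set are all assumptions made for this example.

```python
from collections import Counter
from typing import FrozenSet, List


def suggest_terms_from_results(
    query_terms: List[str],
    result_docs: List[List[str]],
    num_suggestions: int = 3,
    stopwords: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Suggest terms that occur in many result documents but not in the query."""
    query = set(query_terms)
    doc_freq: Counter = Counter()
    for doc in result_docs:
        # Count each term once per result document.
        for term in set(doc):
            if term not in query and term not in stopwords:
                doc_freq[term] += 1
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:num_suggestions]]


# Hypothetical tokenized result set for the query "jaguar".
results = [
    ["jaguar", "car", "speed"],
    ["jaguar", "car", "dealer"],
    ["jaguar", "cat"],
]
suggestions = suggest_terms_from_results(["jaguar"], results, num_suggestions=2)
```

The suggested terms could then be shown to the user, who accepts or rejects them; in that case the user is giving feedback on query terms rather than on documents.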
© 2008 Cambridge University Press
2009-04-07