

Probabilistic relevance feedback

Rather than reweighting the query in a vector space, if a user has told us some relevant and nonrelevant documents, then we can proceed to build a classifier. One way of doing this is with a Naive Bayes probabilistic model. If $R$ is a Boolean indicator variable expressing the relevance of a document, then we can estimate $P(x_t=1\vert R)$, the probability that a term $t$ appears in a document given that the document is relevant or nonrelevant, as:

$\hat{P}(x_t=1\vert R=1) = \vert VR_{t}\vert/\vert VR\vert$ (50)
$\hat{P}(x_t=1\vert R=0) = (df_t - \vert VR_t\vert)/(N - \vert VR\vert)$ (51)

where $N$ is the total number of documents, $df_t$ is the number of documents that contain $t$, $VR$ is the set of known relevant documents, and $VR_{t}$ is the subset of $VR$ whose documents contain $t$. Even though the set of known relevant documents may be only a small subset of the true set of relevant documents, if we assume that the set of relevant documents is a small subset of the set of all documents, then the estimates given above will be reasonable. This gives a basis for another way of changing the query term weights. We will discuss such probabilistic approaches further in Chapters 11 and 13, and in particular outline the application to relevance feedback in Section 11.3.4. For the moment, observe that using just Equation 50 as a basis for term weighting is likely insufficient. The equations use only collection statistics and information about the term distribution within the documents judged relevant. They preserve no memory of the original query.
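
To make the two estimates concrete, here is a minimal Python sketch that computes them for a toy collection. The function name `estimate_term_probabilities`, the representation of documents as sets of terms, and the five sample documents are assumptions made purely for illustration; they do not come from the text. As in Equation 51, every document outside $VR$ is treated as nonrelevant, and no smoothing is applied.

```python
from collections import Counter


def estimate_term_probabilities(docs, relevant_ids):
    """Estimate P(x_t=1 | R=1) and P(x_t=1 | R=0) for every term.

    docs: dict mapping a document id to the set of terms it contains.
    relevant_ids: ids of the documents judged relevant (the set VR).
    Returns a dict mapping each term to a (p_relevant, p_nonrelevant) pair.
    """
    n_docs = len(docs)                  # N
    vr = set(relevant_ids) & set(docs)  # VR, restricted to known documents
    if not vr or len(vr) == n_docs:
        # Both denominators, |VR| and N - |VR|, must be nonzero.
        raise ValueError("need at least one relevant and one other document")

    df = Counter()    # df_t: number of documents containing t
    vr_t = Counter()  # |VR_t|: number of relevant documents containing t
    for doc_id, terms in docs.items():
        for term in set(terms):
            df[term] += 1
            if doc_id in vr:
                vr_t[term] += 1

    return {
        term: (
            vr_t[term] / len(vr),                            # Equation 50
            (df[term] - vr_t[term]) / (n_docs - len(vr)),    # Equation 51
        )
        for term in df
    }


# Hypothetical toy collection: the user marked d1 and d3 as relevant.
docs = {
    "d1": {"jaguar", "car", "engine"},
    "d2": {"jaguar", "cat", "jungle"},
    "d3": {"jaguar", "car", "dealer"},
    "d4": {"cat", "pet"},
    "d5": {"engine", "repair"},
}
probs = estimate_term_probabilities(docs, relevant_ids={"d1", "d3"})
print(probs["car"])     # (1.0, 0.0)
print(probs["engine"])  # first component is 0.5
```

In this toy collection, "car" occurs in both judged-relevant documents and nowhere else, so its two estimates are as far apart as possible, whereas "jaguar" also occurs in an unjudged document and is therefore less discriminating. Note that the function never looks at the original query, which illustrates the limitation stated above.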


© 2008 Cambridge University Press
2009-04-07