

Review of basic probability theory

We hope that the reader has seen a little basic probability theory previously. We will give a very quick review; some references for further reading appear at the end of the chapter. A variable $A$ represents an event (a subset of the space of possible outcomes). Equivalently, we can represent the subset via a random variable, which is a function from outcomes to real numbers; the subset is the domain over which the random variable $A$ has a particular value. Often we will not know with certainty whether an event is true in the world. We can then ask for the probability of the event, a number $P(A)$ satisfying $0 \le P(A) \le 1$. For two events $A$ and $B$, the joint event of both events occurring is described by the joint probability $P(A,B)$. The conditional probability $P(A\vert B)$ expresses the probability of event $A$ given that event $B$ occurred. The fundamental relationship between joint and conditional probabilities is given by the chain rule:

\begin{displaymath}
P(A, B) = P(A \cap B) = P(A\vert B)P(B) = P(B\vert A)P(A)
\end{displaymath} (56)

Without making any assumptions, the probability of a joint event equals the probability of one of the events multiplied by the probability of the other event conditioned on knowing the first event happened.
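As a quick numeric check, the chain rule can be verified directly from a small contingency table of outcome counts. The counts below are made up for illustration (a hypothetical collection of 200 documents, with $A$ = "document is relevant" and $B$ = "document contains a given term"):

```python
# Numeric check of the chain rule P(A, B) = P(A|B) P(B) = P(B|A) P(A),
# using a hypothetical 2x2 table of outcome counts (numbers are made up).
n_AB, n_AnotB = 30, 10         # A holds, with / without B
n_notAB, n_notAnotB = 20, 140  # A does not hold, with / without B
n = n_AB + n_AnotB + n_notAB + n_notAnotB  # 200 outcomes in total

p_A = (n_AB + n_AnotB) / n            # P(A)   = 40/200 = 0.20
p_B = (n_AB + n_notAB) / n            # P(B)   = 50/200 = 0.25
p_AB = n_AB / n                       # P(A,B) = 30/200 = 0.15
p_A_given_B = n_AB / (n_AB + n_notAB) # P(A|B) = 30/50  = 0.60
p_B_given_A = n_AB / (n_AB + n_AnotB) # P(B|A) = 30/40  = 0.75

# Both factorizations recover the joint probability:
assert abs(p_AB - p_A_given_B * p_B) < 1e-12  # 0.60 * 0.25 = 0.15
assert abs(p_AB - p_B_given_A * p_A) < 1e-12  # 0.75 * 0.20 = 0.15
```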

Writing $P(\overline{A})$ for the complement of an event, we similarly have:

\begin{displaymath}
P(\overline{A},B) = P(B\vert \overline{A})P(\overline{A})
\end{displaymath} (57)

Probability theory also has a partition rule, which says that if an event $B$ can be divided into an exhaustive set of disjoint subcases, then the probability of $B$ is the sum of the probabilities of the subcases. A special case of this rule gives that:
\begin{displaymath}
P(B) = P(A,B) + P(\overline{A}, B)
\end{displaymath} (58)
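The special case above can be sketched numerically: the event $B$ is split into the disjoint subcases "$B$ and $A$" and "$B$ and not $A$", each expanded with the chain rule. The probabilities used here are hypothetical values chosen for illustration:

```python
# Sketch of the partition rule P(B) = P(A,B) + P(not-A, B),
# with each joint term expanded via the chain rule. Numbers are hypothetical.
p_A = 0.2               # P(A)
p_B_given_A = 0.75      # P(B|A)
p_B_given_notA = 0.125  # P(B|not-A)

p_AB = p_B_given_A * p_A               # P(A,B)     = 0.75 * 0.2 = 0.15
p_notAB = p_B_given_notA * (1 - p_A)   # P(not-A,B) = 0.125 * 0.8 = 0.10
p_B = p_AB + p_notAB                   # partition rule: P(B) = 0.25
```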

From these we can derive Bayes' Rule for inverting conditional probabilities:

\begin{displaymath}
P(A\vert B) = \frac{P(B\vert A)P(A)}{P(B)} = \left[\frac{P(B\vert A)}{\sum_{X \in \{ A, \overline{A}\}} P(B\vert X)P(X)}\right]P(A)
\end{displaymath} (59)

This equation can also be thought of as a way of updating probabilities. We start off with an initial estimate of how likely the event $A$ is when we do not have any other information; this is the prior probability $P(A)$. Bayes' rule lets us derive a posterior probability $P(A\vert B)$ after having seen the evidence $B$, based on the likelihood of $B$ occurring in the two cases that $A$ does or does not hold.
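A small worked example of this update, using the second form of the equation (the denominator computed with the partition rule). The prior and the two likelihoods are hypothetical values:

```python
# Bayesian update: from a prior P(A) to a posterior P(A|B) after observing
# evidence B. All probabilities are hypothetical, chosen for illustration.
p_A = 0.2               # prior P(A)
p_B_given_A = 0.75      # likelihood of the evidence if A holds
p_B_given_notA = 0.125  # likelihood of the evidence if A does not hold

# Denominator by the partition rule: P(B) = sum over X in {A, not-A} of P(B|X)P(X)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)  # 0.15 + 0.10 = 0.25
p_A_given_B = p_B_given_A * p_A / p_B                 # posterior: 0.15/0.25 = 0.6
```

Seeing the evidence raised our belief in $A$ from the prior 0.2 to a posterior of 0.6, because the evidence is six times more likely under $A$ than under its complement.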

Finally, it is often useful to talk about the odds of an event, which provide a kind of multiplier for how probabilities change:

\begin{displaymath}
\mbox{Odds:\qquad } O(A) = \frac{P(A)}{P(\overline{A})} = \frac{P(A)}{1 - P(A)}
\end{displaymath} (60)
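One way to see the "multiplier" interpretation (a standard reformulation of Bayes' rule, not stated explicitly in the text above): in odds space, updating on evidence $B$ is a single multiplication by the likelihood ratio $P(B\vert A)/P(B\vert \overline{A})$. The numbers below are the same hypothetical values as before:

```python
# Odds O(A) = P(A) / (1 - P(A)), and Bayes' rule as a multiplicative update
# in odds space: O(A|B) = O(A) * P(B|A) / P(B|not-A). Numbers are hypothetical.
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

p_A = 0.2
p_B_given_A, p_B_given_notA = 0.75, 0.125

prior_odds = odds(p_A)                           # 0.2 / 0.8 = 0.25
likelihood_ratio = p_B_given_A / p_B_given_notA  # 0.75 / 0.125 = 6.0
posterior_odds = prior_odds * likelihood_ratio   # 0.25 * 6.0 = 1.5

# Convert back to a probability; matches the Bayes' rule computation (0.6).
p_posterior = posterior_odds / (1 + posterior_odds)
```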


© 2008 Cambridge University Press