

A variant of the multinomial model

An alternative formalization of the multinomial model represents each document $d$ as an $M$-dimensional vector of counts $\langle \termf_{t_1,d},\ldots,\termf_{t_M,d} \rangle$, where $\termf_{t_i,d}$ is the term frequency of $t_i$ in $d$. $P(d\vert c)$ is then computed as follows (cf. Equation 99):
\begin{displaymath}
P(d\vert c) = P(\langle \termf_{t_1,d},\ldots,\termf_{t_M,d} \rangle \vert c) \propto \prod_{1 \leq i \leq M} P(X=t_i\vert c)^{\termf_{t_i,d}}
\end{displaymath} (129)

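Note that we have omitted the multinomial factor (see Equation 99). This factor depends only on the counts in $d$ and not on the class, so omitting it scales $P(d\vert c)$ by the same amount for every class and does not change which class is chosen. This is why Equation 129 is stated as a proportionality.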
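A minimal sketch of scoring a count vector with Equation 129 in log space, assuming the conditional probabilities $P(X=t_i\vert c)$ have already been estimated (and smoothed, so none is zero). The function names, the two class labels and all probability values are hypothetical, chosen only for illustration. The sketch also adds the log of the omitted multinomial factor, $L_d!/\prod_i \termf_{t_i,d}!$, to every class score to show that it shifts all classes by the same constant and leaves the winning class unchanged.

```python
import math

def log_likelihood_counts(tf, cond_prob):
    # log prod_i P(X=t_i|c)^{tf_i} = sum_i tf_i * log P(X=t_i|c).
    # Terms with tf_i = 0 add nothing, so only terms that occur in d are visited.
    return sum(n * math.log(cond_prob[t]) for t, n in tf.items() if n > 0)

def log_multinomial_factor(tf):
    # log(L_d! / prod_i tf_i!), using lgamma(n + 1) == log(n!).
    length = sum(tf.values())
    return math.lgamma(length + 1) - sum(math.lgamma(n + 1) for n in tf.values())

# Hypothetical parameters for two classes (made-up numbers, not from the text).
cond_prob = {
    "china": {"chinese": 0.5, "tokyo": 0.25, "japan": 0.25},
    "other": {"chinese": 0.2, "tokyo": 0.4, "japan": 0.4},
}
tf = {"chinese": 3, "tokyo": 1, "japan": 1}  # count vector of a 5-token document

without = {c: log_likelihood_counts(tf, p) for c, p in cond_prob.items()}
with_factor = {c: s + log_multinomial_factor(tf) for c, s in without.items()}
print(max(without, key=without.get), max(with_factor, key=with_factor.get))  # prints: china china
```

Working in log space avoids floating-point underflow when many probabilities are multiplied; the multinomial factor here is $5!/(3!\,1!\,1!) = 20$ for both classes.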

Equation 129 is equivalent to the sequence model in Equation 113: $P(X=t_i\vert c)^{\termf_{t_i,d}}=1$ for terms that do not occur in $d$ ($\termf_{t_i,d}=0$), and a term that occurs $\termf_{t_i,d} \geq 1$ times contributes $\termf_{t_i,d}$ factors both in Equation 113 and in Equation 129.
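The equivalence argument can be checked numerically with a small sketch, assuming one class with hypothetical smoothed estimates $P(X=t\vert c)$ (all nonzero). The sequence form multiplies one factor per token position; the count form multiplies $P(X=t_i\vert c)^{\termf_{t_i,d}}$ over the full vocabulary, including a term with $\termf_{t_i,d}=0$. The function names and numbers are illustrative only.

```python
import math
from collections import Counter

def log_likelihood_sequence(tokens, cond_prob):
    # Sequence form: one factor P(X=t_k|c) per token position k.
    return sum(math.log(cond_prob[t]) for t in tokens)

def log_likelihood_counts(tf, cond_prob):
    # Count form (Equation 129): factor P(X=t_i|c)^tf for every vocabulary term t_i;
    # a term with tf = 0 contributes 0 * log P = 0, i.e. a factor of 1.
    return sum(n * math.log(cond_prob[t]) for t, n in tf.items())

# Hypothetical smoothed estimates P(X=t|c) for one class (all > 0, summing to 1).
cond_prob = {"chinese": 0.4, "tokyo": 0.2, "japan": 0.2, "beijing": 0.2}
tokens = ["chinese", "chinese", "chinese", "tokyo", "japan"]

counts = Counter(tokens)
tf = {t: counts[t] for t in cond_prob}  # full M-dimensional vector, including tf = 0

seq = log_likelihood_sequence(tokens, cond_prob)
cnt = log_likelihood_counts(tf, cond_prob)
print(math.isclose(seq, cnt))  # prints: True
```

The term "beijing" has count 0 and leaves the count-form score unchanged, while "chinese" contributes three factors of 0.4 in both forms.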

Exercises.


© 2008 Cambridge University Press
This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.
2009-04-07