Sublinear tf scaling
It seems unlikely that twenty occurrences of a term in a document truly carry twenty times the significance of a single occurrence. Accordingly, there has been considerable research into variants of term frequency that go beyond counting the number of occurrences of a term. A common modification is to use instead the logarithm of the term frequency, which assigns a weight given by
\begin{displaymath}
\mbox{wf}_{t,d}=\left\{
\begin{array}{ll}
1+\log \mbox{tf}_{t,d} & \mbox{if } \mbox{tf}_{t,d}>0 \\
0 & \mbox{otherwise}
\end{array}
\right.
\end{displaymath}
(28)
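As a rough illustration of the logarithmic weighting in (28), the following Python sketch computes wf from a raw term count. The function name `sublinear_tf` is hypothetical, and the natural logarithm is an assumption, since this section does not fix the base of the logarithm.

```python
import math


def sublinear_tf(tf: int) -> float:
    """Return wf = 1 + log(tf) when tf > 0, and 0 otherwise.

    Assumption: natural logarithm; the text does not specify a base.
    """
    if tf > 0:
        return 1.0 + math.log(tf)
    return 0.0


# Compare raw counts with their dampened weights.
for count in (0, 1, 2, 20):
    print(count, round(sublinear_tf(count), 3))
```

Under this assumption a single occurrence keeps weight 1, while twenty occurrences receive a weight of roughly 4 rather than 20, which is the dampening effect the paragraph above motivates.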
In this form, we may replace tf by some other function wf as in (28), to obtain:
\begin{displaymath}
\mbox{wf-idf}_{t,d} = \mbox{wf}_{t,d} \times \mbox{idf}_t.
\end{displaymath}
(29)
Equation (23) can then be modified by replacing tf-idf by wf-idf as defined in (29).
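The substitution can be sketched in Python as follows. Several things here are assumptions rather than content from this section: that the score being modified sums per-term weights over the query terms, that idf is defined as log(N / df), that the logarithm is natural, and all names (`wf`, `idf`, `wf_idf_score`) and the toy numbers.

```python
import math
from collections import Counter


def wf(tf: int) -> float:
    """Sublinear term weight: 1 + log(tf) if tf > 0, else 0."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def idf(df: int, n_docs: int) -> float:
    """Assumed definition idf_t = log(N / df_t); not defined in this section."""
    return math.log(n_docs / df) if df > 0 else 0.0


def wf_idf_score(query_terms, doc_tokens, doc_freq, n_docs):
    """Sum wf-idf weights over the query terms (assumed scoring form)."""
    counts = Counter(doc_tokens)  # missing terms count as 0
    return sum(
        wf(counts[t]) * idf(doc_freq.get(t, 0), n_docs)
        for t in query_terms
    )


# Hypothetical toy collection statistics.
doc = ["car", "car", "car", "insurance"]
doc_freq = {"car": 10, "insurance": 100, "auto": 5}
print(wf_idf_score(["car", "insurance"], doc, doc_freq, n_docs=1000))
```

A query term that does not occur in the document contributes nothing, because wf is 0 when the count is 0.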
© 2008 Cambridge University Press
2009-04-07