Sublinear tf scaling
It seems unlikely that twenty occurrences of a term in a document truly carry twenty times the significance of a single occurrence. Accordingly, there has been considerable research into variants of term frequency that go beyond counting the number of occurrences of a term. A common modification is to use instead the logarithm of the term frequency, which assigns a weight given by

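\[
\mathrm{wf}_{t,d} =
\begin{cases}
1 + \log \mathrm{tf}_{t,d} & \text{if } \mathrm{tf}_{t,d} > 0 \\
0 & \text{otherwise}
\end{cases}
\tag{28}
\]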
In this form, we may replace tf by some other function wf as in (28), to obtain:

\[
\text{wf-idf}_{t,d} = \mathrm{wf}_{t,d} \times \mathrm{idf}_t
\tag{29}
\]
Equation (23) can then be modified by replacing tf-idf by wf-idf as defined in (29).
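As a rough illustration of the scaling in (28) and the weight in (29), the following minimal sketch computes both for a single term and document. The function names `wf` and `wf_idf` are hypothetical, and two things are assumed rather than taken from this section: that the inverse document frequency is defined as log(N/df), and that the natural logarithm is used (the choice of base only rescales the weights).

```python
import math


def wf(tf: int) -> float:
    """Sublinear tf scaling as in (28): 1 + log(tf) if tf > 0, else 0."""
    # Assumption: natural logarithm; another base only rescales the log term.
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def wf_idf(tf: int, df: int, n_docs: int) -> float:
    """wf-idf weight as in (29): wf multiplied by idf.

    Assumption: idf = log(n_docs / df), where df is the number of
    documents containing the term and n_docs is the collection size.
    """
    if df <= 0 or n_docs <= 0:
        raise ValueError("df and n_docs must be positive")
    idf = math.log(n_docs / df)
    return wf(tf) * idf


if __name__ == "__main__":
    # Compare raw counts against their sublinearly scaled weights.
    for count in (0, 1, 2, 20):
        print(count, wf(count))
```

Under these assumptions, a term occurring twenty times receives a wf weight of about 4.0 rather than 20, so repeated occurrences still raise the weight but with sharply diminishing returns.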
© 2008 Cambridge University Press
20090407