Next: Maximum tf normalization
Up: Variant tf-idf functions
Previous: Variant tf-idf functions
It seems unlikely that twenty occurrences of a term in a document truly carry twenty times the significance of a single occurrence. Accordingly, there has been considerable research into variants of term frequency that go beyond counting the number of occurrences of a term. A common modification is to use instead the logarithm of the term frequency, which assigns a weight given by
Sublinear tf scaling
In this form, we may replace by some other function as in (28), to obtain:
Equation (23) can then be modified by replacing tf-idf by wf-idf as defined in (29).
© 2008 Cambridge University Press
This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.