Research wiki

Overview

The idea is very simple. For these document classification tasks, linear models often overfit already, especially if bigram features are used. Very little is gained by using complex, hard-to-optimize models. To get the best performance, one reasonable approach is to find the best linear model by imposing modeling constraints.

Naive Bayes models a document's joint likelihood as tex:p(x, y) = p(y) \prod_i p(x_i|y). The conditional independence assumption might be too strong, but it is nevertheless good at combating overfitting. The way to get better performance out is to optimize discriminatively:

tex:p(y|x;\theta) \propto p(y) \prod_i p(x_i|y)^{\theta_i}

as a function of tex:\theta. This indeed gets better performance, and it is a linear model in log-probability space. For details see my paper.

sidaw12_simple_sentiment.pdf
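A minimal sketch of the idea, with assumed details: for binary classification the model above reduces to a linear classifier over Naive Bayes log-count ratios tex:r_i = \log(p(x_i|1)/p(x_i|0)), so the exponents tex:\theta_i can be learned by fitting a logistic regression on the NB-scaled features tex:x_i r_i. The toy data, smoothing constant, and gradient-descent trainer here are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def nb_log_ratio(X, y, alpha=1.0):
    """Smoothed per-feature log-count ratio r_i = log p(x_i|y=1) - log p(x_i|y=0)."""
    p = alpha + X[y == 1].sum(axis=0)
    q = alpha + X[y == 0].sum(axis=0)
    return np.log(p / p.sum()) - np.log(q / q.sum())

def train_logreg(Z, y, lr=0.5, epochs=500):
    """Plain batch-gradient logistic regression; theta plays the exponent role."""
    theta, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        pred = 1.0 / (1.0 + np.exp(-(Z @ theta + b)))
        theta -= lr * Z.T @ (pred - y) / len(y)
        b -= lr * np.mean(pred - y)
    return theta, b

# Toy bag-of-words counts: 4 documents over a 3-word vocabulary (hypothetical data).
X = np.array([[2, 0, 1], [3, 1, 0], [0, 2, 1], [1, 3, 0]], dtype=float)
y = np.array([1, 1, 0, 0])

r = nb_log_ratio(X, y)
theta, b = train_logreg(X * r, y)          # discriminatively learn the exponents
preds = (X @ (r * theta) + b > 0).astype(int)
print(preds)                               # recovers [1 1 0 0] on this toy data
```

Setting all tex:\theta_i = 1 recovers plain Naive Bayes scoring; letting the logistic regression move them away from 1 is the discriminative correction described above.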
