The idea is very simple. For these document classification tasks, linear models often overfit already, especially when bigram features are used, so very little is gained by using complex, hard-to-optimize models. A reasonable way to get the best performance is to find the best linear model by imposing modeling constraints.
Naive Bayes models the document as $p(d \mid y) = \prod_i p(w_i \mid y)^{f_i}$, where $f_i$ is the count of word $w_i$ in $d$. This conditional independence assumption might be too strong, but it is a good one for combating overfitting nevertheless. The way to get better performance out of it is to discriminatively optimize the conditional likelihood $\sum_j \log p(y_j \mid d_j)$ as a function of the parameters $\log p(w_i \mid y)$. This indeed gets better performance, and it is a linear model in the log-likelihood space: the Naive Bayes score is linear in the counts $f_i$ with weights $\log p(w_i \mid y)$, so freeing those weights recovers a discriminatively trained linear classifier over the same features. For details see my paper.
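As a minimal sketch of this idea, the snippet below first computes the generative Naive Bayes weights (smoothed log-count ratios), then trains a plain logistic regression discriminatively on the Naive-Bayes-scaled features. The function names, the toy corpus, and the gradient-descent trainer are illustrative choices, not taken from the paper; the point is only that the discriminative step operates in the same log-likelihood feature space that Naive Bayes defines.

```python
import numpy as np

def nb_log_ratio(X, y, alpha=1.0):
    """Smoothed log-count ratio r_i = log( p(w_i|y=1) / p(w_i|y=0) )."""
    p = alpha + X[y == 1].sum(axis=0)   # positive-class word counts (smoothed)
    q = alpha + X[y == 0].sum(axis=0)   # negative-class word counts (smoothed)
    return np.log((p / p.sum()) / (q / q.sum()))

def train_logreg(X, y, lr=0.1, steps=1000):
    """Plain logistic regression by batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        pred = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = pred - y                     # gradient of the log loss w.r.t. scores
        w -= lr * (X.T @ g) / n
        b -= lr * g.mean()
    return w, b

# Toy corpus: word 0 signals the positive class, word 1 the negative class.
X = np.array([[2., 0.], [3., 0.], [0., 2.], [0., 3.]])
y = np.array([1., 1., 0., 0.])

r = nb_log_ratio(X, y)           # generative weights: plain Naive Bayes
w, b = train_logreg(X * r, y)    # discriminative reweighting of NB features
preds = (1.0 / (1.0 + np.exp(-((X * r) @ w + b))) > 0.5).astype(float)
```

The elementwise product `X * r` is what makes the classifier linear in the log-likelihood space: with `w` fixed to all ones it reduces to the Naive Bayes decision rule, and letting the optimizer move `w` away from ones is the discriminative step.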