Deconfounded Lexicon Induction
for Interpretable Social Science

Project website

About
You can read the paper here.

We present two new algorithms, Adversarial Selection (A) and Deep Residualization (DR), for selecting text features that are predictive of some outcomes but decorrelated from others. Features selected this way can help you make causal inferences from text. They can also help make NLP models interpretable.
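To make the goal concrete, here is a minimal sketch of plain linear residualization on synthetic data. It is a much simpler relative of the idea and is not the Adversarial Selection or Deep Residualization algorithm from the paper. The function name, the synthetic data, and the choice of least squares are all assumptions made for illustration.

```python
import numpy as np

def residualized_feature_scores(X, y, C):
    """Score text features against the part of y that C does not explain.

    X: (n_docs, n_features) text feature matrix (e.g. bag-of-words indicators)
    y: (n_docs,) outcome
    C: (n_docs, n_confounds) confound matrix
    NOTE: hypothetical helper for illustration, not the paper's method.
    """
    # Step 1: predict the outcome from the confounds alone (with an intercept).
    C1 = np.column_stack([np.ones(len(y)), C])
    beta, *_ = np.linalg.lstsq(C1, y, rcond=None)
    residual = y - C1 @ beta

    # Step 2: regress what is left over on the (centered) text features.
    Xc = X - X.mean(axis=0)
    weights, *_ = np.linalg.lstsq(Xc, residual, rcond=None)
    return weights

# Synthetic example (assumed data): feature 0 drives the outcome directly,
# while feature 1 is just a copy of the confound.
rng = np.random.default_rng(0)
n = 2000
confound = rng.integers(0, 2, size=n).astype(float)
x0 = rng.integers(0, 2, size=n).astype(float)
x1 = confound.copy()
X = np.column_stack([x0, x1])
y = 2.0 * x0 + 3.0 * confound + rng.normal(size=n)

weights = residualized_feature_scores(X, y, confound.reshape(-1, 1))
print(weights)
```

In this toy setup the feature that merely mirrors the confound should receive a weight near zero, while the feature with its own effect on the outcome keeps a large weight. That is the kind of "predictive but decorrelated" selection the algorithms above aim for, here in a deliberately simplified linear form.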

Code
The Adversarial Selection and Deep Residualization algorithms are implemented in TensorFlow.

Code and implementation details can be found on GitHub.

Hyperparameters
The following are the hyperparameter settings for each of the experiments presented in the paper. The contents of these files can be copy-pasted into the corresponding "model" section of a JSON configuration file like the sample provided on GitHub.

Section 4.1: Consumer Financial Protection Bureau (CFPB) Complaints
Section 4.2: University Course Descriptions
Section 4.3: eCommerce Descriptions
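If you prefer to do the copy-paste step programmatically, a small script along these lines could work. This is only a sketch: apart from the "model" section named above, every key, value, and path here is a hypothetical placeholder rather than something taken from the repository.

```python
import json

# Hypothetical config skeleton; only the "model" key comes from the text above.
base_config = {
    "data": {"path": "path/to/data.tsv"},  # placeholder, not a real setting
    "model": {},
}

# Hypothetical values standing in for one of the published hyperparameter files.
hyperparameters = {"learning_rate": 0.001, "batch_size": 64}

def with_model_section(config, model_settings):
    """Return a copy of config whose "model" section is model_settings."""
    merged = dict(config)
    merged["model"] = dict(model_settings)
    return merged

config = with_model_section(base_config, hyperparameters)
config_text = json.dumps(config, indent=2)
print(config_text)
```

The helper returns a new dictionary instead of editing the base config in place, so the same skeleton can be reused for each experiment's settings.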

Contributors

Cite
@inproceedings{pryzant2018lexicon,
  title={Deconfounded Lexicon Induction for Interpretable Social Science},
  author={Pryzant, Reid and Wang, Kelly and Jurafsky, Dan and Wager, Stefan},
  booktitle={16th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2018}
}