Studies of gender balance in academic computer science are typically based on statistics on enrollment and graduation. Going beyond these coarse measures of gender participation, we conduct a fine-grained study of gender in the field of Natural Language Processing. We use topic models (Latent Dirichlet Allocation) to explore the research topics of men and women in the ACL Anthology Network. We find that women publish more on dialog, discourse, and sentiment, while men publish more on parsing, formal semantics, and finite state models. To conduct our study, we hand-labeled the gender of authors in the ACL Anthology, creating a useful resource for other gender studies. Finally, our study of historical patterns in female participation shows that the percentage of women authors in computational linguistics has been continuously increasing, approximately doubling in the three decades since 1980.
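As a rough illustration of how per-paper topic weights and author gender labels can be combined into a per-gender topic profile, here is a minimal Python sketch. It is not the procedure used in the paper: the function name `topic_share_by_gender`, the input layout, and the toy weights are all assumptions made for illustration, and a real analysis would take its topic weights from a fitted topic model over the AAN.

```python
from collections import defaultdict


def topic_share_by_gender(papers):
    """Average topic weight per gender label (hypothetical helper).

    `papers` is an iterable of (author_genders, topic_weights) pairs, where
    author_genders is a list of labels such as "F" or "M" and topic_weights
    maps a topic name to that paper's weight for the topic.
    A paper counts once for every author listed on it.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for genders, weights in papers:
        for gender in genders:
            counts[gender] += 1
            for topic, weight in weights.items():
                totals[gender][topic] += weight
    return {
        gender: {topic: total / counts[gender] for topic, total in topics.items()}
        for gender, topics in totals.items()
    }


# Toy input, invented for illustration only (not data from the study).
papers = [
    (["F"], {"dialog": 0.7, "parsing": 0.3}),
    (["M"], {"dialog": 0.2, "parsing": 0.8}),
    (["F", "M"], {"dialog": 0.5, "parsing": 0.5}),
]

shares = topic_share_by_gender(papers)
for gender in sorted(shares):
    print(gender, shares[gender])
```

Comparing the two resulting profiles topic by topic gives a simple view of which topics are relatively more common for each group. Counting a co-authored paper once per author is one possible design choice; weighting each paper by the inverse of its author count would be another.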
The labeled topic models of the AAN used in the paper (produced by Steven Bethard using the Stanford Topic Modeling Toolbox) are available below:
For any comments or questions, please e-mail av@cs.stanford.edu.