Gender in the ACL Anthology

Overview

Studies of gender balance in academic computer science are typically based on statistics on enrollment and graduation. Going beyond these coarse measures of gender participation, we conduct a fine-grained study of gender in the field of Natural Language Processing. We use topic models (Latent Dirichlet Allocation) to explore the research topics of men and women in the ACL Anthology Network (AAN). We find that women publish more than men on dialog, discourse, and sentiment, while men publish more than women on parsing, formal semantics, and finite state models. To conduct our study we hand-labeled the gender of authors in the ACL Anthology, creating a useful resource for other gender studies. Finally, our study of historical patterns in female participation shows that the percentage of women authors in computational linguistics has been continuously increasing, approximately doubling in the three decades since 1980.
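The historical trend above is a per-year percentage of women among authors. The following is a minimal sketch of one way to compute such a figure. It assumes hypothetical inputs, namely a list of `(year, authors)` records (one per paper) and an `author_gender` dict that maps each name to `"female"`, `"male"`, or `"unknown"`. It counts each distinct author once per year and skips authors of unknown gender. This is an illustration only and not necessarily the exact counting used in the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def female_share_by_year(
    papers: Iterable[Tuple[int, List[str]]],
    author_gender: Dict[str, str],
) -> Dict[int, float]:
    """Percentage of distinct, gender-labeled authors in each year who are female.

    Assumption: authors missing from author_gender, or labeled "unknown",
    are left out of both the numerator and the denominator.
    """
    # Collect the distinct authors active in each year.
    authors_by_year: Dict[int, Set[str]] = defaultdict(set)
    for year, authors in papers:
        authors_by_year[year].update(authors)

    shares: Dict[int, float] = {}
    for year, authors in sorted(authors_by_year.items()):
        labels = [author_gender.get(a, "unknown") for a in authors]
        known = [g for g in labels if g in ("female", "male")]
        # Years with no gender-labeled authors are omitted rather than reported as 0.
        if known:
            shares[year] = 100.0 * known.count("female") / len(known)
    return shares
```

With made-up toy data such as `[(1980, ["A", "B"]), (1980, ["B", "C"]), (2010, ["D", "E"])]`, where only `C` and `D` are labeled female, the function reports one female author out of three distinct authors for 1980 and one out of two for 2010. Counting distinct authors rather than author slots keeps prolific authors from dominating a year; counting slots would be an equally reasonable alternative design.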

Data

Here we distribute the lists of author names from the AAN, grouped by gender. The following files are encoded in UTF-8 and use the same name representation as the AAN corpus. Authors in the unknown category are those whose gender we could not confidently determine; some of these stem from ill-formatted data, and names in the other lists may also contain formatting and segmentation errors. Please email gender corrections to av@cs.stanford.edu. A subset of the names was annotated using Geoff Peter's name database. These 1278 names are included in the lists above but are listed again here, as they are the most likely to contain errors.
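For readers who want to use the lists programmatically, here is a minimal sketch that builds a name-to-gender lookup. It assumes each file holds one author name per line; that layout is an assumption, not something stated on this page. The function names are hypothetical. The sketch reads files as UTF-8, strips surrounding whitespace, skips blank lines, and keeps the first label seen if a name appears in more than one list.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def read_names(lines: Iterable[str]) -> List[str]:
    """Strip whitespace and drop blank lines (assumes one name per line)."""
    return [line.strip() for line in lines if line.strip()]


def build_gender_lookup(lists: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map each name to the label of the list it came from."""
    lookup: Dict[str, str] = {}
    for label, lines in lists.items():
        for name in read_names(lines):
            # If a name occurs in several lists, keep the first label encountered.
            lookup.setdefault(name, label)
    return lookup


def load_from_files(paths: Dict[str, str]) -> Dict[str, str]:
    """Read each list from disk; the files are UTF-8 encoded."""
    lists = {
        label: Path(path).read_text(encoding="utf-8").splitlines()
        for label, path in paths.items()
    }
    return build_gender_lookup(lists)
```

A call might look like `load_from_files({"female": "female.txt", "male": "male.txt", "unknown": "unknown.txt"})`, where the file names are placeholders for whatever the downloaded files are called. The separately listed subset of 1278 names already appears in the main lists, so loading it as well would only repeat entries.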

The labeled topic models of the AAN used in the paper (produced by Steven Bethard using the Stanford Topic Modeling Toolbox) are available below:

People

Papers

  • Adam Vogel and Dan Jurafsky, "He Said, She Said: Gender in the ACL Anthology". ACL 2012 Special Workshop: Rediscovering 50 Years of Discoveries.

Contact Information

For any comments or questions, please e-mail av@cs.stanford.edu.