In our paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜", we raise concerns about the development and deployment of ever larger language models, especially for English. BERT and its variants, GPT-2/3, and others, most recently Switch-C, have pushed the boundaries on tasks and leaderboards both through architectural innovations and through sheer size. We take a step back and ask: How big is too big? What are the possible risks associated with this technology, and what paths are available for mitigating those risks? We provide recommendations including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, carrying out pre-development exercises to evaluate how the planned approach fits into research and development goals and supports stakeholder values, and encouraging research directions beyond ever larger language models.

Data statements for NLP in particular have gained traction as a form of responsible documentation for language data. I present the developments in data statements since their initial conception, including a workshop with dataset developers and plans for an updated schema informed by their feedback.
Joint work with Emily M. Bender, Timnit Gebru, Margaret Mitchell, and Batya Friedman.
Angelina McMillan-Major is a PhD student in Computational Linguistics at the University of Washington. She is interested in methodologies for low-resource language documentation and reclamation, including machine learning methodologies, and in thinking critically about the interaction between technology and language.