This talk is part of the NLP Seminar Series.

Resolving the Human Subjects Status of Machine Learning’s Crowdworkers

Divyansh Kaushik, Language Technologies Institute, Carnegie Mellon University
Date: 11:00am - 12:00 noon PT, Aug 11 2022
Venue: Zoom (link hidden)


In recent years, machine learning (ML) has come to rely more heavily on crowdworkers, both for building bigger datasets and for addressing research questions requiring human interaction or judgment. Owing to the diverse tasks performed by crowdworkers, and the myriad ways the resulting datasets are used, it can be difficult to determine when these individuals are best thought of as workers versus as human subjects. These difficulties are compounded by conflicting policies, with some institutions and researchers treating all ML crowdwork as human subjects research, and other institutions holding that ML crowdworkers rarely constitute human subjects. Additionally, few ML papers involving crowdwork mention IRB oversight, raising the prospect that many might not be in compliance with ethical and regulatory requirements. In this talk, I will focus on research in natural language processing to investigate the appropriate designation of crowdsourcing studies and the unique challenges that ML research poses for research oversight. Crucially, under the U.S. Common Rule, these judgments hinge on determinations of "aboutness": both whom (or what) the collected data is about and whom (or what) the analysis is about, a determination that can often be hard to make in ML. I will also discuss a potential loophole in the Common Rule, whereby researchers can elude research ethics oversight by splitting data collection and analysis into distinct studies. I will conclude with several policy recommendations to address these concerns.


Divyansh Kaushik is a PhD candidate at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University, and a Science and Technology Policy Fellow at the Federation of American Scientists. At CMU, he is part of the Approximately Correct Machine Intelligence (ACMI) Lab, advised by Dr. Eduard Hovy and Dr. Zachary Lipton. He is an Amazon Graduate Research Fellow, and his research focuses on using different forms of human feedback to make NLP systems more robust. Over the years, his work has been supported by Amazon AI, PricewaterhouseCoopers, and Facebook AI, and has been recognized by several awards and presentations at top conferences. He frequently writes about policy issues, with recent pieces appearing in Forbes, The Dispatch, and Issues in Science and Technology.