We propose a discourse framework, namely predictive models that use
discourse-level information in student posts, to guide Massive Open Online
Course (MOOC) instructors in selectively posting to student discussions on
forums. We refer to such posts as interventions; without such guidance,
interventions are infeasible to scale. Prior work on intervention guidance, both
from the pre-MOOC era and on MOOCs, does not address diversity and scale. Our
models and evaluation explicitly cater to diversity and scale, both inherent to
MOOCs. One way to scale interventions is to build prediction models that learn
instructor intervention patterns and predict future interventions, including on
previously unseen courses. However, simplistic vocabulary-based prediction
models fail to adapt sufficiently to the diversity of MOOCs in terms of subject
areas, volume of discussions, and pedagogical styles that stem from the
instructor's culture and geography. To address this, we use discourse
relations, open-class words that are not only domain-agnostic but also signal
intervention through the context in which they occur in student posts. We show
that the prediction performance of models based on Penn Discourse Treebank
(PDTB) discourse senses (e.g., Contingency) scales with training data.
This further leads us to investigate inter-post discourse structures. We propose a pedagogically grounded discourse taxonomy and build an annotated corpus of student posts in instructor-intervened threads. We address the key issue of position bias, which affects instructors' decisions to intervene and thereby creates biased training samples. We propose a debiasing classifier to unlearn the bias and predict interventions. Finally, we investigate the context (the contiguous (sub)set of posts) that triggers intervention. Unsurprisingly, modelling context significantly affects prediction compared with predicting intervention on individual posts. We show that neural dense vector representations of threads, together with a model of a thread as a sequence of posts, significantly improve on the state of the art and move towards production-ready models. Our models were evaluated on a diverse corpus of 14 MOOCs from various subject areas.
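To make the idea of domain-agnostic discourse-sense features concrete, here is a minimal sketch that maps explicit connectives in a post to the four top-level PDTB senses (Temporal, Contingency, Comparison, Expansion). It then reports the share of matches falling under each sense, producing a vector that could feed any intervention classifier. The abstract does not say how senses are extracted. The toy lexicon `CONNECTIVE_SENSE` and the function `sense_features` are hypothetical illustrations, not the thesis's method. A real system would use a trained PDTB-style discourse parser, because many connectives are sense-ambiguous.

```python
import re
from collections import Counter

# The four top-level (Level-1) senses of the PDTB sense hierarchy.
SENSES = ("Temporal", "Contingency", "Comparison", "Expansion")

# Hypothetical toy lexicon: each explicit connective is assigned one Level-1 sense.
# Real connectives are often ambiguous (e.g. "since" can be Temporal or
# Contingency); a discourse parser would resolve the sense in context.
CONNECTIVE_SENSE = {
    "because": "Contingency",
    "so": "Contingency",
    "if": "Contingency",
    "but": "Comparison",
    "however": "Comparison",
    "although": "Comparison",
    "and": "Expansion",
    "also": "Expansion",
    "for example": "Expansion",
    "then": "Temporal",
    "after": "Temporal",
    "before": "Temporal",
}


def sense_features(post: str) -> list[float]:
    """Return, per Level-1 sense, the fraction of matched connectives in `post`."""
    text = post.lower()
    counts = Counter()
    for connective, sense in CONNECTIVE_SENSE.items():
        # Match whole words/phrases only, so "so" does not fire inside "also".
        pattern = r"\b" + re.escape(connective) + r"\b"
        counts[sense] += len(re.findall(pattern, text))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(SENSES)
    return [counts[s] / total for s in SENSES]


if __name__ == "__main__":
    post = "I tried the quiz again but it fails because the grader times out."
    # One Comparison ("but") and one Contingency ("because") connective.
    print(dict(zip(SENSES, sense_features(post))))
```

Because the features count connective senses rather than course vocabulary, the same vector space applies across subject areas. That property is the one the abstract attributes to discourse relations.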
Our predictive models can be integrated into an instructor dashboard that flags a discussion thread down to the granularity of an individual post. A corpus of interventions annotated according to a discourse taxonomy can serve to build classification models of intervention that prompt instructors and peers to intervene on their respective taxonomic types (e.g., extension vs. clarification).
Muthu Kumar Chandrasekaran is a Research Scientist at Amazon, Seattle, working on natural language understanding for Alexa on Amazon Devices. Previously he was a Scientist at SRI International's Artificial Intelligence Center. He completed his Ph.D. at the School of Computing, National University of Singapore. He is broadly interested in natural language processing, machine learning and their applications to information retrieval; specifically, in retrieving and organising information from conversation media such as discussion forums. He has been co-chairing the CL-SciSumm Shared Task series and the BIRNDL workshop series since 2014. He also reviews for the ACL, EMNLP, NAACL, CoNLL and JCDL conferences. During his Ph.D. he also interned with the Allen Institute for Artificial Intelligence's Semantic Scholar research team and at the National Institute of Informatics, Tokyo.