In this talk, I'll discuss work from two strands of my research that produce apparently contrasting results. In the first, I've found that AI systems such as LLMs excel at imitating human social intelligence in interactions with real people, including persuasion, building trust, and passing as human in the Turing test. In the second, I've found that these same systems often fail catastrophically when evaluated under more tightly controlled conditions, such as theory of mind tasks and the development of common-ground conventions. I'll argue that these contrasting results are driven in part by human participants' natural cooperativity, which amplifies models' capabilities and conceals their shortcomings. More broadly, this suggests that people may elevate these systems to the status of real social agents even where the models lack the underlying capacity to perform these roles well.
Cameron Jones is an Assistant Professor of Psychology at Stony Brook University, where he directs the Cognition, Language, Interaction, and Computation (CLIC) Lab. His research sits at the intersection of cognitive science and NLP, using experimental methods to probe how LLMs compare to humans in social reasoning, and how people interact with these systems in practice. His work has shown that LLMs pass both classic false belief tasks and the Turing test, while also revealing systematic failures under more controlled evaluation conditions. Before joining Stony Brook, he completed his PhD in Cognitive Science at UC San Diego and worked in NLP in industry. His work has been published in Cognitive Science, TACL, and Computational Linguistics.