The abundance of text and dialogue online presents probabilistic methods with an opportunity to answer fundamental questions about user behavior and interactions. However, unlike standard machine learning tasks, inferences based on language are frequently interdependent, rely on heterogeneous observations, and require complex reasoning. Existing methods capture correlations, and even non-linearities, across relevant linguistic features, but fall short in handling the structure inherent in these problems. In this talk, I present research on probabilistic methods that address the challenges of modeling online dialogue by making interrelated inferences and fusing noisy domain knowledge with statistical signals. These approaches use probabilistic soft logic (PSL), a flexible modeling framework for structured data. I focus on modeling two important sources of text and dialogue: online debate forums and social media sites. I introduce several modeling templates that exploit useful structural patterns to make collective, consistent predictions, and I show the advantages of such structured models in fusing multiple language signals. I highlight state-of-the-art results in identifying stances in online debates and detecting indicators of alcoholism relapse from Twitter.
Dhanya Sridhar is a fifth-year Ph.D. student at the University of California, Santa Cruz. Her research interests in machine learning focus on developing probabilistic models for text, dialogue, and computational social science. Her publications include articles at the Annual Meeting of the Association for Computational Linguistics (ACL) and The Web Conference (WWW). Her eight invited talks include presentations at Microsoft Research NYC and Columbia University.