Computing Social Meaning

from Micro to Macro




Rob Voigt

Stanford Linguistics

robvoigt@stanford.edu

Computational Linguistics and Social Meaning


social media,
variation in production
real-world contexts, interaction
propositional, word- or sentence-level intent, stance, affect, pragmatics, discourse
  • Operationalize rich linguistic cues computationally
  • to study language in social context at scale
    • accumulation across populations and time
    • generalization and variation across contexts

Pushing Beyond Prediction


  • A "pure" computational approach is not enough
    • e.g., build a classifier to
      predict gender/race from language
  • Correlations \(\neq\) social meanings;
    meaning-making is an interpretive act
  • To understand why rather than simply what,
    we need to grapple with the micro!

Today's Talk


Study 1: Quantifying Large-scale Disparity

Micro \( \to \) Macro
human social
evaluations
computational modeling
of pragmatics

Study 2: Exploring the Meanings of a Variable

Macro \( \to \) Micro
computational annotation
of multimodal linguistic cues
quantitative-qualitative
interpretation

Study 1


Computational Understanding
of Police Officer Respect

Voigt et al. (2017), Proceedings of the National Academy of Sciences

with Nicholas P. Camp, Vinodkumar Prabhakaran, William L. Hamilton,
Rebecca C. Hetey, Camilla M. Griffiths, David Jurgens,
Dan Jurafsky, and Jennifer L. Eberhardt

Police-Community Interaction


  • Media focus on explosive incidents,
    research focus on outcomes
  • But what's happening on the ground?
  • 25% of adults interact with police during a year
    Eith and Durose (2011)

Procedural Justice


  • A person treated with respect:
    • has more trust in the officer’s fairness
    • and the fairness of the institution
    • and is more willing to support or
      cooperate with the police
      (Tyler 1990; Tyler and Huo 2002; Mazerolle et al. 2013)
  • Black community members report:
    • more negative experiences with the police
    • and being treated with less respect
      (Huo and Tyler 2001; Peffley and Hurwitz 2010; Epp et al. 2014)

Study 1: Questions



How is respect instantiated linguistically in the police-community interactional context?


Are black community members treated with less respect by officers than white community members are?

Previous Work on Procedural Fairness


  • citizens’ recollection of past interactions
    (Epp et al. 2014)
  • researcher observation of officer behavior
    (Mastrofski et al. 2009; Dai et al. 2011;
    Jonathan-Zamir et al. 2015; Mastrofski et al. 2016)
  • These methods are invaluable but indirect
  • ... and the presence of a researcher may influence police behavior
    (Mastrofski and Parks, 1990)

This Work: Footage as Data


  • Oakland PD officers have worn body cameras since 2010; the footage is usually used only as evidence
  • ... but, a window into everyday behavior!

Study 1: Data


  • Choose to focus on traffic stops,
    black and white community members
  • 981 stops by 245 officers in April 2014
  • 70% black drivers; 183 hours of footage
  • Professionally transcribed and diarized:
    36,738 officer utterances, 350k+ words

"Thin-slice" Human Social Evaluations


  • Formal, Friendly, Polite, Respectful, Impartial
  • 414 utterances,
    10 raters each
  • High consistency: Cronbach's \(\alpha\) 0.73-0.91
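
As a rough illustration of the consistency statistic above, the following sketch computes Cronbach's \(\alpha\) for a small ratings matrix (rows are utterances, columns are raters). The ratings are invented toy values, not data from the study, and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rows; each row holds one score per rater."""
    n_raters = len(ratings[0])
    # Sample variance of each rater's scores across utterances.
    rater_vars = [variance([row[j] for row in ratings]) for j in range(n_raters)]
    # Sample variance of the summed score each utterance received.
    total_var = variance([sum(row) for row in ratings])
    return (n_raters / (n_raters - 1)) * (1 - sum(rater_vars) / total_var)


# Toy data (assumption): 4 utterances, each rated by 3 raters on a 1-5 scale.
toy_ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
]
alpha = cronbach_alpha(toy_ratings)
print(f"alpha = {alpha:.2f}")
```

Values near 1 mean the raters order the utterances similarly, which is what makes averaging their judgments into one score per utterance defensible.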

The Latent Space of Respect



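PCA Loadings

| Component | Variance explained | Formal | Friendly | Polite | Respectful | Impartial |
| --- | --- | --- | --- | --- | --- | --- |
| Respect | 71% | 0.27 | 0.47 | 0.49 | 0.47 | 0.50 |
| Formality | 22% | 0.91 | -0.39 | -0.04 | 0.03 | -0.11 |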
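
To make the loadings above more concrete, here is a minimal sketch of how a first principal component and its share of variance could be obtained from per-utterance ratings. It assumes PCA on the covariance matrix and uses plain power iteration rather than a full eigendecomposition. The slides do not say which variant or software was used, and the ratings below are toy values.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def first_component(cov, iters=500):
    """Leading eigenvector (loadings) and its fraction of total variance.

    Power iteration from a uniform start vector; this assumes the leading
    component is not orthogonal to that start, which holds for positively
    correlated ratings like the toy data below.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
    eigenvalue = sum(v[a] * cov_v[a] for a in range(d))
    explained = eigenvalue / sum(cov[a][a] for a in range(d))
    return v, explained


# Toy data (assumption): mean ratings of 4 utterances on 3 dimensions.
toy = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
loadings, explained = first_component(covariance_matrix(toy))
print([round(x, 2) for x in loadings], round(explained, 2))
```

Projecting each utterance's centered ratings onto the loadings gives a single score per utterance, analogous to the Respect component in the table.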

Computational Modeling of Respect


    Use human judgments as training data
    for a machine-learned model of respect
    by operationalizing theories of politeness

  • Linguistic theories of politeness from pragmatics focus on requests
    • Requesting that you do something is face-threatening
      (Goffman 1967; Lakoff 1973; Culpepper 1976; Brown and Levinson 1978)

Features: Operationalizing Politeness


Negative Politeness

(hearer's freedom of action)

  • Minimize my request
  • Put the imposition on record (vs. ignore impact on you)

Positive Politeness

(hearer's self-image)

  • Emphasize your value
  • Emphasize my good relationship with you

apologizing, gratitude, reassurance ("it's okay"), hedges, etc

formal vs. informal titles ("sir" vs "bro"), introductions, mentioning safety, etc

Brown and Ford (1961), Brown and Levinson (1978), Culpepper (1976),
Pennebaker et al. (2007), Prabhakaran et al (2012), Danescu-Niculescu-Mizil et al (2013)

Computational Model


  • Linear regression predicting Respect
  • 32 hand-engineered linguistic features
    lexicons, regular expressions, dependency features,
    more complex functions (e.g., "bald commands")

  • log-transformed counts of features per utterance
  • stepwise removal of uninformative features
  • Reasonable \(R^2\): 0.258
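
The feature step above might look roughly like the sketch below: a handful of regular-expression features counted per utterance and log-transformed. The patterns are hypothetical stand-ins (the slides do not list the 32 actual features), and the `log(1 + count)` form is an assumption chosen so that zero counts stay defined. Fitting the regression itself is omitted.

```python
import math
import re

# Hypothetical mini-lexicons, for illustration only.
FEATURES = {
    "apology": re.compile(r"\b(sorry|apologi[sz]e)\b", re.I),
    "gratitude": re.compile(r"\bthank(s| you)\b", re.I),
    "formal_title": re.compile(r"\b(sir|ma'am)\b", re.I),
    "informal_title": re.compile(r"\b(bro|dude)\b", re.I),
    "please": re.compile(r"\bplease\b", re.I),
}


def featurize(utterance):
    """Map an utterance to log(1 + count) for each feature pattern."""
    return {
        name: math.log(1 + len(pattern.findall(utterance)))
        for name, pattern in FEATURES.items()
    }


features = featurize("Thank you very much, sir.")
print(features)
```

Each utterance's feature dictionary would then form one row of the design matrix for the linear regression predicting Respect.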

What does the model learn?


| Example | Score |
| --- | --- |
| Sorry to stop you. My name's Officer [name] with the Police Department. | 0.84 |
| There you go, ma'am. Drive safe, please. | 1.21 |
| It just says that, uh, you've fixed it. No problem. Thank you very much, sir. | 2.07 |
| Where are you guys coming from? Don't lie. | -0.57 |
| So let me see that registration stuff, bro. | -1.03 |

Results Across the Entire Dataset


  • Estimate respect for all 37k officer utterances
  • Hierarchical mixed-effects model
    • predict respect from contextual and social variables
    • controlling for officer-level and stop-level variation
  • Results - officers are more respectful:
    • with older community members
    • when a citation is issued
    • with white community members

Respect Through the Interaction


Controls


  • Only “everyday” interactions
    (no arrest, no search)
  • Crime rate in the area
  • Density of businesses in the area
  • Officer race
  • Officer years of experience
  • Severity of the reason for the stop

Replication(s)


  • Original study participants
    were Stanford undergraduates
  • Full replications:
    (Camp, Voigt, et al. in prep)
    • Same population, new stimuli
    • DMV participants, same stimuli

Study 1: Take-aways


Micro \( \to \) Macro
human social
evaluations
computational model
of respect
  • Computational:
    • scale up complex social evaluations
    • evidence for large-scale disparity
  • Linguistic:
    • battle-test politeness theory
    • characterize respect in unique domain
  • Impact:
    • confirm community reports
    • interpretable results and strategies

Study 1: Ongoing Work


    • Officer Prosody
    • Officer-Community Discourse Structure
    • Misunderstanding & Divergent Interpretations

Study 2


Multimodal Analysis of
Gesture and Embodiment

Voigt et al. (2016), Journal of Sociolinguistics

with Penelope Eckert, Robert J. Podesva, and Dan Jurafsky

Linguistic Approaches

to Gesture and Embodiment


  • Long-standing acknowledgement of the body's
    meaning-making capacity
    e.g., Hall's proxemics (1963, 1966); Birdwhistell's kinesics (1952, 1970)
  • "Two parts of one system"
    (McNeill 1992, 2008; Kendon 1995, 2004)
  • Largely observational and manual coding, e.g.:
    Mendoza‐Denton and Jannedy (2011), Loehr (2012),
    Mondada (2016), Pratt and D'Onofrio (2017)

Gestural Coding


  • Complex and non-standardized annotation schemes
  • Highly time and skill intensive
  • Result:
    Detailed analyses,
    but small sample sizes

Kipp et al.'s (2007) 3D pose
annotation scheme for hand position

Computational Annotation


Overall Body Movement
and Prosodic Engagement

Voigt, Podesva, and Jurafsky (2014)

Smiling
and Vowel Fronting

Podesva, Callier, Voigt, and Jurafsky (2015)

This Work: Head Cant


  • Head cant: side-to-side tilt of the head
  • Gendered folk ideology:

    "Women tilt their heads to the side in appeasement
    and as a playful or flirtatious gesture."
    Body Language For Dummies, Kuhnke (2012)


Goffman (1979) in advertising
Costa et al (2001) in art

Study 2: Questions



Is head cant really about gender?


Is it used differently across interactional contexts?


What communicative purposes can it serve?

Study 2: Data


YouTube
  • 32 women
  • Video blog
    monologues
  • first day of school, pregnancy, MCAT
Laboratory
  • 22 women, 11 men
  • Dialogues between friends
  • Discussion topics given in a rolodex

17,533 phrases from 18 hours of speech

Computational Extraction of Head Cant


  • Use a shape-fitting model to find key points on the face
    (Kazemi and Sullivan 2014)

Triangulate head cant from the corners of the eyes

Continuous measure at 30Hz
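
A minimal sketch of the geometry described above: given the pixel coordinates of two eye corners from a face-landmark model, the cant is the angle of the line between them relative to horizontal. The function names, the sign convention, and the use of mean absolute cant per phrase are assumptions for illustration, and the landmark detection step is not shown.

```python
import math


def head_cant_degrees(left_eye, right_eye):
    """Angle (degrees) of the eye-corner line relative to horizontal.

    Points are (x, y) pixel coordinates; 0 means a level head. The sign
    depends on the image coordinate convention (y usually grows downward).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def phrase_cant(cants, start_s, end_s, fps=30):
    """Mean absolute cant over the frames spanning one phrase, or None if empty."""
    window = cants[int(start_s * fps):int(end_s * fps)]
    if not window:
        return None
    return sum(abs(c) for c in window) / len(window)


# Toy example (assumption): one second level-ish, one second tilted.
frame_cants = [head_cant_degrees((100, 200), (160, 200))] * 30
frame_cants += [head_cant_degrees((100, 200), (160, 220))] * 30
print(phrase_cant(frame_cants, 1.0, 2.0))
```

Aggregating the 30 Hz signal per phrase is what lets the continuous measure be lined up with phrase-level prosodic and lexical annotations.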

Distribution by context and gender


  • More cant in dialogues
  • More cant when not speaking
  • Men use it more!

Cant and Prosody


Related to prosodic engagement? (e.g. Jeon et al 2010; Schuller et al 2010; Wang and Hirschberg 2011)


F0

Intensity

Quantitatively-guided
Qualitative Analysis


  • Extract phrases which match the trend,
    e.g., high cant + high F0 + low intensity
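
One way to pull out candidate phrases for close qualitative reading is to standardize each measure and keep phrases that are extreme in the desired directions. The sketch below assumes z-scores with a cutoff of 1.0 and hypothetical field names. The slides do not specify how "high" and "low" were defined.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def select_phrases(phrases, high=("cant", "f0"), low=("intensity",), threshold=1.0):
    """Indices of phrases that are high on all `high` keys and low on all `low` keys."""
    z = {key: zscores([p[key] for p in phrases]) for key in high + low}
    return [
        i for i in range(len(phrases))
        if all(z[k][i] > threshold for k in high)
        and all(z[k][i] < -threshold for k in low)
    ]


# Toy phrase-level measurements (assumption): only the last one fits the trend.
toy_phrases = [
    {"cant": 2, "f0": 180, "intensity": 65},
    {"cant": 3, "f0": 190, "intensity": 66},
    {"cant": 2, "f0": 185, "intensity": 64},
    {"cant": 3, "f0": 195, "intensity": 65},
    {"cant": 15, "f0": 250, "intensity": 50},
]
print(select_phrases(toy_phrases))
```

The selected indices point back to video segments, which are then watched and interpreted by hand.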

Framing of Shared Understanding


Joint variation:
high cant, high F0, increased
speech rate

Confirmation: Discourse Particles


  • you know and I mean:
    shared understanding, "other orientation"
    Schiffrin (1987)
  • um, uh, and like:
    fillers, hesitation, holding the floor
    Clark and Fox Tree (2002); Andersen (1998)
  • mm, mmhm, and yeah:
    acknowledgement, engagement, taking the floor
    Jefferson (1984); Lambertz (2011)


Confirmation: Discourse Particles


Across 9,038 transcribed phrases in the Lab data:

(log odds ratios for co-occurrence of cant with discourse particles)

you know, I mean
um, uh, like
mm, mmhm, yeah
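
For reference, a log odds ratio of this kind can be computed from a 2x2 table of (high cant or not) by (particle present or not). The sketch below assumes cant is binarized at a threshold and adds 0.5 to each cell to avoid division by zero. Both choices, and the toy phrases, are assumptions rather than the study's procedure.

```python
import math
import re


def log_odds_ratio(phrases, particles, cant_threshold):
    """Log odds ratio for co-occurrence of high cant with any listed particle.

    `phrases` is a list of (text, cant) pairs. Positive values mean the
    particles appear more often in high-cant phrases than in other phrases.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in particles) + r")\b", re.I
    )
    # Cells keyed by (high_cant, has_particle), each smoothed by 0.5.
    counts = {(hc, hp): 0.5 for hc in (True, False) for hp in (True, False)}
    for text, cant in phrases:
        counts[(cant >= cant_threshold, bool(pattern.search(text)))] += 1
    return math.log(
        (counts[True, True] * counts[False, False])
        / (counts[True, False] * counts[False, True])
    )


# Toy phrases (assumption): (transcript, mean absolute cant in degrees).
toy = [
    ("you know it was fine", 12.0),
    ("I mean yeah", 9.0),
    ("um so then", 1.0),
    ("we left early", 2.0),
]
print(log_odds_ratio(toy, ["you know", "I mean"], cant_threshold=5.0))
```

Running the same function once per particle group gives one value per group, which is the comparison the figure reports.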

Study 2: Take-aways


Macro \( \to \) Micro
computational annotation;
high-level correlations
quantitative-qualitative;
statistical confirmation
  • Computational:
    • utility of computational annotation
    • multimodality across contexts
  • Linguistic:
    • identified communicative uses of cant
    • not just about gender!
      (perhaps gendered with specific discourse behaviors)
  • Impact: TBD!

Discussion


and what's to come

Micro and Macro


  • Expanding the scope of computational linguistics
  • Central problem of interpretation
  • Underexplored - much to learn at both scales!

Computational Fieldwork?


  • Looking beyond the standard web sources
    e.g. RtGender, Voigt et al (2018)
  • Laboratory and experimental data as corpora
  • Long-term relationships with real-world actors
    • can be very time-consuming: fingerprinting, background checks, multi-year process
    • can be very messy: real-world data
      doesn't have an API

Computation and Impact: Policing


  • Provides concrete strategies for officers
  • Cooperation with Oakland to integrate results into procedural justice training
  • ... and we can measure impact

Procedural Justice Elsewhere


  • Language in AZ Eviction Courts
    with Daniel Bernal et al, Stanford Law
  • RCT with informational mailers
  • Linguistic questions:
    • Does more information help tenants be more assertive?
    • Are judges more dismissive of L2 English speakers?

Impact and Meaning in Society


  • Many social problems are problems of meaning

    How do we interpret one another's behavior?
    ... recognize one another's intentions?
    ... validate one another's personhood?

    How do those interpretations accumulate?

  • Linguists are uniquely positioned to contribute!

The common misconception is that language use has primarily to do with words and what they mean. It doesn't.
It has primarily to do with people and what they mean.
Clark and Schober (1992)




Questions?

Extra Slides

Study 1

Human Judgment Results


What does the model learn?


  • White community members are
    57% more likely to hear an officer say one of the
    top 10% most respectful utterances in our dataset
  • Black community members are
    61% more likely to hear an officer say one of the
    top 10% least respectful utterances in our dataset

Mixed-effects model outputs


Controls: Officer Race



Controls: Severity


  • We asked officers to rate the severity of each stop
  • Black community members are stopped for less severe offenses...
    no impact on Respect

Ongoing work:

Officer Prosody


  • Social evaluations on 10-15 second tiles
  • Content filtered
  • Evidence for analogous disparity

Ongoing work:

Officer Prosody


Primarily in the tails of the distribution

Prosodically, more respectful with
high F0 variance but low F0 mean

Ongoing work:

Community Member Role


Community member language is
highly contingent upon the officer

Community member swear/anger word usage predicted by officer respect

Ongoing work:

Discourse Structure and Intent


Ontology of
speech acts,
e.g. Responses
to Fault:

Admit, Reject, Justify, Deny Awareness, Request Leniency, Commit to Remedy


Ongoing work:

Officer Perception


0:00:08 0:00:09 OFFICER: Do you know why I pulled you over today?
0:00:10 0:00:11 FEMALE: [unintelligible]
0:00:11 0:00:14 OFFICER: All right, you're missing a front plate on your, your vehicle.
0:00:14 0:00:15 FEMALE: It's the church van.
0:00:16 0:00:16 OFFICER: It's a church van?
0:00:16 0:00:19 FEMALE: It's the [unintelligible], it's around the corner from my church.

  • Ask both community members and police officers:
    • How respectful was the officer? The driver?
    • How anxious was the officer? The driver?
    • How legitimate was the officer's request?
    • How cooperative was the driver?

Extra Slides

Study 2

Iconic Uses




  • High cant,
    low intensity,
    co-occurring gestures
  • More common in vlogs

Framing of Shared Understanding


  • Joint variation:
    high cant, increased speech rate, creak
  • Not the same as common ground!

Conversational Acknowledgements


Extra Slides

Other Projects

Discourse Referentiality

Voigt and Jurafsky (2015)

Word vectors to compute discourse contexts

Word Segmentation

Wang, Voigt, and Manning (2015)




Combining character- and word-level information

Bilingual Prosody

Voigt, Sumner, and Jurafsky (2016)

German-French and German-Italian 2L1

Gender-mediated differences in baseline pitch from both linguistic and cultural context


and Bidialectal

Voigt and Hilton (in prep)


Dutch-Frisian

"Citizen science" data from a smartphone app

Immigration and Oral History


  • with Ran Abramitzky, Leah Boustan, Dylan Connor, and Peter Catron
Paul Deutsch (AoA: 4.9)
Born: 1902, Arrived: 1907
I’ll tell you why. My father went away from the army. The Russian Army with the Japanese Army was fighting at that time. He was a soldier in the Russian Army and he didn’t want to stay there, and he came over here in 1905, my father. Then after a couple, two years more, so he took my mother and three boys up, you understand, three brothers.

Mory Helzner (AoA: 6.1)
Born: 1914, Arrived: 1922
And, of course, at that time the Revolution was brewing. I was born in 1914. I think it’s important that I indicate the date, March 22, 1914. And it was prior to the Russian Revolution and things were becoming very hectic. And, and all of a sudden the Revolution comes, in 1917, and we’re all in a state of upheaval, a terrible hunger ensured that thousands of people were just dying like flies. And I could witness all this. How we survived is still considered a miracle by me. But fortunately we did.

Corpus-building: RtGender


New dataset! Responses to Gender

Voigt, Jurgens, Prabhakaran, Jurafsky, and Tsvetkov (2018)

  • Facebook (Politicians): 14M Responses to Facebook posts from members of the U.S. House and Senate
  • Facebook (Public Figures): 11M Responses to Facebook posts from other public figures, e.g., television hosts, journalists, and athletes
  • TED: 200K Responses to presentations from TED speakers
  • Fitocracy: 200K Responses to posts about fitness progress
  • Reddit: 1M Responses to Reddit comments across a variety of subreddits

RtGender:

Relevance and Sentiment


15,000 annotations

"broadcast" settings behave differently than "personal"

Dear Whoever


  • Dear Keyboard Warriors, GO VOTE! That ballot counts more than your most RTed tweet. Love, America!
  • Dear Taco Bell, stop being so damn affordable when I’m drunk at night. Sincerely Lexi’s wallet
  • Dear Self, One day, your dreams will come true. Fighting! 😊
  • Dear annoying chick blasting your music in the architecture studio, I'm going to buy you some headphones.
  • dear school, you sucks. you made me know myself better, but unfortunately not in a good way.

Dear Whoever


Interesting temporal dynamics!