The Stanford Natural Language Processing Group

This talk is part of the NLP Seminar Series.

Sign-up here: https://forms.gle/Gq5fsbeEU2UE8y8W6

The State of Prompt Hacking

Sander Schulhoff, University of Maryland & Learn Prompting
Date: 11:00am - 12:00pm, Sep 5th 2024
Venue: Room 287, Gates Computer Science Building

Abstract

We ran the largest-ever global prompt hacking competition, which 13 leading AI companies sponsored. We elicited 600K+ adversarial prompts from 3,000+ competitors from 50+ countries. We examined the following questions in our paper, HackAPrompt, which won Best Theme Paper at EMNLP: What are the implications of prompt injection and jailbreaking (collectively, prompt hacking)? What defenses work against them? Which don’t? What is the difference between them? How can we study them at scale? Do attacks transfer across models? Which attacks work best?

Bio

Sander Schulhoff is a researcher from the University of Maryland and the CEO of LearnPrompting.org, the first guide on prompt engineering. His background is in NLP/DRL research, and his current work focuses on examining prompt hacking and prompt engineering through large-scale studies.