The Stanford Natural Language Processing Group

Elizabeth Shriberg
SRI & ICSI Berkeley

Adventures in Prosody Modeling for Speech Processing

Abstract

This talk describes a "direct modeling" approach for incorporating prosody (the rhythm and melody of speech) into the automatic processing of spontaneous speech. In contrast to methods that train models from hand-labeled prosodic annotations, our approach is fully automatic and requires no human annotation of prosody. I'll first provide an overview of methods for feature processing, machine learning techniques for predicting target classes from prosodic features, and approaches for combining prosodic models with information from statistical language modeling. After presenting the general approach, I'll discuss a range of speech processing problems to which this framework has been successfully applied. These include automatic punctuation detection, disfluency modeling, dialog act segmentation and classification, emotion recognition, and speaker recognition. Data come from a range of corpora of spontaneous speech, including human-computer dialog, telephone conversations, and multi-party meetings.