Speech Recognition and Synthesis

Overview

We are interested in many areas at the intersection of sophisticated linguistic analysis and modern algorithms for speech recognition and synthesis. Recent work includes CRF-based acoustic models for speech recognition, prosody (predicting pitch accents from text and detecting pitch accents in speech), disfluencies, and linguistic error analysis. Earlier work focused on pronunciation modeling and on syntactically and semantically enriched language models.

People

Members:
- Dan Jurafsky
Alumni/Alumnae:
- Jason Brenier, now at Ernst & Young
- Sharon Goldwater, now Lecturer, Department of Informatics, University of Edinburgh
- Thad Hughes, now at Google
- Ani Nenkova, now Assistant Professor, Computer and Information Science, University of Pennsylvania
- Yun-Hsuan Sung, now at Google Research
- Jiahong Yuan, now Assistant Professor, Linguistics, University of Pennsylvania

Papers

Below is a selection of publications in speech recognition and synthesis.

Sharon Goldwater, Dan Jurafsky, and Christopher D. Manning. 2010. Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication 52, 181-200.
Yun-Hsuan Sung and Dan Jurafsky. 2009. Hidden Conditional Random Fields for Phone Recognition. ASRU 2009.
Yun-Hsuan Sung, Constantinos Boulis, and Dan Jurafsky. 2008. Maximum Conditional Likelihood Linear Regression and Maximum A Posteriori for Hidden Conditional Random Fields Speaker Adaptation. IEEE ICASSP 2008, 4293-4296.
Vivek Kumar Rangarajan Sridhar, Ani Nenkova, Shrikanth Narayanan and Dan Jurafsky. 2008. Detecting prominence in conversational speech: pitch accent, givenness and focus. In Proceedings of Speech Prosody, Campinas, Brazil, 453-456.
Yun-Hsuan Sung, Constantinos Boulis, Christopher Manning, and Dan Jurafsky. 2007. Regularization, Adaptation, and Non-Independent Features Improve Hidden Conditional Random Fields for Phone Classification. In IEEE ASRU 2007, 347-352.
Volker Strom, Ani Nenkova, Robert Clark, Yolanda Vazquez-Alvarez, Jason Brenier, Simon King, and Dan Jurafsky. 2007. Modelling Prominence and Emphasis Improves Unit-Selection Synthesis. Interspeech 2007.
Constance Clarke and Dan Jurafsky. 2006. Limitations of MLLR Adaptation with Spanish-Accented English: An Error Analysis. Proceedings of INTERSPEECH-2006, Pittsburgh, PA.
Jason Brenier, Ani Nenkova, Anubha Kothari, Laura Whitton, David Beaver, and Dan Jurafsky. 2006. The (Non)Utility of Linguistic Features for Predicting Prominence in Spontaneous Speech. IEEE/ACL 2006 Workshop on Spoken Language Technology, Aruba.