LSA 352
Speech Recognition and Synthesis
LSA Summer Institute 2007

COURSE INFORMATION
Instructor
Dan Jurafsky, jurafsky@stanford.edu
Office: Margaret Jacks Hall (Bldg 460), Room 113
Time
Mondays and Thursdays, 3:45-5:30 PM
Location
Bldg 160 Room 319
Textbook

Description
Introduction to automatic speech recognition and speech synthesis. In speech recognition we will learn key algorithms in the noisy channel paradigm, focusing on the standard 3-state Hidden Markov Model (HMM), including the Viterbi decoding algorithm and the Baum-Welch training algorithm. We will also learn about representations of the acoustic signal such as mel-frequency cepstral coefficients (MFCCs), and the use of Gaussian Mixture Models (GMMs) and context-dependent triphones for acoustic modeling. Finally, we will cover N-gram language modeling and perplexity. In speech synthesis we will focus on concatenative synthesis, covering text normalization, grapheme-to-phoneme conversion, prosodic modeling, and waveform synthesis. We will also give a brief overview of other speech processing tasks, such as speaker and language identification and the use of forced alignment for automatic phonetic labeling. The course will involve lectures and programming assignments.
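
Since Viterbi decoding is the centerpiece of the ASR half of the course, a minimal sketch of the algorithm on a toy discrete-observation HMM may help fix the idea in advance. The three states and all probabilities below are illustrative placeholders, not values from the course materials:

    import math

    # Hypothetical 3-state HMM over a 3-symbol observation alphabet.
    states = ["s1", "s2", "s3"]
    start_p = {"s1": 0.6, "s2": 0.3, "s3": 0.1}
    trans_p = {
        "s1": {"s1": 0.7, "s2": 0.2, "s3": 0.1},
        "s2": {"s1": 0.3, "s2": 0.5, "s3": 0.2},
        "s3": {"s1": 0.2, "s2": 0.3, "s3": 0.5},
    }
    emit_p = {
        "s1": {"a": 0.6, "b": 0.3, "c": 0.1},
        "s2": {"a": 0.1, "b": 0.7, "c": 0.2},
        "s3": {"a": 0.2, "b": 0.2, "c": 0.6},
    }

    def viterbi(obs):
        """Return (log probability, state path) of the best path for obs."""
        # v[t][s]: log probability of the best path ending in state s at time t.
        v = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
        backptr = [{}]
        for t in range(1, len(obs)):
            v.append({})
            backptr.append({})
            for s in states:
                # Best predecessor for state s at time t.
                prev = max(states, key=lambda p: v[t - 1][p] + math.log(trans_p[p][s]))
                v[t][s] = (v[t - 1][prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][obs[t]]))
                backptr[t][s] = prev
        # Trace the back pointers from the best final state.
        best = max(states, key=lambda s: v[-1][s])
        path = [best]
        for t in range(len(obs) - 1, 0, -1):
            path.insert(0, backptr[t][path[0]])
        return v[-1][best], path

    print(viterbi(["a", "b", "b", "c"]))

The same dynamic-programming recurrence, run in log space to avoid underflow, is what scales up to the phone-level HMMs used in real recognizers.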

Prerequisites: Strictly Required: programming ability, a class in phonetics, and some probability theory (the probability theory can be acquired from the presession Math Refresher course). Recommended: any basic introduction to computational linguistics or to artificial intelligence.

Required Presession Courses: Mathematics Refresher for Computational Linguistics or equivalent
Required Work

  • Homeworks: three homework assignments. You may work together, and for Homework 3 you may use any programming language you want.

  • Readings: To be read before the class period in which they are discussed. THERE IS A LOT OF READING IN THIS COURSE! We are covering what are really two entire fields (speech recognition and speech synthesis) in 7 lectures, and not everything in the readings can be covered in lecture, so you need to do all the reading.

  • Determination of final grade:
    • 75%: 3 homeworks (25% each)
    • 25%: class participation


SCHEDULE

Thu July 6
  Slides: Lec 1 (ppt), Lec 1 (6-up pdf)
  Topic: Overview of Course, Intro to Probability Theory, and ASR Background: N-gram Language Modeling

Mon July 9 (HW 1 due)
  Slides: Lec 2 (ppt), Lec 2 (6-up pdf)
  Topic: TTS: Background (part-of-speech tagging, machine learning, classification, NLP) and Text Normalization

Thu July 12
  Slides: Lec 3 (ppt), Lec 3 (6-up pdf)
  Topic: TTS: Grapheme-to-Phoneme Conversion, Prosody (Intonation, Boundaries, and Duration), and the Festival software

Mon July 16
  Slides: Lec 4 (ppt), Lec 4 (6-up pdf)
  Topic: TTS: Waveform Synthesis (Diphone and Unit Selection Synthesis)

Thu July 19 (HW 2 due)
  Slides: Lec 5 (ppt), Lec 5 (6-up pdf)
  Topic: ASR: Noisy Channel Model, Bayes, HMMs, Forward, Viterbi

Mon July 23
  Slides: Lec 6 (ppt), Lec 6 (6-up pdf)
  Topic: ASR: Feature Extraction and Acoustic Modeling, Evaluation

Thu July 26 (HW 3 due)
  Slides: Lec 7 (ppt), Lec 7 (6-up pdf)
  Topic: ASR: Learning (Baum-Welch) and Disfluencies
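
To give a concrete flavor of the N-gram language modeling and perplexity material from Lecture 1, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing evaluated on a held-out sentence. The toy corpus and the <s>/</s> boundary markers are illustrative assumptions, not course data:

    import math
    from collections import Counter

    corpus = [
        "<s> i want chinese food </s>".split(),
        "<s> i want english food </s>".split(),
        "<s> i like food </s>".split(),
    ]

    unigrams = Counter(w for sent in corpus for w in sent)
    bigrams = Counter((sent[i], sent[i + 1])
                      for sent in corpus for i in range(len(sent) - 1))
    V = len(unigrams)  # vocabulary size, used for add-one smoothing

    def bigram_prob(prev, word):
        # Add-one smoothing: (c(prev, word) + 1) / (c(prev) + V).
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

    def perplexity(sentence):
        # PP(W) = exp(-(1/N) * sum_i log P(w_i | w_{i-1})).
        log_prob = sum(math.log(bigram_prob(sentence[i], sentence[i + 1]))
                       for i in range(len(sentence) - 1))
        return math.exp(-log_prob / (len(sentence) - 1))

    print(perplexity("<s> i want food </s>".split()))

Lower perplexity means the model is less surprised by the held-out sentence.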