|
COURSE INFORMATION
|
|
Instructor |
Dan Jurafsky, jurafsky@stanford.edu |
Time |
Mondays and Thursdays 3:45-5:30 PM |
Location |
Bldg 160 Room 319 |
Textbook |
|
Description |
Introduction to automatic speech recognition and speech synthesis. In speech recognition we will learn key algorithms in the noisy channel paradigm, focusing on the standard 3-state Hidden Markov Model (HMM), including the Viterbi decoding algorithm and the Baum-Welch training algorithm. We will also learn about representations of the acoustic signal such as mel-frequency cepstral coefficients (MFCCs), and the use of Gaussian Mixture Models (GMMs) and context-dependent triphones for acoustic modeling. Finally, we will cover N-gram language modeling and perplexity. In speech synthesis we will focus on concatenative synthesis, covering text normalization, grapheme-to-phoneme conversion, prosodic modeling, and waveform synthesis. We will also give a brief overview of other speech processing tasks, such as speaker and language identification and the use of forced alignment for automatic phonetic labeling. The course will involve lectures and programming homework assignments. |
Required Work |
|
SCHEDULE
|
Date | HW | Lec | Topic and Readings |
Thu July 6 | | Lec 1 (ppt) Lec 1 (6-up pdf) | Overview of Course, Intro to Probability Theory, and ASR Background: N-gram Language Modeling |
Mon July 9 | HW 1 due | Lec 2 (ppt) Lec 2 (6-up pdf) | TTS: Background (part of speech tagging, machine learning, classification, NLP) and Text Normalization |
Thu July 12 | | Lec 3 (ppt) Lec 3 (6-up pdf) | TTS: Grapheme-to-phoneme, Prosody (Intonation, Boundaries, and Duration) and the Festival software |
Mon July 16 | | Lec 4 (ppt) Lec 4 (6-up pdf) | TTS: Waveform Synthesis (Diphone and Unit Selection Synthesis) |
Thu July 19 | HW 2 due | Lec 5 (ppt) Lec 5 (6-up pdf) | ASR: Noisy Channel Model, Bayes, HMMs, Forward, Viterbi |
Mon July 23 | | Lec 6 (ppt) Lec 6 (6-up pdf) | ASR: Feature Extraction and Acoustic Modeling, Evaluation |
Thu July 26 | HW 3 due | Lec 7 (ppt) Lec 7 (6-up pdf) | ASR: Learning (Baum-Welch) and Disfluencies |