15-492/18-492:
Speech Processing

Description: Speech Processing offers a practical and theoretical understanding of how human speech can be processed by computers. It covers speech recognition, speech synthesis and spoken dialog systems. The course involves practicals in which students build working speech recognition systems, their own synthetic voice and a complete telephone spoken dialog system. This work will be based on existing toolkits. Details of the algorithms, techniques and limitations of state-of-the-art speech systems will also be presented. This course is designed for students wishing to understand how to process real data for real applications, applying statistical and machine learning techniques as well as working with the limitations of the technology.
Instructor(s): Alan W Black
Teaching Assistant: Nguyen Bach
Prerequisites: 15-211 for SCS undergraduates; exemption from this requirement requires the instructor's permission.
Availability: Open to juniors and seniors in the SCS undergraduate program and the ECE undergraduate program. Open to other students with the consent of an instructor.
Materials: The required text for the course is "Spoken Language Processing" by Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, Prentice Hall (ISBN 0-13-022616-5). This book will be used for reading assignments and as background reading for homeworks and exams.

Homework: Homework consists of two components: occasional brief reading assignments and four programming projects (Speech Recognition, Speech Synthesis, Spoken Dialog Systems, and one other).
Grading: 10% class participation, 60% programming projects, 10% readings homework, 20% final.
Course policies: Late homework, Cheating
Time: MWF 3:30-4:20
Location: DH 1117
Final exam: TBA
Syllabus: Reading assignment set (in slides)
Date Topic Slides
Aug 24th Course Overview slides
Aug 26th Human Speech slides
Aug 28th Computer Speech slides
Aug 31st TTS: Text analysis slides
Sep 2nd TTS: Pronunciation slides
Sep 4th no class
Sep 7th Labor Day: no class
Sep 9th TTS: Prosody slides
Sep 11th TTS: Waveform 1 slides
Sep 14th TTS: Waveform 2 slides
Sep 16th TTS: Building voices slides
Sep 18th TTS: Evaluation slides
Homework1 due before class Friday Oct 2nd
Sep 21st TTS: Signal Processing slides
Sep 23rd TTS: Voice conversion slides
Sep 25th TTS: Talking Heads and Singing slides
Sep 28th ASR: Template Matching and Signal Processing slides (two sets)
Sep 30th ASR: HMMs slides
Oct 2nd No lecture (HW1 due)
Oct 5th ASR: Acoustic models slides
Reading assignment due today or Wednesday
Oct 7th ASR: Language models slides
Reading assignment due today
Oct 9th ASR: Language models 2 slides
Homework2 due before class Friday Oct 23rd
Oct 12th ASR: Systems slides
Oct 14th Multilingual Speech Processing slides
Oct 16th Mid-semester -- no class
Oct 19th SPICE slides
Oct 21st Spoken Dialog Systems: Intro slides
Oct 23rd Spoken Dialog Systems: Components slides
Oct 26th Spoken Dialog Systems: VoiceXML slides
Oct 28th Spoken Dialog Systems: Beyond simple dialogs; Olympus intro slides (two sets)
Oct 30th Spoken Dialog Systems: Olympus Details slides
Homework3 due before class Friday Nov 13th
Nov 2nd Spoken Dialog Systems: Olympus Details slides
Nov 6th Speech to Speech Translation slides
Nov 9th Speech to Speech Translation slides
Nov 9th Speech to Speech Translation: Demo no slides
Nov 13th Speaker ID
Homework4 due before class Wednesday Dec 2nd