CSE 794L: Foundations of Spoken Language Processing
TR 2-3:18, 304 Journalism
Instructor: Eric Fosler-Lussier, fosler at cse
Description
Fundamentals of automatic speech recognition and speech synthesis; lab projects concentrating on building systems to process speech.
Level, Credits, Class Time Distribution, Prerequisites
Level | Credits | Class Time Distribution | Prerequisites
UG | 3 | two 1.5-hr classes | 625 or Ling 484.01; 630; 730 or Stat 428
Quarters Offered
General Information, Exclusions, Cross-listings, etc.
- Class time is divided between lectures and group practicals.
Intended Learning Outcomes
- Master fundamental concepts in automatic speech recognition, such as hidden Markov models, acoustic modeling, and language modeling.
- Master fundamental concepts in text-to-speech synthesis, such as concatenative synthesis and text analysis.
- Be familiar with a finite state framework integrating all of speech processing.
- Be familiar with methods of constructing speech recognition and synthesis systems.
- Be exposed to current speech processing research.
- Be exposed to toolkits for both speech recognition and speech synthesis.
Texts and Other Course Materials
- Spoken Language Processing: A Guide to Theory, Algorithm, and System Development - X. Huang, A. Acero, and H.-W. Hon (Recommended)
- Speech Synthesis and Recognition, 2nd edition - J. Holmes and W. Holmes (Supplementary)
Topics
Number of Hours | Topic
3 | Human hearing, acoustics, and phonetics
3 | Finite state transducers
2 | ASR toolkits
3 | Dynamic time warping and acoustic modeling
4 | HMMs, expectation-maximization, and search
3 | Language modeling
3 | Text analysis
2 | Speech synthesis
1 | Speech processing in context (systems)
1 | Speaker recognition
1 | Quizzes
4 | Project presentations
Representative Lab Assignments
- Exercises using the AT&T FSM toolkit, building pronunciation models for an FSM recognizer
- Train an acoustic model for the FSM recognizer
- Rescore word hypothesis lattice using different language models
- Text normalization using FSMs
Grades
Homeworks | 40%
Final Project | 30%
Exams (2 x 10%) | 20%
Participation | 10%
Expectations
This course will be a hybrid lecture/seminar course. Each week we
will explore a new topic in spoken language processing. On Tuesdays,
I will lecture to give everyone a grounding in the topic. Most
Thursdays will be reserved for hands-on group work. You must do the readings before class, especially on
Thursdays, since you will be expected to participate in the group
discussions. The schedule of readings will be kept
on the calendar.
The Thursday hands-on assignments will usually require people to bring
in laptops (one per group); let me know if you can bring a laptop to
class, and if so, what OS it's running.
There will be approximately 6 labs, each carrying equal weight;
however, you are only required to do four of them. (This is to give a
bit of flexibility in terms of scheduling your work.) Labs are to be
done individually or in teams of 2, although you may not have the same
partner for more than one lab.
There will be two 30-minute quizzes, each worth 10% of the final
grade. These are mainly to ensure that you're keeping up with
material. There will be no final exam.
You will be required to participate in a final group project involving
spoken language processing. You will be required to:
- research some area of spoken language processing not extensively covered in class,
- implement some algorithm (either existing or novel) in this area,
- evaluate the performance of the algorithm,
- write a final report putting the work you have done into the appropriate research context (i.e., summarizing the three points above), and
- give a 20-minute oral presentation on the work in the final week of class.
Groups will be composed of 2-3 students; the level of expertise in the
group will define my expectations (for example, a group of 1 grad and
1 undergrad would be expected to do less work than 3 grad
students). You are encouraged (but not required) to form groups with
people of different backgrounds. We will begin forming groups sometime
in week 3, and then I will meet with each group to try to help define
projects. Individual projects will only be permitted after
serious, significant consultation with the instructor.
You have been (or will be) given access to the departmental Linux
servers for this class (numbered lcc03e - lcc11e). These are
relatively slow Linux servers, but you are the only people using them,
so you can run jobs without the CPU limitations that apply on stdsun.
Policy on Academic Misconduct
As with any class at this university, you are required to follow the
Ohio State "Code of Student Conduct." If you are unfamiliar with this
policy, you should read it at http://oaa.osu.edu/coam/code.html.
In particular, you should note that you are not allowed to, among
other things, (a) knowingly provide or receive information during
exams, (b) knowingly provide or receive assistance on homeworks unless
I say it's OK, and (c) submit plagiarized (copied but unacknowledged)
work for credit. If any violation occurs, I am required to
report the violation to the Council on Academic Misconduct.