CIS 788K04
Foundations of Spoken Language Processing

Syllabus (pdf)
Readings

Presentations

Meeting time: TR 9:30-10:45 am
Meeting place: Dreese Labs 266 (first class), DL 698 thereafter
Instructor: Prof. Eric Fosler-Lussier
Office hours: T 10:45-12:00, R 1-2 pm

Recognizing connected digits
I have placed the tidigits corpus in /n/gold/15/c788ac/common/tidigits; this comes with a training and test data set, with both male and female speakers.

The files are all encoded with "shorten", an audio compression scheme. You can decode them with w_decode [more details needed here].

Each wave file, when decoded, has a 1024-byte header followed by 16-bit raw PCM data; you can remove the header with h_delete. HTK might be able to handle the headers; for Sonic, I think you need to delete them.
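
If h_delete is not handy, the header can also be stripped by hand. The following is only a minimal sketch, assuming the header is exactly the 1024 bytes mentioned above; the function name strip_header is made up for illustration, and the script does not interpret the samples at all (byte order is not checked).

```python
from pathlib import Path

HEADER_BYTES = 1024  # header size as stated in the text above


def strip_header(src, dst, header_bytes=HEADER_BYTES):
    """Copy src to dst without the leading header.

    Hypothetical helper: it only drops the first header_bytes bytes and
    leaves the 16-bit PCM samples untouched. Returns the number of bytes
    written.
    """
    data = Path(src).read_bytes()
    if len(data) < header_bytes:
        raise ValueError(f"{src} is shorter than the {header_bytes}-byte header")
    Path(dst).write_bytes(data[header_bytes:])
    return len(data) - header_bytes
```

Since the samples are 16 bits each, the returned byte count divided by two gives the number of samples, which is a quick sanity check on the decoded file.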

The wave file name encodes the digit string. For example, "4z12oa.wav" means "four zero one two oh". The final character ("a" here) distinguishes different versions of the same digit string.
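
You will probably want the transcripts in a form your toolkit can read. Here is a rough sketch of turning a file name into a word string. The function name is made up, the words for the digits not shown in the example (3 and 5 through 9) are assumed to be the obvious ones, and the last character of the name is assumed to always be the version letter.

```python
from pathlib import Path

# "z" and "o" follow the example in the text; the spelled-out words for
# the remaining digits are an assumption.
DIGIT_WORDS = {
    "z": "zero", "o": "oh",
    "1": "one", "2": "two", "3": "three",
    "4": "four", "5": "five", "6": "six",
    "7": "seven", "8": "eight", "9": "nine",
}


def transcript_from_filename(filename):
    """Map a name like '4z12oa.wav' to 'four zero one two oh'."""
    stem = Path(filename).stem   # e.g. "4z12oa"
    digits = stem[:-1]           # drop the trailing version character
    # An unexpected character raises KeyError, which flags odd file names.
    return " ".join(DIGIT_WORDS[ch] for ch in digits)


print(transcript_from_filename("4z12oa.wav"))  # four zero one two oh
```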

Your job is to train a system using the data in train, and test the system using the test data. Try training on a small subset of the wave files first, to make sure that things are working, before setting up a full training run.
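
One way to pick the small subset is to sample file names with a fixed seed, so that the same files come back on every run. This is just a sketch; the function name, the default seed, and how you collect the list of paths are all up to you.

```python
import random


def pick_subset(paths, n, seed=0):
    """Return n paths chosen at random, reproducibly.

    Hypothetical helper: sorting first makes the result independent of
    the order in which the paths were listed.
    """
    paths = sorted(paths)
    if n >= len(paths):
        return paths
    rng = random.Random(seed)
    return sorted(rng.sample(paths, n))
```

Sampling (rather than taking the first n files) avoids ending up with a subset drawn from only one or two speakers if the listing happens to be grouped by speaker.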

A second task might be to train gender-dependent models, and then choose the output of whichever model produces the best score.
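
The selection step might look something like the sketch below. It assumes each model reports a single score per utterance where higher is better (for instance a log-likelihood); check what your toolkit actually reports. The function name and the dictionary layout are made up for illustration.

```python
def pick_best(hypotheses):
    """Choose the recognition output with the highest score.

    hypotheses maps a model name (e.g. "male", "female") to a pair
    (recognized digit string, score). Assumes higher scores are better.
    Returns (winning model name, its recognized string).
    """
    if not hypotheses:
        raise ValueError("no hypotheses to choose from")
    best = max(hypotheses, key=lambda name: hypotheses[name][1])
    return best, hypotheses[best][0]
```

If your toolkit reports a cost instead of a likelihood, swap max for min.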

Depending on the toolkit, you might need extra tools to get things working. Feel free to stop by, make an appointment, or ask by email about any resource questions.


Eric Fosler-Lussier
Last modified: Wed Feb 4 17:25:19 EST 2004