Homework 4, due MONDAY 13 February 2005 , 11:59 PM

Implement the Viterbi algorithm for isolated digits (i.e. the first part of the token passing algorithm, before you start passing tokens around). To start with, assume the topology from Problem 2 of Homework 1: each phone is represented by three states, with self-loop transition probability of 0.5 and exit probability of 0.5. Don't forget that each word will start and end with SIL (silence on either side).

To do this in MATLAB (or, frankly, anything else) you'll want to compute frame-by-frame scores for every state (using that maximum (or minimum) equation that we used in the token passing tutorial we did). You may want to take a look at dtw.m from hw2 for a code structure, but remember that there are a few differences between the Viterbi algorithm and dtw -- in this case you're comparing against distributions and there are probabilites of transitioning between states.

Your code can either take as input the the raw wave file (using HW2+HW3), the MFCCs (using HW3), or the phone network posteriors (like Gaussian outputs) that were the input to Homework 1. Your code should calculate the Viterbi score for each input, and then figure out the most likely one. If you have a lot of trouble, start by hand coding the models for one, two and three, and just compare among those (turning this in will give a max score of 95 instead of 100).

For those in *real* trouble, write out an algorithmic description of how you would accomplish this task. (Max score: 90).

Extra credit, 50 points max: Compute appropriate transition probabilities. Since we're only really using one distribution for three states, this is a bit weird, but let's try this (conceptually I think this works, but I haven't tried it): Since each state has the same distribution, we can assume that the duration of the phone is divided evenly over the three states. Thus, the average duration should be mean(phone_duration)/3. For every phone, calculate its mean duration using the labels from HW3. Then compute a self-loop probability that's proportional to the mean state duration. [Details of this are forthcoming; I'm giving you the overall idea here.] Alternatively, you can assume that the transition probability from state 1 to state 2 is 1, as is from state 2 to state 3. Then you just need to worry about self-loop probabilities on state 3 (which should be proportional to mean(phone_duration)-2).


Submission instructions:

Write up all of your answers to the questions in a text editor so that it can be submitted electronically (txt files preferred). Put that file as well as your fsm files (preferrably in separate subdirectories for each problem) in a directory called hw4, and use the submit command to send the files to the grader. The syntax of the submit command is:

submit c794aa lab4 hw4

Make sure that your writeup includes enough instructions that we will be able to run your fsms easily. That means to tell us what files are what.

Have fun!