Homework 2, due 29 January 2006, 11:59 PM
This homework is about frequency analysis. You have three options for
this homework. Problem 1 is worth 95 points, problem 2 is worth 100
points, and doing both problems 1 & 2 is worth 150 points total.
Problem 1
Implement the Radio Rex detector as outlined in the practical that we
did in class on Thursday, 1/19. Explore the design space of potential
filters on the log spectral energy to see if you can come up with a
good detector of the /eh/ in Rex (i.e., by using energy around 500
Hz). Demonstrate the efficacy of different choices on the 10 examples
provided.
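Below is one possible starting point for Problem 1. It is sketched in Python rather than the MATLAB used in class, and it is not the in-class practical itself. It computes the DFT energy of each 25 ms frame in a band around 500 Hz, takes the log, smooths the resulting track with a moving-average filter, and flags an utterance as "Rex" if the smoothed value crosses a threshold. The function names, the 400-600 Hz band edges, the 5-frame smoothing length, and the threshold of 5.0 are all illustrative assumptions. Exploring alternatives to exactly these choices is the point of the problem.

```python
import math

def band_log_energy(frame, fs, lo_hz, hi_hz):
    """Log of the DFT energy in bins whose frequency lies in [lo_hz, hi_hz]."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        if lo_hz <= k * fs / n <= hi_hz:
            # Direct DFT of bin k only: cheaper than a full transform for a few bins
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
            im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
            energy += re * re + im * im
    return math.log(energy + 1e-10)  # small floor avoids log(0) on silence

def detect_rex(signal, fs=16000, win=400, hop=160, lo_hz=400.0, hi_hz=600.0,
               smooth=5, threshold=5.0):
    """Return (smoothed log-energy track, detected?) for one utterance."""
    track = [band_log_energy(signal[i:i + win], fs, lo_hz, hi_hz)
             for i in range(0, len(signal) - win + 1, hop)]
    # Causal moving average over the last `smooth` frames: one candidate filter among many
    smoothed = []
    for i in range(len(track)):
        recent = track[max(0, i - smooth + 1):i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed, any(v > threshold for v in smoothed)
```

To try a different filter, such as a median or a longer window, you only need to change the smoothing loop. The threshold depends on how the signal is scaled, so tune it on the 10 examples.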
Problem 2
Implement MFCCs (mel-frequency cepstral coefficients). This is very
similar to the spectrogram-building exercise you did in class, with
just a few more steps; it's pretty straightforward in MATLAB. You can
practice on the Rex data first. I'm only going to give more detail on
the steps you might not be familiar with.
NOTE: there are MATLAB MFCC implementations out there, which you can
look at for guidance, but I want you to turn in your own code.
Here are the basic steps:
- Read in the wave file. This should be sampled at 16000 Hz.
- First, divide the utterance into overlapping frames. The window size should be 25 msec and the frame step size should be 10 msec. This translates to a window of 400 samples with a shift of 160 samples (why?).
- Then apply a Hamming window to each frame.
- Next, take the FFT of each windowed frame.
- Now comes the tricky part: the mel frequency binning. You can design the binning filters offline, since they don't change from frame to frame. Here's the main idea. Take a look at the filters in slide 39 (which is the same as figure 6.28 in HAH). You'll see that the filters Hm are defined in terms of points f[m] along the frequency scale.
- First you'll need a function that computes the points f[m], which are defined in terms of B and Binv (the inverse of B). Equations 6.142, 2.6, and 6.143 are repeated here:
- B(f)=1125 ln (1+f/700)
- Binv(b)= 700(exp(b/1125)-1)
- f[m]=(N/Fs) Binv(B(fl)+m((B(fh)-B(fl))/(M+1)))
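OK, that leaves a lot of terms undefined. fl and fh are the lower and upper frequencies of the filterbank, N is the number of points in the FFT, Fs is the sampling frequency, and M is the number of filters. The main idea is that you evenly divide the mel-scale range from B(fl) to B(fh) into M+1 equal steps. This gives M+2 points f[0], ..., f[M+1], which Binv and the N/Fs factor convert back to FFT bin numbers. Filter m then rises from f[m-1] to a peak at f[m] and falls back to zero at f[m+1]. If the filterbank covers the whole band, fl = 0 and fh = Fs/2. In that case the first point is f[0] = 0 and the last point is f[M+1] = N/2, the FFT bin corresponding to Fs/2. Try M in the range of 24 to 40.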
- Given all of this you can now define filters H in terms of the f[m]. Either use slide 39/equation 6.140 in HAH, or equation 6.141 in HAH (which may be easier), but be consistent!
- Once you have the Hm, compute S[m] by multiplying the squared magnitude of X[k] by Hm[k], summing over k, and taking the natural log of the sum (eqn 6.144, slide 40).
- What's the squared magnitude of X? Remember that X is complex, so square(|X|) = square(real(X)) + square(imag(X)).
- Now you can compute the cepstral coefficients by applying the dct function in MATLAB to S[m] (or by using equation 6.145). Keep only the first 13 coefficients. This gives you c[n] for that frame. Repeat the above for all other frames.
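The steps above can be sketched end to end as follows. This is a minimal illustration in pure Python (standard library only), not a substitute for your own MATLAB code. It makes several assumptions:

- a 512-point FFT (N)
- M = 26 filters
- fl = 0 and fh = Fs/2
- unit-height triangular filters
- an unnormalized DCT-II of the log energies

MATLAB's dct uses orthonormal scaling, so its output differs from these coefficients by a constant factor per coefficient. The helper names (mel, mel_inv, boundary_points, triangular_filters, fft, mfcc) are hypothetical.

```python
import cmath
import math

def mel(f_hz):
    """B(f) = 1125 ln(1 + f/700)."""
    return 1125.0 * math.log(1.0 + f_hz / 700.0)

def mel_inv(b):
    """Binv(b) = 700 (exp(b/1125) - 1)."""
    return 700.0 * (math.exp(b / 1125.0) - 1.0)

def boundary_points(fl, fh, n_fft, fs, n_filters):
    """f[0..M+1]: M+2 points evenly spaced in mel, converted to FFT-bin units."""
    step = (mel(fh) - mel(fl)) / (n_filters + 1)
    return [(n_fft / fs) * mel_inv(mel(fl) + m * step) for m in range(n_filters + 2)]

def triangular_filters(f, n_fft):
    """Unit-height triangles: filter m rises from f[m-1] to f[m], falls to zero at f[m+1]."""
    bank = []
    for m in range(1, len(f) - 1):
        row = []
        for k in range(n_fft // 2 + 1):
            if f[m - 1] <= k <= f[m]:
                row.append((k - f[m - 1]) / (f[m] - f[m - 1]))
            elif f[m] < k <= f[m + 1]:
                row.append((f[m + 1] - k) / (f[m + 1] - f[m]))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]

def mfcc(signal, fs=16000, win_ms=25, hop_ms=10, n_fft=512, n_filters=26, n_ceps=13):
    """Return one list of n_ceps cepstral coefficients per frame."""
    win = int(fs * win_ms / 1000)   # 25 ms window, in samples
    hop = int(fs * hop_ms / 1000)   # 10 ms step, in samples
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * t / (win - 1)) for t in range(win)]
    bank = triangular_filters(boundary_points(0.0, fs / 2.0, n_fft, fs, n_filters), n_fft)
    frames_out = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + win], hamming)]
        X = fft(frame + [0.0] * (n_fft - win))  # zero-pad the frame to n_fft points
        power = [z.real ** 2 + z.imag ** 2 for z in X[:n_fft // 2 + 1]]  # |X[k]|^2
        # S[m] = ln(sum_k |X[k]|^2 Hm[k]); the floor avoids log(0) on digital silence
        S = [math.log(max(sum(p * h for p, h in zip(power, row)), 1e-10)) for row in bank]
        # Unnormalized DCT-II: c[n] = sum_m S[m] cos(pi n (m + 1/2) / M), first n_ceps only
        c = [sum(S[m] * math.cos(math.pi * n * (m + 0.5) / n_filters) for m in range(n_filters))
             for n in range(n_ceps)]
        frames_out.append(c)
    return frames_out
```

The sketch does not read the wave file; Python's standard wave module is one option for that. Given a list of samples at 16000 Hz, mfcc(samples) returns one 13-element list per 10 ms frame.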
What do you do with this when you're done? Well, first off, you can
plot it and see what it looks like. Then, here's a test. I've
written a little function dtw (dynamic time warping,
/class/cse794L/fosler/hw2/dtw.m) that gives a score comparing two
utterances. In the directory /class/cse794L/fosler/hw2/digits I've put
wave files containing the digits 1-9, "zero", and "oh". There are two
copies of each (a and b). Use the a files as DTW templates (training)
and the b files for testing:
- For each training file, compute MFCC(file) and store as a template.
- For each testing file, compute MFCC(file) and compare against all stored templates. The best match is the one with the lowest score.
- How often does the test utterance pick out the correct template?
- What happens if you leave off the first cepstral coefficient, c[0] (which reflects the overall log energy of the frame)?
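Use the course's dtw.m for the experiment itself. To show what such a score computes, here is a hypothetical Python stand-in. It implements classic dynamic time warping with a Euclidean frame distance and the three standard steps (horizontal, vertical, diagonal). It also includes a nearest-template classifier with an option to drop c[0]. The provided dtw.m may use a different local distance, step pattern, or length normalization, so its scores will not match these numerically.

```python
import math

def euclidean(a, b):
    """Distance between two feature vectors (e.g., two MFCC frames)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_score(seq_a, seq_b, dist=euclidean):
    """Minimum cumulative frame distance over all monotonic alignments of the two sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three predecessor cells
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify(test_feats, templates, drop_c0=False):
    """Return the label of the template with the lowest DTW score."""
    strip = (lambda seq: [frame[1:] for frame in seq]) if drop_c0 else (lambda seq: seq)
    return min(templates, key=lambda label: dtw_score(strip(test_feats), strip(templates[label])))
```

Here templates maps each digit label to the MFCC frames of its a file. classify(mfcc_of_b_file, templates) returns the best-matching label, and passing drop_c0=True repeats the comparison without the energy coefficient.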
Turn in your MFCC code for the grader to evaluate, with instructions
on how to run it.
Submission instructions:
Write up all of your answers to the
questions in a text editor so that they can be submitted electronically
(txt files preferred). Put that file as well as your code files
(preferably in separate subdirectories for each problem) in a
directory called hw2, and use the submit command to send
the files to the grader. The syntax of the submit command is:
submit c794aa lab2 hw2
Make sure that your writeup includes enough instructions that we will
be able to run your code easily. That means telling us what each file
contains.
Have fun!