CSE 730 Homework 4: Speech Recognition & Natural Language Processing

Due Monday, 26 November 2007, 11:59 PM (electronic)

or Monday, 26 November 2007, 4:30 PM (paper) in my mailbox

2 point penalty for every hour late, no credit after 2pm, 27 Nov 2007

Note: the original deadline was ambiguous (Monday, 27 November) -- this interpretation was by class vote. I am relaxing the late penalty for those who didn't get the message.

This is a pencil-and-paper only homework. Please see the submission instructions below.


  1. You have a speech recognition system trained with one state per phone, and acoustic observations are vector quantized (meaning we can have discrete observations rather than continuous). Given the following probability tables and pronunciation model, first draw the FSA HMM representation for each word, and calculate the most likely word for the observations C1,C3,C4,C6,C6,C8?
    Prior word probabilities:
    P(dog)=0.4
    P(cat)=0.2
    P(dot)=0.4
    Word Pronunciation {state self loop prob, state exit prob}
    dog d {0.4,0.6} ah {0.9,0.1} g {0.4,0.6}
    cat k {0.3,0.7} ae {0.8,0.2} t {0.3,0.7}
    dot d {0.4,0.6} ah {0.9,0.1} t {0.3,0.7}
    Phone P(VQ vector | phone)
    ae C4: 0.5 C5: 0.2 C6: 0.3
    ah C4: 0.3 C5: 0.3 C6: 0.4
    d C1: 0.5 C2: 0.2 C3: 0.1 C8: 0.2
    g C6: 0.2 C7: 0.3 C8: 0.5
    k C1: 0.1 C2: 0.7 C3: 0.2
    t C1: 0.2 C7: 0.4 C8: 0.4
    You can use the Viterbi approximation (finding the best path for each word). But you must show how you know that it's the best path. Otherwise, show the probabilities for all paths.
  2. Consider the sentence "You ran two races quickly." and the following lexicon:
    Pronoun -> I
    Pronoun -> you
    Pronoun -> she
    V -> ran
    V -> run
    Cardinal -> two
    Cardinal -> one
    N -> races
    Adv -> quickly
    
    (a) For each of the following grammars, if the grammar parses the sentence, give the parse, or if it doesn't parse, suggest a rule that can help make it parsable.
    Grammar A:
    S -> NP VP
    NP -> Pronoun
    NP -> Det N
    NP -> Cardinal N
    VP -> V Vmod
    Vmod -> Adv Vmod
    Vmod -> Adv
    
    Grammar B:
    S -> NP VP
    NP -> Pronoun
    NP -> Det N
    NP -> Cardinal N
    VP -> VP Adv
    VP -> V
    VP -> V NP
    
    (b) Given the lexicon above, choose one of the grammars and show two sentences that are syntactically correct given the grammar, but not correct English. The two sentences should point out different problems in the grammar. For each sentence, how could you change the grammar (in general terms, not specifics) to rule out these ill-formed sentences?
Extra credit (30 points): Implement the Chart Parsing (Earley) Algorithm (see pages 800-801) and demonstrate on the grammars above.

Submission instructions:

Write up all of your answers to the questions in a text editor so that it can be submitted electronically. Put that file as well as your programs, and ONLY those files, in a directory called hw4, and use the submit command to send the files to the grader. The syntax of the submit command is:

> submit c730aa lab4 hw4

You can learn about using the submit command by typing man submit at a unix command prompt. If you cannot get the submit command to work, you can email your files to the grader, but please don't use this option unless you're having trouble with submit.

Eric Fosler-Lussier
Last modified: Mon Nov 26 22:11:31 EST 2007