Homework 1, due 20 January 2005, 11:59 PM

In this homework, you will get some experience using the AT&T FSM toolkit. I have installed the linux version of this toolkit in /class/794L/fosler/bin.linux (if you're logged into the linux compute cluster (lcc machines) and the solaris version in /class/794L/fosler/bin.solaris (for stdsun, suncc machines). You may want to add the appropriate directory into your path:

csh-type shells:

 
set path = ( /class/794L/fosler/bin.MACHINETYPE $path)

bash-type shells:

PATH=/class/794L/fosler/bin.MACHINETYPE:$PATH

You can feel free to work on your own computer (download the fsm toolkit from AT&T if you want to do this), but you will still need to upload to the unix system in order to submit your homework.

You will need to look over the tutorial I handed out in class for how to use the fsm toolkit. You may also want to leaf through the FSM tutorial from AT&T to see what the operators do. You will need commands like fsmcompile, fsmcompose, and fsmunion. (You may need other commands to do the problems as well.)

The solution to each problem will have both the FSMs that you create, as well as a writeup (in text format) that describes how the TA should run your code. You should use the samples given here (and possibly make up some of your own) to test your code. We will also test your transducers with other examples. In order to make sure that your FSMs will work with ours, please do not modify the vocabulary files given here. More specifically, you can add vocabulary items to the .voc files, but we will not use them during testing, so make sure that your input/output is in terms of the .voc files given here (and in /class/794L/fosler/hw1).

Problem 1

This problem is designed to get you thinking about how to represent sequences of symbols in finite state machines. We will be writing transducers to evaluate poker hands. The deck of cards we will be "using" only has the cards of rank 8 through Ace (8, 9, 10, Jack, Queen, King, Ace).

A poker hand consists of five cards, sorted in ascending order. (Each hand is guaranteed to be ascending as input; you don't need to check this.) Each "card" is represented by its rank and suit, encoded as two separate symbols. For example, a poker hand might consist of the 8 of Hearts, 8 of Spades, 10 of Diamonds, Jack of Diamonds, King of Spades. This can be represented as a linear sequence of pairs of card symbols (in fsm text format):

0 1 EIGHT
1 2 HEARTS
2 3 EIGHT
3 4 SPADES
4 5 TEN
5 6 DIAMONDS
6 7 JACK
7 8 DIAMONDS
8 9 KING
9 10 SPADES
10
Alternatively, you can use a rank-then-suit representation. This may make the problem easier to do, although it is not as straight-forward of a representation for people to read. Here is the same hand in rank-then-suit format, where the rank between n and n+1 corresponds to the suit between n+5 and n+6:

0 1 EIGHT
1 2 EIGHT
2 3 TEN
3 4 JACK
4 5 KING
5 6 HEARTS
6 7 SPADES
7 8 DIAMONDS
8 9 DIAMONDS
9 10 SPADES
10
Please indicate clearly in your writeup which representation you are using.

Here are the hands that you need to be able to represent:

For extra credit, you can also do these hands (you may want to do problem 2 and come back): There should be two parts to the answer: first, create a transducer that converts each type of hand to a single symbol describing the hand. The second part of your answer should be a single transducer to convert any hand into any rule, which would be the union of the set of transducers representing the rules.

Use the following vocabulary files for the input and output of the transducer:

cards.voc:

EIGHT 1
NINE 2
TEN 3
JACK 4
QUEEN 5
KING 6
ACE 7
CLUBS 8
DIAMONDS 9
HEARTS 10
SPADES 11

outcomes.voc (the - is the epsilon symbol, which is always symbol 0)

- 0
ONE-PAIR 1
TWO-PAIR 2
THREE-OF-A-KIND 3
FOUR-OF-A-KIND 4
FULL-HOUSE 5
STRAIGHT 6
FLUSH 7
STRAIGHT-FLUSH 8
ROYAL-STRAIGHT-FLUSH 9

Here's a hint: since the card sequence is always the same length (10 symbols) you don't have to worry about encoding that into your constraints. If there were only three ranks (8, 9, 10), then this would be a transducer that converted a pair of eights to ONE-PAIR, assuming rank-then-suit encoding:

0 0 NINE -
0 0 TEN -
0 1 EIGHT -
1 2 EIGHT ONE-PAIR
2 2 NINE -
2 2 TEN -
2 3 SPADES -
2 3 HEARTS -
2 3 DIAMONDS -
2 3 CLUBS -
3 3 SPADES -
3 3 HEARTS -
3 3 DIAMONDS -
3 3 CLUBS -
3
Style and glory points ;-) will be given if you can figure out ways to combine constraints from lower hands to make a higher hand (e.g., straight flush). This isn't required, but cleverness will make the coding much easier.

Write up in your text commentary how you chose to go about creating each transducer.

Problem 2

In this problem, you will be writing a speech recognition decoder for isolated digits of English. There are eleven digits -- one through nine, zero, and oh. The data can be found in /class/794L/fosler/hw1/prob2

I will give you the output of a neural network classifier that classifies every 10 msec frame of speech as one of 21 phones. The list of phones can be found in the file phn.voc. There are 22 sentences in this dataset; the data are encoded as one large fsm archive (far) file called isodigits-data.far. This is just a bunch of fsm files concatenated together using the "cat" command. You can split the far file into individual sentences using

farsplit -r sent isodigits-data.far
Take a look at the structure of the first sentence's output using
fsmprint -i phn.voc sent0001.fsa | less
You'll see that between every pair of nodes there are 21 arcs, corresponding to the 21 outputs of the network. Each output represents P(phone|acoustics). [Technically, what we're doing here is wrong and should be divided by P(phone), but we'll get to that in a later lecture. This is just convenient and works for the time being.]

Your job is to develop three FSMs: dur.fst, dict.fst, and lm.fsa, which can be used to decode this neural network output.

dur.fst

This is a transducer which converts multiple frames of output into a single phone symbol. This will be similar to the transducer in the class notes, except we don't have three different kinds of distributions.

Build this transducer with the following specifications:

dict.fst

This is a transducer that converts multiple phones into a single word symbol. I have given you the correct dictionary in text format; you must convert it to fsm format.

Your dictionary fst should be able to recognize a sequence of words. The language model will take care of the fact that this is an isolated task. This way, we can reuse this dictionary in subsequent homeworks.

The input to the transducer should be encoded in terms of phn.voc, and the output in terms of words.voc.

Your dictionary should also be set up to absorb SIL (silence) phones between words. This can be done by transducing SIL to - (epsilons) between every word.

lm.fsa

This is an automaton that specifies the language. In this case, we have an isolated word recognizer: it should output exactly one digit. There is no preference for one digit over another, so they can be equally weighted.

Specify an automaton (not a transducer) that representes this constraint on the language. This should be encoded in terms of words.voc

Combining

You should be able to compose all of these automata to get a graph of all possible word sequences through this input. However, you are only interested in the best sequence (i.e. most likely), which you can get using fsmbestpath. You can also just get the output words by applying fsmproject -2 (which outputs the second set of symbols), and remove all of those epsilons using fsmrmepsilon to get a more compact answer.

Quick tip: you can apply a processing chain to an entire far file (rather than individual fsm sentences) using farfilter "processing-chain" farfile. See the FSM docs for more details.

Questions to consider:


Submission instructions:

Write up all of your answers to the questions in a text editor so that it can be submitted electronically. Put that file as well as your fsm files (preferrably in separate subdirectories for each problem) in a directory called hw1, and use the submit command to send the files to the grader. The syntax of the submit command is:

submit c794aa lab1 hw1

Make sure that your writeup includes enough instructions that we will be able to run your fsms easily. That means to tell us what files are what.

Have fun!