CSE 730 Homework 2: Markov Models, Machine Learning

Due Monday, 29 October 2007, 11:59 PM

10 point penalty for every hour late

Small clarification on Problem 1d -- the sequence starts at the beginning of the day.
  1. You are working in the VIS-O-MATIC television factory. You need to produce two types of TVs: in one country (country A), televisions are required to have a chip that blocks out any kind of polka music. Another country (country B) requires that the chip not be present. So each TV comes down a conveyor belt and either has the anti-polka chip installed, or a dummy chip installed.

    For the first few problems, you can assume that you can tell with 100% accuracy what chip is installed. In the control room, there is a switch that controls what type of chip is installed; at the start of the day there's a 70% chance that it's in anti-polka mode, and 30% chance that it's in dummy mode. When the system is turned on, it starts installing chips into TVs on the conveyor belt. (You can assume that the state of the switch for any time t is the same as the kind of chip installed at time t -- that is, they are the same variable. THIS IS A SIGNIFICANT HINT. IT MAKES THE PROBLEM MUCH SIMPLER.)

    However, some devious professor has let their chimpanzee into the control room, and she starts flipping the switches in the control room. The switch can only be flipped between TVs (i.e. we don't install half of a chip). There is a 30% chance between every installation that the switch will be flipped (changing the type of chip installed).

    1. Draw a Dynamic Bayes network (DBN) representing the situation. Rememeber the CPTs!
    2. Assuming that this is a stationary process, how many parameters are there in this system?

    Now, what if a second set of chips is installed, controlling the language? There are three chips (LanguageA/LanguageB/LanguageC), and set of push-buttons that controls them as well (pushing a button for a particular language will exclusively set the chip to that language -- only one button is active at a time). The probability of the buttons being in LanguageA mode at the beginning of the day is 60%, and LanguageC is 20%. Again, the probability of the switch being flipped by the chimp is 30% between every TV (evenly divided between the other languages -- when in state "LanguageA" there's a 15% chance of LanguageB and 15% of LanguageC button pushing). The probability of the polka switch being flipped and language buttons being pushed is independent.

    1. Now draw a dynamic Bayesian network for both production of both chips.
    2. What is the probability that the machine will produce, at the beginning of the day, in order: {LanguageA/Anti-Polka, LanguageB/Anti-Polka, LanguageB/Dummy}

    Now assume that you can't tell what kind of chips are inside the TV. However, you can weigh the TVs, and the chips will probabilistically affect the weight. Use the following table for P(Weight|Chip1, Chip2):

    Chip 1 Chip 2 P(weight=heavy|Chip1,Chip2) P(weight=medium|Chip1,Chip2) P(weight=light|Chip1,Chip2)
    Anti-Polka LanguageA 0.7 0.2 0.1
    Anti-Polka LanguageB 0.5 0.3 0.2
    Anti-Polka LanguageC 0.4 0.3 0.3
    Dummy LanguageA 0.3 0.5 0.2
    Dummy LanguageB 0.1 0.6 0.3
    Dummy LanguageC 0.1 0.4 0.5

    1. Draw the dynamic Bayes net (DBN) for this new situation.
    2. If the first two TVs were heavy and light respectively, what are the probabilities of the chips inside these TVs?

  2. OSU Restaurant problem: just like the authors of the book, I've gotta eat lunch. I frequent one of four places: Buck-I-Mart, Oxley's, Oxley's cafe, or Brennan's. There are nine potential factors:

    Your job is to train two different classifiers (below) to predict where I eat lunch. I have provided you training and test materials in three formats: Text-attribute format, with the last column the name of the restaurant (train) (test), A coded format, where the text attributes are replaced with integers (starting from 0, in the order above) (train) (test), and a binary "one-hot" version which codes the nine factors in the first 30 columns, and the restaurant in the last 4. A "one-hot" encoding assumes that if you have n options, the value you want to encode will be marked with a 1 and the others with a 0 (e.g., Oxley's is restaurant 2 of 4, so the last 4 columns would be 0,1,0,0) (train) (test). This last encoding may be useful for training a perceptron. The contents of the training files are exactly the same, just a different encoding; same with the test files.

    1. Run the perceptron learning rule on these data and show the final weights. Classify the test examples and show the performance on each subset of the test data.
    2. Now run the decision tree learning rule on these data, and classify the test set.
    3. Comment on the results you get. Which does better, and speculate why?

  3. This exercise will give you some practice coding the EM algorithm. We will apply the mixture of Gaussian unsupervised clustering technique to one-dimensional data (similar to the demo in class). For reference, section 20.3 goes into detail about how to construct a mixture of Gaussian for n-dimensional data.

    You will find three sets of data here: file1(txt) file2(txt) file3(txt). You will need to use the code you developed in hw1 for reading in data files, and calculating means and standard deviations.

    Create a program to estimate the means, standard deviations, and weights of a mixture of Gaussians via the EM algorithm. You will probably want to have three functions: one that performs the expectation step, one that performs the maximization step, and one that runs the outer loop.

    1. Data set 1 was created with 3 Gaussians. Run your EM algorithm with three randomly initialized means (you'll probably want to make the std deviations large and the weights equiprobable at first). Give the resulting means, std deviations, and weights (that is, the model parameters). Plot the log likelihood of the data as a function of iteration.
    2. How does the log likelihood change if you use 2 or 4 Gaussians? What happens to the parameters of the model?
    3. Data sets 2 and 3 were constructed from an unknown number of Gaussians. Experiment with different numbers of Gaussians and describe your model fit.

Submission instructions:

Write up all of your answers to the questions in a text editor or pdf file so that it can be submitted electronically. No Word documents. Put that file as well as your programs, and ONLY those files, in a directory called hw2, and use the submit command to send the files to the grader. The syntax of the submit command is:

> submit c730aa lab2 hw2

You may also choose to hand in the non-code portion of the assignment by 5pm on the due date in my mailbox.

FOR CODE, YOU MUST SUBMIT ONLINE. TAKE THE TIME TO FIND OUT HOW IF YOU DON'T KNOW HOW.

For the code, you should turn in either

You can learn about using the submit command by typing man submit at a unix command prompt. If you haven't figured out how to log into the CSE dept servers, talk to the TA or the professor. We will test your program on the CSE dept unix computers, so make sure that the program you turn in is built to run on Solaris. Don't depend on cross-platform language implementations, you should make sure your code runs on the CSE dept Solaris system. If your program doesn't execute successfully, we will not attempt to recompile it or otherwise correct the error. Your program will be graded solely on whether its actions are appropriate to the inputs we supply in our testing, not on your programming style or design sophistication.

Eric Fosler-Lussier
Last modified: Wed Oct 17 21:49:37 EDT 2007