CSE 730 Homework 1: Probability theory, Bayes' Nets

Due 9 October 2007, 11:59 PM

10 point penalty for every hour late (or part thereof)

Your responses to these questions can either be submitted electronically, or by 5 pm on the 9th in my mailbox. For electronic submission, please see the submission instructions below.
  1. (30 points) [edit 6Oct 10:15 to correct explanation] The star quarterback of the Buckeyes is hurt on the field and the medical team is in. The quick diagnostic test that they can run on the field suggests that it's a serious injury that will end the season for the player; however, this test only has an 85% accuracy rate. The meaning of this is that if there is a serious injury, the test will be true 85% of the time; if the injury is not serious the test will be false 85% of the time. The good news is that the rate of serious injuries on the field is only 1 in 20,000.
    1. (10 points) Why is this good news, i.e., what is the probability that the player may return next week?
    2. (10 points) Draw a Bayesian network that describes the situation, including the CPTs. (Hint: think carefully about the probability distributions I have given in the problem.)
    3. (10 points) If the quarterback is out, there's a 50% chance the Buckeyes will win the next game. If the quarterback is in, then the probability is 90%. Redraw the Bayesian network, adding in this information, and compute the probability that the Buckeyes will lose given the fact that the on-field test is positive.

  2. (30 points) Consider the following three data files: File1, File2, File3.
    1. Write a program to read in a data file and compute the mean and standard deviation, min and max. One formula for standard deviation of a single random variable is sqrt((Sum_i ((x_i-mean)^2))/(n-1)), where "_i" means "subscript i", and ^2 means to take the square. Note: we will evaluate your code on another dataset. Also, don't throw away this code, we will use it in a later assignment.
    One of these files contains data from a normal distribution, one from a uniform distribution, and one from a mixed (normal+uniform) distribution. Your task is to determine whether each set is more like a uniform distribution or a normal distribution.
    1. For each data file, compute the normal distribution and uniform distribution according to the data.
    2. Compute the total log likelihood of the data for each distribution. This is the sum of the log probability of each datapoint given your distribution. Don't worry about normalizing "per unit"; since both the normal and uniform distributions are on the same scale, we can compare relative likelihood scores.
    3. The "best fit" is given by the distribution with the highest total log likelihood. Show the best fit for each dataset.
    You may use any language you like (hint: MATLAB has builtin functions for some of this). Please give the grader instructions on how to run your code. Remember that your code must run on the CSE unix cluster.

  3. (20 points) Given the following Bayes net:

    Prove that (a) D is independent of C given D's Markov blanket; and (b) D is not independent of B given E and F.

  4. (20 points) Do problem 14.1 in R&N (p.533).

  5. EXTRA CREDIT (up to 40 points depending on thoroughness of answer): Create a Bayes network, either for some problem that you make up, or by naming the variables arbitrarily (A,B,C, etc). Set reasonable CPTs for each node. To keep it simple, keep all variables binary. Write a program to do exact inference on some query, and then write a second program to implement MCMC inference on the same query. Report on the amount of time and space taken by each. Do you get the same answer for exact inference and MCMC? Try this over a variety of queries. What happens if you change the number of variables (say, go from 10 to 100 to 1000)? Think of your own questions. Creativity will be rewarded.

Submission instructions:

Write up all of your answers to the questions in a text editor so that it can be submitted electronically. Put that file as well as your programs, and ONLY those files, in a directory called hw1, and use the submit command to send the files to the grader. The syntax of the submit command is:

> submit c730aa lab1 hw1

For the code, you should turn in either

You can learn about using the submit command by typing man submit at a unix command prompt. If you cannot get the submit command to work, you can email your files to the grader, but please don't use this option unless you're having trouble with submit. We will test your program on the CSE dept unix computers, so make sure that the program you turn in is built to run on Solaris. Don't depend on cross-platform language implementations, you should make sure your code runs on the CSE dept Solaris system. If your program doesn't execute successfully, we will not attempt to recompile it or otherwise correct the error. Your program will be graded solely on whether its actions are appropriate to the inputs we supply in our testing, not on your programming style or design sophistication.

Eric Fosler-Lussier
Last modified: Mon Oct 1 21:03:58 EDT 2007