CSE 730 Homework 1: Probability theory, Bayes' Nets
Due 9 October 2007, 11:59 PM
10 point penalty for every hour late (or part thereof)
Your responses to these questions can either be submitted
electronically, or by 5 pm on the 9th in my mailbox. For
electronic submission, please see the submission
instructions below.
- (30 points) [edit 6Oct 10:15 to correct explanation] The star
quarterback of the Buckeyes is hurt on the field and the
medical team is in. The quick diagnostic test that they can
run on the field suggests that it's a serious injury that
will end the season for the player; however, this test only
has an 85% accuracy rate. The meaning of this is that if
there is a serious injury, the test will be true 85% of the
time; if the injury is not serious the test will be false 85% of
the time. The good news is that the rate of serious injuries
on the field is only 1 in 20,000.
- (10 points) Why is this good news, i.e., what is the
probability that the player may return next week?
- (10
points) Draw a Bayesian network that describes the situation,
including the CPTs. (Hint: think carefully about the
probability distributions I have given in the problem.)
-
(10 points) If the quarterback is out, there's a 50% chance
the Buckeyes will win the next game. If the quarterback is
in, then the probability is 90%. Redraw the Bayesian
network, adding in this information, and compute the
probability that the Buckeyes will lose given the fact that
the on-field test is positive.
- (30 points) Consider the following three data files: File1, File2, File3.
- Write a program to read in a data file and compute the
mean and standard deviation, min and max. One formula for standard deviation of a single random variable is sqrt((Sum_i ((x_i-mean)^2))/(n-1)), where "_i" means "subscript i", and ^2 means to take the square. Note: we will evaluate your code on another dataset. Also, don't throw away this code, we will use it in a later assignment.
One of these files contains data from a normal distribution,
one from a uniform distribution, and one from a mixed
(normal+uniform) distribution. Your task is to determine
whether each set is more like a uniform distribution or a
normal distribution.
- For each data file, compute the normal distribution and uniform distribution according to the data.
- Compute the total log likelihood of the data for each distribution. This is the sum of the log probability of each datapoint given your distribution. Don't worry about normalizing "per unit"; since both the normal and uniform distributions are on the same scale, we can compare relative likelihood scores.
- The "best fit" is given by the distribution with the highest total log likelihood. Show the best fit for each dataset.
You may use any language you like (hint: MATLAB has builtin functions for some of this). Please give the grader instructions on how to run your code. Remember that your code must run on the CSE unix cluster.
- (20 points)
Given the following Bayes net:

Prove that (a) D is independent of C given D's Markov blanket; and (b) D is not independent of B given E and F.
- (20 points)
Do problem 14.1 in R&N (p.533).
- EXTRA CREDIT (up to 40 points depending on thoroughness of
answer): Create a Bayes network, either for some problem that
you make up, or by naming the variables arbitrarily (A,B,C,
etc). Set reasonable CPTs for each node. To keep it simple, keep
all variables binary. Write a program to do exact inference on
some query, and then write a second program to implement MCMC
inference on the same query. Report on the amount of time and
space taken by each. Do you get the same answer for exact
inference and MCMC? Try this over a variety of queries. What
happens if you change the number of variables (say, go from 10
to 100 to 1000)? Think of your own questions. Creativity will
be rewarded.
Submission instructions:
Write up all of your answers to the
questions in a text editor so that it can be submitted electronically.
Put that file as well as your programs, and ONLY those files, in a
directory called hw1, and use the submit command to send
the files to the grader. The syntax of the submit command is:
> submit c730aa lab1 hw1
For the code, you should turn in either
- An executable
version of your program along with the source code.
- If your
program is written in an interpreted language, such as perl, lisp or
python, you can turn in just the source. In that case, make sure to
use a standard file extension so that we can determine how to run your
program.
You can learn about using the submit command by typing man
submit at a unix command prompt. If you cannot get the
submit command to work, you can email your files to the grader,
but please don't use this option unless you're having trouble
with submit.
We will test your program on the CSE dept unix computers, so make sure
that the program you turn in is built to run on Solaris. Don't depend
on cross-platform language implementations, you should make sure your
code runs on the CSE dept Solaris system. If your program doesn't
execute successfully, we will not attempt to recompile it or otherwise
correct the error. Your program will be graded solely on whether its
actions are appropriate to the inputs we supply in our testing, not on
your programming style or design sophistication.
Eric Fosler-Lussier
Last modified: Mon Oct 1 21:03:58 EDT 2007