CSE 222 Closed Lab #5


Please follow these steps in closed lab:

  1. Select a partner to work with on the closed lab exercise and decide whose account to use for it.  After you have completed the closed lab exercise, the person whose account has been used may e-mail to his/her partner the file(s) containing the solution that was devised by the partnership.
  2. Copy the initial closed lab #5 files to your own working directory and get into that directory:
    cp -R /class/sce/now/222/closed-labs/closed-lab05/Closed_Lab05/ .
    cd Closed_Lab05
  3. Look through the file Hash_Test.cpp, read and understand the specification for Text_Hash::Hash, and fill in the body of Hash to meet this specification.  (Notice that the compiler allows you to write the code for abstract and/or concrete components in your main program file!  We haven't done this in general, because typically it is better to place potentially reusable components into a component catalog so they can be used elsewhere, too.  But the exercise illustrates that separating components into their own files was a conscious design decision, not something dictated by the C++ programming language.)
  4. Build the project and run this program, supplying various hash table sizes (recommended range: 15-150) and various input files. Four files of several hundred Text values each, chosen from a dictionary, are provided for your convenience: random.txt, length8.txt, startend.txt, and mod30.txt. Look at these files to see if you can determine what features the various Text values have; it's not necessarily obvious. Among the hash table sizes you try, be sure to try size 30 (or any other size in the recommended range that is evenly divisible by 2, 3, or 5) with each of the provided sets of values. An evil "adversary" has attacked the combination of our hash function and table sizes evenly divisible by 2, 3, or 5. Which file came from this adversary? Try to lessen the impact of the adversary's scheme by using a table size relatively prime to 30; any prime number in the range will do, but there are other numbers relatively prime to 30, too. What happened? Do you think there would be an advantage to using an actual prime number as opposed to a number that is merely relatively prime to 30 or some other special number of interest? If not, why not? If so, what's the advantage? A "good" hash function should distribute these sets of values approximately evenly among the buckets. Observe the results of applying the hash function you have implemented for this test. Do you think it's doing a "good" job? Could it be improved? Why or why not?
  5. Now pick a prime hash table size around 1000. Other than the file created by the adversary, one file has what might seem a surprising distribution of bucket hits. Which one? Can you explain why the distribution looks like this? (If not, remember this situation when you're asking why statistics is a CSE/CIS graduation requirement!)
  6. Run the program with the input file Hash_Test.cpp (i.e., run the program with its own source code as input). What does the program do in calculating its results that makes this a reasonable additional test of the hash function and table size? (Study the program again if you're not sure.) Does this additional test shake your confidence in the "goodness" of the hash function? Why or why not?
  7. Change the hash function to any other substantively different one, and repeat step 4, trying to find a hash function that works well on all four input files.  Which hash function do you think is "better" for general Text values not picked by an adversary, the first one or your alternate?
  8. Let your instructor know when you can explain what you have observed to this point, in order to get credit for this closed lab.