CSE 222 Closed Lab #5
Please follow these steps in closed lab:
-
Select a partner to work with on the closed lab exercise and decide whose
account to use for it. After you have completed the closed lab exercise,
the person whose account has been used may e-mail to his/her partner the
file(s) containing the solution that was devised by the partnership.
-
Copy the initial closed lab #5 files to your own working directory and get
into that directory:
cp -R /class/sce/now/222/closed-labs/closed-lab05/Closed_Lab05/ .
cd Closed_Lab05
-
Look through the file Hash_Test.cpp,
read and understand the specification for Text_Hash::Hash, and fill
in the body of Hash to meet this specification. (Notice that
the compiler allows you to write the code for abstract and/or concrete
components in your main program file! We haven't done this in general,
because typically it is better to place potentially reusable components
into a component catalog so they can be used elsewhere, too. But
the exercise illustrates that separating components into their own files
was a conscious design decision, not something dictated by the C++ programming
language.)
- Build the project and run this program, supplying various hash table
sizes (recommended range: 15-150) and various input files. Four files
of several hundred Text values each, chosen from a dictionary, are
provided for your convenience: random.txt, length8.txt, startend.txt, and mod30.txt. Look at these files to see if you can determine what features the various Text
values have; it's not necessarily obvious. Among the hash
table sizes you try, be sure to try size 30 (or any other size
in the recommended range that is evenly divisible by 2, 3, or
5) with each of the provided sets of values. An evil "adversary" has
attacked the combination of our hash function and table sizes evenly divisible
by 2, 3, or 5. Which file came from this adversary? Try to lessen the
impact of the adversary's scheme by using a table size relatively prime to
30; any prime number in the range will do, but there are other numbers
relatively prime to 30, too. What happened? Do you think there would
be an advantage to using an actual prime number as opposed to a number
that is merely relatively prime to 30 or some other special number of
interest? If not, why not? If so, what's the advantage? A "good"
hash function should distribute these sets of values approximately
evenly among the buckets. Observe the results of applying the
hash function you have implemented for this test. Do you think
it's doing a "good" job? Could it be improved? Why or why
not?
- Now
pick a prime hash table size around 1000. Other than the file created by the
adversary, one file has what might seem a surprising distribution of bucket
hits. Which one? Can you explain why the distribution looks like this? (If
not, remember this situation when you're asking why statistics is a CSE/CIS
graduation requirement!)
-
Run the program with the input file Hash_Test.cpp (i.e., run the
program with its own source code as input). What does the program do
in calculating its results that makes this a reasonable additional
test of the hash function and table size? (Study the program again
if you're not sure.) Does this additional test shake your confidence
in the "goodness" of the hash function? Why or why not?
-
Change the hash function to any other substantively different one,
and repeat step 4, trying to find a hash function that works well on all
four input files. Which hash function do you think is
"better" for general Text values not picked by an adversary, the first one or your alternate?
-
Let your instructor know when you can explain what you have observed to
this point, in order to get credit for this closed lab.