Lab 4: Collections Framework


This lab consists of four parts. The first three parts are compulsory, while the last part is optional (and will not be graded).

  1. Dense Multiset

    A multiset is a set that allows repetition of individual elements. Thus, adding an element always increases the cardinality by one, regardless of whether the element is already present. In this lab, you will implement a multiset of char.

    Write a class that implements the MultiSetOfChar interface:

    interface MultiSetOfChar {
        int getCardinality();
        int getElementCount(char target);
        Set<Character> getElementSet();
        void add(char item);
        boolean remove(char target);
        char randomUniformChoose();
    }
    

    As far as semantics (behavior) of these methods, your implementation should do the following:

    getCardinality
    Returns the cardinality of the multiset.
    getElementCount
    Returns the number of occurrences of the element in the multiset. The sum of getElementCount for each char is equal to the cardinality of the set.
    getElementSet
    Returns a set such that every element in the multiset is in the set (but no duplicates).
    add
    Adds the argument to the multiset.
    remove
    Removes the target, if present in the multiset. The method returns true if it changes the multiset.
    randomUniformChoose
    Returns a char chosen randomly based on the contents of the multiset. This operation does not change the cardinality of the multiset. For a character that appears N times in a multiset of cardinality M, the probability of that character being returned is N / M.

    Your implementation should be called DenseMultiSetOfChar and should make use of the Collections Framework. In addition, you should obtimize your implementation for containing many repetitions of relatively few distinct characters. That is, a DenseMultiSetOfChar should not use significantly (any?) more memory if it contains 100,000 repetitions of 'a' vs just one.

    For your convenience, the test program TestDenseMultiSetOfCharSimple.java is provided. Your DenseMultiSetOfChar class should compile with this test program. This test program documents the expected behavior so it should help you ensure you have implemented the correct method behaviors as well as signatures.

  2. Compressing a Library of Books

    How many characters are in the text of "Alice in Wonderland"? How many times does the character 'A' appear? In this part of the lab, you will implement a class that could be used to keep track of this information for every book in a library.

    Write a class that implements the FrequencyLibrary interface:

    interface FrequencyLibrary {
        int size();
        boolean contains(String target);
        MultiSetOfChar getFrequencies(String target);
        void add(String name, char element);
        boolean remove(String name, char element);
        char randomUniformChoose(String name);
    }
    

    As far as semantics (behavior) of these methods, your implementation should do the following:

    size
    Returns the number of books in the library.
    contains
    Returns true if and only if the argument is already a book title in the library.
    getFrequencies
    Returns the MultiSetOfChar that represents the occurrences of the individual characters in the text of the book indiciated by the argument.
    add
    Modifies the character occurrences associated with name to include one more occurrence of element.
    remove
    Modifies the character occurrences associated with name to include one less occurrence of element. If this removal results in no elements being associated with name, name is removed from the library.
    randomUniformChoose
    Returns a random character, chosen from the same distribution as the characters appear in the book. For example, if 5% of the characters in "Alice in Wonderland" are an 'A', then this method should return an 'A' about 5% of the time.

    Your implementation should be called DenseFrequencyLibrary and should make use of your solution to part 1 as well as the Collections Framework.

    For your convenience, the test program TestDenseFrequencyLibrarySimple.java is provided. Your DenseFrequencyLibrary class should compile with this test program. This test program documents the expected behavior so it should help you ensure you have implemented the correct method behaviors as well as signatures.

  3. Performance

    Compare the run-time performance of your DenseMultiSetOfChar using two different choices for collection on which it is built. For example, if you wrote DenseMultiSetOfChar to use a HashSet, compare it with using a TreeSet instead. (I am not suggesting or advising you to use a HashSet for your solution! This is just an illustrative example.)

    Note: you should be able to change between two implementations (such as HashSet and TreeSet) by changing a single line of code in your implementation.

    For your convenience, a basic timing program Profiler.java is provided. Expand and modify this program until you have gathered enough performance data to observe the impact that choice of collection implementation makes on your code. For example, you might want to look at the performance of different methods under different circumstances (e.g., small/large cardinality or few/many repetitions). Include a text file in your submission that describes your findings. (Only a brief discussion is needed: a single paragraph.)

  4. Optional Extra: Generics

    Note: This part of the lab is optional. It will not be graded, so you do not have to complete it in order to get full credit for the lab.

    Modify the multiset interface provided as part of this lab to use generics. Modify your implementation of this interface accordingly.