Advisor(s): Dr. Bruce Weide,
Dr. Paolo Bucci
Participants: Ratnamala
Bearavolu, Susan Hohenberger
Start Date: Winter Quarter 1999
Project Status: Active
Abstract
Calendar
Project
Development
Findings
and Contributions
Return
to EuropaThe Statistics_Machine component was designed to handle the computation of maximum, minimum, mean, median, and standard deviation on an arbitrary number of reals. A practical application would be to handle the CIS221, CIS222, CIS321 online evaluation form data. The goal was to create a clean and user friendly interface with an elegant design supporting it.
The project development state is active. We succeeded in developing
and anaylzing an algorithm that produced the median in near real-time when
new data additions were added to a statistics machine that was being continually
queried for the median. However, this algorithm involved too much overhead
to compete with the best of the batch algorithms on a single run.
Therefore, our future work is to combine the real-time and batch median
algorithms into an "intelligent" median component. Using a small
amount of overhead, this "intelligent" algorithm can switch between
the real-time and batch algorithms to produce the median for the user with
the fastest method based on how the user is using the statistics machine.
Insert (consumes Real x), Clear (), Size (), Maximum (), Minimum (),
Mean (), Median (), Standard_Deviation ()
|Queue_of_Reals (buffer) | Real (max) | Real (min) | Real (sum) | Real
(sum of squares) | Real (median) | Boolean (median_is_calculated)|
Median is computed at time of Median() call by merge sorting data and selecting middle element. Order nlogn.
Median is computed at time of Median() call by selecting the median of medians as a partition and separating data. Order n.
Median is computed at each Insert() call by storing the items in twin binary search trees. Order nlogn (logn at each Insert()).
Median is computed at time of Median() call by generating a random partition and solving for kth in n problem. Order n-squared (averages much better).
After anaylsis involving randomly generated data sets of variable sizes, we discovered ...
Algorithm C was implemented using a special object-oriented approached
referred to as "recasting" the algorithm as an object. Unlike
Algorithms A, B, and D, this recasted algorithm was completely designed
by the authors of this research. The primary need to do this was because
all other median algorithms described in text books were for batch applications
(single runs). However, these algorithms produce horrible results if the
user wishes to insert data, ask for the median, insert more data ask for
the median again, and so on. This is the central reason that most real-time
applications use a mean approximation in place of the median.
However, there are cases, such as noise reduction in electrical engineering,
that could be enhanced by using the median instead of a mean approximation
if an efficient median algorithm were known to exist. Our research
developed such an algorithm. But we discovered the downfall of this algorithm
to be that although it could produce the median in constant time after
one new data insertion (thereby performing superbly in a real-time environment),
it performed terribly on a batch run of the data.
Our conclusions at the research forum indicated that "recasting" algorithms as objects can open up the possibilities of algorithm development in many areas just as it made it possible to create algorithm C in the median arena. Specifically for median compuation, we reported that repeatedly queried real-time applications for median could be realized with algorithm C, but that for batch algorithms it was more efficient to stick with D, the best of the batch algorithms.
The above ends our findings at the OSU Research Forum. However, over the summer Ratna and I asked ourselves "What if you could have the best of both worlds? What if you could have one algorithm that performed well in both batch and real-time applications?" Therefore, our future work is to combine the real-time and batch median algorithms into an "intelligent" median component. Using a small amount of overhead, we propose that this "intelligent" algorithm can switch between the real-time and batch algorithms producing the median with the fastest possible method based on how the user is using the statistics machine. This exciting question not only fuels further algorithm study, but also explores the fundamentals of object-oriented programming and introduces concepts of artifical intelligence.
Last modified: Sat Dec 4 19:26:00 EST 1999