Statistics Machine
Advisor: Bruce Weide
Participants: Nick Hurley
Start Date: Winter Quarter, 2001
Project Status: Complete, awaiting upload.
Index:
Abstract:
Statistics Machine is intended to provide a quick and easy way for
RESOLVE/C++ programmers to calculate the Mean,
Median, Mode, Minimum, Maximum, Variance, and
Standard Deviation of a finite, discrete data
set. It is designed to have an intuitive
interface and fast, elegant implementations.
Calendar:
- Beginning of Winter Quarter, 2001
Met with Bruce to discuss possible projects for this quarter. Decided
on Statistics Machine, as I was taking
Statistics 427 that quarter, so the subject
was already on my mind.
- First meeting to February 27, 2001
Began thinking of ways to implement Statistics Machine. Best idea so
far had most operations in O(c) (constant)
time, but the rest were O(n) (linear)
time. Not good enough.
- February 27, 2001
Presented preliminary Statistics Machine information to the
group. Asked for suggestions on data
structures and algorithms. Scott Pike
suggested a "threaded binary tree" to
store the information, enabling me to
reduce insertion to O(log n) time and
calculation of median to O(c)
time. Bruce suggested using a heap for
mode to reduce it to O(log n) time, as
well, but I have not, as of yet, looked into that
any further.
- Winter Quarter 2002
After an admittedly long break because of CIS 560, summer vacation, and
a quarter wasted on a project that went
nowhere, I finally returned to
Statistics_Machine, with an idea to
re-implement it with a B+-tree. This
idea fell by the wayside, however, in an
effort to just finish a version of the
project. The final implementation uses a
threaded binary tree with a few other
data members to hold a couple sums. All
available operations except for
insertion run in constant time.
Project Development:
- Currently, preliminary code in "vanilla" C++ exists, as my mind thinks
better in C++. I do, however, intend to
translate the final working version to
RESOLVE/C++ to maintain consistency with the
rest of the RESOLVE catalog. The only real
problem that remains unsolved is how to
reduce calculating the mode from O(n) to
O(log n) or even (ideally) O(c) time, but
this is a secondary consideration, as the
mode is very rarely used in the real
world.
- The final code is also "vanilla" C++, as implementing the low-level
stuff in RESOLVE/C++ would not work out
well. The interface, however, does conform
to RESOLVE/C++, including mathematical
specifications. The mode() operation has
been removed from the Statistics_Machine.
Findings and Contributions:
- It is impossible to implement a mode function in O(c) time using a
threaded binary tree.
Links and References:
None yet. As soon as I upload the Statistics_Machine code, a link will
appear here.
Return to Europa
Nick
Hurley <hurley AT cis DOT
ohio-state DOT edu>
Last modified: Sun Mar 17 20:20:32 EST 2002