Fast bit-reversals on unitprocessors and shared-memory multiprocessors Z. Zhang and X. Zhang SIAM Journal on Scientific Computing, Vol. 22, No. 6, 2001, pp. 2113-2134. Abstract In this paper, we examine different methods using techniques of blocking, buffering, and padding for efficient implementations of bit-reversals. We evaluate the merits and limits of each technique and their application and architecture-dependent conditions for developing cache-optimal methods. Besides testing the methods on different uniprocessors, we conducted both simulation and measurements on two commercial SMP multiprocessors to provide architectural insights into the methods and their implementations. We present two contributions in this paper: (1) Our integrated blocking methods, which match cache associativity and TLB cache size and which fully use the available registers are cache-optimal and fast. (2) We show that our padding methods outperform other software oriented methods, and believe they are the fastest in terms of minimizing both CPU and memory access cycles. Since the padding methods are almost independent of hardware, they could be widely used on many uniprocessor workstations and multiprocessors.Back to the Publication Page.
Back to the HPCS Main Page at the Ohio State University.