Cached DRAM for ILP Processor Memory Access Latency Reduction
Zhao Zhang, Zhichun Zhu, and Xiaodong Zhang
IEEE Micro, Vol. 21, No. 4, July/August 2001, pp. 22-32.

Abstract

As the speed gap between the processor and the memory continues to widen, data-intensive applications place increasing demands on the main memory system. Cached DRAM is an existing technology that adds a small cache onto the DRAM chip. By exploiting the locality of memory access streams that miss the L2 cache, a cached DRAM can reduce the average DRAM access time. Previous studies have shown that cached DRAM is effective on relatively simple processor models with small data caches or none at all. Some recent studies have shown that this technique can also be effective on modern ILP processors. To further investigate the ILP effects and to compare cached DRAM with other advanced DRAM organizations and interleaving techniques, we present a study of its design and optimization in the context of processors with full ILP capabilities and large data caches. Using execution-driven simulation, we evaluate its performance effectiveness with eight selected data-intensive SPECfp95 programs and the TPC-C workload. Our study provides three new findings: (1) cached DRAM consistently shows its performance advantage as the ILP degree increases; (2) contemporary DRAM schemes, such as SDRAM, Enhanced SDRAM, Rambus DRAM, and Direct Rambus DRAM, do not exploit the memory access locality of data-intensive workloads as effectively as a cached DRAM does; and (3) compared with a highly effective permutation-based DRAM interleaving technique, cached DRAM can still gain substantial performance improvement because it fully utilizes the bus bandwidth by overlapping a large number of concurrent memory accesses and minimizes conflict misses in the on-memory caches and/or row buffers.