
Bibliography

1
N. Ahmed.
Locality Enhancement for Imperfectly Nested Loops.
PhD thesis, Cornell Computer Science Department, 2000.

2
N. Ahmed, N. Mateev, and K. Pingali.
Synthesizing transformations for locality enhancement of imperfectly nested loops.
In Proc. of ACM Intl. Conf. on Supercomputing, 2000.

3
G. Almasi, C. Cascaval, and D. A. Padua.
Calculating stack distances efficiently.
In Proceedings of the workshop on Memory system performance, pages 37-43. ACM Press, 2002.

4
J. M. Anderson, S. P. Amarasinghe, and M. S. Lam.
Data and Computation Transformations for Multiprocessors.
In Proc. of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, July 1995.

5
G. Baumgartner, D. Bernholdt, D. Cociorva, R. Harrison, S. Hirata, C. Lam, M. Nooijen, R. Pitzer, J. Ramanujam, and P. Sadayappan.
A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry.
In Proc. of Supercomputing 2002, November 2002.

6
C. Cascaval and D. A. Padua.
Estimating cache misses and locality using stack distances.
In Proceedings of the 17th annual international conference on Supercomputing, pages 150-159. ACM Press, 2003.

7
S. Chatterjee, E. Parker, P. J. Hanlon, and A. R. Lebeck.
Exact analysis of the cache behavior of nested loops.
In Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, pages 286-297. ACM Press, 2001.

8
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam, M. Nooijen, D. Bernholdt, and R. Harrison.
Space-Time Trade-Off Optimization for a Class of Electronic Structure Calculations.
In Proc. of ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (PLDI), pages 177-186, 2002.

9
D. Cociorva, X. Gao, S. Krishnan, G. Baumgartner, C. Lam, P. Sadayappan, and J. Ramanujam.
Global Communication Optimization for Tensor Contraction Expressions under Memory Constraints.
In Proc. of Seventeenth International Parallel and Distributed Processing Symposium (IPDPS), 2003.

10
D. Cociorva, J. Wilkins, G. Baumgartner, P. Sadayappan, J. Ramanujam, M. Nooijen, D. E. Bernholdt, and R. Harrison.
Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization.
In Proc. of the Intl. Conf. on High Performance Computing, volume 2228, pages 237-248. Springer-Verlag, 2001.

11
D. Cociorva, J. Wilkins, C. Lam, G. Baumgartner, P. Sadayappan, and J. Ramanujam.
Loop optimization for a class of memory-constrained computations.
In Proc. of the Fifteenth ACM International Conference on Supercomputing (ICS'01), pages 500-509, 2001.

12
S. Ghosh, M. Martonosi, and S. Malik.
Precise miss analysis for program transformations with caches of arbitrary associativity.
In Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, pages 228-239. ACM Press, 1998.

13
S. Ghosh, M. Martonosi, and S. Malik.
Cache miss equations: a compiler framework for analyzing and tuning memory behavior.
ACM Trans. Program. Lang. Syst., 21(4):703-746, 1999.

14
I. Kodukula, N. Ahmed, and K. Pingali.
Data-centric multi-level blocking.
In Proc. of SIGPLAN Conf. Programming Language Design and Implementation, 1997.

15
C. Lam.
Performance Optimization of a Class of Loops Implementing Multi-Dimensional Integrals.
PhD thesis, The Ohio State University, Columbus, OH, August 1999.

16
C. Lam, D. Cociorva, G. Baumgartner, and P. Sadayappan.
Memory-optimal evaluation of expression trees involving large objects.
In Proc. of Intl. Conf. on High Perf. Comp., 1999.

17
C. Lam, D. Cociorva, G. Baumgartner, and P. Sadayappan.
Optimization of Memory Usage and Communication Requirements for a Class of Loops Implementing Multi-Dimensional Integrals.
In Proc. of Twelfth LCPC Workshop, 1999.

18
C. Lam, P. Sadayappan, and R. Wenger.
On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution.
Parallel Processing Letters, 7(2):157-168, 1997.

19
C. Lam, P. Sadayappan, and R. Wenger.
Optimization of a Class of Multi-Dimensional Integrals on Parallel Machines.
In Proc. of Eighth SIAM Conf. on Parallel Processing for Scientific Computing, 1997.

20
A. W. Lim, S.-W. Liao, and M. S. Lam.
Blocking and array contraction across arbitrarily nested loops using affine partitioning.
In Proc. of the Eighth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 103-112. ACM Press, 2001.

21
K. S. McKinley, S. Carr, and C.-W. Tseng.
Improving Data Locality with Loop Transformations.
ACM TOPLAS, 18(4):424-453, July 1996.


rajkiran panuganti 2005-05-12