D. K. Panda, Network-Based Computing: Issues, Trends, and Challenges
1.a. Overview of Interconnection Networks and Issues (Switching, Routing and Deadlock freedom), (Slides for papers 1.a-1.c)
1.b. N.J. Boden, D. Cohen, R.E. Felderman, A.E. Kulawik, C.L. Seitz, J.N. Seizovic, and W. Su. Myrinet: A Gigabit-per-second Local Area Network. IEEE Micro, 15(1):29--36, February 1995.
1.c. Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, and Eitan Frachtenberg. The Quadrics Network (QsNet): High-Performance Clustering Technology. In Hot Interconnects 9, Stanford University, Palo Alto, CA, August 2001.
1.d. Introduction to InfiniBand, A white paper from Mellanox Corporation. (Slides for papers 1.d-1.g)
1.e. InfiniBand in the Enterprise Data Center, A white paper from Mellanox Corporation.
1.f. Introduction to InfiniBand for End Users, A white paper from Mellanox Corporation.
1.g. 4th Generation Server & Storage Adapter Architecture, A white paper from Mellanox Corporation.
1.h. 40 GbE: What, Why & Its Market Potential, A white paper from Ethernet Alliance. (Slides for papers 1.h-1.l)
1.i. Overview of Requirements and Applications for 40 Gigabit and 100 Gigabit Ethernet, A white paper from Ethernet Alliance.
1.j. Time for TOE: The Benefits of 10Gbps TCP Offload, White Paper, Chelsio Corporation.
1.k. Understanding iWARP: Eliminating Overhead and Latency in multi-Gb Ethernet Networks, White Paper, NetEffect.
1.l. H. Subramoni, P. Lai, M. Luo and D. K. Panda, RDMA over Ethernet - A Preliminary Study, HPIDC Workshop, in conjunction with Cluster '09, 2009.
2.a. Raoul A.F. Bhoedjang, Tim Ruhl, Henri E. Bal, User-Level Network Interface Protocols, IEEE Computer, Nov. 1998, pp. 53-60. (Slides for papers 2.a-2.c)
2.b. P. Shivam, P. Wyckoff, and D. K. Panda, EMP: Zero-copy OS-bypass NIC-driven Gigabit Ethernet Message Passing, Supercomputing (SC '01), November 2001.
2.c. InfiniBand Host Channel Adapter Verb Implementer's Guide, Intel Corporation.
3.a. W. Gropp, E. Lusk, N. Doss, and T. Skjellum, A high-performance, portable implementation of the MPI message-passing interface standard, Tech. Report ANL/MCS-P567-0296, Argonne National Laboratory, February 1996. (Slides for papers 3.a-3.c)
3.b. Al Geist, Ewing Lusk, William Gropp, William Saphir, Steve Huss-Lederman, Tony Skjellum, Andrew Lumsdaine, and Marc Snir, MPI-2: Extending the Message-Passing Interface. In EuroPar '96, February 1996.
3.c. William Gropp and Ewing Lusk, MPICH Abstract Device Interface, Version 3.3, Dec. 2001.
3.d. J. Protic, M. Tomasevic, and V. Milutinovic. Distributed shared memory: Concepts and systems. IEEE Parallel & Distributed Technology, 4(2):63--79, Summer 1996. (Slides for papers 3.d-3.f)
3.e. C. Amza, A. L. Cox, et al. TreadMarks: Shared memory computing on networks of workstations. IEEE Computer, 29(2):18--28, February 1996.
3.f. Alan L. Cox et al. A Performance Comparison of Homeless and Home-Based Lazy Release Consistency Protocols in Software Shared Memory, HPCA-5 Conference.
3.g. Jarek Nieplocha, Bruce Palmer, Vinod Tipparaju, Manojkumar Krishnan, Harold Trease and Edo Apra, Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit, International Journal of High Performance Computing Applications, Vol. 20, No. 2, pp. 203-231, 2006. (Slides for papers 3.g-3.j)
3.h. J. Nieplocha, V. Tipparaju, M. Krishnan, and D. Panda. High Performance Remote Memory Access Communications: The ARMCI Approach. International Journal of High Performance Computing and Applications, Vol. 20(2), pp. 233-253, 2006.
3.i. C. Barton, C. Cascaval, S. Chatterjee, G. Almasi, Y. Zheng, M. Farreras, J. Amaral. Shared Memory Programming for Large Scale Machines, ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation (PLDI).
3.j. Philippe Charles, et al., X10 - An Object-Oriented Approach to Non-Uniform Cluster Computing, Proc. of OOPSLA '05.
4.a. J. Liu, J. Wu, S. P. Kini, P. Wyckoff, and D. K. Panda, High Performance RDMA-Based MPI Implementation over InfiniBand, Int'l Conference on Supercomputing (ICS '03), June 2003. (Slides for papers 4.a-4.c)
4.b. S. Sur, H.-W. Jin, L. Chai and D. K. Panda, RDMA Read Based Rendezvous Protocol for MPI over InfiniBand: Design Alternatives and Benefits, Symposium on Principles and Practice of Parallel Programming (PPOPP'06), March 29-31, 2006, Manhattan, New York City.
4.c. W. Huang, G. Santhanaraman, H.-W. Jin, and D. K. Panda, Design Alternatives and Performance Trade-offs for Implementing MPI-2 over InfiniBand, EuroPVM/MPI 2005, Sept. 2005.
4.d. S. Sur, M. Koop, and D. K. Panda, High-Performance and Scalable MPI over InfiniBand with Reduced Memory Usage: An In-Depth Performance Analysis, SuperComputing (SC 06), November, 2006. (Slides for papers 4.d-4.f)
4.e. M. Koop, S. Sur and D. K. Panda, Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram, IEEE International Conference on Cluster Computing (Cluster'07), Austin, TX, September 2007.
4.f. M. Koop, T. Jones and D. K. Panda, MVAPICH-Aptus: Scalable High-Performance Multi-Transport MPI over InfiniBand, Int'l Parallel and Distributed Processing Symposium (IPDPS), 2008.
4.g. L. Chai, A. Hartono and D. K. Panda, Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters, The IEEE International Conference on Cluster Computing (Cluster 2006), September 2006. (Slides for papers 4.g-4.i)
4.h. H.-W. Jin, S. Sur, L. Chai, and D. K. Panda, LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster, International Conference on Parallel Processing (ICPP-05), June 2005.
4.i. L. Chai, P. Lai, H.-W. Jin, and D. K. Panda, Designing An Efficient Kernel-level and User-level Hybrid Approach for MPI Intra-node Communication on Multi-core Systems, International Conference on Parallel Processing (ICPP-08), Sept 2008.
4.j. J. Liu, A. Vishnu, D. K. Panda, Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation, SuperComputing 2004 Conference (SC '04), November, 2004. (Slides for papers 4.j-4.l)
4.k. H. Subramoni, P. Lai, S. Sur and D. K. Panda, Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters, International Conference on Parallel Processing (ICPP '10), Sept. 2010.
4.l. Q. Gao, W. Yu, W. Huang and D. K. Panda, Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand, Int'l Conference on Parallel Processing (ICPP), August 2006.
4.m. J. Jose, M. Luo, S. Sur and D. K. Panda, Unifying UPC and MPI Runtimes: Experience with MVAPICH, Fourth Conference on Partitioned Global Address Space Programming Model (PGAS '10), Oct. 2010.
4.n. J. Jose, S. Potluri, M. Luo, S. Sur and D. K. Panda, UPC Queues for Scalable Graph Traversals: Design and Evaluation on InfiniBand Clusters, Fifth Conference on Partitioned Global Address Space Programming Model (PGAS '11), Oct. 2011. (Hardcopy will be distributed in class.)
4.o. M. Luo, J. Jose, S. Sur and D. K. Panda, Multi-threaded UPC Runtime with Network Endpoints: Design Alternatives and Evaluation on Multi-core Architectures, Int'l Conference on High Performance Computing (HiPC '11), Dec. 2011. (Hardcopy will be distributed in class.)
5.a. A. Mamidala, A. Vishnu and D. K. Panda, Efficient Shared Memory and RDMA based Design for MPI Allgather over InfiniBand, EuroPVM/MPI, September 2006.
5.b. R. Kumar, A. Mamidala and D. K. Panda, Scaling Alltoall Collective on Multi-core Systems, CAC '08, in conjunction with IPDPS '08.
5.c. K. Kandalla, H. Subramoni, G. Santhanaraman, M. Koop and D. K. Panda, Designing Multi-Leader-Based Allgather Algorithms for Multi-Core Clusters, CAC '09, in conjunction with IPDPS '09.
5.d. K. Kandalla, H. Subramoni, J. Vienne, K. Tomko, S. Sur and D. K. Panda, Designing Non-blocking Broadcast with Collective Offload on InfiniBand Clusters: A Case Study with HPL, Hot Interconnects '11, Aug. 2011. (Hardcopy will be distributed in class.)
5.e. K. Kandalla, E. P. Mancini, S. Sur and D. K. Panda, Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters, International Conference on Parallel Processing (ICPP '10), Sept. 2010.
5.f. H. Subramoni, K. Kandalla, J. Vienne, S. Sur, B. Barth, K. Tomko, R. McLay, K. Schulz and D. K. Panda, Design and Evaluation of Network Topology-/Speed-Aware Broadcast Algorithms for InfiniBand Clusters, IEEE Cluster '11, Sept. 2011.
6.a. H. Wang, S. Potluri, M. Luo, A. Singh, S. Sur and D. K. Panda, MVAPICH2-GPU: Optimized GPU to GPU Communication for InfiniBand Clusters, Int'l Supercomputing Conference (ISC), June 2011. (Hardcopy will be distributed in class.)
6.b. H. Wang, S. Potluri, M. Luo, A. Singh, X. Ouyang, S. Sur and D. K. Panda, Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2, IEEE Cluster '11, Sept. 2011. (Hardcopy will be distributed in class.)
6.c. A. Singh, S. Potluri, H. Wang, K. Kandalla, S. Sur and D. K. Panda, MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefits, Workshop on Parallel Programming on Accelerator Clusters (PPAC '11), held in conjunction with Cluster '11, Sept. 2011. (Hardcopy will be distributed in class.)
7.a. P. Balaji, K. Vaidyanathan, S. Narravula, H.-W. Jin, and D. K. Panda, Designing Next-Generation Data-Centers with Advanced Communication Protocols and Systems Services, Workshop on NSF Next Generation Software (NGS) Program; held in conjunction with IPDPS, Greece, 2006.
7.b. K. Vaidyanathan, S. Narravula, and D. K. Panda, DDSS: A Low-Overhead Distributed Data Sharing Substrate for Cluster-Based Data-Centers over Modern Interconnects, Int'l Symposium on High Performance Computing (HiPC 06), December, 2006.
7.c. S. Narravula, P. Balaji, K. Vaidyanathan, S. Krishnamoorthy, J. Wu, and D. K. Panda, Supporting Strong Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand, SAN-03 Workshop (in conjunction with HPCA), Feb. 2004.
7.d. S. Narravula, A. Mamidala, A. Vishnu, K. Vaidyanathan and D. K. Panda, High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations, Int'l Symposium on Cluster Computing and the Grid (CCGrid), Rio de Janeiro - Brazil, May 2007.
7.e. J. Jose, H. Subramoni, M. Luo, M. Zhang, J. Huang, M. W. Rahman, N. S. Islam, X. Ouyang, H. Wang, S. Sur and D. K. Panda, Memcached Design on High Performance RDMA Capable Interconnects, Int'l Conference on Parallel Processing (ICPP '11), Sept. 2011. (Hardcopy will be distributed in class.)
7.f. J. Huang, X. Ouyang, J. Jose, M. Wasi-ur-Rahman, H. Wang, M. Luo, H. Subramoni, C. Murthy and D. K. Panda, High-Performance Design of HBase with RDMA over InfiniBand, Int'l Parallel and Distributed Processing Symposium (IPDPS '12), May 2012. (Hardcopy will be distributed in class.)
7.g. W. Huang, Q. Gao, J. Liu, and D. K. Panda, High Performance Virtual Machine Migration with RDMA over Modern Interconnects, IEEE International Conference on Cluster Computing (Cluster'07), Austin, TX, September 2007. Selected as a BEST Paper.
8.a. S. Narravula, H. Subramoni, P. Lai, R. Noronha and D. K. Panda, Performance of HPC Middleware over InfiniBand WAN, Int'l Conference on Parallel Processing (ICPP '08), May 2008.
8.b. P. Lai, H. Subramoni, S. Narravula, A. Mamidala and D. K. Panda, Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand, Int'l Conference on Parallel Processing (ICPP '09), Sept. 2009.
8.c. H. Subramoni, P. Lai, R. Kettimuthu and D. K. Panda, High Performance Data Transfer in Grid Environment Using GridFTP over InfiniBand, Int'l Symposium on Cluster Computing and the Grid (CCGrid), May 2010.
9.a. J. Wu, P. Wyckoff, and D. K. Panda. PVFS over InfiniBand: Design and Performance Evaluation. International Conference on Parallel Processing (ICPP 03). Oct. 2003.
9.b. R. Noronha, L. Chai, T. Talpey and D. K. Panda, Designing NFS With RDMA For Security, Performance and Scalability, Int'l Conference on Parallel Processing, Xi'an, China, September 2007.
9.c. X. Ouyang, R. Rajachandrasekar, X. Besseron, H. Wang, J. Huang and D. K. Panda, CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart, Int'l Conference on Parallel Processing (ICPP '11), Sept. 2011.
9.d. R. Rajachandrasekar, X. Ouyang, X. Besseron, V. Meshram and D. K. Panda, Can Checkpoint/Restart Mechanisms Benefit from Hierarchical Data Staging? Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids (Resilience '11), held in conjunction with EuroPar, Aug. 2011.