Research Interests

I am a member of the Network-Based Computing Laboratory. My advisor is Prof. D. K. Panda.

My research interests include high performance computing and network file systems.

I have been working on two projects: MVAPICH and NFS/RDMA.

 


MVAPICH: High Performance MPI over InfiniBand and other RDMA-enabled Networks

InfiniBand Architecture and iWARP are emerging as open interconnect and protocol standards for HPC clusters, data centers, and file/storage systems. MVAPICH is a high performance MPI library for InfiniBand and other RDMA-enabled interconnects. The MVAPICH project includes both MVAPICH (an MPI-1 implementation) and MVAPICH2 (an MPI-2 implementation). The MVAPICH/MVAPICH2 software is used by more than 465 organizations worldwide to extract the potential of InfiniBand and other RDMA-enabled interconnects when designing high-end computing systems and servers. It is also distributed by many InfiniBand and RDMA-enabled interconnect vendors in their software distributions. Both versions (MVAPICH and MVAPICH2) are also available with the OpenFabrics stack.

I have been working on the intra-node communication component of MVAPICH and on MVAPICH over the uDAPL interface. Typical clusters are built from SMP nodes, so I am exploring optimizations for MPI intra-node communication. Our current approach is to use user space shared memory and to organize the buffers so that they use the L2 cache efficiently. I have also designed and implemented MVAPICH over the uDAPL interface. uDAPL is a set of transport-independent, platform-independent user space APIs that exploit the RDMA (remote direct memory access) capabilities of next-generation interconnect technologies such as InfiniBand and iWARP. uDAPL is defined by the DAT Collaborative. By building MVAPICH on top of uDAPL, we have essentially allowed MVAPICH to run on any RDMA-enabled network that provides a uDAPL implementation.
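As a rough illustration of the user space shared memory idea, the Python sketch below passes messages between two mappings of the same shared segment through a small pool of fixed-size cells that are reused in order, so the working set stays small. It is not the MVAPICH design: the names CellQueue, CELL_SIZE, and NUM_CELLS are hypothetical, the cell sizes are arbitrary, and a real implementation would be written in C with proper memory barriers.

```python
import struct
from multiprocessing import shared_memory

CELL_SIZE = 64                 # hypothetical payload bytes per cell
NUM_CELLS = 4                  # small pool, reused so it can stay cache-resident
HEADER = struct.Struct("II")   # per-cell header: (full flag, payload length)
SLOT = HEADER.size + CELL_SIZE


class CellQueue:
    """Single-producer, single-consumer queue of fixed-size cells in shared memory."""

    def __init__(self, name=None):
        if name is None:
            # Create a new segment and mark every cell as empty.
            self.shm = shared_memory.SharedMemory(create=True, size=SLOT * NUM_CELLS)
            self.shm.buf[:SLOT * NUM_CELLS] = bytes(SLOT * NUM_CELLS)
        else:
            # Attach to a segment created by the peer.
            self.shm = shared_memory.SharedMemory(name=name)
        self.head = 0  # next cell the sender writes
        self.tail = 0  # next cell the receiver reads

    def send(self, data):
        """Copy data into the next cell; return False if that cell is still full."""
        if len(data) > CELL_SIZE:
            raise ValueError("message larger than one cell")
        off = self.head * SLOT
        full, _ = HEADER.unpack_from(self.shm.buf, off)
        if full:
            return False
        start = off + HEADER.size
        self.shm.buf[start:start + len(data)] = data      # first copy (copy-in)
        # Publish after the payload; real code needs a memory barrier here.
        HEADER.pack_into(self.shm.buf, off, 1, len(data))
        self.head = (self.head + 1) % NUM_CELLS
        return True

    def recv(self):
        """Copy the next message out of shared memory, or return None if empty."""
        off = self.tail * SLOT
        full, length = HEADER.unpack_from(self.shm.buf, off)
        if not full:
            return None
        start = off + HEADER.size
        data = bytes(self.shm.buf[start:start + length])  # second copy (copy-out)
        HEADER.pack_into(self.shm.buf, off, 0, 0)         # hand the cell back
        self.tail = (self.tail + 1) % NUM_CELLS
        return data

    def close(self, unlink=False):
        self.shm.close()
        if unlink:
            self.shm.unlink()


def demo():
    sender = CellQueue()                         # creates the segment
    receiver = CellQueue(name=sender.shm.name)   # second mapping of the same segment
    try:
        received = []
        for i in range(6):                       # more messages than cells: cells are reused
            assert sender.send(f"msg {i}".encode())
            received.append(receiver.recv())
        return received
    finally:
        receiver.close()
        sender.close(unlink=True)


if __name__ == "__main__":
    print(demo())
```

In this sketch every message costs two copies (into the cell and out of it), which is why keeping the cell pool small enough to remain in cache matters for this style of design.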

Current work and progress

Multi-core processors are a growing industry trend, as single-core processors have rapidly approached their practical speed limits. A multi-core processor places two or more computing cores on a single chip. Multi-core processors bring many novel architectural features, such as a shared L2 cache. However, to take full advantage of the multi-core architecture, applications and middleware should be designed in a multi-core aware manner. With the emergence of multi-core systems, more communication will take place within a node than before, which makes intra-node communication performance more critical. I am currently working on optimizing MPI intra-node communication for multi-core systems. We have designed a user space shared memory based scheme that uses the L2 cache more efficiently than the traditional design. We are also exploring novel approaches, such as dedicating a processing core to intra-node communication, to further improve performance.
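The idea of dedicating a core to communication can be sketched structurally as a helper that performs message copies while the compute thread continues working. The Python example below is only an illustration under stated assumptions: CommWorker and isend are hypothetical names, a thread stands in for a dedicated core, and Python threads do not give true parallelism for this kind of work. On Linux, a real helper process could additionally be pinned to a core, for example with os.sched_setaffinity.

```python
import queue
import threading


class CommWorker:
    """Helper thread that performs message copies on behalf of a compute thread."""

    def __init__(self):
        self.requests = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            item = self.requests.get()
            if item is None:              # shutdown sentinel
                break
            src, dst, done = item
            dst[:len(src)] = src          # the copy the compute thread no longer performs
            done.set()

    def isend(self, src, dst):
        """Queue a copy of src into dst; return an event set once the copy is done."""
        if len(src) > len(dst):
            raise ValueError("destination buffer too small")
        done = threading.Event()
        self.requests.put((src, dst, done))
        return done

    def shutdown(self):
        self.requests.put(None)
        self.thread.join()


def offload_demo():
    worker = CommWorker()
    dst = bytearray(16)
    done = worker.isend(b"halo exchange", dst)
    partial = sum(range(1000))            # computation proceeds while the copy is queued
    done.wait()                           # block only when the data is actually needed
    worker.shutdown()
    return partial, bytes(dst[:13])


if __name__ == "__main__":
    print(offload_demo())
```

The trade-off in such a design is that one core is taken away from computation, so it pays off only when the communication it absorbs would otherwise stall the compute cores for longer.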

Publications

L. Chai, Q. Gao and D. K. Panda, Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System, The 7th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), May 2007.

 

L. Chai, A. Hartono and D. K. Panda, Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters, The IEEE International Conference on Cluster Computing (Cluster 2006), September 2006.

 

L. Chai, R. Noronha and D. K. Panda, MPI over uDAPL: Can High Performance and Portability Exist Across Architectures?, The 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), May 2006.

 

S. Sur, L. Chai, H.-W. Jin and D. K. Panda, Shared Receive Queue Based Scalable MPI Design for InfiniBand Clusters, International Parallel and Distributed Processing Symposium (IPDPS 2006), April 2006.

 

  S. Sur, H.-W. Jin, L. Chai and D. K. Panda, RDMA Read Based Rendezvous Protocol for MPI over InfiniBand: Design Alternatives and Benefits, Symposium on Principles and Practice of Parallel Programming (PPOPP 2006), March 2006.

  L. Chai, R. Noronha, P. Gupta, G. Brown, and D. K. Panda, Designing a Portable MPI-2 over Modern Interconnects Using uDAPL Interface, EuroPVM/MPI 2005, Sept. 2005.

H.-W. Jin, S. Sur, L. Chai, and D. K. Panda, LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster, Proceedings of the 2005 International Conference on Parallel Processing (ICPP-05), June 2005.

L. Chai, S. Sur, H.-W. Jin, and D. K. Panda, Analysis of Design Considerations for Optimizing Multi-Channel MPI over InfiniBand, Proceedings of Workshop on Communication Architecture for Clusters (CAC 2005), in conjunction with the International Parallel and Distributed Processing Symposium (IPDPS 2005), April 2005.

Links to Related Websites

MVAPICH project website

DAT Collaborative

OpenFabrics Alliance

Message Passing Interface


OpenSolaris NFS over RDMA Project

The aim of this project is to design and develop a high performance RDMA-enabled RPC transport for the NFS client and server in OpenSolaris. Existing transports based on TCP and UDP are limited by copying overhead, which results in low aggregate bandwidth and high CPU utilization. In addition, the current design based on RDMA Read (the Read-Read design) has performance and security limitations. The objectives of this project are security, performance, and interoperability. The prototype is being incorporated into the OpenSolaris kernel code base.

Current work and progress

I am currently working on a scalability study of NFS/RDMA with multiple clients. We are also exploring pNFS (parallel NFS) design and optimization in OpenSolaris to eliminate the single-server bottleneck.

Technical Reports

Ranjit Noronha, Lei Chai, Thomas Talpey and Dhabaleswar K. Panda, Better NFS through RDMA and Efficient Memory Registration. OSU-CISRC-1/07-TR06. (pdf)

Weikuan Yu, Ranjit Noronha, Lei Chai, Shuang Liang, and Dhabaleswar K. Panda, Optimizing OpenSolaris NFS over RDMA. OSU-CISRC-4/06-TR43. (pdf)

Links to Related Work

NFS/RDMA project website

NFS/RDMA website in OpenSolaris

Linux NFS/RDMA website (CITI)