PGAS and Hybrid MPI+PGAS Programming Models on Modern HPC Clusters
When: February 15, 2014 (1:30-5:00pm)
Where: Orlando, Florida, USA
Abstract
Multi-core processors, accelerators (GPGPUs), coprocessors (Xeon Phis) and
high-performance interconnects (InfiniBand, 10 GigE/iWARP and RoCE) with RDMA
support are shaping the architectures for next generation clusters. Efficient
programming models to design applications on these clusters as well as on
future exascale systems are still evolving. Partitioned Global Address Space
(PGAS) models provide an attractive alternative to the traditional Message
Passing Interface (MPI) model owing to their easy-to-use global shared memory
abstractions and lightweight one-sided communication. Hybrid MPI+PGAS
programming models are gaining attention as a possible solution for programming
exascale systems. These hybrid models allow codes designed with MPI to take
advantage of PGAS models incrementally, without the prohibitive cost of
re-designing complete applications. They also enable hierarchical design of
applications, using the different models to suit modern architectures. In this
tutorial, we provide an overview of the research and development taking place
along these directions and discuss associated opportunities and challenges as
we head toward exascale. We start with an in-depth overview of modern system
architectures with multi-core processors, GPU accelerators, Xeon Phi
coprocessors and high-performance interconnects. We present an overview of
language-based and library-based PGAS models, with a focus on two popular models:
UPC and OpenSHMEM. We introduce MPI+PGAS hybrid programming models and
highlight the advantages and challenges of designing a unified runtime to
support them. We examine the challenges in designing high-performance UPC,
OpenSHMEM and unified MPI+UPC/OpenSHMEM runtimes. We present case studies using
application kernels to demonstrate how one can exploit hybrid MPI+PGAS
programming models to achieve better performance without rewriting the complete
code. Using the publicly available MVAPICH2-X software package
(http://mvapich.cse.ohio-state.edu/overview/mvapich2x/), we provide concrete
case studies and in-depth evaluations of runtime- and application-level designs
targeted at modern system architectures with multi-core processors,
GPUs, Xeon Phis and high-performance interconnects.
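To make the global shared memory abstraction mentioned above concrete, the following is a minimal, illustrative UPC sketch; it is not drawn from the tutorial materials, and the array name, chunk size and output are purely hypothetical. Each thread fills the part of one shared array that has affinity to it, and thread 0 then reads a remote element through an ordinary array access rather than an explicit message.

    /* Illustrative UPC sketch (hypothetical names and sizes): one logically
       shared array is distributed cyclically across all threads. */
    #include <upc.h>
    #include <stdio.h>

    #define CHUNK 256
    shared int data[CHUNK * THREADS];   /* CHUNK elements have affinity to each thread */

    int main(void)
    {
        int i;

        /* each thread writes only the elements that have affinity to it */
        upc_forall (i = 0; i < CHUNK * THREADS; i++; &data[i])
            data[i] = MYTHREAD;

        upc_barrier;                    /* make all writes globally visible */

        /* thread 0 reads an element owned by the last thread: a one-sided
           remote read expressed as a plain array reference */
        if (MYTHREAD == 0)
            printf("last element was written by thread %d\n",
                   data[CHUNK * THREADS - 1]);

        return 0;
    }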
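The incremental hybrid approach described above can be pictured with the sketch below, in which an MPI program replaces a single step with a one-sided OpenSHMEM update while the rest of the code keeps its MPI structure. The sketch is illustrative only: the symmetric counter, the neighbor-increment step and the final reduction are hypothetical, and it assumes a unified runtime (such as MVAPICH2-X) that allows MPI and OpenSHMEM calls to coexist in one executable.

    /* Illustrative hybrid MPI+OpenSHMEM sketch (hypothetical workload),
       assuming a unified MPI+PGAS runtime such as MVAPICH2-X. */
    #include <mpi.h>
    #include <shmem.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        start_pes(0);                    /* OpenSHMEM initialization (OpenSHMEM 1.0 style) */

        int me   = shmem_my_pe();        /* OpenSHMEM PE id; assumed here to match the MPI_COMM_WORLD rank */
        int npes = shmem_n_pes();

        /* symmetric-heap allocation: the same remotely accessible variable exists on every PE */
        long *counter = (long *) shmalloc(sizeof(long));
        *counter = 0;
        shmem_barrier_all();

        /* one-sided step: atomically increment the counter on the right neighbor,
           with no matching receive posted by that neighbor */
        shmem_long_inc(counter, (me + 1) % npes);
        shmem_barrier_all();

        /* the surrounding application keeps using MPI unchanged */
        long total = 0;
        MPI_Reduce(counter, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        if (me == 0)
            printf("sum of counters across %d PEs = %ld\n", npes, total);

        shfree(counter);
        MPI_Finalize();
        return 0;
    }

The point of the unified runtime highlighted above is that both programming models in such a program are served by a single communication substrate, rather than two independently initialized stacks.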
Targeted Audience and Scope
This tutorial is aimed at various categories of people working in the areas
of PGAS and MPI programming models, high-performance communication and I/O,
networking, middleware, exascale computing and applications. Specific audiences
this tutorial targets include:
- Designers, developers and users of parallel programming models (MPI and PGAS)
- Scientists, engineers, researchers and students engaged in designing next-generation HPC systems and applications
- Newcomers to the field of HPC and exascale computing who are interested in familiarizing themselves with programming models, accelerators, networking, and RDMA
- Managers and administrators responsible for setting up next-generation HPC environments and high-end systems/facilities in their organizations/laboratories
The content level will be as follows: 30% beginner, 40% intermediate, and 30% advanced. There are no fixed prerequisites. Attendees with a general
knowledge of high-performance computing, networking, programming models, parallel applications, and related issues will be able to understand and appreciate
the material. The tutorial is designed so that attendees are exposed to the topics in a smooth and progressive manner.
Outline of the Tutorial
- Overview of Modern HPC System Architectures
- Multi-core Processors
- High Performance Interconnects (InfiniBand, 10GigE/iWARP and
RDMA over Converged Enhanced Ethernet (RoCE))
- Heterogeneity with Accelerators (GPUs) and Coprocessors (Xeon Phis)
- Introduction to Partitioned Global Address Space Models
- Language-based Models: Case Study with UPC
- Library-based Models: Case Study with OpenSHMEM
- Overview of MPI+PGAS Hybrid Programming Models and Benefits
- Designing Scalable and High Performance Support for PGAS and Hybrid MPI+PGAS Models on Modern Clusters
- Application-level Case Studies for using Hybrid MPI+PGAS Models
- Opportunities for Future Extensions and Enhancements
- Conclusion and Q&A
Brief Biography of Speakers
Dr. Dhabaleswar K. (DK)
Panda is a Professor of Computer Science at the Ohio State
University. He obtained his Ph.D. in computer engineering from the
University of Southern California. His research interests include
parallel computer architecture, high performance computing,
communication protocols, file systems, network-based computing, and
Quality of Service. He has published over 300 papers in major journals
and international conferences related to these research
areas. Dr. Panda and his research group members have been doing
extensive research on modern networking technologies including
InfiniBand, High-Speed Ethernet (HSE) and RDMA over
Converged Enhanced Ethernet (RoCE). His research group is currently
collaborating with National Laboratories and leading InfiniBand and
10GigE/iWARP companies on designing various subsystems of next
generation high-end
systems. The
MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE)
and MVAPICH2-X (Hybrid MPI and PGAS (OpenSHMEM and UPC))
software packages, developed by his research group, are
currently being used by more than 2,100 organizations worldwide (in 71
countries). This software has enabled several InfiniBand clusters
(including the 7th-ranked system) to get into the latest TOP500
ranking. These software packages are also available with the
OpenFabrics stack for network vendors (InfiniBand and iWARP), server
vendors and Linux distributors. Dr. Panda's research is supported by
funding from the US National Science Foundation, the US Department of Energy,
and several industry partners, including Intel, Cisco, SUN, Mellanox,
QLogic, NVIDIA
and NetApp.
He is an IEEE Fellow and a member of ACM.
More details about Dr. Panda, including a comprehensive CV
and publications, are available
here.
Sreeram Potluri
is a Ph.D. candidate in the Department of
Computer Science and Engineering at The Ohio State University. He is a member
of the Network-Based Computing Laboratory, led by Prof. D. K. Panda. His research
interests include high-performance interconnects, heterogeneous architectures,
parallel programming models and high-end computing applications. His current
focus is on designing high performance MPI, PGAS and hybrid MPI+PGAS library
runtimes for InfiniBand clusters with GPUs and Intel MIC co-processors. Sreeram
is involved in the design and development of the popular MVAPICH2 and
MVAPICH2-X software packages.
Last Updated: January 5, 2014