InfiniBand and Ethernet Architectures for Scientific and Enterprise Computing:
Opportunities and Challenges
When: January 10, 2010 (Afternoon)
Where: Bangalore, India
Abstract
The InfiniBand Architecture (IB) and 10-Gigabit Ethernet (10GE) are
generating considerable interest as building blocks for next
generation High Performance Computing (HPC) systems and enterprise
datacenters. This tutorial will provide an overview of these emerging
interconnect architectures, their offered features, their current
market standing, and their suitability for prime-time HPC. It will
start with a brief overview of IB, 10GE and their architectural
features. An overview of the emerging OpenFabrics stack, which
encapsulates both IB and 10GE in a unified manner, will be
presented. IB and 10GE hardware/software solutions and the market
trends will be highlighted. Finally, sample performance numbers will
show what these technologies can achieve in different environments
such as MPI, Sockets, Parallel File Systems, Multi-tier Datacenters,
and Virtual Machines.
Targeted Audience and Scope
This tutorial is targeted at people working in the areas of high
performance communication and I/O architecture, storage, networking,
middleware, virtualization, and applications related to high-end
systems:
- Newcomers to the field of high-speed networking architecture who are
looking to get familiar with IB and 10GE technologies.
- Scientists, engineers, and researchers working on the design and
development of next generation high-end systems including clusters,
data centers, and storage centers.
- System administrators of large-scale clusters.
- Developers of next generation networked computing middleware and
applications.
- Managers and administrators responsible for setting up next
generation high-end systems and facilities in their
organizations/laboratories.
The tutorial content is planned for half a day. The content level is
broken down as 50% beginner, 30% intermediate, and 20% advanced. There
is no fixed prerequisite. Attendees with a general knowledge of high
performance computing, networking, and storage will be able to
understand and appreciate the material. The tutorial is designed so
that attendees are exposed to the topics in a smooth and progressive
manner.
Outline of the Tutorial
- What are InfiniBand and 10GigE/iWARP?
- TCP vs. User-level communication protocols
- Requirements (communication, I/O, performance, cost, RAS) from the
perspective of designing next generation high-end systems and
scalable data centers
- Overview of InfiniBand Architecture
- Architecture and Basic Hardware Components
- Novel Features (Hardware Protocol Offload and
Communication Semantics)
- IB Verbs Interface
- Management and Services (Subnet Management and
Hardware Support for Scalable Network Management)
- Overview of 10-Gigabit Ethernet Family
- Architecture and Basic Components (Similarities
and Differences with InfiniBand)
- Out-of-Order Data Placement
- Dynamic and Fine-grained Data Rate Control
- Existing Implementations of 10GE/iWARP
- InfiniBand/10-GigE Convergence Technologies
- Overview of InfiniBand and 10-GigE Products
(hardware and software)
- Vendors, Switches, and Host Channel Adapters
- Overview of ConnectX architecture
- Overview of OpenFabrics Architecture and Convergence
- Pointers to IBA and 10GigE installations
- Designing High-end Systems with InfiniBand and iWARP: Research
Challenges, Case Studies and Performance Evaluation
- High-end clusters with MPI-1 and MPI-2
- Sockets (SDP and IPoIB)
- File Systems (Lustre and NFS over RDMA)
- Multi-tier Datacenters
- Virtualization Support (Xen-IB)
- Conclusions, Final Q&A, and Discussion
Brief Biography of Speakers
Dr. Dhabaleswar K. (DK) Panda is a Professor of Computer Science at
the Ohio State University. He obtained his Ph.D. in computer
engineering from the University of Southern California. His research
interests include parallel computer architecture, high performance
computing, communication protocols, file systems, network-based
computing, and Quality of Service. He has published over 250 papers
in major journals
and international conferences related to these research
areas. Dr. Panda and his research group members have been doing
extensive research on modern networking technologies including
InfiniBand and 10GigE/iWARP. His research group is currently
collaborating with National Laboratories and leading InfiniBand and
10GigE/iWARP companies on designing various subsystems of next
generation high-end systems. The
MVAPICH/MVAPICH2 (High Performance MPI over InfiniBand and iWARP)
open-source software packages, developed by his research group, are
currently being used by more than 1,000 organizations worldwide (in 53
countries). This software has enabled several InfiniBand clusters
(including the 5th ranked one) to get into the latest TOP500
ranking. These software packages are also available with the Open
Fabrics stack for network vendors (InfiniBand and iWARP), server
vendors and Linux distributors. Dr. Panda's research is supported by
funding from the US National Science Foundation, the US Department of
Energy, and several industry sponsors including Intel, Cisco, SUN,
Mellanox, QLogic, and NetApp. He is an IEEE Fellow and a member of
ACM. More details about Dr. Panda, including a comprehensive CV and
publications, are available here.
Dr. Pavan Balaji holds a joint appointment as an Assistant Computer
Scientist at the Argonne National Laboratory and as a research fellow
of the Computation Institute at the University of Chicago. He
received his Ph.D. from the Computer Science and Engineering
department at the Ohio State University. His research interests
include high-speed interconnects, efficient protocol stacks, parallel
programming models and middleware for communication and I/O, and job
scheduling and resource management. He has more than 60 publications
in these areas and has delivered nearly 75 talks and tutorials at
various conferences and research institutes. He has received several
awards for his research activities including an Outstanding Researcher
award at the Ohio State University, the Director's Technical
Achievement award at Los Alamos National Laboratory, and several best
paper and other awards. Dr. Balaji has also served as a chairman or
editor in more than a dozen journals, conferences and workshops
including ICPP 2010, Hot Interconnects 2010, P2S2 workshop 2010, ICCCN
2010, CCGrid 2010, and JHPCA 2010, and as a technical program
committee member in numerous conferences and workshops. He is a member
of the IEEE and ACM. More details about Dr. Balaji are available
here.
Last Updated: December 18, 2009