Accelerating Big Data Processing with Hadoop and Memcached on Datacenters with Modern Networking and Storage Architecture
When: February 16, 2014 (1:30-5:00 pm)
Where: Orlando, Florida, USA
Abstract
Apache Hadoop is gaining prominence in handling Big Data and analytics.
Similarly, Memcached is becoming important for large-scale query processing in
Web 2.0 environments. These middleware systems are traditionally written with
sockets and do not deliver the best performance on datacenters with modern
high-performance networks. In this tutorial, we will provide an in-depth
overview of the architecture of Hadoop components (HDFS, MapReduce, RPC, HBase,
etc.) and Memcached. We will examine the challenges in re-designing the
networking and I/O components of these middleware systems with modern
interconnects and protocols (such as InfiniBand, iWARP, RoCE, and RSocket) with
RDMA, and with modern storage architectures. Using the publicly available
Hadoop-RDMA software package (http://hadoop-rdma.cse.ohio-state.edu), we will
provide case studies of the new designs for several Hadoop components and their
associated benefits. Through these case studies, we will also examine the
interplay between high-performance interconnects, storage systems (HDD and SSD),
and multi-core platforms to achieve the best solutions for these components.
Targeted Audience and Scope
The tutorial content is planned for half a day. This tutorial is targeted at
various categories of people working in the areas of Big Data, including
high-performance Hadoop, high-performance communication and I/O architecture,
storage, networking, middleware, cloud computing, and applications. Specific
audiences this tutorial is aimed at include:
- Scientists, engineers, researchers, and students engaged in designing next-generation Big Data systems and applications
- Designers and developers of Big Data, Hadoop, and Memcached middleware
- Newcomers to the field of Big Data who are interested in familiarizing themselves with Hadoop, Memcached, RDMA, and high-performance networking
- Managers and administrators responsible for setting up next-generation Big Data environments and high-end systems/facilities in their organizations/laboratories
The content level will be as follows: 30% beginner, 40% intermediate, and 30%
advanced. There is no fixed prerequisite. As long as attendees have a general
knowledge of Big Data, Hadoop, high-performance computing, networking and
storage architecture, and related issues, they will be able to understand and
appreciate the material. The tutorial is designed so that attendees are exposed
to the topics in a smooth and progressive manner.
Outline of the Tutorial
- Introduction to Big Data Applications and Analytics
- Overview of Hadoop MapReduce Programming Model
- Architecture Overview of Apache Hadoop and Memcached
  - HDFS
  - MapReduce
  - RPC
  - HBase
  - Memcached
- Overview of Modern Interconnects, Protocols and Storage Architecture
  - InfiniBand and RDMA
  - 10/40 GigE, iWARP and RoCE technologies
  - RSocket and SDP protocols
  - SSD-based storage
- Challenges in Accelerating Hadoop and Memcached on Modern Datacenters
- Overview of Benchmarks and Applications using Hadoop and Memcached
- Acceleration Case Studies and In-Depth Performance Evaluation
  - HDFS over InfiniBand with RDMA and SSD
  - MapReduce over InfiniBand with RDMA and SSD
  - RPC over InfiniBand with RDMA
  - HBase over InfiniBand with RDMA and SSD
  - Memcached over InfiniBand with RDMA and SSD
- Hadoop-RDMA Distribution with Optimizations and Tuning
- Ongoing and Future Activities for Hadoop Accelerations
- Conclusion and Q&A
Brief Biography of Speakers
Dr. Dhabaleswar K. (DK) Panda is a Professor of Computer Science at the Ohio
State University. He obtained his Ph.D. in computer engineering from the
University of Southern California. His research interests include parallel
computer architecture, high-performance computing, communication protocols,
file systems, network-based computing, and Quality of Service. He has published
over 300 papers in major journals and international conferences related to
these research areas. Dr. Panda and his research group members have been doing
extensive research on modern networking technologies including InfiniBand, HSE,
and RDMA over Converged Enhanced Ethernet (RoCE). His research group is
currently collaborating with National Laboratories and leading InfiniBand and
10GigE/iWARP companies on designing various subsystems of next-generation
high-end systems. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and
RoCE) open-source software package, developed by his research group, is
currently being used by more than 2,100 organizations worldwide (in 71
countries). This software has enabled several InfiniBand clusters (including
the 7th-ranked system) to enter the latest TOP500 ranking. These software
packages are also available with the OpenFabrics stack for network vendors
(InfiniBand and iWARP), server vendors, and Linux distributors. The new
RDMA-enabled Apache Hadoop package, consisting of acceleration for HDFS,
MapReduce, and RPC, is publicly available from
http://hadoop-rdma.cse.ohio-state.edu. Dr. Panda's research is supported by
funding from the US National Science Foundation, the US Department of Energy,
and several industry partners including Intel, Cisco, SUN, Mellanox, QLogic,
NVIDIA, and NetApp. He is an IEEE Fellow and a member of ACM. More details
about Dr. Panda, including a comprehensive CV and publications, are available
here.
Dr. Xiaoyi Lu is a postdoctoral researcher in the Department of Computer
Science and Engineering at the Ohio State University, USA. He received his
Ph.D. degree in Computer Science from the Institute of Computing Technology,
Chinese Academy of Sciences, Beijing, China. His current research interests
include high-performance interconnects and protocols, Big Data, the Hadoop
ecosystem, and parallel computing models (MPI/PGAS). He has published over 20
papers in major journals and international conferences related to these
research areas. He has been actively involved in various professional
activities for academic journals and conferences. Recently, Dr. Lu has been
working on the design and development of the high-performance Hadoop-RDMA
software package (http://hadoop-rdma.cse.ohio-state.edu). He is a member of
IEEE. More details about Dr. Lu are available here.
Last Updated: January 09, 2014