Network-Based Computing for HPC, Big Data, Deep Learning, and Cloud
Instructor: Prof. Dhabaleswar K. (DK) Panda
Credits: 3
Course Description:
The objective of this course is to understand the principles and the
practice of the emerging network-based computing paradigm. We will
study advances in networking, computing, accelerator, and storage
technologies as well as the computational/networking demands of
current and emerging applications. A brief overview of emerging
network-based computing applications will be covered. System
architectures for different kinds of network-based computing systems
(HPC, Big Data, Deep Learning, and Cloud) will be analyzed. The course
will focus on the role of computing technologies
(multi-core/many-core), networking technologies (InfiniBand,
High-Speed Ethernet, Omni-Path, proprietary), accelerators (GPGPUs and
FPGAs), storage technologies (SSDs, NVMe-based SSDs, Burst Buffers),
programming models and runtimes (MPI, PGAS, Hybrid MPI+PGAS, Hadoop
MapReduce, Spark, Caffe, CNTK, and TensorFlow), storage/file systems
(parallel file systems, HDFS, and Ceph), fault tolerance,
virtualization, and power-/energy-aware issues in designing
next-generation exascale systems. Limitations of current solutions,
the impact of next-generation technologies on designing future
exascale computing systems and applications, and future research
challenges will be discussed.
Topics to be Covered (Tentative)
- Overview of Network-Based Computing
- Trends in Designing Systems for HPC, Big Data, Deep Learning, and Cloud
- Computing Technologies and Trends
- Multi-Core (x86, ARM, OpenPower), GPGPUs, and FPGAs
- Networking Technologies and Trends
- InfiniBand, Ethernet, Omni-Path, and Proprietary
- Storage Technologies and Trends
- SSDs, NVMe-based SSDs, and Burst Buffers
- User-Level Communication Protocols and Benefits
- Programming Models and Environments
- MPI, Shared Memory, and Distributed Shared Memory
- Partitioned Global Address Space (PGAS) - UPC, OpenSHMEM, and CAF
- Hadoop MapReduce, HBase, and Spark
- Caffe, CNTK, and TensorFlow
- High-Performance and Scalable Implementation of Programming Models
and Environments
- RDMA-based Networking
- Multi-core
- GPGPUs and FPGAs
- Collective Communication Optimizations
- Shared-Memory-aware, Offload-based, Topology-aware, and Power-aware
- Scalable I/O and File Systems
- Designing High-Performance and Scalable Data Centers
- Multi-tier Data Centers
- Memcached Designs (Memory-based and Hybrid Memory+SSD-based)
- Designing High-Performance and Scalable Big Data Systems
- HDFS, MapReduce, RPC, HBase, and Spark
- RDMA-based Networking and Multi-core
- Designing High-Performance and Scalable Deep Learning Systems
- Caffe, CNTK, and TensorFlow
- RDMA-based Networking and Multi-core
- Designing High-Performance Cloud Environments
- Virtualization including SR-IOV
- VMM-Bypass
- WAN Technology and RDMA
The objective of this course is to understand the principles and
practice of the emerging network-based computing paradigm. We will
study advances in networking and computing technologies as well as the
computational/networking demands of current and emerging
applications. A brief overview of emerging network-based computing
applications will be covered. System architectures for different kinds
of network-based computing systems will be analyzed. Computational,
communication, networking, I/O, and QoS requirements of emerging
network-based computing applications, together with emerging trends
(such as GPUs/accelerators, Partitioned Global Address Space (PGAS),
Hadoop/MapReduce/HBase/Spark, Deep Learning, and virtualization), will
be analyzed. Challenges and research issues in designing network-based
computing systems and applications will be discussed in detail.
Limitations of current solutions, the impact of next-generation
networking technologies on designing future network-based computing
systems and applications, and future research challenges will be
discussed.
Last Updated: August 5, 2018