5449: Introduction to High-Performance Deep/Machine Learning
Instructors: Prof. Dhabaleswar K. (DK) Panda and Prof. Hari Subramoni
Autumn 2023
Course Number: 5449
Class Number: 36643 (Grad) and 38406 (Undergrad)
Pre-Requisite: 2431 or 3430; and 3521 or 5521; or Grad standing.
Credits: 3
Course Time: TuTh 11:10 am - 12:30 pm
Classroom: Baker Systems 198
Course Description:
Recent advancements in Artificial Intelligence (AI),
including Large Language Models (LLMs) and ChatGPT, have been fueled
by the resurgence of Deep Neural Networks (DNNs); various Deep
Learning (DL) frameworks like PyTorch, TensorFlow, and
Chainer; various Machine Learning (ML) frameworks like K-means;
and various data science frameworks like Dask.
DNNs have found widespread applications in classical areas
like Image Recognition, Speech Processing, and Textual Analysis, as well
as areas like Cancer Detection, Medical Imaging, Physics,
Materials Science, and even Autonomous
Vehicle systems. However, scaling distributed training with scale-up
and scale-out approaches is still challenging. This is leading to the
emergence of a new field called "High-Performance Deep/Machine Learning".
The objectives of this course are to understand the principles and
practice of this emerging field, its open challenges, and how
modern HPC technologies can be used to accelerate DL/ML training
and inferencing, and to apply these benefits to real-world applications.
Topics to be Covered
- High-Performance Deep Learning: Issues, Trends, and Challenges
- Introduction to Deep Learning and Terminologies
- The Past, Present, and Future of Deep Learning
- What are Deep Neural Networks?
- Diverse Applications of Deep Learning
- Overview of commonly used Terminologies
- Overview of DL, ML, and Data Science Frameworks
- TensorFlow
- PyTorch
- Caffe and Caffe2
- Chainer/ChainerMN
- LBANN
- Horovod
- DeepSpeed
- K-means
- Dask
- Introduction to HPC Technologies
- GPUs, CPUs, and TPUs
- High-Performance Networking (InfiniBand, HSE and RoCE)
- MPI, CUDA-Aware MPI, NCCL
- DGX-1, DGX-2, IBM Power-AI
- Overview of State-of-the-art DL Models
- ImageNet and VGG
- GoogLeNet
- ResNet
- NASNet
- DeepSpeech
- Large Language Models (LLMs)
- Challenges for Exploiting HPC for DL
- Overall Challenges
- Need for Co-Design
- The need for Co-Designing DL/ML frameworks and HPC Middleware
- Data Parallel DNN Training
- Basic Solutions for CPU- and GPU-based Training
- NVIDIA NCCL, Baidu-allreduce, Facebook Gloo
- Co-Designs
- Deep Learning and Big Data
- Model Parallel DNN Training
- Advanced Parallelization Strategies
- Parallel Inferencing Strategies
- Advanced Topics
- Latest and Emerging Trends (Hardware, Software, and Training)
- Larger-scale Models
- Distributed Training with
- Elastic Training
- Standardization on Benchmarking, OpenAI, ONNX
- Overview of AI Chips and Processors (GraphCore, Loihi, Pohoiki, Habana/Gaudi, Cerebras, etc.)
Text:
Selected papers from the literature, including papers focusing on
past and ongoing research activities in the group.
Laboratory Exercises:
The course will involve laboratory exercises for students to
experiment with Deep/Machine Learning Frameworks. These exercises will be
carried out on OSC (Ohio Supercomputer Center) clusters using
CPUs and GPUs. This will provide hands-on knowledge to the students in the area
of high-performance deep/machine learning.
Last Updated: June 17, 2023