Oct 24, 2024; Workshop
NHR tutorial: Efficiently using & analyzing deep learning frameworks on HPC resources
Deep Learning APIs such as TensorFlow or PyTorch usually do not guarantee efficient use of HPC resources. With existing HPC tools, analyzing the efficiency of Deep Learning applications is more difficult than for classical HPC applications. This tutorial aims to bridge the gap between Deep Learning APIs and classical HPC infrastructure and presents practical approaches and recipes for efficient model training on HPC resources. Moreover, we show methods to analyze the performance of Deep Learning applications.
Agenda
This tutorial will be organized as a guided hands-on session covering the following topics.
- Installing and using frameworks via pip, Conda, EasyBuild modules, and containers, with the pros and cons of each
- CPU and GPU allocation for the TensorFlow and PyTorch frameworks with the Slurm batch system
- Efficient provisioning and reading of (training) input data for different classes of file systems
- Using pre-built vendor containers
- Parallel training / Scaling the training
- Ways to check performance
Handouts
The course material (slides) will be made available to the class participants.
HPC-Certification Forum Links
Prerequisites
- First contact with machine learning applications
- Basic understanding of machine learning approaches
- Hands-On: Use a TensorFlow environment
Learning Objectives
- Understanding of HPC resources and how to use them for machine learning tasks
- Basic knowledge of performance analysis tools
Registration
Link: http://event.zih.tu-dresden.de/nhr/deepl-hpc/
Registration closes on October 13, 2024. The NHR tutorial is limited to 30 participants.
You will receive the access data shortly before the event by email to your registered email address.
Further Information
Course language: English
Target group: HPC Beginners, HPC Users, HPC Dev, ML users and ML developers
For questions, please contact Dr. Natalie Breidenbach.