HPC Systems
ZIH operates high performance computing systems with more than 100,000 processor cores, more than 800 GPUs, and a flexible storage hierarchy with a total capacity of more than 40 PB. The HPC systems provide an optimal research environment, especially for data analytics and machine learning as well as for processing extremely large data sets. They are also a perfect platform for highly scalable, data-intensive and compute-intensive applications.
Shared file systems enable users to easily switch between the components, each of which is specialized for different application scenarios. Pre-installed software environments allow for a quick start, and the HPC resources can also be used interactively, e.g. in the form of Jupyter notebooks. A short project application is required to access the HPC resources. The HPC systems are available free of charge to scientists from all over Germany via the NHR center NHR@TUD.
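For a quick orientation in an interactive session, a minimal sketch like the following shows which resources are visible from a Jupyter notebook. It only assumes a Python environment with PyTorch available (e.g. one of the pre-installed software environments) and is not tied to any specific cluster.

```python
# Minimal sketch: inspect the resources visible from an interactive session
# (e.g. a Jupyter notebook). Assumes a Python environment with PyTorch.
import os

import torch

# CPU cores available to this session
print(f"CPU cores visible: {os.cpu_count()}")

# GPUs visible to PyTorch (none on CPU-only nodes)
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No GPU visible in this session")
```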
For data-intensive and compute-intensive HPC Applications
The high performance computing system „Barnard“ by Atos/Eviden, with around 75,000 processor cores, provides the major part of the CPU computing capacity available at ZIH, especially for highly parallel, data-intensive and compute-intensive HPC applications.
Typical applications: FEM simulations, CFD simulations with Ansys or OpenFOAM, molecular dynamics with GROMACS or NAMD, computations with Matlab or R
- In total 720 compute nodes
- 104 processor cores (Intel „Sapphire Rapids“) and 512 GB RAM per node
- 90 nodes with 1 TB RAM
- 40 PB parallel file system (usable for all HPC systems)
- Documentation
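As a rough sketch of the highly parallel CPU workloads such a cluster targets, the following example distributes a simple numerical integration across MPI ranks with mpi4py. The use of mpi4py and the launch command are illustrative assumptions, not a statement about the software installed on Barnard.

```python
# Minimal sketch of a highly parallel CPU workload using mpi4py.
# Launch e.g. with `mpirun -n 4 python pi_mpi.py` (launcher is an assumption).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank integrates a slice of f(x) = 4 / (1 + x^2) over [0, 1],
# which sums to pi; a stand-in for a real FEM/CFD/MD workload.
n = 10_000_000                              # total number of intervals
h = 1.0 / n
x = (np.arange(rank, n, size) + 0.5) * h    # this rank's midpoints
local_sum = np.sum(4.0 / (1.0 + x * x)) * h

# Combine the partial results on rank 0
pi = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print(f"pi ≈ {pi:.10f} computed on {size} MPI ranks")
```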
For HPC Data Analytics and Machine Learning
Capella
The „Capella” GPU cluster from Megware provides 576 powerful Nvidia H100 GPUs, particularly for machine learning and deep learning applications. Fast staging storage for data-intensive applications, with a capacity of 1 petabyte and a bandwidth of over 1 terabyte/s, is installed exclusively for Capella.
Typical applications: Training of neural networks with PyTorch (deep learning), HPC simulations on GPUs
- 144 nodes, each with 2 AMD CPUs, 4 GPUs and 768 GB DDR5 RAM
- A total of 576 Nvidia H100 GPUs with 94 GB HBM2e memory each
- 1 PB fast staging storage (WekaIO)
- Documentation
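To illustrate the kind of deep learning workload Capella is aimed at, here is a minimal PyTorch sketch that runs a single training step on one GPU. Model, data and hyperparameters are placeholders; multi-GPU training (e.g. with PyTorch's distributed data parallel) would build on the same pattern.

```python
# Minimal sketch: one training step of a small network on a single GPU.
# All model, data and hyperparameter choices are placeholders.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model and synthetic batch standing in for a real dataset
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
inputs = torch.randn(64, 512, device=device)
targets = torch.randint(0, 10, (64,), device=device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One forward/backward/update step
optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()
optimizer.step()
print(f"loss after one step: {loss.item():.4f} (device: {device})")
```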
HPC-DA
For High Performance Computing / Data Analytics (HPC-DA), different technologies can be combined into individual and efficient research infrastructures. For machine learning, 192 Nvidia V100 GPUs are installed. For data analytics on CPUs, a cluster with high memory bandwidth is provided. Additionally, 312 Nvidia A100 GPUs are provided especially for machine learning applications in ScaDS.AI.
Typical applications: Training of neural networks with PyTorch (deep learning), data analytics with Big Data frameworks such as Apache Spark
- „Alpha Centauri“: 39 AMD Rome nodes, each with 8 Nvidia A100 GPUs, for machine learning (primarily for ScaDS.AI)
- 32 IBM Power9 nodes, each with 6 Nvidia V100 GPUs, for machine learning
- „Romeo“: 192 AMD Rome nodes, each with 128 cores and 512 GB RAM with 400 GB/s memory bandwidth
- Documentation
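To illustrate the data analytics side, the following minimal PySpark sketch performs a simple aggregation. It only assumes that the pyspark package is available in the Python environment; the local master is a placeholder and says nothing about how Spark is actually deployed on these clusters.

```python
# Minimal sketch: a simple aggregation with PySpark.
# The local[*] master is a placeholder for an actual cluster deployment.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").appName("hpc-da-sketch").getOrCreate()

# Tiny in-memory DataFrame standing in for a large data set on the shared
# file system (which would typically be read with spark.read.csv/parquet).
df = spark.createDataFrame(
    [("A", 1.0), ("A", 2.0), ("B", 5.0)],
    ["group", "value"],
)

# Group-wise mean, a typical building block of larger analytics pipelines
df.groupBy("group").agg(F.mean("value").alias("mean_value")).show()

spark.stop()
```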
For Processing of extremely large Data Sets
The shared memory system HPE Superdome Flex „Julia“ is especially well suited for data-intensive application scenarios, for example processing extremely large data sets entirely in main memory or in very fast NVMe storage. As part of HPC-DA, the system is also available to users from all over Germany.
Typical applications: workloads that require a very large shared memory, such as genome analysis
- Shared memory system with 32 Intel Cascade Lake CPUs and a total of 896 cores
- 48 TB main memory in a shared address space
- 400 TB of NVMe cards as very fast local storage
- Documentation
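As a hedged sketch of working with a data set far larger than a typical node's memory, the following example streams over a large on-disk array in chunks via a NumPy memory map. The file path and sizes are placeholders; on a system like Julia the same array could also be held entirely in main memory.

```python
# Minimal sketch: chunked processing of a very large on-disk array via a
# NumPy memory map. Path and shape are placeholders; on a large shared
# memory system the whole array could instead be loaded into RAM.
import numpy as np

path = "/path/to/large_dataset.dat"          # placeholder path
n_rows, n_cols = 1_000_000_000, 8            # placeholder shape (~32 GB of float32)

data = np.memmap(path, dtype=np.float32, mode="r", shape=(n_rows, n_cols))

# Stream over the array in chunks and accumulate a global column mean
chunk = 10_000_000
col_sum = np.zeros(n_cols, dtype=np.float64)
for start in range(0, n_rows, chunk):
    col_sum += data[start:start + chunk].sum(axis=0, dtype=np.float64)

print("column means:", col_sum / n_rows)
```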