HPC Systems
ZIH operates high-performance computing systems with almost 100,000 processor cores, more than 500 GPUs, and a flexible storage hierarchy with a total capacity of more than 40 PB. The HPC systems provide an optimal research environment, especially for data analytics and machine learning and for processing extremely large data sets. They are also an excellent platform for highly scalable, data-intensive and compute-intensive applications.
Shared file systems allow users to switch easily between the components, each of which is specialized for different application scenarios. Pre-installed software environments allow for a quick start. A short project application is required to access our HPC resources. Via the NHR center NHR@TUD, the HPC systems are also available to users throughout Germany.
For Data-Intensive and Compute-Intensive HPC Applications
The high-performance computing system "Barnard" by Atos/Eviden provides the major part of the computing capacity available at ZIH, especially for highly parallel, data-intensive and compute-intensive HPC applications.
Typical applications: FEM simulations, CFD simulations with Ansys or OpenFOAM, molecular dynamics with GROMACS or NAMD, computations with Matlab or R
- In total more than 60,000 processor cores (Intel "Sapphire Rapids")
- 104 processor cores and 512 GB RAM per node
- Documentation
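As a rough illustration of how such a highly parallel cluster is typically used, the following is a minimal sketch of a distributed Python program using mpi4py. The availability of mpi4py in the pre-installed software environments and the way the job is launched (e.g. via the batch system) are assumptions here, not part of this overview.

```python
from mpi4py import MPI  # assumes an MPI-enabled Python environment is loaded

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # index of this process
size = comm.Get_size()   # total number of processes across the allocated nodes

# Each rank works on its own slice of the problem and reports back to rank 0.
local_result = sum(range(rank, 1_000_000, size))
total = comm.reduce(local_result, op=MPI.SUM, root=0)

if rank == 0:
    print(f"Sum computed by {size} ranks: {total}")
```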
For HPC Data Analytics and Machine Learning
For High Performance Computing / Data Analytics (HPC-DA), different technologies can be combined into individual and efficient research infrastructures. Especially for applications in machine learning and deep learning, 192 powerful Nvidia V100 GPUs are installed. Resources can also be used interactively, for example with Jupyter notebooks. For data analytics on CPUs, a cluster with high memory bandwidth is provided. To efficiently access large data sets, 2 petabytes of flash memory with a total bandwidth of about 2 terabytes/s are available. Additionally, 312 Nvidia A100 GPUs are provided especially for machine learning applications in ScaDS.AI.
Typical applications: training of neural nets with PyTorch (deep learning), data analytics with Big Data frameworks such as Apache Spark
- 39 AMD Rome nodes, each with 8 Nvidia A100 GPUs, for machine learning (primarily for ScaDS.AI)
- 32 IBM Power 9 nodes, each with 6 Nvidia V100 GPUs, for machine learning
- 192 AMD Rome nodes, each with 128 cores and 512 GB RAM at 400 GB/s memory bandwidth
- 2 PB fast flash memory (NVMe)
- 10 PB archive with access via S3, Cinder, NFS, QXFS
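As a hedged sketch of a typical deep-learning workload on these GPU nodes (the model and batch sizes are illustrative, not a prescribed setup), a PyTorch script would first check for a CUDA device and then move model and data onto it:

```python
import torch

# Use a GPU (e.g. one of the A100 or V100 cards) if PyTorch can see one.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny model and a single training step as a smoke test.
model = torch.nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 128, device=device)         # random batch of inputs
y = torch.randint(0, 10, (64,), device=device)   # random class labels

loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
print(f"Device: {device}, loss: {loss.item():.4f}")
```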
For Processing of Extremely Large Data Sets
The shared-memory system HPE Superdome Flex is especially well suited for data-intensive application scenarios, for example processing extremely large data sets entirely in main memory or in very fast NVMe storage. As part of HPC-DA, the system is also available to users throughout Germany.
Typical applications: applications that require a very large shared memory, such as genome analysis
- Shared-memory system with 32 Intel Cascade Lake CPUs and a total of 896 cores
- 48 TB main memory in a shared address space
- 400 TB of NVMe memory cards as very fast local storage
- Documentation
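To illustrate the kind of workflow this system targets, the following is a minimal sketch of processing a large data set through a memory-mapped NumPy array; the file path and array shape are hypothetical examples, not an actual layout on the system.

```python
import numpy as np

# Hypothetical file on the fast local NVMe storage (path is an example only).
path = "/nvme/scratch/large_matrix.dat"

# Memory-map the data set instead of reading it in explicitly; with 48 TB of
# shared memory, even very large arrays can effectively stay resident in RAM.
data = np.memmap(path, dtype=np.float64, mode="r", shape=(1_000_000, 10_000))

# Operate on slices of the mapped array as on a normal NumPy array.
column_means = data[:100_000].mean(axis=0)
print(column_means[:5])
```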