Interaction between HPC, AI, and Research Data (HAI)
Data processing pipelines are costly because they require substantial software development effort and computing power. The HAI project develops strategies and software tools for the efficient, collaborative development of data processing and AI pipelines, and for improving their computing and energy efficiency on HPC and cloud resources. The tools are intended to support efficient and scalable AI workloads and data management on NHR computing resources and will be made available to users. To increase users' productivity and skills, HAI is geared towards their specific use cases.
Focal points of the project
- Efficient creation and execution of data processing pipelines with LLM-automated data engineering for rapid development.
- Efficient and sustainable use of HPC resources for scalable AI model training pipelines using monitoring and optimization techniques.
- Providing a collaborative code development and execution environment, coupled with FAIR data management practices to ensure integrity throughout the AI project lifecycle.
The project adds value to NHR and its strategic goals in several areas. It will:
- contribute strategies for increased efficiency and usability of HPC resources
- gain new technical and scientific insights into the efficient development and execution of data analytics pipelines on HPC (including methods to increase sustainability and energy efficiency)
- strengthen the skills of HPC users and young researchers through tutorials and other forms of training
- further develop and publish open source software and foster collaboration between NHR centers
The project involves users through a "Call for Use Cases" and through planned use cases from the community, so that developments can be tailored to their needs. Collaboration, training, documentation, and the publication of best practices will also strengthen users' skills. Users also benefit from easy-to-use interfaces for the LLM-supported data engineering tool and the collaborative data science environment, as well as from the open source developments. The software is not tied to specific centers and can also be used elsewhere.
Partners
- RWTH Aachen University (IT Center, Institute for Energy-Efficient Buildings and Indoor Climate, Institute for Automation of Complex Power Systems)
- NHR4CES@TUDa (Darmstadt)
- NHR@FAU (Erlangen)
- NHR-North@Göttingen
- TU Dresden
- NHR@KIT (Karlsruhe)
- Network NHR@SW
- ScaDS.AI
Duration
10/2024 to 09/2026
Funding
NHR Strategy Fund Project