02.03.2023
Veronia Iskandar: Methods, Algorithms and Frameworks for Near-Memory Computing using 3D-Stacked Memories (status talk)
09.03.2023, 13:00, Room APB 1096
Invitation to the status talk in the doctoral procedure of Ms. Veronia Bahaa Fayez Iskandar
Topic: Methods, Algorithms and Frameworks for Near-Memory Computing using 3D-Stacked Memories
Supervisor: Prof. Dr. Diana Göhringer
Subject expert (Fachreferent): Prof. Dr. Wolfgang E. Nagel
Abstract: The near-memory computing (NMC) paradigm has emerged as a promising method for overcoming the memory wall challenges of future computing architectures. Modern systems that integrate 3D-stacked DRAM can be leveraged to avoid unnecessary data movement between main memory and the CPU. FPGA vendors have started adding 3D-stacked memories to their products in an effort to remain competitive in meeting the bandwidth requirements of modern memory-intensive applications. With the high bandwidth offered by such memories and specifically designed hardware, FPGAs have become a competitor to GPU solutions in terms of speed and energy efficiency. Recent NMC proposals target various types of data processing workloads, such as graph processing and machine learning.
This work addresses two research questions: how to leverage the full bandwidth of 3D-stacked high-bandwidth memory (HBM), and how to facilitate the adoption of the near-memory computing paradigm without introducing limitations for developers. We identify design goals and criteria that allow hardware accelerators to use a high-bandwidth memory efficiently, such as controlling the access of processing elements to the memory channels, and partitioning and replicating data to enable parallel computation. We propose methods and frameworks for high-level application characterization, performance prediction on NMC systems, and automatic selection of kernels suitable for execution on NMC units. These frameworks are based on machine learning and compiler-assisted models. We extend our selection framework to perform idiom matching for well-known kernel patterns and to allow CPU code to be replaced with calls to high-performance FPGA accelerators that execute near a high-bandwidth memory. Furthermore, we provide a SystemC-based simulation model of the HBM subsystem present in AMD Xilinx FPGAs. Future work includes further tuning of the simulation and compiler models and adding support for multi-kernel and multi-application workloads.