12.10.2023
Abeer Mubashshir Khan: FPGA-optimized implementation of an Arithmetic Logic Unit (ALU) for a RISC-V processor (project thesis)
19.10.2023, 13:30, APB 1035
Invitation to the presentation by Mr. Abeer Mubashshir Khan
Topic: FPGA-optimized implementation of an Arithmetic Logic Unit (ALU) for a RISC-V processor
Project type: project thesis (Projektarbeit)
Supervisors: Muhammad Ali, Markus Helbig
Abstract:
In an era marked by the rise of embedded systems and IoT applications, designers face the challenge of building smaller, faster, and more energy-efficient devices. Thanks to their configurable architecture, FPGAs provide a platform for designing energy-efficient, cost-effective, and high-performance devices, and are therefore gaining popularity in the development and enhancement of processors with energy constraints. This work explores one such FPGA-oriented processor optimization based on the open-source RISC-V instruction set architecture. The focus is on the execution stage, more precisely on the ALU of a processor that implements the RV32IM ISA. As an essential component of this stage, the ALU must be accurate while being as compact and efficient as possible for FPGA applications. This is achieved by designing an ALU that leverages the DSP slice’s innate arithmetic and logical capabilities, which limits the need for extra hardware and improves latency and power requirements. The basic arithmetic instructions of the base architecture can be implemented directly by the DSP slice, whereas the M extension requires multiple DSP slices. Multiplication algorithms ranging from basic to advanced, such as traditional schoolbook multiplication and Karatsuba, as well as other resource-efficient approaches such as cascading DSP slices, are explored, and their results are compared to find the most suitable ALU design. ALU instructions are performed using the DSP slice’s inherent dynamic switching capability. These optimizations result in a compact, high-performing ALU within the RISC-V processor framework that outperforms the original design, consuming 40% less power and 34% fewer resources, with an overall timing improvement of 23.28 MHz.
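To illustrate the two multiplication schemes named in the abstract, the following is a minimal software sketch, not the design presented in the talk. It assumes a simplified model in which each DSP-style multiplier handles unsigned 16-bit operands (real DSP slices have other operand widths and support signed arithmetic), and the function names `schoolbook_mul32`, `karatsuba_mul32`, `mul`, and `mulhu` are hypothetical. It shows how a 32x32-bit product, as needed for the RV32M `MUL` and `MULHU` instructions, can be assembled from four partial products (schoolbook) or from three (Karatsuba).

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def split16(x):
    """Split an unsigned 32-bit value into (low, high) 16-bit halves."""
    return x & MASK16, (x >> 16) & MASK16


def schoolbook_mul32(a, b):
    """Unsigned 32x32 -> 64-bit product from four 16x16 partial products.

    Assumption: each '*' below stands in for one multiplier block.
    """
    a_lo, a_hi = split16(a)
    b_lo, b_hi = split16(b)
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    # Shift each partial product to its weight and accumulate.
    return ll + ((lh + hl) << 16) + (hh << 32)


def karatsuba_mul32(a, b):
    """Same product from three multiplications plus extra additions.

    The middle term needs a 17x17-bit multiply because of the operand sums.
    """
    a_lo, a_hi = split16(a)
    b_lo, b_hi = split16(b)
    ll = a_lo * b_lo
    hh = a_hi * b_hi
    mid = (a_lo + a_hi) * (b_lo + b_hi) - ll - hh  # equals a_lo*b_hi + a_hi*b_lo
    return ll + (mid << 16) + (hh << 32)


def mul(a, b):
    """RV32M MUL: lower 32 bits of the product."""
    return schoolbook_mul32(a, b) & MASK32


def mulhu(a, b):
    """RV32M MULHU: upper 32 bits of the unsigned product."""
    return schoolbook_mul32(a, b) >> 32
```

In this sketch, `mul(0xFFFFFFFF, 0xFFFFFFFF)` yields `1` and `mulhu(0xFFFFFFFF, 0xFFFFFFFF)` yields `0xFFFFFFFE`. The trade-off it illustrates is the general one between the two schemes: Karatsuba saves one multiplier at the cost of additional adders and a slightly wider middle multiplication.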