Jun 13, 2022

Lester Kalms: Methods and Algorithms for Efficient Programming of FPGA-based Heterogeneous Systems for Object Detection (PhD defense)

28.06.2022, 14:00

Invitation to the PhD defense of Mr. Lester Kalms

Topic: Methods and Algorithms for Efficient Programming of FPGA-based Heterogeneous Systems for Object Detection

Supervisor: Prof. Dr.-Ing. Diana Göhringer

Second supervisor: Prof. Dr. Marco D. Santambrogio

Abstract: Computer vision applications are in high demand in numerous areas, such as autonomous driving and unmanned aerial vehicles. However, the application areas and scenarios are becoming increasingly complex, and their data requirements are growing. Meeting these requirements calls for increasingly powerful computing systems. FPGA-based heterogeneous systems offer an excellent solution in terms of energy efficiency, flexibility, and performance, especially in the field of computer vision. Due to complex applications and the use of FPGAs in combination with other architectures, efficient programming is becoming increasingly difficult. Thus, developers need a comprehensive framework with efficient automation, good usability, reasonable abstraction, and seamless integration of tools. It should provide an easy entry point and reduce the effort needed to learn new concepts, programming languages, and tools. Additionally, it needs optimized libraries so that the user can focus on developing applications without getting involved with the underlying details. These libraries should be well integrated, easy to use, and cover a wide range of possible use cases. The framework also needs efficient algorithms to execute applications on heterogeneous architectures with maximum performance. These algorithms should distribute applications across various nodes with low fragmentation and communication overhead and find a near-optimal solution in a reasonable amount of time. This thesis addresses the research problem of efficiently implementing object detection applications, distributing them across FPGA-based heterogeneous systems, and automating and integrating these steps using toolchains. Its three contributions are the HiFlipVX object detection library, the DECISION framework, and the APARMAP application distribution algorithm.

HiFlipVX is an open-source HLS-based FPGA library optimized for performance and resource efficiency. It contains 66 highly parameterizable computer vision functions, including neural networks, which makes it ideal for design space exploration. It extends the OpenVX standard for feature extraction, which is challenging due to the unknown element size at design time. All functions are streaming capable to achieve maximum performance by increasing parallelism and reducing off-chip memory access. It does not require external or vendor libraries, which eases project integration, device coverage, and vendor portability, as shown for Intel. The library consumed on average 0.39 % FFs and 0.32 % LUTs for a set of image processing functions compared to a vendor library. A HiFlipVX implementation of the AKAZE feature detector computes between 3.56 and 4.13 times more pixels per second than related work, while its resource consumption is comparable to optimized VHDL designs. Its neural network extension achieved a speedup of 3.23 for an AlexNet layer compared to related work, while consuming 73 % less on-chip memory. Furthermore, this thesis proposes an improved feature extraction implementation that achieves a repeatability of 72.57 % when weighting complex cases, while the next best algorithm only achieves 62.99 %.

DECISION is a framework consisting of two toolchains for the efficient programming of FPGA-based heterogeneous systems. Both integrate HiFlipVX and use a joint OpenVX-based frontend to implement computer vision applications. The framework abstracts the underlying hardware and algorithm details while covering a wide range of architectures and applications. The first toolchain targets x86-based systems consisting of CPUs, GPUs, and FPGAs using OpenCL. To create a heterogeneous schedule, it considers device profiles, kernel profiles and estimates, and FPGA dataflow characteristics. It manages synchronization, memory transfers, and data coherence at design time. It creates a runtime-optimized program characterized by high parallelism and low overhead. Additionally, this thesis looks at the integration of OpenCL-based libraries, automatic OpenCL kernel generation, and OpenCL kernel optimization and comparison for different architectures. The second toolchain creates an application-specific and adaptive NoC-based architecture. The streaming-optimized architecture enables multiple applications to reuse vision functions, which improves resource efficiency while maintaining high performance. For a set of example applications, the resource consumption was more than halved, while the performance overhead was only 0.015 %.
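As a rough illustration of how per-device kernel estimates can drive a heterogeneous schedule, the following sketch uses a greedy earliest-finish-time heuristic. This is not the DECISION scheduler: the function name `schedule`, the data layout, and the heuristic itself are assumptions made for this example, and memory transfer costs and FPGA dataflow characteristics are deliberately ignored.

```python
def schedule(kernels, deps, runtime):
    """Greedy earliest-finish-time scheduling (illustrative assumption only).

    kernels: kernel names in topological order.
    deps:    dict mapping a kernel to the list of kernels it depends on.
    runtime: dict mapping (kernel, device) to an estimated execution time;
             a missing pair means the kernel cannot run on that device.
    Returns a dict mapping each kernel to (device, start, finish).
    """
    devices = sorted({device for _, device in runtime})
    device_free = {device: 0.0 for device in devices}
    plan = {}
    for kernel in kernels:
        # A kernel may start only after all of its predecessors have finished.
        ready = max((plan[p][2] for p in deps.get(kernel, [])), default=0.0)
        best = None
        for device in devices:
            if (kernel, device) not in runtime:
                continue
            start = max(ready, device_free[device])
            finish = start + runtime[(kernel, device)]
            if best is None or finish < best[2]:
                best = (device, start, finish)
        if best is None:
            raise ValueError(f"no device can run kernel {kernel!r}")
        plan[kernel] = best
        device_free[best[0]] = best[2]
    return plan


# Hypothetical three-stage vision pipeline with made-up time estimates.
kernels = ["blur", "sobel", "threshold"]
deps = {"sobel": ["blur"], "threshold": ["sobel"]}
runtime = {
    ("blur", "cpu"): 4, ("blur", "fpga"): 1,
    ("sobel", "cpu"): 3, ("sobel", "gpu"): 2,
    ("threshold", "cpu"): 1, ("threshold", "fpga"): 5,
}
plan = schedule(kernels, deps, runtime)
```

A realistic scheduler would also have to account for transfer times between devices and for kernels that can be chained on the FPGA without returning to off-chip memory, both of which this sketch omits.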

APARMAP is an application distribution algorithm for partition-based and mesh-like FPGA topologies. It uses a NoC (Network-on-Chip) as communication infrastructure to connect reconfigurable regions and generate an application-specific hardware architecture. The algorithm uses load balancing techniques to find reasonable solutions within a predictable and scalable amount of time. It optimizes solutions using various heuristics, such as Simulated Annealing and Tabu Search. It uses a multithreaded grid-based approach to prevent threads from calculating the same solution and getting stuck in local minima. Its constraints and objectives are the FPGA resource utilization, NoC bandwidth consumption, NoC hop count, and execution time of the proposed algorithm. The evaluation showed that the algorithm can deal with heterogeneous and irregular host graph topologies. The algorithm scaled well in terms of computation time for an increasing number of nodes and partitions. It was able to achieve an optimal placement for a set of example graphs up to a size of 196 nodes on host graphs of up to 49 partitions. For a real application with 271 nodes and 441 edges, it was able to achieve a distribution with low resource fragmentation in an average time of 149 ms.
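To make the idea of heuristic placement on a mesh concrete, here is a minimal single-threaded Simulated Annealing sketch. It is not the APARMAP algorithm: the names `hop_cost` and `anneal_placement`, the cost function (bandwidth-weighted Manhattan hop count), the uniform per-partition capacity, and the linear cooling schedule are all assumptions chosen for illustration.

```python
import math
import random


def hop_cost(edges, placement, width):
    """Sum over all edges of bandwidth times Manhattan hop distance."""
    total = 0
    for src, dst, bandwidth in edges:
        sx, sy = placement[src] % width, placement[src] // width
        dx, dy = placement[dst] % width, placement[dst] // width
        total += bandwidth * (abs(sx - dx) + abs(sy - dy))
    return total


def anneal_placement(num_nodes, edges, width, height, capacity,
                     steps=20000, start_temp=5.0, seed=0):
    """Place application nodes onto a width x height mesh of partitions.

    edges is a list of (src, dst, bandwidth) tuples. Each partition holds
    at most `capacity` nodes (a stand-in for FPGA resource limits).
    Returns (placement, cost), where placement[node] is a partition index.
    """
    rng = random.Random(seed)
    num_parts = width * height
    if num_nodes > num_parts * capacity:
        raise ValueError("application does not fit on the mesh")
    # Round-robin start: balanced, so it never exceeds the capacity.
    placement = [node % num_parts for node in range(num_nodes)]
    load = [placement.count(part) for part in range(num_parts)]
    cost = hop_cost(edges, placement, width)
    best, best_cost = placement[:], cost
    for step in range(steps):
        # Linear cooling; the small offset avoids division by zero.
        temp = start_temp * (1 - step / steps) + 1e-9
        node = rng.randrange(num_nodes)
        target = rng.randrange(num_parts)
        old = placement[node]
        if target == old or load[target] >= capacity:
            continue
        placement[node] = target
        new_cost = hop_cost(edges, placement, width)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            load[old] -= 1
            load[target] += 1
            cost = new_cost
            if cost < best_cost:
                best, best_cost = placement[:], cost
        else:
            placement[node] = old
    return best, best_cost


# Hypothetical example: a chain of 8 nodes on a 2 x 2 mesh.
chain = [(i, i + 1, 1) for i in range(7)]
placement, cost = anneal_placement(8, chain, 2, 2, capacity=4, seed=1)
```

This sketch only moves one node at a time, so it makes no progress when every partition is full; a swap move would be needed in that case. It also optimizes a single objective, whereas the text above lists resource utilization, bandwidth, hop count, and run time together.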
