Aug 23, 2019
Eskander: HW/SW Co-design For Object Detection Using a Machine Learning Accelerator
27.08.2019, 3:00 p.m., APB 1096
Invitation to the presentation of the bachelor's thesis of Arsany Eskander
Subject: “HW/SW Co-design For Object Detection Using a Machine Learning Accelerator”
Supervisors: Pedram Amini Rad, Lester Kalms, Muhammad Ali
Abstract: A recent trend in neural network development is extending deep learning applications to embedded systems. On platforms that are more resource- and energy-constrained, designers constantly strive to develop hardware-efficient algorithms, and considerable effort goes into reducing power consumption to meet the demands of growing embedded applications. For high-performance embedded systems, FPGAs have become a breakthrough platform. Recently, FPGA-based neural network accelerators have attracted great attention, since they offer a favorable trade-off in performance and energy consumption compared to GPUs, along with outstanding reconfigurability and lower cost with respect to ASICs.
Built on the OpenVX standard, this work presents a uniform neural network accelerator architecture that meets the aforementioned constraints. Starting from the abstract CNN functions, it presents finely tuned data flow models at a high level of abstraction. Using one of the proposed data flow models, the work proposes a streaming convolution function that supports not only 2D but also 3D convolution. With a simple modification of this data flow model, the pooling layer is realized as well. Finally, the work presents fully connected and softmax layers tailored so that their operation unrolls into a simple matrix multiplication. So that the implemented functions operate seamlessly in hardware, they support fixed-point data types. To guarantee low memory bandwidth, all of the aforementioned functions are constrained to a single read per clock cycle. External memory accesses are further reduced, and hardware processing efficiency boosted, by maximizing data reuse. Not only does this work target energy constraints, it also targets stringent timing restrictions by introducing a fully pipelined neural network architecture. The proposed architecture is tested for its ability to scale up and adapt to different neural network models such as LeNet-5, CIFAR-10, and AlexNet. Using an efficient full-buffering model for the convolutional neural network's intrinsic functions, the CIFAR-10 model can be configured on a ZedBoard. By adopting hybrid data access and computation patterns, the proposed work achieves a 10x speedup over an ARM Cortex-A53 processor.
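The single-read-per-cycle constraint mentioned in the abstract is commonly realized with a line buffer that caches the most recent image rows. A minimal software model of that idea is sketched below; it is an illustrative sketch of the general technique, not the thesis implementation, and the function name is hypothetical:

```python
import numpy as np

def conv2d_line_buffered(image, kernel):
    """Software model of a line-buffered 2D convolution (cross-correlation,
    as is usual in CNNs): each input pixel is read exactly once, mimicking
    the one-read-per-clock-cycle constraint of a streaming design."""
    kh, kw = kernel.shape
    h, w = image.shape
    # Line buffer holds the last kh rows of the image, one column at a time.
    line_buf = np.zeros((kh, w))
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(h):
        for x in range(w):
            pixel = image[y, x]                  # the single read per "cycle"
            line_buf[:-1, x] = line_buf[1:, x]   # shift the column up
            line_buf[-1, x] = pixel              # append the newest pixel
            if y >= kh - 1 and x >= kw - 1:
                # A full kh x kw window is available in the buffer.
                window = line_buf[:, x - kw + 1 : x + 1]
                out[y - kh + 1, x - kw + 1] = np.sum(window * kernel)
    return out
```

On an FPGA, the line buffer would map to on-chip BRAM, so each pixel is fetched from external memory only once while being reused across all windows that contain it.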
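The unrolling of fully connected and softmax layers into a single matrix multiplication with fixed-point data types can be sketched as follows. The Q8.8 format, function names, and the float-domain softmax are assumptions for illustration only; a hardware version would typically approximate the exponential with a lookup table:

```python
import numpy as np

FRAC_BITS = 8  # hypothetical Q8.8 fixed-point format: 8 fractional bits

def to_fixed(x):
    """Quantize a float array to Q8.8 fixed point."""
    return np.round(x * (1 << FRAC_BITS)).astype(np.int32)

def fixed_fc(x_q, w_q, b_q):
    """Fully connected layer unrolled into one fixed-point matrix multiply.
    Two Q8.8 factors yield 16 fractional bits, so shift back by FRAC_BITS."""
    acc = x_q.astype(np.int64) @ w_q.astype(np.int64)
    return ((acc >> FRAC_BITS) + b_q).astype(np.int32)

def softmax(logits_q):
    """Softmax on the dequantized logits (float here for simplicity)."""
    z = logits_q / float(1 << FRAC_BITS)
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()
```

Because the fully connected layer reduces to one matrix multiplication, it can reuse the same streaming multiply-accumulate datapath as the convolution functions.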