Designing Compact Convolutional Neural Network for Embedded Stereo Vision Systems

A4 Conference publications



Authors of the publication: Mohammad Loni, Amin Majd, Abdolah Loni, Masoud Daneshtalab, Mikael Sjödin, Elena Troubitsyna
Place of publication: Hanoi, Vietnam
Publication year: 2018
Publisher: IEEE
Name of the parent publication: 2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
First page of the article, page number: 244
Last page of the article, page number: 251
ISBN: 978-1-5386-6690-6
eISBN: 978-1-5386-6689-0


Abstract

Autonomous systems are used in a wide range of domains, from indoor utensils to autonomous robotic surgery and self-driving cars. Stereo vision cameras are probably the most flexible sensing method in these systems, since they can extract depth, luminance, color, and shape information. However, stereo-vision-based applications suffer from huge image sizes and high computational complexity, which lead to higher power consumption. To tackle these challenges, in the first step, the GIMME2 stereo vision system [1] is employed. GIMME2 is a high-throughput and cost-efficient FPGA-based embedded stereo vision system. In the next step, we present a framework for designing an optimized Deep Convolutional Neural Network (DCNN) for time-constrained applications and/or platforms with a limited resource budget. Our framework aims to automatically generate a highly robust DCNN architecture for image data received from stereo vision cameras. It uses a multi-objective evolutionary optimization approach to design a network architecture that is near-optimal with respect to both accuracy and network size. Unlike recent works that aim only to generate a highly accurate network, we also consider network size parameters in order to build a highly compact architecture. After designing a robust network, the framework maps the generated network onto a multi/many-core heterogeneous System-on-Chip (SoC). In addition, we have integrated our framework into the GIMME2 processing pipeline so that it can also estimate the distance of detected objects. The network generated by our framework offers up to a 24x compression rate while losing only 5% accuracy compared to the best result on the CIFAR-10 dataset.
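
The trade-off between accuracy and network size mentioned in the abstract can be illustrated with a minimal sketch of Pareto-front selection, the filtering step that multi-objective evolutionary methods commonly rely on. This is not the authors' implementation: the names `dominates` and `pareto_front` are hypothetical, and the candidate values are made up for illustration rather than taken from the paper.

```python
from typing import List, Tuple

# Each candidate network is summarized as (error_rate, parameter_count).
# Both objectives are minimized. This representation is an assumption
# made for illustration; the paper's actual encoding is not given here.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up example values, not results from the paper.
candidates = [
    (0.05, 24_000_000),  # most accurate, largest
    (0.10, 1_000_000),   # slightly less accurate, much smaller
    (0.12, 3_000_000),   # worse than the previous one on both objectives
    (0.30, 200_000),     # smallest, least accurate
]
front = pareto_front(candidates)
```

Here the third candidate is discarded because the second is both more accurate and smaller, while the remaining three each represent a different accuracy/size compromise.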


Last updated 2019-13-12 at 03:30