Deep convolutional neural networks-based Hardware–Software on-chip system for computer vision application

Seifeddine Messaoud; Soulef Bouaafia; Amna Maraoui; Ahmed Chiheb Ammari; Lazhar Kheriji; Mohsen Machhout

doi:10.1016/j.compeleceng.2021.107671

Deep convolutional neural networks-based Hardware–Software on-chip system for computer vision application

Seifeddine Messaoud, Soulef Bouaafia, Amna Maraoui, Ahmed Chiheb Ammari, Lazhar Kheriji, Mohsen Machhout

Electrical and Computer Engineering

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

Embedded vision systems are the best solutions for high-performance and lightning-fast inspection tasks. As everyday life evolves, it becomes almost imperative to harness artificial intelligence (AI) in vision applications that make these systems intelligent and able to make decisions close to or similar to humans. In this context, the AI's integration on embedded systems poses many challenges, given that its performance depends on data volume and quality they assimilate to learn and improve. This returns to the energy consumption and cost constraints of the FPGA-SoC that have limited processing, memory, and communication capacity. Despite this, the AI algorithm implementation on embedded systems can drastically reduce energy consumption and processing times, while reducing the costs and risks associated with data transmission. Therefore, its efficiency and reliability always depend on the designed prototypes. Within this range, this work proposes two different designs for the Traffic Sign Recognition (TSR) application based on the convolutional neural network (CNN) model, followed by three implantations on PYNQ-Z1. Firstly, we propose to implement the CNN-based TSR application on the PYNQ-Z1 processor. Considering its runtime result of around 3.55 s, there is room for improvement using programmable logic (PL) and processing system (PS) in a hybrid architecture. Therefore, we propose a streaming architecture, in which the CNN layers will be accelerated to provide a hardware accelerator for each layer where direct memory access (DMA) interface is used. Thus, we noticed efficient power consumption, decreased hardware cost, and execution time optimization of 2.13 s, but, there was still room for design optimizations. Finally, we propose a second co-design, in which the CNN will be accelerated to be a single computation engine where BRAM interface is used. The implementation results prove that our proposed embedded TSR design achieves the best performances compared to the first proposed architectures, in terms of execution time of about 0.03 s, computation roof of about 36.6 GFLOPS, and bandwidth roof of about 3.2 GByte/s.

Original language	English
Article number	107671
Pages (from-to)	107671
Number of pages	1
Journal	Computers and Electrical Engineering
Volume	98
DOIs	https://doi.org/10.1016/j.compeleceng.2021.107671
Publication status	Published - Mar 1 2022

Keywords

Acceleration
CNN
Co-design
FPGA
PYNQ-Z1

ASJC Scopus subject areas

Control and Systems Engineering
General Computer Science
Electrical and Electronic Engineering

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

Access to Document

10.1016/j.compeleceng.2021.107671

Cite this

@article{b0cc5c5b65b5402284a33705a9759568,

title = "Deep convolutional neural networks-based Hardware–Software on-chip system for computer vision application",

abstract = "Embedded vision systems are the best solutions for high-performance and lightning-fast inspection tasks. As everyday life evolves, it becomes almost imperative to harness artificial intelligence (AI) in vision applications that make these systems intelligent and able to make decisions close to or similar to humans. In this context, the AI's integration on embedded systems poses many challenges, given that its performance depends on data volume and quality they assimilate to learn and improve. This returns to the energy consumption and cost constraints of the FPGA-SoC that have limited processing, memory, and communication capacity. Despite this, the AI algorithm implementation on embedded systems can drastically reduce energy consumption and processing times, while reducing the costs and risks associated with data transmission. Therefore, its efficiency and reliability always depend on the designed prototypes. Within this range, this work proposes two different designs for the Traffic Sign Recognition (TSR) application based on the convolutional neural network (CNN) model, followed by three implantations on PYNQ-Z1. Firstly, we propose to implement the CNN-based TSR application on the PYNQ-Z1 processor. Considering its runtime result of around 3.55 s, there is room for improvement using programmable logic (PL) and processing system (PS) in a hybrid architecture. Therefore, we propose a streaming architecture, in which the CNN layers will be accelerated to provide a hardware accelerator for each layer where direct memory access (DMA) interface is used. Thus, we noticed efficient power consumption, decreased hardware cost, and execution time optimization of 2.13 s, but, there was still room for design optimizations. Finally, we propose a second co-design, in which the CNN will be accelerated to be a single computation engine where BRAM interface is used. The implementation results prove that our proposed embedded TSR design achieves the best performances compared to the first proposed architectures, in terms of execution time of about 0.03 s, computation roof of about 36.6 GFLOPS, and bandwidth roof of about 3.2 GByte/s.",

keywords = "Acceleration, CNN, Co-design, FPGA, PYNQ-Z1",

author = "Seifeddine Messaoud and Soulef Bouaafia and Amna Maraoui and {Chiheb Ammari}, Ahmed and Lazhar Kheriji and Mohsen Machhout",

note = "Publisher Copyright: {\textcopyright} 2022 Elsevier Ltd DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.",

year = "2022",

month = mar,

day = "1",

doi = "10.1016/j.compeleceng.2021.107671",

language = "English",

volume = "98 ",

pages = "107671",

journal = "Computers and Electrical Engineering",

issn = "0045-7906",

publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Deep convolutional neural networks-based Hardware–Software on-chip system for computer vision application

AU - Messaoud, Seifeddine

AU - Bouaafia, Soulef

AU - Maraoui, Amna

AU - Chiheb Ammari, Ahmed

AU - Kheriji, Lazhar

AU - Machhout, Mohsen

N1 - Publisher Copyright: © 2022 Elsevier Ltd DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.

PY - 2022/3/1

Y1 - 2022/3/1

N2 - Embedded vision systems are the best solutions for high-performance and lightning-fast inspection tasks. As everyday life evolves, it becomes almost imperative to harness artificial intelligence (AI) in vision applications that make these systems intelligent and able to make decisions close to or similar to humans. In this context, the AI's integration on embedded systems poses many challenges, given that its performance depends on data volume and quality they assimilate to learn and improve. This returns to the energy consumption and cost constraints of the FPGA-SoC that have limited processing, memory, and communication capacity. Despite this, the AI algorithm implementation on embedded systems can drastically reduce energy consumption and processing times, while reducing the costs and risks associated with data transmission. Therefore, its efficiency and reliability always depend on the designed prototypes. Within this range, this work proposes two different designs for the Traffic Sign Recognition (TSR) application based on the convolutional neural network (CNN) model, followed by three implantations on PYNQ-Z1. Firstly, we propose to implement the CNN-based TSR application on the PYNQ-Z1 processor. Considering its runtime result of around 3.55 s, there is room for improvement using programmable logic (PL) and processing system (PS) in a hybrid architecture. Therefore, we propose a streaming architecture, in which the CNN layers will be accelerated to provide a hardware accelerator for each layer where direct memory access (DMA) interface is used. Thus, we noticed efficient power consumption, decreased hardware cost, and execution time optimization of 2.13 s, but, there was still room for design optimizations. Finally, we propose a second co-design, in which the CNN will be accelerated to be a single computation engine where BRAM interface is used. The implementation results prove that our proposed embedded TSR design achieves the best performances compared to the first proposed architectures, in terms of execution time of about 0.03 s, computation roof of about 36.6 GFLOPS, and bandwidth roof of about 3.2 GByte/s.

AB - Embedded vision systems are the best solutions for high-performance and lightning-fast inspection tasks. As everyday life evolves, it becomes almost imperative to harness artificial intelligence (AI) in vision applications that make these systems intelligent and able to make decisions close to or similar to humans. In this context, the AI's integration on embedded systems poses many challenges, given that its performance depends on data volume and quality they assimilate to learn and improve. This returns to the energy consumption and cost constraints of the FPGA-SoC that have limited processing, memory, and communication capacity. Despite this, the AI algorithm implementation on embedded systems can drastically reduce energy consumption and processing times, while reducing the costs and risks associated with data transmission. Therefore, its efficiency and reliability always depend on the designed prototypes. Within this range, this work proposes two different designs for the Traffic Sign Recognition (TSR) application based on the convolutional neural network (CNN) model, followed by three implantations on PYNQ-Z1. Firstly, we propose to implement the CNN-based TSR application on the PYNQ-Z1 processor. Considering its runtime result of around 3.55 s, there is room for improvement using programmable logic (PL) and processing system (PS) in a hybrid architecture. Therefore, we propose a streaming architecture, in which the CNN layers will be accelerated to provide a hardware accelerator for each layer where direct memory access (DMA) interface is used. Thus, we noticed efficient power consumption, decreased hardware cost, and execution time optimization of 2.13 s, but, there was still room for design optimizations. Finally, we propose a second co-design, in which the CNN will be accelerated to be a single computation engine where BRAM interface is used. The implementation results prove that our proposed embedded TSR design achieves the best performances compared to the first proposed architectures, in terms of execution time of about 0.03 s, computation roof of about 36.6 GFLOPS, and bandwidth roof of about 3.2 GByte/s.

KW - Acceleration

KW - CNN

KW - Co-design

KW - FPGA

KW - PYNQ-Z1

UR - http://www.scopus.com/inward/record.url?scp=85123030584&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85123030584&partnerID=8YFLogxK

U2 - 10.1016/j.compeleceng.2021.107671

DO - 10.1016/j.compeleceng.2021.107671

M3 - Article

AN - SCOPUS:85123030584

SN - 0045-7906

VL - 98

SP - 107671

JO - Computers and Electrical Engineering

JF - Computers and Electrical Engineering

M1 - 107671

ER -

Deep convolutional neural networks-based Hardware–Software on-chip system for computer vision application

Abstract

Keywords

ASJC Scopus subject areas

UN SDGs

Access to Document

Other files and links

Fingerprint

Cite this