Showing 1–49 of 49 results for author: Conti, F

  1. arXiv:2407.03111  [pdf, other]

    cs.NE cs.AI cs.ET cs.LG

    Compressed Latent Replays for Lightweight Continual Learning on Spiking Neural Networks

    Authors: Alberto Dequino, Alessio Carpegna, Davide Nadalini, Alessandro Savino, Luca Benini, Stefano Di Carlo, Francesco Conti

    Abstract: Rehearsal-based Continual Learning (CL) has been intensely investigated in Deep Neural Networks (DNNs). However, its application in Spiking Neural Networks (SNNs) has not been explored in depth. In this paper we introduce the first memory-efficient implementation of Latent Replay (LR)-based CL for SNNs, designed to seamlessly integrate with resource-constrained devices. LRs combine new samples wit…

    Submitted 4 July, 2024; v1 submitted 8 May, 2024; originally announced July 2024.

  2. HTVM: Efficient Neural Network Deployment On Heterogeneous TinyML Platforms

    Authors: Josse Van Delm, Maarten Vandersteegen, Alessio Burrello, Giuseppe Maria Sarda, Francesco Conti, Daniele Jahier Pagliari, Luca Benini, Marian Verhelst

    Abstract: Optimal deployment of deep neural networks (DNNs) on state-of-the-art Systems-on-Chips (SoCs) is crucial for tiny machine learning (TinyML) at the edge. The complexity of these SoCs makes deployment non-trivial, as they typically contain multiple heterogeneous compute cores with limited, programmer-managed memory to optimize latency and energy efficiency. We propose HTVM - a compiler that merges T…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Presented at DAC2023. Open-source code is available at https://github.com/KULeuven-MICAS/htvm

    ACM Class: D.3.4

    Journal ref: 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2023, pp. 1-6

  3. arXiv:2404.11488  [pdf, other]

    cs.CV cs.AI

    Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems

    Authors: Luca Bompani, Manuele Rusci, Daniele Palossi, Francesco Conti, Luca Benini

    Abstract: This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors. This method reduces the average compute load of an off-the-shelf Deep Neural Network (DNN) based object detector by up to 2.25$\times$ by alternating the processing of high-resolution images (320$\times$320 pixels) with multiple down-sized fr…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 figures. Accepted for publication at the Embedded Vision Workshop of the Computer Vision and Pattern Recognition conference, Seattle, 2024

    ACM Class: I.4

  4. arXiv:2404.02945  [pdf, other]

    cs.LG cs.AI cs.DC cs.PF

    Optimizing the Deployment of Tiny Transformers on Low-Power MCUs

    Authors: Victor J. B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini

    Abstract: Transformer networks are rapidly becoming SotA in many fields, such as NLP and CV. Similarly to CNN, there is a strong push for deploying Transformer models at the extreme edge, ultimately fitting the tiny power budget and memory footprint of MCUs. However, the early approaches in this direction are mostly ad-hoc, platform, and model-specific. This work aims to enable and optimize the flexible, mu…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Pre-print manuscript submitted for review to the IEEE Transactions on Computers

  5. arXiv:2403.11661  [pdf, other]

    cs.RO eess.SY

    Combining Local and Global Perception for Autonomous Navigation on Nano-UAVs

    Authors: Lorenzo Lamberti, Georg Rutishauser, Francesco Conti, Luca Benini

    Abstract: A critical challenge in deploying unmanned aerial vehicles (UAVs) for autonomous tasks is their ability to navigate in an unknown environment. This paper introduces a novel vision-depth fusion approach for autonomous navigation on nano-UAVs. We combine the visual-based PULP-Dronet convolutional neural network for semantic information extraction, i.e., serving as the global perception, with 8x8px d…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 5 pages, 2 figures, 1 table, 1 video

  6. arXiv:2312.14750  [pdf, other]

    cs.AR

    Siracusa: A 16 nm Heterogenous RISC-V SoC for Extended Reality with At-MRAM Neural Engine

    Authors: Arpan Suravi Prasad, Moritz Scherer, Francesco Conti, Davide Rossi, Alfio Di Mauro, Manuel Eggimann, Jorge Tómas Gómez, Ziyun Li, Syed Shakib Sarwar, Zhao Wang, Barbara De Salvo, Luca Benini

    Abstract: Extended reality (XR) applications are Machine Learning (ML)-intensive, featuring deep neural networks (DNNs) with millions of weights, tightly latency-bound (10-20 ms end-to-end), and power-constrained (low tens of mW average power). While ML performance and efficiency can be achieved by introducing neural engines within low-power systems-on-chip (SoCs), system-level power for nontrivial DNNs dep…

    Submitted 14 April, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Final accepted manuscript pre-print submitted to the IEEE Journal of Solid-State Circuits

  7. arXiv:2312.08991  [pdf, other]

    cs.RO eess.IV eess.SY

    A Sim-to-Real Deep Learning-based Framework for Autonomous Nano-drone Racing

    Authors: Lorenzo Lamberti, Elia Cereda, Gabriele Abbate, Lorenzo Bellone, Victor Javier Kartsch Morinigo, Michał Barcis, Agata Barcis, Alessandro Giusti, Francesco Conti, Daniele Palossi

    Abstract: Autonomous drone racing competitions are a proxy to improve unmanned aerial vehicles' perception, planning, and control skills. The recent emergence of autonomous nano-sized drone racing imposes new challenges, as their ~10cm form factor heavily restricts the resources available onboard, including memory, computation, and sensors. This paper describes the methodology and technical implementation o…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 8 pages, 10 figures, 3 tables. This paper has been accepted for publication in the IEEE Robotics and Automation Letters (RAL). Copyright 2023 IEEE

    Journal ref: IEEE Robotics and Automation Letters (Volume: 9, Issue: 2, February 2024)

  8. arXiv:2311.17815  [pdf, other]

    cs.AR cs.AI

    A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

    Authors: Fabrizio Ferrandi, Serena Curzel, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, Francesco Conti, Alessio Burrello, Francesco Barchi, Luca Benini, Luciano Lavagno, Teodoro Urso, Enrico Calore, Sebastiano Fabio Schifano, Cristian Zambelli, Maurizio Palesi, Giuseppe Ascia, Enrico Russo, Nicola Petra, Davide De Caro, Gennaro Di Meo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Francesco Silvestri, Paolo Palazzari, et al. (1 additional author not shown)

    Abstract: In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become more and more pressing to design heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, spanning…

    Submitted 29 November, 2023; originally announced November 2023.

  9. A Topological Machine Learning Pipeline for Classification

    Authors: Francesco Conti, Davide Moroni, Maria Antonietta Pascali

    Abstract: In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered. Using a grid search approach, this pipeline determines optimal representation methods and parameters. The development of such a topological pipeline for Machine Learning involves two crucial steps that strongly affect its performance: firstly…

    Submitted 26 September, 2023; originally announced September 2023.

    MSC Class: 55N31; 62R40; 68T05

    Journal ref: Mathematics 2022, 10(17), 3086

  10. arXiv:2309.03664  [pdf, other]

    cs.LG

    Alzheimer Disease Detection from Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning

    Authors: Francesco Conti, Martina Banchelli, Valentina Bessi, Cristina Cecchi, Fabrizio Chiti, Sara Colantonio, Cristiano D'Andrea, Marella de Angelis, Davide Moroni, Benedetta Nacmias, Maria Antonietta Pascali, Sandro Sorbi, Paolo Matteini

    Abstract: The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer's disease (AD) as well as of 5 pathological controls has been collected and analysed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods obtaining unsatisfactory re…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted for inclusion in AITA 2023 (http://aita.isti.cnr.it/)

    MSC Class: 55N31 (primary); 55N35 (secondary)

    ACM Class: I.2.6; I.5

  11. arXiv:2307.02894  [pdf, other]

    cs.LG

    Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

    Authors: Georg Rutishauser, Francesco Conti, Luca Benini

    Abstract: Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hyb…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: Accepted for publication at the 2023 IEEE International Conference of Artificial Intelligence Circuits and Systems (AICAS '23)

  12. arXiv:2307.01056  [pdf, other]

    cs.AR

    A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks

    Authors: Alessandro Nadalini, Georg Rutishauser, Alessio Burrello, Nazareno Bruschi, Angelo Garofalo, Luca Benini, Francesco Conti, Davide Rossi

    Abstract: The emerging trend of deploying complex algorithms, such as Deep Neural Networks (DNNs), increasingly poses strict memory and energy efficiency requirements on Internet-of-Things (IoT) end-nodes. Mixed-precision quantization has been proposed as a technique to minimize a DNN's memory footprint and maximize its execution efficiency, with negligible end-to-end precision degradation. In this work, we…

    Submitted 3 July, 2023; originally announced July 2023.

  13. arXiv:2306.15552  [pdf, other]

    cs.AR cs.ET cs.LG

    A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms

    Authors: Cristina Silvano, Daniele Ielmini, Fabrizio Ferrandi, Leandro Fiorin, Serena Curzel, Luca Benini, Francesco Conti, Angelo Garofalo, Cristian Zambelli, Enrico Calore, Sebastiano Fabio Schifano, Maurizio Palesi, Giuseppe Ascia, Davide Patti, Nicola Petra, Davide De Caro, Luciano Lavagno, Teodoro Urso, Valeria Cardellini, Gian Carlo Cardarilli, Robert Birke, Stefania Perri

    Abstract: Recent trends in deep learning (DL) imposed hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications such as image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent advances in designing DL accelerators suitable to reach the performance requirements of HPC applications. In par…

    Submitted 12 July, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: Preprint version of our manuscript submitted to the journal ACM CSUR (58 pages including Appendix) on June 22nd, 2023. Major revision submitted on July 12th, 2024

  14. arXiv:2305.19167  [pdf, other]

    cs.LG cs.AI cs.DC

    Reduced Precision Floating-Point Optimization for Deep Neural Network On-Device Learning on MicroControllers

    Authors: Davide Nadalini, Manuele Rusci, Luca Benini, Francesco Conti

    Abstract: Enabling On-Device Learning (ODL) for Ultra-Low-Power Micro-Controller Units (MCUs) is a key step for post-deployment adaptation and fine-tuning of Deep Neural Network (DNN) models in future TinyML applications. This paper tackles this challenge by introducing a novel reduced precision optimization technique for ODL primitives on MCU-class devices, leveraging the State-of-the-Art advancements in RISC-…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Pre-print version submitted to Elsevier's Future Generation Computer Systems journal. For the associated open-source release, see https://github.com/pulp-platform/pulp-trainlib

  15. Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC with 2-to-8b DNN Acceleration and 30%-Boost Adaptive Body Biasing

    Authors: Francesco Conti, Gianna Paulin, Angelo Garofalo, Davide Rossi, Alfio Di Mauro, Georg Rutishauser, Gianmarco Ottavi, Manuel Eggimann, Hayate Okuhara, Luca Benini

    Abstract: Emerging Artificial Intelligence-enabled Internet-of-Things (AI-IoT) System-on-a-Chip (SoC) for augmented reality, personalized healthcare, and nano-robotics need to run many diverse tasks within a power envelope of a few tens of mW over a wide range of operating conditions: compute-intensive but strongly quantized Deep Neural Network (DNN) inference, as well as signal processing and control requi…

    Submitted 28 November, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: Post-print accepted by IEEE Journal of Solid-State Circuits. Fixed metadata (was missing one co-author), added DOI of IEEE JSSC

  16. arXiv:2305.07325  [pdf, other]

    cs.AR

    Echoes: a 200 GOPS/W Frequency Domain SoC with FFT Processor and I2S DSP for Flexible Data Acquisition from Microphone Arrays

    Authors: Mattia Sinigaglia, Luca Bertaccini, Luca Valente, Angelo Garofalo, Simone Benatti, Luca Benini, Francesco Conti, Davide Rossi

    Abstract: Emerging applications in the IoT domain require ultra-low-power and high-performance end-nodes to deal with complex near-sensor-data analytics. Domains such as audio, radar, and Structural Health Monitoring require many computations to be performed in the frequency domain rather than in the time domain. We present ECHOES, a System-On-a-Chip (SoC) composed of a RISC-V core enhanced with fixed and f…

    Submitted 12 May, 2023; originally announced May 2023.

  17. DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training

    Authors: Angelo Garofalo, Yvan Tortorella, Matteo Perotti, Luca Valente, Alessandro Nadalini, Luca Benini, Davide Rossi, Francesco Conti

    Abstract: On-chip DNN inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present DARKSIDE, a System-on-Chip with a heterogeneous cluster of 8 RISC-V…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: 11 pages, 15 figures

  18. arXiv:2303.08706  [pdf, other]

    eess.SY cs.AR

    Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space

    Authors: Michael Rogenmoser, Yvan Tortorella, Davide Rossi, Francesco Conti, Luca Benini

    Abstract: Space Cyber-Physical Systems (S-CPS) such as spacecraft and satellites strongly rely on the reliability of onboard computers to guarantee the success of their missions. Relying solely on radiation-hardened technologies is extremely expensive, and developing inflexible architectural and microarchitectural modifications to introduce modular redundancy within a system leads to significant area increa…

    Submitted 14 November, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

  19. Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge

    Authors: Matteo Risso, Alessio Burrello, Francesco Conti, Lorenzo Lamberti, Yukai Chen, Luca Benini, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

    Abstract: Neural Architecture Search (NAS) is quickly becoming the go-to approach to optimize the structure of Deep Learning (DL) models for complex tasks such as Image Classification or Object Detection. However, many other relevant applications of DL, especially at the edge, are based on time-series processing and require models with unique features, for which NAS is less explored. This work focuses in pa…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: Accepted for publication at the IEEE Transactions on Computers

  20. arXiv:2301.03904  [pdf, other]

    cs.AR cs.AI cs.LG

    RedMule: A Mixed-Precision Matrix-Matrix Operation Engine for Flexible and Energy-Efficient On-Chip Linear Algebra and TinyML Training Acceleration

    Authors: Yvan Tortorella, Luca Bertaccini, Luca Benini, Davide Rossi, Francesco Conti

    Abstract: The increasing interest in TinyML, i.e., near-sensor machine learning on power budgets of a few tens of mW, is currently pushing toward enabling TinyML-class training as opposed to inference only. Current training algorithms, based on various forms of error and gradient backpropagation, rely on floating-point matrix operations to meet the precision and dynamic range requirements. So far, the energ…

    Submitted 6 May, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

  21. arXiv:2211.12877  [pdf, other]

    cs.DC

    End-to-End DNN Inference on a Massively Parallel Analog In Memory Computing Architecture

    Authors: Nazareno Bruschi, Giuseppe Tagliavini, Angelo Garofalo, Francesco Conti, Irem Boybat, Luca Benini, Davide Rossi

    Abstract: The demand for computation resources and energy efficiency of Convolutional Neural Networks (CNN) applications requires a new paradigm to overcome the "Memory Wall". Analog In-Memory Computing (AIMC) is a promising paradigm since it performs matrix-vector multiplications, the critical kernel of many ML applications, in-place in the analog domain within memory arrays structured as crossbars of memo…

    Submitted 23 November, 2022; originally announced November 2022.

  22. arXiv:2206.04796  [pdf, other]

    cs.AR eess.SY

    Scale up your In-Memory Accelerator: Leveraging Wireless-on-Chip Communication for AIMC-based CNN Inference

    Authors: Nazareno Bruschi, Giuseppe Tagliavini, Francesco Conti, Sergi Abadal, Alberto Cabellos-Aparicio, Eduard Alarcón, Geethan Karunaratne, Irem Boybat, Luca Benini, Davide Rossi

    Abstract: Analog In-Memory Computing (AIMC) is emerging as a disruptive paradigm for heterogeneous computing, potentially delivering orders of magnitude better peak performance and efficiency over traditional digital signal processing architectures on Matrix-Vector multiplication. However, to sustain this throughput in real-world applications, AIMC tiles must be supplied with data at very high bandwidth and…

    Submitted 3 June, 2022; originally announced June 2022.

  23. arXiv:2204.11192  [pdf, other]

    cs.AR

    RedMulE: A Compact FP16 Matrix-Multiplication Accelerator for Adaptive Deep Learning on RISC-V-Based Ultra-Low-Power SoCs

    Authors: Yvan Tortorella, Luca Bertaccini, Davide Rossi, Luca Benini, Francesco Conti

    Abstract: The fast proliferation of extreme-edge applications using Deep Learning (DL) based algorithms required dedicated hardware to satisfy extreme-edge applications' latency, throughput, and precision requirements. While inference is achievable in practical cases, online finetuning and adaptation of general DL models are still highly challenging. One of the key stumbling stones is the need for parallel…

    Submitted 24 April, 2022; originally announced April 2022.

  24. arXiv:2204.10687  [pdf, other]

    cs.AR

    SNE: an Energy-Proportional Digital Accelerator for Sparse Event-Based Convolutions

    Authors: Alfio Di Mauro, Arpan Suravi Prasad, Zhikai Huang, Matteo Spallanzani, Francesco Conti, Luca Benini

    Abstract: Event-based sensors are drawing increasing attention due to their high temporal resolution, low power consumption, and low bandwidth. To efficiently extract semantically meaningful information from sparse data streams produced by such sensors, we present a 4.5TOP/s/W digital accelerator capable of performing 4-bits-quantized event-based convolutional neural networks (eCNN). Compared to standard co…

    Submitted 29 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: Accepted at DATE22

  25. Pruning In Time (PIT): A Lightweight Network Architecture Optimizer for Temporal Convolutional Networks

    Authors: Matteo Risso, Alessio Burrello, Daniele Jahier Pagliari, Francesco Conti, Lorenzo Lamberti, Enrico Macii, Luca Benini, Massimo Poncino

    Abstract: Temporal Convolutional Networks (TCNs) are promising Deep Learning models for time-series processing tasks. One key feature of TCNs is time-dilated convolution, whose optimization requires extensive experimentation. We propose an automatic dilation optimizer, which tackles the problem as a weight pruning on the time-axis, and learns dilation factors together with weights, in a single training. Our…

    Submitted 28 March, 2022; originally announced March 2022.

    Journal ref: 2021 58th ACM/IEEE Design Automation Conference (DAC), 2021, pp. 1015-1020

  26. TCN Mapping Optimization for Ultra-Low Power Time-Series Edge Inference

    Authors: Alessio Burrello, Alberto Dequino, Daniele Jahier Pagliari, Francesco Conti, Marcello Zanghieri, Enrico Macii, Luca Benini, Massimo Poncino

    Abstract: Temporal Convolutional Networks (TCNs) are emerging lightweight Deep Learning models for Time Series analysis. We introduce an automated exploration approach and a library of optimized kernels to map TCNs on Parallel Ultra-Low Power (PULP) microcontrollers. Our approach minimizes latency and energy by exploiting a layer tiling optimizer to jointly find the tiling dimensions and select among altern…

    Submitted 24 March, 2022; originally announced March 2022.

    Journal ref: 2021 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)

  27. arXiv:2202.07462  [pdf, other]

    cs.LG cs.AI cs.AR cs.DC cs.NE eess.SP

    Vau da muntanialas: Energy-efficient multi-die scalable acceleration of RNN inference

    Authors: Gianna Paulin, Francesco Conti, Lukas Cavigelli, Luca Benini

    Abstract: Recurrent neural networks such as Long Short-Term Memories (LSTMs) learn temporal dependencies by keeping an internal state, making them ideal for time-series problems such as speech recognition. However, the output-to-input feedback creates distinctive memory bandwidth and scalability challenges in designing accelerators for RNNs. We present Muntaniala, an RNN accelerator architecture for LSTM in…

    Submitted 14 February, 2022; originally announced February 2022.

    Journal ref: IEEE Transactions on Circuits and Systems I: Regular Papers, IEEE, Volume: 69, Issue: 1, January 2022, Date of Publication (Early Access) 30 July 2021

  28. Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster with 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode

    Authors: Gianmarco Ottavi, Angelo Garofalo, Giuseppe Tagliavini, Francesco Conti, Alfio Di Mauro, Luca Benini, Davide Rossi

    Abstract: Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms on resource-constrained and battery-powered devices poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bitwidth and mixed-precision arithmetic have been proven to be valid strateg…

    Submitted 16 March, 2023; v1 submitted 21 January, 2022; originally announced January 2022.

    Comments: 13 pages, 17 figures, 2 tables, Journal

  29. arXiv:2201.01089  [pdf, other]

    cs.AR cs.DC cs.LG cs.NE

    A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks

    Authors: Angelo Garofalo, Gianmarco Ottavi, Francesco Conti, Geethan Karunaratne, Irem Boybat, Luca Benini, Davide Rossi

    Abstract: Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency. Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference and serves as on-chip memory storage for DNN weights. However, IMC's functional flexibility limitations and their impact on performance…

    Submitted 4 January, 2022; originally announced January 2022.

    Comments: 14 pages (not including final biography page), 13 figures (excluded authors pictures)

  30. A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays

    Authors: Leonardo Ravaglia, Manuele Rusci, Davide Nadalini, Alessandro Capotondi, Francesco Conti, Luca Benini

    Abstract: In the last few years, research and development on Deep Learning models and techniques for ultra-low-power devices (in a word, TinyML) has mainly focused on a train-then-deploy assumption, with static models that cannot be adapted to newly collected data without cloud-based data collection and fine-tuning. Latent Replay-based Continual Learning (CL) techniques[1] enable online, serverless adaptation…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: 14 pages

    Journal ref: IEEE Journal on Emerging and Selected Topics in Circuits and Systems 11.4 (2021), 789-802

  31. Vega: A 10-Core SoC for IoT End-Nodes with DNN Acceleration and Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode

    Authors: Davide Rossi, Francesco Conti, Manuel Eggimann, Alfio Di Mauro, Giuseppe Tagliavini, Stefan Mach, Marco Guermandi, Antonio Pullini, Igor Loi, Jie Chen, Eric Flamand, Luca Benini

    Abstract: The Internet-of-Things requires end-nodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT end-node SoC capable of scaling from a 1.7 $\mathrm{\mu}$W fully retentive cognitive sleep mode up to 32.2 GOPS (@…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: 13 pages, 11 figures, 8 tables, journal paper

  32. End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet?

    Authors: Gianmarco Ottavi, Geethan Karunaratne, Francesco Conti, Irem Boybat, Luca Benini, Davide Rossi

    Abstract: In-Memory Acceleration (IMA) promises major efficiency improvements in deep neural network (DNN) inference, but challenges remain in the integration of IMA within a digital system. We propose a heterogeneous architecture coupling 8 RISC-V cores with an IMA in a shared-memory cluster, analyzing the benefits and trade-offs of in-memory computing on the realistic use case of a MobileNetV2 bottleneck…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: 4 pages, 6 figures, conference

    Journal ref: 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)

  33. arXiv:2103.10873  [pdf, other]

    cs.RO eess.SY

    Fully Onboard AI-powered Human-Drone Pose Estimation on Ultra-low Power Autonomous Flying Nano-UAVs

    Authors: Daniele Palossi, Nicky Zimmerman, Alessio Burrello, Francesco Conti, Hanna Müller, Luca Maria Gambardella, Luca Benini, Alessandro Giusti, Jérôme Guzzi

    Abstract: Artificial intelligence-powered pocket-sized air robots have the potential to revolutionize the Internet-of-Things ecosystem, acting as autonomous, unobtrusive, and ubiquitous smart sensors. With a few cm$^{2}$ form-factor, nano-sized unmanned aerial vehicles (UAVs) are the natural fit for indoor human-drone interaction missions, such as the pose estimation task we address in this work. However, this…

    Submitted 19 March, 2021; originally announced March 2021.

    Comments: 15 pages, 15 figures, 4 tables. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  34. arXiv:2011.14325  [pdf, other]

    cs.AR

    XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Network on RISC-V based IoT End Nodes

    Authors: Angelo Garofalo, Giuseppe Tagliavini, Francesco Conti, Luca Benini, Davide Rossi

    Abstract: This work introduces lightweight extensions to the RISC-V ISA to boost the efficiency of heavily Quantized Neural Network (QNN) inference on microcontroller-class cores. By extending the ISA with nibble (4-bit) and crumb (2-bit) SIMD instructions, we are able to show near-linear speedup with respect to higher precision integer computation on the key kernels for QNN computation. Also, we propose a…

    Submitted 29 November, 2020; originally announced November 2020.

    Comments: 16 pages, 17 figures

  35. Graphene-based Wireless Agile Interconnects for Massive Heterogeneous Multi-chip Processors

    Authors: Sergi Abadal, Robert Guirado, Hamidreza Taghvaee, Akshay Jain, Elana Pereira de Santana, Peter Haring Bolívar, Mohamed Saeed, Renato Negra, Zhenxing Wang, Kun-Ta Wang, Max C. Lemme, Joshua Klein, Marina Zapater, Alexandre Levisse, David Atienza, Davide Rossi, Francesco Conti, Martino Dazzi, Geethan Karunaratne, Irem Boybat, Abu Sebastian

    Abstract: The main design principles in computer architecture have recently shifted from a monolithic scaling-driven approach to the development of heterogeneous architectures that tightly co-integrate multiple specialized processor and memory chiplets. In such data-hungry multi-chip architectures, current Networks-in-Package (NiPs) may not be enough to cater to their heterogeneous and fast-changing communi…

    Submitted 21 September, 2023; v1 submitted 8 November, 2020; originally announced November 2020.

    Comments: 8 pages, 4 figures, 1 table

    Journal ref: IEEE Wireless Communications Magazine, vol. 30, no. 4, pp. 162-169, 2023

  36. arXiv:2010.04073  [pdf, other]

    cs.AR

    A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference

    Authors: Gianmarco Ottavi, Angelo Garofalo, Giuseppe Tagliavini, Francesco Conti, Luca Benini, Davide Rossi

    Abstract: Low bit-width Quantized Neural Networks (QNNs) enable deployment of complex machine learning models on constrained devices such as microcontrollers (MCUs) by reducing their memory footprint. Fine-grained asymmetric quantization (i.e., different bit-widths assigned to weights and activations on a tensor-by-tensor basis) is a particularly interesting scheme to maximize accuracy under a tight memory…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: 6 pages, 6 figures, 2 tables, conference

  37. arXiv:2008.07127  [pdf, other]

    cs.DC cs.AR cs.NE

    DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs

    Authors: Alessio Burrello, Angelo Garofalo, Nazareno Bruschi, Giuseppe Tagliavini, Davide Rossi, Francesco Conti

    Abstract: The deployment of Deep Neural Networks (DNNs) on end-nodes at the extreme edge of the Internet-of-Things is a critical enabler to support pervasive Deep Learning-enhanced applications. Low-Cost MCU-based end-nodes have limited on-chip memory and often replace caches with scratchpads, to reduce area overheads and increase energy efficiency -- requiring explicit DMA-based memory transfers between di…

    Submitted 19 March, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: 14 pages, 12 figures, 4 tables, 2 listings. Accepted for publication in IEEE Transactions on Computers (https://ieeexplore.ieee.org/document/9381618)

  38. arXiv:2007.13631  [pdf, other]

    cs.DC eess.SP

    Memory-Latency-Accuracy Trade-offs for Continual Learning on a RISC-V Extreme-Edge Node

    Authors: Leonardo Ravaglia, Manuele Rusci, Alessandro Capotondi, Francesco Conti, Lorenzo Pellegrini, Vincenzo Lomonaco, Davide Maltoni, Luca Benini

    Abstract: AI-powered edge devices currently lack the ability to adapt their embedded inference models to the ever-changing environment. To tackle this issue, Continual Learning (CL) strategies aim at incrementally improving the decision capabilities based on newly acquired data. In this work, after quantifying memory and computational requirements of CL algorithms, we define a novel HW/SW extreme-edge platf…

    Submitted 22 July, 2020; originally announced July 2020.

    Comments: 6 pages, 5 figures, conference

  39. arXiv:2007.08952  [pdf, other]

    cs.AR cs.LG eess.SP

    Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

    Authors: Alfio Di Mauro, Francesco Conti, Pasquale Davide Schiavone, Davide Rossi, Luca Benini

    Abstract: Binary Neural Networks (BNNs) have been shown to be robust to random bit-level noise, making aggressive voltage scaling attractive as a power-saving technique for both logic and SRAMs. In this work, we introduce the first fully programmable IoT end-node system-on-chip (SoC) capable of executing software-defined, hardware-accelerated BNNs at ultra-low voltage. Our SoC exploits a hybrid memory schem…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Submitted to ISICAS2020 journal special issue

  40. Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices

    Authors: Nazareno Bruschi, Angelo Garofalo, Francesco Conti, Giuseppe Tagliavini, Davide Rossi

    Abstract: The deployment of Quantized Neural Networks (QNN) on advanced microcontrollers requires optimized software to exploit digital signal processing (DSP) extensions of modern instruction set architectures (ISA). As such, recent research proposed optimized libraries for QNNs (from 8-bit to 2-bit) such as CMSIS-NN and PULP-NN. This work presents an extension to the PULP-NN library targeting the accelera…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: 4 pages, 6 figures, published in 17th ACM International Conference on Computing Frontiers (CF '20), May 11--13, 2020, Catania, Italy

  41. arXiv:2004.05930  [pdf, ps, other]

    cs.LG cs.AI cs.NE stat.ML

    Technical Report: NEMO DNN Quantization for Deployment Model

    Authors: Francesco Conti

    Abstract: This technical report aims at defining a formal framework for Deep Neural Network (DNN) layer-wise quantization, focusing in particular on the problems related to the final deployment. It also acts as a documentation for the NEMO (NEural Minimization for pytOrch) framework. It describes the four DNN representations used in NEMO (FullPrecision, FakeQuantized, QuantizedDeployable and IntegerDeployab…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: 12 pages, technical report

  42. PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors

    Authors: Angelo Garofalo, Manuele Rusci, Francesco Conti, Davide Rossi, Luca Benini

    Abstract: We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cluster of RISC-V processors. The key innovation in PULP-NN is a set of kernels for Quantized Neural Network (QNN) inference, targeting byte and sub-byte data types, down to INT-1, tuned for the recent trend toward aggressive quantization in deep neural network inference. The proposed library exploits…

    Submitted 29 August, 2019; originally announced August 2019.

    Comments: 13 pages, 11 figures, 2 tables

  43. arXiv:1905.04166  [pdf, other]

    cs.RO cs.LG eess.SP

    An Open Source and Open Hardware Deep Learning-powered Visual Navigation Engine for Autonomous Nano-UAVs

    Authors: Daniele Palossi, Francesco Conti, Luca Benini

    Abstract: Nano-size unmanned aerial vehicles (UAVs), with few centimeters of diameter and sub-10 Watts of total power budget, have so far been considered incapable of running sophisticated visual-based autonomous navigation software without external aid from base-stations, ad-hoc local positioning infrastructure, and powerful external computation servers. In this work, we present what is, to the best of our…

    Submitted 10 May, 2019; originally announced May 2019.

    Comments: Accepted for publication in Proceeding of International Conference on Distributed Computing in Sensor Systems (DCOSS 2019). arXiv admin note: text overlap with arXiv:1805.01831

  44. arXiv:1902.01492  [pdf, other]

    cs.NE

    Optimally Scheduling CNN Convolutions for Efficient Memory Access

    Authors: Arthur Stoutchinin, Francesco Conti, Luca Benini

    Abstract: Embedded inference engines for convolutional networks must be parsimonious in memory bandwidth and buffer sizing to meet power and cost constraints. We present an analytical memory bandwidth model for loop-nest optimization targeting architectures with application managed buffers. We applied this model to optimize the CNN convolution loop-nest. We show that our model is more accurate than previous…

    Submitted 4 February, 2019; originally announced February 2019.

  45. XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

    Authors: Francesco Conti, Pasquale Davide Schiavone, Luca Benini

    Abstract: Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM / sta…

    Submitted 9 July, 2018; originally announced July 2018.

    Comments: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issue

  46. arXiv:1805.01831  [pdf, other]

    cs.RO cs.AI cs.NE eess.SP

    A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones

    Authors: Daniele Palossi, Antonio Loquercio, Francesco Conti, Eric Flamand, Davide Scaramuzza, Luca Benini

    Abstract: Fully-autonomous miniaturized robots (e.g., drones), with artificial intelligence (AI) based visual navigation capabilities are extremely challenging drivers of Internet-of-Things edge intelligence capabilities. Visual navigation based on AI approaches, such as deep neural networks (DNNs) are becoming pervasive for standard-size drones, but are considered out of reach for nanodrones with size of a…

    Submitted 14 May, 2019; v1 submitted 4 May, 2018; originally announced May 2018.

    Comments: 15 pages, 13 figures, 5 tables, 2 listings, accepted for publication in the IEEE Internet of Things Journal (IEEE IOTJ)

  47. arXiv:1712.00994  [pdf, other]

    cs.NE cs.AR cs.DC

    NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs

    Authors: Paolo Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, Davide Rossi, Luigi Raffo, Luca Benini

    Abstract: Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition. However, their computational load is significant, motivating the development of CNN-specialized accelerators. This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs. NEURAg…

    Submitted 4 December, 2017; originally announced December 2017.

    Comments: 22 pages, 14 figures, submitted to ACM Transactions on Reconfigurable Technology and Systems

    Journal ref: ACM Transactions on Reconfigurable Technology and Systems, Vol. 11 No. 3 (2018), Article 18

  48. arXiv:1711.05734  [pdf, other]

    cs.DC cs.LG cs.NE cs.SD

    Chipmunk: A Systolically Scalable 0.9 mm${}^2$, 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference

    Authors: Francesco Conti, Lukas Cavigelli, Gianna Paulin, Igor Susmelj, Luca Benini

    Abstract: Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm${}^2$) hardware accelerator for Long-Short Term Memory RNNs in UMC 65 nm technology capab…

    Submitted 20 February, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

  49. arXiv:1612.05974  [pdf, other]

    cs.AR cs.CR cs.LG cs.NE

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Authors: Francesco Conti, Robert Schilling, Pasquale Davide Schiavone, Antonio Pullini, Davide Rossi, Frank Kagan Gürkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini

    Abstract: Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data securi…

    Submitted 23 April, 2017; v1 submitted 18 December, 2016; originally announced December 2016.

    Comments: 15 pages, 12 figures, accepted for publication to the IEEE Transactions on Circuits and Systems - I: Regular Papers