Showing 1–50 of 269 results for author: Benini, L

  1. arXiv:2407.13706

    cs.RO cs.CV eess.SP

    GAP9Shield: A 150GOPS AI-capable Ultra-low Power Module for Vision and Ranging Applications on Nano-drones

    Authors: Hanna Müller, Victor Kartsch, Luca Benini

    Abstract: The evolution of AI and digital signal processing technologies, combined with affordable energy-efficient processors, has propelled the development of both hardware and software for drone applications. Nano-drones, which fit into the palm of the hand, are suitable for indoor environments and safe for human interaction; however, they often fail to deliver the required performance for complex tasks…

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: This work has been accepted for publication at the European Robotics Forum 2024

  2. arXiv:2407.05938

    physics.ins-det cs.AR hep-ex

    Design and Experimental Investigation of Trikarenos: A Fault-Tolerant 28nm RISC-V-based SoC

    Authors: Michael Rogenmoser, Philip Wiese, Bruno Endres Forlin, Frank K. Gürkaynak, Paolo Rech, Alessandra Menicucci, Marco Ottavi, Luca Benini

    Abstract: We present a fault-tolerant by-design RISC-V SoC and experimentally assess it under atmospheric neutrons and 200 MeV protons. The dedicated ECC and Triple-Core Lockstep countermeasures correct most errors, guaranteeing a device cross-section lower than $5.36 \times 10^{-12}$ cm$^2$.

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 4 pages (excluding title page), accepted at RADECS 2024

  3. arXiv:2407.05447

    cs.AR

    Spatzformer: An Efficient Reconfigurable Dual-Core RISC-V V Cluster for Mixed Scalar-Vector Workloads

    Authors: Matteo Perotti, Michele Raeber, Mattia Sinigaglia, Matheus Cavalcante, Davide Rossi, Luca Benini

    Abstract: Multi-core vector processor architectures excel in handling computationally intensive vectorizable tasks but struggle to achieve optimal resource utilization when facing sequential and control tasks that cannot be vectorized. This work presents Spatzformer, the first reconfigurable RISC-V V (RVV) architecture developed from a baseline open-source dual-core cluster based on Snitch scalar cores augm…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: To be published in the 2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

  4. arXiv:2407.03136

    cs.RO

    Ultra-Lightweight Collaborative Mapping for Robot Swarms

    Authors: Vlad Niculescu, Tommaso Polonelli, Michele Magno, Luca Benini

    Abstract: A key requirement in robotics is the ability to simultaneously self-localize and map a previously unknown environment, relying primarily on onboard sensing and computation. Achieving fully onboard accurate simultaneous localization and mapping (SLAM) is feasible for high-end robotic platforms, whereas small and inexpensive robots face challenges due to constrained hardware, therefore frequently re…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 20 pages, 7 figures

  5. arXiv:2407.03111

    cs.NE cs.AI cs.ET cs.LG

    Compressed Latent Replays for Lightweight Continual Learning on Spiking Neural Networks

    Authors: Alberto Dequino, Alessio Carpegna, Davide Nadalini, Alessandro Savino, Luca Benini, Stefano Di Carlo, Francesco Conti

    Abstract: Rehearsal-based Continual Learning (CL) has been intensely investigated in Deep Neural Networks (DNNs). However, its application in Spiking Neural Networks (SNNs) has not been explored in depth. In this paper we introduce the first memory-efficient implementation of Latent Replay (LR)-based CL for SNNs, designed to seamlessly integrate with resource-constrained devices. LRs combine new samples wit…

    Submitted 4 July, 2024; v1 submitted 8 May, 2024; originally announced July 2024.

  6. arXiv:2407.02405

    cs.RO cs.CV cs.LG eess.IV

    Tiny-PULP-Dronets: Squeezing Neural Networks for Faster and Lighter Inference on Multi-Tasking Autonomous Nano-Drones

    Authors: Lorenzo Lamberti, Vlad Niculescu, Michał Barcis, Lorenzo Bellone, Enrico Natalizio, Luca Benini, Daniele Palossi

    Abstract: Pocket-sized autonomous nano-drones can revolutionize many robotic use cases, such as visual inspection in narrow, constrained spaces, and ensure safer human-robot interaction due to their tiny form factor and weight -- i.e., tens of grams. This compelling vision is challenged by the high level of intelligence needed aboard, which clashes against the limited computational and storage resources ava…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 3 Figures, 1 table. Accepted for publication at IEEE Artificial Intelligence Circuits and Systems (AICAS), 2022

  7. arXiv:2406.19189

    cs.LG cs.AI

    BISeizuRe: BERT-Inspired Seizure Data Representation to Improve Epilepsy Monitoring

    Authors: Luca Benfenati, Thorir Mar Ingolfsson, Andrea Cossettini, Daniele Jahier Pagliari, Alessio Burrello, Luca Benini

    Abstract: This study presents a novel approach for EEG-based seizure detection leveraging a BERT-based model. The model, BENDR, undergoes a two-phase training process. Initially, it is pre-trained on the extensive Temple University Hospital EEG Corpus (TUEG), a 1.5 TB dataset comprising over 10,000 subjects, to extract common EEG data patterns. Subsequently, the model is fine-tuned on the CHB-MIT Scalp EEG…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 4 pages, 2 tables, 2 figures

  8. arXiv:2406.15107

    cs.AR

    Basilisk: An End-to-End Open-Source Linux-Capable RISC-V SoC in 130nm CMOS

    Authors: Paul Scheffler, Philippe Sauter, Thomas Benz, Frank K. Gürkaynak, Luca Benini

    Abstract: Open-source hardware (OSHW) is rapidly gaining traction in academia and industry. The availability of open RTL descriptions, EDA tools, and even PDKs enables a fully auditable supply chain for end-to-end (RTL to layout) open-source silicon, significantly strengthening security and transparency. Despite promising developments, existing OSHW efforts have so far fallen short of producing end-to-end o…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 3 pages, 4 figures. Accepted at SSH-SoC 2024 workshop

  9. arXiv:2406.15068

    cs.AR

    Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET

    Authors: Gianna Paulin, Paul Scheffler, Thomas Benz, Matheus Cavalcante, Tim Fischer, Manuel Eggimann, Yichao Zhang, Nils Wistoff, Luca Bertaccini, Luca Colagrande, Gianmarco Ottavi, Frank K. Gürkaynak, Davide Rossi, Luca Benini

    Abstract: We present Occamy, a 432-core RISC-V dual-chiplet 2.5D system for efficient sparse linear algebra and stencil computations on FP64 and narrow (32-, 16-, 8-bit) SIMD FP data. Occamy features 48 clusters of RISC-V cores with custom extensions, two 64-bit host cores, and a latency-tolerant multi-chiplet interconnect and memory system with 32 GiB of HBM2E. It achieves leading-edge utilization on stenc…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 2 pages, 7 figures. Accepted at the 2024 IEEE Symposium on VLSI Technology & Circuits

  10. Low Latency Visual Inertial Odometry with On-Sensor Accelerated Optical Flow for Resource-Constrained UAVs

    Authors: Jonas Kühne, Michele Magno, Luca Benini

    Abstract: Visual Inertial Odometry (VIO) is the task of estimating the movement trajectory of an agent from an onboard camera stream fused with additional Inertial Measurement Unit (IMU) measurements. A crucial subtask within VIO is the tracking of features, which can be achieved through Optical Flow (OF). As the calculation of OF is a resource-demanding task in terms of computational load and memory footpr…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: This article has been accepted for publication in the IEEE Sensors Journal (JSEN)

  11. HTVM: Efficient Neural Network Deployment On Heterogeneous TinyML Platforms

    Authors: Josse Van Delm, Maarten Vandersteegen, Alessio Burrello, Giuseppe Maria Sarda, Francesco Conti, Daniele Jahier Pagliari, Luca Benini, Marian Verhelst

    Abstract: Optimal deployment of deep neural networks (DNNs) on state-of-the-art Systems-on-Chips (SoCs) is crucial for tiny machine learning (TinyML) at the edge. The complexity of these SoCs makes deployment non-trivial, as they typically contain multiple heterogeneous compute cores with limited, programmer-managed memory to optimize latency and energy efficiency. We propose HTVM - a compiler that merges T…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Presented at DAC2023. Open-source code is available at https://github.com/KULeuven-MICAS/htvm

    ACM Class: D.3.4

    Journal ref: 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2023, pp. 1-6

  12. arXiv:2406.06546

    cs.AR

    SentryCore: A RISC-V Co-Processor System for Safe, Real-Time Control Applications

    Authors: Michael Rogenmoser, Alessandro Ottaviano, Thomas Benz, Robert Balas, Matteo Perotti, Angelo Garofalo, Luca Benini

    Abstract: In the last decade, we have witnessed exponential growth in the complexity of control systems for safety-critical applications (automotive, robots, industrial automation) and their transition to heterogeneous mixed-criticality systems (MCSs). The growth of the RISC-V ecosystem is creating a major opportunity to develop open-source, vendor-neutral reference platforms for safety-critical computing.…

    Submitted 16 May, 2024; originally announced June 2024.

    Comments: 2 pages, accepted at the RISC-V Summit Europe 2024

  13. A Gigabit, DMA-enhanced Open-Source Ethernet Controller for Mixed-Criticality Systems

    Authors: Chaoqun Liang, Alessandro Ottaviano, Thomas Benz, Mattia Sinigaglia, Luca Benini, Angelo Garofalo, Davide Rossi

    Abstract: The ongoing revolution in application domains targeting autonomous navigation, first and foremost automotive "zonalization", has increased the importance of certain off-chip communication interfaces, particularly Ethernet. The latter will play an essential role in next-generation vehicle architectures as the backbone connecting simultaneously and instantaneously the zonal/domain controllers. There…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 4 pages, 4 figures, 21st ACM International Conference on Computing Frontiers Workshops and Special Sessions

  14. arXiv:2405.19284

    cs.DC cs.AI cs.AR

    Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform

    Authors: Viviane Potocnik, Luca Colagrande, Tim Fischer, Luca Bertaccini, Daniele Jahier Pagliari, Alessio Burrello, Luca Benini

    Abstract: Transformer-based foundation models have become crucial for various domains, most notably natural language processing (NLP) or computer vision (CV). These models are predominantly deployed on high-performance GPUs or hardwired accelerators with highly customized, proprietary instruction sets. Until now, limited attention has been given to RISC-V-based general-purpose platforms. In our work, we pre…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 14 pages, 10 figures, 4 tables, IEEE Transactions on Circuits and Systems for Artificial Intelligence

    ACM Class: C.4; C.3; I.2

  15. arXiv:2405.19065

    cs.AR cs.LG

    xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems

    Authors: Georg Rutishauser, Joan Mihali, Moritz Scherer, Luca Benini

    Abstract: Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at IEEE ASAP 2024

  16. arXiv:2405.18030

    eess.SY cs.PF

    Modeling and Controlling Many-Core HPC Processors: an Alternative to PID and Moving Average Algorithms

    Authors: Giovanni Bambini, Alessandro Ottaviano, Christian Conficoni, Andrea Tilli, Luca Benini, Andrea Bartolini

    Abstract: The race towards performance increase and computing power has led to chips with heterogeneous and complex designs, integrating an ever-growing number of cores on the same monolithic chip or chiplet silicon die. Higher integration density, compounded with the slowdown of technology-driven power reduction, implies that power and thermal management become increasingly relevant. Unfortunately, existin…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Paper in Review

  17. arXiv:2405.14917

    cs.LG cs.CL

    SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

    Authors: Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Xianglong Liu, Luca Benini, Michele Magno, Xiaojuan Qi

    Abstract: Large language models (LLMs) achieve remarkable performance in natural language understanding but require substantial computation and memory resources. Post-training quantization (PTQ) is a powerful compression technique extensively investigated in LLMs. However, existing PTQ methods are still not ideal in terms of accuracy and efficiency, especially at bit-widths below 4. Standard PTQ methods u…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages

  18. TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios

    Authors: Yichao Zhang, Marco Bertuletti, Samuel Riedel, Matheus Cavalcante, Alessandro Vanelli-Coralli, Luca Benini

    Abstract: Radio Access Networks (RAN) workloads are rapidly scaling up in data processing intensity and throughput as the 5G (and beyond) standards grow in number of antennas and sub-carriers. Offering flexible Processing Elements (PEs), efficient memory access, and a productive parallel programming model, many-core clusters are a well-matched architecture for next-generation software-defined RANs, but stag…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures and 3 tables

  19. arXiv:2405.04257

    cs.AR

    Insights from Basilisk: Are Open-Source EDA Tools Ready for a Multi-Million-Gate, Linux-Booting RV64 SoC Design?

    Authors: Philippe Sauter, Thomas Benz, Paul Scheffler, Frank K. Gürkaynak, Luca Benini

    Abstract: Designing complex, multi-million-gate application-specific integrated circuits requires robust and mature electronic design automation (EDA) tools. We describe our efforts in enhancing the open-source Yosys+Openroad EDA flow to implement Basilisk, a fully open-source, Linux-booting RV64GC system-on-chip (SoC) design. We analyze the quality-of-results impact of our enhancements to synthesis tools,…

    Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures, submitted at IWLS 2024

  20. arXiv:2405.03523

    cs.AR

    Basilisk: Achieving Competitive Performance with Open EDA Tools on an Open-Source Linux-Capable RISC-V SoC

    Authors: Philippe Sauter, Thomas Benz, Paul Scheffler, Zerun Jiang, Beat Muheim, Frank K. Gürkaynak, Luca Benini

    Abstract: We introduce Basilisk, an optimized application-specific integrated circuit (ASIC) implementation and design flow building on the end-to-end open-source Iguana system-on-chip (SoC). We present enhancements to synthesis tools and logic optimization scripts improving quality of results (QoR), as well as an optimized physical design with an improved power grid and cell placement integration enabling…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 2 pages, 1 figure, accepted as a poster at the RISC-V Summit Europe 2024

  21. arXiv:2404.11488

    cs.CV cs.AI

    Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems

    Authors: Luca Bompani, Manuele Rusci, Daniele Palossi, Francesco Conti, Luca Benini

    Abstract: This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors. This method reduces the average compute load of an off-the-shelf Deep Neural Network (DNN) based object detector by up to 2.25$\times$ by alternating the processing of high-resolution images (320$\times$320 pixels) with multiple down-sized fr…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 figures. Accepted for publication at the Embedded Vision Workshop of the Computer Vision and Pattern Recognition conference, Seattle, 2024

    ACM Class: I.4

  22. arXiv:2404.05303

    cs.MS cs.AR

    SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers

    Authors: Paul Scheffler, Luca Colagrande, Luca Benini

    Abstract: Stencil codes are performance-critical in many compute-intensive applications, but suffer from significant address calculation and irregular memory access overheads. This work presents SARIS, a general and highly flexible methodology for stencil acceleration using register-mapped indirect streams. We demonstrate SARIS for various stencil codes on an eight-core RISC-V compute cluster with indirect…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 6 pages, 5 figures, 2 tables. Accepted at DAC 2024

  23. arXiv:2404.02945

    cs.LG cs.AI cs.DC cs.PF

    Optimizing the Deployment of Tiny Transformers on Low-Power MCUs

    Authors: Victor J. B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini

    Abstract: Transformer networks are rapidly becoming SotA in many fields, such as NLP and CV. Similarly to CNN, there is a strong push for deploying Transformer models at the extreme edge, ultimately fitting the tiny power budget and memory footprint of MCUs. However, the early approaches in this direction are mostly ad-hoc, platform, and model-specific. This work aims to enable and optimize the flexible, mu…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Pre-print manuscript submitted for review to the IEEE Transactions on Computers

  24. arXiv:2404.02944

    cs.LG cs.AI eess.SY

    Foundation Models for Structural Health Monitoring

    Authors: Luca Benfenati, Daniele Jahier Pagliari, Luca Zanatta, Yhorman Alexander Bedoya Velez, Andrea Acquaviva, Massimo Poncino, Enrico Macii, Luca Benini, Alessio Burrello

    Abstract: Structural Health Monitoring (SHM) is a critical task for ensuring the safety and reliability of civil infrastructures, typically realized on bridges and viaducts by means of vibration monitoring. In this paper, we propose for the first time the use of Transformer neural networks, with a Masked Auto-Encoder architecture, as Foundation Models for SHM. We demonstrate the ability of these models to l…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 16 pages, 4 tables, 9 figures

    ACM Class: I.2.1; I.2.3

  25. arXiv:2404.01908

    cs.AR cs.DC

    Optimizing Offload Performance in Heterogeneous MPSoCs

    Authors: Luca Colagrande, Luca Benini

    Abstract: Heterogeneous multi-core architectures combine a few "host" cores, optimized for single-thread performance, with many small energy-efficient "accelerator" cores for data-parallel processing, on a single chip. Offloading a computation to the many-core acceleration fabric introduces a communication and synchronization cost which reduces the speedup attainable on the accelerator, particularly for sma…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 2 pages, 1 figure. Accepted for publication in the DATE24 conference proceedings

  26. arXiv:2403.16696

    cs.RO eess.SY

    BatDeck: Advancing Nano-drone Navigation with Low-power Ultrasound-based Obstacle Avoidance

    Authors: Hanna Müller, Victor Kartsch, Michele Magno, Luca Benini

    Abstract: Nano-drones, distinguished by their agility, minimal weight, and cost-effectiveness, are particularly well-suited for exploration in confined, cluttered and narrow spaces. Recognizing transparent, highly reflective or absorbing materials, such as glass and metallic surfaces is challenging, as classical sensors, such as cameras or laser rangers, often do not detect them. Inspired by bats, which can…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2403.11661

    cs.RO eess.SY

    Combining Local and Global Perception for Autonomous Navigation on Nano-UAVs

    Authors: Lorenzo Lamberti, Georg Rutishauser, Francesco Conti, Luca Benini

    Abstract: A critical challenge in deploying unmanned aerial vehicles (UAVs) for autonomous tasks is their ability to navigate in an unknown environment. This paper introduces a novel vision-depth fusion approach for autonomous navigation on nano-UAVs. We combine the visual-based PULP-Dronet convolutional neural network for semantic information extraction, i.e., serving as the global perception, with 8x8px d…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 5 pages, 2 figures, 1 table, 1 video

  28. arXiv:2403.10549

    cs.SD cs.LG eess.AS

    On-Device Domain Learning for Keyword Spotting on Low-Power Extreme Edge Embedded Systems

    Authors: Cristian Cioflan, Lukas Cavigelli, Manuele Rusci, Miguel de Prado, Luca Benini

    Abstract: Keyword spotting accuracy degrades when neural networks are exposed to noisy environments. On-site adaptation to previously unseen noise is crucial to recovering accuracy loss, and on-device learning is required to ensure that the adaptation process happens entirely on the edge device. In this work, we propose a fully on-device domain adaptation system achieving up to 14% accuracy gains over alrea…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 5 pages, 2 tables, 2 figures. Accepted at IEEE AICAS 2024

  29. arXiv:2403.07851

    cs.LG cs.CV

    12 mJ per Class On-Device Online Few-Shot Class-Incremental Learning

    Authors: Yoga Esa Wibowo, Cristian Cioflan, Thorir Mar Ingolfsson, Michael Hersche, Leo Zhao, Abbas Rahimi, Luca Benini

    Abstract: Few-Shot Class-Incremental Learning (FSCIL) enables machine learning systems to expand their inference capabilities to new classes using only a few labeled examples, without forgetting the previously learned classes. Classical backpropagation-based learning and its variants are often unsuitable for battery-powered, memory-constrained systems at the extreme edge. In this work, we introduce Online F…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 6 pages, 4 tables, 3 figures. Accepted at IEEE DATE 2024

  30. arXiv:2403.07802

    cs.SD cs.LG eess.AS

    Boosting keyword spotting through on-device learnable user speech characteristics

    Authors: Cristian Cioflan, Lukas Cavigelli, Luca Benini

    Abstract: Keyword spotting systems for always-on TinyML-constrained applications require on-site tuning to boost the accuracy of offline trained classifiers when deployed in unseen inference conditions. Adapting to the speech peculiarities of target users requires many in-domain samples, often unavailable in real-world scenarios. Furthermore, current on-device learning techniques rely on computationally int…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 tables, 2 figures. Accepted as a full paper by the tinyML Research Symposium 2024

  31. arXiv:2402.13005

    eess.SP cs.LG

    SzCORE: A Seizure Community Open-source Research Evaluation framework for the validation of EEG-based automated seizure detection algorithms

    Authors: Jonathan Dan, Una Pale, Alireza Amirshahi, William Cappelletti, Thorir Mar Ingolfsson, Xiaying Wang, Andrea Cossettini, Adriano Bernini, Luca Benini, Sándor Beniczky, David Atienza, Philippe Ryvlin

    Abstract: The need for high-quality automated seizure detection algorithms based on electroencephalography (EEG) becomes ever more pressing with the increasing use of ambulatory and long-term EEG monitoring. Heterogeneity in validation methods of these algorithms influences the reported results and makes comprehensive evaluation and comparison challenging. This heterogeneity concerns in particular the choic…

    Submitted 8 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  32. arXiv:2402.12986

    cs.AR

    Enabling Efficient Hybrid Systolic Computation in Shared L1-Memory Manycore Clusters

    Authors: Sergio Mazzola, Samuel Riedel, Luca Benini

    Abstract: Systolic arrays and shared-L1-memory manycore clusters are commonly used architectural paradigms that offer different trade-offs to accelerate parallel workloads. While the first excel with regular dataflow at the cost of rigid architectures and complex programming models, the second are versatile and easy to program but require explicit dataflow management and synchronization. This work aims at e…

    Submitted 24 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  33. arXiv:2402.10748

    eess.SP cs.HC cs.LG

    A Tiny Transformer for Low-Power Arrhythmia Classification on Microcontrollers

    Authors: Paola Busia, Matteo Antonio Scrugli, Victor Jean-Baptiste Jung, Luca Benini, Paolo Meloni

    Abstract: Wearable systems for the continuous and real-time monitoring of cardiovascular diseases are becoming widespread and valuable assets in diagnosis and therapy. A promising approach for real-time analysis of the electrocardiographic (ECG) signal and the detection of heart conditions, such as arrhythmia, is represented by the transformer machine learning model. Transformers are powerful models for the…

    Submitted 21 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 2024 IEEE Transactions on Biomedical Circuits and Systems

  34. A Precision-Optimized Fixed-Point Near-Memory Digital Processing Unit for Analog In-Memory Computing

    Authors: Elena Ferro, Athanasios Vasilopoulos, Corey Lammie, Manuel Le Gallo, Luca Benini, Irem Boybat, Abu Sebastian

    Abstract: Analog In-Memory Computing (AIMC) is an emerging technology for fast and energy-efficient Deep Learning (DL) inference. However, a certain amount of digital post-processing is required to deal with circuit mismatches and non-idealities associated with the memory devices. Efficient near-memory digital logic is critical to retain the high area/energy efficiency and low latency of AIMC. Existing syst…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted at ISCAS2024

  35. arXiv:2401.16876

    cs.CV cs.LG

    Zero-shot Classification using Hyperdimensional Computing

    Authors: Samuele Ruffino, Geethan Karunaratne, Michael Hersche, Luca Benini, Abu Sebastian, Abbas Rahimi

    Abstract: Classification based on Zero-shot Learning (ZSL) is the ability of a model to classify inputs into novel classes on which the model has not previously seen any training examples. Providing an auxiliary descriptor in the form of a set of attributes describing the new classes involved in the ZSL-based classification is one of the favored approaches to solving this challenging task. In this work, ins…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: This is the extended version of a paper accepted in the Design, Automation, and Test in Europe Conference (DATE), 2024

  36. arXiv:2401.15639

    cs.AR

    TOP: Towards Open & Predictable Heterogeneous SoCs

    Authors: Luca Valente, Francesco Restuccia, Davide Rossi, Ryan Kastner, Luca Benini

    Abstract: Ensuring predictability in modern real-time Systems-on-Chip (SoCs) is an increasingly critical concern for many application domains such as automotive, robotics, and industrial automation. An effective approach involves the modeling and development of hardware components, such as interconnects and shared memory resources, to evaluate or enforce their deterministic behavior. Unfortunately, these IP…

    Submitted 7 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  37. arXiv:2401.09359

    cs.AR

    LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems through Polling-Free and Retry-Free Operation

    Authors: Samuel Riedel, Marc Gantenbein, Alessandro Ottaviano, Torsten Hoefler, Luca Benini

    Abstract: Extensive polling in shared-memory manycore systems can lead to contention, decreased throughput, and poor energy efficiency. Both lock implementations and the general-purpose atomic operation, load-reserved/store-conditional (LRSC), cause polling due to serialization and retries. To alleviate this overhead, we propose LRwait and SCwait, a synchronization pair that eliminates polling by allowing c…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 6 pages, 6 figures, 2 tables, accepted as a regular paper at DATE24

  38. arXiv:2401.04012

    cs.AR

    MX: Enhancing RISC-V's Vector ISA for Ultra-Low Overhead, Energy-Efficient Matrix Multiplication

    Authors: Matteo Perotti, Yichao Zhang, Matheus Cavalcante, Enis Mustafa, Luca Benini

    Abstract: Dense Matrix Multiplication (MatMul) is arguably one of the most ubiquitous compute-intensive kernels, spanning linear algebra, DSP, graphics, and machine learning applications. Thus, MatMul optimization is crucial not only in high-performance processors but also in embedded low-power platforms. Several Instruction Set Architectures (ISAs) have recently included matrix extensions to improve MatMul…

    Submitted 8 January, 2024; originally announced January 2024.

  39. A Heterogeneous RISC-V based SoC for Secure Nano-UAV Navigation

    Authors: Luca Valente, Alessandro Nadalini, Asif Veeran, Mattia Sinigaglia, Bruno Sa, Nils Wistoff, Yvan Tortorella, Simone Benatti, Rafail Psiakis, Ari Kulmala, Baker Mohammad, Sandro Pinto, Daniele Palossi, Luca Benini, Davide Rossi

    Abstract: The rapid advancement of energy-efficient parallel ultra-low-power (ULP) microcontroller units (MCUs) is enabling the development of autonomous nano-sized unmanned aerial vehicles (nano-UAVs). These sub-10cm drones represent the next generation of unobtrusive robotic helpers and ubiquitous smart sensors. However, nano-UAVs face significant power and payload constraints while requiring advanced comput…

    Submitted 7 January, 2024; originally announced January 2024.

  40. arXiv:2401.01826  [pdf, other]

    cs.PF cs.OS

    Data-Driven Power Modeling and Monitoring via Hardware Performance Counters Tracking

    Authors: Sergio Mazzola, Gabriele Ara, Thomas Benz, Björn Forsberg, Tommaso Cucinotta, Luca Benini

    Abstract: In the current high-performance and embedded computing era, full-stack energy-centric design is paramount. Use cases require increasingly high performance at an affordable power budget, often under real-time constraints. Extreme heterogeneity and parallelism address these issues but greatly complicate online power consumption assessment, which is essential for dynamic hardware and software stack a…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: 13 pages, 5 figures, submitted to the IEEE for possible publication

  41. arXiv:2312.14750  [pdf, other]

    cs.AR

    Siracusa: A 16 nm Heterogenous RISC-V SoC for Extended Reality with At-MRAM Neural Engine

    Authors: Arpan Suravi Prasad, Moritz Scherer, Francesco Conti, Davide Rossi, Alfio Di Mauro, Manuel Eggimann, Jorge Tómas Gómez, Ziyun Li, Syed Shakib Sarwar, Zhao Wang, Barbara De Salvo, Luca Benini

    Abstract: Extended reality (XR) applications are Machine Learning (ML)-intensive, featuring deep neural networks (DNNs) with millions of weights, tightly latency-bound (10-20 ms end-to-end), and power-constrained (low tens of mW average power). While ML performance and efficiency can be achieved by introducing neural engines within low-power systems-on-chip (SoCs), system-level power for nontrivial DNNs dep…

    Submitted 14 April, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Final accepted manuscript pre-print submitted to the IEEE Journal of Solid-State Circuits

  42. arXiv:2312.13086  [pdf, other]

    cs.RO eess.SY

    Multi-sensory Anti-collision Design for Autonomous Nano-swarm Exploration

    Authors: Mahyar Pourjabar, Manuele Rusci, Luca Bompani, Lorenzo Lamberti, Vlad Niculescu, Daniele Palossi, Luca Benini

    Abstract: This work presents a multi-sensory anti-collision system design to achieve robust autonomous exploration capabilities for a swarm of 10 cm-side nano-drones operating on object detection missions. We combine lightweight single-beam laser ranging to avoid proximity collisions with a long-range vision-based obstacle avoidance deep learning model (i.e., PULP-Dronet) and an ultra-wide-band (UWB) based…

    Submitted 20 December, 2023; originally announced December 2023.

  43. arXiv:2312.05605  [pdf, other]

    cs.LG cs.CV

    TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing

    Authors: Aleksandar Terzic, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi

    Abstract: MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(L \log L)$, with $L$ being the sequence length. We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computation…

    Submitted 9 December, 2023; originally announced December 2023.

  44. arXiv:2312.02829  [pdf, other]

    cs.LG cs.AI stat.ML

    MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition

    Authors: Nicolas Menet, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi

    Abstract: With the advent of deep learning, progressively larger neural networks have been designed to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of inference by exploiting computation in superposition. To reduce the computational burden per input, we propose Multiple-Input-Multiple-Output Neural Networks (MIMONets) capable of handling many inputs at once. MIMONet…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: accepted in NeurIPS 2023

  45. arXiv:2311.17815  [pdf, other]

    cs.AR cs.AI

    A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

    Authors: Fabrizio Ferrandi, Serena Curzel, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, Francesco Conti, Alessio Burrello, Francesco Barchi, Luca Benini, Luciano Lavagno, Teodoro Urso, Enrico Calore, Sebastiano Fabio Schifano, Cristian Zambelli, Maurizio Palesi, Giuseppe Ascia, Enrico Russo, Nicola Petra, Davide De Caro, Gennaro Di Meo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Francesco Silvestri, Paolo Palazzari, et al. (1 additional author not shown)

    Abstract: In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become more and more pressing to design heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, spanning…

    Submitted 29 November, 2023; originally announced November 2023.

  46. arXiv:2311.10378  [pdf, other]

    cs.AR

    Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV

    Authors: Chi Zhang, Paul Scheffler, Thomas Benz, Matteo Perotti, Luca Benini

    Abstract: Sparse matrix vector multiplication (SpMV) is central to numerous data-intensive applications, but requires streaming indirect memory accesses that severely degrade both processing and memory throughput in state-of-the-art architectures. Near-memory hardware units, decoupling indirect streams from processing elements, partially alleviate the bottleneck, but rely on low DRAM access granularity, whi…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: 6 pages, 6 figures. Submitted to DATE 2024

  47. arXiv:2311.10207  [pdf, other]

    cs.AR cs.CV cs.LG stat.ML

    Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication

    Authors: Jannis Schönleber, Lukas Cavigelli, Renzo Andri, Matteo Perotti, Luca Benini

    Abstract: From classical HPC to deep learning, MatMul is at the heart of today's computing. The recent Maddness method approximates MatMul without the need for multiplication by using a hash-based version of product quantization (PQ) indexing into a look-up table (LUT). Stella Nera is the first Maddness accelerator and it achieves 15x higher area efficiency (GMAC/s/mm^2) and more than 25x higher energy effi…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 6 pages, 7 figures, preprint under review

  48. arXiv:2311.09662  [pdf, other]

    cs.AR

    AXI-REALM: A Lightweight and Modular Interconnect Extension for Traffic Regulation and Monitoring of Heterogeneous Real-Time SoCs

    Authors: Thomas Benz, Alessandro Ottaviano, Robert Balas, Angelo Garofalo, Francesco Restuccia, Alessandro Biondi, Luca Benini

    Abstract: The increasing demand for heterogeneous functionality in the automotive industry and the evolution of chip manufacturing processes have led to the transition from federated to integrated critical real-time embedded systems (CRTESs). This leads to higher integration challenges of conventional timing predictability techniques due to access contention on shared resources, which can be resolved by pro…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 6 pages, 6 figures, accepted as a regular paper at DATE24

  49. arXiv:2311.09645  [pdf, other]

    cs.AR

    PELS: A Lightweight and Flexible Peripheral Event Linking System for Ultra-Low Power IoT Processors

    Authors: Alessandro Ottaviano, Robert Balas, Philippe Sauter, Manuel Eggimann, Luca Benini

    Abstract: A key challenge for ultra-low-power (ULP) devices is handling peripheral linking, where the main central processing unit (CPU) periodically mediates the interaction among multiple peripherals following wake-up events. Current solutions address this problem by either integrating event interconnects that route single-wire event lines among peripherals or by general-purpose I/O processors, with a str…

    Submitted 18 January, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: 6 pages, accepted at DATE24 conference, camera-ready version

  50. arXiv:2311.08320  [pdf, other]

    cs.AR

    CV32RT: Enabling Fast Interrupt and Context Switching for RISC-V Microcontrollers

    Authors: Robert Balas, Alessandro Ottaviano, Luca Benini

    Abstract: Processors using the open RISC-V ISA are finding increasing adoption in the embedded world. Many embedded use cases have real-time constraints and require flexible, predictable, and fast reactive handling of incoming events. However, RISC-V processors are still lagging in this area compared to more mature proprietary architectures, such as ARM Cortex-M and TriCore, which have been tuned for years…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 12 pages, submitted to IEEE Transactions on VLSI Systems (TVLSI)