-
Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey
Authors:
Milan Ganai,
Sicun Gao,
Sylvia Herbert
Abstract:
Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was limited to verifying lo…
▽ More
Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was limited to verifying low-dimensional dynamical systems -- this is because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. To address this limitation, in recent years, there have been methods that compute the reachability value function simultaneously with learning control policies to scale HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. In this survey paper, we review the recent developments in the field of HJ reachability estimation in reinforcement learning that would provide a foundational basis for further research into reliability in high-dimensional systems.
△ Less
Submitted 12 July, 2024;
originally announced July 2024.
-
Trustworthy Contrast-enhanced Brain MRI Synthesis
Authors:
Jiyao Liu,
Yuxin Li,
Shangqi Gao,
Yuncheng Zhou,
Xin Gao,
Ningsheng Xu,
Xiao-Yong Zhang,
Xiahai Zhuang
Abstract:
Contrast-enhanced brain MRI (CE-MRI) is a valuable diagnostic technique but may pose health risks and incur high costs. To create safer alternatives, multi-modality medical image translation aims to synthesize CE-MRI images from other available modalities. Although existing methods can generate promising predictions, they still face two challenges, i.e., exhibiting over-confidence and lacking inte…
▽ More
Contrast-enhanced brain MRI (CE-MRI) is a valuable diagnostic technique but may pose health risks and incur high costs. To create safer alternatives, multi-modality medical image translation aims to synthesize CE-MRI images from other available modalities. Although existing methods can generate promising predictions, they still face two challenges, i.e., exhibiting over-confidence and lacking interpretability on predictions. To address the above challenges, this paper introduces TrustI2I, a novel trustworthy method that reformulates multi-to-one medical image translation problem as a multimodal regression problem, aiming to build an uncertainty-aware and reliable system. Specifically, our method leverages deep evidential regression to estimate prediction uncertainties and employs an explicit intermediate and late fusion strategy based on the Mixture of Normal Inverse Gamma (MoNIG) distribution, enhancing both synthesis quality and interpretability. Additionally, we incorporate uncertainty calibration to improve the reliability of uncertainty. Validation on the BraTS2018 dataset demonstrates that our approach surpasses current methods, producing higher-quality images with rational uncertainty estimation.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Channel Modeling Aided Dataset Generation for AI-Enabled CSI Feedback: Advances, Challenges, and Solutions
Authors:
Yupeng Li,
Gang Li,
Zirui Wen,
Shuangfeng Han,
Shijian Gao,
Guangyi Liu,
Jiangzhou Wang
Abstract:
The AI-enabled autoencoder has demonstrated great potential in channel state information (CSI) feedback in frequency division duplex (FDD) multiple input multiple output (MIMO) systems. However, this method completely changes the existing feedback strategies, making it impractical to deploy in recent years. To address this issue, this paper proposes a channel modeling aided data augmentation metho…
▽ More
The AI-enabled autoencoder has demonstrated great potential in channel state information (CSI) feedback in frequency division duplex (FDD) multiple input multiple output (MIMO) systems. However, this method completely changes the existing feedback strategies, making it impractical to deploy in recent years. To address this issue, this paper proposes a channel modeling aided data augmentation method based on a limited number of field channel data. Specifically, the user equipment (UE) extracts the primary stochastic parameters of the field channel data and transmits them to the base station (BS). The BS then updates the typical TR 38.901 model parameters with the extracted parameters. In this way, the updated channel model is used to generate the dataset. This strategy comprehensively considers the dataset collection, model generalization, model monitoring, and so on. Simulations verify that our proposed strategy can significantly improve performance compared to the benchmarks.
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
LLM4CP: Adapting Large Language Models for Channel Prediction
Authors:
Boxun Liu,
Xuanyu Liu,
Shijian Gao,
Xiang Cheng,
Liuqing Yang
Abstract:
Channel prediction is an effective approach for reducing the feedback or estimation overhead in massive multi-input multi-output (m-MIMO) systems. However, existing channel prediction methods lack precision due to model mismatch errors or network generalization issues. Large language models (LLMs) have demonstrated powerful modeling and generalization abilities, and have been successfully applied…
▽ More
Channel prediction is an effective approach for reducing the feedback or estimation overhead in massive multi-input multi-output (m-MIMO) systems. However, existing channel prediction methods lack precision due to model mismatch errors or network generalization issues. Large language models (LLMs) have demonstrated powerful modeling and generalization abilities, and have been successfully applied to cross-modal tasks, including the time series analysis. Leveraging the expressive power of LLMs, we propose a pre-trained LLM-empowered channel prediction method (LLM4CP) to predict the future downlink channel state information (CSI) sequence based on the historical uplink CSI sequence. We fine-tune the network while freezing most of the parameters of the pre-trained LLM for better cross-modality knowledge transfer. To bridge the gap between the channel data and the feature space of the LLM, preprocessor, embedding, and output modules are specifically tailored by taking into account unique channel characteristics. Simulations validate that the proposed method achieves SOTA prediction performance on full-sample, few-shot, and generalization tests with low training and inference costs.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Self-reconfigurable Multifunctional Memristive Nociceptor for Intelligent Robotics
Authors:
Shengbo Wang,
Mingchao Fang,
Lekai Song,
Cong Li,
Jian Zhang,
Arokia Nathan,
Guohua Hu,
Shuo Gao
Abstract:
Artificial nociceptors, mimicking human-like stimuli perception, are of significance for intelligent robotics to work in hazardous and dynamic scenarios. One of the most essential characteristics of the human nociceptor is its self-adjustable attribute, which indicates that the threshold of determination of a potentially hazardous stimulus relies on environmental knowledge. This critical attribute…
▽ More
Artificial nociceptors, mimicking human-like stimuli perception, are of significance for intelligent robotics to work in hazardous and dynamic scenarios. One of the most essential characteristics of the human nociceptor is its self-adjustable attribute, which indicates that the threshold of determination of a potentially hazardous stimulus relies on environmental knowledge. This critical attribute has been currently omitted, but it is highly desired for artificial nociceptors. Inspired by these shortcomings, this article presents, for the first time, a Self-Directed Channel (SDC) memristor-based self-reconfigurable nociceptor, capable of perceiving hazardous pressure stimuli under different temperatures and demonstrates key features of tactile nociceptors, including 'threshold,' 'no-adaptation,' and 'sensitization.' The maximum amplification of hazardous external stimuli is 1000%, and its response characteristics dynamically adapt to current temperature conditions by automatically altering the generated modulation schemes for the memristor. The maximum difference ratio of the response of memristors at different temperatures is 500%, and this adaptability closely mimics the functions of biological tactile nociceptors, resulting in accurate danger perception in various conditions. Beyond temperature adaptation, this memristor-based nociceptor has the potential to integrate different sensory modalities by applying various sensors, thereby achieving human-like perception capabilities in real-world environments.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…
▽ More
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions 2D-vs-3D. A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient 3D methods for uncertainty quantification methods in 3D segmentation tasks.
△ Less
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
Doubly-Dynamic ISAC Precoding for Vehicular Networks: A Constrained Deep Reinforcement Learning (CDRL) Approach
Authors:
Zonghui Yang,
Shijian Gao,
Xiang Cheng
Abstract:
Integrated sensing and communication (ISAC) technology is essential for enabling the vehicular networks. However, the communication channel in this scenario exhibits time-varying characteristics, and the potential targets may move rapidly, creating a doubly-dynamic phenomenon. This nature poses a challenge for real-time precoder design. While optimization-based solutions are widely researched, the…
▽ More
Integrated sensing and communication (ISAC) technology is essential for enabling the vehicular networks. However, the communication channel in this scenario exhibits time-varying characteristics, and the potential targets may move rapidly, creating a doubly-dynamic phenomenon. This nature poses a challenge for real-time precoder design. While optimization-based solutions are widely researched, they are complex and heavily rely on perfect prior information, which is impractical in double dynamics. To address this challenge, we propose using constrained deep reinforcement learning (CDRL) to facilitate dynamic updates to the ISAC precoder design. Additionally, the primal dual-deep deterministic policy gradient (PD-DDPG) and Wolpertinger architecture are tailored to efficiently train the algorithm under complex constraints and variable numbers of users. The proposed scheme not only adapts to the dynamics based on observations but also leverages environmental information to enhance performance and reduce complexity. Its superiority over existing candidates has been validated through experiments.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Beam Pattern Modulation Embedded Hybrid Transceiver Optimization for Integrated Sensing and Communication
Authors:
Boxun Liu,
Shijian Gao,
Zonghui Yang,
Xiang Cheng,
Liuqing Yang
Abstract:
Integrated sensing and communication (ISAC) emerges as a promising technology for B5G/6G, particularly in the millimeter-wave (mmWave) band. However, the widely utilized hybrid architecture in mmWave systems compromises multiplexing gain due to the constraints of limited radio frequency chains. Moreover, additional sensing functionalities exacerbate the impairment of spectrum efficiency (SE). In t…
▽ More
Integrated sensing and communication (ISAC) emerges as a promising technology for B5G/6G, particularly in the millimeter-wave (mmWave) band. However, the widely utilized hybrid architecture in mmWave systems compromises multiplexing gain due to the constraints of limited radio frequency chains. Moreover, additional sensing functionalities exacerbate the impairment of spectrum efficiency (SE). In this paper, we present an optimized beam pattern modulation-embedded ISAC (BPM-ISAC) transceiver design, which spares one RF chain for sensing and the others for communication. To compensate for the reduced SE, index modulation across communication beams is applied. We formulate an optimization problem aimed at minimizing the mean squared error (MSE) of the sensing beampattern, subject to a symbol MSE constraint. This problem is then solved by sequentially optimizing the analog and digital parts. Both the multi-aperture structure (MAS) and the multi-beam structure (MBS) are considered for the design of the analog part. We conduct theoretical analysis on the asymptotic pairwise error probability (APEP) and the Cramér-Rao bound (CRB) of direction of arrival (DoA) estimation. Numerical simulations validate the overall enhanced ISAC performance over existing alternatives.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Design and Implementation of mmWave Surface Wave Enabled Fluid Antennas and Experimental Results for Fluid Antenna Multiple Access
Authors:
Yuanjun Shen,
Boyi Tang,
Shuai Gao,
Kin-Fai Tong,
Hang Wong,
Kai-Kit Wong,
Yangyang Zhang
Abstract:
While multiple-input multiple-output (MIMO) technologies continue to advance, concerns arise as to how MIMO can remain scalable if more users are to be accommodated with an increasing number of antennas at the base station (BS) in the upcoming sixth generation (6G). Recently, the concept of fluid antenna system (FAS) has emerged, which promotes position flexibility to enable transmitter channel st…
▽ More
While multiple-input multiple-output (MIMO) technologies continue to advance, concerns arise as to how MIMO can remain scalable if more users are to be accommodated with an increasing number of antennas at the base station (BS) in the upcoming sixth generation (6G). Recently, the concept of fluid antenna system (FAS) has emerged, which promotes position flexibility to enable transmitter channel state information (CSI) free spatial multiple access on one radio frequency (RF) chain. On the theoretical side, the fluid antenna multiple access (FAMA) approach offers a scalable alternative to massive MIMO spatial multiplexing. However, FAMA lacks experimental validation and the hardware implementation of FAS remains a mysterious approach. The aim of this paper is to provide a novel hardware design for FAS and evaluate the performance of FAMA using experimental data. Our FAS design is based on a dynamically reconfigurable "fluid" radiator which is capable of adjusting its position within a predefined space. One single-channel fluid antenna (SCFA) and one double-channel fluid antenna (DCFA) are designed, electromagnetically simulated, fabricated, and measured. The measured radiation patterns of prototypes are imported into channel and network models for evaluating their performance in FAMA. The experimental results demonstrate that in the 5G millimeter-wave (mmWave) bands (24-30 GHz), the FAS prototypes can vary their gain up to an averaged value of 11 dBi. In the case of 4-user FAMA, the double-channel FAS can significantly reduce outage probability by 57% and increases the multiplexing gain to 2.27 when compared to a static omnidirectional antenna.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens
Authors:
Shaohua Gao,
Qi Jiang,
Yiqi Liao,
Yi Qiu,
Wanglei Ying,
Kailun Yang,
Kaiwei Wang,
Benhao Zhang,
Jian Bai
Abstract:
We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360°x(35°~110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 len…
▽ More
We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360°x(35°~110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 lenses. Moreover, we establish a physical structure model of PAL using the ray tracing method and study the influence of its physical parameters on compactness ratio. In addition, for the evaluation of local tolerances of annular surfaces, we propose a tolerance analysis method suitable for ASPAL. This analytical method can effectively analyze surface irregularities on annular surfaces and provide clear guidance on manufacturing tolerances for ASPAL. Benefiting from high-precision glass molding and injection molding aspheric lens manufacturing techniques, we finally manufactured 20 ASPALs in small batches. The weight of an ASPAL prototype is only 8.5 g. Our framework provides promising insights for the application of panoramic systems in space and weight-constrained environmental sensing scenarios such as intelligent security, micro-UAVs, and micro-robots.
△ Less
Submitted 5 May, 2024;
originally announced May 2024.
-
Global Search Optics: Automatically Exploring Optimal Solutions to Compact Computational Imaging Systems
Authors:
Yao Gao,
Qi Jiang,
Shaohua Gao,
Lei Sun,
Kailun Yang,
Kaiwei Wang
Abstract:
The popularity of mobile vision creates a demand for advanced compact computational imaging systems, which call for the development of both a lightweight optical system and an effective image reconstruction model. Recently, joint design pipelines come to the research forefront, where the two significant components are simultaneously optimized via data-driven learning to realize the optimal system…
▽ More
The popularity of mobile vision creates a demand for advanced compact computational imaging systems, which call for the development of both a lightweight optical system and an effective image reconstruction model. Recently, joint design pipelines come to the research forefront, where the two significant components are simultaneously optimized via data-driven learning to realize the optimal system design. However, the effectiveness of these designs largely depends on the initial setup of the optical system, complicated by a non-convex solution space that impedes reaching a globally optimal solution. In this work, we present Global Search Optics (GSO) to automatically design compact computational imaging systems through two parts: (i) Fused Optimization Method for Automatic Optical Design (OptiFusion), which searches for diverse initial optical systems under certain design specifications; and (ii) Efficient Physic-aware Joint Optimization (EPJO), which conducts parallel joint optimization of initial optical systems and image reconstruction networks with the consideration of physical constraints, culminating in the selection of the optimal solution. Extensive experimental results on the design of three-piece (3P) sphere computational imaging systems illustrate that the GSO serves as a transformative end-to-end lens design paradigm for superior global optimal structure searching ability, which provides compact computational imaging systems with higher imaging quality compared to traditional methods. The source code will be made publicly available at https://github.com/wumengshenyou/GSO.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey
Authors:
Marcos V. Conde,
Zhijun Lei,
Wen Li,
Cosmin Stejerean,
Ioannis Katsavounidis,
Radu Timofte,
Kihwan Yoon,
Ganzorig Gankhuyag,
Jiangtao Lv,
Long Sun,
Jinshan Pan,
Jiangxin Dong,
Jinhui Tang,
Zhiyuan Li,
Hao Wei,
Chenyang Ge,
Dongyang Zhang,
Tianle Liu,
Huaian Chen,
Yi Jin,
Menghan Zhou,
Yiqiang Yan,
Si Gao,
Biao Wu,
Shaoli Liu
, et al. (50 additional authors not shown)
Abstract:
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…
▽ More
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning
Authors:
Yong Liu,
Mengtian Kang,
Shuo Gao,
Chi Zhang,
Ying Liu,
Shiming Li,
Yue Qi,
Arokia Nathan,
Wenjun Xu,
Chenyu Tang,
Edoardo Occhipinti,
Mayinuer Yusufu,
Ningli Wang,
Weiling Bai,
Luigi Occhipinti
Abstract:
Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To addres…
▽ More
Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To address this dilemma, we propose a general self-supervised machine learning framework that can handle diverse fundus diseases from unlabeled fundus images. Our method's AUC surpasses existing supervised approaches by 15.7%, and even exceeds performance of a single human expert. Furthermore, our model adapts well to various datasets from different regions, races, and heterogeneous image sources or qualities from multiple cameras or devices. Our method offers a label-free general framework to diagnose fundus diseases, which could potentially benefit telehealth programs for early screening of people at risk of vision loss.
△ Less
Submitted 23 April, 2024; v1 submitted 20 April, 2024;
originally announced April 2024.
-
SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus Images
Authors:
Jiaqi Wang,
Mengtian Kang,
Yong Liu,
Chi Zhang,
Ying Liu,
Shiming Li,
Yue Qi,
Wenjun Xu,
Chenyu Tang,
Edoardo Occhipinti,
Mayinuer Yusufu,
Ningli Wang,
Weiling Bai,
Shuo Gao,
Luigi G. Occhipinti
Abstract:
Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results. However, current methods are commonly based on supervised methods, bringing in a heavy workload to biomedical staff and hence suffering in expanding effective databases. To address this issue, in this artic…
▽ More
Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results. However, current methods are commonly based on supervised methods, bringing in a heavy workload to biomedical staff and hence suffering in expanding effective databases. To address this issue, in this article, we established a label-free method, name 'SSVT',which can automatically analyze un-labeled fundus images and generate high evaluation accuracy of 97.0% of four main eye diseases based on six public datasets and two datasets collected by Beijing Tongren Hospital. The promising results showcased the effectiveness of the proposed unsupervised learning method, and the strong application potential in biomedical resource shortage regions to improve global eye health.
△ Less
Submitted 20 April, 2024;
originally announced April 2024.
-
A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge
Authors:
Ezequiel de la Rosa,
Mauricio Reyes,
Sook-Lei Liew,
Alexandre Hutton,
Roland Wiest,
Johannes Kaesmacher,
Uta Hanning,
Arsany Hakim,
Richard Zubal,
Waldo Valenzuela,
David Robben,
Diana M. Sima,
Vincenzo Anania,
Arne Brys,
James A. Meakin,
Anne Mickan,
Gabriel Broocks,
Christian Heitkamp,
Shengbo Gao,
Kongming Liang,
Ziji Zhang,
Md Mahfuzur Rahman Siddiquee,
Andriy Myronenko,
Pooya Ashtari,
Sabine Van Huffel
, et al. (33 additional authors not shown)
Abstract:
Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemi…
▽ More
Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemic stroke from various medical centers, facilitating the development of a wide range of cutting-edge segmentation algorithms by the research community. Through collaboration with leading teams, we combined top-performing algorithms into an ensemble model that overcomes the limitations of individual solutions. Our ensemble model achieved superior ischemic lesion detection and segmentation accuracy on our internal test set compared to individual algorithms. This accuracy generalized well across diverse image and disease variables. Furthermore, the model excelled in extracting clinical biomarkers. Notably, in a Turing-like test, neuroradiologists consistently preferred the algorithm's segmentations over manual expert efforts, highlighting increased comprehensiveness and precision. Validation using a real-world external dataset (N=1686) confirmed the model's generalizability. The algorithm's outputs also demonstrated strong correlations with clinical scores (admission NIHSS and 90-day mRS) on par with or exceeding expert-derived results, underlining its clinical relevance. This study offers two key findings. First, we present an ensemble algorithm (https://github.com/Tabrisrei/ISLES22_Ensemble) that detects and segments ischemic stroke lesions on DWI across diverse scenarios on par with expert (neuro)radiologists. Second, we show the potential for biomedical challenge outputs to extend beyond the challenge's initial objectives, demonstrating their real-world clinical applicability.
△ Less
Submitted 3 April, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
On moment relaxations for linear state feedback controller synthesis with non-convex quadratic costs and constraints
Authors:
Dennis Gramlich,
Sheng Gao,
Hao Zhang,
Carsten W. Scherer,
Christian Ebenbauer
Abstract:
We present a simple and effective way to account for non-convex costs and constraints~in~state feedback synthesis, and an interpretation for the variables in which state feedback synthesis is typically convex. We achieve this by deriving the controller design using moment matrices of state and input. It turns out that this approach allows the consideration of non-convex constraints by relaxing the…
▽ More
We present a simple and effective way to account for non-convex costs and constraints~in~state feedback synthesis, and an interpretation for the variables in which state feedback synthesis is typically convex. We achieve this by deriving the controller design using moment matrices of state and input. It turns out that this approach allows the consideration of non-convex constraints by relaxing them as expectation constraints, and that the variables in which state feedback synthesis is typically convexified can be identified with blocks of these moment matrices.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Superposed IM-OFDM (S-IM-OFDM): An Enhanced OFDM for Integrated Sensing and Communications
Authors:
Zonghui Yang,
Shijian Gao,
Xiang Cheng,
Liuqing Yang
Abstract:
Integrated sensing and communications (ISAC) is a critical enabler for emerging 6G applications, and at its core lies in the dual-functional waveform design. While orthogonal frequency division multiplexing (OFDM) has been a popular basic waveform, its primitive version falls short in sensing due to the inherent unregulated auto-correlation properties. Furthermore, the sensitivity to Doppler shift…
▽ More
Integrated sensing and communications (ISAC) is a critical enabler for emerging 6G applications, and at its core lies in the dual-functional waveform design. While orthogonal frequency division multiplexing (OFDM) has been a popular basic waveform, its primitive version falls short in sensing due to the inherent unregulated auto-correlation properties. Furthermore, the sensitivity to Doppler shift hinders its broader applications in dynamic scenarios. To address these issues, we propose a superposed index-modulated OFDM (S-IM-OFDM). The proposed scheme improves the sensing performance without excess power consumption by translating the energy efficiency of IM-OFDM onto sensing-oriented signals over OFDM. Also, it maintains excellent communication performance in time-varying channels by leveraging the sensed parameters to compensate for Doppler. Compared to conventional OFDM, the proposed S-IM-OFDM waveform exhibits better sensing capabilities and wider applicability in dynamic scenarios. Both theoretical analyses and simulations corroborate its dual benefits.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
Beam Pattern Modulation Embedded mmWave Hybrid Transceiver Design Towards ISAC
Authors:
Boxun Liu,
Shijian Gao,
Zonghui Yang,
Xiang Cheng
Abstract:
Integrated Sensing and Communication (ISAC) emerges as a promising technology for B5G/6G, particularly in the millimeter-wave (mmWave) band. However, the widespread adoption of hybrid architecture in mmWave systems compromises multiplexing gain due to limited radio-frequency chains, resulting in mediocre performance when embedding sensing functionality. To avoid sacrificing the spectrum efficiency…
▽ More
Integrated Sensing and Communication (ISAC) emerges as a promising technology for B5G/6G, particularly in the millimeter-wave (mmWave) band. However, the widespread adoption of hybrid architecture in mmWave systems compromises multiplexing gain due to limited radio-frequency chains, resulting in mediocre performance when embedding sensing functionality. To avoid sacrificing the spectrum efficiency in hybrid structures while addressing performance bottlenecks in its extension to ISAC, we present an optimized beam pattern modulation-embedded ISAC (BPM-ISAC). BPM-ISAC applies index modulation over beamspace by selectively activating communication beams, aiming to minimize sensing beampattern mean squared error (MSE) under communication MSE constraints through dedicated hybrid transceiver design. Optimization involves the analog part through a min-MSE-based beam selection algorithm, followed by the digital part using an alternating optimization algorithm. Convergence and asymptotic pairwise error probability (APEP) analyses accompany numerical simulations, validating its overall enhanced ISAC performance over existing alternatives.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
Real-World Computational Aberration Correction via Quantized Domain-Mixing Representation
Authors:
Qi Jiang,
Zhonghua Yi,
Shaohua Gao,
Yao Gao,
Xiaolong Qian,
Hao Shi,
Lei Sun,
Zhijie Xu,
Kailun Yang,
Kaiwei Wang
Abstract:
Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervi…
▽ More
Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervised Domain Adaptation (UDA). By incorporating readily accessible unpaired real-world data into training, we formalize the Domain Adaptive CAC (DACAC) task, and then introduce a comprehensive Real-world aberrated images (Realab) dataset to benchmark it. The setup task presents a formidable challenge due to the intricacy of understanding the target aberration domain. To this intent, we propose a novel Quntized Domain-Mixing Representation (QDMR) framework as a potent solution to the issue. QDMR adapts the CAC model to the target domain from three key aspects: (1) reconstructing aberrated images of both domains by a VQGAN to learn a Domain-Mixing Codebook (DMC) which characterizes the degradation-aware priors; (2) modulating the deep features in CAC model with DMC to transfer the target domain knowledge; and (3) leveraging the trained VQGAN to generate pseudo target aberrated images from the source ones for convincing target domain supervision. Extensive experiments on both synthetic and real-world benchmarks reveal that the models with QDMR consistently surpass the competitive methods in mitigating the synthetic-to-real gap, which produces visually pleasant real-world CAC results with fewer artifacts. Codes and datasets will be made publicly available.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
Content-aware Masked Image Modeling Transformer for Stereo Image Compression
Authors:
Xinjie Zhang,
Shenyuan Gao,
Zhening Liu,
Jiawei Shao,
Xingtong Ge,
Dailan He,
Tongda Xu,
Yan Wang,
Jun Zhang
Abstract:
Existing learning-based stereo image codec adopt sophisticated transformation with simple entropy models derived from single image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compressi…
▽ More
Existing learning-based stereo image codec adopt sophisticated transformation with simple entropy models derived from single image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compression framework, named CAMSIC. CAMSIC independently transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model to capture both spatial and disparity dependencies, by introducing a novel content-aware masked image modeling (MIM) technique. Our content-aware MIM facilitates efficient bidirectional interaction between prior information and estimated tokens, which naturally obviates the need for an extra Transformer decoder. Experiments show that our stereo image codec achieves state-of-the-art rate-distortion performance on two stereo image datasets Cityscapes and InStereo2K with fast encoding and decoding speed.
△ Less
Submitted 19 March, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information
Authors:
Qiaochu Huang,
Xu He,
Boshi Tang,
Haolin Zhuang,
Liyang Chen,
Shuochen Gao,
Zhiyong Wu,
Haozhi Huang,
Helen Meng
Abstract:
Dance generation, as a branch of human motion generation, has attracted increasing attention. Recently, a few works attempt to enhance dance expressiveness, which includes genre matching, beat alignment, and dance dynamics, from certain aspects. However, the enhancement is quite limited as they lack comprehensive consideration of the aforementioned three factors. In this paper, we propose Expressi…
▽ More
Dance generation, as a branch of human motion generation, has attracted increasing attention. Recently, a few works attempt to enhance dance expressiveness, which includes genre matching, beat alignment, and dance dynamics, from certain aspects. However, the enhancement is quite limited as they lack comprehensive consideration of the aforementioned three factors. In this paper, we propose ExpressiveBailando, a novel dance generation method designed to generate expressive dances, concurrently taking all three factors into account. Specifically, we mitigate the issue of speed homogenization by incorporating frequency information into VQ-VAE, thus improving dance dynamics. Additionally, we integrate music style information by extracting genre- and beat-related features with a pre-trained music model, hence achieving improvements in the other two factors. Extensive experimental results demonstrate that our proposed method can generate dances with high expressiveness and outperforms existing methods both qualitatively and quantitatively.
△ Less
Submitted 9 March, 2024;
originally announced March 2024.
-
Lightweight, error-tolerant edge detection using memristor-enabled stochastic logics
Authors:
Lekai Song,
Pengyu Liu,
Jingfang Pei,
Yang Liu,
Songwei Liu,
Shengbo Wang,
Leonard W. T. Ng,
Tawfique Hasan,
Kong-Pang Pun,
Shuo Gao,
Guohua Hu
Abstract:
The demand for efficient edge vision has spurred the interest in developing stochastic computing approaches for performing image processing tasks. Memristors with inherent stochasticity readily introduce probability into the computations and thus enable stochastic image processing computations. Here, we present a stochastic computing approach for edge detection, a fundamental image processing tech…
▽ More
The demand for efficient edge vision has spurred the interest in developing stochastic computing approaches for performing image processing tasks. Memristors with inherent stochasticity readily introduce probability into the computations and thus enable stochastic image processing computations. Here, we present a stochastic computing approach for edge detection, a fundamental image processing technique, facilitated with memristor-enabled stochastic logics. Specifically, we integrate the memristors with logic circuits and harness the stochasticity from the memristors to realize compact stochastic logics for stochastic number encoding and processing. The stochastic numbers, exhibiting well-regulated probabilities and correlations, can be processed to perform logic operations with statistical probabilities. This can facilitate lightweight stochastic edge detection for edge visual scenarios characterized with high-level noise errors. As a practical demonstration, we implement a hardware stochastic Roberts cross operator using the stochastic logics, and prove its exceptional edge detection performance, remarkably, with 95% less computational cost while withstanding 50% bit-flip errors. The results underscore the great potential of our stochastic edge detection approach in developing lightweight, error-tolerant edge vision hardware and systems for autonomous driving, virtual/augmented reality, medical imaging diagnosis, industrial automation, and beyond.
△ Less
Submitted 20 March, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results
Authors:
Kelly Payette,
Céline Steger,
Roxane Licandro,
Priscille de Dumast,
Hongwei Bran Li,
Matthew Barkovich,
Liu Li,
Maik Dannecker,
Chen Chen,
Cheng Ouyang,
Niccolò McConnell,
Alina Miron,
Yongmin Li,
Alena Uus,
Irina Grigorescu,
Paula Ramirez Gilliland,
Md Mahfuzur Rahman Siddiquee,
Daguang Xu,
Andriy Myronenko,
Haoyu Wang,
Ziyan Huang,
Jin Ye,
Mireia Alenyà,
Valentin Comte,
Oscar Camara
, et al. (42 additional authors not shown)
Abstract:
Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, and the generalizability of algorithms across dif…
▽ More
Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, and the generalizability of algorithms across different imaging centers remains unsolved, limiting real-world clinical applicability. The multi-center FeTA Challenge 2022 focuses on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two imaging centers as well as two additional unseen centers. The data from different centers varied in many aspects, including scanners used, imaging parameters, and fetal brain super-resolution algorithms applied. 16 teams participated in the challenge, and 17 algorithms were evaluated. Here, a detailed overview and analysis of the challenge results are provided, focusing on the generalizability of the submissions. Both in- and out of domain, the white matter and ventricles were segmented with the highest accuracy, while the most challenging structure remains the cerebral cortex due to anatomical complexity. The FeTA Challenge 2022 was able to successfully evaluate and advance generalizability of multi-class fetal brain tissue segmentation algorithms for MRI and it continues to benchmark new algorithms. The resulting new methods contribute to improving the analysis of brain development in utero.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Lightweight Deep Learning Based Channel Estimation for Extremely Large-Scale Massive MIMO Systems
Authors:
Shen Gao,
Peihao Dong,
Zhiwen Pan,
Xiaohu You
Abstract:
Extremely large-scale massive multiple-input multiple-output (XL-MIMO) systems introduce the much higher channel dimensionality and incur the additional near-field propagation effect, aggravating the computation load and the difficulty to acquire the prior knowledge for channel estimation. In this article, an XL-MIMO channel network (XLCNet) is developed to estimate the high-dimensional channel, w…
▽ More
Extremely large-scale massive multiple-input multiple-output (XL-MIMO) systems introduce the much higher channel dimensionality and incur the additional near-field propagation effect, aggravating the computation load and the difficulty to acquire the prior knowledge for channel estimation. In this article, an XL-MIMO channel network (XLCNet) is developed to estimate the high-dimensional channel, which is a universal solution for both the near-field users and far-field users with different channel statistics. Furthermore, a compressed XLCNet (C-XLCNet) is designed via weight pruning and quantization to accelerate the model inference as well as to facilitate the model storage and transmission. Simulation results show the performance superiority and universality of XLCNet. Compared to XLCNet, C-XLCNet incurs the limited performance loss while reducing the computational complexity and model size by about $10 \times$ and $36 \times$, respectively.
△ Less
Submitted 13 February, 2024;
originally announced February 2024.
-
A Robust Super-resolution Gridless Imaging Framework for UAV-borne SAR Tomography
Authors:
Silin Gao,
Wenlong Wang,
Muhan Wang,
Zhe Zhang,
Zai Yang,
Xiaolan Qiu,
Bingchen Zhang,
Yirong Wu
Abstract:
Synthetic aperture radar (SAR) tomography (TomoSAR) retrieves three-dimensional (3-D) information from multiple SAR images, effectively addresses the layover problem, and has become pivotal in urban mapping. Unmanned aerial vehicle (UAV) has gained popularity as a TomoSAR platform, offering distinct advantages such as the ability to achieve 3-D imaging in a single flight, cost-effectiveness, rapid…
▽ More
Synthetic aperture radar (SAR) tomography (TomoSAR) retrieves three-dimensional (3-D) information from multiple SAR images, effectively addresses the layover problem, and has become pivotal in urban mapping. Unmanned aerial vehicle (UAV) has gained popularity as a TomoSAR platform, offering distinct advantages such as the ability to achieve 3-D imaging in a single flight, cost-effectiveness, rapid deployment, and flexible trajectory planning. The evolution of compressed sensing (CS) has led to the widespread adoption of sparse reconstruction techniques in TomoSAR signal processing, with a focus on $\ell _1$ norm regularization and other grid-based CS methods. However, the discretization of illuminated scene along elevation introduces modeling errors, resulting in reduced reconstruction accuracy, known as the "off-grid" effect. Recent advancements have introduced gridless CS algorithms to mitigate this issue. This paper presents an innovative gridless 3-D imaging framework tailored for UAV-borne TomoSAR. Capitalizing on the pulse repetition frequency (PRF) redundancy inherent in slow UAV platforms, a multiple measurement vectors (MMV) model is constructed to enhance noise immunity without compromising azimuth-range resolution. Given the sparsely placed array elements due to mounting platform constraints, an atomic norm soft thresholding algorithm is proposed for partially observed MMV, offering gridless reconstruction capability and super-resolution. An efficient alternative optimization algorithm is also employed to enhance computational efficiency. Validation of the proposed framework is achieved through computer simulations and flight experiments, affirming its efficacy in UAV-borne TomoSAR applications.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Compressing Image-to-Image Translation GANs Using Local Density Structures on Their Learned Manifold
Authors:
Alireza Ganjdanesh,
Shangqian Gao,
Hirad Alipanah,
Heng Huang
Abstract:
Generative Adversarial Networks (GANs) have shown remarkable success in modeling complex data distributions for image-to-image translation. Still, their high computational demands prohibit their deployment in practical scenarios like edge devices. Existing GAN compression methods mainly rely on knowledge distillation or convolutional classifiers' pruning techniques. Thus, they neglect the critical…
▽ More
Generative Adversarial Networks (GANs) have shown remarkable success in modeling complex data distributions for image-to-image translation. Still, their high computational demands prohibit their deployment in practical scenarios like edge devices. Existing GAN compression methods mainly rely on knowledge distillation or convolutional classifiers' pruning techniques. Thus, they neglect the critical characteristic of GANs: their local density structure over their learned manifold. Accordingly, we approach GAN compression from a new perspective by explicitly encouraging the pruned model to preserve the density structure of the original parameter-heavy model on its learned manifold. We facilitate this objective for the pruned model by partitioning the learned manifold of the original generator into local neighborhoods around its generated samples. Then, we propose a novel pruning objective to regularize the pruned model to preserve the local density structure over each neighborhood, resembling the kernel density estimation method. Also, we develop a collaborative pruning scheme in which the discriminator and generator are pruned by two pruning agents. We design the agents to capture interactions between the generator and discriminator by exchanging their peer's feedback when determining corresponding models' architectures. Thanks to such a design, our pruning method can efficiently find performant sub-networks and can maintain the balance between the generator and discriminator more effectively compared to baselines during pruning, thereby showing more stable pruning dynamics. Our experiments on image translation GAN models, Pix2Pix and CycleGAN, with various benchmark datasets and architectures demonstrate our method's effectiveness.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Ultrasensitive Textile Strain Sensors Redefine Wearable Silent Speech Interfaces with High Machine Learning Efficiency
Authors:
Chenyu Tang,
Muzi Xu,
Wentian Yi,
Zibo Zhang,
Edoardo Occhipinti,
Chaoqun Dong,
Dafydd Ravenscroft,
Sung-Min Jung,
Sanghyo Lee,
Shuo Gao,
Jong Min Kim,
Luigi G. Occhipinti
Abstract:
Our research presents a wearable Silent Speech Interface (SSI) technology that excels in device comfort, time-energy efficiency, and speech decoding accuracy for real-world use. We developed a biocompatible, durable textile choker with an embedded graphene-based strain sensor, capable of accurately detecting subtle throat movements. This sensor, surpassing other strain sensors in sensitivity by 42…
▽ More
Our research presents a wearable Silent Speech Interface (SSI) technology that excels in device comfort, time-energy efficiency, and speech decoding accuracy for real-world use. We developed a biocompatible, durable textile choker with an embedded graphene-based strain sensor, capable of accurately detecting subtle throat movements. This sensor, surpassing other strain sensors in sensitivity by 420%, simplifies signal processing compared to traditional voice recognition methods. Our system uses a computationally efficient neural network, specifically a one-dimensional convolutional neural network with residual structures, to decode speech signals. This network is energy and time-efficient, reducing computational load by 90% while achieving 95.25% accuracy for a 20-word lexicon and swiftly adapting to new users and words with minimal samples. This innovation demonstrates a practical, sensitive, and precise wearable SSI suitable for daily communication applications.
△ Less
Submitted 7 December, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Integrated Sensing and Communications Towards Proactive Beamforming in mmWave V2I via Multi-Modal Feature Fusion (MMFF)
Authors:
Haotian Zhang,
Shijian Gao,
Xiang Cheng,
Liuqing Yang
Abstract:
The future of vehicular communication networks relies on mmWave massive multi-input-multi-output antenna arrays for intensive data transfer and massive vehicle access. However, reliable vehicle-to-infrastructure links require exact alignment between the narrow beams, which traditionally involves excessive signaling overhead. To address this issue, we propose a novel proactive beamforming scheme th…
▽ More
The future of vehicular communication networks relies on mmWave massive multi-input-multi-output antenna arrays for intensive data transfer and massive vehicle access. However, reliable vehicle-to-infrastructure links require exact alignment between the narrow beams, which traditionally involves excessive signaling overhead. To address this issue, we propose a novel proactive beamforming scheme that integrates multi-modal sensing and communications via Multi-Modal Feature Fusion Network (MMFF-Net), which is composed of multiple neural network components with distinct functions. Unlike existing methods that rely solely on communication processing, our approach obtains comprehensive environmental features to improve beam alignment accuracy. We verify our scheme on the Vision-Wireless (ViWi) dataset, which we enriched with realistic vehicle drifting behavior. Our proposed MMFF-Net achieves more accurate and stable angle prediction, which in turn increases the achievable rates and reduces the communication system outage probability. Even in complex dynamic scenarios with adverse environment conditions, robust prediction results can be guaranteed, demonstrating the feasibility and practicality of the proposed proactive beamforming approach.
△ Less
Submitted 26 March, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Intelligent machines work in unstructured environments by differential neuromorphic computing
Authors:
Shengbo Wang,
Shuo Gao,
Chenyu Tang,
Edoardo Occhipinti,
Cong Li,
Shurui Wang,
Jiaqi Wang,
Hubin Zhao,
Guohua Hu,
Arokia Nathan,
Ravinder Dahiya,
Luigi Occhipinti
Abstract:
Efficient operation of intelligent machines in the real world requires methods that allow them to understand and predict the uncertainties presented by the unstructured environments with good accuracy, scalability and generalization, similar to humans. Current methods rely on pretrained networks instead of continuously learning from the dynamic signal properties of working environments and suffer…
▽ More
Efficient operation of intelligent machines in the real world requires methods that allow them to understand and predict the uncertainties presented by the unstructured environments with good accuracy, scalability and generalization, similar to humans. Current methods rely on pretrained networks instead of continuously learning from the dynamic signal properties of working environments and suffer inherent limitations, such as data-hungry procedures, and limited generalization capabilities. Herein, we present a memristor-based differential neuromorphic computing, perceptual signal processing and learning method for intelligent machines. The main features of environmental information such as amplification (>720%) and adaptation (<50%) of mechanical stimuli encoded in memristors, are extracted to obtain human-like processing in unstructured environments. The developed method takes advantage of the intrinsic multi-state property of memristors and exhibits good scalability and generalization, as confirmed by validation in two different application scenarios: object grasping and autonomous driving. In the former, a robot hand experimentally realizes safe and stable grasping through fast learning (in ~1 ms) the unknown object features (e.g., sharp corner and smooth surface) with a single memristor. In the latter, the decision-making information of 10 unstructured environments in autonomous driving (e.g., overtaking cars, pedestrians) is accurately (94%) extracted with a 40*25 memristor array. By mimicking the intrinsic nature of human low-level perception mechanisms, the electronic memristive neuromorphic circuit-based method, presented here shows the potential for adapting to diverse sensing technologies and helping intelligent machines generate smart high-level decisions in the real world.
△ Less
Submitted 17 November, 2023; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
Authors:
Sizhou Chen,
Songyang Gao,
Sen Fang
Abstract:
The Transformer architecture has proven to be highly effective for Automatic Speech Recognition (ASR) tasks, becoming a foundational component for a plethora of research in the domain. Historically, many approaches have leaned on fixed-length attention windows, which becomes problematic for varied speech samples in duration and complexity, leading to data over-smoothing and neglect of essential lo…
▽ More
The Transformer architecture has proven to be highly effective for Automatic Speech Recognition (ASR) tasks, becoming a foundational component for a plethora of research in the domain. Historically, many approaches have leaned on fixed-length attention windows, which becomes problematic for varied speech samples in duration and complexity, leading to data over-smoothing and neglect of essential long-term connectivity. Addressing this limitation, we introduce Echo-MSA, a nimble module equipped with a variable-length attention mechanism that accommodates a range of speech sample complexities and durations. This module offers the flexibility to extract speech features across various granularities, spanning from frames and phonemes to words and discourse. The proposed design captures the variable length feature of speech and addresses the limitations of fixed-length attention. Our evaluation leverages a parallel attention architecture complemented by a dynamic gating mechanism that amalgamates traditional attention with the Echo-MSA module output. Empirical evidence from our study reveals that integrating Echo-MSA into the primary model's training regime significantly enhances the word error rate (WER) performance, all while preserving the intrinsic stability of the original model.
△ Less
Submitted 7 April, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Human Body Digital Twin: A Master Plan
Authors:
Chenyu Tang,
Wentian Yi,
Edoardo Occhipinti,
Yanning Dai,
Shuo Gao,
Luigi G. Occhipinti
Abstract:
A human body digital twin (DT) is a virtual representation of an individual's physiological state, created using real-time data from sensors and medical test devices, with the purpose of simulating, predicting, and optimizing health outcomes through advanced analytics and simulations. The human body DT has the potential to revolutionize healthcare and wellness, but its responsible and effective im…
▽ More
A human body digital twin (DT) is a virtual representation of an individual's physiological state, created using real-time data from sensors and medical test devices, with the purpose of simulating, predicting, and optimizing health outcomes through advanced analytics and simulations. The human body DT has the potential to revolutionize healthcare and wellness, but its responsible and effective implementation requires consideration of various factors. This article presents a comprehensive overview of the current status and future prospects of the human body DT and proposes a five-level roadmap for its development. The roadmap covers the development of various components, such as wearable devices, data collection, data analysis, and decision-making systems. The article also highlights the necessary support, security, cost, and ethical considerations that must be addressed in order to ensure responsible and effective implementation of the human body DT. The proposed roadmap provides a framework for guiding future development and offers a unique perspective on the future of the human body DT, facilitating new interdisciplinary research and innovative solutions in this rapidly evolving field.
△ Less
Submitted 12 September, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Distributed fusion filter over lossy wireless sensor networks with the presence of non-Gaussian noise
Authors:
Jiacheng He,
Bei Peng,
Zhenyu Feng,
Xuemei Mao,
Song Gao,
Gang Wang
Abstract:
The information transmission between nodes in a wireless sensor networks (WSNs) often causes packet loss due to denial-of-service (DoS) attack, energy limitations, and environmental factors, and the information that is successfully transmitted can also be contaminated by non-Gaussian noise. The presence of these two factors poses a challenge for distributed state estimation (DSE) over WSNs. In thi…
▽ More
The information transmission between nodes in a wireless sensor networks (WSNs) often causes packet loss due to denial-of-service (DoS) attack, energy limitations, and environmental factors, and the information that is successfully transmitted can also be contaminated by non-Gaussian noise. The presence of these two factors poses a challenge for distributed state estimation (DSE) over WSNs. In this paper, a generalized packet drop model is proposed to describe the packet loss phenomenon caused by DoS attacks and other factors. Moreover, a modified maximum correntropy Kalman filter is given, and it is extended to distributed form (DM-MCKF). In addition, a distributed modified maximum correntropy Kalman filter incorporating the generalized data packet drop (DM-MCKF-DPD) algorithm is provided to implement DSE with the presence of both non-Gaussian noise pollution and packet drop. A sufficient condition to ensure the convergence of the fixed-point iterative process of the DM-MCKF-DPD algorithm is presented and the computational complexity of the DM-MCKF-DPD algorithm is analyzed. Finally, the effectiveness and feasibility of the proposed algorithms are verified by simulations.
△ Less
Submitted 6 July, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
AVSegFormer: Audio-Visual Segmentation with Transformer
Authors:
Shengyi Gao,
Zhe Chen,
Guo Chen,
Wenhai Wang,
Tong Lu
Abstract:
The combination of audio and vision has long been a topic of interest in the multi-modal community. Recently, a new audio-visual segmentation (AVS) task has been introduced, aiming to locate and segment the sounding objects in a given video. This task demands audio-driven pixel-level scene understanding for the first time, posing significant challenges. In this paper, we propose AVSegFormer, a nov…
▽ More
The combination of audio and vision has long been a topic of interest in the multi-modal community. Recently, a new audio-visual segmentation (AVS) task has been introduced, aiming to locate and segment the sounding objects in a given video. This task demands audio-driven pixel-level scene understanding for the first time, posing significant challenges. In this paper, we propose AVSegFormer, a novel framework for AVS tasks that leverages the transformer architecture. Specifically, we introduce audio queries and learnable queries into the transformer decoder, enabling the network to selectively attend to interested visual features. Besides, we present an audio-visual mixer, which can dynamically adjust visual features by amplifying relevant and suppressing irrelevant spatial channels. Additionally, we devise an intermediate mask loss to enhance the supervision of the decoder, encouraging the network to produce more accurate intermediate predictions. Extensive experiments demonstrate that AVSegFormer achieves state-of-the-art results on the AVS benchmark. The code is available at https://github.com/vvvb-github/AVSegFormer.
△ Less
Submitted 18 December, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Intelligent Multi-Modal Sensing-Communication Integration: Synesthesia of Machines
Authors:
Xiang Cheng,
Haotian Zhang,
Jianan Zhang,
Shijian Gao,
Sijiang Li,
Ziwei Huang,
Lu Bai,
Zonghui Yang,
Xinhu Zheng,
Liuqing Yang
Abstract:
In the era of sixth-generation (6G) wireless communications, integrated sensing and communications (ISAC) is recognized as a promising solution to upgrade the physical system by endowing wireless communications with sensing capability. Existing ISAC is mainly oriented to static scenarios with radio-frequency (RF) sensors being the primary participants, thus lacking a comprehensive environment feat…
▽ More
In the era of sixth-generation (6G) wireless communications, integrated sensing and communications (ISAC) is recognized as a promising solution to upgrade the physical system by endowing wireless communications with sensing capability. Existing ISAC is mainly oriented to static scenarios with radio-frequency (RF) sensors being the primary participants, thus lacking a comprehensive environment feature characterization and facing a severe performance bottleneck in dynamic environments. To date, extensive surveys on ISAC have been conducted but are limited to summarizing RF-based radar sensing. Currently, some research efforts have been devoted to exploring multi-modal sensing-communication integration but still lack a comprehensive review. Therefore, we generalize the concept of ISAC inspired by human synesthesia to establish a unified framework of intelligent multi-modal sensing-communication integration and provide a comprehensive review under such a framework in this paper. The so-termed Synesthesia of Machines (SoM) gives the clearest cognition of such intelligent integration and details its paradigm for the first time. We commence by justifying the necessity of the new paradigm. Subsequently, we offer a definition of SoM and zoom into the detailed paradigm, which is summarized as three operation modes. To facilitate SoM research, we overview the prerequisite of SoM research, i.e., mixed multi-modal (MMM) datasets. Then, we introduce the mapping relationships between multi-modal sensing and communications. Afterward, we cover the technological review on SoM-enhance-based and SoM-concert-based applications. To corroborate the superiority of SoM, we also present simulation results related to dual-function waveform and predictive beamforming design. Finally, we propose some potential directions to inspire future research efforts.
△ Less
Submitted 20 November, 2023; v1 submitted 25 June, 2023;
originally announced June 2023.
-
Minimalist and High-Quality Panoramic Imaging with PSF-aware Transformers
Authors:
Qi Jiang,
Shaohua Gao,
Yao Gao,
Kailun Yang,
Zhonghua Yi,
Hao Shi,
Lei Sun,
Kaiwei Wang
Abstract:
High-quality panoramic images with a Field of View (FoV) of 360° are essential for contemporary panoramic computer vision tasks. However, conventional imaging systems come with sophisticated lens designs and heavy optical components. This disqualifies their usage in many mobile and wearable applications where thin and portable, minimalist imaging systems are desired. In this paper, we propose a Pa…
▽ More
High-quality panoramic images with a Field of View (FoV) of 360° are essential for contemporary panoramic computer vision tasks. However, conventional imaging systems come with sophisticated lens designs and heavy optical components. This disqualifies their usage in many mobile and wearable applications where thin and portable, minimalist imaging systems are desired. In this paper, we propose a Panoramic Computational Imaging Engine (PCIE) to achieve minimalist and high-quality panoramic imaging. With less than three spherical lenses, a Minimalist Panoramic Imaging Prototype (MPIP) is constructed based on the design of the Panoramic Annular Lens (PAL), but with low-quality imaging results due to aberrations and small image plane size. We propose two pipelines, i.e. Aberration Correction (AC) and Super-Resolution and Aberration Correction (SR&AC), to solve the image quality problems of MPIP, with imaging sensors of small and large pixel size, respectively. To leverage the prior information of the optical system, we propose a Point Spread Function (PSF) representation method to produce a PSF map as an additional modality. A PSF-aware Aberration-image Recovery Transformer (PART) is designed as a universal network for the two pipelines, in which the self-attention calculation and feature extraction are guided by the PSF map. We train PART on synthetic image pairs from simulation and put forward the PALHQ dataset to fill the gap of real-world high-quality PAL images for low-level vision. A comprehensive variety of experiments on synthetic and real-world benchmarks demonstrates the impressive imaging results of PCIE and the effectiveness of the PSF representation. We further deliver heuristic experimental findings for minimalist and high-quality panoramic imaging. Our dataset and code will be available at https://github.com/zju-jiangqi/PCIE-PART.
△ Less
Submitted 4 July, 2024; v1 submitted 22 June, 2023;
originally announced June 2023.
-
A Model Fusion Distributed Kalman Filter For Non-Gaussian Observation Noise
Authors:
Xuemei Mao,
Gang Wang,
Bei Peng,
Jiacheng He,
Kun Zhang,
Song Gao
Abstract:
The distributed Kalman filter (DKF) has attracted extensive research as an information fusion method for wireless sensor systems(WSNs). And the DKF in non-Gaussian environments is still a pressing problem. In this paper, we approximate the non-Gaussian noise as a Gaussian mixture model and estimate the parameters through the expectation-maximization algorithm. A DKF, called model fusion DKF (MFDKF…
▽ More
The distributed Kalman filter (DKF) has attracted extensive research as an information fusion method for wireless sensor systems(WSNs). And the DKF in non-Gaussian environments is still a pressing problem. In this paper, we approximate the non-Gaussian noise as a Gaussian mixture model and estimate the parameters through the expectation-maximization algorithm. A DKF, called model fusion DKF (MFDKF) is proposed against the non-Gaussain noise. Specifically, the proposed MFDKF is obtained by fusing the sub-models that are built based on the noise approximation with the help of interacting multiple model (IMM). Considering that some WSNs demand high consensus or have restricted communication, consensus MFDKF (C-MFDKF) and simplified MFDKF (S-MFDKF) are proposed based on consensus theory, respectively. The convergence of MFDKF and its derivative algorithms are analyzed. A series of simulations indicate the effectiveness of the MFDKF and its derivative algorithms.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
Recyclable Semi-supervised Method Based on Multi-model Ensemble for Video Scene Parsing
Authors:
Biao Wu,
Shaoli Liu,
Diankai Zhang,
Chengjian Zheng,
Si Gao,
Xiaofeng Zhang,
Ning Wang
Abstract:
Pixel-level Scene Understanding is one of the fundamental problems in computer vision, which aims at recognizing object classes, masks and semantics of each pixel in the given image. Since the real-world is actually video-based rather than a static state, learning to perform video semantic segmentation is more reasonable and practical for realistic applications. In this paper, we adopt Mask2Former…
▽ More
Pixel-level Scene Understanding is one of the fundamental problems in computer vision, which aims at recognizing object classes, masks and semantics of each pixel in the given image. Since the real-world is actually video-based rather than a static state, learning to perform video semantic segmentation is more reasonable and practical for realistic applications. In this paper, we adopt Mask2Former as architecture and ViT-Adapter as backbone. Then, we propose a recyclable semi-supervised training method based on multi-model ensemble. Our method achieves the mIoU scores of 62.97% and 65.83% on Development test and final test respectively. Finally, we obtain the 2nd place in the Video Scene Parsing in the Wild Challenge at CVPR 2023.
△ Less
Submitted 5 June, 2023;
originally announced June 2023.
-
Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring
Authors:
Kaiqi Fu,
Shaojun Gao,
Shuju Shi,
Xiaohai Tian,
Wei Li,
Zejun Ma
Abstract:
Speech fluency/disfluency can be evaluated by analyzing a range of phonetic and prosodic features. Deep neural networks are commonly trained to map fluency-related features into the human scores. However, the effectiveness of deep learning-based models is constrained by the limited amount of labeled training samples. To address this, we introduce a self-supervised learning (SSL) approach that take…
▽ More
Speech fluency/disfluency can be evaluated by analyzing a range of phonetic and prosodic features. Deep neural networks are commonly trained to map fluency-related features into the human scores. However, the effectiveness of deep learning-based models is constrained by the limited amount of labeled training samples. To address this, we introduce a self-supervised learning (SSL) approach that takes into account phonetic and prosody awareness for fluency scoring. Specifically, we first pre-train the model using a reconstruction loss function, by masking phones and their durations jointly on a large amount of unlabeled speech and text prompts. We then fine-tune the pre-trained model using human-annotated scoring data. Our experimental results, conducted on datasets such as Speechocean762 and our non-native datasets, show that our proposed method outperforms the baseline systems in terms of Pearson correlation coefficients (PCC). Moreover, we also conduct an ablation study to better understand the contribution of phonetic and prosody factors during the pre-training stage.
△ Less
Submitted 19 May, 2023;
originally announced May 2023.
-
Omni-Line-of-Sight Imaging for Holistic Shape Reconstruction
Authors:
Binbin Huang,
Xingyue Peng,
Siyuan Shen,
Suan Xia,
Ruiqian Li,
Yanhua Yu,
Yuehan Wang,
Shenghua Gao,
Wenzheng Chen,
Shiying Li,
Jingyi Yu
Abstract:
We introduce Omni-LOS, a neural computational imaging method for conducting holistic shape reconstruction (HSR) of complex objects utilizing a Single-Photon Avalanche Diode (SPAD)-based time-of-flight sensor. As illustrated in Fig. 1, our method enables new capabilities to reconstruct near-$360^\circ$ surrounding geometry of an object from a single scan spot. In such a scenario, traditional line-o…
▽ More
We introduce Omni-LOS, a neural computational imaging method for conducting holistic shape reconstruction (HSR) of complex objects utilizing a Single-Photon Avalanche Diode (SPAD)-based time-of-flight sensor. As illustrated in Fig. 1, our method enables new capabilities to reconstruct near-$360^\circ$ surrounding geometry of an object from a single scan spot. In such a scenario, traditional line-of-sight (LOS) imaging methods only see the front part of the object and typically fail to recover the occluded back regions. Inspired by recent advances of non-line-of-sight (NLOS) imaging techniques which have demonstrated great power to reconstruct occluded objects, Omni-LOS marries LOS and NLOS together, leveraging their complementary advantages to jointly recover the holistic shape of the object from a single scan position. The core of our method is to put the object nearby diffuse walls and augment the LOS scan in the front view with the NLOS scans from the surrounding walls, which serve as virtual ``mirrors'' to trap lights toward the object. Instead of separately recovering the LOS and NLOS signals, we adopt an implicit neural network to represent the object, analogous to NeRF and NeTF. While transients are measured along straight rays in LOS but over the spherical wavefronts in NLOS, we derive differentiable ray propagation models to simultaneously model both types of transient measurements so that the NLOS reconstruction also takes into account the direct LOS measurements and vice versa. We further develop a proof-of-concept Omni-LOS hardware prototype for real-world validation. Comprehensive experiments on various wall settings demonstrate that Omni-LOS successfully resolves shape ambiguities caused by occlusions, achieves high-fidelity 3D scan quality, and manages to recover objects of various scales and complexity.
△ Less
Submitted 21 April, 2023;
originally announced April 2023.
-
Patching Neural Barrier Functions Using Hamilton-Jacobi Reachability
Authors:
Sander Tonkens,
Alex Toofanian,
Zhizhen Qin,
Sicun Gao,
Sylvia Herbert
Abstract:
Learning-based control algorithms have led to major advances in robotics at the cost of decreased safety guarantees. Recently, neural networks have also been used to characterize safety through the use of barrier functions for complex nonlinear systems. Learned barrier functions approximately encode and enforce a desired safety constraint through a value function, but do not provide any formal gua…
▽ More
Learning-based control algorithms have led to major advances in robotics at the cost of decreased safety guarantees. Recently, neural networks have also been used to characterize safety through the use of barrier functions for complex nonlinear systems. Learned barrier functions approximately encode and enforce a desired safety constraint through a value function, but do not provide any formal guarantees. In this paper, we propose a local dynamic programming (DP) based approach to "patch" an almost-safe learned barrier at potentially unsafe points in the state space. This algorithm, HJ-Patch, obtains a novel barrier that provides formal safety guarantees, yet retains the global structure of the learned barrier. Our local DP based reachability algorithm, HJ-Patch, updates the barrier function "minimally" at points that both (a) neighbor the barrier safety boundary and (b) do not satisfy the safety condition. We view this as a key step to bridging the gap between learning-based barrier functions and Hamilton-Jacobi reachability analysis, providing a framework for further integration of these approaches. We demonstrate that for well-trained barriers we reduce the computational load by 2 orders of magnitude with respect to standard DP-based reachability, and demonstrate scalability to a 6-dimensional system, which is at the limit of standard DP-based reachability.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
Efficient Gridless DoA Estimation Method of Non-uniform Linear Arrays with Applications in Automotive Radars
Authors:
Silin Gao,
Zhe Zhang,
Muhan Wang,
Yan Zhang,
Jie Zhao,
Bingchen Zhang,
Yue Wang,
Yirong Wu
Abstract:
This paper focuses on the gridless direction-of-arrival (DoA) estimation for data acquired by non-uniform linear arrays (NLAs) in automotive applications. Atomic norm minimization (ANM) is a promising gridless sparse recovery algorithm under the Toeplitz model and solved by convex relaxation, thus it is only applicable to uniform linear arrays (ULAs) with array manifolds having a Vandermonde struc…
▽ More
This paper focuses on the gridless direction-of-arrival (DoA) estimation for data acquired by non-uniform linear arrays (NLAs) in automotive applications. Atomic norm minimization (ANM) is a promising gridless sparse recovery algorithm under the Toeplitz model and solved by convex relaxation, thus it is only applicable to uniform linear arrays (ULAs) with array manifolds having a Vandermonde structure. In automotive applications, it is essential to apply the gridless DoA estimation to NLAs with arbitrary geometry with efficiency. In this paper, a fast ANM-based gridless DoA estimation algorithm for NLAs is proposed, which employs the array manifold separation technique and the accelerated proximal gradient (APG) technique, making it applicable to NLAs without losing of efficiency. Simulation and measurement experiments on automotive multiple-input multiple-output (MIMO) radars demonstrate the superiority of the proposed method.
△ Less
Submitted 7 March, 2023;
originally announced March 2023.
-
State Estimation of Wireless Sensor Networks in the Presence of Data Packet Drops and Non-Gaussian Noise
Authors:
Jiacheng He,
Gang Wang,
Xuemei Mao,
Song Gao,
Bei Peng
Abstract:
Distributed Kalman filter approaches based on the maximum correntropy criterion have recently demonstrated superior state estimation performance to that of conventional distributed Kalman filters for wireless sensor networks in the presence of non-Gaussian impulsive noise. However, these algorithms currently fail to take account of data packet drops. The present work addresses this issue by propos…
▽ More
Distributed Kalman filter approaches based on the maximum correntropy criterion have recently demonstrated superior state estimation performance to that of conventional distributed Kalman filters for wireless sensor networks in the presence of non-Gaussian impulsive noise. However, these algorithms currently fail to take account of data packet drops. The present work addresses this issue by proposing a distributed maximum correntropy Kalman filter that accounts for data packet drops (i.e., the DMCKF-DPD algorithm). The effectiveness and feasibility of the algorithm are verified by simulations conducted in a wireless sensor network with intermittent observations due to data packet drops under a non-Gaussian noise environment. Moreover, the computational complexity of the DMCKF-DPD algorithm is demonstrated to be moderate compared with that of a conventional distributed Kalman filter, and we provide a sufficient condition to ensure the convergence of the proposed algorithm.
△ Less
Submitted 3 September, 2023; v1 submitted 14 January, 2023;
originally announced January 2023.
-
VertMatch: A Semi-supervised Framework for Vertebral Structure Detection in 3D Ultrasound Volume
Authors:
Hongye Zeng,
kang Zhou,
Songhan Ge,
Yuchong Gao,
Jianhao Zhao,
Shenghua Gao,
Rui Zheng
Abstract:
Three-dimensional (3D) ultrasound imaging technique has been applied for scoliosis assessment, but current assessment method only uses coronal projection image and cannot illustrate the 3D deformity and vertebra rotation. The vertebra detection is essential to reveal 3D spine information, but the detection task is challenging due to complex data and limited annotations. We propose VertMatch, a two…
▽ More
Three-dimensional (3D) ultrasound imaging technique has been applied for scoliosis assessment, but current assessment method only uses coronal projection image and cannot illustrate the 3D deformity and vertebra rotation. The vertebra detection is essential to reveal 3D spine information, but the detection task is challenging due to complex data and limited annotations. We propose VertMatch, a two-step framework to detect vertebral structures in 3D ultrasound volume by utilizing unlabeled data in semi-supervised manner. The first step is to detect the possible positions of structures on transverse slice globally, and then the local patches are cropped based on detected positions. The second step is to distinguish whether the patches contain real vertebral structures and screen the predicted positions from the first step. VertMatch develops three novel components for semi-supervised learning: for position detection in the first step, (1) anatomical prior is used to screen pseudo labels generated from confidence threshold method; (2) multi-slice consistency is used to utilize more unlabeled data by inputting multiple adjacent slices; (3) for patch identification in the second step, the categories are rebalanced in each batch to solve imbalance problem. Experimental results demonstrate that VertMatch can detect vertebra accurately in ultrasound volume and outperforms state-of-the-art methods. VertMatch is also validated in clinical application on forty ultrasound scans, and it can be a promising approach for 3D assessment of scoliosis.
△ Less
Submitted 28 December, 2022;
originally announced December 2022.
-
ATASI-Net: An Efficient Sparse Reconstruction Network for Tomographic SAR Imaging with Adaptive Threshold
Authors:
Muhan Wang,
Zhe Zhang,
Xiaolan Qiu,
Silin Gao,
Yue Wang
Abstract:
Tomographic SAR technique has attracted remarkable interest for its ability of three-dimensional resolving along the elevation direction via a stack of SAR images collected from different cross-track angles. The emerged compressed sensing (CS)-based algorithms have been introduced into TomoSAR considering its super-resolution ability with limited samples. However, the conventional CS-based methods…
▽ More
Tomographic SAR technique has attracted remarkable interest for its ability of three-dimensional resolving along the elevation direction via a stack of SAR images collected from different cross-track angles. The emerged compressed sensing (CS)-based algorithms have been introduced into TomoSAR considering its super-resolution ability with limited samples. However, the conventional CS-based methods suffer from several drawbacks, including weak noise resistance, high computational complexity, and complex parameter fine-tuning. Aiming at efficient TomoSAR imaging, this paper proposes a novel efficient sparse unfolding network based on the analytic learned iterative shrinkage thresholding algorithm (ALISTA) architecture with adaptive threshold, named Adaptive Threshold ALISTA-based Sparse Imaging Network (ATASI-Net). The weight matrix in each layer of ATASI-Net is pre-computed as the solution of an off-line optimization problem, leaving only two scalar parameters to be learned from data, which significantly simplifies the training stage. In addition, adaptive threshold is introduced for each azimuth-range pixel, enabling the threshold shrinkage to be not only layer-varied but also element-wise. Moreover, the final learned thresholds can be visualized and combined with the SAR image semantics for mutual feedback. Finally, extensive experiments on simulated and real data are carried out to demonstrate the effectiveness and efficiency of the proposed method.
△ Less
Submitted 30 November, 2022;
originally announced November 2022.
-
Computational Imaging for Machine Perception: Transferring Semantic Segmentation beyond Aberrations
Authors:
Qi Jiang,
Hao Shi,
Shaohua Gao,
Jiaming Zhang,
Kailun Yang,
Lei Sun,
Huajian Ni,
Kaiwei Wang
Abstract:
Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile and wearable applications remains a challenge due to the corrupted imaging quality induced by optical aberrations. However, previous works only focus on improving the subjective imaging quality through the Computational Imaging (CI) technique, ignoring the feasibility of advancing semantic segmentation. In this paper, we…
▽ More
Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile and wearable applications remains a challenge due to the corrupted imaging quality induced by optical aberrations. However, previous works only focus on improving the subjective imaging quality through the Computational Imaging (CI) technique, ignoring the feasibility of advancing semantic segmentation. In this paper, we pioneer the investigation of Semantic Segmentation under Optical Aberrations (SSOA) with MOS. To benchmark SSOA, we construct Virtual Prototype Lens (VPL) groups through optical simulation, generating Cityscapes-ab and KITTI-360-ab datasets under different behaviors and levels of aberrations. We look into SSOA via an unsupervised domain adaptation perspective to address the scarcity of labeled aberration data in real-world scenarios. Further, we propose Computational Imaging Assisted Domain Adaptation (CIADA) to leverage prior knowledge of CI for robust performance in SSOA. Based on our benchmark, we conduct experiments on the robustness of classical segmenters against aberrations. In addition, extensive evaluations of possible solutions to SSOA reveal that CIADA achieves superior performance under all aberration distributions, bridging the gap between computational imaging and downstream applications for MOS. The project page is at https://github.com/zju-jiangqi/CIADA.
△ Less
Submitted 14 March, 2024; v1 submitted 21 November, 2022;
originally announced November 2022.
-
DGD-cGAN: A Dual Generator for Image Dewatering and Restoration
Authors:
Salma Gonzalez-Sabbagh,
Antonio Robles-Kelly,
Shang Gao
Abstract:
Underwater images are usually covered with a blue-greenish colour cast, making them distorted, blurry or low in contrast. This phenomenon occurs due to the light attenuation given by the scattering and absorption in the water column. In this paper, we present an image enhancement approach for dewatering which employs a conditional generative adversarial network (cGAN) with two generators. Our Dual…
▽ More
Underwater images are usually covered with a blue-greenish colour cast, making them distorted, blurry or low in contrast. This phenomenon occurs due to the light attenuation given by the scattering and absorption in the water column. In this paper, we present an image enhancement approach for dewatering which employs a conditional generative adversarial network (cGAN) with two generators. Our Dual Generator Dewatering cGAN (DGD-cGAN) removes the haze and colour cast induced by the water column and restores the true colours of underwater scenes whereby the effects of various attenuation and scattering phenomena that occur in underwater images are tackled by the two generators. The first generator takes at input the underwater image and predicts the dewatered scene, while the second generator learns the underwater image formation process by implementing a custom loss function based upon the transmission and the veiling light components of the image formation model. Our experiments show that DGD-cGAN consistently delivers a margin of improvement as compared with the state-of-the-art methods on several widely available datasets.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Cheng-Ming Chiang,
Hsien-Kai Kuo,
Yu-Syuan Xu,
Man-Yu Lee,
Allen Lu,
Chia-Ming Cheng,
Chih-Cheng Chen,
Jia-Ying Yong,
Hong-Han Shuai,
Wen-Huang Cheng,
Zhuang Jia,
Tianyu Xu,
Yijian Zhang,
Long Bao,
Heng Sun,
Diankai Zhang,
Si Gao,
Shaoli Liu,
Biao Wu,
Xiaofeng Zhang,
Chengjian Zheng,
Kaidi Lu,
Ning Wang
, et al. (29 additional authors not shown)
Abstract:
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this prob…
▽ More
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and propose the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Learning Control Admissibility Models with Graph Neural Networks for Multi-Agent Navigation
Authors:
Chenning Yu,
Hongzhan Yu,
Sicun Gao
Abstract:
Deep reinforcement learning in continuous domains focuses on learning control policies that map states to distributions over actions that ideally concentrate on the optimal choices in each step. In multi-agent navigation problems, the optimal actions depend heavily on the agents' density. Their interaction patterns grow exponentially with respect to such density, making it hard for learning-based…
▽ More
Deep reinforcement learning in continuous domains focuses on learning control policies that map states to distributions over actions that ideally concentrate on the optimal choices in each step. In multi-agent navigation problems, the optimal actions depend heavily on the agents' density. Their interaction patterns grow exponentially with respect to such density, making it hard for learning-based methods to generalize. We propose to switch the learning objectives from predicting the optimal actions to predicting sets of admissible actions, which we call control admissibility models (CAMs), such that they can be easily composed and used for online inference for an arbitrary number of agents. We design CAMs using graph neural networks and develop training methods that optimize the CAMs in the standard model-free setting, with the additional benefit of eliminating the need for reward engineering typically required to balance collision avoidance and goal-reaching requirements. We evaluate the proposed approach in multi-agent navigation environments. We show that the CAM models can be trained in environments with only a few agents and be easily composed for deployment in dense environments with hundreds of agents, achieving better performance than state-of-the-art methods.
△ Less
Submitted 17 October, 2022;
originally announced October 2022.
-
Fixed-Point Centrality for Networks
Authors:
Shuang Gao
Abstract:
This paper proposes a family of network centralities called fixed-point centralities. This centrality family is defined via the fixed point of permutation equivariant mappings related to the underlying network. Such a centrality notion is immediately extended to define fixed-point centralities for infinite graphs characterized by graphons. Variation bounds of such centralities with respect to the…
▽ More
This paper proposes a family of network centralities called fixed-point centralities. This centrality family is defined via the fixed point of permutation equivariant mappings related to the underlying network. Such a centrality notion is immediately extended to define fixed-point centralities for infinite graphs characterized by graphons. Variation bounds of such centralities with respect to the variations of the underlying graphs and graphons under mild assumptions are established. Fixed-point centralities connect with a variety of different models on networks including graph neural networks, static and dynamic games on networks, and Markov decision processes.
△ Less
Submitted 15 September, 2022;
originally announced September 2022.
-
Transmission Neural Networks: From Virus Spread Models to Neural Networks
Authors:
Shuang Gao,
Peter E. Caines
Abstract:
This work connects models for virus spread on networks with their equivalent neural network representations. Based on this connection, we propose a new neural network architecture, called Transmission Neural Networks (TransNNs) where activation functions are primarily associated with links and are allowed to have different activation levels. Furthermore, this connection leads to the discovery and…
▽ More
This work connects models for virus spread on networks with their equivalent neural network representations. Based on this connection, we propose a new neural network architecture, called Transmission Neural Networks (TransNNs) where activation functions are primarily associated with links and are allowed to have different activation levels. Furthermore, this connection leads to the discovery and the derivation of three new activation functions with tunable or trainable parameters. Moreover, we prove that TransNNs with a single hidden layer and a fixed non-zero bias term are universal function approximators. Finally, we present new fundamental derivations of continuous time epidemic network models based on TransNNs.
△ Less
Submitted 6 August, 2022;
originally announced August 2022.