subscribe to arXiv mailings

UAV-Assisted Weather Radar Calibration: A Theoretical Model for Wind Influence on Metal Sphere Reflectivity

Authors: Jiabiao Zhao, Da Li, Jiayuan Cui, Houjun Sun, Jianjun Ma

Abstract: The calibration of weather radar for detecting meteorological phenomena has advanced rapidly, aiming to enhance accuracy. Utilizing an unmanned aerial vehicle (UAV) equipped with a suspended metal sphere introduces an efficient calibration method by allowing dynamic adjustment of the UAV's position, effectively acting as a mobile calibration platform. However, external factors such as wind can int… ▽ More The calibration of weather radar for detecting meteorological phenomena has advanced rapidly, aiming to enhance accuracy. Utilizing an unmanned aerial vehicle (UAV) equipped with a suspended metal sphere introduces an efficient calibration method by allowing dynamic adjustment of the UAV's position, effectively acting as a mobile calibration platform. However, external factors such as wind can introduce bias in reflectivity measurements by causing the sphere to deviate from its intended position. This study develops a theoretical model to assess the impact of the metal sphere's one-dimensional oscillation on reflectivity. The findings offer valuable insights for UAV based radar calibration efforts. △ Less

Submitted 20 June, 2024; originally announced July 2024.

Comments: to be published in the 2024 International Conference on Microwave and Millimeter Wave Technology

arXiv:2407.04675 [pdf, other]

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor… ▽ More Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance. △ Less

Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.03374 [pdf]

An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

Authors: Laifa Tao, Shangyu Li, Haifei Liu, Qixuan Huang, Liang Ma, Guoao Ning, Yiling Chen, Yunlong Wu, Bin Li, Weiwei Zhang, Zhengduo Zhao, Wenchao Zhan, Wenyan Cao, Chao Wang, Hongmei Liu, Jian Ma, Mingliang Suo, Yujie Cheng, Yu Ding, Dengwei Song, Chen Lu

Abstract: Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Larg… ▽ More Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Large Model, heralds a technological revolution with the potential to fundamentally reshape traditional technological fields and human production methods. Its capabilities, including strong generalization, reasoning, and generative attributes, present opportunities to address PHM's bottlenecks. To this end, based on a systematic analysis of the current challenges and bottlenecks in PHM, as well as the research status and advantages of Large Model, we propose a novel concept and three progressive paradigms of Prognosis and Health Management Large Model (PHM-LM) through the integration of the Large Model with PHM. Subsequently, we provide feasible technical approaches for PHM-LM to bolster PHM's core capabilities within the framework of the three paradigms. Moreover, to address core issues confronting PHM, we discuss a series of technical challenges of PHM-LM throughout the entire process of construction and application. This comprehensive effort offers a holistic PHM-LM technical framework, and provides avenues for new PHM technologies, methodologies, tools, platforms and applications, which also potentially innovates design, research & development, verification and application mode of PHM. And furthermore, a new generation of PHM with AI will also capably be realized, i.e., from custom to generalized, from discriminative to generative, and from theoretical conditions to practical applications. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.02264 [pdf, other]

SOAF: Scene Occlusion-aware Neural Acoustic Field

Authors: Huiyu Gao, Jiahao Ma, David Ahmedt-Aristizabal, Chuong Nguyen, Miaomiao Liu

Abstract: This paper tackles the problem of novel view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given the audio-video recordings from other known trajectories of the scene. Existing methods often overlook the effect of room geometry, particularly wall occlusion to sound propagation, making them less accurate in multi-room environments. In this work, we propose a new approach… ▽ More This paper tackles the problem of novel view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given the audio-video recordings from other known trajectories of the scene. Existing methods often overlook the effect of room geometry, particularly wall occlusion to sound propagation, making them less accurate in multi-room environments. In this work, we propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation. Our approach derives a prior for sound energy field using distance-aware parametric sound-propagation modelling and then transforms it based on scene transmittance learned from the input video. We extract features from the local acoustic field centred around the receiver using a Fibonacci Sphere to generate binaural audio for novel views with a direction-aware attention mechanism. Extensive experiments on the real dataset RWAVS and the synthetic dataset SoundSpaces demonstrate that our method outperforms previous state-of-the-art techniques in audio generation. Project page: https://github.com/huiyu-gao/SOAF/. △ Less

Submitted 2 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.11519 [pdf, other]

HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,… ▽ More Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA, a vision transformer-based foundation model for HSI interpretation, scalable to over a billion parameters. To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transferring capability, and real-world applicability. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

arXiv:2406.10724 [pdf, other]

Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft

Authors: Ian Vyse, Rishit Dagli, Dav Vrat Chadha, John P. Ma, Hector Chen, Isha Ruparelia, Prithvi Seran, Matthew Xie, Eesa Aamer, Aidan Armstrong, Naveen Black, Ben Borstein, Kevin Caldwell, Orrin Dahanaggamaarachchi, Joe Dai, Abeer Fatima, Stephanie Lu, Maxime Michet, Anoushka Paul, Carrie Ann Po, Shivesh Prakash, Noa Prosser, Riddhiman Roy, Mirai Shinjo, Iliya Shofman , et al. (4 additional authors not shown)

Abstract: Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and… ▽ More Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and spatial information, it is prone to various types of noise, including random noise, stripe noise, and dead pixels. Effective denoising of these images is crucial for downstream scientific tasks. Traditional methods, including hand-crafted techniques encoding strong priors, learned 2D image denoising methods applied across different hyperspectral bands, or diffusion generative models applied independently on bands, often struggle with varying noise strengths across spectral bands, leading to significant spectral distortion. This paper presents a novel approach to hyperspectral image denoising using latent diffusion models that integrate spatial and spectral information. We particularly do so by building a 3D diffusion model and presenting a 3-stage training approach on real and synthetically crafted datasets. The proposed method preserves image structure while reducing noise. Evaluations on both popular hyperspectral denoising datasets and synthetically crafted datasets for the FINCH mission demonstrate the effectiveness of this approach. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: To appear in 38th Annual Small Satellite Conference

arXiv:2406.01138 [pdf, ps, other]

Precise Analysis of Covariance Identifiability for Activity Detection in Grant-Free Random Access

Authors: Shengsong Luo, Junjie Ma, Chongbin Xu, Xin Wang

Abstract: We consider the identifiability issue of maximum likelihood based activity detection in massive MIMO based grant-free random access. A prior work by Chen et al. indicates that the identifiability undergoes a phase transition for commonly-used random signatures. In this paper, we provide an analytical characterization of the boundary of the phase transition curve. Our theoretical results agree well… ▽ More We consider the identifiability issue of maximum likelihood based activity detection in massive MIMO based grant-free random access. A prior work by Chen et al. indicates that the identifiability undergoes a phase transition for commonly-used random signatures. In this paper, we provide an analytical characterization of the boundary of the phase transition curve. Our theoretical results agree well with the numerical experiments. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.19338 [pdf, other]

Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images

Authors: Yuzhen Ding, Jason M. Holmes, Hongying Feng, Baoxin Li, Lisa A. McGee, Jean-Claude M. Rwigema, Sujay A. Vora, Daniel J. Ma, Robert L. Foote, Samir H. Patel, Wei Liu

Abstract: In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D-on-board imaging(OBI) unavailable. But tumor visibility is constrained due to the projection of patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment room with 3D-OBI such as cone beam CT(CBCT), the field of view(FOV) of CBCT is limited with unnecessarily high imag… ▽ More In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D-on-board imaging(OBI) unavailable. But tumor visibility is constrained due to the projection of patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment room with 3D-OBI such as cone beam CT(CBCT), the field of view(FOV) of CBCT is limited with unnecessarily high imaging dose, thus unfavorable for pediatric patients. A solution to this dilemma is to reconstruct 3D CT from kV images obtained at the treatment position. Here, we propose a dual-models framework built with hierarchical ViT blocks. Unlike a proof-of-concept approach, our framework considers kV images as the solo input and can synthesize accurate, full-size 3D CT in real time(within milliseconds). We demonstrate the feasibility of the proposed approach on 10 patients with head and neck (H&N) cancer using image quality(MAE: <45HU), dosimetrical accuracy(Gamma passing rate (2%/2mm/10%)>97%) and patient position uncertainty(shift error: <0.4mm). The proposed framework can generate accurate 3D CT faithfully mirroring real-time patient position, thus significantly improving patient setup accuracy, keeping imaging dose minimum, and maintaining treatment veracity. △ Less

Submitted 1 April, 2024; originally announced May 2024.

Comments: 17 pages, 8 figures and tables

arXiv:2405.18435 [pdf, other]

QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de… ▽ More Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions 2D-vs-3D. A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient 3D methods for uncertainty quantification methods in 3D segmentation tasks. △ Less

Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

Comments: initial technical report

arXiv:2405.18255 [pdf, other]

Channel Reciprocity Based Attack Detection for Securing UWB Ranging by Autoencoder

Authors: Wenlong Gou, Chuanhang Yu, Juntao Ma, Gang Wu, Vladimir Mordachev

Abstract: A variety of ranging threats represented by Ghost Peak attack have raised concerns regarding the security performance of Ultra-Wide Band (UWB) systems with the finalization of the IEEE 802.15.4z standard. Based on channel reciprocity, this paper proposes a low complexity attack detection scheme that compares Channel Impulse Response (CIR) features of both ranging sides utilizing an autoencoder wit… ▽ More A variety of ranging threats represented by Ghost Peak attack have raised concerns regarding the security performance of Ultra-Wide Band (UWB) systems with the finalization of the IEEE 802.15.4z standard. Based on channel reciprocity, this paper proposes a low complexity attack detection scheme that compares Channel Impulse Response (CIR) features of both ranging sides utilizing an autoencoder with the capability of data compression and feature extraction. Taking Ghost Peak attack as an example, this paper demonstrates the effectiveness, feasibility and generalizability of the proposed attack detection scheme through simulation and experimental validation. The proposed scheme achieves an attack detection success rate of over 99% and can be implemented in current systems at low cost. △ Less

Submitted 10 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

ACM Class: H.1.1

arXiv:2405.17702 [pdf]

A Two-sided Model for EV Market Dynamics and Policy Implications

Authors: Haoxuan Ma, Brian Yueshuai He, Tomas Kaljevic, Jiaqi Ma

Abstract: The diffusion of Electric Vehicles (EVs) plays a pivotal role in mitigating greenhouse gas emissions, particularly in the U.S., where ambitious zero-emission and carbon neutrality objectives have been set. In pursuit of these goals, many states have implemented a range of incentive policies aimed at stimulating EV adoption and charging infrastructure development, especially public EV charging stat… ▽ More The diffusion of Electric Vehicles (EVs) plays a pivotal role in mitigating greenhouse gas emissions, particularly in the U.S., where ambitious zero-emission and carbon neutrality objectives have been set. In pursuit of these goals, many states have implemented a range of incentive policies aimed at stimulating EV adoption and charging infrastructure development, especially public EV charging stations (EVCS). This study examines the indirect network effect observed between EV adoption and EVCS deployment within urban landscapes. We developed a two-sided log-log regression model with historical data on EV purchases and EVCS development to quantify this effect. To test the robustness, we then conducted a case study of the EV market in Los Angeles (LA) County, which suggests that a 1% increase in EVCS correlates with a 0.35% increase in EV sales. Additionally, we forecasted the future EV market dynamics in LA County, revealing a notable disparity between current policies and the targeted 80% EV market share for private cars by 2045. To bridge this gap, we proposed a combined policy recommendation that enhances EV incentives by 60% and EVCS rebates by 66%, facilitating the achievement of future EV market objectives. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Conference preprint, 8 pages, 3 figures

arXiv:2405.12408 [pdf, other]

Flexible Active Safety Motion Control for Robotic Obstacle Avoidance: A CBF-Guided MPC Approach

Authors: Jinhao Liu, Jun Yang, Jianliang Mao, Tianqi Zhu, Qihang Xie, Yimeng Li, Xiangyu Wang, Shihua Li

Abstract: A flexible active safety motion (FASM) control approach is proposed for the avoidance of dynamic obstacles and the reference tracking in robot manipulators. The distinctive feature of the proposed method lies in its utilization of control barrier functions (CBF) to design flexible CBF-guided safety criteria (CBFSC) with dynamically optimized decay rates, thereby offering flexibility and active saf… ▽ More A flexible active safety motion (FASM) control approach is proposed for the avoidance of dynamic obstacles and the reference tracking in robot manipulators. The distinctive feature of the proposed method lies in its utilization of control barrier functions (CBF) to design flexible CBF-guided safety criteria (CBFSC) with dynamically optimized decay rates, thereby offering flexibility and active safety for robot manipulators in dynamic environments. First, discrete-time CBFs are employed to formulate the novel flexible CBFSC with dynamic decay rates for robot manipulators. Following that, the model predictive control (MPC) philosophy is applied, integrating flexible CBFSC as safety constraints into the receding-horizon optimization problem. Significantly, the decay rates of the designed CBFSC are incorporated as decision variables in the optimization problem, facilitating the dynamic enhancement of flexibility during the obstacle avoidance process. In particular, a novel cost function that integrates a penalty term is designed to dynamically adjust the safety margins of the CBFSC. Finally, experiments are conducted in various scenarios using a Universal Robots 5 (UR5) manipulator to validate the effectiveness of the proposed approach. △ Less

Submitted 20 May, 2024; originally announced May 2024.

Comments: 11 pages, 11 figures

arXiv:2405.00316 [pdf, other]

Enhance Planning with Physics-informed Safety Controller for End-to-end Autonomous Driving

Authors: Hang Zhou, Haichao Liu, Hongliang Lu, Dan Xu, Jun Ma, Yiding Ji

Abstract: Recent years have seen a growing research interest in applications of Deep Neural Networks (DNN) on autonomous vehicle technology. The trend started with perception and prediction a few years ago and it is gradually being applied to motion planning tasks. Despite the performance of networks improve over time, DNN planners inherit the natural drawbacks of Deep Learning. Learning-based planners have… ▽ More Recent years have seen a growing research interest in applications of Deep Neural Networks (DNN) on autonomous vehicle technology. The trend started with perception and prediction a few years ago and it is gradually being applied to motion planning tasks. Despite the performance of networks improve over time, DNN planners inherit the natural drawbacks of Deep Learning. Learning-based planners have limitations in achieving perfect accuracy on the training dataset and network performance can be affected by out-of-distribution problem. In this paper, we propose FusionAssurance, a novel trajectory-based end-to-end driving fusion framework which combines physics-informed control for safety assurance. By incorporating Potential Field into Model Predictive Control, FusionAssurance is capable of navigating through scenarios that are not included in the training dataset and scenarios where neural network fail to generalize. The effectiveness of the approach is demonstrated by extensive experiments under various scenarios on the CARLA benchmark. △ Less

Submitted 5 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.04879 [pdf, other]

Multi-Type Map Construction via Semantics-Aware Autonomous Exploration in Unknown Indoor Environments

Authors: Jianfang Mao, Yuheng Xie, Si Chen, Zhixiong Nan, Xiao Wang

Abstract: This paper proposes a novel semantics-aware autonomous exploration model to handle the long-standing issue: the mainstream RRT (Rapid-exploration Random Tree) based exploration models usually make the mobile robot switch frequently between different regions, leading to the excessively-repeated explorations for the same region. Our proposed semantics-aware model encourages a mobile robot to fully e… ▽ More This paper proposes a novel semantics-aware autonomous exploration model to handle the long-standing issue: the mainstream RRT (Rapid-exploration Random Tree) based exploration models usually make the mobile robot switch frequently between different regions, leading to the excessively-repeated explorations for the same region. Our proposed semantics-aware model encourages a mobile robot to fully explore the current region before moving to the next region, which is able to avoid excessively-repeated explorations and make the exploration faster. The core idea of semantics-aware autonomous exploration model is optimizing the sampling point selection mechanism and frontier point evaluation function by considering the semantic information of regions. In addition, compared with existing autonomous exploration methods that usually construct the single-type or 2-3 types of maps, our model allows to construct four kinds of maps including point cloud map, occupancy grid map, topological map, and semantic map. To test the performance of our model, we conducted experiments in three simulated environments. The experiment results demonstrate that compared to Improved RRT, our model achieved 33.0% exploration time reduction and 39.3% exploration trajectory length reduction when maintaining >98% exploration rate. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.02663 [pdf]

Ground-to-UAV sub-Terahertz channel measurement and modeling

Authors: Da Li, Peian Li, Jiabiao Zhao, Jianjian Liang, Jiacheng Liu, Guohao Liu, Yuanshuai Lei, Wenbo Liu, Jianqin Deng, Fuyong Liu, Jianjun Ma

Abstract: Unmanned Aerial Vehicle (UAV) assisted terahertz (THz) wireless communications have been expected to play a vital role in the next generation of wireless networks. UAVs can serve as either repeaters or data collectors within the communication link, thereby potentially augmenting the efficacy of communication systems. Despite their promise, the channel analysis and modeling specific to THz wireless… ▽ More Unmanned Aerial Vehicle (UAV) assisted terahertz (THz) wireless communications have been expected to play a vital role in the next generation of wireless networks. UAVs can serve as either repeaters or data collectors within the communication link, thereby potentially augmenting the efficacy of communication systems. Despite their promise, the channel analysis and modeling specific to THz wireless channels leveraging UAVs remain under explored. This work delves into a ground-to-UAV channel at 140 GHz, with a specific focus on the influence of UAV hovering behavior on channel performance. Employing experimental measurements through an unmodulated channel setup and a geometry-based stochastic model (GBSM) that integrates three-dimensional positional coordinates and beamwidth, this work evaluates the impact of UAV dynamic movements and antenna orientation on channel performance. Our findings highlight the minimal impact of UAV orientation adjustments on channel performance and underscore the diminishing necessity for precise alignment between UAVs and ground stations as beamwidth increases. △ Less

Submitted 28 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: Submitted to Optics Express

arXiv:2404.02661 [pdf]

Terahertz channel modeling based on surface sensing characteristics

Authors: Jiayuan Cui, Da Li, Jiabiao Zhao, Jiacheng Liu, Guohao Liu, Xiangkun He, Yue Su, Fei Song, Peian Li, Jianjun Ma

Abstract: The dielectric properties of environmental surfaces, including walls, floors and the ground, etc., play a crucial role in shaping the accuracy of terahertz (THz) channel modeling, thereby directly impacting the effectiveness of communication systems. Traditionally, acquiring these properties has relied on methods such as terahertz time-domain spectroscopy (THz-TDS) or vector network analyzers (VNA… ▽ More The dielectric properties of environmental surfaces, including walls, floors and the ground, etc., play a crucial role in shaping the accuracy of terahertz (THz) channel modeling, thereby directly impacting the effectiveness of communication systems. Traditionally, acquiring these properties has relied on methods such as terahertz time-domain spectroscopy (THz-TDS) or vector network analyzers (VNA), demanding rigorous sample preparation and entailing a significant expenditure of time. However, such measurements are not always feasible, particularly in novel and uncharacterized scenarios. In this work, we propose a new approach for channel modeling that leverages the inherent sensing capabilities of THz channels. By comparing the results obtained through channel sensing with that derived from THz-TDS measurements, we demonstrate the method's ability to yield dependable surface property information. The application of this approach in both a miniaturized cityscape scenario and an indoor environment has shown consistency with experimental measurements, thereby verifying its effectiveness in real-world settings. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: Submitted to Nano Communication Networks

arXiv:2404.01654 [pdf, other]

AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease

Authors: Xiang Xiang, Zihan Zhang, Jing Ma, Yao Deng

Abstract: Parkinson's Disease (PD) is the second most common neurodegenerative disorder. The existing assessment method for PD is usually the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS) to assess the severity of various types of motor symptoms and disease progression. However, manual assessment suffers from high subjectivity, lack of consistency, and high cost and low ef… ▽ More Parkinson's Disease (PD) is the second most common neurodegenerative disorder. The existing assessment method for PD is usually the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS) to assess the severity of various types of motor symptoms and disease progression. However, manual assessment suffers from high subjectivity, lack of consistency, and high cost and low efficiency of manual communication. We want to use a computer vision based solution to capture human pose images based on a camera, reconstruct and perform motion analysis using algorithms, and extract the features of the amount of motion through feature engineering. The proposed approach can be deployed on different smartphones, and the video recording and artificial intelligence analysis can be done quickly and easily through our APP. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: Technical report for AI WALKUP, an APP winning 3rd Prize of 2022 HUST GS AI Innovation and Design Competition

arXiv:2403.19943 [pdf, other]

TDANet: A Novel Temporal Denoise Convolutional Neural Network With Attention for Fault Diagnosis

Authors: Zhongzhi Li, Rong Fan, Jingqi Tu, Jinyi Ma, Jianliang Ai, Yiqun Dong

Abstract: Fault diagnosis plays a crucial role in maintaining the operational integrity of mechanical systems, preventing significant losses due to unexpected failures. As intelligent manufacturing and data-driven approaches evolve, Deep Learning (DL) has emerged as a pivotal technique in fault diagnosis research, recognized for its ability to autonomously extract complex features. However, the practical ap… ▽ More Fault diagnosis plays a crucial role in maintaining the operational integrity of mechanical systems, preventing significant losses due to unexpected failures. As intelligent manufacturing and data-driven approaches evolve, Deep Learning (DL) has emerged as a pivotal technique in fault diagnosis research, recognized for its ability to autonomously extract complex features. However, the practical application of current fault diagnosis methods is challenged by the complexity of industrial environments. This paper proposed the Temporal Denoise Convolutional Neural Network With Attention (TDANet), designed to improve fault diagnosis performance in noise environments. This model transforms one-dimensional signals into two-dimensional tensors based on their periodic properties, employing multi-scale 2D convolution kernels to extract signal information both within and across periods. This method enables effective identification of signal characteristics that vary over multiple time scales. The TDANet incorporates a Temporal Variable Denoise (TVD) module with residual connections and a Multi-head Attention Fusion (MAF) module, enhancing the saliency of information within noisy data and maintaining effective fault diagnosis performance. Evaluation on two datasets, CWRU (single sensor) and Real aircraft sensor fault (multiple sensors), demonstrates that the TDANet model significantly outperforms existing deep learning approaches in terms of diagnostic accuracy under noisy environments. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.17615 [pdf, other]

Grad-CAMO: Learning Interpretable Single-Cell Morphological Profiles from 3D Cell Painting Images

Authors: Vivek Gopalakrishnan, Jingzhe Ma, Zhiyong Xie

Abstract: Despite their black-box nature, deep learning models are extensively used in image-based drug discovery to extract feature vectors from single cells in microscopy images. To better understand how these networks perform representation learning, we employ visual explainability techniques (e.g., Grad-CAM). Our analyses reveal several mechanisms by which supervised models cheat, exploiting biologicall… ▽ More Despite their black-box nature, deep learning models are extensively used in image-based drug discovery to extract feature vectors from single cells in microscopy images. To better understand how these networks perform representation learning, we employ visual explainability techniques (e.g., Grad-CAM). Our analyses reveal several mechanisms by which supervised models cheat, exploiting biologically irrelevant pixels when extracting morphological features from images, such as noise in the background. This raises doubts regarding the fidelity of learned single-cell representations and their relevance when investigating downstream biological questions. To address this misalignment between researcher expectations and machine behavior, we introduce Grad-CAMO, a novel single-cell interpretability score for supervised feature extractors. Grad-CAMO measures the proportion of a model's attention that is concentrated on the cell of interest versus the background. This metric can be assessed per-cell or averaged across a validation set, offering a tool to audit individual features vectors or guide the improved design of deep learning architectures. Importantly, Grad-CAMO seamlessly integrates into existing workflows, requiring no dataset or model modifications, and is compatible with both 2D and 3D Cell Painting data. Additional results are available at https://github.com/eigenvivek/Grad-CAMO. △ Less

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.13225 [pdf, other]

Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation

Authors: Linshan Wu, Zhun Zhong, Jiayi Ma, Yunchao Wei, Hao Chen, Leyuan Fang, Shutao Li

Abstract: Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models by weak labels, which is receiving significant attention due to its low annotation cost. Existing approaches focus on generating pseudo labels for supervision while largely ignoring to leverage the inherent semantic correlation among different pseudo labels. We observe that pseudo-labeled pixels that are close to each… ▽ More Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models by weak labels, which is receiving significant attention due to its low annotation cost. Existing approaches focus on generating pseudo labels for supervision while largely ignoring to leverage the inherent semantic correlation among different pseudo labels. We observe that pseudo-labeled pixels that are close to each other in the feature space are more likely to share the same class, and those closer to the distribution centers tend to have higher confidence. Motivated by this, we propose to model the underlying label distributions and employ cross-label constraints to generate more accurate pseudo labels. In this paper, we develop a unified WSSS framework named Adaptive Gaussian Mixtures Model, which leverages a GMM to model the label distributions. Specifically, we calculate the feature distribution centers of pseudo-labeled pixels and build the GMM by measuring the distance between the centers and each pseudo-labeled pixel. Then, we introduce an Online Expectation-Maximization (OEM) algorithm and a novel maximization loss to optimize the GMM adaptively, aiming to learn more discriminative decision boundaries between different class-wise Gaussian mixtures. Based on the label distributions, we leverage the GMM to generate high-quality pseudo labels for more reliable supervision. Our framework is capable of solving different forms of weak labels: image-level labels, points, scribbles, blocks, and bounding-boxes. Extensive experiments on PASCAL, COCO, Cityscapes, and ADE20K datasets demonstrate that our framework can effectively provide more reliable supervision and outperform the state-of-the-art methods under all settings. Code will be available at https://github.com/Luffy03/AGMM-SASS. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.06474 [pdf, other]

Non-Intrusive Load Monitoring in Smart Grids: A Comprehensive Review

Authors: Yinyan Liu, Yi Wang, Jin Ma

Abstract: Non-Intrusive Load Monitoring (NILM) is pivotal in today's energy landscape, offering vital solutions for energy conservation and efficient management. Its growing importance in enhancing energy savings and understanding consumer behavior makes it a pivotal technology for addressing global energy challenges. This paper delivers an in-depth review of NILM, highlighting its critical role in smart ho… ▽ More Non-Intrusive Load Monitoring (NILM) is pivotal in today's energy landscape, offering vital solutions for energy conservation and efficient management. Its growing importance in enhancing energy savings and understanding consumer behavior makes it a pivotal technology for addressing global energy challenges. This paper delivers an in-depth review of NILM, highlighting its critical role in smart homes and smart grids. The significant contributions of this study are threefold: Firstly, it compiles a comprehensive global dataset table, providing a valuable tool for researchers and engineers to select appropriate datasets for their NILM studies. Secondly, it categorizes NILM approaches, simplifying the understanding of various algorithms by focusing on technologies, label data requirements, feature usage, and monitoring states. Lastly, by identifying gaps in current NILM research, this work sets a clear direction for future studies, discussing potential areas of innovation. △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: a comprehensive summary with a dataset list

arXiv:2403.04245 [pdf, other]

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

Authors: Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

Abstract: Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed to be sensitive to missing video frames, performing even worse than single-modality models. While applying the dropout technique to the video modality enhances robustness to missing frames, it simultaneously results in a performance loss when dealing with complete data input. In this paper, we investigate this contrasting p… ▽ More Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed to be sensitive to missing video frames, performing even worse than single-modality models. While applying the dropout technique to the video modality enhances robustness to missing frames, it simultaneously results in a performance loss when dealing with complete data input. In this paper, we investigate this contrasting phenomenon from the perspective of modality bias and reveal that an excessive modality bias on the audio caused by dropout is the underlying reason. Moreover, we present the Modality Bias Hypothesis (MBH) to systematically describe the relationship between modality bias and robustness against missing modality in multimodal systems. Building on these findings, we propose a novel Multimodal Distribution Approximation with Knowledge Distillation (MDA-KD) framework to reduce over-reliance on the audio modality and to maintain performance and robustness simultaneously. Finally, to address an entirely missing modality, we adopt adapters to dynamically switch decision strategies. The effectiveness of our proposed approach is evaluated and validated through a series of comprehensive experiments using the MISP2021 and MISP2022 datasets. Our code is available at https://github.com/dalision/ModalBiasAVSR △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: the paper is accepted by CVPR2024

arXiv:2402.17487 [pdf, other]

Bit Rate Matching Algorithm Optimization in JPEG-AI Verification Model

Authors: Panqi Jia, A. Burakhan Koyuncu, Jue Mao, Ze Cui, Yi Ma, Tiansheng Guo, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Jing Wang, Elena Alshina, Andre Kaup

Abstract: The research on neural network (NN) based image compression has shown superior performance compared to classical compression frameworks. Unlike the hand-engineered transforms in the classical frameworks, NN-based models learn the non-linear transforms providing more compact bit representations, and achieve faster coding speed on parallel devices over their classical counterparts. Those properties… ▽ More The research on neural network (NN) based image compression has shown superior performance compared to classical compression frameworks. Unlike the hand-engineered transforms in the classical frameworks, NN-based models learn the non-linear transforms providing more compact bit representations, and achieve faster coding speed on parallel devices over their classical counterparts. Those properties evoked the attention of both scientific and industrial communities, resulting in the standardization activity JPEG-AI. The verification model for the standardization process of JPEG-AI is already in development and has surpassed the advanced VVC intra codec. To generate reconstructed images with the desired bits per pixel and assess the BD-rate performance of both the JPEG-AI verification model and VVC intra, bit rate matching is employed. However, the current state of the JPEG-AI verification model experiences significant slowdowns during bit rate matching, resulting in suboptimal performance due to an unsuitable model. The proposed methodology offers a gradual algorithmic optimization for matching bit rates, resulting in a fourfold acceleration and over 1% improvement in BD-rate at the base operation point. At the high operation point, the acceleration increases up to sixfold. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted at (IEEE) PCS 2024; 6 pages

arXiv:2402.17470 [pdf, other]

Bit Distribution Study and Implementation of Spatial Quality Map in the JPEG-AI Standardization

Authors: Panqi Jia, Jue Mao, Esin Koyuncu, A. Burakhan Koyuncu, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Elena Alshina, Andre Kaup

Abstract: Currently, there is a high demand for neural network-based image compression codecs. These codecs employ non-linear transforms to create compact bit representations and facilitate faster coding speeds on devices compared to the hand-crafted transforms used in classical frameworks. The scientific and industrial communities are highly interested in these properties, leading to the standardization ef… ▽ More Currently, there is a high demand for neural network-based image compression codecs. These codecs employ non-linear transforms to create compact bit representations and facilitate faster coding speeds on devices compared to the hand-crafted transforms used in classical frameworks. The scientific and industrial communities are highly interested in these properties, leading to the standardization effort of JPEG-AI. The JPEG-AI verification model has been released and is currently under development for standardization. Utilizing neural networks, it can outperform the classic codec VVC intra by over 10% BD-rate operating at base operation point. Researchers attribute this success to the flexible bit distribution in the spatial domain, in contrast to VVC intra's anchor that is generated with a constant quality point. However, our study reveals that VVC intra displays a more adaptable bit distribution structure through the implementation of various block sizes. As a result of our observations, we have proposed a spatial bit allocation method to optimize the JPEG-AI verification model's bit distribution and enhance the visual quality. Furthermore, by applying the VVC bit distribution strategy, the objective performance of JPEG-AI verification mode can be further improved, resulting in a maximum gain of 0.45 dB in PSNR-Y. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: 5 pages, 3 figures, 4 tables

arXiv:2402.09451 [pdf, other]

doi 10.1364/OL.400969

Effects of Transceiver Jitter on the Performance of Optical Scattering Communication Systems

Authors: Zanqiu Shen, Jianshe Ma, Serge B. Provost, Ping Su

Abstract: In ultraviolet communications, the transceiver jitter effects have been ignored in previous studies, which can result in non-negligible performance degradation especially in vibration states or in mobile scenes. To address this issue, we model the relationship between the received power and transceiver jitter by making use of a moment-based density function approximation method. Based on this rela… ▽ More In ultraviolet communications, the transceiver jitter effects have been ignored in previous studies, which can result in non-negligible performance degradation especially in vibration states or in mobile scenes. To address this issue, we model the relationship between the received power and transceiver jitter by making use of a moment-based density function approximation method. Based on this relationship, we incorporate the transceiver jitter effects in combination with Poisson distribution. The error rate results are obtained assuming on-off key modulation with optimal threshold based detection. We validate the error rate expressions by comparing the analytical results with Monte-Carlo simulation results. The results show that the transceiver jitter effects cause performance degradation especially in smaller transceiver elevation angles or in shorter distances, which are often adopted in short-range ultraviolet communications. The results also show that larger elevation angle cases have a better performance with respect to anti-jitter and may perform better compared to smaller elevation angle situations in the case of larger standard deviation of jitter. This work studies for the first time the transceiver jitter effects in ultraviolet communications and provides guidelines for experimental system design. △ Less

Submitted 2 February, 2024; originally announced February 2024.

Comments: 5 pages, 2 figures, comments are welcome!

Journal ref: Optics Letters, 45(20), 5680-5683 (2020)

arXiv:2402.05373 [pdf, other]

Unleashing the Infinity Power of Geometry: A Novel Geometry-Aware Transformer (GOAT) for Whole Slide Histopathology Image Analysis

Authors: Mingxin Liu, Yunzan Liu, Pengbo Xu, Jiquan Ma

Abstract: The histopathology analysis is of great significance for the diagnosis and prognosis of cancers, however, it has great challenges due to the enormous heterogeneity of gigapixel whole slide images (WSIs) and the intricate representation of pathological features. However, recent methods have not adequately exploited geometrical representation in WSIs which is significant in disease diagnosis. Theref… ▽ More The histopathology analysis is of great significance for the diagnosis and prognosis of cancers, however, it has great challenges due to the enormous heterogeneity of gigapixel whole slide images (WSIs) and the intricate representation of pathological features. However, recent methods have not adequately exploited geometrical representation in WSIs which is significant in disease diagnosis. Therefore, we proposed a novel weakly-supervised framework, Geometry-Aware Transformer (GOAT), in which we urge the model to pay attention to the geometric characteristics within the tumor microenvironment which often serve as potent indicators. In addition, a context-aware attention mechanism is designed to extract and enhance the morphological features within WSIs. △ Less

Submitted 7 February, 2024; originally announced February 2024.

Comments: 5 pages, 3 figures. Accepted by 21st IEEE International Symposium on Biomedical Imaging (ISBI 2024)

arXiv:2401.16714 [pdf]

A Point Cloud Enhancement Method for 4D mmWave Radar Imagery

Authors: Qingmian Wan, Hongli Peng, Xing Liao, Kuayue Liu, Junfa Mao

Abstract: A point cloud enhancement method for 4D mmWave radar imagery is proposed in this paper. Based on the patch antenna and MIMO array theories, the MIMO array with small redundancy and high SNR is designed to provide the probability of high angular resolution and detection rate. The antenna array is deployed using a ladder shape in vertical direction to decrease the redundancy and improve the resoluti… ▽ More A point cloud enhancement method for 4D mmWave radar imagery is proposed in this paper. Based on the patch antenna and MIMO array theories, the MIMO array with small redundancy and high SNR is designed to provide the probability of high angular resolution and detection rate. The antenna array is deployed using a ladder shape in vertical direction to decrease the redundancy and improve the resolution in horizontal direction with the constrains of physical factors. Considering the complicated environment of the real world with non-uniform distributed clutters, the dynamic detection method is used to solve the weak target sensing problem. The window size of CFAR detector is assumed variant to be determined using optimization method, making it adaptive to different environments especially when weak targets exist. The angular resolution increase using FT-based DOA method and the designed antenna array is described, which provides the basis of accurate detection and dense point cloud. To verify the performance of the proposed method, experiments of simulations and practical measurements are carried out, whose results show that the accuracy and the point cloud density are improved with comparison of the original manufacturer mmWave radar of TI AWR2243. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.09032 [pdf, other]

Improved Consensus ADMM for Cooperative Motion Planning of Large-Scale Connected Autonomous Vehicles with Limited Communication

Authors: Haichao Liu, Zhenmin Huang, Zicheng Zhu, Yulin Li, Shaojie Shen, Jun Ma

Abstract: This paper investigates a cooperative motion planning problem for large-scale connected autonomous vehicles (CAVs) under limited communications, which addresses the challenges of high communication and computing resource requirements. Our proposed methodology incorporates a parallel optimization algorithm with improved consensus ADMM considering a more realistic locally connected topology network,… ▽ More This paper investigates a cooperative motion planning problem for large-scale connected autonomous vehicles (CAVs) under limited communications, which addresses the challenges of high communication and computing resource requirements. Our proposed methodology incorporates a parallel optimization algorithm with improved consensus ADMM considering a more realistic locally connected topology network, and time complexity of O(N) is achieved by exploiting the sparsity in the dual update process. To further enhance the computational efficiency, we employ a lightweight evolution strategy for the dynamic connectivity graph of CAVs, and each sub-problem split from the consensus ADMM only requires managing a small group of CAVs. The proposed method implemented with the receding horizon scheme is validated thoroughly, and comparisons with existing numerical solvers and approaches demonstrate the efficiency of our proposed algorithm. Also, simulations on large-scale cooperative driving tasks involving 80 vehicles are performed in the high-fidelity CARLA simulator, which highlights the remarkable computational efficiency, scalability, and effectiveness of our proposed development. Demonstration videos are available at https://henryhcliu.github.io/icadmm_cmp_carla. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: 15 pages, 10 figures

arXiv:2401.08851 [pdf]

Using i-vectors for subject-independent cross-session EEG transfer learning

Authors: Jonathan Lasko, Jeff Ma, Mike Nicoletti, Jonathan Sussman-Fort, Sooyoung Jeong, William Hartmann

Abstract: Cognitive load classification is the task of automatically determining an individual's utilization of working memory resources during performance of a task based on physiologic measures such as electroencephalography (EEG). In this paper, we follow a cross-disciplinary approach, where tools and methodologies from speech processing are used to tackle this problem. The corpus we use was released pub… ▽ More Cognitive load classification is the task of automatically determining an individual's utilization of working memory resources during performance of a task based on physiologic measures such as electroencephalography (EEG). In this paper, we follow a cross-disciplinary approach, where tools and methodologies from speech processing are used to tackle this problem. The corpus we use was released publicly in 2021 as part of the first passive brain-computer interface competition on cross-session workload estimation. We present our approach which used i-vector-based neural network classifiers to accomplish inter-subject cross-session EEG transfer learning, achieving 18% relative improvement over equivalent subject-dependent models. We also report experiments showing how our subject-independent models perform competitively on held-out subjects and improve with additional subject data, suggesting that subject-dependent training is not required for effective cognitive load determination. △ Less

Submitted 16 January, 2024; originally announced January 2024.

Comments: 11 pages

arXiv:2401.04968 [pdf, other]

A Universal Cooperative Decision-Making Framework for Connected Autonomous Vehicles with Generic Road Topologies

Authors: Zhenmin Huang, Shaojie Shen, Jun Ma

Abstract: Cooperative decision-making of Connected Autonomous Vehicles (CAVs) presents a longstanding challenge due to its inherent nonlinearity, non-convexity, and discrete characteristics, compounded by the diverse road topologies encountered in real-world traffic scenarios. The majority of current methodologies are only applicable to a single and specific scenario, predicated on scenario-specific assumpt… ▽ More Cooperative decision-making of Connected Autonomous Vehicles (CAVs) presents a longstanding challenge due to its inherent nonlinearity, non-convexity, and discrete characteristics, compounded by the diverse road topologies encountered in real-world traffic scenarios. The majority of current methodologies are only applicable to a single and specific scenario, predicated on scenario-specific assumptions. Consequently, their application in real-world environments is restricted by the innumerable nature of traffic scenarios. In this study, we propose a unified optimization approach that exhibits the potential to address cooperative decision-making problems related to traffic scenarios with generic road topologies. This development is grounded in the premise that the topologies of various traffic scenarios can be universally represented as Directed Acyclic Graphs (DAGs). Particularly, the reference paths and time profiles for all involved CAVs are determined in a fully cooperative manner, taking into account factors such as velocities, accelerations, conflict resolutions, and overall traffic efficiency. The cooperative decision-making of CAVs is approximated as a mixed-integer linear programming (MILP) problem building on the DAGs of road topologies. This favorably facilitates the use of standard numerical solvers and the global optimality can be attained through the optimization. Case studies corresponding to different multi-lane traffic scenarios featuring diverse topologies are scheduled as the test itineraries, and the efficacy of our proposed methodology is corroborated. △ Less

Submitted 10 January, 2024; originally announced January 2024.

arXiv:2401.04722 [pdf, other]

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

Authors: Jun Ma, Feifei Li, Bo Wang

Abstract: Convolutional Neural Networks (CNNs) and Transformers have been the most popular architectures for biomedical image segmentation, but both of them have limited ability to handle long-range dependencies because of inherent locality or computational complexity. To address this challenge, we introduce U-Mamba, a general-purpose network for biomedical image segmentation. Inspired by the State Space Se… ▽ More Convolutional Neural Networks (CNNs) and Transformers have been the most popular architectures for biomedical image segmentation, but both of them have limited ability to handle long-range dependencies because of inherent locality or computational complexity. To address this challenge, we introduce U-Mamba, a general-purpose network for biomedical image segmentation. Inspired by the State Space Sequence Models (SSMs), a new family of deep sequence models known for their strong capability in handling long sequences, we design a hybrid CNN-SSM block that integrates the local feature extraction power of convolutional layers with the abilities of SSMs for capturing the long-range dependency. Moreover, U-Mamba enjoys a self-configuring mechanism, allowing it to automatically adapt to various datasets without manual intervention. We conduct extensive experiments on four diverse tasks, including the 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images. The results reveal that U-Mamba outperforms state-of-the-art CNN-based and Transformer-based segmentation networks across all tasks. This opens new avenues for efficient long-range dependency modeling in biomedical image analysis. The code, models, and data are publicly available at https://wanglab.ai/u-mamba.html. △ Less

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.02673 [pdf, other]

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

Authors: Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang

Abstract: Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen… ▽ More Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen, Spell, Attend (LAS) speech recognition system, which extends the end-to-end speech recognition system further to include speech enhancement. Such framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) has been adopted to form the neural beamforming. Several pooling strategies to combine look directions are then compared in order to find the optimal approach. Moreover, information of the source direction is also integrated in the beamforming to explore the usefulness of source direction as a prior, which is usually available especially in multi-modality scenario. Experiments on different microphone array geometry are conducted to evaluate the robustness against spacing variance of microphone array. Large in-house databases are used to evaluate the effectiveness of the proposed framework and the proposed method achieve 19.26\% improvement when compared with a strong baseline. △ Less

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2312.16247 [pdf, other]

Toward Accurate and Temporally Consistent Video Restoration from Raw Data

Authors: Shi Guo, Jianqi Ma, Xi Yang, Zhengqiang Zhang, Lei Zhang

Abstract: Denoising and demosaicking are two fundamental steps in reconstructing a clean full-color video from raw data, while performing video denoising and demosaicking jointly, namely VJDD, could lead to better video restoration performance than performing them separately. In addition to restoration accuracy, another key challenge to VJDD lies in the temporal consistency of consecutive frames. This issue… ▽ More Denoising and demosaicking are two fundamental steps in reconstructing a clean full-color video from raw data, while performing video denoising and demosaicking jointly, namely VJDD, could lead to better video restoration performance than performing them separately. In addition to restoration accuracy, another key challenge to VJDD lies in the temporal consistency of consecutive frames. This issue exacerbates when perceptual regularization terms are introduced to enhance video perceptual quality. To address these challenges, we present a new VJDD framework by consistent and accurate latent space propagation, which leverages the estimation of previous frames as prior knowledge to ensure consistent recovery of the current frame. A data temporal consistency (DTC) loss and a relational perception consistency (RPC) loss are accordingly designed. Compared with the commonly used flow-based losses, the proposed losses can circumvent the error accumulation problem caused by inaccurate flow estimation and effectively handle intensity changes in videos, improving much the temporal consistency of output videos while preserving texture details. Extensive experiments demonstrate the leading VJDD performance of our method in term of restoration accuracy, perceptual quality and temporal consistency. Codes and dataset are available at \url{https://github.com/GuoShi28/VJDD}. △ Less

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.15389 [pdf, other]

TJDR: A High-Quality Diabetic Retinopathy Pixel-Level Annotation Dataset

Authors: Jingxin Mao, Xiaoyu Ma, Yanlong Bi, Rongqing Zhang

Abstract: Diabetic retinopathy (DR), as a debilitating ocular complication, necessitates prompt intervention and treatment. Despite the effectiveness of artificial intelligence in aiding DR grading, the progression of research toward enhancing the interpretability of DR grading through precise lesion segmentation faces a severe hindrance due to the scarcity of pixel-level annotated DR datasets. To mitigate… ▽ More Diabetic retinopathy (DR), as a debilitating ocular complication, necessitates prompt intervention and treatment. Despite the effectiveness of artificial intelligence in aiding DR grading, the progression of research toward enhancing the interpretability of DR grading through precise lesion segmentation faces a severe hindrance due to the scarcity of pixel-level annotated DR datasets. To mitigate this, this paper presents and delineates TJDR, a high-quality DR pixel-level annotation dataset, which comprises 561 color fundus images sourced from the Tongji Hospital Affiliated to Tongji University. These images are captured using diverse fundus cameras including Topcon's TRC-50DX and Zeiss CLARUS 500, exhibit high resolution. For the sake of adhering strictly to principles of data privacy, the private information of images is meticulously removed while ensuring clarity in displaying anatomical structures such as the optic disc, retinal blood vessels, and macular fovea. The DR lesions are annotated using the Labelme tool, encompassing four prevalent DR lesions: Hard Exudates (EX), Hemorrhages (HE), Microaneurysms (MA), and Soft Exudates (SE), labeled respectively from 1 to 4, with 0 representing the background. Significantly, experienced ophthalmologists conduct the annotation work with rigorous quality assurance, culminating in the construction of this dataset. This dataset has been partitioned into training and testing sets and publicly released to contribute to advancements in the DR lesion segmentation research community. △ Less

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.14610 [pdf, ps, other]

doi 10.1109/LWC.2021.3095066

LMMSE-based SIMO Receiver for Ultraviolet Scattering Communication with Nonlinear Conversion

Authors: Zanqiu Shen, Jianshe Ma, Ping Su

Abstract: Linear minimum mean square error (LMMSE) receivers are often applied in practical communication scenarios for single-input-multiple-output (SIMO) systems owing to their low computational complexity and competitive performance. However, their performance is only the best among all the linear receivers, as they minimize the bit mean square error (MSE) alone in linear space. To overcome this limitati… ▽ More Linear minimum mean square error (LMMSE) receivers are often applied in practical communication scenarios for single-input-multiple-output (SIMO) systems owing to their low computational complexity and competitive performance. However, their performance is only the best among all the linear receivers, as they minimize the bit mean square error (MSE) alone in linear space. To overcome this limitation, in this study, we propose an LMMSE receiver based on the measurements augmented by their nonlinear conversion for a photon-counting receiver, a photomultiplier tube, and an avalanche photodetector. The performance of the proposed LMMSE receiver is studied for different nonlinear conversions, numbers of receivers, and receiver types. The simulation results indicate that the Monte Carlo results are consistent with the analytical results and that the proposed LMMSE receiver outperforms the conventional one in terms of bit MSE and bit error rate. Accordingly, it can be concluded that to achieve a desired bit MSE, the proposed LMMSE-based nonlinear receiver not only reduces the need to increase the number of receivers but also reduces the bandwidth requirements. △ Less

Submitted 22 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures, comments are welcome!

Journal ref: IEEE Wireless Communications Letters, 10(10), 2140-2144 (2021)

arXiv:2312.11868 [pdf, other]

Dynamic Loco-manipulation on HECTOR: Humanoid for Enhanced ConTrol and Open-source Research

Authors: Junheng Li, Junchao Ma, Omar Kolt, Manas Shah, Quan Nguyen

Abstract: Despite their remarkable advancement in locomotion and manipulation, humanoid robots remain challenged by a lack of synchronized loco-manipulation control, hindering their full dynamic potential. In this work, we introduce a versatile and effective approach to controlling and generalizing dynamic locomotion and loco-manipulation on humanoid robots via a Force-and-moment-based Model Predictive Cont… ▽ More Despite their remarkable advancement in locomotion and manipulation, humanoid robots remain challenged by a lack of synchronized loco-manipulation control, hindering their full dynamic potential. In this work, we introduce a versatile and effective approach to controlling and generalizing dynamic locomotion and loco-manipulation on humanoid robots via a Force-and-moment-based Model Predictive Control (MPC). Specifically, we proposed a simplified rigid body dynamics (SRBD) model to take into account both humanoid and object dynamics for humanoid loco-manipulation. This linear dynamics model allows us to directly solve for ground reaction forces and moments via an MPC problem to achieve highly dynamic real-time control. Our proposed framework is highly versatile and generalizable. We introduce HECTOR (Humanoid for Enhanced ConTrol and Open-source Research) platform to demonstrate its effectiveness in hardware experiments. With the proposed framework, HECTOR can maintain exceptional balance during double-leg stance mode, even when subjected to external force disturbances to the body or foot location. In addition, it can execute 3-D dynamic walking on a variety of uneven terrains, including wet grassy surfaces, slopes, randomly placed wood slats, and stacked wood slats up to 6 cm high with the speed of 0.6 m/s. In addition, we have demonstrated dynamic humanoid loco-manipulation over uneven terrain, carrying 2.5 kg load. HECTOR simulations, along with the proposed control framework, are made available as an open-source project. (https://github.com/DRCL-USC/Hector_Simulation). △ Less

Submitted 21 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: 14 pages, 13 figures

arXiv:2311.16531 [pdf]

Channel Modeling for Terahertz Communications in Rain

Authors: Peian Li, Wenbo Liu, Jiacheng Liu, Da Li, Guohao Liu, Yuanshuai Lei, Jiabiao Zhao, Xiaopeng Wang, Houjun Sun, Jianjun Ma, John F. Federici

Abstract: Terahertz (THz) communication channels, integral to outdoor applications, are critically influenced by natural factors like rainfall. Our research focused on the nuanced effects of rain on these channels, employing an advanced rainfall emulation system. By analyzing key parameters such as rain rate, altitude based variations in rainfall, and diverse raindrop sizes, we identified the paramount sign… ▽ More Terahertz (THz) communication channels, integral to outdoor applications, are critically influenced by natural factors like rainfall. Our research focused on the nuanced effects of rain on these channels, employing an advanced rainfall emulation system. By analyzing key parameters such as rain rate, altitude based variations in rainfall, and diverse raindrop sizes, we identified the paramount significance of the number of raindrops in the THz channel, particularly in scenarios with constant rain rates but varying drop sizes. Central to our findings is a novel model grounded in Mie scattering theory, which adeptly incorporates the variability of raindrop size distributions at different altitudes. This model has displayed strong congruence with our experimental results. In essence, our study underscores the inadequacy of solely depending on a fixed ground-based rain rate and emphasizes the imperative of calibrating distribution metrics to cater to specific environmental and operational contexts. △ Less

Submitted 28 November, 2023; originally announced November 2023.

Comments: submitted to IEEE Transactions on Antennas and Propagation

arXiv:2311.04769 [pdf]

An attention-based deep learning network for predicting Platinum resistance in ovarian cancer

Authors: Haoming Zhuang, Beibei Li, Jingtong Ma, Patrice Monkam, Shouliang Qi, Wei Qian, Dianning He

Abstract: Background: Ovarian cancer is among the three most frequent gynecologic cancers globally. High-grade serous ovarian cancer (HGSOC) is the most common and aggressive histological type. Guided treatment for HGSOC typically involves platinum-based combination chemotherapy, necessitating an assessment of whether the patient is platinum-resistant. The purpose of this study is to propose a deep learning… ▽ More Background: Ovarian cancer is among the three most frequent gynecologic cancers globally. High-grade serous ovarian cancer (HGSOC) is the most common and aggressive histological type. Guided treatment for HGSOC typically involves platinum-based combination chemotherapy, necessitating an assessment of whether the patient is platinum-resistant. The purpose of this study is to propose a deep learning-based method to determine whether a patient is platinum-resistant using multimodal positron emission tomography/computed tomography (PET/CT) images. Methods: 289 patients with HGSOC were included in this study. An end-to-end SE-SPP-DenseNet model was built by adding Squeeze-Excitation Block (SE Block) and Spatial Pyramid Pooling Layer (SPPLayer) to Dense Convolutional Network (DenseNet). Multimodal data from PET/CT images of the regions of interest (ROI) were used to predict platinum resistance in patients. Results: Through five-fold cross-validation, SE-SPP-DenseNet achieved a high accuracy rate and an area under the curve (AUC) in predicting platinum resistance in patients, which were 92.6% and 0.93, respectively. The importance of incorporating SE Block and SPPLayer into the deep learning model, and considering multimodal data was substantiated by carrying out ablation studies and experiments with single modality data. Conclusions: The obtained classification results indicate that our proposed deep learning framework performs better in predicting platinum resistance in patients, which can help gynecologists make better treatment decisions. Keywords: PET/CT, CNN, SE Block, SPP Layer, Platinum resistance, Ovarian cancer △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.17974

FaultSeg Swin-UNETR: Transformer-Based Self-Supervised Pretraining Model for Fault Recognition

Authors: Zeren Zhang, Ran Chen, Jinwen Ma

Abstract: This paper introduces an approach to enhance seismic fault recognition through self-supervised pretraining. Seismic fault interpretation holds great significance in the fields of geophysics and geology. However, conventional methods for seismic fault recognition encounter various issues, including dependence on data quality and quantity, as well as susceptibility to interpreter subjectivity. Curre… ▽ More This paper introduces an approach to enhance seismic fault recognition through self-supervised pretraining. Seismic fault interpretation holds great significance in the fields of geophysics and geology. However, conventional methods for seismic fault recognition encounter various issues, including dependence on data quality and quantity, as well as susceptibility to interpreter subjectivity. Currently, automated fault recognition methods proposed based on small synthetic datasets experience performance degradation when applied to actual seismic data. To address these challenges, we have introduced the concept of self-supervised learning, utilizing a substantial amount of relatively easily obtainable unlabeled seismic data for pretraining. Specifically, we have employed the Swin Transformer model as the core network and employed the SimMIM pretraining task to capture unique features related to discontinuities in seismic data. During the fine-tuning phase, inspired by edge detection techniques, we have also refined the structure of the Swin-UNETR model, enabling multiscale decoding and fusion for more effective fault detection. Experimental results demonstrate that our proposed method attains state-of-the-art performance on the Thebe dataset, as measured by the OIS and ODS metrics. △ Less

Submitted 8 January, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: The logical flow and background of the article need significant revisions

arXiv:2310.12570 [pdf, other]

DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation

Authors: Guanqun Sun, Yizhi Pan, Weikun Kong, Zichang Xu, Jianhua Ma, Teeradaj Racharak, Le-Minh Nguyen, Junyi Xin

Abstract: Accurate medical image segmentation is critical for disease quantification and treatment evaluation. While traditional Unet architectures and their transformer-integrated variants excel in automated segmentation tasks. However, they lack the ability to harness the intrinsic position and channel features of image. Existing models also struggle with parameter efficiency and computational complexity,… ▽ More Accurate medical image segmentation is critical for disease quantification and treatment evaluation. While traditional Unet architectures and their transformer-integrated variants excel in automated segmentation tasks. However, they lack the ability to harness the intrinsic position and channel features of image. Existing models also struggle with parameter efficiency and computational complexity, often due to the extensive use of Transformers. To address these issues, this study proposes a novel deep medical image segmentation framework, called DA-TransUNet, aiming to integrate the Transformer and dual attention block(DA-Block) into the traditional U-shaped architecture. Unlike earlier transformer-based U-net models, DA-TransUNet utilizes Transformers and DA-Block to integrate not only global and local features, but also image-specific positional and channel features, improving the performance of medical image segmentation. By incorporating a DA-Block at the embedding layer and within each skip connection layer, we substantially enhance feature extraction capabilities and improve the efficiency of the encoder-decoder structure. DA-TransUNet demonstrates superior performance in medical image segmentation tasks, consistently outperforming state-of-the-art techniques across multiple datasets. In summary, DA-TransUNet offers a significant advancement in medical image segmentation, providing an effective and powerful alternative to existing techniques. Our architecture stands out for its ability to improve segmentation accuracy, thereby advancing the field of automated medical image diagnostics. The codes and parameters of our model will be publicly available at https://github.com/SUN-1024/DA-TransUnet. △ Less

Submitted 14 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.05547 [pdf, other]

Geometry-Aware Safety-Critical Local Reactive Controller for Robot Navigation in Unknown and Cluttered Environments

Authors: Yulin Li, Xindong Tang, Kai Chen, Chunxin Zheng, Haichao Liu, Jun Ma

Abstract: This work proposes a safety-critical local reactive controller that enables the robot to navigate in unknown and cluttered environments. In particular, the trajectory tracking task is formulated as a constrained polynomial optimization problem. Then, safety constraints are imposed on the control variables invoking the notion of polynomial positivity certificates in conjunction with their Sum-of-Sq… ▽ More This work proposes a safety-critical local reactive controller that enables the robot to navigate in unknown and cluttered environments. In particular, the trajectory tracking task is formulated as a constrained polynomial optimization problem. Then, safety constraints are imposed on the control variables invoking the notion of polynomial positivity certificates in conjunction with their Sum-of-Squares (SOS) approximation, thereby confining the robot motion inside the locally extracted convex free region. It is noteworthy that, in the process of devising the proposed safety constraints, the geometry of the robot can be approximated using any shape that can be characterized with a set of polynomial functions. The optimization problem is further convexified into a semidefinite program (SDP) leveraging truncated multi-sequences (tms) and moment relaxation, which favorably facilitates the effective use of off-the-shelf conic programming solvers, such that real-time performance is attainable. Various robot navigation tasks are investigated to demonstrate the effectiveness of the proposed approach in terms of safety and tracking performance. △ Less

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.10758 [pdf, other]

RIS-Assisted Over-the-Air Adaptive Federated Learning with Noisy Downlink

Authors: Jiayu Mao, Aylin Yener

Abstract: Over-the-air federated learning (OTA-FL) exploits the inherent superposition property of wireless channels to integrate the communication and model aggregation. Though a naturally promising framework for wireless federated learning, it requires care to mitigate physical layer impairments. In this work, we consider a heterogeneous edge-intelligent network with different edge device resources and no… ▽ More Over-the-air federated learning (OTA-FL) exploits the inherent superposition property of wireless channels to integrate the communication and model aggregation. Though a naturally promising framework for wireless federated learning, it requires care to mitigate physical layer impairments. In this work, we consider a heterogeneous edge-intelligent network with different edge device resources and non-i.i.d. user dataset distributions, under a general non-convex learning objective. We leverage the Reconfigurable Intelligent Surface (RIS) technology to augment OTA-FL system over simultaneous time varying uplink and downlink noisy communication channels under imperfect CSI scenario. We propose a cross-layer algorithm that jointly optimizes RIS configuration, communication and computation resources in this general realistic setting. Specifically, we design dynamic local update steps in conjunction with RIS phase shifts and transmission power to boost learning performance. We present a convergence analysis of the proposed algorithm, and show that it outperforms the existing unified approach under heterogeneous system and imperfect CSI in numerical results. △ Less

Submitted 19 September, 2023; originally announced September 2023.

Comments: Appeared in 2023 IEEE ICC Workshop on Edge Learning over 5G Mobile Networks and Beyond

arXiv:2309.09883 [pdf, other]

ROAR-Fed: RIS-Assisted Over-the-Air Adaptive Resource Allocation for Federated Learning

Authors: Jiayu Mao, Aylin Yener

Abstract: Over-the-air federated learning (OTA-FL) integrates communication and model aggregation by exploiting the innate superposition property of wireless channels. The approach renders bandwidth efficient learning, but requires care in handling the wireless physical layer impairments. In this paper, federated edge learning is considered for a network that is heterogeneous with respect to client (edge no… ▽ More Over-the-air federated learning (OTA-FL) integrates communication and model aggregation by exploiting the innate superposition property of wireless channels. The approach renders bandwidth efficient learning, but requires care in handling the wireless physical layer impairments. In this paper, federated edge learning is considered for a network that is heterogeneous with respect to client (edge node) data set distributions and individual client resources, under a general non-convex learning objective. We augment the wireless OTA-FL system with a Reconfigurable Intelligent Surface (RIS) to enable a propagation environment with improved learning performance in a realistic time varying physical layer. Our approach is a cross-layer perspective that jointly optimizes communication, computation and learning resources, in this general heterogeneous setting. We adapt the local computation steps and transmission power of the clients in conjunction with the RIS phase shifts. The resulting joint communication and learning algorithm, RIS-assisted Over-the-air Adaptive Resource Allocation for Federated learning (ROAR-Fed) is shown to be convergent in this general setting. Numerical results demonstrate the effectiveness of ROAR-Fed under heterogeneous (non i.i.d.) data and imperfect CSI, indicating the advantage of RIS assisted learning in this general set up. △ Less

Submitted 18 September, 2023; originally announced September 2023.

Comments: Appeared in 2023 IEEE International Conference on Communications (ICC): Wireless Communications Symposium

arXiv:2309.07925 [pdf, other]

doi 10.1145/3581783.3612859

Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

Authors: Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng

Abstract: In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for e… ▽ More In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for emotion classification and valence regression in the decoding stage. A multi-task loss based on uncertainty is also designed to optimize the whole process. Finally, by combining three different structures on the posterior probability level, we obtain the final predictions of discrete and dimensional emotions. When tested on the dataset of multimodal emotion recognition challenge (MER 2023), the proposed framework yields consistent improvements in both emotion classification and valence regression. Our final system achieves state-of-the-art performance and ranks third on the leaderboard on MER-MULTI sub-challenge. △ Less

Submitted 10 September, 2023; originally announced September 2023.

Comments: 5 pages, 4 figures

Journal ref: The 31st ACM International Conference on Multimedia (MM'23), 2023

arXiv:2309.07185 [pdf]

A Health Monitoring System Based on Flexible Triboelectric Sensors for Intelligence Medical Internet of Things and its Applications in Virtual Reality

Authors: Junqi Mao, Puen Zhou, Xiaoyao Wang, Hongbo Yao, Liuyang Liang, Yiqiao Zhao, Jiawei Zhang, Dayan Ban, Haiwu Zheng

Abstract: The Internet of Medical Things (IoMT) is a platform that combines Internet of Things (IoT) technology with medical applications, enabling the realization of precision medicine, intelligent healthcare, and telemedicine in the era of digitalization and intelligence. However, the IoMT faces various challenges, including sustainable power supply, human adaptability of sensors and the intelligence of s… ▽ More The Internet of Medical Things (IoMT) is a platform that combines Internet of Things (IoT) technology with medical applications, enabling the realization of precision medicine, intelligent healthcare, and telemedicine in the era of digitalization and intelligence. However, the IoMT faces various challenges, including sustainable power supply, human adaptability of sensors and the intelligence of sensors. In this study, we designed a robust and intelligent IoMT system through the synergistic integration of flexible wearable triboelectric sensors and deep learning-assisted data analytics. We embedded four triboelectric sensors into a wristband to detect and analyze limb movements in patients suffering from Parkinson's Disease (PD). By further integrating deep learning-assisted data analytics, we actualized an intelligent healthcare monitoring system for the surveillance and interaction of PD patients, which includes location/trajectory tracking, heart monitoring and identity recognition. This innovative approach enabled us to accurately capture and scrutinize the subtle movements and fine motor of PD patients, thus providing insightful feedback and comprehensive assessment of the patients conditions. This monitoring system is cost-effective, easily fabricated, highly sensitive, and intelligent, consequently underscores the immense potential of human body sensing technology in a Health 4.0 society. △ Less

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.05298 [pdf, other]

Real-Time Parallel Trajectory Optimization with Spatiotemporal Safety Constraints for Autonomous Driving in Congested Traffic

Authors: Lei Zheng, Rui Yang, Zengqi Peng, Haichao Liu, Michael Yu Wang, Jun Ma

Abstract: Multi-modal behaviors exhibited by surrounding vehicles (SVs) can typically lead to traffic congestion and reduce the travel efficiency of autonomous vehicles (AVs) in dense traffic. This paper proposes a real-time parallel trajectory optimization method for the AV to achieve high travel efficiency in dynamic and congested environments. A spatiotemporal safety module is developed to facilitate the… ▽ More Multi-modal behaviors exhibited by surrounding vehicles (SVs) can typically lead to traffic congestion and reduce the travel efficiency of autonomous vehicles (AVs) in dense traffic. This paper proposes a real-time parallel trajectory optimization method for the AV to achieve high travel efficiency in dynamic and congested environments. A spatiotemporal safety module is developed to facilitate the safe interaction between the AV and SVs in the presence of trajectory prediction errors resulting from the multi-modal behaviors of the SVs. By leveraging multiple shooting and constraint transcription, we transform the trajectory optimization problem into a nonlinear programming problem, which allows for the use of optimization solvers and parallel computing techniques to generate multiple feasible trajectories in parallel. Subsequently, these spatiotemporal trajectories are fed into a multi-objective evaluation module considering both safety and efficiency objectives, such that the optimal feasible trajectory corresponding to the optimal target lane can be selected. The proposed framework is validated through simulations in a dense and congested driving scenario with multiple uncertain SVs. The results demonstrate that our method enables the AV to safely navigate through a dense and congested traffic scenario while achieving high travel efficiency and task accuracy in real time. △ Less

Submitted 11 September, 2023; originally announced September 2023.

Comments: 8 pages, 7 figures, accepted for publication in the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

arXiv:2309.02574 [pdf, other]

An Improved Upper Bound on the Rate-Distortion Function of Images

Authors: Zhihao Duan, Jack Ma, Jiangpeng He, Fengqing Zhu

Abstract: Recent work has shown that Variational Autoencoders (VAEs) can be used to upper-bound the information rate-distortion (R-D) function of images, i.e., the fundamental limit of lossy image compression. In this paper, we report an improved upper bound on the R-D function of images implemented by (1) introducing a new VAE model architecture, (2) applying variable-rate compression techniques, and (3) p… ▽ More Recent work has shown that Variational Autoencoders (VAEs) can be used to upper-bound the information rate-distortion (R-D) function of images, i.e., the fundamental limit of lossy image compression. In this paper, we report an improved upper bound on the R-D function of images implemented by (1) introducing a new VAE model architecture, (2) applying variable-rate compression techniques, and (3) proposing a novel \ourfunction{} to stabilize training. We demonstrate that at least 30\% BD-rate reduction w.r.t. the intra prediction mode in VVC codec is achievable, suggesting that there is still great potential for improving lossy image compression. Code is made publicly available at https://github.com/duanzhiihao/lossy-vae. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Comments: Conference paper at ICIP 2023. The first two authors share equal contributions

arXiv:2308.13789 [pdf]

Sensiverse: A dataset for ISAC study

Authors: Jiajin Luo, Baojian Zhou, Yang Yu, Ping Zhang, Xiaohui Peng, Jianglei Ma, Peiying Zhu, Jianmin Lu, Wen Tong

Abstract: In order to address the lack of applicable channel models for ISAC research and evaluation, we release Sensiverse, a dataset that can be used for ISAC research. In this paper, we present the method of generating Sensiverse, including the acquisition and formatting of the 3D scene models, the generation of the channel data and associations with Tx/Rx deployment. The file structure and usage of the… ▽ More In order to address the lack of applicable channel models for ISAC research and evaluation, we release Sensiverse, a dataset that can be used for ISAC research. In this paper, we present the method of generating Sensiverse, including the acquisition and formatting of the 3D scene models, the generation of the channel data and associations with Tx/Rx deployment. The file structure and usage of the dataset are also described, and finally the use of the dataset is illustrated with examples through the evaluation of use cases such as 3D environment reconstruction and moving targets. △ Less

Submitted 26 August, 2023; originally announced August 2023.

arXiv:2308.13164 [pdf, other]

Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model

Authors: Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, Jiayi Ma

Abstract: In this paper, we rethink the low-light image enhancement task and propose a physically explainable and generative diffusion model for low-light image enhancement, termed as Diff-Retinex. We aim to integrate the advantages of the physical model and the generative network. Furthermore, we hope to supplement and even deduce the information missing in the low-light image through the generative networ… ▽ More In this paper, we rethink the low-light image enhancement task and propose a physically explainable and generative diffusion model for low-light image enhancement, termed as Diff-Retinex. We aim to integrate the advantages of the physical model and the generative network. Furthermore, we hope to supplement and even deduce the information missing in the low-light image through the generative network. Therefore, Diff-Retinex formulates the low-light image enhancement problem into Retinex decomposition and conditional image generation. In the Retinex decomposition, we integrate the superiority of attention in Transformer and meticulously design a Retinex Transformer decomposition network (TDN) to decompose the image into illumination and reflectance maps. Then, we design multi-path generative diffusion networks to reconstruct the normal-light Retinex probability distribution and solve the various degradations in these components respectively, including dark illumination, noise, color deviation, loss of scene contents, etc. Owing to generative diffusion model, Diff-Retinex puts the restoration of low-light subtle detail into practice. Extensive experiments conducted on real-world low-light datasets qualitatively and quantitatively demonstrate the effectiveness, superiority, and generalization of the proposed method. △ Less

Submitted 25 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV 2023

arXiv:2308.11940 [pdf, other]

Audio Generation with Multiple Conditional Diffusion Model

Authors: Zhifang Guo, Jianguo Mao, Rui Tao, Long Yan, Kazushige Ouchi, Hong Liu, Xiangdong Wang

Abstract: Text-based audio generation models have limitations as they cannot encompass all the information in audio, leading to restricted controllability when relying solely on text. To address this issue, we propose a novel model that enhances the controllability of existing pre-trained text-to-audio models by incorporating additional conditions including content (timestamp) and style (pitch contour and e… ▽ More Text-based audio generation models have limitations as they cannot encompass all the information in audio, leading to restricted controllability when relying solely on text. To address this issue, we propose a novel model that enhances the controllability of existing pre-trained text-to-audio models by incorporating additional conditions including content (timestamp) and style (pitch contour and energy contour) as supplements to the text. This approach achieves fine-grained control over the temporal order, pitch, and energy of generated audio. To preserve the diversity of generation, we employ a trainable control condition encoder that is enhanced by a large language model and a trainable Fusion-Net to encode and fuse the additional conditions while keeping the weights of the pre-trained text-to-audio model frozen. Due to the lack of suitable datasets and evaluation metrics, we consolidate existing datasets into a new dataset comprising the audio and corresponding conditions and use a series of evaluation metrics to evaluate the controllability performance. Experimental results demonstrate that our model successfully achieves fine-grained control to accomplish controllable audio generation. Audio samples and our dataset are publicly available at https://conditionaudiogen.github.io/conditionaudiogen/ △ Less

Submitted 28 December, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted by AAAI 2024

Showing 1–50 of 205 results for author: Ma, J