Low-Complexity SVM Signal Recovery in Bandwidth-Limited 100Gb/s PAM4 PON Upstream

Authors: Liyan Wu, Yanlu Huang, Kai Jin, Shangya Han, Kun Xu, Yanni Ou

Abstract: We proposed a low-complexity SVM-based signal recovery algorithm and evaluated it in 100G-PON with 25G-class devices. For the first time, it experimentally achieved 24 dB power budget @ FEC threshold 1E-3 over 40 km SMF, improving receiver sensitivity by over 2 dB compared to FFE&DFE.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.00896 [pdf, other]

Channel Modeling Aided Dataset Generation for AI-Enabled CSI Feedback: Advances, Challenges, and Solutions

Authors: Yupeng Li, Gang Li, Zirui Wen, Shuangfeng Han, Shijian Gao, Guangyi Liu, Jiangzhou Wang

Abstract: The AI-enabled autoencoder has demonstrated great potential in channel state information (CSI) feedback in frequency division duplex (FDD) multiple input multiple output (MIMO) systems. However, this method completely changes the existing feedback strategies, making it impractical to deploy in recent years. To address this issue, this paper proposes a channel modeling aided data augmentation method based on a limited number of field channel data. Specifically, the user equipment (UE) extracts the primary stochastic parameters of the field channel data and transmits them to the base station (BS). The BS then updates the typical TR 38.901 model parameters with the extracted parameters. In this way, the updated channel model is used to generate the dataset. This strategy comprehensively considers the dataset collection, model generalization, model monitoring, and so on. Simulations verify that our proposed strategy can significantly improve performance compared to the benchmarks.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.19856 [pdf]

LUT-boosted CDR and Equalization for Burst-mode 50/100 Gbit/s Bandwidth-limited Flexible PON

Authors: Yanlu Huang, Liyan Wu, Shangya Han, Kai Jin, Kun Xu, Yanni Ou

Abstract: We proposed and experimentally demonstrated a look-up table boosted fast CDR and equalization scheme for the burst-mode 50/100 Gbps bandwidth-limited flexible PON, requiring no preamble for convergence and achieving the same bit error rate performance as in the case of long preambles.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19135 [pdf, other]

DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a general diffusion TTS framework, DEX-TTS includes encoders and adapters to handle styles extracted from reference speech. Key innovations include the differentiation of styles into time-invariant and time-variant categories for effective style extraction, as well as the design of encoders and adapters with high generalization ability. In addition, we introduce overlapping patchify and convolution-frequency patch embedding strategies to improve DiT-based diffusion networks for TTS. DEX-TTS yields outstanding performance in terms of objective and subjective evaluation in English multi-speaker and emotional multi-speaker datasets, without relying on pre-training strategies. Lastly, the comparison results for the general TTS on a single-speaker dataset verify the effectiveness of our enhanced diffusion backbone. Demos are available here.

Submitted 27 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.03274 [pdf, other]

Enhancing CTC-based speech recognition with diverse modeling units

Authors: Shiyi Han, Zhihong Lei, Mingbin Xu, Xingyu Na, Zhen Huang

Abstract: In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures like the Transformer. On top of E2E systems, researchers have achieved substantial accuracy improvement by rescoring an E2E model's N-best hypotheses with a phoneme-based model. This raises an interesting question about where the improvements come from other than the system combination effect. We examine the underlying mechanisms driving these gains and propose an efficient joint training approach, where E2E models are trained jointly with diverse modeling units. This methodology not only aligns the strengths of both phoneme and grapheme-based models but also reveals that using these diverse modeling units in a synergistic way can significantly enhance model accuracy. Our findings offer new insights into the optimal integration of heterogeneous modeling units in the development of more robust and accurate ASR systems.

Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.14770 [pdf, other]

Physics-informed Score-based Diffusion Model for Limited-angle Reconstruction of Cardiac Computed Tomography

Authors: Shuo Han, Yongshun Xu, Dayang Wang, Bahareh Morovati, Li Zhou, Jonathan S. Maltz, Ge Wang, Hengyong Yu

Abstract: Cardiac computed tomography (CT) has emerged as a major imaging modality for the diagnosis and monitoring of cardiovascular diseases. High temporal resolution is essential to ensure diagnostic accuracy. Limited-angle data acquisition can reduce scan time and improve temporal resolution, but typically leads to severe image degradation and motivates improved reconstruction techniques. In this paper, we propose a novel physics-informed score-based diffusion model (PSDM) for limited-angle reconstruction of cardiac CT. At the sampling time, we combine a data prior from a diffusion model and a model prior obtained via an iterative algorithm and Fourier fusion to further enhance the image quality. Specifically, our approach integrates the primal-dual hybrid gradient (PDHG) algorithm with score-based diffusion models, thereby enabling us to reconstruct high-quality cardiac CT images from limited-angle data. The numerical simulations and real data experiments confirm the effectiveness of our proposed approach.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 12 pages

arXiv:2404.08199 [pdf, other]

doi 10.1109/TCSII.2023.3266594

Cepstral Analysis Based Artifact Detection, Recognition and Removal for Prefrontal EEG

Authors: Siqi Han, Chao Zhang, Jiaxin Lei, Qingquan Han, Yuhui Du, Anhe Wang, Shuo Bai, Milin Zhang

Abstract: This paper proposes to use cepstrum for artifact detection, recognition and removal in prefrontal EEG. This work focuses on the artifact caused by eye movement. A database containing artifact-free EEG and eye movement contaminated EEG from different subjects is established. A cepstral analysis-based feature extraction with support vector machine (SVM) based classifier is designed to identify the artifacts from the target EEG signals. The proposed method achieves an accuracy of 99.62% on the artifact detection task and an accuracy of 82.79% on the 6-category eye movement classification task. A statistical value-based artifact removal method is proposed and evaluated on a public EEG database, where an accuracy improvement of 3.46% is obtained on the 3-category emotion classification task. In order to make a confident decision on each 5 s EEG segment, the algorithm requires only 0.66M multiplication operations. Compared to the state-of-the-art approaches in artifact detection and removal, the proposed method features higher detection accuracy and lower computational cost, which makes it a more suitable solution to be integrated into a real-time and artifact robust Brain-Machine Interface (BMI).

Submitted 11 April, 2024; originally announced April 2024.

Comments: 5 pages, 4 figures, published by TCAS-II

Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023

arXiv:2403.18695 [pdf, other]

An Efficient Risk-aware Branch MPC for Automated Driving that is Robust to Uncertain Vehicle Behaviors

Authors: Luyao Zhang, George Pantazis, Shaohang Han, Sergio Grammatico

Abstract: One of the critical challenges in automated driving is ensuring safety of automated vehicles despite the unknown behavior of the other vehicles. Although motion prediction modules are able to generate a probability distribution associated with various behavior modes, their probabilistic estimates are often inaccurate, thus leading to a possibly unsafe trajectory. To overcome this challenge, we propose a risk-aware motion planning framework that appropriately accounts for the ambiguity in the estimated probability distribution. We formulate the risk-aware motion planning problem as a min-max optimization problem and develop an efficient iterative method by incorporating a regularization term in the probability update step. Via extensive numerical studies, we validate the convergence of our method and demonstrate its advantages compared to the state-of-the-art approaches.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.05912 [pdf, other]

Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation

Authors: Hairong Shi, Songhao Han, Shaofei Huang, Yue Liao, Guanbin Li, Xiangxing Kong, Hua Zhu, Xiaomu Wang, Si Liu

Abstract: Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning. Considering the inherent differences in tumor lesion segmentation data across various medical imaging modalities and equipment, integrating medical knowledge into the Segment Anything Model (SAM) presents promising capability due to its versatility and generalization potential. Recent studies have attempted to enhance SAM with medical expertise by pre-training on large-scale medical segmentation datasets. However, challenges still exist in 3D tumor lesion segmentation owing to tumor complexity and the imbalance in foreground and background regions. Therefore, we introduce Mask-Enhanced SAM (M-SAM), an innovative architecture tailored for 3D tumor lesion segmentation. We propose a novel Mask-Enhanced Adapter (MEA) within M-SAM that enriches the semantic information of medical images with positional data from coarse segmentation masks, facilitating the generation of more precise segmentation masks. Furthermore, an iterative refinement scheme is implemented in M-SAM to refine the segmentation masks progressively, leading to improved performance. Extensive experiments on seven tumor lesion segmentation datasets indicate that our M-SAM not only achieves high segmentation accuracy but also exhibits robust generalization. The code is available at https://github.com/nanase1025/M-SAM.

Submitted 11 July, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.17127 [pdf, other]

Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0

Authors: Taein Kang, Soyul Han, Sunmook Choi, Jaejin Seo, Sanghyeok Chung, Seungeun Lee, Seungsang Oh, Il-Youp Kwak

Abstract: Conventional spoofing detection systems have heavily relied on the use of handcrafted features derived from speech data. However, a notable shift has recently emerged towards the direct utilization of raw speech waveforms, as demonstrated by methods like SincNet filters. This shift underscores the demand for more sophisticated audio sample features. Moreover, the success of deep learning models, particularly those utilizing large pretrained wav2vec 2.0 as a featurization front-end, highlights the importance of refined feature encoders. In response, this research assessed the representational capability of wav2vec 2.0 as an audio feature extractor, modifying the size of its pretrained Transformer layers through two key adjustments: (1) selecting a subset of layers starting from the leftmost one and (2) fine-tuning a portion of the selected layers from the rightmost one. We complemented this analysis with five spoofing detection back-end models, with a primary focus on AASIST, enabling us to pinpoint the optimal configuration for the selection and fine-tuning process. In contrast to conventional handcrafted features, our investigation identified several spoofing detection systems that achieve state-of-the-art performance in the ASVspoof 2019 LA dataset. This comprehensive exploration offers valuable insights into feature selection strategies, advancing the field of spoofing detection.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 5 pages

MSC Class: 00A71 ACM Class: I.2.6

arXiv:2402.16581 [pdf, other]

Rate Splitting Multiple Access-Enabled Adaptive Panoramic Video Semantic Transmission

Authors: Haixiao Gao, Mengying Sun, Xiaodong Xu, Shujun Han, Bizhu Wang, Jingxuan Zhang, Ping Zhang

Abstract: In this paper, we propose an adaptive panoramic video semantic transmission (APVST) framework enabled by rate splitting multiple access (RSMA). The APVST framework consists of a semantic transmitter and receiver, utilizing a deep joint source-channel coding structure to adaptively extract and encode semantic features from panoramic frames. To achieve higher spectral efficiency and conserve bandwidth, APVST employs an entropy model and a dimension-adaptive module to control the transmission rate. Additionally, we take weighted-to-spherically-uniform peak signal-to-noise ratio (WS-PSNR) and weighted-to-spherically-uniform structural similarity (WS-SSIM) as distortion evaluation metrics for panoramic videos and design a weighted self-attention module for APVST. This module integrates weights and feature maps to enhance the quality of the immersive experience. Considering the overlap in the field of view when users watch panoramic videos, we further utilize RSMA to split the required panoramic video semantic streams into common and private messages for transmission. We propose an RSMA-enabled semantic stream transmission scheme and formulate a joint problem of latency and immersive experience quality by optimizing the allocation ratios of power, common rate, and channel bandwidth, aiming to maximize the quality of service (QoS) scores for users. To address the above problem, we propose a deep reinforcement learning algorithm based on proximal policy optimization (PPO) with high efficiency to handle dynamically changing environments. Simulation results demonstrate that our proposed APVST framework saves up to 20% and 50% of channel bandwidth compared to other semantic and traditional video transmission schemes, respectively. Moreover, our study confirms the efficiency of RSMA in panoramic video transmission, achieving performance gains of 13% and 20% compared to NOMA and OFDMA.

Submitted 23 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.15151 [pdf, other]

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Authors: Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

Abstract: In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements. For example, homophenes, words that share identical lip movements but produce different sounds, can be distinguished by considering the context. In this paper, we propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM), to maximize the context modeling ability by bringing the overwhelming power of LLMs. Specifically, VSP-LLM is designed to perform multi-tasks of visual speech recognition and translation, where the given instructions control the type of task. The input video is mapped to the input latent space of an LLM by employing a self-supervised visual speech model. Focused on the fact that there is redundant information in input frames, we propose a novel deduplication method that reduces the embedded visual features by employing visual speech units. Through the proposed deduplication and Low Rank Adaptation (LoRA), VSP-LLM can be trained in a computationally efficient manner. In the translation dataset, the MuAViC benchmark, we demonstrate that VSP-LLM trained on just 30 hours of labeled data can more effectively translate lip movements compared to the recent model trained with 433 hours of data.

Submitted 13 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: An Erratum was added on the last page of this paper

arXiv:2402.13776 [pdf, other]

Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion

Authors: Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, Yan Liang, Qing Yang, Dinggang Shen, Han Zhang

Abstract: Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition always meets a serious data-missing problem due to participant dropout and failed scans, making longitudinal infant brain atlas construction and developmental trajectory delineation quite challenging. Thanks to the development of AI-based generative models, neuroimage completion has become a powerful technique to retain as much available data as possible. However, current image completion methods usually suffer from inconsistency within each individual subject in the time dimension, compromising the overall quality. To solve this problem, our paper proposed a two-stage cascaded diffusion model, Cas-DiffCom, for dense and longitudinal 3D infant brain MRI completion and super-resolution. We applied our proposed method to the Baby Connectome Project (BCP) dataset. The experiment results validate that Cas-DiffCom achieves both individual consistency and high fidelity in longitudinal infant brain image completion. We further applied the generated infant brain images to two downstream tasks, brain tissue segmentation and developmental trajectory delineation, to demonstrate its task-oriented potential in the neuroscience field.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2401.17450 [pdf, other]

Qplacer: Frequency-Aware Component Placement for Superconducting Quantum Computers

Authors: Junyao Zhang, Hanrui Wang, Qi Ding, Jiaqi Gu, Reouven Assouly, William D. Oliver, Song Han, Kenneth R. Brown, Hai "Helen" Li, Yiran Chen

Abstract: Noisy Intermediate-Scale Quantum (NISQ) computers face a critical limitation in qubit numbers, hindering their progression towards large-scale and fault-tolerant quantum computing. A significant challenge impeding scaling is crosstalk, characterized by unwanted interactions among neighboring components on quantum chips, including qubits, resonators, and substrate. We motivate a general approach to systematically resolving multifaceted crosstalks in a limited substrate area. We propose Qplacer, a frequency-aware electrostatic-based placement framework tailored for superconducting quantum computers, to alleviate crosstalk by isolating these components in spatial and frequency domains alongside compact substrate design. Qplacer commences with a frequency assigner that ensures frequency domain isolation for qubits and resonators. It then incorporates a padding strategy and resonator partitioning for layout flexibility. Central to our approach is the conceptualization of quantum components as charged particles, enabling strategic spatial isolation through a 'frequency repulsive force' concept. Our results demonstrate that Qplacer carefully crafts the physical component layout in mitigating various crosstalk impacts while maintaining a compact substrate size. On various device topologies and NISQ benchmarks, Qplacer improves fidelity by an average of 36.7x and reduces spatial violations (susceptible to crosstalk) by an average of 12.76x, compared to classical placement engines. Regarding area optimization, compared to manual designs, Qplacer can reduce the required layout area by 2.14x on average.

Submitted 8 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.03430 [pdf, other]

doi 10.1109/TCSII.2022.3187205

Multi-Channel Multi-Domain based Knowledge Distillation Algorithm for Sleep Staging with Single-Channel EEG

Authors: Chao Zhang, Yiqiao Liao, Siqi Han, Milin Zhang, Zhihua Wang, Xiang Xie

Abstract: This paper proposed a Multi-Channel Multi-Domain (MCMD) based knowledge distillation algorithm for sleep staging using single-channel EEG. Both knowledge from different domains and different channels are learnt in the proposed algorithm, simultaneously. A multi-channel pre-training and single-channel fine-tuning scheme is used in the proposed work. The knowledge from different channels in the source domain is transferred to the single-channel model in the target domain. A pre-trained teacher-student model scheme is used to distill knowledge from the multi-channel teacher model to the single-channel student model combining with output transfer and intermediate feature transfer in the target domain. The proposed algorithm achieves a state-of-the-art single-channel sleep staging accuracy of 86.5%, with only 0.6% deterioration from the state-of-the-art multi-channel model. There is an improvement of 2% compared to the baseline model. The experimental results show that knowledge from multiple domains (different datasets) and multiple channels (e.g. EMG, EOG) could be transferred to single-channel sleep staging.

Submitted 7 January, 2024; originally announced January 2024.

Comments: 5 pages, 2 figures, published by IEEE TCAS-II

Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2022, 69(11): 4608-4612

arXiv:2401.02101 [pdf, ps, other]

ICI-Free Channel Estimation and Wireless Gesture Recognition Based on Cellular Signals

Authors: Rui Peng, Yafei Tian, Shengqian Han

Abstract: Device-free wireless sensing attracts enormous attention since it senses the environment without additional devices. While cellular signals are good opportunistic radio sources, the influence of inter-cell interference (ICI) on wireless sensing has not been adequately addressed. In this letter, we first investigate the cause of ICI and its impact on wireless sensing. Then we propose an ICI-free channel estimation method by reconstructing the broadcast signals of adjacent cells and solving simultaneous equations. Wireless gesture recognition benefits greatly from ICI mitigation. Finally, we build a prototype system to receive the commercial 4G-LTE signals, and demonstrate the accuracies of wireless gesture recognition under various conditions.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.15873 [pdf, other]

Investigating Inter-Satellite Link Spanning Patterns on Networking Performance in Mega-constellations

Authors: Xiangtong Wang, Xiaodong Han, Menglong Yang, Chuan Xing, Yuqi Wang, Songchen Han, Wei Li

Abstract: Low Earth orbit (LEO) mega-constellations rely on inter-satellite links (ISLs) to provide global connectivity. We note that in addition to the general constellation parameters, the ISL spanning patterns also greatly influence the final network structure and thus the network performance. In this work, we formulate the ISL spanning patterns, apply different patterns to mega-constellations and generate multiple structures. Then, we delve into the performance estimation of these networks, specifically evaluating network capacity, throughput, latency, and routing path stretch. The experimental findings provide insights into the optimal network structure under diverse conditions, showcasing superior performance when compared to alternative network configurations.

Submitted 25 December, 2023; originally announced December 2023.

Comments: 5 pages

arXiv:2311.18762 [pdf, ps, other]

Performance Analysis of Integrated Sensing and Communications Under Gain-Phase Imperfections

Authors: Shuaishuai Han, Mohammad Ahmad Al-Jarrah, Emad Alsusa

Abstract: This paper evaluates the performance of uplink integrated sensing and communication systems in the presence of gain and phase imperfections. Specifically, we consider multiple unmanned aerial vehicles (UAVs) transmitting data to a multiple-input-multiple-output base-station (BS) that is responsible for estimating the transmitted information in addition to localising the transmitting UAVs. The signal processing at the BS is divided into two consecutive stages: localisation and communication. A maximum likelihood (ML) algorithm is introduced for the localisation stage to jointly estimate the azimuth-elevation angles and Doppler frequency of the UAVs under gain-phase defects, which are then compared to the estimation of signal parameters via rotational invariance techniques (ESPRIT) and multiple signal classification (MUSIC). Furthermore, the Cramer-Rao lower bound (CRLB) is derived to evaluate the asymptotic performance and quantify the influence of the gain-phase imperfections which are modelled using Rician and von Mises distributions, respectively. Thereafter, in the communication stage, the location parameters estimated in the first stage are employed to estimate the communication channels which are fed into a maximum ratio combiner to preprocess the received communication signal. An accurate closed-form approximation of the achievable average sum data rate (SDR) for all UAVs is derived. The obtained results show that gain-phase imperfections have a significant influence on both localisation and communication; however, the proposed ML is less sensitive when compared to other algorithms. The derived analysis is corroborated by simulations.

Submitted 30 November, 2023; originally announced November 2023.

Comments: 38 pages, 7 figures

arXiv:2311.15313 [pdf, ps, other]

Low-Complexity Joint Beamforming for RIS-Assisted MU-MISO Systems Based on Model-Driven Deep Learning

Authors: Weijie Jin, Jing Zhang, Chao-Kai Wen, Shi Jin, Xiao Li, Shuangfeng Han

Abstract: Reconfigurable intelligent surfaces (RIS) can improve signal propagation environments by adjusting the phase of the incident signal. However, optimizing the phase shifts jointly with the beamforming vector at the access point is challenging due to the non-convex objective function and constraints. In this study, we propose an algorithm based on weighted minimum mean square error optimization and power iteration to maximize the weighted sum rate (WSR) of a RIS-assisted downlink multi-user multiple-input single-output system. To further improve performance, a model-driven deep learning (DL) approach is designed, where trainable variables and graph neural networks are introduced to accelerate the convergence of the proposed algorithm. We also extend the proposed method to include beamforming with imperfect channel state information and derive a two-timescale stochastic optimization algorithm. Simulation results show that the proposed algorithm outperforms state-of-the-art algorithms in terms of complexity and WSR. Specifically, the model-driven DL approach has a runtime that is approximately 3% of the state-of-the-art algorithm to achieve the same performance. Additionally, the proposed algorithm with 2-bit phase shifters outperforms the compared algorithm with continuous phase shift.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 14 pages, 9 figures, 2 tables. This paper has been accepted for publication by the IEEE Transactions on Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2311.14916 [pdf, other]

Automated Lane Merging via Game Theory and Branch Model Predictive Control

Authors: Luyao Zhang, Shaohang Han, Sergio Grammatico

Abstract: We propose an integrated behavior and motion planning framework for the automated lane-merging problem. The behavior planner combines search-based planning with game theory to model the interaction between vehicles and select multi-vehicle trajectories. Inspired by human drivers, we model the lane-merging problem as a gap selection process. To overcome the challenge of multi-modal driving behavior exhibited by the surrounding vehicles, we formulate the trajectory selection as a matrix game and compute some equilibrium solutions. In practice, however, the surrounding vehicles might deviate from the computed equilibrium trajectories. Thus, we introduce a branch model predictive control (BMPC) framework to account for the uncertain behavior modes of the surrounding vehicles. A tailored numerical solver is developed to enhance computational efficiency by leveraging the tree structure inherent in BMPC. Finally, we validate our proposed integrated planner using real traffic data and demonstrate its effectiveness in handling interactions in dense traffic scenarios.

Submitted 8 March, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

arXiv:2310.16869 [pdf]

Single-pixel imaging based on deep learning

Authors: Kai Song, Yaoxing Bian, Ku Wu, Hongrui Liu, Shuangping Han, Jiaming Li, Jiazhao Tian, Chengbin Qin, Jianyong Hu, Liantuan Xiao

Abstract: Single-pixel imaging can collect images at the wavelengths outside the reach of conventional focal plane array detectors. However, the limited image quality and lengthy computational times for iterative reconstruction still impede the practical application of single-pixel imaging. Recently, deep learning has been introduced into single-pixel imaging, which has attracted a lot of attention due to its exceptional reconstruction quality, fast reconstruction speed, and the potential to complete advanced sensing tasks without reconstructing images. Here, this advance is discussed and some opinions are offered. Firstly, based on the fundamental principles of single-pixel imaging and deep learning, the principles and algorithms of single-pixel imaging based on deep learning are described and analyzed. Subsequently, the implementation technologies of single-pixel imaging based on deep learning are reviewed. They are divided into super-resolution single-pixel imaging, single-pixel imaging through scattering media, photon-level single-pixel imaging, optical encryption based on single-pixel imaging, color single-pixel imaging, and image-free sensing according to diverse application fields. Finally, major challenges and corresponding feasible approaches are discussed, as well as more possible applications in the future.

Submitted 16 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.12405 [pdf, other]

LoMAE: Low-level Vision Masked Autoencoders for Low-dose CT Denoising

Authors: Dayang Wang, Yongshun Xu, Shuo Han, Zhan Wu, Li Zhou, Bahareh Morovati, Hengyong Yu

Abstract: Low-dose computed tomography (LDCT) offers reduced X-ray radiation exposure but at the cost of compromised image quality, characterized by increased noise and artifacts. Recently, transformer models emerged as a promising avenue to enhance LDCT image quality. However, the success of such models relies on a large amount of paired noisy and clean images, which are often scarce in clinical settings. In the fields of computer vision and natural language processing, masked autoencoders (MAE) have been recognized as an effective label-free self-pretraining method for transformers, due to their exceptional feature representation ability. However, the original pretraining and fine-tuning design fails to work in low-level vision tasks like denoising. In response to this challenge, we redesign the classical encoder-decoder learning model and facilitate a simple yet effective low-level vision MAE, referred to as LoMAE, tailored to address the LDCT denoising problem. Moreover, we introduce an MAE-GradCAM method to shed light on the latent learning mechanisms of the MAE/LoMAE. Additionally, we explore the LoMAE's robustness and generalizability across a variety of noise levels. Experimental results show that the proposed LoMAE can enhance the transformer's denoising performance and greatly relieve the dependence on the ground truth clean data. It also demonstrates remarkable robustness and generalizability over a spectrum of noise levels.

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.07161 [pdf, ps, other]

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

Authors: Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Shuo Han, Yunyang Zeng, Ankit Shah, Bhiksha Raj

Abstract: Within the ambit of VoIP (Voice over Internet Protocol) telecommunications, the complexities introduced by acoustic transformations merit rigorous analysis. This research, rooted in the exploration of proprietary sender-side denoising effects, meticulously evaluates platforms such as Google Meets and Zoom. The study draws upon the Deep Noise Suppression (DNS) 2020 dataset, ensuring a structured examination tailored to various denoising settings and receiver interfaces. A methodological novelty is introduced via the Oaxaca decomposition, traditionally an econometric tool, repurposed herein to analyze acoustic-phonetic perturbations within VoIP systems. To further ground the implications of these transformations, psychoacoustic metrics, specifically PESQ and STOI, were harnessed to furnish a comprehensive understanding of speech alterations. Cumulatively, the insights garnered underscore the intricate landscape of VoIP-influenced acoustic dynamics. In addition to the primary findings, a multitude of metrics are reported, extending the research purview. Moreover, out-of-domain benchmarking for both time and time-frequency domain speech enhancement models is included, thereby enhancing the depth and applicability of this inquiry. Repository: github.com/deepology/VoIP-DNS-Challenge

Submitted 21 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.07062 [pdf, other]

Acoustic Model Fusion for End-to-end Speech Recognition

Authors: Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu

Abstract: Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, the application of LM fusion presents certain drawbacks, such as its inability to address the domain mismatch issue inherent to the internal AM. Drawing inspiration from the concept of LM fusion, we propose the integration of an external AM into the E2E system to better address the domain mismatch. By implementing this novel approach, we have achieved a significant reduction in the word error rate, with an impressive drop of up to 14.3% across varied test sets. We also discovered that this AM fusion approach is particularly beneficial in enhancing named entity recognition.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2308.16551 [pdf]

Object Detection for Caries or Pit and Fissure Sealing Requirement in Children's First Permanent Molars

Authors: Chenyao Jiang, Shiyao Zhai, Hengrui Song, Yuqing Ma, Yachen Fan, Yancheng Fang, Dongmei Yu, Canyang Zhang, Sanyang Han, Runming Wang, Yong Liu, Jianbo Li, Peiwu Qin

Abstract: Dental caries is one of the most common oral diseases that, if left untreated, can lead to a variety of oral problems. It mainly occurs inside the pits and fissures on the occlusal/buccal/palatal surfaces of molars, and children are a high-risk group for pit and fissure caries in permanent molars. Pit and fissure sealing is one of the most effective and widely used methods for preventing pit and fissure caries. However, current detection of pits and fissures or caries depends primarily on the expertise of experienced dentists, which ordinary parents do not have, and children may miss the remedial treatment without timely detection. To address this issue, we present a method to autodetect caries and pit and fissure sealing requirements using oral photos taken by smartphones. We use the YOLOv5 and YOLOX models and adopt a tiling strategy to reduce information loss during image pre-processing. The best result for the YOLOXs model with the tiling strategy is 72.3 mAP.5, while the best result without the tiling strategy is 71.2. The YOLOv5s6 model with/without tiling attains 70.9/67.9 mAP.5, respectively. We deploy the pre-trained network to mobile devices as a WeChat applet, allowing in-home detection by parents or children's guardians.

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.11335 [pdf, other]

Graph Neural Network-Enhanced Expectation Propagation Algorithm for MIMO Turbo Receivers

Authors: Xingyu Zhou, Jing Zhang, Chao-Kai Wen, Shi Jin, Shuangfeng Han

Abstract: Deep neural networks (NNs) are considered a powerful tool for balancing the performance and complexity of multiple-input multiple-output (MIMO) receivers due to their accurate feature extraction, high parallelism, and excellent inference ability. Graph NNs (GNNs) have recently demonstrated outstanding capability in learning enhanced message passing rules and have shown success in overcoming the drawback of inaccurate Gaussian approximation of expectation propagation (EP)-based MIMO detectors. However, the application of the GNN-enhanced EP detector to MIMO turbo receivers is underexplored and non-trivial due to the requirement of extrinsic information for iterative processing. This paper proposes a GNN-enhanced EP algorithm for MIMO turbo receivers, which realizes the turbo principle of generating extrinsic information from the MIMO detector through a specially designed training procedure. Additionally, an edge pruning strategy is designed to eliminate redundant connections in the original fully connected model of the GNN utilizing the correlation information inherently from the EP algorithm. Edge pruning reduces the computational cost dramatically and enables the network to focus more attention on the weights that are vital for performance. Simulation results and complexity analysis indicate that the proposed MIMO turbo receiver outperforms the EP turbo approaches by over 1 dB at the bit error rate of $10^{-5}$, exhibits performance equivalent to state-of-the-art receivers with 2.5 times shorter running time, and adapts to various scenarios.

Submitted 22 August, 2023; originally announced August 2023.

Comments: 15 pages, 12 figures, 2 tables. This paper has been accepted for publication by the IEEE Transactions on Signal Processing. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2308.06634 [pdf, other]

DISQ: Dynamic Iteration Skipping for Variational Quantum Algorithms

Authors: Junyao Zhang, Hanrui Wang, Gokul Subramanian Ravi, Frederic T. Chong, Song Han, Frank Mueller, Yiran Chen

Abstract: This paper proposes DISQ to craft a stable landscape for VQA training and tackle the noise drift challenge. DISQ adopts a "drift detector" with a reference circuit to identify and skip iterations that are severely affected by noise drift errors. Specifically, the circuits from the previous training iteration are re-executed as a reference circuit in the current iteration to estimate noise drift impacts. The iteration is deemed compromised by noise drift errors and thus skipped if noise drift flips the direction of the ideal optimization gradient. To enhance noise drift detection reliability, we further propose to leverage multiple reference circuits from previous iterations to provide a well-founded judgment of current noise drift. Nevertheless, multiple reference circuits also introduce considerable execution overhead. To mitigate extra overhead, we propose Pauli-term subsetting (prime and minor subsets) to execute only observable circuits with large coefficient magnitudes (prime subset) during drift detection. Only this minor subset is executed when the current iteration is drift-free. Evaluations across various applications and QPUs demonstrate that DISQ can mitigate a significant portion of the noise drift impact on VQAs and achieve 1.51-2.24x fidelity improvement over the traditional baseline. DISQ's benefit is 1.1-1.9x over the best alternative approach while boosting average noise detection speed by 2.07x.

Submitted 12 July, 2024; v1 submitted 12 August, 2023; originally announced August 2023.

arXiv:2307.16228 [pdf, other]

Robust Electric Vehicle Balancing of Autonomous Mobility-On-Demand System: A Multi-Agent Reinforcement Learning Approach

Authors: Sihong He, Shuo Han, Fei Miao

Abstract: Electric autonomous vehicles (EAVs) are getting attention in future autonomous mobility-on-demand (AMoD) systems due to their economic and societal benefits. However, EAVs' unique charging patterns (long charging time, high charging frequency, unpredictable charging behaviors, etc.) make it challenging to accurately predict the EAVs supply in E-AMoD systems. Furthermore, the mobility demand's prediction uncertainty makes it an urgent and challenging task to design an integrated vehicle balancing solution under supply and demand uncertainties. Despite the success of reinforcement learning-based E-AMoD balancing algorithms, state uncertainties under the EV supply or mobility demand remain unexplored. In this work, we design a multi-agent reinforcement learning (MARL)-based framework for EAVs balancing in E-AMoD systems, with adversarial agents to model both the EAVs supply and mobility demand uncertainties that may undermine the vehicle balancing solutions. We then propose a robust E-AMoD Balancing MARL (REBAMA) algorithm to train a robust EAVs balancing policy to balance both the supply-demand ratio and charging utilization rate across the whole city. Experiments show that our proposed robust method performs better compared with a non-robust MARL method that does not consider state uncertainties; it improves the reward, charging utilization fairness, and supply-demand fairness by 19.28%, 28.18%, and 3.97%, respectively. Compared with a robust optimization-based method, the proposed MARL algorithm can improve the reward, charging utilization fairness, and supply-demand fairness by 8.21%, 8.29%, and 9.42%, respectively.

Submitted 30 July, 2023; originally announced July 2023.

Comments: accepted to International Conference on Intelligent Robots and Systems (IROS2023)

arXiv:2307.16212 [pdf, other]

Robust Multi-Agent Reinforcement Learning with State Uncertainty

Authors: Sihong He, Songyang Han, Sanbao Su, Shuo Han, Shaofeng Zou, Fei Miao

Abstract: In real-world multi-agent reinforcement learning (MARL) applications, agents may not have perfect state information (e.g., due to inaccurate measurement or malicious attacks), which challenges the robustness of agents' policies. Though robustness is getting important in MARL deployment, little prior work has studied state uncertainties in MARL, neither in problem formulation nor algorithm design. Motivated by this robustness issue and the lack of corresponding studies, we study the problem of MARL with state uncertainty in this work. We provide the first attempt to the theoretical and empirical analysis of this challenging problem. We first model the problem as a Markov Game with state perturbation adversaries (MG-SPA) by introducing a set of state perturbation adversaries into a Markov Game. We then introduce robust equilibrium (RE) as the solution concept of an MG-SPA. We conduct a fundamental analysis regarding MG-SPA such as giving conditions under which such a robust equilibrium exists. Then we propose a robust multi-agent Q-learning (RMAQ) algorithm to find such an equilibrium, with convergence guarantees. To handle high-dimensional state-action space, we design a robust multi-agent actor-critic (RMAAC) algorithm based on an analytical expression of the policy gradient derived in the paper. Our experiments show that the proposed RMAQ algorithm converges to the optimal value function; our RMAAC algorithm outperforms several MARL and robust MARL methods in multiple multi-agent environments when state uncertainty is present. The source code is public on https://github.com/sihongho/robust_marl_with_state_uncertainty.

Submitted 30 July, 2023; originally announced July 2023.

Comments: 50 pages, Published in TMLR, Transactions on Machine Learning Research (06/2023)

arXiv:2307.05799 [pdf]

3D Medical Image Segmentation based on multi-scale MPU-Net

Authors: Zeqiu. Yu, Shuo. Han, Ziheng. Song

Abstract: The high cure rate of cancer is inextricably linked to physicians' accuracy in diagnosis and treatment, therefore a model that can accomplish high-precision tumor segmentation has become a necessity in many applications of the medical industry. It can effectively lower the rate of misdiagnosis while considerably lessening the burden on clinicians. However, fully automated target organ segmentation is problematic due to the irregular stereo structure of 3D volume organs. As a basic model for this class of real applications, U-Net excels. It can learn certain global and local features, but still lacks the capacity to grasp spatial long-range relationships and contextual information at multiple scales. This paper proposes a tumor segmentation model MPU-Net for patient volume CT images, which is inspired by Transformer with a global attention mechanism. By combining image serialization with the Position Attention Module, the model attempts to comprehend deeper contextual dependencies and accomplish precise positioning. Each layer of the decoder is also equipped with a multi-scale module and a cross-attention mechanism. The capability of feature extraction and integration at different levels has been enhanced, and the hybrid loss function developed in this study can better exploit high-resolution characteristic information. Moreover, the suggested architecture is tested and evaluated on the Liver Tumor Segmentation Challenge 2017 (LiTS 2017) dataset. Compared with the benchmark model U-Net, MPU-Net shows excellent segmentation results. The dice, accuracy, precision, specificity, IOU, and MCC metrics for the best model segmentation results are 92.17%, 99.08%, 91.91%, 99.52%, 85.91%, and 91.74%, respectively. Outstanding indicators in various aspects illustrate the exceptional performance of this framework in automatic medical image segmentation.

Submitted 24 July, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

Comments: 37 pages

arXiv:2306.16193 [pdf]

Deterministic End-to-End Transmission to Optimize the Network Efficiency and Quality of Service: A Paradigm Shift in 6G

Authors: Xiaoyun Wang, Shuangfeng Han, Zhiming Liu, Qixing Wang

Abstract: Toward end-to-end mobile service provision with optimized network efficiency and quality of service, tremendous efforts have been devoted to upgrading mobile applications, transport and internet networks, and wireless communication networks for many years. However, the inherent loose coordination between different layers in the end-to-end communication networks leads to unreliable data transmission with uncontrollable packet delay and packet error rate, and a terrible waste of network resources incurred for data re-transmission. In an attempt to shed some light on how to tackle these challenges, design methodologies and some solutions for deterministic end-to-end transmission for 6G and beyond are presented, which will bring a paradigm shift to the end-to-end wireless communication networks.

Submitted 2 July, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: 5 pages, 2 figures

arXiv:2306.09164 [pdf]

Network Architecture Design toward Convergence of Mobile Applications and Networks

Authors: Shuangfeng Han, Zhiming Liu, Tao Sun, Xiaoyun Wang

Abstract: With the quick proliferation of extended reality (XR) services, the mobile communications networks are faced with gigantic challenges to meet the diversified and challenging service requirements. A tight coordination or even convergence of applications and mobile networks is highly motivated. In this paper, a multi-domain (e.g. application layer, transport layer, the core network, radio access network, user equipment) coordination scheme is first proposed, which facilitates a tight coordination between applications and networks based on the current 5G networks. Toward the convergence of applications and networks, a network architecture with cross-domain joint processing capability is further proposed for 6G mobile communications and beyond. Both designs are able to provide more accurate information of the quality of experience (QoE) and quality of service (QoS), thus paving the path for the joint optimization of applications and networks. The benefits of the QoE assisted scheduling are further investigated via simulations. A new QoE-oriented fairness metric is further proposed, which is capable of ensuring better fairness when different services are scheduled. Future research directions and their standardization impacts are also identified. Toward optimized end-to-end service provision, the paradigm shift from loosely coupled to converged design of applications and wireless communication networks is indispensable.

Submitted 15 June, 2023; originally announced June 2023.

Comments: 7 pages, 5 figures, IEEE communications magazine, under review

arXiv:2306.04628 [pdf, other]

Systematic Analysis of Music Representations from BERT

Authors: Sangjun Han, Hyeongrae Ihm, Woohyung Lim

Abstract: There have been numerous attempts to represent raw data as numerical vectors that effectively capture semantic and contextual information. However, in the field of symbolic music, previous works have attempted to validate their music embeddings by observing the performance improvement of various fine-tuning tasks. In this work, we directly analyze embeddings from BERT and BERT with contrastive learning trained on bar-level MIDI, inspecting their musical information that can be obtained from MIDI events. We observe that the embeddings exhibit distinct characteristics of information depending on the contrastive objectives and the choice of layers. Our code is available at https://github.com/sjhan91/MusicBERT.

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2305.09793 [pdf, other]

Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions

Authors: Desong Du, Shaohang Han, Naiming Qi, Haitham Bou Ammar, Jun Wang, Wei Pan

Abstract: Reinforcement learning (RL) exhibits impressive performance when managing complicated control tasks for robots. However, its wide application to physical robots is limited by the absence of strong safety guarantees. To overcome this challenge, this paper explores the control Lyapunov barrier function (CLBF) to analyze the safety and reachability solely based on data without explicitly employing a dynamic model. We also proposed the Lyapunov barrier actor-critic (LBAC), a model-free RL algorithm, to search for a controller that satisfies the data-based approximation of the safety and reachability conditions. The proposed approach is demonstrated through simulation and real-world robot control experiments, i.e., a 2D quadrotor navigation task. The experimental findings reveal this approach's effectiveness in reachability and safety, surpassing other model-free RL methods.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.08878 [pdf, other]

Learning to Learn Unlearned Feature for Brain Tumor Segmentation

Authors: Seungyub Han, Yeongmo Kim, Seokhyeon Ha, Jungwoo Lee, Seunghong Choi

Abstract: We propose a fine-tuning algorithm for brain tumor segmentation that needs only a few data samples and helps networks not to forget the original tasks. Our approach is based on active learning and meta-learning. One of the difficulties in medical image segmentation is the lack of datasets with proper annotations, because it requires doctors to tag reliable annotation and there are many variants of a disease, such as glioma and brain metastasis, which are the different types of brain tumor and have different structural features in MR images. Therefore, it is impossible to produce the large-scale medical image datasets for all types of diseases. In this paper, we show a transfer learning method from high grade glioma to brain metastasis, and demonstrate that the proposed algorithm achieves balanced parameters for both glioma and brain metastasis domains within a few steps.

Submitted 13 May, 2023; originally announced May 2023.

Comments: Medical Imaging Meets NeurIPS 2018

arXiv:2305.05587 [pdf, other]

Predictive Control of Linear Discrete-Time Markovian Jump Systems by Learning Recurrent Patterns

Authors: SooJean Han, Soon-Jo Chung, John C. Doyle

Abstract: Incorporating pattern-learning for prediction (PLP) in many discrete-time or discrete-event systems allows for computation-efficient controller design by memorizing patterns to schedule control policies based on their future occurrences. In this paper, we demonstrate the effect of PLP by designing a controller architecture for a class of linear Markovian jump systems (MJS) where the aforementioned ``patterns'' correspond to finite-length sequences of modes. In our analysis of recurrent patterns, we use martingale theory to derive closed-form solutions to quantities pertaining to the occurrence of patterns: 1) the expected minimum occurrence time of any pattern from some predefined collection, 2) the probability of a pattern being the first to occur among the collection. Our method is applicable to real-world dynamics because we make two extensions to common assumptions in prior pattern-occurrence literature. First, the distribution of the mode process is unknown, and second, the true realization of the mode process is not observable. As demonstration, we consider fault-tolerant control of a dynamic topology-switching network, and empirically compare PLP to two controllers without PLP: a baseline based on the novel System Level Synthesis (SLS) approach and a topology-robust extension of the SLS baseline. We show that PLP is able to reject disturbances as effectively as the topology-robust controller at reduced computation time and control effort. We discuss several important tradeoffs, such as the size of the pattern collection and the system scale versus the accuracy of the mode predictions, which show how different PLP implementations affect stabilization and runtime performance.

Submitted 9 May, 2023; originally announced May 2023.

Comments: Preprint submitted to Automatica as of Jan 2023

arXiv:2304.14467 [pdf, other]

Distributed Quantized Detection of Sparse Signals Under Byzantine Attacks

Authors: Chen Quan, Yunghsiang S. Han, Baocheng Geng, Pramod K. Varshney

Abstract: This paper investigates distributed detection of sparse stochastic signals with quantized measurements under Byzantine attacks. Under this type of attack, sensors in the networks might send falsified data to degrade system performance. The Bernoulli-Gaussian (BG) distribution in terms of the sparsity degree of the stochastic signal is utilized for modeling the sparsity of signals. Several detectors with improved detection performance are proposed by incorporating the estimated attack parameters into the detection process. First, we propose the generalized likelihood ratio test with reference sensors (GLRTRS) and the locally most powerful test with reference sensors (LMPTRS) detectors with adaptive thresholds, given that the sparsity degree and the attack parameters are unknown. Our simulation results show that the LMPTRS and GLRTRS detectors outperform the LMPT and GLRT detectors proposed for an attack-free environment and are more robust against attacks. The proposed detectors can achieve the detection performance close to the benchmark likelihood ratio test (LRT) detector, which has perfect knowledge of the attack parameters and sparsity degree. When the fraction of Byzantine nodes is assumed to be known, we can further improve the system's detection performance. We propose the enhanced LMPTRS (E-LMPTRS) and enhanced GLRTRS (E-GLRTRS) detectors by filtering out potential malicious sensors with the knowledge of the fraction of Byzantine nodes in the network. Simulation results show the superiority of proposed enhanced detectors over LMPTRS and GLRTRS detectors.

Submitted 27 April, 2023; originally announced April 2023.

arXiv:2304.06246 [pdf, other]

Rapid Brain Meninges Surface Reconstruction with Layer Topology Guarantee

Authors: Peiyu Duan, Yuan Xue, Shuo Han, Lianrui Zuo, Aaron Carass, Caitlyn Bernhard, Savannah Hays, Peter A. Calabresi, Susan M. Resnick, James S. Duncan, Jerry L. Prince

Abstract: The meninges, located between the skull and brain, are composed of three membrane layers: the pia, the arachnoid, and the dura. Reconstruction of these layers can aid in studying volume differences between patients with neurodegenerative diseases and normal aging subjects. In this work, we use convolutional neural networks (CNNs) to reconstruct surfaces representing meningeal layer boundaries from magnetic resonance (MR) images. We first use the CNNs to predict the signed distance functions (SDFs) representing these surfaces while preserving their anatomical ordering. The marching cubes algorithm is then used to generate continuous surface representations; both the subarachnoid space (SAS) and the intracranial volume (ICV) are computed from these surfaces. The proposed method is compared to a state-of-the-art deformable model-based reconstruction method, and we show that our method can reconstruct smoother and more accurate surfaces using less computation time. Finally, we conduct experiments with volumetric analysis on both subjects with multiple sclerosis and healthy controls. For healthy and MS subjects, ICVs and SAS volumes are found to be significantly correlated to sex (p<0.01) and age (p<0.03) changes, respectively.

Submitted 12 April, 2023; originally announced April 2023.

Comments: ISBI 2023 Oral

arXiv:2303.15703 [pdf, other]

AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection

Authors: Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Sung Won Han

Abstract: Sound event localization and detection (SELD) combines the identification of sound events with the corresponding directions of arrival (DOA). Recently, event-oriented track output formats have been adopted to solve this problem; however, they still have limited generalization toward real-world problems in an unknown polyphony environment. To address the issue, we proposed an angular-distance-based multiple SELD (AD-YOLO), which is an adaptation of the "You Only Look Once" algorithm for SELD. The AD-YOLO format allows the model to learn sound occurrences location-sensitively by assigning class responsibility to DOA predictions. Hence, the format enables the model to handle the polyphony problem, regardless of the number of sound overlaps. We evaluated AD-YOLO on DCASE 2020-2022 challenge Task 3 datasets using four SELD objective metrics. The experimental results show that AD-YOLO achieved outstanding performance overall and also accomplished robustness in class-homogeneous polyphony environments.

Submitted 10 May, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: 5 pages, 3 figures, accepted for publication in IEEE ICASSP 2023

arXiv:2303.09463 [pdf, other]

An Autonomous System for Head-to-Head Race: Design, Implementation and Analysis; Team KAIST at the Indy Autonomous Challenge

Authors: Chanyoung Jung, Andrea Finazzi, Hyunki Seong, Daegyu Lee, Seungwook Lee, Bosung Kim, Gyuri Gang, Seungil Han, David Hyunchul Shim

Abstract: While the majority of autonomous driving research has concentrated on everyday driving scenarios, further safety and performance improvements of autonomous vehicles require a focus on extreme driving conditions. In this context, autonomous racing is a new area of research that has been attracting considerable interest recently. Due to the fact that a vehicle is driven by its perception, planning, and control limits during racing, numerous research and development issues arise. This paper provides a comprehensive overview of the autonomous racing system built by team KAIST for the Indy Autonomous Challenge (IAC). Our autonomy stack consists primarily of a multi-modal perception module, a high-speed overtaking planner, a resilient control stack, and a system status manager. We present the details of all components of our autonomy solution, including algorithms, implementation, and unit test results. In addition, this paper outlines the design principles and the results of a systematical analysis. Even though our design principles are derived from the unique application domain of autonomous racing, they can also be applied to a variety of safety-critical, high-cost-of-failure robotics applications. The proposed system was integrated into a full-scale autonomous race car (Dallara AV-21) and field-tested extensively. As a result, team KAIST was one of three teams who qualified and participated in the official IAC race events without any accidents. Our proposed autonomous system successfully completed all missions, including overtaking at speeds of around $220 km/h$ in the IAC@CES2022, the world's first autonomous 1:1 head-to-head race.

Submitted 16 March, 2023; originally announced March 2023.

Comments: 35 pages, 31 figures, 5 tables, Field Robotics (accepted)

arXiv:2303.09057 [pdf, other]

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion

Authors: Hyun Joon Park, Seok Woo Yang, Jin Sob Kim, Wooseok Shin, Sung Won Han

Abstract: Voice Conversion (VC) must be achieved while maintaining the content of the source speech and representing the characteristics of the target speaker. The existing methods do not simultaneously satisfy the above two aspects of VC, and their conversion outputs suffer from a trade-off problem between maintaining source contents and target characteristics. In this study, we propose Triple Adaptive Attention Normalization VC (TriAAN-VC), comprising an encoder-decoder and an attention-based adaptive normalization block, that can be applied to non-parallel any-to-any VC. The proposed adaptive normalization block extracts target speaker representations and achieves conversion while minimizing the loss of the source content with siamese loss. We evaluated TriAAN-VC on the VCTK dataset in terms of the maintenance of the source content and target speaker similarity. Experimental results for one-shot VC suggest that TriAAN-VC achieves state-of-the-art performance while mitigating the trade-off problem encountered in the existing VC methods.

Submitted 15 March, 2023; originally announced March 2023.

Comments: To appear in ICASSP 2023

arXiv:2303.09048 [pdf, other]

Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms

Authors: Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Hojeong Lee, Ankit Shah, Shuo Han, Yunyang Zeng, Amanda Shu, Haohui Liu, Xuankai Chang, Hamza Khalid, Minseon Gwak, Kawon Lee, Minjeong Kim, Bhiksha Raj

Abstract: In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance on Voice over Internet Protocol (VoIP) applications. Our approach involves adapting the DNS 2020 models to the specific acoustic characteristics of VoIP communications, which includes distortion and artifacts caused by compression, transmission, and platform-specific processing. To this end, we propose a multi-task learning framework for VoIP-DNS that jointly optimizes noise suppression and VoIP-specific acoustics for speech enhancement. We evaluate our approach on diverse VoIP scenarios and show that it outperforms both industry performance and state-of-the-art methods for speech enhancement on VoIP applications. Our results demonstrate the potential of models trained on DNS-2020 to be improved and tailored to different VoIP platforms using VoIP-DNS, whose findings have important applications in areas such as speech recognition, voice assistants, and telecommunication.

Submitted 15 March, 2023; originally announced March 2023.

Comments: Under review at European Association for Signal Processing. 5 pages

arXiv:2302.08095 [pdf, other]

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

Authors: Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

Abstract: Despite rapid advancement in recent years, current speech enhancement models often produce speech that differs in perceptual quality from real clean speech. We propose a learning objective that formalizes differences in perceptual quality, by using domain knowledge of acoustic-phonetics. We identify temporal acoustic parameters -- such as spectral tilt, spectral flux, shimmer, etc. -- that are non-differentiable, and we develop a neural network estimator that can accurately predict their time-series values across an utterance. We also model phoneme-specific weights for each feature, as the acoustic parameters are known to show different behavior in different phonemes. We can add this criterion as an auxiliary loss to any model that produces speech, to optimize speech outputs to match the values of clean speech in these features. Experimentally we show that it improves speech enhancement workflows in both time-domain and time-frequency domain, as measured by standard evaluation metrics. We also provide an analysis of phoneme-dependent improvement on acoustic parameters, demonstrating the additional interpretability that our method provides. This analysis can suggest which features are currently the bottleneck for improvement.

Submitted 16 February, 2023; originally announced February 2023.

Comments: Accepted at ICASSP 2023

arXiv:2302.08088 [pdf, other]

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

Authors: Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

Abstract: Speech enhancement models have greatly progressed in recent years, but still show limits in perceptual quality of their speech outputs. We propose an objective for perceptual quality based on temporal acoustic parameters. These are fundamental speech features that play an essential role in various applications, including speaker recognition and paralinguistic analysis. We provide a differentiable estimator for four categories of low-level acoustic descriptors involving: frequency-related parameters, energy or amplitude-related parameters, spectral balance parameters, and temporal features. Unlike prior work that looks at aggregated acoustic parameters or a few categories of acoustic parameters, our temporal acoustic parameter (TAP) loss enables auxiliary optimization and improvement of many fine-grain speech characteristics in enhancement workflows. We show that adding TAPLoss as an auxiliary objective in speech enhancement produces speech with improved perceptual quality and intelligibility. We use data from the Deep Noise Suppression 2020 Challenge to demonstrate that both time-domain models and time-frequency domain models can benefit from our method.

Submitted 15 February, 2023; originally announced February 2023.

Comments: Accepted at ICASSP 2023

arXiv:2301.10815 [pdf, other]

Human-machine Hierarchical Networks for Decision Making under Byzantine Attacks

Authors: Chen Quan, Baocheng Geng, Yunghsiang S. Han, Pramod K. Varshney

Abstract: This paper proposes a belief-updating scheme in a human-machine collaborative decision-making network to combat Byzantine attacks. A hierarchical framework is used to realize the network where local decisions from physical sensors act as reference decisions to improve the quality of human sensor decisions. During the decision-making process, the belief that each physical sensor is malicious is updated. The case when humans have side information available is investigated, and its impact is analyzed. Simulation results substantiate that the proposed scheme can significantly improve the quality of human sensor decisions, even when most physical sensors are malicious. Moreover, the performance of the proposed method does not necessarily depend on the knowledge of the actual fraction of malicious physical sensors. Consequently, the proposed scheme can effectively defend against Byzantine attacks and improve the quality of human sensors' decisions so that the performance of the human-machine collaborative system is enhanced.

Submitted 25 January, 2023; originally announced January 2023.

arXiv:2301.01349 [pdf, other]

Quantitative Planning with Action Deception in Concurrent Stochastic Games

Authors: Chongyang Shi, Shuo Han, Jie Fu

Abstract: We study a class of two-player competitive concurrent stochastic games on graphs with reachability objectives. Specifically, player 1 aims to reach a subset $F_1$ of game states, and player 2 aims to reach a subset $F_2$ of game states where $F_2\cap F_1=\emptyset$. Both players aim to satisfy their reachability objectives before their opponent does. Yet, the information players have about the game dynamics is asymmetric: P1 has a (set of) hidden actions unknown to P2 at the beginning of their interaction. In this setup, we investigate P1's strategic planning of action deception that decides when to deviate from the Nash equilibrium in P2's game model and employ a hidden action, so that P1 can maximize the value of action deception, which is the additional payoff compared to P1's payoff in the game where P2 has complete information. Anticipating that P2 may detect his misperception about the game and adapt his strategy during interaction in unpredictable ways, we construct a planning problem for P1 to augment the game model with an incomplete model about the theory of mind of the opponent P2. While planning in the augmented game, P1 can effectively influence P2's perception so as to entice P2 to take actions that benefit P1. We prove that the proposed deceptive planning algorithm maximizes a lower bound on the value of action deception and demonstrate the effectiveness of our deceptive planning algorithm using a robot motion planning problem inspired by soccer games.

Submitted 22 March, 2023; v1 submitted 3 January, 2023; originally announced January 2023.

arXiv:2212.00661 [pdf, other]

Hybrid Gate-Pulse Model for Variational Quantum Algorithms

Authors: Zhiding Liang, Zhixin Song, Jinglei Cheng, Zichang He, Ji Liu, Hanrui Wang, Ruiyang Qin, Yiru Wang, Song Han, Xuehai Qian, Yiyu Shi

Abstract: Current quantum programs are mostly synthesized and compiled on the gate-level, where quantum circuits are composed of quantum gates. The gate-level workflow, however, introduces significant redundancy when quantum gates are eventually transformed into control signals and applied on quantum devices. For superconducting quantum computers, the control signals are microwave pulses. Therefore, pulse-level optimization has gained more attention from researchers due to its advantages in terms of circuit duration. Recent works, however, are limited by their poor scalability brought by the large parameter space of control signals. In addition, the lack of gate-level "knowledge" also affects the performance of pure pulse-level frameworks. We present a hybrid gate-pulse model that can mitigate these problems. We propose to use gate-level compilation and optimization for "fixed" part of the quantum circuits and to use pulse-level methods for problem-agnostic parts. Experimental results demonstrate the efficiency of the proposed framework in discrete optimization tasks. We achieve a performance boost of at most 8% with 60% shorter pulse duration in the problem-agnostic layer.

Submitted 1 December, 2022; originally announced December 2022.

Comments: 8 pages, 6 figures

arXiv:2211.13797 [pdf, other]

Data-Driven Distributionally Robust Electric Vehicle Balancing for Autonomous Mobility-on-Demand Systems under Demand and Supply Uncertainties

Authors: Sihong He, Zhili Zhang, Shuo Han, Lynn Pepin, Guang Wang, Desheng Zhang, John Stankovic, Fei Miao

Abstract: Electric vehicles (EVs) are being rapidly adopted due to their economic and societal benefits. Autonomous mobility-on-demand (AMoD) systems also embrace this trend. However, the long charging time and high recharging frequency of EVs pose challenges to efficiently managing EV AMoD systems. The complicated dynamic charging and mobility process of EV AMoD systems makes the demand and supply uncertainties significant when designing vehicle balancing algorithms. In this work, we design a data-driven distributionally robust optimization (DRO) approach to balance EVs for both the mobility service and the charging process. The optimization goal is to minimize the worst-case expected cost under both passenger mobility demand uncertainties and EV supply uncertainties. We then propose a novel distributional uncertainty sets construction algorithm that guarantees the produced parameters are contained in desired confidence regions with a given probability. To solve the proposed DRO AMoD EV balancing problem, we derive an equivalent computationally tractable convex optimization problem. Based on real-world EV data of a taxi system, we show that with our solution the average total balancing cost is reduced by 14.49%, and the average mobility fairness and charging fairness are improved by 15.78% and 34.51%, respectively, compared to solutions that do not consider uncertainties.

Submitted 24 November, 2022; originally announced November 2022.

Comments: 16 pages

arXiv:2211.11248 [pdf, other]

Video Background Music Generation: Dataset, Method and Evaluation

Authors: Le Zhuo, Zhaokai Wang, Baisen Wang, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu

Abstract: Music is essential when editing videos, but selecting music manually is difficult and time-consuming. Thus, we seek to automatically generate background music tracks given video input. This is a challenging task since it requires music-video datasets, efficient architectures for video-to-music generation, and reasonable metrics, none of which currently exist. To close this gap, we introduce a complete recipe including dataset, benchmark model, and evaluation metric for video background music generation. We present SymMV, a video and symbolic music dataset with various musical annotations. To the best of our knowledge, it is the first video-music dataset with rich musical annotations. We also propose a benchmark video background music generation framework named V-MusProd, which utilizes music priors of chords, melody, and accompaniment along with video-music relations of semantic, color, and motion features. To address the lack of objective metrics for video-music correspondence, we design a retrieval-based metric VMCP built upon a powerful video-music representation learning model. Experiments show that with our dataset, V-MusProd outperforms the state-of-the-art method in both music quality and correspondence with videos. We believe our dataset, benchmark model, and evaluation metric will boost the development of video background music generation. Our dataset and code are available at https://github.com/zhuole1025/SymMV.

Submitted 4 August, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: Accepted by ICCV2023

arXiv:2211.09385 [pdf, other]

ComMU: Dataset for Combinatorial Music Generation

Authors: Lee Hyun, Taehyun Kim, Hyolim Kang, Minjoo Ki, Hyeonchan Hwang, Kwanho Park, Sharang Han, Seon Joo Kim

Abstract: Commercial adoption of automatic music composition requires the capability of generating diverse and high-quality music suitable for the desired context (e.g., music for romantic movies, action games, restaurants, etc.). In this paper, we introduce combinatorial music generation, a new task to create varying background music based on given conditions. Combinatorial music generation creates short samples of music with rich musical metadata, and combines them to produce a complete music. In addition, we introduce ComMU, the first symbolic music dataset consisting of short music samples and their corresponding 12 musical metadata for combinatorial music generation. Notable properties of ComMU are that (1) dataset is manually constructed by professional composers with an objective guideline that induces regularity, and (2) it has 12 musical metadata that embraces composers' intentions. Our results show that we can generate diverse high-quality music only with metadata, and that our unique metadata such as track-role and extended chord quality improves the capacity of the automatic composition. We highly recommend watching our video before reading the paper (https://pozalabs.github.io/ComMU).

Submitted 17 November, 2022; originally announced November 2022.

Comments: 19 pages, 12 figures

Showing 1–50 of 142 results for author: Han, S