Skip to main content

Showing 1–50 of 270 results for author: Wu, H

  1. arXiv:2407.12229  [pdf, other

    eess.AS cs.AI eess.SP

    Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

    Authors: Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda

    Abstract: People change their tones of voice, often accompanied by nonverbal vocalizations (NVs) such as laughter and cries, to convey rich emotions. However, most text-to-speech (TTS) systems lack the capability to generate speech with rich emotions, including NVs. This paper introduces EmoCtrl-TTS, an emotion-controllable zero-shot TTS that can generate highly emotional speech with NVs for any speaker. Em… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: See https://aka.ms/emoctrl-tts for demo samples

  2. arXiv:2407.07088  [pdf, other

    cs.AI cs.LO eess.SY

    Safe and Reliable Training of Learning-Based Aerospace Controllers

    Authors: Udayan Mandal, Guy Amir, Haoze Wu, Ieva Daukantas, Fletcher Lee Newell, Umberto Ravaioli, Baoluo Meng, Michael Durling, Kerianne Hobbs, Milan Ganai, Tobey Shim, Guy Katz, Clark Barrett

    Abstract: In recent years, deep reinforcement learning (DRL) approaches have generated highly successful controllers for a myriad of complex domains. However, the opaque nature of these models limits their applicability in aerospace systems and safety-critical domains, in which a single mistake can have dire consequences. In this paper, we present novel advancements in both the training and verification of… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 10 pages, 3 figures

  3. arXiv:2407.05391  [pdf, other

    eess.SP

    Interference Management in MIMO-ISAC Systems: A Transceiver Design Approach

    Authors: Yangyang Niu, Zhiqing Wei, Dingyou Ma, Xiaoyu Yang, Huici Wu, Zhiyong Feng, Jianhua Yuan

    Abstract: The integrated sensing and communication (ISAC) system under multi-input multi-output (MIMO) architecture achieves dual functionalities of sensing and communication on the same platform by utilizing spatial gain, which provides a feasible paradigm facing spectrum congestion. However, the dual functionalities of sensing and communication operating simultaneously in the same platform bring severe in… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  4. arXiv:2407.04737  [pdf, other

    eess.SP cs.AI

    Hierarchical Decoupling Capacitor Optimization for Power Distribution Network of 2.5D ICs with Co-Analysis of Frequency and Time Domains Based on Deep Reinforcement Learning

    Authors: Yuanyuan Duan, Haiyang Feng, Zhiping Yu, Hanming Wu, Leilai Shao, Xiaolei Zhu

    Abstract: With the growing need for higher memory bandwidth and computation density, 2.5D design, which involves integrating multiple chiplets onto an interposer, emerges as a promising solution. However, this integration introduces significant challenges due to increasing data rates and a large number of I/Os, necessitating advanced optimization of the power distribution networks (PDNs) both on-chip and on… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  5. arXiv:2407.03228  [pdf, other

    eess.SP

    Movable Antenna-enabled RIS-aided Integrated Sensing and Communication

    Authors: Haisu Wu, Hong Ren, Cunhua Pan

    Abstract: In this paper, we investigate a movable antenna (MA)-aided integrated sensing and communication (ISAC) system, where a reconfigurable intelligent surface (RIS) is employed to enhance wireless communication and sensing performance in dead zones. Specifically, this paper aims to maximize the minimum beampattern gain at the RIS by jointly optimizing beamforming matrix at the base station (BS), the re… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 14 pages, 11 figures

  6. arXiv:2407.00678  [pdf, other

    eess.IV cs.CV

    A Review of Image Processing Methods in Prostate Ultrasound

    Authors: Haiqiao Wang, Hong Wu, Zhuoyuan Wang, Peiyan Yue, Dong Ni, Pheng-Ann Heng, Yi Wang

    Abstract: Prostate cancer (PCa) poses a significant threat to men's health, with early diagnosis being crucial for improving prognosis and reducing mortality rates. Transrectal ultrasound (TRUS) plays a vital role in the diagnosis and image-guided intervention of PCa.To facilitate physicians with more accurate and efficient computer-assisted diagnosis and interventions, many image processing algorithms in T… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  7. arXiv:2406.17757  [pdf, other

    eess.SY cs.RO

    Automatic Parameter Tuning of Self-Driving Vehicles

    Authors: Hung-Ju Wu, Vladislav Nenchev, Christian Rathgeber

    Abstract: Modern automated driving solutions utilize trajectory planning and control components with numerous parameters that need to be tuned for different driving situations and vehicle types to achieve optimal performance. This paper proposes a method to automatically tune such parameters to resemble expert demonstrations. We utilize a cost function which captures deviations of the closed-loop operation… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Preprint to appear at the 2024 IEEE Conference on Control Technology and Applications (CCTA)

  8. arXiv:2406.14069  [pdf, other

    eess.IV cs.CV

    Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound

    Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zou, Jianhua Zhou, Yi Wang

    Abstract: Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classifica… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.09356  [pdf, other

    cs.CV eess.IV

    CMC-Bench: Towards a New Paradigm of Visual Signal Compression

    Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

    Abstract: Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1\% or even lower, which has strong potential applications. However, CMC has certain defects in… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2406.07237  [pdf, other

    eess.AS cs.SD

    CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems

    Authors: Haibin Wu, Yuan Tseng, Hung-yi Lee

    Abstract: Current state-of-the-art (SOTA) codec-based audio synthesis systems can mimic anyone's voice with just a 3-second sample from that specific unseen speaker. Unfortunately, malicious attackers may exploit these technologies, causing misuse and security issues. Anti-spoofing models have been developed to detect fake speech. However, the open question of whether current SOTA anti-spoofing models can e… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024, project page: https://codecfake.github.io/

  11. arXiv:2406.05065  [pdf, ps, other

    eess.AS

    Emo-bias: A Large Scale Evaluation of Social Bias on Speech Emotion Recognition

    Authors: Yi-Cheng Lin, Haibin Wu, Huang-Cheng Chou, Chi-Chun Lee, Hung-yi Lee

    Abstract: The rapid growth of Speech Emotion Recognition (SER) has diverse global applications, from improving human-computer interactions to aiding mental health diagnostics. However, SER models might contain social bias toward gender, leading to unfair outcomes. This study analyzes gender bias in SER models trained with Self-Supervised Learning (SSL) at scale, exploring factors influencing it. SSL-based S… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  12. arXiv:2406.04582  [pdf, other

    eess.AS cs.SD

    Neural Codec-based Adversarial Sample Detection for Speaker Verification

    Authors: Xuanjun Chen, Jiawei Du, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee

    Abstract: Automatic Speaker Verification (ASV), increasingly used in security-critical applications, faces vulnerabilities from rising adversarial attacks, with few effective defenses available. In this paper, we propose a neural codec-based adversarial sample detection method for ASV. The approach leverages the codec's ability to discard redundant perturbations and retain essential information. Specificall… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  13. arXiv:2406.03111  [pdf, other

    eess.AS eess.SP

    Singing Voice Graph Modeling for SingFake Detection

    Authors: Xuanjun Chen, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee

    Abstract: Detecting singing voice deepfakes, or SingFake, involves determining the authenticity and copyright of a singing voice. Existing models for speech deepfake detection have struggled to adapt to unseen attacks in this unique singing voice domain of human vocalization. To bridge the gap, we present a groundbreaking SingGraph model. The model synergizes the capabilities of the MERT acoustic music unde… ▽ More

    Submitted 9 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024; Our code is available at https://github.com/xjchenGit/SingGraph.git

  14. arXiv:2406.00723  [pdf, other

    cs.NI eess.SY

    Throughput and Link Utilization Improvement in Satellite Networks: A Learning-Enabled Approach

    Authors: Hao Wu

    Abstract: Satellite networks provide communication services to global users with an uneven geographical distribution. In densely populated regions, Inter-satellite links (ISLs) often experience congestion, blocking traffic from other links and leading to low link utilization and throughput. In such cases, delay-tolerant traffic can be withheld by moving satellites and carried to navigate congested areas, th… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 5 pages, 6 figures

  15. arXiv:2405.19298  [pdf, other

    cs.CV eess.IV

    Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

    Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

    Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score-an all-around LMM-based no-reference IQA (NR-IQA)… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  16. arXiv:2405.14058  [pdf, other

    cs.AI cs.LG eess.SY

    Formally Verifying Deep Reinforcement Learning Controllers with Lyapunov Barrier Certificates

    Authors: Udayan Mandal, Guy Amir, Haoze Wu, Ieva Daukantas, Fletcher Lee Newell, Umberto J. Ravaioli, Baoluo Meng, Michael Durling, Milan Ganai, Tobey Shim, Guy Katz, Clark Barrett

    Abstract: Deep reinforcement learning (DRL) is a powerful machine learning paradigm for generating agents that control autonomous systems. However, the "black box" nature of DRL agents limits their deployment in real-world safety-critical applications. A promising approach for providing strong guarantees on an agent's behavior is to use Neural Lyapunov Barrier (NLB) certificates, which are learned functions… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  17. arXiv:2405.10825  [pdf, other

    eess.SY cs.LG

    Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

    Authors: Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu

    Abstract: Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks bas… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  18. arXiv:2405.10606  [pdf, other

    eess.SP

    Carrier Aggregation Enabled MIMO-OFDM Integrated Sensing and Communication

    Authors: Haotian Liu, Zhiqing Wei, Jinghui Piao, Huici Wu, Xingwang Li, Zhiyong Feng

    Abstract: In the evolution towards the forthcoming era of sixth-generation (6G) mobile communication systems characterized by ubiquitous intelligence, integrated sensing and communication (ISAC) is in a phase of burgeoning development. However, the capabilities of communication and sensing within single frequency band fall short of meeting the escalating demands. To this end, this paper introduces a carrier… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 13page, 9figures, Submitted to IEEE Transactions on Wireless Communications

  19. arXiv:2405.09179  [pdf, other

    eess.SP

    Integrated Sensing and Communication Enabled Cooperative Passive Sensing Using Mobile Communication System

    Authors: Zhiqing Wei, Haotian Liu, Hujun Li, Wangjun Jiang, Zhiyong Feng, Huici Wu, Ping Zhang

    Abstract: Integrated sensing and communication (ISAC) is a potential technology of the sixth-generation (6G) mobile communication system, which enables communication base station (BS) with sensing capability. However, the performance of single-BS sensing is limited, which can be overcome by multi-BS cooperative sensing. There are three types of multi-BS cooperative sensing, including cooperative active sens… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 figures, Submitted to IEEE Transactions on Mobile Computing

  20. arXiv:2405.08745  [pdf, other

    eess.IV cs.CV cs.MM

    Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

    Authors: Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous researches that leverage pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQ… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  21. arXiv:2405.02873  [pdf, other

    eess.SP

    Target Localization with Macro and Micro Base Stations Cooperative Sensing

    Authors: Haotian Liu, Zhiqing Wei, Furong Yang, Huici Wu, Kaifeng Han, Zhiyong Feng

    Abstract: Addressing the communication and sensing demands of sixth-generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered traction in academia and industry. With the sensing limitation of single base station (BS), multi-BS cooperative sensing is regarded as a promising solution. The coexistence and overlapped coverage of macro BS (MBS) and micro BS (MiBS) are… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 7 pages 6 figures, submitted to 2024 IEEE GLOBECOM

  22. arXiv:2405.01258  [pdf, other

    cs.CV cs.RO eess.IV

    Towards Consistent Object Detection via LiDAR-Camera Synergy

    Authors: Kai Luo, Hao Wu, Kefu Yi, Kailun Yang, Wei Hao, Rongdong Hu

    Abstract: As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Integrating the two most common types of sensory data, images, and point clouds, can enhance detection accuracy. However, currently, no model exists that can simultaneously detect an object's position in both point clouds and images and ascertain their corresponding relatio… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: The source code will be made publicly available at https://github.com/xifen523/COD

  23. arXiv:2404.15854  [pdf, other

    cs.CR cs.LG cs.SD eess.AS

    CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning

    Authors: Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu

    Abstract: The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE TDSC

  24. arXiv:2404.14862  [pdf, other

    eess.SP

    Deep Learning Based Multi-Node ISAC 4D Environmental Reconstruction with Uplink- Downlink Cooperation

    Authors: Bohao Lu, Zhiqing Wei, Huici Wu, Xinrui Zeng, Lin Wang, Xi Lu, Dongyang Mei, Zhiyong Feng

    Abstract: Utilizing widely distributed communication nodes to achieve environmental reconstruction is one of the significant scenarios for Integrated Sensing and Communication (ISAC) and a crucial technology for 6G. To achieve this crucial functionality, we propose a deep learning based multi-node ISAC 4D environment reconstruction method with Uplink-Downlink (UL-DL) cooperation, which employs virtual apert… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 13 pages,21 figures,4 tables

  25. arXiv:2404.11313  [pdf, other

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  26. arXiv:2404.09385  [pdf, other

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,… ▽ More

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The Arxiv version is preferred

  27. arXiv:2403.13615  [pdf, other

    cs.IT eess.SP

    MIMO Channel as a Neural Function: Implicit Neural Representations for Extreme CSI Compression in Massive MIMO Systems

    Authors: Haotian Wu, Maojun Zhang, Yulin Shao, Krystian Mikolajczyk, Deniz Gündüz

    Abstract: Acquiring and utilizing accurate channel state information (CSI) can significantly improve transmission performance, thereby holding a crucial role in realizing the potential advantages of massive multiple-input multiple-output (MIMO) technology. Current prevailing CSI feedback approaches improve precision by employing advanced deep-learning methods to learn representative CSI features for a subse… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    MSC Class: 94A24 ACM Class: E.4

  28. arXiv:2403.12620  [pdf, ps, other

    eess.SP

    Near-Field Channel Estimation in Dual-Band XL-MIMO with Side Information-Assisted Compressed Sensing

    Authors: Haochen Wu, Liyang Lu, Zhaocheng Wang

    Abstract: Near-field communication comes to be an indispensable part of the future sixth generation (6G) communications at the arrival of the forth-coming deployment of extremely large-scale multiple-input-multiple-output (XL-MIMO) systems. Due to the substantial number of antennas, the electromagnetic radiation field is modeled by the spherical waves instead of the conventional planar waves, leading to sev… ▽ More

    Submitted 16 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: This paper has been submitted to IEEE Transactions on Communications (IEEE TCOM)

  29. arXiv:2403.10805  [pdf, other

    cs.SD cs.AI cs.CV cs.GR cs.HC eess.AS

    Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference

    Authors: Fan Zhang, Zhaohan Wang, Xin Lyu, Siyuan Zhao, Mengjian Li, Weidong Geng, Naye Ji, Hui Du, Fuxing Gao, Hao Wu, Shunman Li

    Abstract: Speech-driven gesture generation is an emerging field within virtual human creation. However, a significant challenge lies in accurately determining and processing the multitude of input features (such as acoustic, semantic, emotional, personality, and even subtle unknown features). Traditional approaches, reliant on various explicit feature inputs and complex multimodal processing, constrain the… ▽ More

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 12 pages,

  30. arXiv:2403.10613  [pdf, other

    eess.SP cs.IT

    Process-and-Forward: Deep Joint Source-Channel Coding Over Cooperative Relay Networks

    Authors: Chenghong Bian, Yulin Shao, Haotian Wu, Emre Ozfatura, Deniz Gunduz

    Abstract: This paper introduces an innovative deep joint source-channel coding (DeepJSCC) approach to image transmission over a cooperative relay channel. The relay either amplifies and forwards a scaled version of its received signal, referred to as DeepJSCC-AF, or leverages neural networks to extract relevant features about the source signal before forwarding it to the destination, which we call DeepJSCC-… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Submitted for possible IEEE journal

  31. arXiv:2403.02565  [pdf, other

    eess.SP

    Deep Cooperation in ISAC System: Resource, Node and Infrastructure Perspectives

    Authors: Zhiqing Wei, Haotian Liu, Zhiyong Feng, Huici Wu, Fan Liu, Qixun Zhang, Yucong Du

    Abstract: With the emerging Integrated Sensing and Communication (ISAC) technique, exploiting the mobile communication system with multi-domain resources, multiple network elements, and large-scale infrastructures to realize cooperative sensing is a crucial approach satisfying the requirements of high-accuracy and large-scale sensing in IoE. In this article, the deep cooperation in ISAC system including thr… ▽ More

    Submitted 29 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 8 pages and 6 figures, Accepted by IEEE Internet of Things Magazine

  32. arXiv:2403.00897  [pdf, other

    eess.IV astro-ph.GA cs.AI cs.CV cs.LG

    VisRec: A Semi-Supervised Approach to Radio Interferometric Data Reconstruction

    Authors: Ruoqi Wang, Haitao Wang, Qiong Luo, Feng Wang, Hejun Wu

    Abstract: Radio telescopes produce visibility data about celestial objects, but these data are sparse and noisy. As a result, images created on raw visibility data are of low quality. Recent studies have used deep learning models to reconstruct visibility data to get cleaner images. However, these methods rely on a substantial amount of labeled training data, which requires significant labeling effort from… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

  33. arXiv:2403.00671  [pdf, other

    eess.IV

    Asymmetric Feature Fusion for Image Retrieval

    Authors: Hui Wu, Min Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li

    Abstract: In asymmetric retrieval systems, models with different capacities are deployed on platforms with different computational and storage resources. Despite the great progress, existing approaches still suffer from a dilemma between retrieval efficiency and asymmetric accuracy due to the limited capacity of the lightweight query model. In this work, we propose an Asymmetric Feature Fusion (AFF) paradig… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

  34. arXiv:2403.00648  [pdf, other

    eess.IV

    Structure Similarity Preservation Learning for Asymmetric Image Retrieval

    Authors: Hui Wu, Min Wang, Wengang Zhou, Houqiang Li

    Abstract: Asymmetric image retrieval is a task that seeks to balance retrieval accuracy and efficiency by leveraging lightweight and large models for the query and gallery sides, respectively. The key to asymmetric image retrieval is realizing feature compatibility between different models. Despite the great progress, most existing approaches either rely on classifiers inherited from gallery models or simpl… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

  35. arXiv:2402.16907  [pdf, other

    eess.IV cs.CV cs.LG

    Diffusion Posterior Proximal Sampling for Image Restoration

    Authors: Hongjie Wu, Linchao He, Mingqin Zhang, Dongdong Chen, Kunming Luo, Mengting Luo, Ji-Zhe Zhou, Hu Chen, Jiancheng Lv

    Abstract: Diffusion models have demonstrated remarkable efficacy in generating high-quality samples. Existing diffusion-based image restoration algorithms exploit pre-trained diffusion models to leverage data priors, yet they still preserve elements inherited from the unconditional generation paradigm. These strategies initiate the denoising process with pure white noise and incorporate random noise at each… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

  36. arXiv:2402.16749  [pdf, other

    cs.CV cs.AI eess.IV

    MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

    Authors: Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang

    Abstract: With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrate. In recent years, the rapid development of the Large Multimodal Model (LMM) has made it possible to balance these two goals. To… ▽ More

    Submitted 17 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 13 page, 11 figures, 4 tables

  37. arXiv:2402.13236  [pdf, other

    eess.AS cs.SD

    Towards audio language modeling -- an overview

    Authors: Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-yi Lee

    Abstract: Neural audio codecs are initially introduced to compress audio data into compact codes to reduce transmission latency. Researchers recently discovered the potential of codecs as suitable tokenizers for converting continuous audio into discrete codes, which can be employed to develop audio language models (LMs). Numerous high-performance neural audio codecs and codec-based LMs have been developed.… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  38. arXiv:2402.13071  [pdf, other

    eess.AS cs.SD

    Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

    Authors: Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-yi Lee

    Abstract: The sound codec's dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance. Recent years have witnessed significant developments in codec models. The ideal sound codec should preserve content, paralinguistics, speakers, and audio information. However, the question of which codec achieves optimal sound information preservation remains unanswere… ▽ More

    Submitted 7 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Github: https://github.com/voidful/Codec-SUPERB

  39. arXiv:2402.13018  [pdf, other

    eess.AS cs.SD

    EMO-SUPERB: An In-depth Look at Speech Emotion Recognition

    Authors: Haibin Wu, Huang-Cheng Chou, Kai-Wei Chang, Lucas Goncalves, Jiawei Du, Jyh-Shing Roger Jang, Chi-Chun Lee, Hung-Yi Lee

    Abstract: Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems. However, 80.77% of SER papers yield results that cannot be reproduced. We develop EMO-SUPERB, short for EMOtion Speech Universal PERformance Benchmark, which aims to enhance open-source initiatives for SER. EMO-SUPERB includes a user-friendly codebase to leverage 15 state-of-the-art speech self-supervi… ▽ More

    Submitted 12 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: webpage: https://emosuperb.github.io/

  40. arXiv:2402.09421  [pdf, other

    eess.SP cs.LG

    EEG Based Generative Depression Discriminator

    Authors: Ziming Mao, Hao wu, Yongxi Tan, Yuhe Jin

    Abstract: Depression is a very common but serious mood disorder.In this paper, We built a generative detection network(GDN) in accordance with three physiological laws. Our aim is that we expect the neural network to learn the relevant brain activity based on the EEG signal and, at the same time, to regenerate the target electrode signal based on the brain activity. We trained two generators, the first one… ▽ More

    Submitted 19 January, 2024; originally announced February 2024.

  41. arXiv:2402.08987  [pdf, other

    eess.IV cs.CV

    Multi-modality transrectal ultrasound video classification for identification of clinically significant prostate cancer

    Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zhou, Jianhua Zhou, Yi Wang

    Abstract: Prostate cancer is the most common noncutaneous cancer in the world. Recently, multi-modality transrectal ultrasound (TRUS) has increasingly become an effective tool for the guidance of prostate biopsies. With the aim of effectively identifying prostate cancer, we propose a framework for the classification of clinically significant prostate cancer (csPCa) from multi-modality TRUS videos. The frame… ▽ More

    Submitted 17 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  42. arXiv:2402.05390  [pdf, other

    cs.NI eess.SP

    Integrated Sensing and Communication Driven Digital Twin for Intelligent Machine Network

    Authors: Zhiqing Wei, Yucong Du, Qixun Zhang, Wangjun Jiang, Yanpeng Cui, Zeyang Meng, Huici Wu, Zhiyong Feng

    Abstract: Intelligent machines (IMs), including industrial machines, unmanned aerial vehicles (UAVs), and unmanned vehicles, etc., could perform effective cooperation in complex environment when they form IM network. The efficient environment sensing and communication are crucial for IM network, enabling the real-time and stable control of IMs. With the emergence of integrated sensing and communication (ISA… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures, 1 Table

    ACM Class: C.2.1

  43. arXiv:2401.10959  [pdf

    eess.SY

    Machine learning classification of power converter control mode

    Authors: Rabah Ouali, Jean-Yves Dieulot, Pascal Yim, Xavier Guillaud, Frédéric Colas, Yang Wu, Heng Wu

    Abstract: To ensure the proper functioning of the current and future electrical grid, it is necessary for Transmission System Operators (TSOs) to verify that energy providers comply with the grid code and specifications provided by TSOs. A lot of energy production are conntected to the grid through a power electronic inverter. Grid Forming (GFM) and Grid Following (GFL) are the two types of operating modes… ▽ More

    Submitted 19 January, 2024; originally announced January 2024.

  44. arXiv:2401.04935  [pdf, other

    cs.MM cs.CL cs.SD eess.AS

    Learning Audio Concepts from Counterfactual Natural Language

    Authors: Ali Vosoughi, Luca Bondi, Ho-Hsiang Wu, Chenliang Xu

    Abstract: Conventional audio classification relied on predefined classes, lacking the ability to learn from free-form text. Recent methods unlock learning joint audio-text embeddings from raw audio-text pairs describing audio in natural language. Despite recent advancements, there is little exploration of systematic methods to train models for recognizing sound events and sources in alternative scenarios, s… ▽ More

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  45. arXiv:2401.01117  [pdf, other

    cs.CV eess.IV

    Q-Refine: A Perceptual Quality Refiner for AI-Generated Image

    Authors: Chunyi Li, Haoning Wu, Zicheng Zhang, Hongkun Hao, Kaiwei Zhang, Lei Bai, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: With the rapid evolution of the Text-to-Image (T2I) model in recent years, their unsatisfactory generation result has become a challenge. However, uniformly refining AI-Generated Images (AIGIs) of different qualities not only limited optimization capabilities for low-quality AIGIs but also brought negative optimization to high-quality AIGIs. To address this issue, a quality-award refiner named Q-R… ▽ More

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: 6 pages, 5 figures

  46. arXiv:2312.15416  [pdf, other

    eess.SY

    On Completeness of SDP-Based Barrier Certificate Synthesis over Unbounded Domains

    Authors: Hao Wu, Shenghua Feng, Ting Gan, Jie Wang, Bican Xia, Naijun Zhan

    Abstract: Barrier certificates, serving as differential invariants that witness system safety, play a crucial role in the verification of cyber-physical systems (CPS). Prevailing computational methods for synthesizing barrier certificates are based on semidefinite programming (SDP) by exploiting Putinar Positivstellensatz. Consequently, these approaches are limited by the Archimedean condition, which requir… ▽ More

    Submitted 8 July, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

    Comments: Accepted by the 26th international symposium on Formal Methods (FM2024). 18 pages, 1 figure. Updated on 2024.7.9, fix two typos in Lemma 1 and Equation 10

  47. arXiv:2312.08622  [pdf, other

    eess.AS cs.LG cs.SD

    Scalable Ensemble-based Detection Method against Adversarial Attacks for speaker verification

    Authors: Haibin Wu, Heng-Cheng Kuo, Yu Tsao, Hung-yi Lee

    Abstract: Automatic speaker verification (ASV) is highly susceptible to adversarial attacks. Purification modules are usually adopted as a pre-processing to mitigate adversarial noise. However, they are commonly implemented across diverse experimental settings, rendering direct comparisons challenging. This paper comprehensively compares mainstream purification techniques in a unified framework. We find the… ▽ More

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Submitted to 2024 ICASSP

  48. arXiv:2312.02170  [pdf, other

    cs.IT eess.SP

    A 5G DMRS-based Signal for Integrated Sensing and Communication System

    Authors: Zhiqing Wei, Fengyun Li, Haotian Liu, Xu Chen, Huici Wu, Kaifeng Han, Zhiyong Feng

    Abstract: Integrated sensing and communication (ISAC) is considered as the potential key technology of the future mobile communication systems. The signal design is fundamental for the ISAC system. The reference signals in mobile communication systems have good detection performance, which is worth further research. Existing studies applied the single reference signal to radar sensing. In this paper, a mult… ▽ More

    Submitted 2 March, 2024; v1 submitted 1 November, 2023; originally announced December 2023.

  49. arXiv:2312.01785  [pdf

    eess.SY

    Closed-Form Solutions for Grid-Forming Converters: A Design-Oriented Study

    Authors: Fangzhou Zhao, Tianhua Zhu, Lennart Harnefors, Bo Fan, Heng Wu, Zichao Zhou, Yin Sun, Xiongfei Wang

    Abstract: This paper derives closed-form solutions for grid-forming converters with power synchronization control (PSC) by subtly simplifying and factorizing the complex closed-loop models. The solutions can offer clear analytical insights into control-loop interactions, enabling guidelines for robust controller design. It is proved that 1) the proportional gains of PSC and alternating voltage control (AVC)… ▽ More

    Submitted 4 December, 2023; originally announced December 2023.

  50. Throughput Maximization for Intelligent Refracting Surface Assisted mmWave High-Speed Train Communications

    Authors: Jing Li, Yong Niu, Hao Wu, Bo Ai, Ruisi He, Ning Wang, Sheng Chen

    Abstract: With the increasing demands from passengers for data-intensive services, millimeter-wave (mmWave) communication is considered as an effective technique to release the transmission pressure on high speed train (HST) networks. However, mmWave signals ncounter severe losses when passing through the carriage, which decreases the quality of services on board. In this paper, we investigate an intelligen… ▽ More

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 13 pages, 7 figures, IEEE Internet of Things Journal