Skip to main content

Showing 1–50 of 1,401 results for author: Wu, H

  1. arXiv:2407.13278  [pdf, other

    cs.LG

    Deep Time Series Models: A Comprehensive Survey and Benchmark

    Authors: Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Mingsheng Long, Jianmin Wang

    Abstract: Time series, characterized by a sequence of data points arranged in a discrete-time order, are ubiquitous in real-world applications. Different from other modalities, time series present unique challenges due to their complex and dynamic nature, including the entanglement of nonlinear patterns and time-variant trends. Analyzing time series data is of great significance in real-world scenarios and… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: \

  2. arXiv:2407.13221  [pdf, other

    cs.CV

    Multimodal Label Relevance Ranking via Reinforcement Learning

    Authors: Taian Guo, Taolin Zhang, Haoqian Wu, Hanjun Li, Ruizhi Qiao, Xing Sun

    Abstract: Conventional multi-label recognition methods often focus on label confidence, frequently overlooking the pivotal role of partial order relations consistent with human preference. To resolve these issues, we introduce a novel method for multimodal label relevance ranking, named Label Relevance Ranking with Proximal Policy Optimization (LR\textsuperscript{2}PPO), which effectively discerns partial o… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV2024

  3. arXiv:2407.13193  [pdf, other

    cs.CL

    Retrieval-Augmented Generation for Natural Language Processing: A Survey

    Authors: Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

    Abstract: Large language models (LLMs) have demonstrated great success in various fields, benefiting from their huge amount of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and lacking domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge database… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.12229  [pdf, other

    eess.AS cs.AI eess.SP

    Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

    Authors: Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda

    Abstract: People change their tones of voice, often accompanied by nonverbal vocalizations (NVs) such as laughter and cries, to convey rich emotions. However, most text-to-speech (TTS) systems lack the capability to generate speech with rich emotions, including NVs. This paper introduces EmoCtrl-TTS, an emotion-controllable zero-shot TTS that can generate highly emotional speech with NVs for any speaker. Em… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: See https://aka.ms/emoctrl-tts for demo samples

  5. arXiv:2407.11882  [pdf, other

    cs.CR

    Enhancing Covert Communication in Relay Systems Using Multi-Antenna Technique

    Authors: He Zhu, Huihui Wu, Wei Su, Xiaohong Jiang

    Abstract: This paper exploits the multi-antenna technique to enhance the covert communication performance in a relay system, where a source S conducts covert communication with a destination D via a relay R, subjecting to the detections of transmissions in the two hops from a single-antenna warden W. To demonstrate the performance gain from adopting the multi-antenna technique, we first consider the scenari… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  6. arXiv:2407.10956  [pdf, other

    cs.AI cs.CL

    Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

    Authors: Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

    Abstract: Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivit… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 34 pages, 14 figures, 10 tables

  7. arXiv:2407.10810  [pdf, other

    cs.CV cs.AI cs.AR cs.LG

    FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries

    Authors: Yuqi Jiang, Xudong Lu, Qian Jin, Qi Sun, Hanming Wu, Cheng Zhuo

    Abstract: Intelligence is key to advancing integrated circuit (IC) fabrication. Recent breakthroughs in Large Multimodal Models (LMMs) have unlocked unparalleled abilities in understanding images and text, fostering intelligent fabrication. Leveraging the power of LMMs, we introduce FabGPT, a customized IC fabrication large multimodal model for wafer defect knowledge query. FabGPT manifests expertise in con… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  8. arXiv:2407.10179  [pdf, other

    cs.CV

    CLIP-Guided Networks for Transferable Targeted Attacks

    Authors: Hao Fang, Jiawei Kong, Bin Chen, Tao Dai, Hao Wu, Shu-Tao Xia

    Abstract: Transferable targeted adversarial attacks aim to mislead models into outputting adversary-specified predictions in black-box scenarios. Recent studies have introduced \textit{single-target} generative attacks that train a generator for each target class to generate highly transferable perturbations, resulting in substantial computational overhead when handling multiple classes. \textit{Multi-targe… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  9. arXiv:2407.09873  [pdf, other

    cs.IT cs.AI

    Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge

    Authors: Hai Wu, Xu Chen, Kaibin Huang

    Abstract: The emergence of large-scale foundation models (FoMo's) that can perform human-like intelligence motivates their deployment at the network edge for devices to access state-of-the-art artificial intelligence. For better user experiences, the pre-trained FoMo's need to be adapted to specialized downstream tasks through fine-tuning techniques. To transcend a single device's memory and computation lim… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  10. arXiv:2407.08958  [pdf, other

    cs.SE

    Towards Practical and Useful Automated Program Repair for Debugging

    Authors: Qi Xin, Haojun Wu, Steven P. Reiss, Jifeng Xuan

    Abstract: Current automated program repair (APR) techniques are far from being practical and useful enough to be considered for realistic debugging. They rely on unrealistic assumptions including the requirement of a comprehensive suite of test cases as the correctness criterion and frequent program re-execution for patch validation; they are not fast; and their ability of repairing the commonly arising com… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2407.08561  [pdf, other

    cs.CV

    MapLocNet: Coarse-to-Fine Feature Registration for Visual Re-Localization in Navigation Maps

    Authors: Hang Wu, Zhenghao Zhang, Siyuan Lin, Xiangru Mu, Qiang Zhao, Ming Yang, Tong Qin

    Abstract: Robust localization is the cornerstone of autonomous driving, especially in challenging urban environments where GPS signals suffer from multipath errors. Traditional localization approaches rely on high-definition (HD) maps, which consist of precisely annotated landmarks. However, building HD map is expensive and challenging to scale up. Given these limitations, leveraging navigation maps has eme… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: IROS 2024 (Oral)

  12. arXiv:2407.08526  [pdf, other

    cs.CV

    BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight

    Authors: Hang Wu, Zhenghao Zhang, Siyuan Lin, Tong Qin, Jin Pan, Qiang Zhao, Chunjing Xu, Ming Yang

    Abstract: Bird's-eye-view (BEV) representation is crucial for the perception function in autonomous driving tasks. It is difficult to balance the accuracy, efficiency and range of BEV representation. The existing works are restricted to a limited perception range within 50 meters. Extending the BEV representation range can greatly benefit downstream tasks such as topology reasoning, scene understanding, and… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: IEEE IV 2024

  13. arXiv:2407.07710  [pdf, ps, other

    cs.CR cs.IT math.NT

    On the differential and Walsh spectra of $x^{2q+1}$ over $\mathbb{F}_{q^2}$

    Authors: Sihem Mesnager, Huawei Wu

    Abstract: Let $q$ be an odd prime power and let $\mathbb{F}_{q^2}$ be the finite field with $q^2$ elements. In this paper, we determine the differential spectrum of the power function $F(x)=x^{2q+1}$ over $\mathbb{F}_{q^2}$. When the characteristic of $\mathbb{F}_{q^2}$ is $3$, we also determine the value distribution of the Walsh spectrum of $F$, showing that it is $4$-valued, and use the obtained result t… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  14. arXiv:2407.07504  [pdf, other

    cs.CV

    Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder

    Authors: Kun Wu, Zhiguo Jiang, Kunming Tang, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng

    Abstract: Large-scale pre-training models have promoted the development of histopathology image analysis. However, existing self-supervised methods for histopathology images focus on learning patch features, while there is still a lack of available pre-training models for WSI-level feature learning. In this paper, we propose a novel self-supervised learning framework for pan-cancer WSI-level representation… ▽ More

    Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  15. arXiv:2407.07088  [pdf, other

    cs.AI cs.LO eess.SY

    Safe and Reliable Training of Learning-Based Aerospace Controllers

    Authors: Udayan Mandal, Guy Amir, Haoze Wu, Ieva Daukantas, Fletcher Lee Newell, Umberto Ravaioli, Baoluo Meng, Michael Durling, Kerianne Hobbs, Milan Ganai, Tobey Shim, Guy Katz, Clark Barrett

    Abstract: In recent years, deep reinforcement learning (DRL) approaches have generated highly successful controllers for a myriad of complex domains. However, the opaque nature of these models limits their applicability in aerospace systems and safety-critical domains, in which a single mistake can have dire consequences. In this paper, we present novel advancements in both the training and verification of… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 10 pages, 3 figures

  16. arXiv:2407.06546  [pdf, other

    cs.CV cs.RO

    Exploring the Causality of End-to-End Autonomous Driving

    Authors: Jiankun Li, Hao Li, Jiangjiang Liu, Zhikang Zou, Xiaoqing Ye, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

    Abstract: Deep learning-based models are widely deployed in autonomous driving areas, especially the increasingly noticed end-to-end solutions. However, the black-box property of these models raises concerns about their trustworthiness and safety for autonomous driving, and how to debug the causality has become a pressing concern. Despite some existing research on the explainability of autonomous driving, t… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  17. arXiv:2407.05679  [pdf, other

    cs.CV cs.AI

    BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space

    Authors: Yumeng Zhang, Shi Gong, Kaixin Xiong, Xiaoqing Ye, Xiao Tan, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

    Abstract: World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: the multi-modal tokenizer and the latent BEV sequence… ▽ More

    Submitted 18 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: 10 pages

  18. arXiv:2407.05413  [pdf, other

    cs.AI cs.CL cs.LG

    SBoRA: Low-Rank Adaptation with Regional Weight Updates

    Authors: Lai-Man Po, Yuyang Liu, Haoxuan Wu, Tianqi Zhang, Wing-Yin Yu, Zeyu Jiang, Kun Li

    Abstract: This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models that builds upon the pioneering works of Low-Rank Adaptation (LoRA) and Orthogonal Adaptation. SBoRA further reduces the computational and memory requirements of LoRA while enhancing learning performance. By leveraging orthogonal standard basis vectors to initialize one of… ▽ More

    Submitted 10 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: 15 pages, 2 figures

  19. arXiv:2407.04737  [pdf, other

    eess.SP cs.AI

    Hierarchical Decoupling Capacitor Optimization for Power Distribution Network of 2.5D ICs with Co-Analysis of Frequency and Time Domains Based on Deep Reinforcement Learning

    Authors: Yuanyuan Duan, Haiyang Feng, Zhiping Yu, Hanming Wu, Leilai Shao, Xiaolei Zhu

    Abstract: With the growing need for higher memory bandwidth and computation density, 2.5D design, which involves integrating multiple chiplets onto an interposer, emerges as a promising solution. However, this integration introduces significant challenges due to increasing data rates and a large number of I/Os, necessitating advanced optimization of the power distribution networks (PDNs) both on-chip and on… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  20. arXiv:2407.04217  [pdf, other

    cs.DB cs.IR

    An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models

    Authors: Mengzhao Wang, Haotian Wu, Xiangyu Ke, Yunjun Gao, Xiaoliang Xu, Lu Chen

    Abstract: Retrieval-augmented Large Language Models (LLMs) have reshaped traditional query-answering systems, offering unparalleled user experiences. However, existing retrieval techniques often struggle to handle multi-modal query contexts. In this paper, we present an interactive Multi-modal Query Answering (MQA) system, empowered by our newly developed multi-modal retrieval framework and navigation graph… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: This demo paper has been accepted by VLDB 2024

  21. arXiv:2407.04053  [pdf, other

    cs.DC

    Edge AI: A Taxonomy, Systematic Review and Future Directions

    Authors: Sukhpal Singh Gill, Muhammed Golec, Jianmin Hu, Minxian Xu, Junhui Du, Huaming Wu, Guneet Kaur Walia, Subramaniam Subramanian Murugesan, Babar Ali, Mohit Kumar, Kejiang Ye, Prabal Verma, Surendra Kumar, Felix Cuadrado, Steve Uhlig

    Abstract: Edge Artificial Intelligence (AI) incorporates a network of interconnected systems and devices that receive, cache, process, and analyse data in close communication with the location where the data is captured with AI technology. Recent advancements in AI efficiency, the widespread use of Internet of Things (IoT) devices, and the emergence of edge computing have unlocked the enormous scope of Edge… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Preprint Version, 18 Figures

  22. arXiv:2407.02235  [pdf

    cs.CL

    Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

    Authors: Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, Wenting Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou

    Abstract: Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning is incompetent to reflect the real-world diagnostic challenge in the volumetric 3D anatomy. To mitigate three crucial limitation aspects in the existing literature, includin… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 figures, 5 supplementary figures, 8 supplementary tables

  23. arXiv:2407.01235  [pdf, other

    cs.CR

    A Fingerprint for Large Language Models

    Authors: Zhiguang Yang, Hanzhou Wu

    Abstract: Recent advances show that scaling a pre-trained language model could achieve state-of-the-art performance on many downstream tasks, prompting large language models (LLMs) to become a hot research topic in the field of artificial intelligence. However, due to the resource-intensive nature of training LLMs from scratch, it is urgent and crucial to protect the intellectual property of LLMs against in… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: https://scholar.google.com/citations?user=IdiF7M0AAAAJ&hl=en

  24. arXiv:2407.01178  [pdf, other

    cs.CL cs.AI cs.LG

    $\text{Memory}^3$: Language Modeling with Explicit Memory

    Authors: Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

    Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowled… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 68T50 ACM Class: I.2.7

  25. arXiv:2407.00678  [pdf, other

    eess.IV cs.CV

    A Review of Image Processing Methods in Prostate Ultrasound

    Authors: Haiqiao Wang, Hong Wu, Zhuoyuan Wang, Peiyan Yue, Dong Ni, Pheng-Ann Heng, Yi Wang

    Abstract: Prostate cancer (PCa) poses a significant threat to men's health, with early diagnosis being crucial for improving prognosis and reducing mortality rates. Transrectal ultrasound (TRUS) plays a vital role in the diagnosis and image-guided intervention of PCa.To facilitate physicians with more accurate and efficient computer-assisted diagnosis and interventions, many image processing algorithms in T… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  26. arXiv:2407.00625  [pdf, other

    cs.LO

    Nonlinear Craig Interpolant Generation over Unbounded Domains by Separating Semialgebraic Sets

    Authors: Hao Wu, Jie Wang, Bican Xia, Xiakun Li, Naijun Zhan, Ting Gan

    Abstract: Interpolation-based techniques become popular in recent years, as they can improve the scalability of existing verification techniques due to their inherent modularity and local reasoning capabilities. Synthesizing Craig interpolants is the cornerstone of these techniques. In this paper, we investigate nonlinear Craig interpolant synthesis for two polynomial formulas of the general form, essenti… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 21 pages (with appendix); accepted by the 26th International Symposium on Formal Methods (FM2024)

  27. Parametric Primitive Analysis of CAD Sketches with Vision Transformer

    Authors: Xiaogang Wang, Liang Wang, Hongyu Wu, Guoqiang Xiao, Kai Xu

    Abstract: The design and analysis of Computer-Aided Design (CAD) sketches play a crucial role in industrial product design, primarily involving CAD primitives and their inter-primitive constraints. To address challenges related to error accumulation in autoregressive models and the complexities associated with self-supervised model design for this task, we propose a two-stage network framework. This framewo… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  28. arXiv:2407.00033  [pdf, other

    q-bio.NC cs.AI

    Uncovering cognitive taskonomy through transfer learning in masked autoencoder-based fMRI reconstruction

    Authors: Youzhi Qu, Junfeng Xia, Xinyao Jian, Wendu Li, Kaining Peng, Zhichao Liang, Haiyan Wu, Quanying Liu

    Abstract: Data reconstruction is a widely used pre-training task to learn the generalized features for many downstream tasks. Although reconstruction tasks have been applied to neural signal completion and denoising, neural signal reconstruction is less studied. Here, we employ the masked autoencoder (MAE) model to reconstruct functional magnetic resonance imaging (fMRI) data, and utilize a transfer learnin… ▽ More

    Submitted 24 May, 2024; originally announced July 2024.

  29. arXiv:2406.19482  [pdf, other

    cs.CL

    xTower: A Multilingual LLM for Explaining and Correcting Translation Errors

    Authors: Marcos Treviso, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, José Pombal, Tania Vaz, Helena Wu, Beatriz Silva, Daan van Stigt, André F. T. Martins

    Abstract: While machine translation (MT) systems are achieving increasingly strong performance on benchmarks, they often produce translations with errors and anomalies. Understanding these errors can potentially help improve the translation quality and user experience. This paper introduces xTower, an open large language model (LLM) built on top of TowerBase designed to provide free-text explanations for tr… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  30. arXiv:2406.18530  [pdf, other

    cs.CV

    MatchTime: Towards Automatic Soccer Game Commentary Generation

    Authors: Jiayuan Rao, Haoning Wu, Chang Liu, Yanfeng Wang, Weidi Xie

    Abstract: Soccer is a globally popular sport with a vast audience, in this paper, we consider constructing an automatic soccer game commentary model to improve the audiences' viewing experience. In general, we make the following contributions: First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Technical Report; Project Page: https://haoningwu3639.github.io/MatchTime/

  31. arXiv:2406.18145  [pdf, other

    cs.CR cs.LG

    Beyond Statistical Estimation: Differentially Private Individual Computation via Shuffling

    Authors: Shaowei Wang, Changyu Dong, Xiangfu Song, Jin Li, Zhili Zhou, Di Wang, Han Wu

    Abstract: In data-driven applications, preserving user privacy while enabling valuable computations remains a critical challenge. Technologies like Differential Privacy (DP) have been pivotal in addressing these concerns. The shuffle model of DP requires no trusted curators and can achieve high utility by leveraging the privacy amplification effect yielded from shuffling. These benefits have led to signific… ▽ More

    Submitted 11 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  32. arXiv:2406.17974  [pdf, other

    cs.CL cs.CV

    Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts

    Authors: Xuyang Wu, Yuan Wang, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang

    Abstract: Large vision-language models (LVLMs) have recently achieved significant progress, demonstrating strong capabilities in open-world visual understanding. However, it is not yet clear how LVLMs address demographic biases in real life, especially the disparities across attributes such as gender, skin tone, and age. In this paper, we empirically investigate \emph{visual fairness} in several mainstream… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  33. arXiv:2406.17757  [pdf, other

    eess.SY cs.RO

    Automatic Parameter Tuning of Self-Driving Vehicles

    Authors: Hung-Ju Wu, Vladislav Nenchev, Christian Rathgeber

    Abstract: Modern automated driving solutions utilize trajectory planning and control components with numerous parameters that need to be tuned for different driving situations and vehicle types to achieve optimal performance. This paper proposes a method to automatically tune such parameters to resemble expert demonstrations. We utilize a cost function which captures deviations of the closed-loop operation… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Preprint to appear at the 2024 IEEE Conference on Control Technology and Applications (CCTA)

  34. arXiv:2406.16137  [pdf, other

    cs.CV

    MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling

    Authors: Jian Yang, Jiakun Li, Guoming Li, Zhen Shen, Huai-Yu Wu, Zhaoxin Fan, Heng Huang

    Abstract: Multi-view hand mesh reconstruction is a critical task for applications in virtual reality and human-computer interaction, but it remains a formidable challenge. Although existing multi-view hand reconstruction methods achieve remarkable accuracy, they typically come with an intensive computational burden that hinders real-time inference. To this end, we propose MLPHand, a novel method designed fo… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  35. arXiv:2406.15045  [pdf, other

    cs.CL

    Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction

    Authors: Jinge Wu, Zhaolong Wu, Abul Hasan, Yunsoo Kim, Jason P. Y. Cheung, Teng Zhang, Honghan Wu

    Abstract: This study proposes an approach for error correction in clinical radiology reports, leveraging large language models (LLMs) and retrieval-augmented generation (RAG) techniques. The proposed framework employs internal and external retrieval mechanisms to extract relevant medical entities and relations from the report and external knowledge sources. A three-stage inference process is introduced, dec… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  36. arXiv:2406.14540  [pdf, other

    cs.RO cs.AI cs.CV

    IRASim: Learning Interactive Real-Robot Action Simulators

    Authors: Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, Tao Kong

    Abstract: Scalable robot learning in the real world is limited by the cost and safety issues of real robots. In addition, rolling out robot trajectories in the real world can be time-consuming and labor-intensive. In this paper, we propose to learn an interactive real-robot action simulator as an alternative. We introduce a novel method, IRASim, which leverages the power of generative models to generate ext… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Opensource, project website: https://gen-irasim.github.io

  37. arXiv:2406.14312  [pdf, other

    cs.CL cs.AI

    Infusing clinical knowledge into tokenisers for language models

    Authors: Abul Hasan, Jinge Wu, Quang Ngoc Nguyen, Salomé Andres, Imane Guellil, Huayu Zhang, Arlene Casey, Beatrice Alex, Bruce Guthrie, Honghan Wu

    Abstract: This study introduces a novel knowledge enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, at initialisation stage, K-Tokeniser populates global representations of tokens based on semantic types of domain concepts (such as drugs or diseases) from either a domain ontology like Unified Medical Language System or the training data of the task related corpus. At t… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 6 figures

  38. arXiv:2406.14069  [pdf, other

    eess.IV cs.CV

    Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound

    Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zou, Jianhua Zhou, Yi Wang

    Abstract: Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classifica… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  39. arXiv:2406.13698  [pdf, other

    cs.CL

    MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

    Authors: Shun Wang, Ge Zhang, Han Wu, Tyler Loakman, Wenhao Huang, Chenghua Lin

    Abstract: Machine Translation (MT) has developed rapidly since the release of Large Language Models and current MT evaluation is performed through comparison with reference human translations or by predicting quality scores from human-labeled data. However, these mainstream evaluation methods mainly focus on fluency and factual reliability, whilst paying little attention to figurative quality. In this paper… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  40. arXiv:2406.13177  [pdf, other

    cs.CR

    Transferable Watermarking to Self-supervised Pre-trained Graph Encoders by Trigger Embeddings

    Authors: Xiangyu Zhao, Hanzhou Wu, Xinpeng Zhang

    Abstract: Recent years have witnessed the prosperous development of Graph Self-supervised Learning (GSSL), which enables to pre-train transferable foundation graph encoders. However, the easy-to-plug-in nature of such encoders makes them vulnerable to copyright infringement. To address this issue, we develop a novel watermarking framework to protect graph encoders in GSSL settings. The key idea is to force… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: https://scholar.google.com/citations?user=IdiF7M0AAAAJ&hl=en

  41. arXiv:2406.12814  [pdf, other

    cs.LG cs.CL cs.CR cs.CV

    Adversarial Attacks on Multimodal Agents

    Authors: Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan

    Abstract: Vision-enabled language models (VLMs) are now used to build autonomous multimodal agents capable of taking actions in real environments. In this paper, we show that multimodal agents raise new safety risks, even though attacking agents is more challenging than prior attacks due to limited access to and knowledge about the environment. Our attacks use adversarial text strings to guide gradient-base… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 19 pages

  42. arXiv:2406.12799  [pdf, ps, other

    cs.DS

    Sample-Based Matroid Prophet Inequalities

    Authors: Hu Fu, Pinyan Lu, Zhihao Gavin Tang, Hongxun Wu, Jinzhao Wu, Qianfan Zhang

    Abstract: We study matroid prophet inequalities when distributions are unknown and accessible only through samples. While single-sample prophet inequalities for special matroids are known, no constant-factor competitive algorithm with even a sublinear number of samples was known for general matroids. Adding more to the stake, the single-sample version of the question for general matroids has close (two-way)… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: To appear at EC'24

  43. arXiv:2406.12577  [pdf, other

    cs.CV

    Cephalometric Landmark Detection across Ages with Prototypical Network

    Authors: Han Wu, Chong Wang, Lanzhuju Mei, Tong Yang, Min Zhu, Dingggang Shen, Zhiming Cui

    Abstract: Automated cephalometric landmark detection is crucial in real-world orthodontic diagnosis. Current studies mainly focus on only adult subjects, neglecting the clinically crucial scenario presented by adolescents whose landmarks often exhibit significantly different appearances compared to adults. Hence, an open question arises about how to develop a unified and effective detection algorithm across… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  44. arXiv:2406.12375  [pdf, other

    cs.LG cs.AI

    GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory

    Authors: Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu

    Abstract: Mixture-of-Experts (MoE) has been demonstrated as an efficient method to scale up models. By dynamically and sparsely selecting activated experts, MoE can effectively reduce computational costs. Despite the success, we observe that many tokens in the MoE models have uncertain routing results. These tokens have nearly equal scores for choosing each expert, and we demonstrate that this uncertainty c… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  45. arXiv:2406.11077  [pdf, other

    cs.CV

    Learning Relighting and Intrinsic Decomposition in Neural Radiance Fields

    Authors: Yixiong Yang, Shilin Hu, Haoyu Wu, Ramon Baldrich, Dimitris Samaras, Maria Vanrell

    Abstract: The task of extracting intrinsic components, such as reflectance and shading, from neural radiance fields is of growing interest. However, current methods largely focus on synthetic scenes and isolated objects, overlooking the complexities of real scenes with backgrounds. To address this gap, our research introduces a method that combines relighting with intrinsic decomposition. By leveraging ligh… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024 Workshop Neural Rendering Intelligence(NRI)

  46. arXiv:2406.10498  [pdf, other

    cs.LG cs.SI

    A Unified Graph Selective Prompt Learning for Graph Neural Networks

    Authors: Bo Jiang, Hao Wu, Ziyan Zhang, Beibei Wang, Jin Tang

    Abstract: In recent years, graph prompt learning/tuning has garnered increasing attention in adapting pre-trained models for graph representation learning. As a kind of universal graph prompt learning method, Graph Prompt Feature (GPF) has achieved remarkable success in adapting pre-trained models for Graph Neural Networks (GNNs). By fixing the parameters of a pre-trained GNN model, the aim of GPF is to mod… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  47. arXiv:2406.09356  [pdf, other

    cs.CV eess.IV

    CMC-Bench: Towards a New Paradigm of Visual Signal Compression

    Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

    Abstract: Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1\% or even lower, which has strong potential applications. However, CMC has certain defects in… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  48. arXiv:2406.09103  [pdf, other

    cs.CL

    Chain-of-Though (CoT) prompting strategies for medical error detection and correction

    Authors: Zhaolong Wu, Abul Hasan, Jinge Wu, Yunsoo Kim, Jason P. Y. Cheung, Teng Zhang, Honghan Wu

    Abstract: This paper describes our submission to the MEDIQA-CORR 2024 shared task for automatically detecting and correcting medical errors in clinical notes. We report results for three methods of few-shot In-Context Learning (ICL) augmented with Chain-of-Thought (CoT) and reason prompts using a large language model (LLM). In the first method, we manually analyse a subset of train and validation dataset to… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: accepted as NAACL workshop

  49. arXiv:2406.08930  [pdf

    cs.LG cs.AI

    Efficient Multi-View Fusion and Flexible Adaptation to View Missing in Cardiovascular System Signals

    Authors: Qihan Hu, Daomiao Wang, Hong Wu, Jian Liu, Cuiwei Yang

    Abstract: The progression of deep learning and the widespread adoption of sensors have facilitated automatic multi-view fusion (MVF) about the cardiovascular system (CVS) signals. However, prevalent MVF model architecture often amalgamates CVS signals from the same temporal step but different views into a unified representation, disregarding the asynchronous nature of cardiovascular events and the inherent… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 16 pages,12 figures

  50. arXiv:2406.08587  [pdf, other

    cs.CL cs.AI cs.LG

    CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

    Authors: Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu

    Abstract: Computer Science (CS) stands as a testament to the intricacies of human intelligence, profoundly advancing the development of artificial intelligence and modern society. However, the current community of large language models (LLMs) overly focuses on benchmarks for analyzing specific foundational skills (e.g. mathematics and code generation), neglecting an all-round evaluation of the computer scie… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Work in progress