Showing 1–50 of 5,923 results for author: Wang, J

  1. arXiv:2407.13520  [pdf, other]

    cs.CV

    EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting

    Authors: Yuchen Weng, Zhengwen Shen, Ruofan Chen, Qi Wang, Jun Wang

    Abstract: 3D deblurring reconstruction techniques have recently seen significant advancements with the development of Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although these techniques can recover relatively clear 3D reconstructions from blurry image inputs, they still face limitations in handling severe blurring and complex camera motion. To address these issues, we propose Event-ass… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13284  [pdf, other]

    cs.IR

    Semantic-aware Representation Learning for Homography Estimation

    Authors: Yuhan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang

    Abstract: Homography estimation is the task of determining the transformation from an image pair. Our approach focuses on employing detector-free feature matching methods to address this issue. Previous work has underscored the importance of incorporating semantic information; however, there is still no efficient way to utilize semantic information. Previous methods suffer from treating the semantics as a…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.13278  [pdf, other]

    cs.LG

    Deep Time Series Models: A Comprehensive Survey and Benchmark

    Authors: Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Mingsheng Long, Jianmin Wang

    Abstract: Time series, characterized by a sequence of data points arranged in a discrete-time order, are ubiquitous in real-world applications. Different from other modalities, time series present unique challenges due to their complex and dynamic nature, including the entanglement of nonlinear patterns and time-variant trends. Analyzing time series data is of great significance in real-world scenarios and… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.13220  [pdf, other]

    eess.AS cs.SD

    MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

    Authors: Huadai Liu, Jialei Wang, Rongjie Huang, Yang Liu, Jiayang Xu, Zhou Zhao

    Abstract: Text-guided diffusion models catalyze a paradigm shift in audio generation, facilitating the adaptability of source audio to conform to specific textual prompts. Recent advancements introduce inversion techniques, like DDIM inversion, to zero-shot editing, exploiting pre-trained diffusion models for audio modification. Nonetheless, our investigation exposes that DDIM inversion suffers from an accu… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2407.13201  [pdf, other]

    cs.SE

    $μ$Drive: User-Controlled Autonomous Driving

    Authors: Kun Wang, Christopher M. Poskitt, Yang Sun, Jun Sun, Jingyi Wang, Peng Cheng, Jiming Chen

    Abstract: Autonomous Vehicles (AVs) rely on sophisticated Autonomous Driving Systems (ADSs) to provide passengers a satisfying and safe journey. The individual preferences of riders play a crucial role in shaping the perception of safety and comfort while they are in the car. Existing ADSs, however, lack mechanisms to systematically capture and integrate rider preferences into their planning modules. To br…

    Submitted 18 July, 2024; originally announced July 2024.

  6. arXiv:2407.13137  [pdf, other]

    cs.CV

    OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

    Authors: Jian Sun, Yuqi Dai, Chi-Man Vong, Qing Xu, Shengbo Eben Li, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems. It realizes ego-vehicle surrounding environment perception by projecting 2D multi-view images into 3D world space. Recently, BEV segmentation has made notable progress, attributed to better view transformation modules, larger image encoders, or more temporal information. However, there are still two issu… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  7. arXiv:2407.12962  [pdf, other]

    cs.RO

    NAS: N-step computation of All Solutions to the footstep planning problem

    Authors: Jiayi Wang, Saeid Samadi, Hefan Wang, Pierre Fernbach, Olivier Stasse, Sethu Vijayakumar, Steve Tonneau

    Abstract: How many ways are there to climb a staircase in a given number of steps? Infinitely many, if we focus on the continuous aspect of the problem. A finite, possibly large number if we consider the discrete aspect, i.e. on which surface which effectors are going to step and in what order. We introduce NAS, an algorithm that considers both aspects simultaneously and computes all the possible solutions… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Submitted to Humanoids 2024

  8. arXiv:2407.12940  [pdf, other]

    cs.RO cs.CV

    KiGRAS: Kinematic-Driven Generative Model for Realistic Agent Simulation

    Authors: Jianbo Zhao, Jiaheng Zhuang, Qibin Zhou, Taiyu Ban, Ziyao Xu, Hangning Zhou, Junhe Wang, Guoan Wang, Zhiheng Li, Bin Li

    Abstract: Trajectory generation is a pivotal task in autonomous driving. Recent studies have introduced the autoregressive paradigm, leveraging the state transition model to approximate future trajectory distributions. This paradigm closely mirrors the real-world trajectory generation process and has achieved notable success. However, its potential is limited by the ineffective representation of realistic t… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  9. arXiv:2407.12797  [pdf, other]

    cs.PF cs.LG

    CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines

    Authors: Wenbo Sun, Jiaqi Wang, Qiming Guo, Ziyu Li, Wenlu Wang, Rihan Hai

    Abstract: Online Large Language Model (LLM) services such as ChatGPT and Claude 3 have transformed business operations and academic research by effortlessly enabling new opportunities. However, due to data-sharing restrictions, sectors such as healthcare and finance prefer to deploy local LLM applications using costly hardware resources. This scenario requires a balance between the effectiveness advantages… ▽ More

    Submitted 20 June, 2024; originally announced July 2024.

  10. arXiv:2407.12791  [pdf, other]

    cs.CL cs.AI

    TourLLM: Enhancing LLMs with Tourism Knowledge

    Authors: Qikai Wei, Mingzhi Yang, Jinqiang Wang, Wenwei Mao, Jiabo Xu, Huansheng Ning

    Abstract: Recently, large language models (LLMs) have demonstrated their effectiveness in various natural language processing (NLP) tasks. However, the lack of tourism knowledge limits the performance of LLMs in tourist attraction presentations and travel planning. To address this challenge, we constructed a supervised fine-tuning dataset for the culture and tourism domain, named Cultour. This dataset consi… ▽ More

    Submitted 18 June, 2024; originally announced July 2024.

  11. arXiv:2407.12758  [pdf, other]

    cs.CV

    Mutual Information Guided Optimal Transport for Unsupervised Visible-Infrared Person Re-identification

    Authors: Zhizhong Zhang, Jiangming Wang, Xin Tan, Yanyun Qu, Junping Wang, Yong Xie, Yuan Xie

    Abstract: Unsupervised visible infrared person re-identification (USVI-ReID) is a challenging retrieval task that aims to retrieve cross-modality pedestrian images without using any label information. In this task, the large cross-modality variance makes it difficult to generate reliable cross-modality labels, and the lack of annotations also provides additional difficulties for learning modality-invariant… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  12. arXiv:2407.12565  [pdf, other]

    cs.AR

    SigDLA: A Deep Learning Accelerator Extension for Signal Processing

    Authors: Fangfa Fu, Wenyu Zhang, Zesong Jiang, Zhiyu Zhu, Guoyu Li, Bing Yang, Cheng Liu, Liyi Xiao, Jinxiang Wang, Huawei Li, Xiaowei Li

    Abstract: Deep learning and signal processing are closely correlated in many IoT scenarios such as anomaly detection to empower intelligence of things. Many IoT processors utilize digital signal processors (DSPs) for signal processing and build deep learning frameworks on this basis. While deep learning is usually much more computing-intensive than signal processing, the computing efficiency of deep learnin… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  13. arXiv:2407.12491  [pdf, other]

    cs.CV

    Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving

    Authors: Yuqi Dai, Jian Sun, Shengbo Eben Li, Qing Xu, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Perception is essential for autonomous driving systems. Recent approaches based on Bird's-eye-view (BEV) and deep learning have made significant progress. However, there exist challenging issues including lengthy development cycles, poor reusability, and complex sensor setups in the perception algorithm development process. To tackle the above challenges, this paper proposes a novel hierarchical Bird'…

    Submitted 17 July, 2024; originally announced July 2024.

  14. arXiv:2407.12423  [pdf, other]

    cs.HC cs.AI

    StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions

    Authors: Zixin Chen, Jiachen Wang, Meng Xia, Kento Shigyo, Dingdong Liu, Rong Zhang, Huamin Qu

    Abstract: The integration of Large Language Models (LLMs), especially ChatGPT, into education is poised to revolutionize students' learning experiences by introducing innovative conversational learning methodologies. To empower students to fully leverage the capabilities of ChatGPT in educational scenarios, understanding students' interaction patterns with ChatGPT is crucial for instructors. However, this e… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 11 pages. To be published at IEEE Visualization 2024

  15. arXiv:2407.12309  [pdf, other]

    cs.CL

    MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models

    Authors: Thao Minh Nguyen Phan, Cong-Tinh Dao, Chenwei Wu, Jian-Zhe Wang, Shun Liu, Jun-En Ding, David Restrepo, Feng Liu, Fang-Ming Hung, Wen-Chih Peng

    Abstract: Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features like lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and support clinical decision-making. However, most EHR predictive models do not reflect these procedures, as they eithe… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  16. Performance Analysis and Blocklength Minimization of Uplink RSMA for Short Packet Transmissions in URLLC

    Authors: Yixin Zhang, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: Rate splitting multiple access (RSMA) is one of the most promising techniques for ultra-reliable and low-latency communications (URLLC) with stringent requirements on delay and reliability of multiple access. To fully explore the delay performance enhancement brought by uplink RSMA to URLLC, in this paper, we evaluate the performance of two-user uplink RSMA and propose the corresponding blocklengt… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Journal ref: GLOBECOM 2023 - 2023 IEEE Global Communications Conference

  17. arXiv:2407.12237  [pdf, other]

    cs.IT

    Delay Tradeoff and Adaptive Finite Blocklength Framework for URLLC

    Authors: Yixin Zhang, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: With various time-sensitive tasks to be served, ultra-reliable and low-latency communications (URLLC) has become one of the most important scenarios for the fifth generation (5G) wireless communications. The end-to-end delay from the sub-millisecond-level to the second-level is first put forward for a wide range of delay-sensitive tasks in the future sixth generation (6G) communication networks, w… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  18. arXiv:2407.11691  [pdf, other]

    cs.CV

    VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

    Authors: Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

    Abstract: We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  19. arXiv:2407.11569  [pdf, other]

    cs.CV

    SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds

    Authors: Yanbo Wang, Wentao Zhao, Chuan Cao, Tianchen Deng, Jingchuan Wang, Weidong Chen

    Abstract: Although LiDAR semantic segmentation advances rapidly, state-of-the-art methods often incorporate specifically designed inductive bias derived from benchmarks originating from mechanical spinning LiDAR. This can limit model generalizability to other kinds of LiDAR technologies and make hyperparameter tuning more complex. To tackle these issues, we propose a generalized framework to accommodate var… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  20. arXiv:2407.11419  [pdf, other]

    cs.CV

    TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs

    Authors: Chenfan Xu, Zhentao Liu, Yuan Liu, Yulong Dou, Jiamin Wu, Jiepeng Wang, Minjiao Wang, Dinggang Shen, Zhiming Cui

    Abstract: Orthodontic treatment usually requires regular face-to-face examinations to monitor dental conditions of the patients. When in-person diagnosis is not feasible, an alternative is to utilize five intra-oral photographs for remote dental monitoring. However, it lacks 3D information, and how to reconstruct 3D dental models from such sparse view photographs is a challenging problem. In this study,…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: MICCAI2024

  21. arXiv:2407.11335  [pdf, other]

    cs.CV

    LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

    Authors: Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu

    Abstract: Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge. (2) An overfitting tendency towards base categories, with the open vo…

    Submitted 18 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  22. arXiv:2407.11096  [pdf, other]

    cs.LG cs.AI

    Static and multivariate-temporal attentive fusion transformer for readmission risk prediction

    Authors: Zhe Sun, Runzhi Li, Jing Wang, Gang Chen, Siyu Yan, Lihong Ma

    Abstract: Background: Accurate short-term readmission prediction of ICU patients is significant in improving the efficiency of resource assignment by assisting physicians in making discharge decisions. Clinically, both individual static and multivariate temporal data collected from ICU monitors play critical roles in short-term readmission prediction. Informative static and multivariate temporal feat…

    Submitted 14 July, 2024; originally announced July 2024.

  23. arXiv:2407.11062  [pdf, other]

    cs.LG cs.AI cs.CL

    EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

    Authors: Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo

    Abstract: Large language models (LLMs) are integral to modern natural language processing and artificial intelligence. However, they face challenges in managing their significant memory requirements. Although quantization-aware training (QAT) offers a solution by reducing memory consumption through low-bit representations with minimal accuracy loss, it demands substantial training resources to optimize mode… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: An efficient and effective quantization technique to improve the performance of low-bits LMMs and LVLMs

  24. arXiv:2407.10937  [pdf, other]

    cs.CV

    IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

    Authors: Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang

    Abstract: Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and multi-view-based methods have difficulty controlling the human appearance and motion. In this work, we present IDOL (unIfied Dual-mOdal Latent diffusio… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://yhzhai.github.io/idol/

  25. arXiv:2407.10753  [pdf, other]

    cs.CV

    OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

    Authors: Jinghua Hou, Tong Wang, Xiaoqing Ye, Zhe Liu, Shi Gong, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai

    Abstract: Accurate depth information is crucial for enhancing the performance of multi-view 3D object detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise depth supervision, they overlook two significant phenomena: 1) the depth supervision obtained from LiDAR points is usually distributed on the surface of the object, which is not so friendly to existing DETR-based 3D… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  26. arXiv:2407.10749  [pdf, other]

    cs.CV

    SEED: A Simple and Effective 3D DETR in Point Clouds

    Authors: Zhe Liu, Jinghua Hou, Xiaoqing Ye, Tong Wang, Jingdong Wang, Xiang Bai

    Abstract: Recently, detection transformers (DETRs) have gradually taken a dominant position in 2D detection thanks to their elegant framework. However, DETR-based detectors for 3D point clouds are still difficult to achieve satisfactory performance. We argue that the main challenges are twofold: 1) How to obtain the appropriate object queries is challenging due to the high sparsity and uneven distribution o… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  27. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin , et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  28. arXiv:2407.10655  [pdf, other]

    cs.CV

    OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer

    Authors: Yu Wang, Xiangbo Su, Qiang Chen, Xinyu Zhang, Teng Xi, Kun Yao, Errui Ding, Gang Zhang, Jingdong Wang

    Abstract: Open-vocabulary object detection focuses on detecting novel categories guided by natural language. In this report, we propose Open-Vocabulary Light-Weighted Detection Transformer (OVLW-DETR), a deployment-friendly open-vocabulary detector with strong performance and low latency. Building upon OVLW-DETR, we provide an end-to-end training recipe that transfers knowledge from vision-language mode…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 4 pages

  29. arXiv:2407.10456  [pdf, other]

    cs.CL

    Don't Throw Away Data: Better Sequence Knowledge Distillation

    Authors: Jun Wang, Eleftheria Briakou, Hamid Dadkhahi, Rishabh Agarwal, Colin Cherry, Trevor Cohn

    Abstract: A critical component in knowledge distillation is the means of coupling the teacher and student. The predominant sequence knowledge distillation method involves supervised learning of the student against teacher-decoded outputs, and is exemplified by the current state of the art, which incorporates minimum Bayes risk (MBR) decoding. In this paper we seek to integrate MBR more tightly in distillati… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  30. arXiv:2407.09698  [pdf, other]

    cs.LG

    RIO-CPD: A Riemannian Geometric Method for Correlation-aware Online Change Point Detection

    Authors: Chengyuan Deng, Zhengzhang Chen, Xujiang Zhao, Haoyu Wang, Junxiang Wang, Haifeng Chen, Jie Gao

    Abstract: The objective of change point detection is to identify abrupt changes at potentially multiple points within a data sequence. This task is particularly challenging in the online setting where various types of changes can occur, including shifts in both the marginal and joint distributions of the data. This paper tackles these challenges by sequentially tracking correlation matrices on the Riemannia… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  31. arXiv:2407.09562  [pdf, other]

    cs.CV eess.IV

    Edge AI-Enabled Chicken Health Detection Based on Enhanced FCOS-Lite and Knowledge Distillation

    Authors: Qiang Tong, Jinrui Wang, Wenshuang Yang, Songtao Wu, Wenqi Zhang, Chen Sun, Kuanhong Xu

    Abstract: The utilization of AIoT technology has become a crucial trend in modern poultry management, offering the potential to optimize farming operations and reduce human workloads. This paper presents a real-time and compact edge-AI enabled detector designed to identify chickens and their healthy statuses using frames captured by a lightweight and intelligent camera equipped with an edge-AI enabled CMOS… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  32. arXiv:2407.09450  [pdf, other]

    cs.AI cs.CL cs.LG q-bio.NC

    Human-like Episodic Memory for Infinite Context LLMs

    Authors: Zafeirios Fountas, Martin A Benfeghoul, Adnan Oomerjee, Fenia Christopoulou, Gerasimos Lampouras, Haitham Bou-Ammar, Jun Wang

    Abstract: Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In contrast, the human brain excels at organising and retrieving episodic experiences across vast temporal scales, spanning a lifetime. In this work, we introduce EM-LLM, a novel approach that integrat… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  33. arXiv:2407.08961  [pdf]

    eess.IV cs.CV

    Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT

    Authors: Jie Zheng, Ru Wen, Haiqin Hu, Lina Wei, Kui Su, Wei Chen, Chen Liu, Jun Wang

    Abstract: Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to complex anatomical details presented in CT images, and 2) suboptimal knowledge transfer owing to input disparity between upstream and downstream model…

    Submitted 11 July, 2024; originally announced July 2024.

  34. arXiv:2407.08914  [pdf, other]

    cs.NI eess.SP

    Multi-objective Aerial Collaborative Secure Communication Optimization via Generative Diffusion Model-enabled Deep Reinforcement Learning

    Authors: Chuang Zhang, Geng Sun, Jiahui Li, Qingqing Wu, Jiacheng Wang, Dusit Niyato, Yuanwei Liu

    Abstract: Due to flexibility and low-cost, unmanned aerial vehicles (UAVs) are increasingly crucial for enhancing coverage and functionality of wireless networks. However, incorporating UAVs into next-generation wireless communication systems poses significant challenges, particularly in sustaining high-rate and long-range secure communications against eavesdropping attacks. In this work, we consider a UAV… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Transactions on Mobile Computing

  35. arXiv:2407.08733  [pdf, other]

    cs.CL

    Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist

    Authors: Zihao Zhou, Shudong Liu, Maizhen Ning, Wei Liu, Jindong Wang, Derek F. Wong, Xiaowei Huang, Qiufeng Wang, Kaizhu Huang

    Abstract: Exceptional mathematical reasoning ability is one of the key features that demonstrate the power of large language models (LLMs). How to comprehensively define and evaluate the mathematical abilities of LLMs, and even reflect the user experience in real-world scenarios, has emerged as a critical issue. Current benchmarks predominantly concentrate on problem-solving capabilities, which presents a s… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 35 pages, 10 figures, preprint

  36. arXiv:2407.08713  [pdf, other]

    cs.CL cs.AI

    GTA: A Benchmark for General Tool Agents

    Authors: Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, Xinyi Le

    Abstract: Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, fa… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Github repo: https://github.com/open-compass/GTA

  37. arXiv:2407.08664  [pdf, other]

    cs.CE eess.SY

    MBD-NODE: Physics-informed data-driven modeling and simulation of constrained multibody systems

    Authors: Jingquan Wang, Shu Wang, Huzaifa Mustafa Unjhawala, Jinlong Wu, Dan Negrut

    Abstract: We describe a framework that can integrate prior physical information, e.g., the presence of kinematic constraints, to support data-driven simulation in multi-body dynamics. Unlike other approaches, e.g., Fully-connected Neural Network (FCNN) or Recurrent Neural Network (RNN)-based methods that are used to model the system states directly, the proposed approach embraces a Neural Ordinary Different… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  38. arXiv:2407.08462  [pdf, other]

    cs.LG cs.NI

    Distributed Deep Reinforcement Learning Based Gradient Quantization for Federated Learning Enabled Vehicle Edge Computing

    Authors: Cui Zhang, Wenjun Zhang, Qiong Wu, Pingyi Fan, Qiang Fan, Jiangzhou Wang, Khaled B. Letaief

    Abstract: Federated Learning (FL) can protect the privacy of the vehicles in vehicle edge computing (VEC) to a certain extent through sharing the gradients of vehicles' local models instead of local data. The gradients of vehicles' local models are usually large for the vehicular artificial intelligence (AI) applications, thus transmitting such large gradients would cause large per-round latency. Gradient q… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/Distributed-Deep-Reinforcement-Learning-Based-Gradient Quantization-for-Federated-Learning-Enabled-Vehicle-Edge-Computing

  39. arXiv:2407.08442  [pdf, other]

    cs.LG cs.AI

    How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation

    Authors: Linglong Qian, Tao Wang, Jun Wang, Hugh Logan Ellis, Robin Mitra, Richard Dobson, Zina Ibrahim

    Abstract: We introduce a novel classification framework for time-series imputation using deep learning, with a particular focus on clinical data. By identifying conceptual gaps in the literature and existing reviews, we devise a taxonomy grounded on the inductive bias of neural imputation frameworks, resulting in a classification of existing deep imputation strategies based on their suitability for specific… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  40. arXiv:2407.08428  [pdf, other]

    cs.CV cs.AI

    A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights

    Authors: Wentao Lei, Jinting Wang, Fengji Ma, Guanjie Huang, Li Liu

    Abstract: Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is critical. Recent advancements in generative models… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  41. arXiv:2407.08333  [pdf, other]

    cs.CV

    SR-Mamba: Effective Surgical Phase Recognition with State Space Model

    Authors: Rui Cao, Jiangliu Wang, Yun-Hui Liu

    Abstract: Surgical phase recognition is crucial for enhancing the efficiency and safety of computer-assisted interventions. One of the fundamental challenges involves modeling the long-distance temporal relationships present in surgical videos. Inspired by the recent success of Mamba, a state space model with linear scalability in sequence length, this paper presents SR-Mamba, a novel attention-free model s… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Technical Report

  42. arXiv:2407.08020  [pdf, other]

    cs.CV

    Interactive Segmentation Model for Placenta Segmentation from 3D Ultrasound images

    Authors: Hao Li, Baris Oguz, Gabriel Arenas, Xing Yao, Jiacheng Wang, Alison Pouch, Brett Byram, Nadav Schwartz, Ipek Oguz

    Abstract: Placenta volume measurement from 3D ultrasound images is critical for predicting pregnancy outcomes, and manual annotation is the gold standard. However, such manual annotation is expensive and time-consuming. Automated segmentation algorithms can often successfully segment the placenta, but these methods may not consistently produce robust segmentations suitable for practical use. Recently, inspi… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  43. arXiv:2407.07930  [pdf]

    q-bio.BM cs.LG

    Token-Mol 1.0: Tokenized drug design with large language model

    Authors: Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Significant interests have recently risen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery lack the ability to comprehend three-dimensional (3D) structures, thereby limiting their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduced Token-Mol, a token-only 3D drug…

    Submitted 10 July, 2024; originally announced July 2024.

  44. arXiv:2407.07921  [pdf, other]

    cs.CR cs.AI cs.LG eess.SP

    A Trustworthy AIoT-enabled Localization System via Federated Learning and Blockchain

    Authors: Junfei Wang, He Huang, Jingze Feng, Steven Wong, Lihua Xie, Jianfei Yang

    Abstract: There is a significant demand for indoor localization technology in smart buildings, and the most promising solution in this field is using RF sensors and fingerprinting-based methods that employ machine learning models trained on crowd-sourced user data gathered from IoT devices. However, this raises security and privacy issues in practice. Some researchers propose to use federated learning to pa…

    Submitted 8 July, 2024; originally announced July 2024.

  45. arXiv:2407.07575  [pdf, other]

    cs.LG cs.NI

    Resource Allocation for Twin Maintenance and Computing Task Processing in Digital Twin Vehicular Edge Computing Network

    Authors: Yu Xie, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: As a promising technology, vehicular edge computing (VEC) can provide computing and caching services by deploying VEC servers near vehicles. However, VEC networks still face challenges such as high vehicle mobility. Digital twin (DT), an emerging technology, can predict, estimate, and analyze real-time states by digitally modeling objects in the physical world. By integrating DT with VEC, a virtua…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/Resource-allocation-for-twin-maintenance-and-computing-tasks-in-digital-twin-mobile-edge-network

  46. arXiv:2407.07476  [pdf, other]

    cs.DC

    A Transverse-Read-assisted Valid-Bit Collection to Accelerate Stochastic Computing MAC for Energy-Efficient in-RTM DNNs

    Authors: Jihe Wang, Zhiying Zhang, Xingwu Dong, Danghui Wang

    Abstract: It looks very attractive to coordinate racetrack-memory (RM) and stochastic-computing (SC) jointly to build an ultra-low power neuron-architecture. However, the above combination has always been questioned in a fatal weakness that the narrow bit-view of the RM-MTJ structure, a.k.a. shift-and-access pattern, cannot physically match the great throughput of direct-stored stochastic sequences. Fortunately, a…

    Submitted 10 July, 2024; originally announced July 2024.

  47. arXiv:2407.07330  [pdf]

    cs.CL cs.AI

    Interpretable Differential Diagnosis with Dual-Inference Large Language Models

    Authors: Shuang Zhou, Sirui Ding, Jiashuo Wang, Mingquan Lin, Genevieve B. Melton, Rui Zhang

    Abstract: Methodological advancements to automate the generation of differential diagnosis (DDx) to predict a list of potential diseases as differentials given patients' symptom descriptions are critical to clinical reasoning and applications such as decision support. However, providing reasoning or interpretation for these differential diagnoses is more meaningful. Fortunately, large language models (LLMs)…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 15 pages

  48. arXiv:2407.07307  [pdf, other]

    cs.CV

    Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken

    Authors: Peifu Liu, Tingfa Xu, Jie Wang, Huan Chen, Huiyan Bai, Jianan Li

    Abstract: Hyperspectral image classification, a task that assigns pre-defined classes to each pixel in a hyperspectral image of remote sensing scenes, often faces challenges due to the neglect of correlations between spectrally similar pixels. This oversight can lead to inaccurate edge definitions and difficulties in managing minor spectral variations in contiguous areas. To address these issues, we introdu…

    Submitted 13 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  49. arXiv:2407.06853  [pdf, other]

    cs.CR

    TimeTravel: Real-time Timing Drift Attack on System Time Using Acoustic Waves

    Authors: Jianshuo Liu, Hong Li, Haining Wang, Mengjie Sun, Hui Wen, Jinfa Wang, Limin Sun

    Abstract: Real-time Clock (RTC) has been widely used in various real-time systems to provide precise system time. In this paper, we reveal a new security vulnerability of the RTC circuit, where the internal storage time or timestamp can be arbitrarily modified forward or backward. The security threat of dynamic modifications of system time caused by this vulnerability is called TimeTravel. Based on acoustic…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by USENIX Security 2024 winter cycle and will appear in USENIX Security 2025

  50. arXiv:2407.06597  [pdf, other]

    cs.AI

    TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

    Authors: Renjie Liang, Li Li, Chongzhi Zhang, Jing Wang, Xizhou Zhu, Aixin Sun

    Abstract: In this paper, we propose the task of Ranked Video Moment Retrieval (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we dev…

    Submitted 9 July, 2024; originally announced July 2024.