Skip to main content

Showing 1–50 of 1,389 results for author: Wang, P

  1. arXiv:2407.21438  [pdf, other

    cs.CV

    A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap

    Authors: Lijun Zhang, Wei Suo, Peng Wang, Yanning Zhang

    Abstract: Human-object interactions (HOI) detection aims at capturing human-object pairs in images and corresponding actions. It is an important step toward high-level visual reasoning and scene understanding. However, due to the natural bias from the real world, existing methods mostly struggle with rare human-object pairs and lead to sub-optimal results. Recently, with the development of the generative mo… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

  2. arXiv:2407.21408  [pdf, other

    cs.CV

    Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model

    Authors: Zhichao Zhang, Xinyue Li, Wei Sun, Jun Jia, Xiongkuo Min, Zicheng Zhang, Chunyi Li, Zijian Chen, Puyi Wang, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Guangtao Zhai

    Abstract: In recent years, artificial intelligence (AI) driven video generation has garnered significant attention due to advancements in stable diffusion and large language model techniques. Thus, there is a great demand for accurate video quality assessment (VQA) models to measure the perceptual quality of AI-generated content (AIGC) videos as well as optimize video generation techniques. However, assessi… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

  3. arXiv:2407.21118  [pdf, other

    cs.AI cs.LG

    Palu: Compressing KV-Cache with Low-Rank Projection

    Authors: Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu

    Abstract: KV-Cache compression methods generally sample a KV-Cache of effectual tokens or quantize it into lower bits. However, these methods cannot exploit the redundancy of the hidden dimension of KV tensors. This paper investigates a unique hidden dimension approach called Palu, a novel KV-Cache compression framework that utilizes low-rank projection. Palu decomposes the linear layers into low-rank matri… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

  4. arXiv:2407.20824  [pdf, other

    cs.LG

    DyGKT: Dynamic Graph Learning for Knowledge Tracing

    Authors: Ke Cheng, Linzhi Peng, Pengyang Wang, Junchen Ye, Leilei Sun, Bowen Du

    Abstract: Knowledge Tracing aims to assess student learning states by predicting their performance in answering questions. Different from the existing research which utilizes fixed-length learning sequence to obtain the student states and regards KT as a static problem, this work is motivated by three dynamical characteristics: 1) The scales of students answering records are constantly growing; 2) The seman… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

  5. arXiv:2407.19890  [pdf, other

    quant-ph cs.LG

    Quantum Dynamics of Machine Learning

    Authors: Peng Wang, Maimaitiniyazi Maimaitiabudula

    Abstract: The quantum dynamic equation (QDE) of machine learning is obtained based on Schrödinger equation and potential energy equivalence relationship. Through Wick rotation, the relationship between quantum dynamics and thermodynamics is also established in this paper. This equation reformulates the iterative process of machine learning into a time-dependent partial differential equation with a clear mat… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  6. arXiv:2407.17915  [pdf, other

    cs.CR cs.AI

    The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models

    Authors: Zihui Wu, Haichang Gao, Jianping He, Ping Wang

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, but their power comes with significant security considerations. While extensive research has been conducted on the safety of LLMs in chat mode, the security implications of their function calling feature have been largely overlooked. This paper uncovers a critical vulnerability in the function calling process of LLMs, introduc… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  7. arXiv:2407.17485  [pdf, other

    physics.chem-ph cs.ET physics.data-an

    Application of the Digital Annealer Unit in Optimizing Chemical Reaction Conditions for Enhanced Production Yields

    Authors: Shih-Cheng Li, Pei-Hwa Wang, Jheng-Wei Su, Wei-Yin Chiang, Shih-Hsien Huang, Yen-Chu Lin, Chia-Ho Ou, Chih-Yu Chen

    Abstract: Finding appropriate reaction conditions that yield high product rates in chemical synthesis is crucial for the chemical and pharmaceutical industries. However, due to the vast chemical space, conducting experiments for each possible reaction condition is impractical. Consequently, models such as QSAR (Quantitative Structure-Activity Relationship) or ML (Machine Learning) have been developed to pre… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  8. arXiv:2407.16709  [pdf, other

    q-bio.GN cs.LG

    LSTM Autoencoder-based Deep Neural Networks for Barley Genotype-to-Phenotype Prediction

    Authors: Guanjin Wang, Junyu Xuan, Penghao Wang, Chengdao Li, Jie Lu

    Abstract: Artificial Intelligence (AI) has emerged as a key driver of precision agriculture, facilitating enhanced crop productivity, optimized resource use, farm sustainability, and informed decision-making. Also, the expansion of genome sequencing technology has greatly increased crop genomic resources, deepening our understanding of genetic variation and enhancing desirable crop traits to optimize perfor… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  9. arXiv:2407.16142  [pdf, other

    cs.LG cs.AI cs.CV cs.RO

    Diffusion Models as Optimizers for Efficient Planning in Offline RL

    Authors: Renming Huang, Yunqiang Pei, Guoqing Wang, Yangming Zhang, Yang Yang, Peng Wang, Hengtao Shen

    Abstract: Diffusion models have shown strong competitiveness in offline reinforcement learning tasks by formulating decision-making as sequential generation. However, the practicality of these methods is limited due to the lengthy inference processes they require. In this paper, we address this problem by decomposing the sampling process of diffusion models into two decoupled subprocesses: 1) generating a f… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: The paper was accepted by ECCV2024

  10. arXiv:2407.15448  [pdf, other

    eess.SP cs.IT

    Movable Antenna-Enhanced Wireless Communications: General Architectures and Implementation Methods

    Authors: Boyu Ning, Songjie Yang, Yafei Wu, Peilan Wang, Weidong Mei, Chau Yuen, Emil Björnson

    Abstract: Movable antennas (MAs), traditionally explored in antenna design, have recently garnered significant attention in wireless communications due to their ability to dynamically adjust the antenna positions to changes in the propagation environment. However, previous research has primarily focused on characterizing the performance limits of various MA-assisted wireless communication systems, with less… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  11. arXiv:2407.15017  [pdf, other

    cs.CL cs.AI cs.CV cs.HC cs.LG

    Knowledge Mechanisms in Large Language Models: A Survey and Perspective

    Authors: Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang

    Abstract: Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution. Knowledge utilization delves into the mechanism of memorization, comprehension and application, and creation. Knowledge evolution focuses on the dynamic progression o… ▽ More

    Submitted 31 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Ongoing work (v2); add Section 5: Application of Knowledge Mechanism; revise Section 6 and 7; fix typos

  12. arXiv:2407.14138  [pdf, other

    cs.CV

    Visual Text Generation in the Wild

    Authors: Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, Zhibo Yang

    Abstract: Recently, with the rapid advancements of generative models, the field of visual text generation has witnessed significant progress. However, it is still challenging to render high-quality text images in real-world scenarios, as three critical criteria should be satisfied: (1) Fidelity: the generated text images should be photo-realistic and the contents are expected to be the same as specified in… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  13. arXiv:2407.14055  [pdf, other

    quant-ph cs.AI cs.CV cs.LG

    Quantum Hamiltonian Embedding of Images for Data Reuploading Classifiers

    Authors: Peiyong Wang, Casey R. Myers, Lloyd C. L. Hollenberg, Udaya Parampalli

    Abstract: When applying quantum computing to machine learning tasks, one of the first considerations is the design of the quantum machine learning model itself. Conventionally, the design of quantum machine learning algorithms relies on the ``quantisation" of classical learning algorithms, such as using quantum linear algebra to implement important subroutines of classical algorithms, if not the entire algo… ▽ More

    Submitted 31 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: 11 figures, 31 pages. Code available on https://github.com/peiyong-addwater/HamEmbedding. Author affiliation updated for v2. Acknowledgements and funding information added for v2

  14. arXiv:2407.13362  [pdf, other

    cs.CV

    Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

    Authors: Pengfei Wang, Yuxi Wang, Shuai Li, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

    Abstract: The scarcity of large-scale 3D-text paired data poses a great challenge on open vocabulary 3D scene understanding, and hence it is popular to leverage internet-scale 2D data and transfer their open vocabulary capabilities to 3D models through knowledge distillation. However, the existing distillation-based 3D scene understanding approaches rely on the representation capacity of 2D models, disregar… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  15. arXiv:2407.13331  [pdf, other

    cs.LG

    Reconstruct the Pruned Model without Any Retraining

    Authors: Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang

    Abstract: Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost. This retraining-free paradigm involves (1) pruning criteria to define the architecture and (2) distortion reconstruction to restore performance. However, existing methods often emphasize pruning criteria while usi… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 18 pages

  16. arXiv:2407.13113  [pdf, other

    cs.AI

    Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II

    Authors: Rixin Wu, Ran Wang, Jie Hao, Qiang Wu, Ping Wang, Dusit Niyato

    Abstract: This paper proposes a weight-aware deep reinforcement learning (WADRL) approach designed to address the multiobjective vehicle routing problem with time windows (MOVRPTW), aiming to use a single deep reinforcement learning (DRL) model to solve the entire multiobjective optimization problem. The Non-dominated sorting genetic algorithm-II (NSGA-II) method is then employed to optimize the outcomes pr… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 13 pages; Under Review; Submitted to IEEE Transactions on Intelligent Transportation Systems

  17. arXiv:2407.12504  [pdf, other

    cs.CL

    Case2Code: Learning Inductive Reasoning with Synthetic Data

    Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

    Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  18. arXiv:2407.11946  [pdf, other

    cs.CV

    Hierarchical Separable Video Transformer for Snapshot Compressive Imaging

    Authors: Ping Wang, Yulun Zhang, Lishun Wang, Xin Yuan

    Abstract: Transformers have achieved the state-of-the-art performance on solving the inverse problem of Snapshot Compressive Imaging (SCI) for video, whose ill-posedness is rooted in the mixed degradation of spatial masking and temporal aliasing. However, previous Transformers lack an insight into the degradation and thus have limited performance and efficiency. In this work, we tailor an efficient reconstr… ▽ More

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  19. arXiv:2407.10671  [pdf, other

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin , et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  20. arXiv:2407.10233  [pdf, other

    cs.CV cs.AI

    Visual Prompt Selection for In-Context Learning Segmentation

    Authors: Wei Suo, Lanqing Lai, Mengyang Sun, Hanwang Zhang, Peng Wang, Yanning Zhang

    Abstract: As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level. Recently, inspired by In-Context Learning (ICL), several generalist segmentation frameworks have been proposed, providing a promising paradigm for segmenting specific objects. However, existing works mostly ignore the value of visual promp… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accept by ECCV2024

  21. arXiv:2407.09285  [pdf, other

    cs.CV

    MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results

    Authors: Jiangpeng He, Yuhao Chen, Gautham Vinod, Talha Ibn Mahmud, Fengqing Zhu, Edward Delp, Alexander Wong, Pengcheng Xi, Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva, Jiadong Tang, Dianyi Yang, Yu Gao, Zhaoxiang Liang, Yawei Jueluo, Chengyu Shi, Pengyu Wang

    Abstract: The increasing interest in computer vision applications for nutrition and dietary monitoring has led to the development of advanced 3D reconstruction techniques for food items. However, the scarcity of high-quality data and limited collaboration between industry and academia have constrained progress in this field. Building on recent advancements in 3D reconstruction, we host the MetaFood Workshop… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Technical report for MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction. arXiv admin note: substantial text overlap with arXiv:2407.01717

  22. arXiv:2407.09051  [pdf, other

    cs.CV

    DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects

    Authors: Peng Wang, Yongcai Wang, Deying Li

    Abstract: Multi-object tracking (MOT) on static platforms, such as by surveillance cameras, has achieved significant progress, with various paradigms providing attractive performances. However, the effectiveness of traditional MOT methods is significantly reduced when it comes to dynamic platforms like drones. This decrease is attributed to the distinctive challenges in the MOT-on-drone scenario: (1) object… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures, ICRA 2024

  23. arXiv:2407.08959  [pdf, other

    cs.CL

    Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification

    Authors: Ke Ji, Peng Wang, Wenjun Ke, Guozheng Li, Jiajun Liu, Jingsheng Gao, Ziyu Shang

    Abstract: Recently, various pre-trained language models (PLMs) have been proposed to prove their impressive performances on a wide range of few-shot tasks. However, limited by the unstructured prior knowledge in PLMs, it is difficult to maintain consistent performance on complex structured scenarios, such as hierarchical text classification (HTC), especially when the downstream data is extremely scarce. The… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, 2 figures, Accepted by IJCAI2024

  24. arXiv:2407.08150  [pdf, other

    cs.CV

    Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

    Authors: Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang

    Abstract: Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders. There is currently a lack of research in this area, and most existing benchmarks suffer from several drawbacks: 1) a limited number of modalities and answers with restrictive length; 2) the content and scenarios within… ▽ More

    Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MULTIMEDIA 2024

  25. arXiv:2407.08130  [pdf, other

    cs.MM cs.CV cs.SD eess.AS

    Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning

    Authors: Wenrui Li, Penghong Wang, Ruiqin Xiong, Xiaopeng Fan

    Abstract: The spiking neural networks (SNNs) that efficiently encode temporal sequences have shown great potential in extracting audio-visual joint feature representations. However, coupling SNNs (binary spike sequences) with transformers (float-point sequences) to jointly explore the temporal-semantic information still facing challenges. In this paper, we introduce a novel Spiking Tucker Fusion Transformer… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by TIP

  26. arXiv:2407.07457  [pdf, other

    cs.LG cs.CL

    GLBench: A Comprehensive Benchmark for Graph with Large Language Models

    Authors: Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li

    Abstract: The emergence of large language models (LLMs) has revolutionized the way we interact with graphs, leading to a new paradigm called GraphLLM. Despite the rapid development of GraphLLM methods in recent years, the progress and understanding of this field remain unclear due to the lack of a benchmark with consistent experimental protocols. To bridge this gap, we introduce GLBench, the first comprehen… ▽ More

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.10280 by other authors

  27. arXiv:2407.07174  [pdf, other

    cs.CV

    CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

    Authors: Xiaoding Yuan, Shitao Tang, Kejie Li, Alan Yuille, Peng Wang

    Abstract: This paper introduces Camera-free Diffusion (CamFreeDiff) model for 360-degree image outpainting from a single camera-free image and text description. This method distinguishes itself from existing strategies, such as MVDiffusion, by eliminating the requirement for predefined camera poses. Instead, our model incorporates a mechanism for predicting homography directly within the multi-view diffusio… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  28. arXiv:2407.06904  [pdf, other

    cs.AI

    Hypergraph based Understanding for Document Semantic Entity Recognition

    Authors: Qiwei Li, Zuchao Li, Ping Wang, Haojun Ai, Hai Zhao

    Abstract: Semantic entity recognition is an important task in the field of visually-rich document understanding. It distinguishes the semantic types of text by analyzing the position relationship between text nodes and the relation between text content. The existing document understanding models mainly focus on entity categories while ignoring the extraction of entity boundaries. We build a novel hypergraph… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  29. arXiv:2407.06043  [pdf, other

    cs.CV

    Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts

    Authors: Puzuo Wang, Wei Yao, Jie Shao, Zhiyi He

    Abstract: Domain adaptation (DA) techniques help deep learning models generalize across data shifts for point cloud semantic segmentation (PCSS). Test-time adaptation (TTA) allows direct adaptation of a pre-trained model to unlabeled data during inference stage without access to source data or additional training, avoiding privacy issues and large computational resources. We address TTA for geospatial PCSS… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  30. arXiv:2407.05705  [pdf, other

    cs.AI

    Fast and Continual Knowledge Graph Embedding via Incremental LoRA

    Authors: Jiajun Liu, Wenjun Ke, Peng Wang, Jiahao Wang, Jinhua Gao, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji, Yining Li

    Abstract: Continual Knowledge Graph Embedding (CKGE) aims to efficiently learn new knowledge and simultaneously preserve old knowledge. Dominant approaches primarily focus on alleviating catastrophic forgetting of old knowledge but neglect efficient learning for the emergence of new knowledge. However, in real-world scenarios, knowledge graphs (KGs) are continuously growing, which brings a significant chall… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI2024

  31. arXiv:2407.02896  [pdf, other

    cs.HC cs.CY

    Predicting and Understanding Turn-Taking Behavior in Open-Ended Group Activities in Virtual Reality

    Authors: Portia Wang, Eugy Han, Anna C. M. Queiroz, Cyan DeVeaux, Jeremy N. Bailenson

    Abstract: In networked virtual reality (VR), user behaviors, individual differences, and group dynamics can serve as important signals into future speech behaviors, such as who the next speaker will be and the timing of turn-taking behaviors. The ability to predict and understand these behaviors offers opportunities to provide adaptive and personalized assistance, for example helping users with varying sens… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  32. arXiv:2407.02873  [pdf, other

    cs.RO

    Robot Shape and Location Retention in Video Generation Using Diffusion Models

    Authors: Peng Wang, Zhihao Guo, Abdul Latheef Sait, Minh Huy Pham

    Abstract: Diffusion models have marked a significant milestone in the enhancement of image and video generation technologies. However, generating videos that precisely retain the shape and location of moving objects such as robots remains a challenge. This paper presents diffusion models specifically tailored to generate videos that accurately maintain the shape and location of mobile robots. This developme… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, 10 figures

  33. arXiv:2407.02408  [pdf, other

    cs.CL cs.LG

    CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

    Authors: Song Wang, Peng Wang, Tong Zhou, Yushun Dong, Zhen Tan, Jundong Li

    Abstract: As Large Language Models (LLMs) are increasingly deployed to handle various natural language processing (NLP) tasks, concerns regarding the potential negative societal impacts of LLM-generated content have also arisen. To evaluate the biases exhibited by LLMs, researchers have recently proposed a variety of datasets. However, existing bias evaluation efforts often focus on only a particular type o… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 37 pages, 32 figures

  34. arXiv:2407.02174  [pdf, other

    cs.CV

    BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream

    Authors: Wenpu Li, Pian Wan, Peng Wang, Jinghang Li, Yi Zhou, Peidong Liu

    Abstract: Neural implicit representation of visual scenes has attracted a lot of attention in recent research of computer vision and graphics. Most prior methods focus on how to reconstruct 3D scene representation from a set of images. In this work, we demonstrate the possibility to recover the neural radiance fields (NeRF) from a single blurry image and its corresponding event stream. We model the camera m… ▽ More

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  35. arXiv:2407.00898  [pdf, other

    cs.RO

    Residual-MPPI: Online Policy Customization for Continuous Control

    Authors: Pengcheng Wang, Chenran Li, Catherine Weaver, Kenta Kawamoto, Masayoshi Tomizuka, Chen Tang, Wei Zhan

    Abstract: Policies learned through Reinforcement Learning (RL) and Imitation Learning (IL) have demonstrated significant potential in achieving advanced performance in continuous control tasks. However, in real-world environments, it is often necessary to further customize a trained policy when there are additional requirements that were unforeseen during the original training phase. It is possible to fine-… ▽ More

    Submitted 11 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  36. arXiv:2406.19959  [pdf, other

    cs.SD eess.AS

    RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

    Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

    Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applying in real-world scenarios. To bridge t… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  37. arXiv:2406.19143  [pdf, other

    cs.DB cs.DS

    QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams

    Authors: Yiyan Qi, Rundong Li, Pinghui Wang, Yufang Sun, Rui Xing

    Abstract: Estimating cardinality, i.e., the number of distinct elements, of a data stream is a fundamental problem in areas like databases, computer networks, and information retrieval. This study delves into a broader scenario where each element carries a positive weight. Unlike traditional cardinality estimation, limited research exists on weighted cardinality, with current methods requiring substantial m… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 10 figures, accepted by KDD 2024

  38. arXiv:2406.16992  [pdf, other

    cs.LG cs.AI

    Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction

    Authors: Yicheng Zhou, Pengfei Wang, Hao Dong, Denghui Zhang, Dingqi Yang, Yanjie Fu, Pengyang Wang

    Abstract: Urban traffic speed prediction aims to estimate the future traffic speed for improving urban transportation services. Enormous efforts have been made to exploit Graph Neural Networks (GNNs) for modeling spatial correlations and temporal dependencies of traffic speed evolving patterns, regularized by graph topology.While achieving promising results, current traffic speed prediction methods still su… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to IJCAI 2024

  39. arXiv:2406.16633  [pdf, other

    cs.CV

    MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network

    Authors: Yuming Zhang, Shouxin Zhang, Peizhe Wang, Feiyu Zhu, Dongzhi Guan, Jiabin Liu, Changpeng Cai

    Abstract: End-to-end (E2E) training approaches are commonly plagued by high memory consumption, reduced efficiency in training, challenges in model parallelization, and suboptimal biocompatibility. Local learning is considered a novel interactive training method that holds promise as an alternative to E2E. Nonetheless, conventional local learning methods fall short in achieving high model accuracy due to in… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  40. arXiv:2406.15885  [pdf, other

    cs.SD cs.AI eess.AS

    The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

    Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, Hai Zhao

    Abstract: Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  41. arXiv:2406.14653  [pdf, other

    cs.RO cs.AI

    LLM Granularity for On-the-Fly Robot Control

    Authors: Peng Wang, Mattia Robbiani, Zhihao Guo

    Abstract: Assistive robots have attracted significant attention due to their potential to enhance the quality of life for vulnerable individuals like the elderly. The convergence of computer vision, large language models, and robotics has introduced the `visuolinguomotor' mode for assistive robots, where visuals and linguistics are incorporated into assistive robots to enable proactive and interactive assis… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  42. arXiv:2406.14024  [pdf, other

    cs.CL

    LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

    Authors: Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Junjie Hu, Tianyu Liu, Baobao Chang

    Abstract: Mathematical verfier achieves success in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to accurately assess the solutions. To mitigate the aforementioned insufficiency of binary labels, we introduce step-wise natural language feedbacks as rationale la… ▽ More

    Submitted 8 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 9 pages

  43. arXiv:2406.12316  [pdf, other

    cs.CV cs.AI cs.MM

    Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning

    Authors: Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, Peng Wang

    Abstract: The Visible-Infrared Person Re-identification (VI ReID) aims to match visible and infrared images of the same pedestrians across non-overlapped camera views. These two input modalities contain both invariant information, such as shape, and modality-specific details, such as color. An ideal model should utilize valuable information from both modalities during training for enhanced representational… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepyed by ACM International Conference on Multimedia Retrieval (ICMR'24)

    Journal ref: ICMR'24: Proceedings of the 2024 International Conference on Multimedia Retrieval (2024) 579 - 588

  44. arXiv:2406.11931  [pdf, other

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  45. arXiv:2406.10789  [pdf, other

    cs.CV

    Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses

    Authors: Zhiwen Fan, Pu Wang, Yang Zhao, Yibo Zhao, Boris Ivanovic, Zhangyang Wang, Marco Pavone, Hao Frank Yang

    Abstract: The increasing rate of road accidents worldwide results not only in significant loss of life but also imposes billions financial burdens on societies. Current research in traffic crash frequency modeling and analysis has predominantly approached the problem as classification tasks, focusing mainly on learning-based classification or ensemble learning methods. These approaches often overlook the in… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  46. arXiv:2406.10708  [pdf, other

    cs.CV cs.DB eess.SP

    MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception

    Authors: M. Mahbubur Rahman, Ryoma Yataka, Sorachi Kato, Pu Perry Wang, Peizhao Li, Adriano Cardace, Petros Boufounos

    Abstract: Compared with an extensive list of automotive radar datasets that support autonomous driving, indoor radar datasets are scarce at a smaller scale in the format of low-resolution radar point clouds and usually under an open-space single-room setting. In this paper, we scale up indoor radar data collection using multi-view high-resolution radar heatmap in a multi-day, multi-room, and multi-subject s… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: 26 pages, 25 figures, 10 tables; See https://doi.org/10.5281/zenodo.12611978 to access the MMVR dataset

  47. arXiv:2406.10276  [pdf, other

    cs.CL cs.SD eess.AS

    Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

    Authors: Peidong Wang, Jian Xue, Jinyu Li, Junkun Chen, Aswin Shanmugam Subramanian

    Abstract: Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In some cases, the input language can be given or estimated. Our goal is to use this additional language information while preserving the quality of the o… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  48. arXiv:2406.09390  [pdf, other

    cs.CV cs.LG

    LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

    Authors: Rajatsubhra Chakraborty, Arkaprava Sinha, Dominick Reilly, Manish Kumar Govind, Pu Wang, Francois Bremond, Srijan Das

    Abstract: Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing internet videos, yet they struggle with the visually perplexing dynamics present in Activities of Daily Living (ADL) due to limited pertinent datasets and models tailored to relevant cues. To this end, we propose a framework for curating ADL multiview datasets to fine-tune LLVMs, resulting in the creation of ADL-X,… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  49. arXiv:2406.09161  [pdf, other

    cs.SD eess.AS

    Complex Image-Generative Diffusion Transformer for Audio Denoising

    Authors: Junhui Li, Pu Wang, Jialu Li, Youshan Zhang

    Abstract: The audio denoising technique has captured widespread attention in the deep neural network field. Recently, the audio denoising problem has been converted into an image generation task, and deep learning-based approaches have been applied to tackle this problem. However, its performance is still limited, leaving room for further improvement. In order to enhance audio denoising performance, this pa… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  50. arXiv:2406.09154  [pdf, other

    cs.SD cs.CL eess.AS

    Diffusion Gaussian Mixture Audio Denoise

    Authors: Pu Wang, Junhui Li, Jialu Li, Liangdong Guo, Youshan Zhang

    Abstract: Recent diffusion models have achieved promising performances in audio-denoising tasks. The unique property of the reverse process could recover clean signals. However, the distribution of real-world noises does not comply with a single Gaussian distribution and is even unknown. The sampling of Gaussian noise conditions limits its application scenarios. To overcome these challenges, we propose a Di… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024