Skip to main content

Showing 1–50 of 540 results for author: Gao, S

  1. arXiv:2407.13675  [pdf, other

    cs.CV

    MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis

    Authors: Ziming Zhong, Yanxu Xu, Jing Li, Jiale Xu, Zhengxin Li, Chaohui Yu, Shenghua Gao

    Abstract: We present MeshSegmenter, a simple yet effective framework designed for zero-shot 3D semantic segmentation. This model successfully extends the powerful capabilities of 2D segmentation models to 3D meshes, delivering accurate 3D segmentation across diverse meshes and segment descriptions. Specifically, our model leverages the Segment Anything Model (SAM) model to segment the target regions from im… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: The paper was accepted by ECCV2024

  2. arXiv:2407.13111  [pdf, other

    cs.MM cs.CV

    PG-Attack: A Precision-Guided Adversarial Attack Framework Against Vision Foundation Models for Autonomous Driving

    Authors: Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Shuyong Gao, Wenqiang Zhang

    Abstract: Vision foundation models are increasingly employed in autonomous driving systems due to their advanced capabilities. However, these models are susceptible to adversarial attacks, posing significant risks to the reliability and safety of autonomous vehicles. Adversaries can exploit these vulnerabilities to manipulate the vehicle's perception of its surroundings, leading to erroneous decisions and p… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: First-Place in the CVPR 2024 Workshop Challenge: Black-box Adversarial Attacks on Vision Foundation Models

  3. arXiv:2407.09645  [pdf, other

    eess.SY cs.LG cs.RO

    Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey

    Authors: Milan Ganai, Sicun Gao, Sylvia Herbert

    Abstract: Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was limited to verifying lo… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  4. arXiv:2407.07372  [pdf, other

    eess.IV cs.CV

    Trustworthy Contrast-enhanced Brain MRI Synthesis

    Authors: Jiyao Liu, Yuxin Li, Shangqi Gao, Yuncheng Zhou, Xin Gao, Ningsheng Xu, Xiao-Yong Zhang, Xiahai Zhuang

    Abstract: Contrast-enhanced brain MRI (CE-MRI) is a valuable diagnostic technique but may pose health risks and incur high costs. To create safer alternatives, multi-modality medical image translation aims to synthesize CE-MRI images from other available modalities. Although existing methods can generate promising predictions, they still face two challenges, i.e., exhibiting over-confidence and lacking inte… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  5. arXiv:2407.07291  [pdf, other

    cs.LG cs.AI stat.ML

    Causal Discovery in Semi-Stationary Time Series

    Authors: Shanyun Gao, Raghavendra Addanki, Tong Yu, Ryan A. Rossi, Murat Kocaoglu

    Abstract: Discovering causal relations from observational time series without making the stationary assumption is a significant challenge. In practice, this challenge is common in many areas, such as retail sales, transportation systems, and medical science. Here, we consider this problem for a class of non-stationary time series. The structural causal model (SCM) of this type of time series, called the sem… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    ACM Class: I.2.6, G.3

  6. arXiv:2407.07290  [pdf, other

    cs.LG cs.AI stat.ML

    Causal Discovery-Driven Change Point Detection in Time Series

    Authors: Shanyun Gao, Raghavendra Addanki, Tong Yu, Ryan A. Rossi, Murat Kocaoglu

    Abstract: Change point detection in time series seeks to identify times when the probability distribution of time series changes. It is widely applied in many areas, such as human-activity sensing and medical science. In the context of multivariate time series, this typically involves examining the joint distribution of high-dimensional data: If any one variable changes, the whole time series is assumed to… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    ACM Class: I.2.6, G.3

  7. arXiv:2407.06483  [pdf, other

    cs.LG cs.CL

    Composable Interventions for Language Models

    Authors: Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen

    Abstract: Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely developing independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventi… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  8. arXiv:2407.03379  [pdf, other

    stat.ME cs.LG stat.ML

    missForestPredict -- Missing data imputation for prediction settings

    Authors: Elena Albu, Shan Gao, Laure Wynants, Ben Van Calster

    Abstract: Prediction models are used to predict an outcome based on input variables. Missing data in input variables often occurs at model development and at prediction time. The missForestPredict R package proposes an adaptation of the missForest imputation algorithm that is fast, user-friendly and tailored for prediction settings. The algorithm iteratively imputes variables using random forests until a co… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  9. arXiv:2407.03007  [pdf, other

    cs.CL cs.AI

    What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

    Authors: Chengrui Huang, Zhengliang Shi, Yuntao Wen, Xiuying Chen, Peng Han, Shen Gao, Shuo Shang

    Abstract: Tool learning methods have enhanced the ability of large language models (LLMs) to interact with real-world applications. Many existing works fine-tune LLMs or design prompts to enable LLMs to select appropriate tools and correctly invoke them to meet user requirements. However, it is observed in previous works that the performance of tool learning varies from tasks, datasets, training settings, a… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 19 pages, 9 figures

  10. arXiv:2407.00896  [pdf, other

    eess.SP cs.AI

    Channel Modeling Aided Dataset Generation for AI-Enabled CSI Feedback: Advances, Challenges, and Solutions

    Authors: Yupeng Li, Gang Li, Zirui Wen, Shuangfeng Han, Shijian Gao, Guangyi Liu, Jiangzhou Wang

    Abstract: The AI-enabled autoencoder has demonstrated great potential in channel state information (CSI) feedback in frequency division duplex (FDD) multiple input multiple output (MIMO) systems. However, this method completely changes the existing feedback strategies, making it impractical to deploy in recent years. To address this issue, this paper proposes a channel modeling aided data augmentation metho… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  11. arXiv:2406.19966  [pdf, other

    cs.CL

    Simulating Financial Market via Large Language Model based Agents

    Authors: Shen Gao, Yuntao Wen, Minghang Zhu, Jianing Wei, Yuhan Cheng, Qunzi Zhang, Shuo Shang

    Abstract: Most economic theories typically assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not entirely rational and is challenging to predict accurately with mathematical models. In this paper, we propose \textbf{A}gent-based \textbf{S}imulated \textbf{F}inancial \textbf{M}… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  12. arXiv:2406.19644  [pdf, other

    cs.AI

    Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs

    Authors: Zichao Shen, Tianchen Zhu, Qingyun Sun, Shiqi Gao, Jianxin Li

    Abstract: Reinforcement learning (RL) faces challenges in evaluating policy trajectories within intricate game tasks due to the difficulty in designing comprehensive and precise reward functions. This inherent difficulty curtails the broader application of RL within game environments characterized by diverse constraints. Preference-based reinforcement learning (PbRL) presents a pioneering framework that cap… ▽ More

    Submitted 30 June, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: accepted by IJCAI 2024 GAAMAL

  13. arXiv:2406.19126  [pdf, other

    physics.optics cs.AI

    Super-resolution imaging using super-oscillatory diffractive neural networks

    Authors: Hang Chen, Sheng Gao, Zejia Zhao, Zhengyang Duan, Haiou Zhang, Gordon Wetzstein, Xing Lin

    Abstract: Optical super-oscillation enables far-field super-resolution imaging beyond diffraction limits. However, the existing super-oscillatory lens for the spatial super-resolution imaging system still confronts critical limitations in performance due to the lack of a more advanced design method and the limited design degree of freedom. Here, we propose an optical super-oscillatory diffractive neural net… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures, 1 table

  14. arXiv:2406.19014  [pdf, other

    cs.GT

    The Impact of Autonomous Vehicles on Ride-Hailing Platforms with Strategic Human Drivers

    Authors: Shuqin Gao, Xinyuan Wu, Antonis Dimakis, Costas Courcoubetis

    Abstract: Motivated by the rapid development of autonomous vehicle technology, this work focuses on the challenges of introducing them in ride-hailing platforms with conventional strategic human drivers. We consider a ride-hailing platform that operates a mixed fleet of autonomous vehicles (AVs) and conventional vehicles (CVs), where AVs are fully controlled by the platform and CVs are operated by self-inte… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: This is a working paper. 60 pages

  15. arXiv:2406.18181  [pdf, ps, other

    cs.SE

    An Empirical Study of Unit Test Generation with Large Language Models

    Authors: Lin Yang, Chen Yang, Shutao Gao, Weijing Wang, Bo Wang, Qihao Zhu, Xiao Chu, Jianyi Zhou, Guangtai Liang, Qianxiang Wang, Junjie Chen

    Abstract: Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and CodeX) with fixed prompting s… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  16. arXiv:2406.18085  [pdf, other

    cs.CL

    Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints

    Authors: Ran Song, Shizhu He, Shengxiang Gao, Li Cai, Kang Liu, Zhengtao Yu, Jun Zhao

    Abstract: Multilingual Knowledge Graph Completion (mKGC) aim at solving queries like (h, r, ?) in different languages by reasoning a tail entity t thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, ACL 2023

  17. arXiv:2406.18062  [pdf, other

    cs.LG cs.AI

    Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents

    Authors: Chung-En Sun, Sicun Gao, Tsui-Wei Weng

    Abstract: Robustness remains a paramount concern in deep reinforcement learning (DRL), with randomized smoothing emerging as a key technique for enhancing this attribute. However, a notable gap exists in the performance of current smoothed DRL agents, often characterized by significantly low clean rewards and weak robustness. In response to this challenge, our study introduces innovative algorithms aimed at… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Published in ICML 2024

  18. arXiv:2406.16473  [pdf, other

    cs.CV cs.AI

    Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

    Authors: Haoran Wang, Xinji Mai, Zeng Tao, Xuan Tong, Junxiong Lin, Yan Wang, Jiawen Yu, Boyang Wang, Shaoqi Yan, Qing Zhao, Ziheng Zhou, Shuyong Gao, Wenqiang Zhang

    Abstract: The contemporary state-of-the-art of Dynamic Facial Expression Recognition (DFER) technology facilitates remarkable progress by deriving emotional mappings of facial expressions from video content, underpinned by training on voluminous datasets. Yet, the DFER datasets encompass a substantial volume of noise data. Noise arises from low-quality captures that defy logical labeling, and instances that… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  19. arXiv:2406.16459  [pdf, other

    cs.CV

    Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

    Authors: Junxiong Lin, Zeng Tao, Xuan Tong, Xinji Mai, Haoran Wang, Boyang Wang, Yan Wang, Qing Zhao, Jiawen Yu, Yuxuan Lin, Shaoqi Yan, Shuyong Gao, Wenqiang Zhang

    Abstract: The problem of blind image super-resolution aims to recover high-resolution (HR) images from low-resolution (LR) images with unknown degradation modes. Most existing methods model the image degradation process using blur kernels. However, this explicit modeling approach struggles to cover the complex and varied degradation processes encountered in the real world, such as high-order combinations of… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  20. arXiv:2406.15848  [pdf, other

    cs.CV

    Quality-guided Skin Tone Enhancement for Portrait Photography

    Authors: Shiqi Gao, Huiyu Duan, Xinyue Li, Kang Fu, Yicong Peng, Qihang Xu, Yuanyuan Chang, Jia Wang, Xiongkuo Min, Guangtao Zhai

    Abstract: In recent years, learning-based color and tone enhancement methods for photos have become increasingly popular. However, most learning-based image enhancement methods just learn a mapping from one distribution to another based on one dataset, lacking the ability to adjust images continuously and controllably. It is important to enable the learning-based enhancement models to adjust an image contin… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  21. arXiv:2406.14891  [pdf, other

    cs.CL cs.IR

    Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering

    Authors: Zhengliang Shi, Shuo Zhang, Weiwei Sun, Shen Gao, Pengjie Ren, Zhumin Chen, Zhaochun Ren

    Abstract: Multi-Hop Question Answering (MHQA) tasks present a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus to read an answer. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable no… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (main conference)

  22. arXiv:2406.12178  [pdf, other

    cs.CV

    FCA-RAC: First Cycle Annotated Repetitive Action Counting

    Authors: Jiada Lu, WeiWei Zhou, Xiang Qian, Dongze Lian, Yanyu Xu, Weifeng Wang, Lina Cao, Shenghua Gao

    Abstract: Repetitive action counting quantifies the frequency of specific actions performed by individuals. However, existing action-counting datasets have limited action diversity, potentially hampering model performance on unseen actions. To address this issue, we propose a framework called First Cycle Annotated Repetitive Action Counting (FCA-RAC). This framework contains 4 parts: 1) a labeling technique… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  23. arXiv:2406.12042  [pdf, other

    cs.CV cs.LG

    Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models

    Authors: Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao, Heng Huang

    Abstract: Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. Still, their computational intensity prohibits resource-constrained organizations from deploying T2I models after fine-tuning them on their internal target data. While pruning techniques offer a potential solution to reduce the computational burden of T2I models, static pruning methods use the same pru… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  24. arXiv:2406.11228  [pdf, other

    cs.CL

    ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark

    Authors: Hiromi Wakaki, Yuki Mitsufuji, Yoshinori Maeda, Yukiko Nishimura, Silin Gao, Mengjie Zhao, Keiichi Yamada, Antoine Bosselut

    Abstract: We propose a new benchmark, ComperDial, which facilitates the training and evaluation of evaluation metrics for open-domain dialogue systems. ComperDial consists of human-scored responses for 10,395 dialogue turns in 1,485 conversations collected from 99 dialogue agents submitted to the Commonsense Persona-grounded Dialogue (CPD) challenge. As a result, for any dialogue, our benchmark includes mul… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.10813  [pdf, other

    cs.CL

    Self-Evolution Fine-Tuning for Policy Optimization

    Authors: Ruijun Chen, Jiehao Liang, Shiping Gao, Fanqi Wan, Xiaojun Quan

    Abstract: The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. Current alignment methodologies face considerable challenges. For instance, supervised fine-tuning (SFT) requires extensive, high-quality annotated samples, while reinforcement lea… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  26. arXiv:2406.07843  [pdf, other

    cs.CV q-bio.NC

    Incremental Learning and Self-Attention Mechanisms Improve Neural System Identification

    Authors: Isaac Lin, Tianye Wang, Shang Gao, Shiming Tang, Tai Sing Lee

    Abstract: Convolutional neural networks (CNNs) have been shown to be the state-of-the-art approach for modeling the transfer functions of visual cortical neurons. Cortical neurons in the primary visual cortex are are sensitive to contextual information mediated by extensive horizontal and feedback connections. Standard CNNs can integrate global spatial image information to model such contextual modulation v… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Preprint NeurIPS 2024

  27. arXiv:2406.05361  [pdf, other

    cs.CL

    Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization

    Authors: Xiuying Chen, Shen Gao, Mingzhe Li, Qingqing Zhu, Xin Gao, Xiangliang Zhang

    Abstract: Nowadays, neural text generation has made tremendous progress in abstractive summarization tasks. However, most of the existing summarization models take in the whole document all at once, which sometimes cannot meet the needs in practice. Practically, social text streams such as news events and tweets keep growing from time to time, and can only be fed to the summarization system step by step. He… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 figures, published in TASLP

  28. arXiv:2406.05360  [pdf, other

    cs.CL

    Flexible and Adaptable Summarization via Expertise Separation

    Authors: Xiuying Chen, Mingzhe Li, Shen Gao, Xin Cheng, Qingqing Zhu, Rui Yan, Xin Gao, Xiangliang Zhang

    Abstract: A proficient summarization model should exhibit both flexibility -- the capacity to handle a range of in-domain summarization tasks, and adaptability -- the competence to acquire new knowledge and adjust to unseen out-of-domain tasks. Unlike large language models (LLMs) that achieve this through parameter scaling, we propose a more parameter-efficient approach in this study. Our motivation rests o… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures, published in SIGIR 2024

  29. arXiv:2406.04938  [pdf, other

    cs.LG cs.AI

    SpanGNN: Towards Memory-Efficient Graph Neural Networks via Spanning Subgraph Training

    Authors: Xizhi Gu, Hongzheng Li, Shihong Gao, Xinyan Zhang, Lei Chen, Yingxia Shao

    Abstract: Graph Neural Networks (GNNs) have superior capability in learning graph data. Full-graph GNN training generally has high accuracy, however, it suffers from large peak memory usage and encounters the Out-of-Memory problem when handling large graphs. To address this memory problem, a popular solution is mini-batch GNN training. However, mini-batch GNN training increases the training variance and sac… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  30. arXiv:2406.04151  [pdf, other

    cs.AI cs.CL

    AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

    Authors: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervis… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project site: https://agentgym.github.io

  31. arXiv:2406.00617  [pdf, other

    cs.DB cs.SI

    Maximum $k$-Plex Search: An Alternated Reduction-and-Bound Method

    Authors: Shuohao Gao, Kaiqiang Yu, Shengxin Liu, Cheng Long

    Abstract: $k$-plexes relax cliques by allowing each vertex to disconnect to at most $k$ vertices. Finding a maximum $k… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  32. arXiv:2406.00587  [pdf, other

    cs.CV

    Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024

    Authors: Biao Wu, Diankai Zhang, Si Gao, Chengjian Zheng, Shaoli Liu, Ning Wang

    Abstract: Pixel-level Scene Understanding is one of the fundamental problems in computer vision, which aims at recognizing object classes, masks and semantics of each pixel in the given image. Compared with image scene parsing, video scene parsing introduces temporal information, which can effectively improve the consistency and accuracy of prediction,because the real-world is actually video-based rather th… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Champion Solution for CVPR 2024 PVUW VSS Track. arXiv admin note: text overlap with arXiv:2306.02894

  33. arXiv:2406.00500  [pdf, other

    cs.CV

    2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation

    Authors: Biao Wu, Diankai Zhang, Si Gao, Chengjian Zheng, Shaoli Liu, Ning Wang

    Abstract: Video Panoptic Segmentation (VPS) is a challenging task that is extends from image panoptic segmentation.VPS aims to simultaneously classify, track, segment all objects in a video, including both things and stuff. Due to its wide application in many downstream tasks such as video understanding, video editing, and autonomous driving. In order to deal with the task of video panoptic segmentation in… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 2nd Place Solution for CVPR 2024 PVUW VPS Track

  34. arXiv:2406.00494  [pdf, other

    cs.LG cs.AI

    Activation-Descent Regularization for Input Optimization of ReLU Networks

    Authors: Hongzhan Yu, Sicun Gao

    Abstract: We present a new approach for input optimization of ReLU networks that explicitly takes into account the effect of changes in activation patterns. We analyze local optimization steps in both the input space and the space of activation patterns to propose methods with superior local descent properties. To accomplish this, we convert the discrete space of activation patterns into differentiable repr… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: ICML'24 Proceedings

  35. arXiv:2406.00378  [pdf

    physics.app-ph cs.NE

    Real-Time State Modulation and Acquisition Circuit in Neuromorphic Memristive Systems

    Authors: Shengbo Wang, Cong Li, Tongming Pu, Jian Zhang, Weihao Ma, Luigi Occhipinti, Arokia Nathan, Shuo Gao

    Abstract: Memristive neuromorphic systems are designed to emulate human perception and cognition, where the memristor states represent essential historical information to perform both low-level and high-level tasks. However, current systems face challenges with the separation of state modulation and acquisition, leading to undesired time delays that impact real-time performance. To overcome this issue, we i… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures

  36. arXiv:2405.19327  [pdf, other

    cs.CL cs.AI cs.LG

    MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

    Authors: Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu , et al. (20 additional authors not shown)

    Abstract: Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparabl… ▽ More

    Submitted 10 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: https://map-neo.github.io/

  37. arXiv:2405.18435  [pdf, other

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de… ▽ More

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  38. arXiv:2405.18416  [pdf, other

    cs.CV

    3D StreetUnveiler with Semantic-Aware 2DGS

    Authors: Jingwei Xu, Yikai Wang, Yiqun Zhao, Yanwei Fu, Shenghua Gao

    Abstract: Unveiling an empty street from crowded observations captured by in-car cameras is crucial for autonomous driving. However, removing all temporarily static objects, such as stopped vehicles and standing pedestrians, presents a significant challenge. Unlike object-centric 3D inpainting, which relies on thorough observation in a small scene, street scene cases involve long trajectories that differ fr… ▽ More

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://streetunveiler.github.io

  39. arXiv:2405.17832  [pdf, other

    cs.LG cs.AI cs.RO

    Mollification Effects of Policy Gradient Methods

    Authors: Tao Wang, Sylvia Herbert, Sicun Gao

    Abstract: Policy gradient methods have enabled deep reinforcement learning (RL) to approach challenging continuous control problems, even when the underlying systems involve highly nonlinear dynamics that generate complex non-smooth optimization landscapes. We develop a rigorous framework for understanding how policy gradient methods mollify non-smooth optimization landscapes to enable effective policy sear… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 19 pages, 41 figures

  40. arXiv:2405.17398  [pdf, other

    cs.CV cs.AI

    Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

    Authors: Shenyuan Gao, Jiazhi Yang, Li Chen, Kashyap Chitta, Yihang Qiu, Andreas Geiger, Jun Zhang, Hongyang Li

    Abstract: World models can foresee the outcomes of different actions, which is of paramount importance for autonomous driving. Nevertheless, existing driving world models still have limitations in generalization to unseen environments, prediction fidelity of critical details, and action controllability for flexible application. In this paper, we present Vista, a generalizable driving world model with high f… ▽ More

    Submitted 6 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Code and model: https://github.com/OpenDriveLab/Vista, video demos: https://vista-demo.github.io

  41. arXiv:2405.16533  [pdf, other

    cs.CL

    Chain of Tools: Large Language Model is an Automatic Multi-tool Learner

    Authors: Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Zhumin Chen, Suzan Verberne, Zhaochun Ren

    Abstract: Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, empowering them to solve practical tasks. Existing work typically empowers LLMs as tool users with a manually designed workflow, where the LLM plans a series of tools in a step-by-step manner, and sequentially executes each tool to obtain intermediate results until deriving the… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Work in progress

  42. arXiv:2405.16437  [pdf, other

    cs.CV

    Incremental Pseudo-Labeling for Black-Box Unsupervised Domain Adaptation

    Authors: Yawen Zou, Chunzhi Gu, Jun Yu, Shangce Gao, Chao Zhang

    Abstract: Black-Box unsupervised domain adaptation (BBUDA) learns knowledge only with the prediction of target data from the source model without access to the source data and source model, which attempts to alleviate concerns about the privacy and security of data. However, incorrect pseudo-labels are prevalent in the prediction generated by the source model due to the cross-domain discrepancy, which may s… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  43. arXiv:2405.14347  [pdf, other

    eess.SP cs.AI

    Doubly-Dynamic ISAC Precoding for Vehicular Networks: A Constrained Deep Reinforcement Learning (CDRL) Approach

    Authors: Zonghui Yang, Shijian Gao, Xiang Cheng

    Abstract: Integrated sensing and communication (ISAC) technology is essential for enabling the vehicular networks. However, the communication channel in this scenario exhibits time-varying characteristics, and the potential targets may move rapidly, creating a doubly-dynamic phenomenon. This nature poses a challenge for real-time precoder design. While optimization-based solutions are widely researched, the… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  44. arXiv:2405.14169  [pdf, other

    cs.CV

    Towards Transferable Attacks Against Vision-LLMs in Autonomous Driving with Typography

    Authors: Nhat Chung, Sensen Gao, Tuan-Anh Vu, Jie Zhang, Aishan Liu, Yun Lin, Jin Song Dong, Qing Guo

    Abstract: Vision-Large-Language-Models (Vision-LLMs) are increasingly being integrated into autonomous driving (AD) systems due to their advanced visual-language reasoning capabilities, targeting the perception, prediction, planning, and control mechanisms. However, Vision-LLMs have demonstrated susceptibilities against various types of adversarial attacks, which would compromise their reliability and safet… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 12 pages, 5 tables, 5 figures, work in progress

  45. arXiv:2405.13872  [pdf, other

    cs.AI cs.CL cs.CV

    Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models

    Authors: Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, Yue Zhang

    Abstract: Recent advancements in Chain-of-Thought (CoT) and related rationale-based works have significantly improved the performance of Large Language Models (LLMs) in complex reasoning tasks. With the evolution of Multimodal Large Language Models (MLLMs), enhancing their capability to tackle complex multimodal reasoning problems is a crucial frontier. However, incorporating multimodal rationales in CoT ha… ▽ More

    Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Correct the case title

  46. arXiv:2405.10504  [pdf

    cs.CV

    Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view Image

    Authors: Jianshun Zeng, Wang Li, Yanjie Lv, Shuai Gao, YuChu Qin

    Abstract: Street-view image has been widely applied as a crucial mobile mapping data source. The inpainting of street-view images is a critical step for street-view image processing, not only for the privacy protection, but also for the urban environment mapping applications. This paper presents a novel Deep Neural Network (DNN), multi-scale semantic prior Feature guided image inpainting Network (MFN) for i… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  47. arXiv:2405.05708  [pdf

    cs.IT

    Characteristic-Mode Based Conformal Design of Ultra-Wideband Antenna Array

    Authors: Zhan Chen, Wei Hu, Yuchen Gao, Qi Luo, Xiangbo Wang, Steven Gao

    Abstract: An innovative design method of conformal array antennas is presented by utilizing characteristic mode analysis (CMA) in this work. A single-layer continuous perfect electric conductor under bending conditions is conducted by CMA to evaluate the variations in operating performance. By using this method, the design process of a conformal array is simplified. The results indicate that the operating p… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  48. arXiv:2405.02982  [pdf, other

    cs.CV

    Paintings and Drawings Aesthetics Assessment with Rich Attributes for Various Artistic Categories

    Authors: Xin Jin, Qianqian Qiao, Yi Lu, Shan Gao, Heng Huang, Guangdong Li

    Abstract: Image aesthetic evaluation is a highly prominent research domain in the field of computer vision. In recent years, there has been a proliferation of datasets and corresponding evaluation methodologies for assessing the aesthetic quality of photographic works, leading to the establishment of a relatively mature research environment. However, in contrast to the extensive research in photographic aes… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  49. arXiv:2405.02942  [pdf, other

    physics.optics cs.CV cs.RO eess.IV

    Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens

    Authors: Shaohua Gao, Qi Jiang, Yiqi Liao, Yi Qiu, Wanglei Ying, Kailun Yang, Kaiwei Wang, Benhao Zhang, Jian Bai

    Abstract: We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360°x(35°~110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 len… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to Optics & Laser Technology

  50. arXiv:2405.02561  [pdf, other

    cs.LG math.NA

    Understanding the Difficulty of Solving Cauchy Problems with PINNs

    Authors: Tao Wang, Bo Zhao, Sicun Gao, Rose Yu

    Abstract: Physics-Informed Neural Networks (PINNs) have gained popularity in scientific computing in recent years. However, they often fail to achieve the same level of accuracy as classical methods in solving differential equations. In this paper, we identify two sources of this issue in the case of Cauchy problems: the use of $L^2$ residuals as objective functions and the approximation gap of neural netwo… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 13 pages and 18 figures