Skip to main content

Showing 1–50 of 7,451 results for author: Li, Y

  1. arXiv:2407.13372  [pdf, other

    cs.CV

    Any Image Restoration with Efficient Automatic Degradation Adaptation

    Authors: Bin Ren, Eduard Zamfir, Yawei Li, Zongwei Wu, Danda Pani Paudel, Radu Timofte, Nicu Sebe, Luc Van Gool

    Abstract: With the emergence of mobile devices, there is a growing demand for an efficient model to restore any degraded image for better perceptual quality. However, existing models often require specific learning modules tailored for each degradation, resulting in complex architectures and high computation costs. Different from previous work, in this paper, we propose a unified manner to achieve joint emb… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Efficient Any Image Restoration

  2. arXiv:2407.13246  [pdf, other

    cs.CV

    STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation

    Authors: Yaqi Wang, Yifan Zhang, Xiaodiao Chen, Shuai Wang, Dahong Qian, Fan Ye, Feng Xu, Hongyuan Zhang, Qianni Zhang, Chengyu Wu, Yunxiang Li, Weiwei Cui, Shan Luo, Chengkai Wang, Tianhao Li, Yi Liu, Xiang Feng, Huiyu Zhou, Dongyun Liu, Qixuan Wang, Zhouhao Lin, Wei Song, Yuanlin Li, Bing Wang, Chunshi Wang , et al. (2 additional authors not shown)

    Abstract: Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.13195  [pdf, other

    cs.LG cs.AI cs.HC cs.IT stat.ML

    Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation

    Authors: Yingru Li, Jiawei Xu, Zhi-Quan Luo

    Abstract: Foundation models often struggle with uncertainty when faced with new situations in online decision-making, necessitating scalable and efficient exploration to resolve this uncertainty. We introduce GPT-HyperAgent, an augmentation of GPT with HyperAgent for uncertainty-aware, scalable exploration in contextual bandits, a fundamental online decision problem involving natural language input. We prov… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 41 pages

  4. arXiv:2407.13168  [pdf, other

    cs.AI cs.CL

    SciCode: A Research Coding Benchmark Curated by Scientists

    Authors: Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, Yutao Ma, Hao Tong, Kha Trinh, Chenyu Tian, Zihan Wang, Bohao Wu, Yanyu Xiong, Shengzhu Yin, Minhui Zhu, Kilian Lieret, Yanxin Lu, Genglin Liu, Yufeng Du , et al. (5 additional authors not shown)

    Abstract: Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields,… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 25 pages, 9 figures, 7 tables

  5. arXiv:2407.12813  [pdf, other

    cs.CL cs.AI

    Data Generation using Large Language Models for Text Classification: An Empirical Case Study

    Authors: Yinheng Li, Rogerio Bonatti, Sara Abdali, Justin Wagle, Kazuhito Koishida

    Abstract: Using Large Language Models (LLMs) to generate synthetic data for model training has become increasingly popular in recent years. While LLMs are capable of producing realistic training data, the effectiveness of data generation is influenced by various factors, including the choice of prompt, task complexity, and the quality, quantity, and diversity of the generated data. In this work, we focus ex… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Accepted by DMLR @ ICML 2024

  6. arXiv:2407.12788  [pdf, other

    cs.CV cs.AI

    SS-ADA: A Semi-Supervised Active Domain Adaptation Framework for Semantic Segmentation

    Authors: Weihao Yan, Yeqiang Qian, Yueyuan Li, Tao Li, Chunxiang Wang, Ming Yang

    Abstract: Semantic segmentation plays an important role in intelligent vehicles, providing pixel-level semantic information about the environment. However, the labeling budget is expensive and time-consuming when semantic segmentation model is applied to new driving scenarios. To reduce the costs, semi-supervised semantic segmentation methods have been proposed to leverage large quantities of unlabeled imag… ▽ More

    Submitted 17 June, 2024; originally announced July 2024.

    Comments: 12 pages,13 figures,8 tables

  7. arXiv:2407.12550  [pdf, other

    cs.LG

    UniTE: A Survey and Unified Pipeline for Pre-training ST Trajectory Embeddings

    Authors: Yan Lin, Zeyu Zhou, Yicheng Liu, Haochen Lv, Haomin Wen, Tianyi Li, Yushuai Li, Christian S. Jensen, Shengnan Guo, Youfang Lin, Huaiyu Wan

    Abstract: Spatio-temporal (ST) trajectories are sequences of timestamped locations, which enable a variety of analyses that in turn enable important real-world applications. It is common to map trajectories to vectors, called embeddings, before subsequent analyses. Thus, the qualities of embeddings are very important. Methods for pre-training embeddings, which leverage unlabeled trajectories for training un… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  8. arXiv:2407.12538  [pdf, other

    eess.IV cs.CV

    High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion

    Authors: Juan Song, Jiaxiang He, Mingtao Feng, Keyan Wang, Yunsong Li, Ajmal Mian

    Abstract: Diffusion probabilistic models have recently achieved remarkable success in generating high-quality images. However, balancing high perceptual quality and low distortion remains challenging in image compression applications. To address this issue, we propose an efficient Uncertainty-Guided image compression approach with wavelet Diffusion (UGDiff). Our approach focuses on high frequency compressio… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  9. arXiv:2407.12470  [pdf, other

    cs.CL

    Continual Learning for Temporal-Sensitive Question Answering

    Authors: Wanqi Yang, Yunqiu Xu, Yanda Li, Kunze Wang, Binbin Huang, Ling Chen

    Abstract: In this study, we explore an emerging research area of Continual Learning for Temporal Sensitive Question Answering (CLTSQA). Previous research has primarily focused on Temporal Sensitive Question Answering (TSQA), often overlooking the unpredictable nature of future events. In real-world applications, it's crucial for models to continually acquire knowledge over time, rather than relying on a sta… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCNN 2024

  10. arXiv:2407.12274  [pdf, other

    cs.CV

    MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics

    Authors: Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, Yongwei Li

    Abstract: Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and text. In addition, individual differences in deception production and detection are believed to play a crucial role.Although some studies have utilized… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Code and data are available; Submitted to NeurIPS 2024 Datasets and Benchmarks Track

  11. arXiv:2407.12164  [pdf, other

    cs.CV cs.AI cs.LG

    Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning

    Authors: Yanting Miao, William Loh, Suraj Kothawade, Pascal Poupart, Abdullah Rashwan, Yeqing Li

    Abstract: Text-to-image generative models have recently attracted considerable interest, enabling the synthesis of high-quality images from textual prompts. However, these models often lack the capability to generate specific subjects from given reference images or to synthesize novel renditions under varying conditions. Methods like DreamBooth and Subject-driven Text-to-Image (SuTI) have made significant p… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  12. arXiv:2407.12051  [pdf, other

    q-bio.GN cs.AI cs.LG

    Dy-mer: An Explainable DNA Sequence Representation Scheme using Sparse Recovery

    Authors: Zhiyuan Peng, Yuanbo Tang, Yang Li

    Abstract: DNA sequences encode vital genetic and biological information, yet these unfixed-length sequences cannot serve as the input of common data mining algorithms. Hence, various representation schemes have been developed to transform DNA sequences into fixed-length numerical representations. However, these schemes face difficulties in learning high-quality representations due to the complexity and spar… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  13. arXiv:2407.12038  [pdf, ps, other

    eess.AS cs.AI

    ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

    Authors: Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Shuchen Shi, Xin Qi, Guanjun Li

    Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective percept… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: ISCSLP 2024 Challenge description

  14. arXiv:2407.12003  [pdf, other

    cs.HC

    Evaluation and Continual Improvement for an Enterprise AI Assistant

    Authors: Akash V. Maharaj, Kun Qian, Uttaran Bhattacharya, Sally Fang, Horia Galatanu, Manas Garg, Rachel Hanessian, Nishant Kapoor, Ken Russell, Shivakumar Vaithyanathan, Yunyao Li

    Abstract: The development of conversational AI assistants is an iterative process with multiple components. As such, the evaluation and continual improvement of these assistants is a complex and multifaceted problem. This paper introduces the challenges in evaluating and improving a generative AI assistant for enterprises, which is under active development, and how we address these challenges. We also share… ▽ More

    Submitted 15 June, 2024; originally announced July 2024.

    Comments: Accepted to DaSH Workshop at NAACL 2024

  15. arXiv:2407.11966  [pdf, other

    cs.CV cs.AI cs.LG

    Efficient Training with Denoised Neural Weights

    Authors: Yifan Gong, Zheng Zhan, Yanyu Li, Yerlan Idelbayev, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren

    Abstract: Good weight initialization serves as an effective measure to reduce the training cost of a deep neural network (DNN) model. The choice of how to initialize parameters is challenging and may require manual tuning, which can be time-consuming and prone to human error. To overcome such limitations, this work takes a novel step towards building a weight generator to synthesize the neural weights for i… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project Page: https://yifanfanfanfan.github.io/denoised-weights/

  16. arXiv:2407.11965  [pdf, other

    cs.CV

    UrbanWorld: An Urban World Model for 3D City Generation

    Authors: Yu Shang, Jiansheng Chen, Hangyu Fan, Jingtao Ding, Jie Feng, Yong Li

    Abstract: Cities, as the most fundamental environment of human life, encompass diverse physical elements such as buildings, roads and vegetation with complex interconnection. Crafting realistic, interactive 3D urban environments plays a crucial role in constructing AI agents capable of perceiving, decision-making, and acting like humans in real-world environments. However, creating high-fidelity 3D urban en… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 11 pages

  17. arXiv:2407.11853  [pdf, other

    cs.ET

    A Case for Application-Aware Space Radiation Tolerance in Orbital Computing

    Authors: Meiqi Wang, Han Qiu, Longnv Xu, Di Wang, Yuanjie Li, Tianwei Zhang, Jun Liu, Hewu Li

    Abstract: We are witnessing a surge in the use of commercial off-the-shelf (COTS) hardware for cost-effective in-orbit computing, such as deep neural network (DNN) based on-satellite sensor data processing, Earth object detection, and task decision.However, once exposed to harsh space environments, COTS hardware is vulnerable to cosmic radiation and suffers from exhaustive single-event upsets (SEUs) and mul… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  18. arXiv:2407.11784  [pdf, other

    cs.AI cs.CV cs.LG

    Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development

    Authors: Daoyuan Chen, Haibin Wang, Yilun Huang, Ce Ge, Yaliang Li, Bolin Ding, Jingren Zhou

    Abstract: The emergence of large-scale multi-modal generative models has drastically advanced artificial intelligence, introducing unprecedented levels of performance and functionality. However, optimizing these models remains challenging due to historically isolated paths of model-centric and data-centric developments, leading to suboptimal outcomes and inefficient resource utilization. In response, we pre… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 26 pages, 9 figures, 5 tables

  19. arXiv:2407.11644  [pdf, other

    cs.CV cs.RO

    Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures

    Authors: Guoliang You, Xiaomeng Chu, Yifan Duan, Wenyu Zhang, Xingchen Li, Sha Zhang, Yao Li, Jianmin Ji, Yanyong Zhang

    Abstract: When planning for autonomous driving, it is crucial to consider essential traffic elements such as lanes, intersections, traffic regulations, and dynamic agents. However, they are often overlooked by the traditional end-to-end planning methods, likely leading to inefficiencies and non-compliance with traffic regulations. In this work, we endeavor to integrate the perception of these elements into… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  20. arXiv:2407.11541  [pdf, other

    eess.IV cs.CV

    Uniformly Accelerated Motion Model for Inter Prediction

    Authors: Zhuoyuan Li, Yao Li, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: Inter prediction is a key technology to reduce the temporal redundancy in video coding. In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly. In Versatile Video Coding (VVC), existing inter prediction methods usually assume uniform speed motion between consecutive frames and use the linear… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  21. arXiv:2407.11326  [pdf, other

    cs.RO

    HEROS: Hierarchical Exploration with Online Subregion Updating for 3D Environment Coverage

    Authors: Shijun Long, Ying Li, Chenming Wu, Bin Xu, Wei Fan

    Abstract: We present an autonomous exploration system for efficient coverage of unknown environments. First, a rapid environment preprocessing method is introduced to provide environmental information for subsequent exploration planning. Then, the whole exploration space is divided into multiple subregion cells, each with varying levels of detail. The subregion cells are capable of decomposition and updatin… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  22. arXiv:2407.11077  [pdf, other

    cs.LG cs.AI

    Deep reinforcement learning with symmetric data augmentation applied for aircraft lateral attitude tracking control

    Authors: Yifei Li, Erik-jan van Kampen

    Abstract: Symmetry is an essential property in some dynamical systems that can be exploited for state transition prediction and control policy optimization. This paper develops two symmetry-integrated Reinforcement Learning (RL) algorithms based on standard Deep Deterministic Policy Gradient (DDPG),which leverage environment symmetry to augment explored transition samples of a Markov Decision Process(MDP).… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  23. arXiv:2407.10947  [pdf, other

    cs.CV

    Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

    Authors: Yaoting Wang, Peiwen Sun, Yuanchao Li, Honggang Zhang, Di Hu

    Abstract: The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in the visual space using audio cues. However, in this work, it is recognized that previous AVS methods show a heavy reliance on detrimental segmentation preferences related to audible objects, rather than precise audio guidance. We argue that the primary reason is that audio lacks robust semantics compared to vision, especi… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  24. arXiv:2407.10926  [pdf, other

    eess.IV cs.CV

    In-Loop Filtering via Trained Look-Up Tables

    Authors: Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

    Abstract: In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods achieve remarkable coding gains beyond the capability of advanced video coding standards, which becomes a powerful coding tool candidate for future video coding standards. However, the utilization of deep neural networks brings heavy time… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures

  25. arXiv:2407.10878  [pdf, other

    cs.LG cs.AI

    Deep Causal Learning to Explain and Quantify The Geo-Tension's Impact on Natural Gas Market

    Authors: Philipp Kai Peter, Yulin Li, Ziyue Li, Wolfgang Ketter

    Abstract: Natural gas demand is a crucial factor for predicting natural gas prices and thus has a direct influence on the power system. However, existing methods face challenges in assessing the impact of shocks, such as the outbreak of the Russian-Ukrainian war. In this context, we apply deep neural network-based Granger causality to identify important drivers of natural gas demand. Furthermore, the result… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE PES ISGT Europe 2024 conference

  26. arXiv:2407.10695  [pdf, other

    cs.CV

    IE-NeRF: Inpainting Enhanced Neural Radiance Fields in the Wild

    Authors: Shuaixian Wang, Haoran Xu, Yaokun Li, Jiwei Chen, Guang Tan

    Abstract: We present a novel approach for synthesizing realistic novel views using Neural Radiance Fields (NeRF) with uncontrolled photos in the wild. While NeRF has shown impressive results in controlled settings, it struggles with transient objects commonly found in dynamic and time-varying scenes. Our framework called \textit{Inpainting Enhanced NeRF}, or \ours, enhances the conventional NeRF by drawing… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  27. arXiv:2407.10660  [pdf, other

    cs.RO

    HPHS: Hierarchical Planning based on Hybrid Frontier Sampling for Unknown Environments Exploration

    Authors: Shijun Long, Ying Li, Chenming Wu, Bin Xu, Wei Fan

    Abstract: Rapid sampling from the environment to acquire available frontier points and timely incorporating them into subsequent planning to reduce fragmented regions are critical to improve the efficiency of autonomous exploration. We propose HPHS, a fast and effective method for the autonomous exploration of unknown environments. In this work, we efficiently sample frontier points directly from the LiDAR… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  28. arXiv:2407.10471  [pdf, other

    cs.CR cs.AI cs.SD eess.AS

    GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis

    Authors: Weizhi Liu, Yue Li, Dongdong Lin, Hui Tian, Haizhou Li

    Abstract: Amid the burgeoning development of generative models like diffusion models, the task of differentiating synthesized audio from its natural counterpart grows more daunting. Deepfake detection offers a viable solution to combat this challenge. Yet, this defensive measure unintentionally fuels the continued refinement of generative models. Watermarking emerges as a proactive and sustainable tactic, p… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  29. arXiv:2407.10459  [pdf, other

    cs.CV

    DiffStega: Towards Universal Training-Free Coverless Image Steganography with Diffusion Models

    Authors: Yiwei Yang, Zheyuan Liu, Jun Jia, Zhongpai Gao, Yunhao Li, Wei Sun, Xiaohong Liu, Guangtao Zhai

    Abstract: Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have utilized text prompts as keys in CIS through diffusion models. However, this approach faces three challenges: invalidated when private prompt is guessed, c… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures; reference added; accepted at IJCAI2024 main track

  30. arXiv:2407.10048  [pdf, other

    cs.SD eess.AS

    Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification

    Authors: Li Zhang, Ning Jiang, Qing Wang, Yue Li, Quan Lu, Lei Xie

    Abstract: Trained on 680,000 hours of massive speech data, Whisper is a multitasking, multilingual speech foundation model demonstrating superior performance in automatic speech recognition, translation, and language identification. However, its applicability in speaker verification (SV) tasks remains unexplored, particularly in low-data-resource scenarios where labeled speaker data in specific domains are… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  31. arXiv:2407.10039  [pdf, other

    cs.SE cs.CR cs.PL

    OpenTracer: A Dynamic Transaction Trace Analyzer for Smart Contract Invariant Generation and Beyond

    Authors: Zhiyang Chen, Ye Liu, Sidi Mohamed Beillahi, Yi Li, Fan Long

    Abstract: Smart contracts, self-executing programs on the blockchain, facilitate reliable value exchanges without centralized oversight. Despite the recent focus on dynamic analysis of their transaction histories in both industry and academia, no open-source tool currently offers comprehensive tracking of complete transaction information to extract user-desired data such as invariant-related data. This pape… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  32. arXiv:2407.10005  [pdf, other

    cs.LG cs.AI cs.CL math.OC

    Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond

    Authors: Yingcong Li, Ankit Singh Rawat, Samet Oymak

    Abstract: Recent research has shown that Transformers with linear attention are capable of in-context learning (ICL) by implementing a linear estimator through gradient descent steps. However, the existing results on the optimization landscape apply under stylized settings where task and feature vectors are assumed to be IID and the attention weights are fully parameterized. In this work, we develop a stron… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  33. arXiv:2407.09811  [pdf, other

    cs.AI cs.HC q-bio.GN

    CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis

    Authors: Yihang Xiao, Jinyi Liu, Yan Zheng, Xiaohan Xie, Jianye Hao, Mingzhi Li, Ruitao Wang, Fei Ni, Yuxiao Li, Jintian Luo, Shaoqing Jiao, Jiajie Peng

    Abstract: Single-cell RNA sequencing (scRNA-seq) data analysis is crucial for biological research, as it enables the precise characterization of cellular heterogeneity. However, manual manipulation of various tools to achieve desired outcomes can be labor-intensive for researchers. To address this, we introduce CellAgent (http://cell.agent4science.cn/), an LLM-driven multi-agent framework, specifically desi… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  34. arXiv:2407.09732  [pdf, other

    eess.AS cs.LG cs.SD

    Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

    Authors: Xilin Jiang, Yinghao Aaron Li, Adrian Nicolas Florea, Cong Han, Nima Mesgarani

    Abstract: It is too early to conclude that Mamba is a better alternative to transformers for speech before comparing Mamba with transformers in terms of both performance and efficiency in multiple speech-related tasks. To reach this conclusion, we propose and evaluate three models for three tasks: Mamba-TasNet for speech separation, ConMamba for speech recognition, and VALL-M for speech synthesis. We compar… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  35. arXiv:2407.09546  [pdf, other

    q-fin.TR cs.SI

    A Reflective LLM-based Agent to Guide Zero-shot Cryptocurrency Trading

    Authors: Yuan Li, Bingqiao Luo, Qian Wang, Nuo Chen, Xu Liu, Bingsheng He

    Abstract: The utilization of Large Language Models (LLMs) in financial trading has primarily been concentrated within the stock market, aiding in economic and financial decisions. Yet, the unique opportunities presented by the cryptocurrency market, noted for its on-chain data's transparency and the critical influence of off-chain signals like news, remain largely untapped by LLMs. This work aims to bridge… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  36. arXiv:2407.09427  [pdf, other

    astro-ph.SR astro-ph.GA cs.LG

    Flow-Based Generative Emulation of Grids of Stellar Evolutionary Models

    Authors: Marc Hon, Yaguang Li, Joel Ong

    Abstract: We present a flow-based generative approach to emulate grids of stellar evolutionary models. By interpreting the input parameters and output properties of these models as multi-dimensional probability distributions, we train conditional normalizing flows to learn and predict the complex relationships between grid inputs and outputs in the form of conditional joint distributions. Leveraging the exp… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 27 pages, 18 figures. Accepted for publication in ApJ. Code, animation, and interactive visualizations are available at https://github.com/mtyhon/modelflows/. Table 4 is also available as ancillary file attached to this submission

  37. arXiv:2407.09251  [pdf, other

    cs.LG cs.AI eess.SP

    Deep Adversarial Defense Against Multilevel-Lp Attacks

    Authors: Ren Wang, Yuxuan Li, Alfred Hero

    Abstract: Deep learning models have shown considerable vulnerability to adversarial attacks, particularly as attacker strategies become more sophisticated. While traditional adversarial training (AT) techniques offer some resilience, they often focus on defending against a single type of attack, e.g., the $\ell_\infty$-norm attack, which can fail for other types. This paper introduces a computationally effi… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  38. arXiv:2407.09164  [pdf, other

    cs.CR cs.AI

    TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs

    Authors: Yuchen Yang, Hongwei Yao, Bingrun Yang, Yiling He, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren

    Abstract: Recently, code-oriented large language models (Code LLMs) have been widely and successfully used to simplify and facilitate code programming. With these tools, developers can easily generate desired complete functional codes based on incomplete code and natural language prompts. However, a few pioneering works revealed that these Code LLMs are also vulnerable, e.g., against backdoor and adversaria… ▽ More

    Submitted 15 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  39. arXiv:2407.08850  [pdf, other

    cs.HC cs.AI

    UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset

    Authors: Peitong Duan, Chin-yi Chen, Gang Li, Bjoern Hartmann, Yang Li

    Abstract: Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that aut… ▽ More

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  40. arXiv:2407.08801  [pdf, other

    cs.CV

    DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

    Authors: Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu, Meili Wang, Lizhuang Ma, Jian Chang, Jian Jun Zhang

    Abstract: Recent point cloud understanding research suffers from performance drops on unseen data, due to the distribution shifts across different domains. While recent studies use Domain Generalization (DG) techniques to mitigate this by learning domain-invariant features, most are designed for a single task and neglect the potential of testing data. Despite In-Context Learning (ICL) showcasing multi-task… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  41. arXiv:2407.08713  [pdf, other

    cs.CL cs.AI

    GTA: A Benchmark for General Tool Agents

    Authors: Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, Xinyi Le

    Abstract: Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, fa… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Github repo: https://github.com/open-compass/GTA

  42. arXiv:2407.08683  [pdf, other

    cs.CV

    SEED-Story: Multimodal Long Story Generation with Large Language Model

    Authors: Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen

    Abstract: With the remarkable advancements in image generation and open-form text generation, the creation of interleaved image-text content has become an increasingly intriguing field. Multimodal story generation, characterized by producing narrative texts and vivid images in an interleaved manner, has emerged as a valuable and practical task with broad applications. However, this task poses significant ch… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Our models, codes and datasets are released in https://github.com/TencentARC/SEED-Story

  43. arXiv:2407.08634  [pdf, other

    cs.CV

    RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation

    Authors: Tao Jiang, Xinchen Xie, Yining Li

    Abstract: Whole-body pose estimation is a challenging task that requires simultaneous prediction of keypoints for the body, hands, face, and feet. Whole-body pose estimation aims to predict fine-grained pose information for the human body, including the face, torso, hands, and feet, which plays an important role in the study of human-centric perception and generation and in various applications. In this wor… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  44. arXiv:2407.08583  [pdf, other

    cs.AI cs.CV cs.LG

    The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

    Authors: Zhen Qin, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng

    Abstract: The rapid development of large language models (LLMs) has been witnessed in recent years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from text to a broader spectrum of domains, attracting widespread attention due to the broader range of application scenarios. As LLMs and MLLMs rely on vast amounts of model parameters and data to achieve emergent capabilities, the impo… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Ongoing work. 31 pages. Related materials are continually maintained and available at https://github.com/modelscope/data-juicer/blob/main/docs/awesome_llm_data.md

  45. arXiv:2407.08515  [pdf, other

    cs.CV cs.AI

    15M Multimodal Facial Image-Text Dataset

    Authors: Dawei Dai, YuTang Li, YingGe Liu, Mingming Jia, Zhang YuanHui, Guoyin Wang

    Abstract: Currently, image-text-driven multi-modal deep learning models have demonstrated their outstanding potential in many fields. In practice, tasks centered around facial images have broad application prospects. This paper presents \textbf{FaceCaption-15M}, a large-scale, diverse, and high-quality dataset of facial images accompanied by their natural language descriptions (facial image-to-text). This d… ▽ More

    Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 15 pages, 8 figures

  46. arXiv:2407.08457  [pdf, other

    cs.CV

    Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending

    Authors: Delong Wu, Hao Zhu, Qi Zhang, You Li, Zhan Ma, Xun Cao

    Abstract: Implicit Neural Representation (INR) has become a popular method for representing visual signals (e.g., 2D images and 3D scenes), demonstrating promising results in various downstream applications. Given its potential as a medium for visual signals, exploring the development of a neural blending method that utilizes INRs is a natural progression. Neural blending involves merging two INRs to create… ▽ More

    Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: accepted by ECCV 2024

  47. arXiv:2407.08299  [pdf, other

    cs.SI eess.SY

    Evolving Network Modeling Driven by the Degree Increase and Decrease Mechanism

    Authors: Yuhan Li, Minyu Feng, Jürgen Kurths

    Abstract: Ever since the Barabási-Albert (BA) scale-free network has been proposed, network modeling has been studied intensively in light of the network growth and the preferential attachment (PA). However, numerous real systems are featured with a dynamic evolution including network reduction in addition to network growth. In this paper, we propose a novel mechanism for evolving networks from the perspect… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  48. arXiv:2407.08214  [pdf, other

    cs.LG cs.AI

    Towards stable training of parallel continual learning

    Authors: Li Yuepan, Fan Lyu, Yuyang Li, Wei Feng, Guangcan Liu, Fanhua Shang

    Abstract: Parallel Continual Learning (PCL) tasks investigate the training methods for continual learning with multi-source input, where data from different tasks are learned as they arrive. PCL offers high training efficiency and is well-suited for complex multi-source data systems, such as autonomous vehicles equipped with multiple sensors. However, at any time, multiple tasks need to be trained simultane… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  49. arXiv:2407.08136  [pdf, other

    cs.CV

    EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions

    Authors: Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma

    Abstract: The area of portrait image animation, propelled by audio input, has witnessed notable progress in the generation of lifelike and dynamic portraits. Conventional methods are limited to utilizing either audios or facial key points to drive images into videos, while they can yield satisfactory results, certain issues exist. For instance, methods driven solely by audios can be unstable at times due to… ▽ More

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  50. arXiv:2407.07959  [pdf, other

    cs.SE cs.AI

    Source Code Summarization in the Era of Large Language Models

    Authors: Weisong Sun, Yun Miao, Yuekang Li, Hongyu Zhang, Chunrong Fang, Yi Liu, Gelei Deng, Yang Liu, Zhenyu Chen

    Abstract: To support software developers in understanding and maintaining programs, various automatic (source) code summarization techniques have been proposed to generate a concise natural language summary (i.e., comment) for a given code snippet. Recently, the emergence of large language models (LLMs) has led to a great boost in the performance of code-related tasks. In this paper, we undertake a systemat… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Just accepted to the 47th International Conference on Software Engineering (ICSE 2025)

    MSC Class: 68-04 ACM Class: D.2.3; I.2.7