Skip to main content

Showing 1–50 of 375 results for author: Song, D

  1. arXiv:2407.13698  [pdf, other

    q-fin.ST cs.CE cs.LG

    International Trade Flow Prediction with Bilateral Trade Provisions

    Authors: Zijie Pan, Stepan Gordeev, Jiahui Zhao, Ziyi Meng, Caiwen Ding, Sandro Steinbach, Dongjin Song

    Abstract: This paper presents a novel methodology for predicting international bilateral trade flows, emphasizing the growing importance of Preferential Trade Agreements (PTAs) in the global trade landscape. Acknowledging the limitations of traditional models like the Gravity Model of Trade, this study introduces a two-stage approach combining explainable machine learning and factorization models. The first… ▽ More

    Submitted 23 June, 2024; originally announced July 2024.

  2. arXiv:2407.12784  [pdf, other

    cs.LG cs.CR cs.IR

    AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

    Authors: Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li

    Abstract: LLM agents have demonstrated remarkable performance across various applications, primarily due to their advanced capabilities in reasoning, utilizing external knowledge and tools, calling APIs, and executing actions to interact with environments. Current agents typically utilize a memory module or a retrieval-augmented generation (RAG) mechanism, retrieving past knowledge and instances with simila… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 22 pages, 13 figures, 7 tables

  3. arXiv:2407.12504  [pdf, other

    cs.CL

    Case2Code: Learning Inductive Reasoning with Synthetic Data

    Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

    Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  4. arXiv:2407.04929  [pdf, other

    cs.RO

    Toward Precise Robotic Weed Flaming Using a Mobile Manipulator with a Flamethrower

    Authors: Di Wang, Chengsong Hu, Shuangyu Xie, Joe Johnson, Hojun Ji, Yingtao Jiang, Muthukumar Bagavathiannan, Dezhen Song

    Abstract: Robotic weed flaming is a new and environmentally friendly approach to weed removal in the agricultural field. Using a mobile manipulator equipped with a flamethrower, we design a new system and algorithm to enable effective weed flaming, which requires robotic manipulation with a soft and deformable end effector, as the thermal coverage of the flame is affected by dynamic or unknown environmental… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: IROS 2024

  5. arXiv:2407.04787  [pdf, other

    cs.CL cs.AI cs.LG

    Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning

    Authors: Eric Pasewark, Kyle Montgomery, Kefei Duan, Dawn Song, Chenguang Wang

    Abstract: We present a new method for large language models to solve compositional tasks. Although they have shown strong performance on traditional language understanding tasks, large language models struggle to solve compositional tasks, where the solution depends on solving smaller instances of the same problem. We propose a natural approach to solve compositional tasks recursively. Our method, Re-Tuning… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024

  6. arXiv:2407.03374  [pdf

    cs.AI cs.SE eess.SP eess.SY

    An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

    Authors: Laifa Tao, Shangyu Li, Haifei Liu, Qixuan Huang, Liang Ma, Guoao Ning, Yiling Chen, Yunlong Wu, Bin Li, Weiwei Zhang, Zhengduo Zhao, Wenchao Zhan, Wenyan Cao, Chao Wang, Hongmei Liu, Jian Ma, Mingliang Suo, Yujie Cheng, Yu Ding, Dengwei Song, Chen Lu

    Abstract: Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Larg… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2407.00717  [pdf, other

    cs.LG cs.AI eess.SY

    Learning System Dynamics without Forgetting

    Authors: Xikun Zhang, Dongjin Song, Yushan Jiang, Yixin Chen, Dacheng Tao

    Abstract: Predicting the trajectories of systems with unknown dynamics (\textit{i.e.} the governing rules) is crucial in various research fields, including physics and biology. This challenge has gathered significant attention from diverse communities. Most existing works focus on learning fixed system dynamics within one single system. However, real-world applications often involve multiple systems with di… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  8. arXiv:2406.18900  [pdf, other

    cs.CY cs.AI

    The Rise of Artificial Intelligence in Educational Measurement: Opportunities and Ethical Challenges

    Authors: Okan Bulut, Maggie Beiting-Parrish, Jodi M. Casabianca, Sharon C. Slater, Hong Jiao, Dan Song, Christopher M. Ormerod, Deborah Gbemisola Fabiyi, Rodica Ivan, Cole Walsh, Oscar Rios, Joshua Wilson, Seyma N. Yildirim-Erbasli, Tarid Wongvorachan, Joyce Xinle Liu, Bin Tan, Polina Morilova

    Abstract: The integration of artificial intelligence (AI) in educational measurement has revolutionized assessment methods, enabling automated scoring, rapid content analysis, and personalized feedback through machine learning and natural language processing. These advancements provide timely, consistent feedback and valuable insights into student performance, thereby enhancing the assessment experience. Ho… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 59 pages, 3 figures, a joint work of the Special Interest Group on Artificial Intelligence in Measurement and Education (AIME) from the National Council of Measurement in Education (NCME)

  9. arXiv:2406.17864  [pdf, other

    cs.CY cs.AI

    AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies

    Authors: Yi Zeng, Kevin Klyman, Andy Zhou, Yu Yang, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

    Abstract: We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation. We identify 314 unique risk categories organized into a four-tiered taxonomy. At the highest level, this taxonomy encompasses Sys… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  10. arXiv:2406.17092  [pdf, other

    cs.CR cs.AI

    BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

    Authors: Yi Zeng, Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia

    Abstract: Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions. The high dimensionality of potential triggers in the token space and the diverse range of malicious behaviors make this a critical challenge. We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relati… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  11. arXiv:2406.13951  [pdf, other

    cs.CV

    Towards the in-situ Trunk Identification and Length Measurement of Sea Cucumbers via Bézier Curve Modelling

    Authors: Shuaixin Liu, Kunqian Li, Yilin Ding, Kuangwei Xu, Qianli Jiang, Q. M. Jonathan Wu, Dalei Song

    Abstract: We introduce a novel vision-based framework for in-situ trunk identification and length measurement of sea cucumbers, which plays a crucial role in the monitoring of marine ranching resources and mechanized harvesting. To model sea cucumber trunk curves with varying degrees of bending, we utilize the parametric Bézier curve due to its computational simplicity, stability, and extensive range of tra… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  12. arXiv:2406.11011  [pdf, other

    cs.LG cs.CL stat.ML

    Data Shapley in One Training Run

    Authors: Jiachen T. Wang, Prateek Mittal, Dawn Song, Ruoxi Jia

    Abstract: Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts. However, existing approaches require re-training models on different data subsets, which is computationally intensive, foreclosing their application to large-scale models. Furthermore, they produce the same attribution score for any models produced by running the learning algorithm, m… ▽ More

    Submitted 29 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  13. arXiv:2406.09187  [pdf, other

    cs.LG

    GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

    Authors: Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li

    Abstract: The rapid advancement of large language models (LLMs) has catalyzed the deployment of LLM-powered agents across numerous applications, raising new concerns regarding their safety and trustworthiness. Existing methods for enhancing the safety of LLMs are not directly transferable to LLM-powered agents due to their diverse objectives and output modalities. In this paper, we propose GuardAgent, the f… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  14. arXiv:2406.08731  [pdf, other

    cs.SE

    Where Do Large Language Models Fail When Generating Code?

    Authors: Zhijie Wang, Zijie Zhou, Da Song, Yuheng Huang, Shengmai Chen, Lei Ma, Tianyi Zhang

    Abstract: Large Language Models (LLMs) have shown great potential in code generation. However, current LLMs still cannot reliably generate correct code. Moreover, it is unclear what kinds of code generation errors LLMs can make. To address this, we conducted an empirical study to analyze incorrect code snippets generated by six popular LLMs on the HumanEval dataset. We analyzed these errors alongside two di… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Extended from our MAPS 2023 paper. Our data is available at https://llm-code-errors.cs.purdue.edu

  15. arXiv:2406.04531  [pdf, other

    cs.SE

    TESTEVAL: Benchmarking Large Language Models for Test Case Generation

    Authors: Wenhan Wang, Chenyuan Yang, Zhijie Wang, Yuheng Huang, Zhaoyang Chu, Da Song, Lingming Zhang, An Ran Chen, Lei Ma

    Abstract: Testing plays a crucial role in the software development cycle, enabling the detection of bugs, vulnerabilities, and other undesirable behaviors. To perform software testing, testers need to write code snippets that execute the program under test. Recently, researchers have recognized the potential of large language models (LLMs) in software testing. However, there remains a lack of fair compariso… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  16. arXiv:2406.01879  [pdf, other

    cs.CL

    Bi-DCSpell: A Bi-directional Detector-Corrector Interactive Framework for Chinese Spelling Check

    Authors: Haiming Wu, Hanqing Zhang, Richeng Xuan, Dawei Song

    Abstract: Chinese Spelling Check (CSC) aims to detect and correct potentially misspelled characters in Chinese sentences. Naturally, it involves the detection and correction subtasks, which interact with each other dynamically. Such interactions are bi-directional, i.e., the detection result would help reduce the risk of over-correction and under-correction while the knowledge learnt from correction would h… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures

  17. arXiv:2405.19524  [pdf, other

    cs.CR cs.AI

    AI Risk Management Should Incorporate Both Safety and Security

    Authors: Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

    Abstract: The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this pape… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  18. arXiv:2405.19265  [pdf, other

    cs.CL

    AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data

    Authors: Zifan Song, Yudong Wang, Wenwei Zhang, Kuikun Liu, Chengqi Lyu, Demin Song, Qipeng Guo, Hang Yan, Dahua Lin, Kai Chen, Cairong Zhao

    Abstract: Open-source Large Language Models (LLMs) and their specialized variants, particularly Code LLMs, have recently delivered impressive performance. However, previous Code LLMs are typically fine-tuned on single-source data with limited quality and diversity, which may insufficiently elicit the potential of pre-trained Code LLMs. In this paper, we present AlchemistCoder, a series of Code LLMs with enh… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint with 20 pages and 20 figures. Source code and models at https://github.com/InternLM/AlchemistCoder

  19. arXiv:2405.16783  [pdf, other

    cs.CR cs.AI cs.LG

    TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models

    Authors: Yuzhou. Nie, Yanting. Wang, Jinyuan. Jia, Michael J. De Lucia, Nathaniel D. Bastian, Wenbo. Guo, Dawn. Song

    Abstract: One key challenge in backdoor attacks against large foundation models is the resource limits. Backdoor attacks usually require retraining the target model, which is impractical for very large foundation models. Existing backdoor attacks are mainly designed for supervised classifiers or small foundation models (e.g., BERT). None of these attacks has successfully compromised a very large foundation… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  20. arXiv:2405.07260  [pdf

    cs.LG cs.AI eess.SP

    A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion Recognition

    Authors: Xiang Li, Jian Song, Zhigang Zhao, Chunxiao Wang, Dawei Song, Bin Hu

    Abstract: This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SICLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentiallyn improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  21. arXiv:2404.18532  [pdf, other

    cs.CL cs.AI cs.CV cs.LG

    MileBench: Benchmarking MLLMs in Long Context

    Authors: Dingjie Song, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan, Benyou Wang

    Abstract: Despite the advancements and impressive performance of Multimodal Large Language Models (MLLMs) on benchmarks, their effectiveness in real-world, long-context, and multi-image tasks is unclear due to the benchmarks' limited scope. Existing benchmarks often focus on single-image and short-text samples, and when assessing multi-image tasks, they either limit the image count or focus on specific task… ▽ More

    Submitted 15 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 31 pages, 13 figures, 14 tables; We add results of GPT-4o in this version

  22. arXiv:2404.14897  [pdf, other

    cs.CL cs.AI

    Beyond the Speculative Game: A Survey of Speculative Execution in Large Language Models

    Authors: Chen Zhang, Zhuorui Liu, Dawei Song

    Abstract: With the increasingly giant scales of (causal) large language models (LLMs), the inference efficiency comes as one of the core concerns along the improved performance. In contrast to the memory footprint, the latency bottleneck seems to be of greater importance as there can be billions of requests to a LLM (e.g., GPT-4) per day. The bottleneck is mainly due to the autoregressive innateness of LLMs… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures, 1 table, rejected from IJCAI 2024, revision in progress

  23. arXiv:2404.13161  [pdf, other

    cs.CR cs.LG

    CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

    Authors: Manish Bhatt, Sahana Chennabasappa, Yue Li, Cyrus Nikolaidis, Daniel Song, Shengye Wan, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil, David Molnar, Spencer Whitman, Joshua Saxe

    Abstract: Large language models (LLMs) introduce new security risks, but there are few comprehensive evaluation suites to measure and reduce these risks. We present BenchmarkName, a novel benchmark to quantify LLM security risks and capabilities. We introduce two new areas for testing: prompt injection and code interpreter abuse. We evaluated multiple state-of-the-art (SOTA) LLMs, including GPT-4, Mistral,… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  24. arXiv:2404.08517  [pdf, other

    cs.SE cs.AI cs.CL cs.CR cs.LG

    Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward

    Authors: Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma

    Abstract: While Large Language Models (LLMs) have seen widespread applications across numerous fields, their limited interpretability poses concerns regarding their safe operations from multiple aspects, e.g., truthfulness, robustness, and fairness. Recent research has started developing quality assurance methods for LLMs, introducing techniques such as offline detector-based or uncertainty estimation metho… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

  25. arXiv:2404.03187  [pdf, other

    cs.CV

    AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales

    Authors: Tianrui Guan, Ruiqi Xian, Xijun Wang, Xiyang Wu, Mohamed Elnoor, Daeun Song, Dinesh Manocha

    Abstract: We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps. AGL-NET tackles two critical challenges: bridging the representation gap between image and points modalities for robust feature matching, and handling inherent scale discrepancies between global view and local view. To address these challenges, AGL-NET leverages a unified network… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

  26. arXiv:2404.02935  [pdf, other

    cs.CL cs.AI cs.LG

    KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking

    Authors: Jiawei Zhang, Chejian Xu, Yu Gai, Freddy Lecue, Dawn Song, Bo Li

    Abstract: This paper introduces KnowHalu, a novel approach for detecting hallucinations in text generated by large language models (LLMs), utilizing step-wise reasoning, multi-formulation query, multi-form knowledge for factual checking, and fusion-based detection mechanism. As LLMs are increasingly applied across various domains, ensuring that their outputs are not hallucinated is critical. Recognizing the… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

  27. arXiv:2404.00210  [pdf, other

    cs.RO

    VLM-Social-Nav: Socially Aware Robot Navigation through Scoring using Vision-Language Models

    Authors: Daeun Song, Jing Liang, Amirreza Payandeh, Xuesu Xiao, Dinesh Manocha

    Abstract: We propose VLM-Social-Nav, a novel Vision-Language Model (VLM) based navigation approach to compute a robot's motion in human-centered environments. Our goal is to make real-time decisions on robot actions that are socially compliant with human expectations. We utilize a perception model to detect important social entities and prompt a VLM to generate guidance for socially compliant robot behavior… ▽ More

    Submitted 7 July, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

  28. arXiv:2403.17297  [pdf, other

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang , et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  29. arXiv:2403.15447  [pdf, other

    cs.CL cs.AI

    Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

    Authors: Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li

    Abstract: Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inferences. While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first, thorough evaluation o… ▽ More

    Submitted 4 June, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted to ICML'24

  30. Foundation Models for Time Series Analysis: A Tutorial and Survey

    Authors: Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, Qingsong Wen

    Abstract: Time series analysis stands as a focal point within the data mining community, serving as a cornerstone for extracting valuable insights crucial to a myriad of real-world applications. Recent advances in Foundation Models (FMs) have fundamentally reshaped the paradigm of model design for time series analysis, boosting various downstream tasks in practice. These innovative approaches often leverage… ▽ More

    Submitted 18 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'24)

  31. arXiv:2403.13031  [pdf, other

    cs.CR cs.AI cs.CL cs.LG

    RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

    Authors: Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li

    Abstract: Recent advancements in Large Language Models (LLMs) have showcased remarkable capabilities across various tasks in different domains. However, the emergence of biases and the potential for generating harmful content in LLMs, particularly under malicious inputs, pose significant challenges. Current mitigation strategies, while effective, are not resilient under adversarial attacks. This paper intro… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  32. arXiv:2403.10499  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study

    Authors: Chenguang Wang, Ruoxi Jia, Xin Liu, Dawn Song

    Abstract: Pre-training image representations from the raw text about images enables zero-shot vision transfer to downstream tasks. Through pre-training on millions of samples collected from the internet, multimodal foundation models, such as CLIP, produce state-of-the-art zero-shot results that often reach competitiveness with fully supervised methods without the need for task-specific training. Besides the… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

  33. arXiv:2403.09953  [pdf, other

    cs.LG

    Online GNN Evaluation Under Test-time Graph Distribution Shifts

    Authors: Xin Zheng, Dongjin Song, Qingsong Wen, Bo Du, Shirui Pan

    Abstract: Evaluating the performance of a well-trained GNN model on real-world graphs is a pivotal step for reliable GNN online deployment and serving. Due to a lack of test node labels and unknown potential training-test graph data distribution shifts, conventional model evaluation encounters limitations in calculating performance metrics (e.g., test error) and measuring graph data-level discrepancies, par… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR-2024

  34. arXiv:2403.09900  [pdf, other

    cs.RO

    DTG : Diffusion-based Trajectory Generation for Mapless Global Navigation

    Authors: Jing Liang, Amirreza Payandeh, Daeun Song, Xuesu Xiao, Dinesh Manocha

    Abstract: We present a novel end-to-end diffusion-based trajectory generation method, DTG, for mapless global navigation in challenging outdoor scenarios with occlusions and unstructured off-road features like grass, buildings, bushes, etc. Given a distant goal, our approach computes a trajectory that satisfies the following goals: (1) minimize the travel distance to the goal; (2) maximize the traversabilit… ▽ More

    Submitted 24 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 10 pages

  35. arXiv:2403.08453  [pdf, other

    cs.CV

    Better Fit: Accommodate Variations in Clothing Types for Virtual Try-on

    Authors: Xuanpu Zhang, Dan Song, Pengxin Zhan, Qingguo Chen, Kuilong Liu, Anan Liu

    Abstract: Image-based virtual try-on aims to transfer target in-shop clothing to a dressed model image, the objectives of which are totally taking off original clothing while preserving the contents outside of the try-on area, naturally wearing target clothing and correctly inpainting the gap between target clothing and original clothing. Tremendous efforts have been made to facilitate this popular research… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  36. arXiv:2403.07918  [pdf, other

    cs.CY cs.AI cs.LG

    On the Societal Impact of Open Foundation Models

    Authors: Sayash Kapoor, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Aspen Hopkins, Kevin Bankston, Stella Biderman, Miranda Bogen, Rumman Chowdhury, Alex Engler, Peter Henderson, Yacine Jernite, Seth Lazar, Stefano Maffulli, Alondra Nelson, Joelle Pineau, Aviya Skowron, Dawn Song, Victor Storchan, Daniel Zhang, Daniel E. Ho, Percy Liang, Arvind Narayanan

    Abstract: Foundation models are powerful technologies: how they are released publicly directly shapes their societal impact. In this position paper, we focus on open foundation models, defined here as those with broadly available model weights (e.g. Llama 2, Stable Diffusion XL). We identify five distinctive properties (e.g. greater customizability, poor monitoring) of open foundation models that lead to bo… ▽ More

    Submitted 27 February, 2024; originally announced March 2024.

  37. arXiv:2403.05798  [pdf, other

    cs.LG

    $\textbf{S}^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting

    Authors: Zijie Pan, Yushan Jiang, Sahil Garg, Anderson Schneider, Yuriy Nevmyvaka, Dongjin Song

    Abstract: Recently, there has been a growing interest in leveraging pre-trained large language models (LLMs) for various time series applications. However, the semantic space of LLMs, established through the pre-training, is still underexplored and may help yield more distinctive and informative representations to facilitate time series forecasting. To this end, we propose Semantic Space Informed Prompt lea… ▽ More

    Submitted 7 July, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  38. arXiv:2403.05796  [pdf, other

    cs.CV

    Weakly Supervised Change Detection via Knowledge Distillation and Multiscale Sigmoid Inference

    Authors: Binghao Lu, Caiwen Ding, Jinbo Bi, Dongjin Song

    Abstract: Change detection, which aims to detect spatial changes from a pair of multi-temporal images due to natural or man-made causes, has been widely applied in remote sensing, disaster management, urban management, etc. Most existing change detection approaches, however, are fully supervised and require labor-intensive pixel-level labels. To address this, we develop a novel weakly supervised change dete… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: code is available: https://github.com/BinghaoLu/KD-MSI

  39. PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement

    Authors: Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, Tianyi Zhang

    Abstract: The recent advancements in Generative AI have significantly advanced the field of text-to-image generation. The state-of-the-art text-to-image model, Stable Diffusion, is now capable of synthesizing high-quality images with a strong sense of aesthetics. Crafting text prompts that align with the model's interpretation and the user's intent thus becomes crucial. However, prompting remains challengin… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: To appear in the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  40. arXiv:2403.01999  [pdf, other

    cs.CL

    LLM-Oriented Retrieval Tuner

    Authors: Si Sun, Hanqing Zhang, Zhiyuan Liu, Jie Bao, Dawei Song

    Abstract: Dense Retrieval (DR) is now considered as a promising tool to enhance the memorization capacity of Large Language Models (LLM) such as GPT3 and GPT-4 by incorporating external memories. However, due to the paradigm discrepancy between text generation of LLM and DR, it is still an open challenge to integrate the retrieval and generation tasks in a shared LLM. In this paper, we propose an efficient… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 16 pages, 8 figures, 5 tables

  41. arXiv:2402.13013  [pdf, other

    cs.CL

    Code Needs Comments: Enhancing Code LLMs with Comment Augmentation

    Authors: Demin Song, Honglin Guo, Yunhua Zhou, Shuhao Xing, Yudong Wang, Zifan Song, Wenwei Zhang, Qipeng Guo, Hang Yan, Xipeng Qiu, Dahua Lin

    Abstract: The programming skill is one crucial ability for Large Language Models (LLMs), necessitating a deep understanding of programming languages (PLs) and their correlation with natural languages (NLs). We examine the impact of pre-training data on code-focused LLMs' performance by assessing the comment density as a measure of PL-NL alignment. Given the scarcity of code-comment aligned data in pre-train… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  42. arXiv:2402.12722  [pdf, other

    cs.LG

    Structural Knowledge Informed Continual Multivariate Time Series Forecasting

    Authors: Zijie Pan, Yushan Jiang, Dongjin Song, Sahil Garg, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka

    Abstract: Recent studies in multivariate time series (MTS) forecasting reveal that explicitly modeling the hidden dependencies among different time series can yield promising forecasting performance and reliable explanations. However, modeling variable dependencies remains underexplored when MTS is continuously accumulated under different regimes (stages). Due to the potential distribution and dependency di… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  43. arXiv:2402.12590  [pdf, other

    cs.CL cs.CY

    Evolving AI Collectives to Enhance Human Diversity and Enable Self-Regulation

    Authors: Shiyang Lai, Yujin Potter, Junsol Kim, Richard Zhuang, Dawn Song, James Evans

    Abstract: Large language model behavior is shaped by the language of those with whom they interact. This capacity and their increasing prevalence online portend that they will intentionally or unintentionally "program" one another and form emergent AI subjectivities, relationships, and collectives. Here, we call upon the research community to investigate these "societies" of interacting artificial intellige… ▽ More

    Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  44. arXiv:2402.11565  [pdf, other

    cs.LG cs.AI

    Continual Learning on Graphs: Challenges, Solutions, and Opportunities

    Authors: Xikun Zhang, Dongjin Song, Dacheng Tao

    Abstract: Continual learning on graph data has recently attracted paramount attention for its aim to resolve the catastrophic forgetting problem on existing tasks while adapting the sequentially updated model to newly emerged graph tasks. While there have been efforts to summarize progress on continual learning research over Euclidean data, e.g., images and texts, a systematic review of progress in continua… ▽ More

    Submitted 18 February, 2024; originally announced February 2024.

  45. arXiv:2402.03182  [pdf, other

    cs.LG

    Empowering Time Series Analysis with Large Language Models: A Survey

    Authors: Yushan Jiang, Zijie Pan, Xikun Zhang, Sahil Garg, Anderson Schneider, Yuriy Nevmyvaka, Dongjin Song

    Abstract: Recently, remarkable progress has been made over large language models (LLMs), demonstrating their unprecedented capability in varieties of natural language tasks. However, completely training a large general-purpose model from the scratch is challenging for time series analysis, due to the large volumes and varieties of time series data, as well as the non-stationarity that leads to concept drift… ▽ More

    Submitted 5 February, 2024; originally announced February 2024.

  46. arXiv:2402.03181  [pdf, other

    cs.AI cs.CL cs.IR

    C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

    Authors: Mintong Kang, Nezihe Merve Gürel, Ning Yu, Dawn Song, Bo Li

    Abstract: Despite the impressive capabilities of large language models (LLMs) across diverse applications, they still suffer from trustworthiness issues, such as hallucinations and misalignments. Retrieval-augmented language models (RAG) have been proposed to enhance the credibility of generations by grounding external knowledge, but the theoretical understandings of their generation risks remains unexplore… ▽ More

    Submitted 4 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024

  47. arXiv:2401.18057  [pdf, other

    cs.LG

    Rank Supervised Contrastive Learning for Time Series Classification

    Authors: Qianying Ren, Dongsheng Luo, Dongjin Song

    Abstract: Recently, various contrastive learning techniques have been developed to categorize time series data and exhibit promising performance. A general paradigm is to utilize appropriate augmentations and construct feasible positive samples such that the encoder can yield robust and discriminative representations by mapping similar data points closer together in the feature space while pushing dissimila… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

  48. Topology-aware Embedding Memory for Continual Learning on Expanding Networks

    Authors: Xikun Zhang, Dongjin Song, Yixin Chen, Dacheng Tao

    Abstract: Memory replay based techniques have shown great success for continual learning with incrementally accumulated Euclidean data. Directly applying them to continually expanding networks, however, leads to the potential memory explosion problem due to the need to buffer representative nodes and their associated topological neighborhood structures. To this end, we systematically analyze the key challen… ▽ More

    Submitted 30 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: This paper has been accepted by KDD 2024

  49. arXiv:2401.12520  [pdf, other

    cs.CL cs.IR cs.LG

    Key Information Retrieval to Classify the Unstructured Data Content of Preferential Trade Agreements

    Authors: Jiahui Zhao, Ziyi Meng, Stepan Gordeev, Zijie Pan, Dongjin Song, Sandro Steinbach, Caiwen Ding

    Abstract: With the rapid proliferation of textual data, predicting long texts has emerged as a significant challenge in the domain of natural language processing. Traditional text prediction methods encounter substantial difficulties when grappling with long texts, primarily due to the presence of redundant and irrelevant information, which impedes the model's capacity to capture pivotal insights from the t… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: AI4TS Workshop@AAAI 2024 accepted publication

  50. arXiv:2401.12292  [pdf, other

    cs.CL cs.AI

    GRATH: Gradual Self-Truthifying for Large Language Models

    Authors: Weixin Chen, Dawn Song, Bo Li

    Abstract: Truthfulness is paramount for large language models (LLMs) as they are increasingly deployed in real-world applications. However, existing LLMs still struggle with generating truthful content, as evidenced by their modest performance on benchmarks like TruthfulQA. To address this issue, we propose GRAdual self-truTHifying (GRATH), a novel post-processing method to enhance truthfulness of LLMs. GRA… ▽ More

    Submitted 31 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.