Skip to main content

Showing 1–50 of 109 results for author: Mu, Y

  1. arXiv:2407.13164  [pdf, other

    cs.CL cs.AI

    Translate-and-Revise: Boosting Large Language Models for Constrained Translation

    Authors: Pengcheng Huang, Yongyu Mu, Yuzhang Wu, Bei Li, Chunyang Xiao, Tong Xiao, Jingbo Zhu

    Abstract: Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prom… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 16 pages

  2. arXiv:2407.07580  [pdf, other

    cs.CV

    InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior

    Authors: Chenguo Lin, Yuchen Lin, Panwang Pan, Xuanyang Zhang, Yadong Mu

    Abstract: Comprehending natural language instructions is a charming property for both 2D and 3D layout synthesis systems. Existing methods implicitly model object joint distributions and express object relations, hindering generation's controllability. We introduce InstructLayout, a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity… ▽ More

    Submitted 10 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: This paper is an extension of ICLR 2024 "InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior". arXiv admin note: substantial text overlap with arXiv:2402.04717

  3. arXiv:2407.04237  [pdf, other

    cs.CV cs.GR

    GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction

    Authors: Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofeng Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng

    Abstract: We present GSD, a diffusion model approach based on Gaussian Splatting (GS) representation for 3D object reconstruction from a single view. Prior works suffer from inconsistent 3D geometry or mediocre rendering quality due to improper representations. We take a step towards resolving these shortcomings by utilizing the recent state-of-the-art 3D explicit representation, Gaussian Splatting, and an… ▽ More

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted for ECCV 2024

  4. arXiv:2407.00632  [pdf, other

    cs.RO cs.CL cs.CV cs.MA

    CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations

    Authors: Pengying Wu, Yao Mu, Kangjie Zhou, Ji Ma, Junting Chen, Chang Liu

    Abstract: Visual navigation tasks are critical for household service robots. As these tasks become increasingly complex, effective communication and collaboration among multiple robots become imperative to ensure successful completion. In recent years, large language models (LLMs) have exhibited remarkable comprehension and planning abilities in the context of embodied agents. However, their application in… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted to the RSS 2024 Workshop: GROUND

  5. arXiv:2406.17795  [pdf, other

    cs.CV cs.GR

    RACon: Retrieval-Augmented Simulated Character Locomotion Control

    Authors: Yuxuan Mu, Shihao Zou, Kangning Yin, Zheng Tian, Li Cheng, Weinan Zhang, Jun Wang

    Abstract: In computer animation, driving a simulated character with lifelike motion is challenging. Current generative models, though able to generalize to diverse motions, often pose challenges to the responsiveness of end-user control. To address these issues, we introduce RACon: Retrieval-Augmented Simulated Character Locomotion Control. Our end-to-end hierarchical reinforcement learning method utilizes… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted in ICME2024 for oral presentation

  6. arXiv:2406.15178  [pdf, other

    cs.CL

    Hybrid Alignment Training for Large Language Models

    Authors: Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu

    Abstract: Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed based on two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and the LLMs cannot guara… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: accepted by ACL (Findings) 2024

  7. arXiv:2406.12459  [pdf, other

    cs.CV

    HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

    Authors: Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu

    Abstract: Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In part… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.09953  [pdf, other

    cs.RO cs.AI

    DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning

    Authors: Zeyu Gao, Yao Mu, Jinye Qu, Mengkang Hu, Lingyue Guo, Ping Luo, Yanfeng Lu

    Abstract: Dual-arm robots offer enhanced versatility and efficiency over single-arm counterparts by enabling concurrent manipulation of multiple objects or cooperative execution of tasks using both arms. However, effectively coordinating the two arms for complex long-horizon tasks remains a significant challenge. Existing task planning methods predominantly focus on single-arm robots or rely on predefined b… ▽ More

    Submitted 30 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 46 pages, 13 figures

  9. arXiv:2406.09899  [pdf, other

    cs.LG cs.AI

    Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem

    Authors: Zhentao Tan, Yadong Mu

    Abstract: Recently various optimization problems, such as Mixed Integer Linear Programming Problems (MILPs), have undergone comprehensive investigation, leveraging the capabilities of machine learning. This work focuses on learning-based solutions for efficiently solving the Quadratic Assignment Problem (QAPs), which stands as a formidable challenge in combinatorial optimization. While many instances of sim… ▽ More

    Submitted 19 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  10. arXiv:2405.20795  [pdf, other

    cs.CV cs.AI

    InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding

    Authors: Huaxiang Zhang, Yaojia Mu, Guo-Niu Zhu, Zhongxue Gan

    Abstract: Accurate visual understanding is imperative for advancing autonomous systems and intelligent robots. Despite the powerful capabilities of vision-language models (VLMs) in processing complex visual scenes, precisely recognizing obscured or ambiguously presented visual elements remains challenging. To tackle such issues, this paper proposes InsightSee, a multi-agent framework to enhance VLMs' interp… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  11. arXiv:2405.14677  [pdf, other

    cs.CV cs.LG

    RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

    Authors: Zhicheng Sun, Zhenhao Yang, Yang Jin, Haozhe Chi, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Di Zhang, Yang Song, Kun Gai, Yadong Mu

    Abstract: Customizing diffusion models to generate identity-preserving images from user-provided reference images is an intriguing new problem. The prevalent approaches typically require training on extensive domain-specific images to achieve identity preservation, which lacks flexibility across different use cases. To address this issue, we exploit classifier guidance, a training-free technique that steers… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  12. arXiv:2405.09811  [pdf, other

    cs.GT

    A Payoff-Based Policy Gradient Method in Stochastic Games with Long-Run Average Payoffs

    Authors: Junyue Zhang, Yifen Mu

    Abstract: Despite the significant potential for various applications, stochastic games with long-run average payoffs have received limited scholarly attention, particularly concerning the development of learning algorithms for them due to the challenges of mathematical analysis. In this paper, we study the stochastic games with long-run average payoffs and present an equivalent formulation for individual pa… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  13. arXiv:2405.07162  [pdf, other

    cs.RO cs.AI

    Learning Reward for Robot Skills Using Large Language Models via Self-Alignment

    Authors: Yuwei Zeng, Yao Mu, Lin Shao

    Abstract: Learning reward functions remains the bottleneck to equip a robot with a broad repertoire of skills. Large Language Models (LLM) contain valuable task-related knowledge that can potentially aid in the learning of reward functions. However, the proposed reward function can be imprecise, thus ineffective which requires to be further grounded with environment information. We proposed a method to lear… ▽ More

    Submitted 15 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  14. arXiv:2405.00611  [pdf, other

    cs.CL

    Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling

    Authors: Yida Mu, Peizhen Bai, Kalina Bontcheva, Xingyi Song

    Abstract: Large language models (LLMs) with their strong zero-shot topic extraction capabilities offer an alternative to probabilistic topic modelling and closed-set topic classification approaches. As zero-shot topic extractors, LLMs are expected to understand human instructions to generate relevant and non-hallucinated topics based on the given documents. However, LLM-based topic modelling approaches ofte… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  15. arXiv:2404.16423  [pdf, other

    cs.CV cs.RO

    Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images

    Authors: Hongyu Yan, Yadong Mu

    Abstract: Image-guided object assembly represents a burgeoning research topic in computer vision. This paper introduces a novel task: translating multi-view images of a structural 3D model (for example, one constructed with building blocks drawn from a 3D-object library) into a detailed sequence of assembly instructions executable by a robotic arm. Fed with multi-view images of the target 3D model for repli… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  16. arXiv:2404.11375  [pdf, other

    cs.CV cs.MM

    Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion

    Authors: Xinghan Wang, Zixi Kang, Yadong Mu

    Abstract: Human motion understanding is a fundamental task with diverse practical applications, facilitated by the availability of large-scale motion capture datasets. Recent studies focus on text-motion tasks, such as text-based motion generation, editing and question answering. In this study, we introduce the novel task of text-based human motion grounding (THMG), aimed at precisely localizing temporal se… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  17. arXiv:2404.09392  [pdf, ps, other

    cs.IT cs.LG cs.NI eess.SP

    An Autoencoder-Based Constellation Design for AirComp in Wireless Federated Learning

    Authors: Yujia Mu, Xizixiang Wei, Cong Shen

    Abstract: Wireless federated learning (FL) relies on efficient uplink communications to aggregate model updates across distributed edge devices. Over-the-air computation (a.k.a. AirComp) has emerged as a promising approach for addressing the scalability challenge of FL over wireless links with limited communication resources. Unlike conventional methods, AirComp allows multiple edge devices to transmit upli… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

  18. arXiv:2403.16248  [pdf, other

    cs.CL

    Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling

    Authors: Yida Mu, Chun Dong, Kalina Bontcheva, Xingyi Song

    Abstract: Topic modelling, as a well-established unsupervised technique, has found extensive use in automatically detecting significant topics within a corpus of documents. However, classic topic modelling approaches (e.g., LDA) have certain drawbacks, such as the lack of semantic understanding and the presence of overlapping topics. In this work, we investigate the untapped potential of large language mode… ▽ More

    Submitted 26 March, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  19. arXiv:2403.13365  [pdf, other

    cs.RO cs.CV

    ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics

    Authors: Qiaojun Yu, Ce Hao, Junbo Wang, Wenhai Liu, Liu Liu, Yao Mu, Yang You, Hengxu Yan, Cewu Lu

    Abstract: Robotic manipulation in everyday scenarios, especially in unstructured environments, requires skills in pose-aware object manipulation (POM), which adapts robots' grasping and handling according to an object's 6D pose. Recognizing an object's position and orientation is crucial for effective manipulation. For example, if a mug is lying on its side, it's more effective to grasp it by the rim rather… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  20. arXiv:2403.09073  [pdf, other

    cs.CL

    Large Language Models are Parallel Multilingual Learners

    Authors: Yongyu Mu, Peinan Feng, Zhiquan Cao, Yuzhang Wu, Bei Li, Chenglong Wang, Tong Xiao, Kai Song, Tongran Liu, Chunliang Zhang, Jingbo Zhu

    Abstract: In this study, we reveal an in-context learning (ICL) capability of multilingual large language models (LLMs): by translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities. To test this capability, we design extensive experiments encompassing 8 typical datasets, 7 languages and 8 state-of-th… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Working in process

  21. arXiv:2402.19007  [pdf, other

    cs.CV cs.RO

    DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

    Authors: Ji Ma, Hongming Dai, Yao Mu, Pengying Wu, Hao Wang, Xiaowei Chi, Yang Fei, Shanghang Zhang, Chang Liu

    Abstract: Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments and has emerged as a particularly challenging task within the domain of Embodied AI. Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object attribute diversity, and scene texts, thus exhibiting noticeable discrepancies from real-… ▽ More

    Submitted 8 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: This version of the paper has been accepted for publication in IEEE Robotics and Automation Letters (RA-L)

  22. arXiv:2402.16117  [pdf, other

    cs.RO cs.AI cs.CV

    RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

    Authors: Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

    Abstract: Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various… ▽ More

    Submitted 25 February, 2024; originally announced February 2024.

  23. arXiv:2402.14623  [pdf, other

    cs.RO cs.AI cs.CL cs.CV

    RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

    Authors: Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo

    Abstract: Rapid progress in high-level task planning and code generation for open-world robot manipulation has been witnessed in Embodied AI. However, previous studies put much effort into general common sense reasoning and task planning capabilities of large-scale language or multi-modal models, relatively little effort on ensuring the deployability of generated code on real robots, and other fundamental c… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 10 pages of main paper, 4 pages of appendix; 10 figures in main paper, 3 figures in appendix

    ACM Class: I.2.7; I.2.8; I.2.9; I.2.10

  24. arXiv:2402.04717  [pdf, other

    cs.CV

    InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior

    Authors: Chenguo Lin, Yadong Mu

    Abstract: Comprehending natural language instructions is a charming property for 3D indoor scene synthesis systems. Existing methods directly model object joint distributions and express object relations implicitly within a scene, thereby hindering the controllability of generation. We introduce InstructScene, a novel generative framework that integrates a semantic graph prior and a layout decoder to improv… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2024 for spotlight presentation; Project page: https://chenguolin.github.io/projects/InstructScene

  25. arXiv:2402.03161  [pdf, other

    cs.CV cs.CL

    Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

    Authors: Yang Jin, Zhicheng Sun, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu

    Abstract: In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos. Compared to static images, video poses unique challenges for effective large-scale pre-training due to the modeling of its spatiotemporal dynamics. In this paper, we address such limitations in video-language pre-training… ▽ More

    Submitted 3 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  26. arXiv:2401.13505  [pdf, other

    cs.CV

    Generative Human Motion Stylization in Latent Space

    Authors: Chuan Guo, Yuxuan Mu, Xinxin Zuo, Peng Dai, Youliang Yan, Juwei Lu, Li Cheng

    Abstract: Human motion stylization aims to revise the style of an input motion while keeping its content unaltered. Unlike existing works that operate directly in pose space, we leverage the latent space of pretrained autoencoders as a more expressive and robust representation for motion extraction and infusion. Building upon this, we present a novel generative model that produces diverse stylization result… ▽ More

    Submitted 23 February, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted for ICLR2024

  27. arXiv:2401.02695  [pdf, other

    cs.RO cs.CV

    VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

    Authors: Pengying Wu, Yao Mu, Bingxian Wu, Yi Hou, Ji Ma, Shanghang Zhang, Chang Liu

    Abstract: In the realm of household robotics, the Zero-Shot Object Navigation (ZSON) task empowers agents to adeptly traverse unfamiliar environments and locate objects from novel categories without prior explicit training. This paper introduces VoroNav, a novel semantic exploration framework that proposes the Reduced Voronoi Graph to extract exploratory paths and planning nodes from a semantic map construc… ▽ More

    Submitted 6 February, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: 18 pages, 13 figures

  28. arXiv:2312.11598  [pdf, other

    cs.RO cs.CV cs.LG

    SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

    Authors: Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, Ping Luo

    Abstract: Diffusion models have demonstrated strong potential for robotic trajectory planning. However, generating coherent trajectories from high-level instructions remains challenging, especially for long-range composition tasks requiring multiple sequential skills. We propose SkillDiffuser, an end-to-end hierarchical planning framework integrating interpretable skill learning with conditional diffusion p… ▽ More

    Submitted 28 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024. Camera ready version. Project page: https://skilldiffuser.github.io/

  29. arXiv:2312.00063  [pdf, other

    cs.CV

    MoMask: Generative Masked Modeling of 3D Human Motions

    Authors: Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed, Sen Wang, Li Cheng

    Abstract: We introduce MoMask, a novel masked modeling framework for text-driven 3D human motion generation. In MoMask, a hierarchical quantization scheme is employed to represent human motion as multi-layer discrete motion tokens with high-fidelity details. Starting at the base layer, with a sequence of motion tokens obtained by vector quantization, the residual tokens of increasing orders are derived and… ▽ More

    Submitted 29 November, 2023; originally announced December 2023.

    Comments: Project webpage: https://ericguo5513.github.io/momask/

  30. arXiv:2311.05265  [pdf, other

    cs.CL cs.AI cs.LG

    Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels

    Authors: Ben Wu, Yue Li, Yida Mu, Carolina Scarton, Kalina Bontcheva, Xingyi Song

    Abstract: In this paper, we address the limitations of the common data annotation and training methods for objective single-label classification tasks. Typically, when annotating such tasks annotators are only asked to provide a single label for each sample and annotator disagreement is discarded when a final hard label is decided through majority voting. We challenge this traditional approach, acknowledgin… ▽ More

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 (Findings)

  31. arXiv:2310.08582  [pdf, other

    cs.CL cs.AI cs.LG cs.RO

    Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

    Authors: Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo

    Abstract: This paper studies close-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations. Recently, prompting Large Language Models (LLMs) to generate actions iteratively has become a prevalent paradigm due to its superior performance and user-friendliness. However, this paradigm is pl… ▽ More

    Submitted 12 October, 2023; originally announced October 2023.

  32. arXiv:2310.03026  [pdf, other

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

    Authors: Hao Sha, Yao Mu, Yuxuan Jiang, Li Chen, Chenfeng Xu, Ping Luo, Shengbo Eben Li, Masayoshi Tomizuka, Wei Zhan, Mingyu Ding

    Abstract: Existing learning-based autonomous driving (AD) systems face challenges in comprehending high-level information, generalizing to rare events, and providing interpretability. To address these problems, this work employs Large Language Models (LLMs) as a decision-making component for complex AD scenarios that require human commonsense understanding. We devise cognitive pathways to enable comprehensi… ▽ More

    Submitted 13 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

  33. arXiv:2310.03023  [pdf, other

    cs.RO cs.AI cs.CV cs.LG

    Human-oriented Representation Learning for Robotic Manipulation

    Authors: Mingxiao Huo, Mingyu Ding, Chenfeng Xu, Thomas Tian, Xinghao Zhu, Yao Mu, Lingfeng Sun, Masayoshi Tomizuka, Wei Zhan

    Abstract: Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks. We advocate that such a representation automatically arises from simultaneously learning about multiple simple perceptual skills that are critical for everyday scenarios (e.g., hand detection, state estimate, etc.) and is better suited fo… ▽ More

    Submitted 4 October, 2023; originally announced October 2023.

  34. arXiv:2310.02054  [pdf, other

    cs.AI

    AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model

    Authors: Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Tangjie Lv, Changjie Fan, Zhipeng Hu

    Abstract: Aligning agent behaviors with diverse human preferences remains a challenging problem in reinforcement learning (RL), owing to the inherent abstractness and mutability of human preferences. To address these issues, we propose AlignDiff, a novel framework that leverages RL from Human Feedback (RLHF) to quantify human preferences, covering abstractness, and utilizes them to guide diffusion planning… ▽ More

    Submitted 4 February, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

  35. arXiv:2309.15289  [pdf, other

    cs.CV cs.LG

    SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

    Authors: Zhiqian Lan, Yuxuan Jiang, Yao Mu, Chen Chen, Shengbo Eben Li

    Abstract: Motion prediction is crucial for autonomous vehicles to operate safely in complex traffic environments. Extracting effective spatiotemporal relationships among traffic elements is key to accurate forecasting. Inspired by the successful practice of pretrained large language models, this paper presents SEPT, a modeling framework that leverages self-supervised learning to develop powerful spatiotempo… ▽ More

    Submitted 19 December, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  36. arXiv:2309.14146  [pdf, other

    cs.CL

    Examining Temporal Bias in Abusive Language Detection

    Authors: Mali Jin, Yida Mu, Diana Maynard, Kalina Bontcheva

    Abstract: The use of abusive language online has become an increasingly pervasive problem that damages both individuals and society, with effects ranging from psychological harm right through to escalation to real-life violence and even death. Machine learning models have been developed to automatically detect abusive language, but these models can suffer from temporal bias, the phenomenon in which topics,… ▽ More

    Submitted 25 September, 2023; originally announced September 2023.

  37. arXiv:2309.11576  [pdf, other

    cs.CL

    Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets

    Authors: Yida Mu, Xingyi Song, Kalina Bontcheva, Nikolaos Aletras

    Abstract: A crucial aspect of a rumor detection model is its ability to generalize, particularly its ability to detect emerging, previously unknown rumors. Past research has indicated that content-based (i.e., using solely source posts as input) rumor detection models tend to perform less effectively on unseen rumors. At the same time, the potential of context-based models remains largely untapped. The main… ▽ More

    Submitted 23 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at LREC-COLING 2024

  38. Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion

    Authors: Peiran Xu, Yadong Mu

    Abstract: Given a group of images, co-salient object detection (CoSOD) aims to highlight the common salient object in each image. There are two factors closely related to the success of this task, namely consensus extraction, and the dispersion of consensus to each image. Most previous works represent the group consensus using local features, while we instead utilize a hierarchical Transformer module for ex… ▽ More

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted by ACM MM 2023

  39. arXiv:2309.04669  [pdf, other

    cs.CV

    Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

    Authors: Yang Jin, Kun Xu, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu

    Abstract: Recently, the remarkable advance of the Large Language Model (LLM) has inspired researchers to transfer its extraordinary reasoning capability to both vision and language data. However, the prevailing approaches primarily regard the visual input as a prompt and focus exclusively on optimizing the text generation process conditioned upon vision content by a frozen LLM. Such an inequitable treatment… ▽ More

    Submitted 22 March, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: ICLR 2024

  40. arXiv:2308.15232  [pdf, other

    cs.LG cs.CL cs.IR

    Classification-Aware Neural Topic Model Combined With Interpretable Analysis -- For Conflict Classification

    Authors: Tianyu Liang, Yida Mu, Soonho Kim, Darline Larissa Kengne Kuate, Julie Lang, Rob Vos, Xingyi Song

    Abstract: A large number of conflict events are affecting the world all the time. In order to analyse such conflict events effectively, this paper presents a Classification-Aware Neural Topic Model (CANTM-IA) for Conflict Information Classification and Topic Discovery. The model provides a reliable interpretation of classification results and discovered topics by introducing interpretability analysis. At th… ▽ More

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted by RANLP 2023

  41. arXiv:2305.19923  [pdf, other

    cs.LG cs.AI

    MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL

    Authors: Fei Ni, Jianye Hao, Yao Mu, Yifu Yuan, Yan Zheng, Bin Wang, Zhixuan Liang

    Abstract: Recently, diffusion model shines as a promising backbone for the sequence modeling paradigm in offline reinforcement learning(RL). However, these works mostly lack the generalization ability across tasks with reward or dynamics change. To tackle this challenge, in this paper we propose a task-oriented conditioned diffusion planner for offline meta-RL(MetaDiffuser), which considers the generalizati… ▽ More

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 19 pages, 4 figures, accepted by ICML 23'

  42. arXiv:2305.17367  [pdf, other

    cs.CL

    Augmenting Large Language Model Translators via Translation Memories

    Authors: Yongyu Mu, Abudurexiti Reheman, Zhiquan Cao, Yuchun Fan, Bei Li, Yinqiao Li, Tong Xiao, Chunliang Zhang, Jingbo Zhu

    Abstract: Using translation memories (TMs) as prompts is a promising approach to in-context learning of machine translation models. In this work, we take a step towards prompting large language models (LLMs) with TMs and making them better translators. We find that the ability of LLMs to ``understand'' prompts is indeed helpful for making better use of TMs. Experiments show that the results of a pre-trained… ▽ More

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  43. arXiv:2305.15021  [pdf, other

    cs.RO cs.AI cs.CV cs.LG

    EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

    Authors: Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo

    Abstract: Embodied AI is a crucial frontier in robotics, capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments. In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities. To achieve this, we have made the following ef… ▽ More

    Submitted 13 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  44. arXiv:2305.14310  [pdf, ps, other

    cs.CL

    Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science

    Authors: Yida Mu, Ben P. Wu, William Thorne, Ambrose Robinson, Nikolaos Aletras, Carolina Scarton, Kalina Bontcheva, Xingyi Song

    Abstract: Instruction-tuned Large Language Models (LLMs) have exhibited impressive language understanding and the capacity to generate responses that follow specific prompts. However, due to the computational demands associated with training these models, their applications often adopt a zero-shot setting. In this paper, we evaluate the zero-shot performance of two publicly accessible LLMs, ChatGPT and Open… ▽ More

    Submitted 24 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at LREC-COLING 2024

  45. arXiv:2304.10177  [pdf, other

    cs.LG cs.CV

    Regularizing Second-Order Influences for Continual Learning

    Authors: Zhicheng Sun, Yadong Mu, Gang Hua

    Abstract: Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge. Prevalent replay-based methods address this challenge by rehearsing on a small buffer holding the seen data, for which a delicate sample selection strategy is required. However, existing selection schemes typically seek only to maximize the utility of the ongoing selection, overl… ▽ More

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  46. arXiv:2304.09448  [pdf, other

    cs.LG cs.CL cs.CV

    EC^2: Emergent Communication for Embodied Control

    Authors: Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan

    Abstract: Embodied control requires agents to leverage multi-modal pre-training to quickly learn how to act in new environments, where video demonstrations contain visual and motion details needed for low-level perception and control, and language instructions support generalization with abstract, symbolic structures. While recent approaches apply contrastive learning to force alignment between the two moda… ▽ More

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: Published in CVPR2023

  47. arXiv:2304.04811  [pdf, other

    cs.CL

    A Large-Scale Comparative Study of Accurate COVID-19 Information versus Misinformation

    Authors: Yida Mu, Ye Jiang, Freddy Heppell, Iknoor Singh, Carolina Scarton, Kalina Bontcheva, Xingyi Song

    Abstract: The COVID-19 pandemic led to an infodemic where an overwhelming amount of COVID-19 related content was being disseminated at high velocity through social media. This made it challenging for citizens to differentiate between accurate and inaccurate information about COVID-19. This motivated us to carry out a comparative study of the characteristics of COVID-19 misinformation versus those of accurat… ▽ More

    Submitted 7 May, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: ICWSM TrueHealth 2023

  48. arXiv:2304.04806  [pdf, other

    cs.CL

    Examining Temporalities on Stance Detection towards COVID-19 Vaccination

    Authors: Yida Mu, Mali Jin, Kalina Bontcheva, Xingyi Song

    Abstract: Previous studies have highlighted the importance of vaccination as an effective strategy to control the transmission of the COVID-19 virus. It is crucial for policymakers to have a comprehensive understanding of the public's stance towards vaccination on a large scale. However, attitudes towards COVID-19 vaccination, such as pro-vaccine or vaccine hesitancy, have evolved over time on social media.… ▽ More

    Submitted 24 March, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted at LREC-COLING 2024

  49. arXiv:2304.02853  [pdf, other

    cs.CV

    Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce

    Authors: Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu

    Abstract: This paper aims to establish a generic multi-modal foundation model that has the scalable capability to massive downstream applications in E-commerce. Recently, large-scale vision-language pretraining approaches have achieved remarkable advances in the general domain. However, due to the significant differences between natural and product images, directly applying these frameworks for modeling ima… ▽ More

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: 16 pages, 10 figures, accepted by CVPR 2023

  50. arXiv:2303.09681  [pdf, other

    cs.CV

    Event-based Human Pose Tracking by Spiking Spatiotemporal Transformer

    Authors: Shihao Zou, Yuxuan Mu, Xinxin Zuo, Sen Wang, Li Cheng

    Abstract: Event camera, as an emerging biologically-inspired vision sensor for capturing motion dynamics, presents new potential for 3D human pose tracking, or video-based 3D human pose estimation. However, existing works in pose tracking either require the presence of additional gray-scale images to establish a solid starting pose, or ignore the temporal dependencies all together by collapsing segments of… ▽ More

    Submitted 6 September, 2023; v1 submitted 16 March, 2023; originally announced March 2023.