Skip to main content

Showing 1–50 of 1,488 results for author: Wang, M

  1. arXiv:2407.13255  [pdf, other

    cs.IT eess.SP

    Interleaved Block-Sparse Transform

    Authors: Lei Liu, Ming Wang, Shufeng Li, Yuhao Chi, Ning Wei, ZhaoYang Zhang

    Abstract: Low-complexity Bayes-optimal memory approximate message passing (MAMP) is an efficient signal estimation algorithm in compressed sensing and multicarrier modulation. However, achieving replica Bayes optimality with MAMP necessitates a large-scale right-unitarily invariant transformation, which is prohibitive in practical systems due to its high computational complexity and hardware costs. To solve… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Submitted to the IEEE Journal

  2. arXiv:2407.11853  [pdf, other

    cs.ET

    A Case for Application-Aware Space Radiation Tolerance in Orbital Computing

    Authors: Meiqi Wang, Han Qiu, Longnv Xu, Di Wang, Yuanjie Li, Tianwei Zhang, Jun Liu, Hewu Li

    Abstract: We are witnessing a surge in the use of commercial off-the-shelf (COTS) hardware for cost-effective in-orbit computing, such as deep neural network (DNN) based on-satellite sensor data processing, Earth object detection, and task decision.However, once exposed to harsh space environments, COTS hardware is vulnerable to cosmic radiation and suffers from exhaustive single-event upsets (SEUs) and mul… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.11419  [pdf, other

    cs.CV

    TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs

    Authors: Chenfan Xu, Zhentao Liu, Yuan Liu, Yulong Dou, Jiamin Wu, Jiepeng Wang, Minjiao Wang, Dinggang Shen, Zhiming Cui

    Abstract: Orthodontic treatment usually requires regular face-to-face examinations to monitor dental conditions of the patients. When in-person diagnosis is not feasible, an alternative is to utilize five intra-oral photographs for remote dental monitoring. However, it lacks of 3D information, and how to reconstruct 3D dental models from such sparse view photographs is a challenging problem. In this study,… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: MICCAI2024

  4. The Jade Gateway to Exergaming: How Socio-Cultural Factors Shape Exergaming Among East Asian Older Adults

    Authors: Reza Hadi Mogavi, Juhyung Son, Simin Yang, Derrick M. Wang, Lydia Choong, Ahmad Alhilal, Peng Yuan Zhou, Pan Hui, Lennart E. Nacke

    Abstract: Exergaming, blending exercise and gaming, improves the physical and mental health of older adults. We currently do not fully know the factors that drive older adults to either engage in or abstain from exergaming. Large-scale studies investigating this are still scarce, particularly those studying East Asian older adults. To address this, we interviewed 64 older adults from China, Japan, and South… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: This manuscript is the pre-print version of our paper, which has been accepted for the ACM CHI Play 2024. Please visit https://doi.org/10.1145/3677106

  5. arXiv:2407.08883  [pdf

    cs.CV

    TractGraphFormer: Anatomically Informed Hybrid Graph CNN-Transformer Network for Classification from Diffusion MRI Tractography

    Authors: Yuqian Chen, Fan Zhang, Meng Wang, Leo R. Zekelman, Suheyla Cetin-Karayumak, Tengfei Xue, Chaoyi Zhang, Yang Song, Nikos Makris, Yogesh Rathi, Weidong Cai, Lauren J. O'Donnell

    Abstract: The relationship between brain connections and non-imaging phenotypes is increasingly studied using deep neural networks. However, the local and global properties of the brain's white matter networks are often overlooked in convolutional network design. We introduce TractGraphFormer, a hybrid Graph CNN-Transformer deep learning framework tailored for diffusion MRI tractography. This model leverage… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 4 figures

  6. arXiv:2407.08813  [pdf, other

    eess.IV cs.AI cs.CV

    FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

    Authors: Yu Tian, Congcong Wen, Min Shi, Muhammad Muneeb Afzal, Hao Huang, Muhammad Osama Khan, Yan Luo, Yi Fang, Mengyu Wang

    Abstract: Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., di… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Codes available at https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain

  7. arXiv:2407.08801  [pdf, other

    cs.CV

    DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

    Authors: Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu, Meili Wang, Lizhuang Ma, Jian Chang, Jian Jun Zhang

    Abstract: Recent point cloud understanding research suffers from performance drops on unseen data, due to the distribution shifts across different domains. While recent studies use Domain Generalization (DG) techniques to mitigate this by learning domain-invariant features, most are designed for a single task and neglect the potential of testing data. Despite In-Context Learning (ICL) showcasing multi-task… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  8. arXiv:2407.08572  [pdf, other

    cs.CV

    Boosting Adversarial Transferability for Skeleton-based Action Recognition via Exploring the Model Posterior Space

    Authors: Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Xun Yang, Meng Wang, He Wang

    Abstract: Skeletal motion plays a pivotal role in human activity recognition (HAR). Recently, attack methods have been proposed to identify the universal vulnerability of skeleton-based HAR(S-HAR). However, the research of adversarial transferability on S-HAR is largely missing. More importantly, existing attacks all struggle in transfer across unknown S-HAR models. We observed that the key reason is that t… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  9. arXiv:2407.08473  [pdf, other

    cs.AR cs.AI

    Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation

    Authors: Kaiyan Chang, Zhirong Chen, Yunhao Zhou, Wenlong Zhu, kun wang, Haobo Xu, Cangyuan Li, Mengdi Wang, Shengwen Liang, Huawei Li, Yinhe Han, Ying Wang

    Abstract: Natural language interfaces have exhibited considerable potential in the automation of Verilog generation derived from high-level specifications through the utilization of large language models, garnering significant attention. Nevertheless, this paper elucidates that visual representations contribute essential contextual information critical to design intent for hardware architectures possessing… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCAD 2024

  10. arXiv:2407.08126  [pdf, other

    cs.AI cs.CV cs.MM

    Label-anticipated Event Disentanglement for Audio-Visual Video Parsing

    Authors: Jinxing Zhou, Dan Guo, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang

    Abstract: Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate events within audio and visual modalities. Multiple events can overlap in the timeline, making identification challenging. While traditional methods usually focus on improving the early audio-visual encoders to embed more effective features, the decoding phase -- crucial for final event classification, often receives less… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  11. arXiv:2407.08125  [pdf, ps, other

    cs.LG

    Real-Time Summarization of Twitter

    Authors: Yixin Jin, Meiqi Wang, Meng Li, Wenjing Zhou, Yi Shen, Hao Liu

    Abstract: In this paper, we describe our approaches to TREC Real-Time Summarization of Twitter. We focus on real time push notification scenario, which requires a system monitors the stream of sampled tweets and returns the tweets relevant and novel to given interest profiles. Dirichlet score with and with very little smoothing (baseline) are employed to classify whether a tweet is relevant to a given inter… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: This paper was accepted to International Conference on Artificial Intelligence and Electromechanical Automation 2024

  12. arXiv:2407.07930  [pdf

    q-bio.BM cs.LG

    Token-Mol 1.0: Tokenized drug design with large language model

    Authors: Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Significant interests have recently risen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery lack the ability to comprehend three-dimensional (3D) structures, thereby limiting their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduced Token-Mol, a token-only 3D drug… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  13. arXiv:2407.07357  [pdf, ps, other

    cs.LG q-bio.MN

    A deep graph model for the signed interaction prediction in biological network

    Authors: Shuyi Jin, Mengji Zhang, Meijie Wang, Lun Yu

    Abstract: In pharmaceutical research, the strategy of drug repurposing accelerates the development of new therapies while reducing R&D costs. Network pharmacology lays the theoretical groundwork for identifying new drug indications, and deep graph models have become essential for their precision in mapping complex biological networks. Our study introduces an advanced graph model that utilizes graph convolut… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  14. arXiv:2407.07053  [pdf, other

    cs.CV

    Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

    Authors: Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

    Abstract: Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite rudimentary. They often struggle with simple daily tasks, such as reading time from a clock, understanding a flowchart, or planning a route using a road map. In lig… ▽ More

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: code: https://github.com/zwq2018/Multi-modal-Self-instruct dataset: https://huggingface.co/datasets/zwq2018/Multi-modal-Self-instruct Leaderboard: https://multi-modal-self-instruct.github.io/

  15. arXiv:2407.06631  [pdf, other

    cs.SI cs.CY cs.HC cs.NI

    A Systematic Review of Echo Chamber Research: Comparative Analysis of Conceptualizations, Operationalizations, and Varying Outcomes

    Authors: David Hartmann, Lena Pohlmann, Sonja Mei Wang, Bettina Berendt

    Abstract: This systematic review synthesizes current research on echo chambers and filter bubbles to highlight the reasons for the dissent in echo chamber research on the existence, antecedents, and effects of the phenomenon. The review of 112 studies reveals that the lack of consensus in echo chamber research is based on different conceptualizations and operationalizations of echo chambers. While studies t… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  16. arXiv:2407.06250  [pdf, other

    cs.CV

    FairDiff: Fair Segmentation with Point-Image Diffusion

    Authors: Wenyi Li, Haoran Xu, Guiyu Zhang, Huan-ang Gao, Mingju Gao, Mengyu Wang, Hao Zhao

    Abstract: Fairness is an important topic for medical image analysis, driven by the challenge of unbalanced training data among diverse target groups and the societal demand for equitable medical quality. In response to this issue, our research adopts a data-driven strategy-enhancing data balance by integrating synthetic images. However, in terms of generating synthetic images, previous works either lack pai… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  17. arXiv:2407.06025  [pdf, other

    cs.AI

    iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement

    Authors: Aoyu Pang, Maonan Wang, Man-On Pun, Chung Shue Chen, Xi Xiong

    Abstract: Urban congestion remains a critical challenge, with traffic signal control (TSC) emerging as a potent solution. TSC is often modeled as a Markov Decision Process problem and then solved using reinforcement learning (RL), which has proven effective. However, the existing RL-based TSC system often overlooks imperfect observations caused by degraded communication, such as packet loss, delays, and noi… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  18. arXiv:2407.05721  [pdf, other

    cs.CL

    PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation

    Authors: Jinpeng Hu, Tengteng Dong, Hui Ma, Peng Zou, Xiao Sun, Meng Wang

    Abstract: Mental health has attracted substantial attention in recent years and LLM can be an effective technology for alleviating this problem owing to its capability in text understanding and dialogue. However, existing research in this domain often suffers from limitations, such as training on datasets lacking crucial prior knowledge and evidence, and the absence of comprehensive evaluation methods. In t… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: work in progress

  19. arXiv:2407.05311  [pdf, other

    cs.CV

    MMAD: Multi-label Micro-Action Detection in Videos

    Authors: Kun Li, Dan Guo, Pengyu Liu, Guoliang Chen, Meng Wang

    Abstract: Human body actions are an important form of non-verbal communication in social interactions. This paper focuses on a specific subset of body actions known as micro-actions, which are subtle, low-intensity body movements that provide a deeper understanding of inner human feelings. In real-world scenarios, human micro-actions often co-occur, with multiple micro-actions overlapping in time, such as s… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Work in Progress

  20. arXiv:2407.05267  [pdf, other

    cs.CV

    DTR: A Unified Deep Tensor Representation Framework for Multimedia Data Recovery

    Authors: Ting-Wei Zhou, Xi-Le Zhao, Jian-Li Wang, Yi-Si Luo, Min Wang, Xiao-Xuan Bai, Hong Yan

    Abstract: Recently, the transform-based tensor representation has attracted increasing attention in multimedia data (e.g., images and videos) recovery problems, which consists of two indispensable components, i.e., transform and characterization. Previously, the development of transform-based tensor representation mainly focuses on the transform aspect. Although several attempts consider using shallow matri… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  21. arXiv:2407.04217  [pdf, other

    cs.DB cs.IR

    An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models

    Authors: Mengzhao Wang, Haotian Wu, Xiangyu Ke, Yunjun Gao, Xiaoliang Xu, Lu Chen

    Abstract: Retrieval-augmented Large Language Models (LLMs) have reshaped traditional query-answering systems, offering unparalleled user experiences. However, existing retrieval techniques often struggle to handle multi-modal query contexts. In this paper, we present an interactive Multi-modal Query Answering (MQA) system, empowered by our newly developed multi-modal retrieval framework and navigation graph… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: This demo paper has been accepted by VLDB 2024

  22. arXiv:2407.03724  [pdf, other

    cs.RO

    Flight Structure Optimization of Modular Reconfigurable UAVs

    Authors: Yao Su, Ziyuan Jiao, Zeyu Zhang, Jingwen Zhang, Hang Li, Meng Wang, Hangxin Liu

    Abstract: This paper presents a Genetic Algorithm (GA) designed to reconfigure a large group of modular Unmanned Aerial Vehicles (UAVs), each with different weights and inertia parameters, into an over-actuated flight structure with improved dynamic properties. Previous research efforts either utilized expert knowledge to design flight structures for a specific task or relied on enumeration-based algorithms… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  23. arXiv:2407.03107  [pdf

    cs.HC cs.GR cs.MM

    Design of a UE5-based digital twin platform

    Authors: Shaoqiu Lyu, Muzhi Wang, Sunrui Zhang, Shengzhi Wang

    Abstract: Aiming at the current mainstream 3D scene engine learning and building cost is too high, this thesis proposes a digital twin platform design program based on Unreal Engine 5 (UE5). It aims to provide a universal platform construction design process to effectively reduce the learning cost of large-scale scene construction. Taking an actual project of a unit as an example, the overall cycle work of… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  24. PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition

    Authors: Yanbin Hao, Diansong Zhou, Zhicai Wang, Chong-Wah Ngo, Meng Wang

    Abstract: In recent years, vision Transformers and MLPs have demonstrated remarkable performance in image understanding tasks. However, their inherently dense computational operators, such as self-attention and token-mixing layers, pose significant challenges when applied to spatio-temporal video data. To address this gap, we propose PosMLP-Video, a lightweight yet powerful MLP-like backbone for video recog… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Journal ref: International Journal of Computer Vision, 27 June 2024

  25. arXiv:2407.01920  [pdf, other

    cs.CL cs.AI cs.CV cs.LG cs.MM

    To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models

    Authors: Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang

    Abstract: Large Language Models (LLMs) trained on extensive corpora inevitably retain sensitive data, such as personal privacy information and copyrighted material. Recent advancements in knowledge unlearning involve updating LLM parameters to erase specific knowledge. However, current unlearning paradigms are mired in vague forgetting boundaries, often erasing knowledge indiscriminately. In this work, we i… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress

  26. arXiv:2407.01636  [pdf, other

    cs.CV

    Learning Frequency-Aware Dynamic Transformers for All-In-One Image Restoration

    Authors: Zenglin Shi, Tong Su, Pei Liu, Yunpeng Wu, Le Zhang, Meng Wang

    Abstract: This work aims to tackle the all-in-one image restoration task, which seeks to handle multiple types of degradation with a single model. The primary challenge is to extract degradation representations from the input degraded images and use them to guide the model's adaptation to specific degradation types. Recognizing that various degradations affect image content differently across frequency band… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 8 pages

  27. arXiv:2407.01458  [pdf, other

    cs.LG cs.AI cs.GT econ.TH

    Contractual Reinforcement Learning: Pulling Arms with Invisible Hands

    Authors: Jibang Wu, Siyu Chen, Mengdi Wang, Huazheng Wang, Haifeng Xu

    Abstract: The agency problem emerges in today's large scale machine learning tasks, where the learners are unable to direct content creation or enforce data collection. In this work, we propose a theoretical framework for aligning economic interests of different stakeholders in the online learning problems through contract design. The problem, termed \emph{contractual reinforcement learning}, naturally aris… ▽ More

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  28. arXiv:2407.00719  [pdf

    cs.CR cs.DC cs.LG

    A Whole-Process Certifiably Robust Aggregation Method Against Backdoor Attacks in Federated Learning

    Authors: Anqi Zhou, Yezheng Liu, Yidong Chai, Hongyi Zhu, Xinyue Ge, Yuanchun Jiang, Meng Wang

    Abstract: Federated Learning (FL) has garnered widespread adoption across various domains such as finance, healthcare, and cybersecurity. Nonetheless, FL remains under significant threat from backdoor attacks, wherein malicious actors insert triggers into trained models, enabling them to perform certain tasks while still meeting FL's primary objectives. In response, robust aggregation methods have been prop… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 14 pages

  29. arXiv:2407.00336  [pdf, other

    cs.CR cs.LG

    Dual-view Aware Smart Contract Vulnerability Detection for Ethereum

    Authors: Jiacheng Yao, Maolin Wang, Wanqi Chen, Chengxiang Jin, Jiajun Zhou, Shanqing Yu, Qi Xuan

    Abstract: The wide application of Ethereum technology has brought technological innovation to traditional industries. As one of Ethereum's core applications, smart contracts utilize diverse contract codes to meet various functional needs and have gained widespread use. However, the non-tamperability of smart contracts, coupled with vulnerabilities caused by natural flaws or human errors, has brought unprece… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted by International Conference on Blockchain and Trustworthy Systems 2024

  30. arXiv:2406.19328  [pdf, other

    cs.SD cs.LG eess.AS

    Subtractive Training for Music Stem Insertion using Latent Diffusion Models

    Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

    Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusi… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  31. arXiv:2406.18708  [pdf, other

    cs.LG cs.CL

    Learn it or Leave it: Module Composition and Pruning for Continual Learning

    Authors: Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze

    Abstract: In real-world environments, continual learning is essential for machine learning models, as they need to acquire new knowledge incrementally without forgetting what they have already learned. While pretrained language models have shown impressive capabilities on various static tasks, applying them to continual learning poses significant challenges, including avoiding catastrophic forgetting, facil… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  32. arXiv:2406.18582  [pdf, other

    cs.CV cs.GR

    Canonical Consolidation Fields: Reconstructing Dynamic Shapes from Point Clouds

    Authors: Miaowei Wang, Changjian Li, Amir Vaxman

    Abstract: We present Canonical Consolidation Fields (CanFields): a method for reconstructing a time series of independently-sampled point clouds into a single deforming coherent shape. Such input often comes from motion capture. Existing methods either couple the geometry and the deformation, where by doing so they smooth fine details and lose the ability to track moving points, or they track the deformatio… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  33. arXiv:2406.18541  [pdf, other

    cs.CV cs.GR

    Refining 3D Point Cloud Normal Estimation via Sample Selection

    Authors: Jun Zhou, Yaoshun Li, Hongchen Tan, Mingjie Wang, Nannan Li, Xiuping Liu

    Abstract: In recent years, point cloud normal estimation, as a classical and foundational algorithm, has garnered extensive attention in the field of 3D geometric processing. Despite the remarkable performance achieved by current Neural Network-based methods, their robustness is still influenced by the quality of training data and the models' performance. In this study, we designed a fundamental framework f… ▽ More

    Submitted 19 May, 2024; originally announced June 2024.

  34. arXiv:2406.18311  [pdf, other

    cs.LG

    Online Learning of Multiple Tasks and Their Relationships : Testing on Spam Email Data and EEG Signals Recorded in Construction Fields

    Authors: Yixin Jin, Wenjing Zhou, Meiqi Wang, Meng Li, Xintao Li, Tianyu Hu

    Abstract: This paper examines an online multi-task learning (OMTL) method, which processes data sequentially to predict labels across related tasks. The framework learns task weights and their relatedness concurrently. Unlike previous models that assumed static task relatedness, our approach treats tasks as initially independent, updating their relatedness iteratively using newly calculated weight vectors.… ▽ More

    Submitted 29 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  35. arXiv:2406.18294  [pdf, other

    cs.CL

    Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

    Authors: Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang

    Abstract: Some recently developed code large language models (Code LLMs) have been pre-trained on repository-level code data (Repo-Code LLMs), enabling these models to recognize repository structures and utilize cross-file information for code completion. However, in real-world development scenarios, simply concatenating the entire code repository often exceeds the context window limits of these Repo-Code L… ▽ More

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  36. arXiv:2406.17764  [pdf, other

    cs.CL cs.AI

    BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning

    Authors: Ercong Nie, Bo Shao, Zifeng Ding, Mingyang Wang, Helmut Schmid, Hinrich Schütze

    Abstract: Large language models (LLMs) possess extensive parametric knowledge, but this knowledge is difficult to update with new information because retraining is very expensive and infeasible for closed-source models. Knowledge editing (KE) has emerged as a viable solution for updating the knowledge of LLMs without compromising their overall performance. On-the-fly KE methods, inspired by in-context learn… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  37. arXiv:2406.17534  [pdf, other

    cs.CL

    Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification

    Authors: Huiyao Chen, Yu Zhao, Zulong Chen, Mengjia Wang, Liangyue Li, Meishan Zhang, Min Zhang

    Abstract: Hierarchical text classification (HTC) is an important task with broad applications, while few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few-shot learning, it is not as effective for HTC because of the expansive hierarchical label sets and extremely-ambiguous labels. In this work, we intro… ▽ More

    Submitted 29 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 17 pages

  38. arXiv:2406.17530  [pdf, other

    cs.CV cs.RO

    Point Tree Transformer for Point Cloud Registration

    Authors: Meiling Wang, Guangyan Chen, Yi Yang, Li Yuan, Yufeng Yue

    Abstract: Point cloud registration is a fundamental task in the fields of computer vision and robotics. Recent developments in transformer-based methods have demonstrated enhanced performance in this domain. However, the standard attention mechanism utilized in these methods often integrates many low-relevance points, thereby struggling to prioritize its attention weights on sparse yet meaningful points. Th… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  39. arXiv:2406.17419  [pdf, other

    cs.CL cs.AI

    Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

    Authors: Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li

    Abstract: Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up. However, existing benchmarks employ irrelevant noise texts to artificially extend the length of test cases, diverging from the real-world scenarios of long-contex… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: We release our code and data publicly at https://github.com/MozerWang/Loong

  40. arXiv:2406.17167  [pdf, other

    cs.LG

    Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis

    Authors: Hongkang Li, Meng Wang, Shuai Zhang, Sijia Liu, Pin-Yu Chen

    Abstract: Efficient training and inference algorithms, such as low-rank adaption and model pruning, have shown impressive performance for learning Transformer-based large foundation models. However, due to the technical challenges of the non-convex optimization caused by the complicated architecture of Transformers, the theoretical study of why these methods can be applied to learn Transformers is mostly el… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: IEEE SAM Workshop 2024

  41. arXiv:2406.16942  [pdf, other

    eess.IV cs.AI cs.CV

    Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

    Authors: Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

    Abstract: Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real-world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RE… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: All codes are available at https://github.com/yuanyuanpeng0129/FMUE

  42. arXiv:2406.16477  [pdf, other

    cs.CV cs.CL

    DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution

    Authors: Aiwen Jiang, Zhi Wei, Long Peng, Feiqiang Liu, Wenbo Li, Mingwen Wang

    Abstract: Image super-resolution pursuits reconstructing high-fidelity high-resolution counterpart for low-resolution image. In recent years, diffusion-based models have garnered significant attention due to their capabilities with rich prior knowledge. The success of diffusion models based on general text prompts has validated the effectiveness of textual control in the field of text2image. However, given… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  43. arXiv:2406.16294  [pdf, other

    cs.CL cs.AI

    LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments

    Authors: Zixia Jia, Mengmeng Wang, Baichen Tong, Song-Chun Zhu, Zilong Zheng

    Abstract: Recent advances in Large Language Models (LLMs) have shown inspiring achievements in constructing autonomous agents that rely on language descriptions as inputs. However, it remains unclear how well LLMs can function as few-shot or zero-shot embodied agents in dynamic interactive environments. To address this gap, we introduce LangSuitE, a versatile and simulation-free testbed featuring 6 represen… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  44. arXiv:2406.16213  [pdf, other

    cs.LG

    Provable Statistical Rates for Consistency Diffusion Models

    Authors: Zehao Dou, Minshuo Chen, Mengdi Wang, Zhuoran Yang

    Abstract: Diffusion models have revolutionized various application domains, including computer vision and audio generation. Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved. In response, consistency models have been developed to merge multiple steps in the sampling process, thereby significantly boosting the s… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 28 pages, 2 figures

  45. arXiv:2406.15567  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    SAIL: Self-Improving Efficient Online Alignment of Large Language Models

    Authors: Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Bedi, Furong Huang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods but still lacks a unified conc… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 24 pages, 6 figures, 3 tables

  46. arXiv:2406.14422  [pdf, other

    cs.CV cs.AI

    FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

    Authors: Mingkun Wang, Xiaoguang Ren, Ruochun Jin, Minglong Li, Xiaochuan Zhang, Changqian Yu, Mingxu Wang, Wenjing Yang

    Abstract: Most prior motion prediction endeavors in autonomous driving have inadequately encoded future scenarios, leading to predictions that may fail to accurately capture the diverse movements of agents (e.g., vehicles or pedestrians). To address this, we propose FutureNet, which explicitly integrates initially predicted trajectories into the future scenario and further encodes these future contexts to e… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 10 pages

  47. arXiv:2406.12566  [pdf, other

    cs.CL

    RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation

    Authors: Shuting Wang, Xin Yu, Mang Wang, Weipeng Chen, Yutao Zhu, Zhicheng Dou

    Abstract: Retrieval-augmented generation (RAG) effectively addresses issues of static knowledge and hallucination in large language models. Existing studies mostly focus on question scenarios with clear user intents and concise answers. However, it is prevalent that users issue broad, open-ended queries with diverse sub-intents, for which they desire rich and long-form answers covering multiple relevant asp… ▽ More

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  48. arXiv:2406.11250  [pdf, other

    cs.CL

    Can Machines Resonate with Humans? Evaluating the Emotional and Empathic Comprehension of LMs

    Authors: Muhammad Arslan Manzoor, Yuxia Wang, Minghan Wang, Preslav Nakov

    Abstract: Empathy plays a pivotal role in fostering prosocial behavior, often triggered by the sharing of personal experiences through narratives. However, modeling empathy using NLP approaches remains challenging due to its deep interconnection with human interaction dynamics. Previous approaches, which involve fine-tuning language models (LMs) on human-annotated empathic datasets, have had limited success… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 18 pages

  49. arXiv:2406.10880  [pdf, other

    cs.CL

    Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR

    Authors: Minghan Wang, Yuxia Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari

    Abstract: Recent advancements in multimodal large language models (MLLMs) have made significant progress in integrating information across various modalities, yet real-world applications in educational and scientific domains remain challenging. This paper introduces the Multimodal Scientific ASR (MS-ASR) task, which focuses on transcribing scientific conference videos by leveraging visual information from s… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  50. arXiv:2406.10811  [pdf, other

    cs.CL cs.AI cs.CE

    LLMFactor: Extracting Profitable Factors through Prompts for Explainable Stock Movement Prediction

    Authors: Meiyun Wang, Kiyoshi Izumi, Hiroki Sakaji

    Abstract: Recently, Large Language Models (LLMs) have attracted significant attention for their exceptional performance across a broad range of tasks, particularly in text analysis. However, the finance sector presents a distinct challenge due to its dependence on time-series data for complex forecasting tasks. In this study, we introduce a novel framework called LLMFactor, which employs Sequential Knowledg… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: ACL(Findings)2024