Showing 1–50 of 309 results for author: Luo, S

  1. arXiv:2407.21172  [pdf, other]

    cs.RO

    Learning Stable Robot Grasping with Transformer-based Tactile Control Policies

    Authors: En Yen Puang, Zechen Li, Chee Meng Chew, Shan Luo, Yan Wu

    Abstract: Measuring grasp stability is an important skill for dexterous robot manipulation tasks, which can be inferred from haptic information with a tactile sensor. Control policies have to detect rotational displacement and slippage from tactile feedback, and determine a re-grasp strategy in terms of location and force. The classic stable grasp task only trains control policies to solve for re-grasp location…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ICIEA 2024

  2. arXiv:2407.20709  [pdf, other]

    cs.RO

    A Case Study on Visual-Audio-Tactile Cross-Modal Retrieval

    Authors: Jagoda Wojcik, Jiaqi Jiang, Jiacheng Wu, Shan Luo

    Abstract: Cross-Modal Retrieval (CMR), which retrieves relevant items from one modality (e.g., audio) given a query in another modality (e.g., visual), has undergone significant advancements in recent years. This capability is crucial for robots to integrate and interpret information across diverse sensory inputs. However, the retrieval space in existing robotic CMR approaches often consists of only one mod…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 7 pages, 6 figures, accepted to the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  3. arXiv:2407.17910  [pdf, other]

    stat.ML cs.AI cs.LG

    Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

    Authors: Runpeng Dai, Jianing Wang, Fan Zhou, Shikai Luo, Zhiwei Qin, Chengchun Shi, Hongtu Zhu

    Abstract: Off-policy evaluation (OPE) is widely applied in sectors such as pharmaceuticals and e-commerce to evaluate the efficacy of novel products or policies from offline datasets. This paper introduces a causal deepset framework that relaxes several key structural assumptions, primarily the mean-field assumption, prevalent in existing OPE methodologies that handle spatio-temporal interference. These tra…

    Submitted 25 July, 2024; originally announced July 2024.

  4. arXiv:2407.14758  [pdf, other]

    cs.CV

    DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control

    Authors: Xinyu Xu, Shengcheng Luo, Yanchao Yang, Yong-Lu Li, Cewu Lu

    Abstract: Building a general-purpose intelligent home-assistant agent skilled in diverse tasks by human commands is a long-term blueprint of embodied AI research, which poses requirements on task planning, environment modeling, and object interaction. In this work, we study primitive mobile manipulations for embodied agents, i.e. how to navigate and interact based on an instructed verb-noun pair. We propose…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  5. arXiv:2407.14380  [pdf, other]

    cs.RO

    Deep Domain Adaptation Regression for Force Calibration of Optical Tactile Sensors

    Authors: Zhuo Chen, Ni Ou, Jiaqi Jiang, Shan Luo

    Abstract: Optical tactile sensors provide robots with rich force information for robot grasping in unstructured environments. The fast and accurate calibration of three-dimensional contact forces holds significance for new sensors and existing tactile sensors which may have incurred damage or aging. However, the conventional neural-network-based force calibration method necessitates a large volume of force-…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems

  6. arXiv:2407.13246  [pdf, other]

    cs.CV

    STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation

    Authors: Yaqi Wang, Yifan Zhang, Xiaodiao Chen, Shuai Wang, Dahong Qian, Fan Ye, Feng Xu, Hongyuan Zhang, Qianni Zhang, Chengyu Wu, Yunxiang Li, Weiwei Cui, Shan Luo, Chengkai Wang, Tianhao Li, Yi Liu, Xiang Feng, Huiyu Zhou, Dongyun Liu, Qixuan Wang, Zhouhao Lin, Wei Song, Yuanlin Li, Bing Wang, Chunshi Wang, et al. (2 additional authors not shown)

    Abstract: Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d…

    Submitted 18 July, 2024; originally announced July 2024.

  7. arXiv:2407.12255  [pdf, other]

    cs.CV

    Dual-Hybrid Attention Network for Specular Highlight Removal

    Authors: Xiaojiao Guo, Xuhang Chen, Shenghong Luo, Shuqiang Wang, Chi-Man Pun

    Abstract: Specular highlight removal plays a pivotal role in multimedia applications, as it enhances the quality and interpretability of images and videos, ultimately improving the performance of downstream tasks such as content-based retrieval, object recognition, and scene understanding. Despite significant advances in deep learning-based methods, current state-of-the-art approaches often rely on addition…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

  8. arXiv:2407.09020  [pdf, other]

    cs.CL

    3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental Health Detection

    Authors: Rina Carines Cabral, Siwen Luo, Josiah Poon, Soyeon Caren Han

    Abstract: The significance of mental health classification is paramount in contemporary society, where digital platforms serve as crucial sources for monitoring individuals' well-being. However, existing social media mental health datasets primarily consist of text-only samples, potentially limiting the efficacy of models trained on such data. Recognising that humans utilise cross-modal information to compr…

    Submitted 14 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  9. arXiv:2407.08567  [pdf, other]

    cs.CV cs.LG

    Adaptive Parametric Activation

    Authors: Konstantinos Panagiotis Alexandridis, Jiankang Deng, Anh Nguyen, Shan Luo

    Abstract: The activation function plays a crucial role in model optimisation, yet the optimal choice remains unclear. For example, the Sigmoid activation is the de-facto activation in balanced classification tasks; however, in imbalanced classification, it proves inappropriate due to bias towards frequent classes. In this work, we delve deeper into this phenomenon by performing a comprehensive statistical ana…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  10. arXiv:2407.03157  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    Let the Code LLM Edit Itself When You Edit the Code

    Authors: Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zhi Zhang, Di He

    Abstract: In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and requests a code assistant, e.g., a large language model, to re-predict the next token or next line on the fly. Naively, the LLM needs to re-encode the entire KV cache to provide an accurate prediction. However, this process is computationally expensive, especially when the sequ…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Preprint. Work in Progress

  11. arXiv:2407.01649  [pdf, other]

    q-bio.QM cs.LG

    FAFE: Immune Complex Modeling with Geodesic Distance Loss on Noisy Group Frames

    Authors: Ruidong Wu, Ruihan Guo, Rui Wang, Shitong Luo, Yue Xu, Jiahan Li, Jianzhu Ma, Qiang Liu, Yunan Luo, Jian Peng

    Abstract: Despite the striking success of general protein folding models such as AlphaFold2 (AF2, Jumper et al. (2021)), the accurate computational modeling of antibody-antigen complexes remains a challenging task. In this paper, we first analyze AF2's primary loss function, known as the Frame Aligned Point Error (FAPE), and raise a previously overlooked issue that FAPE tends to face gradient vanishing probl…

    Submitted 1 July, 2024; originally announced July 2024.

  12. arXiv:2407.00905  [pdf, other]

    cs.CV

    Learning Robust 3D Representation from CLIP via Dual Denoising

    Authors: Shuqing Luo, Bowen Qu, Wei Gao

    Abstract: In this paper, we explore a critical yet under-investigated issue: how to learn robust and well-generalized 3D representation from pre-trained vision language models such as CLIP. Previous works have demonstrated that cross-modal distillation can provide rich and useful knowledge for 3D data. However, like most deep learning models, the resultant 3D learning network is still vulnerable to adversar…

    Submitted 30 June, 2024; originally announced July 2024.

  13. arXiv:2407.00299  [pdf, other]

    cs.RO cs.AI cs.CV cs.HC cs.LG

    Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition

    Authors: Shengcheng Luo, Quanquan Peng, Jun Lv, Kaiwen Hong, Katherine Rose Driggs-Campbell, Cewu Lu, Yong-Lu Li

    Abstract: Employing a teleoperation system for gathering demonstrations offers the potential for more efficient learning of robot manipulation. However, teleoperating a robot arm equipped with a dexterous hand or gripper via a teleoperation system poses significant challenges due to its high dimensionality, complex motions, and differences in physiological structure. In this study, we introduce a novel s…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures

  14. arXiv:2406.19756  [pdf, other]

    cs.CV cs.AI

    Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train

    Authors: Haojun Jiang, Meng Li, Zhenguo Sun, Ning Jia, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: The complex structure of the heart leads to significant challenges in echocardiography, especially in acquiring cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we innovatively propose a large-scale self-supervised pre-trai…

    Submitted 19 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024 ASMUS Workshop

  15. arXiv:2406.16853  [pdf, other]

    cs.LG cond-mat.mtrl-sci cs.AI q-bio.BM

    GeoMFormer: A General Architecture for Geometric Molecular Representation Learning

    Authors: Tianlang Chen, Shengjie Luo, Di He, Shuxin Zheng, Tie-Yan Liu, Liwei Wang

    Abstract: Molecular modeling, a central topic in quantum mechanics, aims to accurately calculate the properties and simulate the behaviors of molecular systems. The molecular model is governed by physical laws, which impose geometric constraints such as invariance and equivariance to coordinate rotation and translation. While numerous deep learning approaches have been developed to learn molecular represent…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 25 pages, 13 tables, 1 figure; ICML 2024 camera ready version

  16. arXiv:2406.14991  [pdf, other]

    cs.CL cs.SE

    SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

    Authors: Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, Jie Tang

    Abstract: We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Homepage: https://spreadsheetbench.github.io/

  17. arXiv:2406.13165  [pdf, other]

    eess.IV cs.AI cs.CV cs.RO

    Cardiac Copilot: Automatic Probe Guidance for Echocardiography with World Model

    Authors: Haojun Jiang, Zhenguo Sun, Ning Jia, Meng Li, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: Echocardiography is the only technique capable of real-time imaging of the heart and is vital for diagnosing the majority of cardiac diseases. However, there is a severe shortage of experienced cardiac sonographers, due to the heart's complex structure and significant operational challenges. To mitigate this situation, we present a Cardiac Copilot system capable of providing real-time probe moveme…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Early Accepted by MICCAI 2024

  18. arXiv:2406.13035  [pdf, other]

    cs.CL

    D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models

    Authors: Zhongwei Wan, Xinjian Wu, Yu Zhang, Yi Xin, Chaofan Tao, Zhihong Zhu, Xin Wang, Siqi Luo, Jing Xiong, Mi Zhang

    Abstract: Efficient inference in Large Language Models (LLMs) is impeded by the growing memory demands of key-value (KV) caching, especially for longer sequences. Traditional KV cache eviction strategies, which prioritize less critical KV-pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce Dynamic Discrimi…

    Submitted 23 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  19. When Vision Meets Touch: A Contemporary Review for Visuotactile Sensors from the Signal Processing Perspective

    Authors: Shoujie Li, Zihan Wang, Changsheng Wu, Xiang Li, Shan Luo, Bin Fang, Fuchun Sun, Xiao-Ping Zhang, Wenbo Ding

    Abstract: Tactile sensors, which provide information about the physical properties of objects, are an essential component of robotic systems. The visuotactile sensing technology with the merits of high resolution and low cost has facilitated the development of robotics from environment exploration to dexterous operation. Over the years, several reviews on visuotactile sensors for robots have been presented,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Journal of Selected Topics in Signal Processing

  20. arXiv:2406.09767  [pdf, other]

    cs.RO

    Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting

    Authors: Ce Hao, Kelvin Lin, Siyuan Luo, Harold Soh

    Abstract: Diffusion policies have demonstrated robust performance in generative modeling, prompting their application in robotic manipulation controlled via language descriptions. In this paper, we introduce a zero-shot, open-vocabulary diffusion policy method for robot manipulation. Using Vision-Language Models (VLMs), our method transforms linguistic task descriptions into actionable keyframes in 3D space…

    Submitted 14 June, 2024; originally announced June 2024.

  21. arXiv:2406.09675  [pdf, other]

    cs.LG cs.AI

    Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency

    Authors: Ningyi Liao, Haoyu Liu, Zulun Zhu, Siqiang Luo, Laks V. S. Lakshmanan

    Abstract: With the recent advancements in graph neural networks (GNNs), spectral GNNs have received increasing popularity by virtue of their specialty in capturing graph signals in the frequency domain, demonstrating promising capability in specific tasks. However, few systematic studies have been conducted on assessing their spectral characteristics. This emerging family of models also varies in terms of d…

    Submitted 13 June, 2024; originally announced June 2024.

  22. arXiv:2406.09332  [pdf, other]

    cs.RO

    RoTipBot: Robotic Handling of Thin and Flexible Objects using Rotatable Tactile Sensors

    Authors: Jiaqi Jiang, Xuyang Zhang, Daniel Fernandes Gomes, Thanh-Toan Do, Shan Luo

    Abstract: This paper introduces RoTipBot, a novel robotic system for handling thin, flexible objects. Different from previous works that are limited to singulating them using suction cups or soft grippers, RoTipBot can grasp and count multiple layers simultaneously, emulating human handling in various environments. Specifically, we develop a novel vision-based tactile sensor named RoTip that can rotate and…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 20 pages, 21 figures

  23. arXiv:2406.08222  [pdf]

    cs.CV cs.AI cs.CY cs.HC

    A Sociotechnical Lens for Evaluating Computer Vision Models: A Case Study on Detecting and Reasoning about Gender and Emotion

    Authors: Sha Luo, Sang Jung Kim, Zening Duan, Kaiping Chen

    Abstract: In the evolving landscape of computer vision (CV) technologies, the automatic detection and interpretation of gender and emotion in images is a critical area of study. This paper investigates social biases in CV models, emphasizing the limitations of traditional evaluation metrics such as precision, recall, and accuracy. These metrics often fall short in capturing the complexities of gender and em…

    Submitted 12 June, 2024; originally announced June 2024.

  24. arXiv:2406.05546  [pdf, other]

    cs.DC cs.AI

    Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training

    Authors: Ray Cao, Sherry Luo, Steve Gan, Sujeeth Jinesh

    Abstract: In this study, we explore the impact of relaxing data consistency in parallel machine learning training during a failure using various parameter server configurations. Our failure recovery strategies include traditional checkpointing, chain replication (which ensures a backup server takes over in case of failure), and a novel stateless parameter server approach. In the stateless approach, workers…

    Submitted 8 June, 2024; originally announced June 2024.

  25. arXiv:2406.04628  [pdf, other]

    cs.CE q-bio.QM

    Projecting Molecules into Synthesizable Chemical Spaces

    Authors: Shitong Luo, Wenhao Gao, Zuofan Wu, Jian Peng, Connor W. Coley, Jianzhu Ma

    Abstract: Discovering new drug molecules is a pivotal yet challenging process due to the near-infinitely large chemical space and notorious demands on time and resources. Numerous generative models have recently been introduced to accelerate the drug discovery process, but their progression to experimental validation remains limited, largely due to a lack of consideration for synthetic accessibility in prac…

    Submitted 7 June, 2024; originally announced June 2024.

  26. arXiv:2406.03746  [pdf, other]

    cs.CL cs.AI

    Efficient Knowledge Infusion via KG-LLM Alignment

    Authors: Zhouyu Jiang, Ling Zhong, Mengshu Sun, Jun Xu, Rui Sun, Hui Cai, Shuhan Luo, Zhiqiang Zhang

    Abstract: To tackle the problem of domain-specific knowledge scarcity within large language models (LLMs), the knowledge graph retrieval-augmented method has been proven to be an effective and efficient technique for knowledge infusion. However, existing approaches face two primary challenges: knowledge mismatch between publicly available knowledge graphs and the specific domain of the task at hand, and poor infor…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  27. arXiv:2406.01363  [pdf, other]

    cs.CL cs.IR

    Privacy in LLM-based Recommendation: Recent Advances and Future Directions

    Authors: Sichun Luo, Wei Shao, Yuxuan Yao, Jian Xu, Mingyang Liu, Qintong Li, Bowei He, Maolin Wang, Guanzhi Deng, Hanxu Hou, Xinyi Zhang, Linqi Song

    Abstract: Nowadays, large language models (LLMs) have been integrated with conventional recommendation models to improve recommendation performance. However, while most of the existing works have focused on improving the model performance, the privacy issue has only received comparatively less attention. In this paper, we review recent advancements in privacy within LLM-based recommendation, categorizing th…

    Submitted 3 June, 2024; originally announced June 2024.

  28. arXiv:2406.01138  [pdf, ps, other]

    eess.SP cs.IT

    Precise Analysis of Covariance Identifiability for Activity Detection in Grant-Free Random Access

    Authors: Shengsong Luo, Junjie Ma, Chongbin Xu, Xin Wang

    Abstract: We consider the identifiability issue of maximum likelihood based activity detection in massive MIMO based grant-free random access. A prior work by Chen et al. indicates that the identifiability undergoes a phase transition for commonly-used random signatures. In this paper, we provide an analytical characterization of the boundary of the phase transition curve. Our theoretical results agree well…

    Submitted 3 June, 2024; originally announced June 2024.

  29. arXiv:2406.00735  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Full-Atom Peptide Design based on Multi-modal Flow Matching

    Authors: Jiahan Li, Chaoran Cheng, Zuofan Wu, Ruihan Guo, Shitong Luo, Zhizhou Ren, Jian Peng, Jianzhu Ma

    Abstract: Peptides, short chains of amino acid residues, play a vital role in numerous biological processes by interacting with other target molecules, offering substantial potential in drug discovery. In this work, we present PepFlow, the first multi-modal deep generative model grounded in the flow-matching framework for the design of full-atom peptides that target specific protein receptors. Drawing inspi…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  30. arXiv:2405.16759  [pdf, other]

    cs.CV cs.LG

    Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

    Authors: Cristina N. Vasconcelos, Abdullah Rashwan, Austin Waters, Trevor Walker, Keyang Xu, Jimmy Yan, Rui Qian, Shixin Luo, Zarana Parekh, Andrew Bunner, Hongliang Fei, Roopal Garg, Mandy Guo, Ivana Kajic, Yeqing Li, Henna Nandwani, Jordi Pont-Tuset, Yasumasa Onoe, Sarah Rosston, Su Wang, Wenlei Zhou, Kevin Swersky, David J. Fleet, Jason M. Baldridge, Oliver Wang

    Abstract: We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models, without the need for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignm…

    Submitted 26 May, 2024; originally announced May 2024.

  31. arXiv:2405.16130  [pdf, ps, other]

    cs.LG stat.ME

    Automating the Selection of Proxy Variables of Unmeasured Confounders

    Authors: Feng Xie, Zhengming Chen, Shanshan Luo, Wang Miao, Ruichu Cai, Zhi Geng

    Abstract: Recently, interest has grown in the use of proxy variables of unobserved confounding for inferring the causal effect in the presence of unmeasured confounders from observational data. One difficulty inhibiting the practical use is finding valid proxy variables of unobserved confounding to a target causal effect of interest. These proxy variables are typically justified by background knowledge. In…

    Submitted 25 May, 2024; originally announced May 2024.

  32. arXiv:2405.16114  [pdf, other]

    cs.AI cs.CV cs.LG

    Multi-scale Quaternion CNN and BiGRU with Cross Self-attention Feature Fusion for Fault Diagnosis of Bearing

    Authors: Huanbai Liu, Fanlong Zhang, Yin Tan, Lian Huang, Yan Li, Guoheng Huang, Shenghong Luo, An Zeng

    Abstract: In recent years, deep learning has led to significant advances in bearing fault diagnosis (FD). Most techniques aim to achieve greater accuracy. However, they are sensitive to noise and lack robustness, resulting in insufficient domain adaptation and anti-noise ability. The comparison of studies reveals that giving equal attention to all features does not differentiate their significance. In this…

    Submitted 25 May, 2024; originally announced May 2024.

  33. arXiv:2405.12398  [pdf, other]

    cs.LG

    ASMR: Activation-sharing Multi-resolution Coordinate Networks For Efficient Inference

    Authors: Jason Chun Lok Li, Steven Tin Sui Luo, Le Xu, Ngai Wong

    Abstract: Coordinate network or implicit neural representation (INR) is a fast-emerging method for encoding natural signals (such as images and videos) with the benefits of a compact neural representation. While numerous methods have been proposed to increase the encoding capabilities of an INR, an often overlooked aspect is the inference efficiency, usually measured in multiply-accumulate (MAC) count. This…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: ICLR 2024 (v3: 21 pages, 11 figures, Project Page: https://github.com/stevolopolis/asmr.git)

  34. arXiv:2405.12130  [pdf, other]

    cs.CL cs.LG

    MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

    Authors: Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang

    Abstract: Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Work in Progress

  35. arXiv:2405.10531  [pdf, other]

    cs.LG cs.CV

    Nonparametric Teaching of Implicit Neural Representations

    Authors: Chen Zhang, Steven Tin Sui Luo, Jason Chun Lok Li, Yik-Chung Wu, Ngai Wong

    Abstract: We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of I…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: ICML 2024 (24 pages, 13 figures)

  36. arXiv:2405.07237  [pdf, other]

    cs.RO

    Soft Contact Simulation and Manipulation Learning of Deformable Objects with Vision-based Tactile Sensor

    Authors: Jianhua Shan, Yuhao Sun, Shixin Zhang, Fuchun Sun, Zixi Chen, Zirong Shen, Cesare Stefanini, Yiyong Yang, Shan Luo, Bin Fang

    Abstract: Deformable object manipulation is a classical and challenging research area in robotics. Compared with rigid object manipulation, this problem is more complex due to the deformation properties including elastic, plastic, and elastoplastic deformation. In this paper, we describe a new deformable object manipulation method including soft contact simulation, manipulation learning, and sim-to-real tra…

    Submitted 12 May, 2024; originally announced May 2024.

  37. arXiv:2405.03387  [pdf, ps, other]

    cs.CL

    The high dimensional psychological profile and cultural bias of ChatGPT

    Authors: Hang Yuan, Zhongyue Che, Shao Li, Yue Zhang, Xiaomeng Hu, Siyang Luo

    Abstract: Given the rapid advancement of large-scale language models, artificial intelligence (AI) models, like ChatGPT, are playing an increasingly prominent role in human society. However, to ensure that artificial intelligence models benefit human society, we must first fully understand the similarities and differences between the human-like characteristics exhibited by artificial intelligence models and…

    Submitted 6 May, 2024; originally announced May 2024.

  38. arXiv:2405.03101  [pdf, ps, other]

    cs.IT

    Double Self-Sustainable Reconfigurable Intelligent Surfaces Aided Wireless Communications

    Authors: Ji Wang, Suhong Luo, Yixuan Li, Wenwu Xie, Xingwang Li, Arumugam Nallanathan

    Abstract: A double self-sustainable reconfigurable intelligent surfaces (RISs) assisted multi-user multiple input multiple output (MIMO) system is investigated. Two RISs are equipped with energy harvesting circuits to achieve self-sustainable transmission. The aim is to minimize the transmission power at the base station (BS), while guaranteeing the quality of service (QoS) requirements of the users and meet…

    Submitted 7 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  39. arXiv:2405.01439  [pdf, other]

    cs.CV

    Improving Domain Generalization on Gaze Estimation via Branch-out Auxiliary Regularization

    Authors: Ruijie Zhao, Pinyan Tang, Sihui Luo

    Abstract: Despite remarkable advancements, mainstream gaze estimation techniques, particularly appearance-based methods, often suffer from performance degradation in uncontrolled environments due to variations in illumination and individual facial attributes. Existing domain adaptation strategies, limited by their need for target domain samples, may fall short in real-world applications. This letter introdu…

    Submitted 2 May, 2024; originally announced May 2024.

  40. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  41. arXiv:2404.19217  [pdf, other]

    cs.RO

    FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills

    Authors: Yongqiang Zhao, Kun Qian, Boyi Duan, Shan Luo

    Abstract: Simulation is a widely used tool in robotics to reduce hardware consumption and gather large-scale data. Despite previous efforts to simulate optical tactile sensors, there remain challenges in efficiently synthesizing images and replicating marker motion under different contact loads. In this work, we propose a fast optical tactile simulator, named FOTS, for simulating optical tactile sensors. We…

    Submitted 30 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  42. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  43. arXiv:2404.12720  [pdf, other]

    cs.CV cs.CL

    PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering

    Authors: Yihao Ding, Kaixuan Ren, Jiabin Huang, Siwen Luo, Soyeon Caren Han

    Abstract: Document Question Answering (QA) presents a challenge in understanding visually-rich documents (VRD), particularly those dominated by lengthy textual content like research journal articles. Existing studies primarily focus on real-world documents with sparse text, while challenges persist in comprehending the hierarchical semantic relations among multiple pages to locate multimodal components. To…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  44. arXiv:2404.11044  [pdf, other]

    cs.AR

    Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access

    Authors: Luming Wang, Xu Zhang, Songyue Wang, Zhuolun Jiang, Tianyue Lu, Mingyu Chen, Siwei Luo, Keji Huang

    Abstract: The growing memory demands of modern applications have driven the adoption of far memory technologies in data centers to provide cost-effective, high-capacity memory solutions. However, far memory presents new performance challenges because its access latencies are significantly longer and more variable than local DRAM. For applications to achieve acceptable performance on far memory, a high degre…

    Submitted 16 April, 2024; originally announced April 2024.

  45. arXiv:2404.10518  [pdf, other]

    cs.CV

    MobileNetV4 -- Universal Models for the Mobile Ecosystem

    Authors: Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard

    Abstract: We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB,…

    Submitted 16 April, 2024; originally announced April 2024.

  46. arXiv:2404.01165  [pdf, other]

    cs.CL

    LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models

    Authors: Haoran Li, Junqi Liu, Zexian Wang, Shiyuan Luo, Xiaowei Jia, Huaxiu Yao

    Abstract: The modeling of environmental ecosystems plays a pivotal role in the sustainable management of our planet. Accurate prediction of key environmental variables over space and time can aid in informed policy and decision-making, thus improving people's livelihood. Recently, deep learning-based methods have shown promise in modeling the spatial-temporal relationships for predicting environmental varia…

    Submitted 1 April, 2024; originally announced April 2024.

  47. arXiv:2404.01127  [pdf, other]

    cs.CV cs.AI

    Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation

    Authors: Yulin Chen, Guoheng Huang, Kai Huang, Zijin Lin, Guo Zhong, Shenghong Luo, Jie Deng, Jian Zhou

    Abstract: Accurate segmentation of lesion regions is crucial for clinical diagnosis and treatment across various diseases. While deep convolutional networks have achieved satisfactory results in medical image segmentation, they face challenges such as loss of lesion shape information due to continuous convolution and downsampling, as well as the high cost of manually labeling lesions with varying shapes and…

    Submitted 1 April, 2024; originally announced April 2024.

  48. arXiv:2404.01078  [pdf, other]

    cs.LG

    Energy-based Model for Accurate Shapley Value Estimation in Interpretable Deep Learning Predictive Modeling

    Authors: Cheng Lu, Jiusun Zeng, Yu Xia, Jinhui Cai, Shihua Luo

    Abstract: As a favorable tool for explainable artificial intelligence (XAI), Shapley value has been widely used to interpret deep learning based predictive models. However, accurate and efficient estimation of Shapley value is difficult since the computation load grows exponentially with the increase of input features. Most existing accelerated estimation methods have to compromise on estimation accuracy wi…

    Submitted 5 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  49. arXiv:2403.19094  [pdf, other]

    cs.CL

    Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

    Authors: Yuxuan Yao, Han Wu, Zhijiang Guo, Biyan Zhou, Jiahui Gao, Sichun Luo, Hanxu Hou, Xiaojin Fu, Linqi Song

    Abstract: Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues is learning from human or external feedback (e.g. tools). In this paper, we introduce an intrinsic self-correct reasoning framework for LLMs that eliminates the…

    Submitted 18 July, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted to COLM 2024

  50. arXiv:2403.13512  [pdf, other]

    cs.CV cs.AI

    Scale Decoupled Distillation

    Authors: Shicai Wei, Chunbo Luo, Yang Luo

    Abstract: Logit knowledge distillation attracts increasing attention due to its practicality in recent studies. However, it often suffers inferior performance compared to the feature knowledge distillation. In this paper, we argue that existing logit-based methods may be sub-optimal since they only leverage the global logit output that couples multiple semantic knowledge. This may transfer ambiguous knowled…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024; 10 pages, 6 figures