
Showing 1–50 of 3,144 results for author: Li, S

  1. arXiv:2407.13677  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers

    Authors: Songlin Li, Despoina Paschalidou, Leonidas Guibas

    Abstract: The increased demand for tools that automate the 3D content creation process led to tremendous progress in deep generative models that can generate diverse 3D objects of high fidelity. In this paper, we present PASTA, an autoregressive transformer architecture for generating high quality 3D shapes. PASTA comprises two main components: An autoregressive transformer that generates objects as a seque… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13664  [pdf, other]

    cs.LG

    Decision Focused Causal Learning for Direct Counterfactual Marketing Optimization

    Authors: Hao Zhou, Rongxiao Huang, Shaoming Li, Guibin Jiang, Jiaqi Zheng, Bing Cheng, Wei Lin

    Abstract: Marketing optimization plays an important role to enhance user engagement in online Internet platforms. Existing studies usually formulate this problem as a budget allocation problem and solve it by utilizing two fully decoupled stages, i.e., machine learning (ML) and operation research (OR). However, the learning objective in ML does not take account of the downstream optimization task in OR, whi… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  3. arXiv:2407.13480  [pdf]

    cs.RO cs.AI

    Risk-Aware Vehicle Trajectory Prediction Under Safety-Critical Scenarios

    Authors: Qingfan Wang, Dongyang Xu, Gaoyuan Kuang, Chen Lv, Shengbo Eben Li, Bingbing Nie

    Abstract: Trajectory prediction is significant for intelligent vehicles to achieve high-level autonomous driving, and a lot of relevant research achievements have been made recently. Despite the rapid development, most existing studies solely focused on normal safe scenarios while largely neglecting safety-critical scenarios, particularly those involving imminent collisions. This oversight may result in aut… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.13460  [pdf, other]

    cs.CV cs.LG

    SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders

    Authors: Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Chen, Yi-Hsin Yu, Chih-Yuan Yang, Jane Yung-jen Hsu

    Abstract: Existing zero-shot skeleton-based action recognition methods utilize projection networks to learn a shared latent space of skeleton features and semantic embeddings. The inherent imbalance in action recognition datasets, characterized by variable skeleton sequences yet constant class labels, presents significant challenges for alignment. To address the imbalance, we propose SA-DVAE -- Semantic Ali… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  5. arXiv:2407.13362  [pdf, other]

    cs.CV

    Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

    Authors: Pengfei Wang, Yuxi Wang, Shuai Li, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

    Abstract: The scarcity of large-scale 3D-text paired data poses a great challenge on open vocabulary 3D scene understanding, and hence it is popular to leverage internet-scale 2D data and transfer their open vocabulary capabilities to 3D models through knowledge distillation. However, the existing distillation-based 3D scene understanding approaches rely on the representation capacity of 2D models, disregar… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  6. arXiv:2407.13342  [pdf, other]

    cs.CV

    Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds

    Authors: Shengtao Li, Ge Gao, Yudong Liu, Ming Gu, Yu-Shen Liu

    Abstract: Neural signed distance functions (SDFs) have shown powerful ability in fitting the shape geometry. However, inferring continuous signed distance fields from discrete unoriented point clouds still remains a challenge. The neural network typically fits the shape with a rough surface and omits fine-grained geometric details such as shape edges and corners. In this paper, we propose a novel non-linear… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project page: https://list17.github.io/ImplicitFilter

  7. arXiv:2407.13303  [pdf, other]

    cs.LG

    Mean Teacher based SSL Framework for Indoor Localization Using Wi-Fi RSSI Fingerprinting

    Authors: Sihao Li, Zhe Tang, Kyeong Soo Kim, Jeremy S. Smith

    Abstract: Wi-Fi fingerprinting is widely applied for indoor localization due to the widespread availability of Wi-Fi devices. However, traditional methods are not ideal for multi-building and multi-floor environments due to the scalability issues. Therefore, more and more researchers have employed deep learning techniques to enable scalable indoor localization. This paper introduces a novel semi-supervised… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 12 pages, 10 figures, under preparation for a journal publication

  8. arXiv:2407.13288  [pdf, other]

    cs.LG

    Hierarchical Stage-Wise Training of Linked Deep Neural Networks for Multi-Building and Multi-Floor Indoor Localization Based on Wi-Fi RSSI Fingerprinting

    Authors: Sihao Li, Kyeong Soo Kim, Zhe Tang, Graduate, Jeremy S. Smith

    Abstract: In this paper, we present a new solution to the problem of large-scale multi-building and multi-floor indoor localization based on linked neural networks, where each neural network is dedicated to a sub-problem and trained under a hierarchical stage-wise training framework. When the measured data from sensors have a hierarchical representation as in multi-building and multi-floor indoor localizati… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 9 pages, 5 figures, under review for journal publication

  9. arXiv:2407.13255  [pdf, other]

    cs.IT eess.SP

    Interleaved Block-Sparse Transform

    Authors: Lei Liu, Ming Wang, Shufeng Li, Yuhao Chi, Ning Wei, ZhaoYang Zhang

    Abstract: Low-complexity Bayes-optimal memory approximate message passing (MAMP) is an efficient signal estimation algorithm in compressed sensing and multicarrier modulation. However, achieving replica Bayes optimality with MAMP necessitates a large-scale right-unitarily invariant transformation, which is prohibitive in practical systems due to its high computational complexity and hardware costs. To solve… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Submitted to the IEEE Journal

  10. arXiv:2407.13137  [pdf, other]

    cs.CV

    OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

    Authors: Jian Sun, Yuqi Dai, Chi-Man Vong, Qing Xu, Shengbo Eben Li, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems. It realizes ego-vehicle surrounding environment perception by projecting 2D multi-view images into 3D world space. Recently, BEV segmentation has made notable progress, attributed to better view transformation modules, larger image encoders, or more temporal information. However, there are still two issu… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  11. arXiv:2407.13126  [pdf, other]

    cs.DC

    Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration

    Authors: Tianyu Wang, Sheng Li, Bingyao Li, Yue Dai, Ao Li, Geng Yuan, Yufei Ding, Youtao Zhang, Xulong Tang

    Abstract: Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (through model retraining) and use the updated model (if available) to serve overtime arriving inference requests. It is generally beneficial to co-locate the retraining and inference together to enabl… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  12. arXiv:2407.13048  [pdf, other]

    cs.CL

    Establishing Knowledge Preference in Language Models

    Authors: Sizhe Zhou, Sha Li, Yu Meng, Yizhu Jiao, Heng Ji, Jiawei Han

    Abstract: Language models are known to encode a great amount of factual knowledge through pretraining. However, such knowledge might be insufficient to cater to user requests, requiring the model to integrate external knowledge sources and adhere to user-provided specifications. When answering questions about ongoing events, the model should use recent news articles to update its response; when asked to pro… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 27 pages, 8 figures, 23 tables, work in progress

  13. arXiv:2407.12973  [pdf, other]

    cs.CV cs.AI

    Temporal Label Hierachical Network for Compound Emotion Recognition

    Authors: Sunan Li, Hailun Lian, Cheng Lu, Yan Zhao, Tianhua Qi, Hao Yang, Yuan Zong, Wenming Zheng

    Abstract: Emotion recognition has attracted increasing attention in recent decades. Although significant progress has been made in recognizing the seven basic emotions, existing methods still struggle with compound emotion recognition, which occurs commonly in practical applications. This article introduces our achievements in the 7th Field Emotion Behavior Analysis (ABAW) competition. I…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: draft for abaw7

  14. arXiv:2407.12825  [pdf, other]

    cs.CL cs.AI cs.LG

    A Depression Detection Method Based on Multi-Modal Feature Fusion Using Cross-Attention

    Authors: Shengjie Li, Yinhao Xiao

    Abstract: Depression, a prevalent and serious mental health issue, affects approximately 3.8% of the global population. Despite the existence of effective treatments, over 75% of individuals in low- and middle-income countries remain untreated, partly due to the challenge in accurately diagnosing depression in its early stages. This paper introduces a novel method for detecting depression based on multi-m…

    Submitted 2 July, 2024; originally announced July 2024.

  15. arXiv:2407.12593  [pdf, other]

    cs.CV

    EvSign: Sign Language Recognition and Translation with Streaming Events

    Authors: Pengyu Zhang, Hao Yin, Zeren Wang, Wenyue Chen, Shengming Li, Dong Wang, Huchuan Lu, and Xu Jia

    Abstract: Sign language is one of the most effective communication tools for people with hearing difficulties. Most existing works focus on improving the performance of sign language tasks on RGB videos, which may suffer from degraded recording conditions, such as fast movement of hands with motion blur and textured signer's appearance. The bio-inspired event camera, which asynchronously captures brightness… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: To appear at ECCV 2024

  16. arXiv:2407.12504  [pdf, other]

    cs.CL

    Case2Code: Learning Inductive Reasoning with Synthetic Data

    Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

    Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  17. arXiv:2407.12491  [pdf, other]

    cs.CV

    Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving

    Authors: Yuqi Dai, Jian Sun, Shengbo Eben Li, Qing Xu, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Perception is essential for autonomous driving systems. Recent approaches based on Bird's-eye-view (BEV) and deep learning have made significant progress. However, challenging issues remain, including lengthy development cycles, poor reusability, and complex sensor setups in the perception algorithm development process. To tackle the above challenges, this paper proposes a novel hierarchical Bird'…

    Submitted 17 July, 2024; originally announced July 2024.

  18. arXiv:2407.12117  [pdf, other]

    cs.LG cs.DC

    Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

    Authors: Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui

    Abstract: Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long context training, existing f… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  19. arXiv:2407.12053  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Improving AlphaFlow for Efficient Protein Ensembles Generation

    Authors: Shaoning Li, Mingyu Li, Yusong Wang, Xinheng He, Nanning Zheng, Jian Zhang, Pheng-Ann Heng

    Abstract: Investigating conformational landscapes of proteins is a crucial way to understand their biological functions and properties. AlphaFlow stands out as a sequence-conditioned generative model that introduces flexibility into structure prediction models by fine-tuning AlphaFold under the flow-matching framework. Despite the advantages of efficient sampling afforded by flow-matching, AlphaFlow still r… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML 2024 AI4Science workshop

  20. arXiv:2407.12019  [pdf, other]

    cs.CL cs.AI

    DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

    Authors: Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao

    Abstract: Our study delves into Multimodal Entity Linking, aligning the mention in multimodal information with entities in knowledge base. Existing methods are still facing challenges like ambiguous entity representations and limited image information utilization. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method: Dy… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Published on PRCV24

  21. arXiv:2407.11781  [pdf, other]

    cs.CV

    SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction

    Authors: Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li

    Abstract: High-quality 3D photoacoustic imaging (PAI) reconstruction under sparse view or limited view has long been challenging. Traditional 3D iterative-based reconstruction methods suffer from both slow speed and high memory consumption. Recently, in computer graphics, the differentiable rendering has made significant progress, particularly with the rise of 3D Gaussian Splatting. Inspired by these, we in… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  22. arXiv:2407.11585  [pdf, other]

    cs.CV cs.AI

    QVD: Post-training Quantization for Video Diffusion Models

    Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

    Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effe… ▽ More

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: accepted by ACMMM2024

  23. arXiv:2407.11420  [pdf, other]

    cs.RO

    iKalibr: Unified Targetless Spatiotemporal Calibration for Resilient Integrated Inertial Systems

    Authors: Shuolong Chen, Xingxing Li, Shengyu Li, Yuxuan Zhou, Xiaoteng Yang

    Abstract: The integrated inertial system, typically integrating an IMU and an exteroceptive sensor such as radar, LiDAR, and camera, has been widely accepted and applied in modern robotic applications for ego-motion estimation, motion control, or autonomous exploration. To improve system accuracy, robustness, and further usability, both multiple and various sensors are generally resiliently integrated, whic… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  24. arXiv:2407.11405  [pdf, other]

    cs.CR cs.CV

    Cover-separable Fixed Neural Network Steganography via Deep Generative Models

    Authors: Guobiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang

    Abstract: Image steganography is the process of hiding secret data in a cover image by subtle perturbation. Recent studies show that it is feasible to use a fixed neural network for data embedding and extraction. Such Fixed Neural Network Steganography (FNNS) demonstrates favorable performance without the need for training networks, making it more practical for real-world applications. However, the stego-im… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted at ACMMM 2024

  25. arXiv:2407.11034  [pdf]

    cs.LG

    Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Biomedical Data Analysis

    Authors: Siqi Li, Xin Li, Kunyu Yu, Di Miao, Mingcheng Zhu, Mengying Yan, Yuhe Ke, Danny D'Agostino, Yilin Ning, Qiming Wu, Ziwen Wang, Yuqing Shang, Molei Liu, Chuan Hong, Nan Liu

    Abstract: Clinical and biomedical research in low-resource settings often faces significant challenges due to the need for high-quality data with sufficient sample sizes to construct effective models. These constraints hinder robust model training and prompt researchers to seek methods for leveraging existing knowledge from related studies to support new research efforts. Transfer learning (TL), a machine l… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  26. arXiv:2407.10550  [pdf, other]

    cs.CV

    Learning Natural Consistency Representation for Face Forgery Video Detection

    Authors: Daichi Zhang, Zihao Xiao, Shikun Li, Fanzhao Lin, Jianmin Li, Shiming Ge

    Abstract: Face Forgery videos have elicited critical social public concerns and various detectors have been proposed. However, fully-supervised detectors may lead to easily overfitting to specific forgery methods or videos, and existing self-supervised detectors are strict on auxiliary tasks, such as requiring audio or multi-modalities, leading to limited generalization and robustness. In this paper, we exa… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  27. arXiv:2407.10457  [pdf, other]

    cs.CL cs.AI

    The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

    Authors: Yifan Song, Guoyin Wang, Sujian Li, Bill Yuchen Lin

    Abstract: Current evaluations of large language models (LLMs) often overlook non-determinism, typically focusing on a single output per example. This limits our understanding of LLM performance variability in real-world applications. Our study addresses this issue by exploring key questions about the performance differences between greedy decoding and sampling, identifying benchmarks' consistency regarding… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  28. arXiv:2407.10078  [pdf, other]

    cs.IR cs.AI

    Semantic Understanding and Data Imputation using Large Language Model to Accelerate Recommendation System

    Authors: Zhicheng Ding, Jiahao Tian, Zhenkai Wang, Jinman Zhao, Siyang Li

    Abstract: This paper aims to address the challenge of sparse and missing data in recommendation systems, a significant hurdle in the age of big data. Traditional imputation methods struggle to capture complex relationships within the data. We propose a novel approach that fine-tunes a Large Language Model (LLM) and uses it to impute missing data for recommendation systems. LLM which is trained on vast amounts of t…

    Submitted 14 July, 2024; originally announced July 2024.

  29. arXiv:2407.10068  [pdf, other]

    cs.CL

    Multi-Granularity Semantic Revision for Large Language Model Distillation

    Authors: Xiaoyu Liu, Yun Zhang, Wei Li, Simiao Li, Xudong Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang

    Abstract: Knowledge distillation plays a key role in compressing the Large Language Models (LLMs), which boosts a small-size student model under large teacher models' guidance. However, existing LLM distillation methods overly rely on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in previous art st… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  30. arXiv:2407.09992  [pdf, other]

    cs.MM

    TOP: A New Target-Audience Oriented Content Paraphrase Task

    Authors: Boda Lin, Jiaxin Shi, Haolong Yan, Binghao Tang, Xiaocheng Gong, Si Li

    Abstract: Recommendation systems usually recommend the existing contents to different users. However, in comparison to static recommendation methods, a recommendation logic that dynamically adjusts based on user interest preferences may potentially attract a larger user base. Thus, we consider paraphrasing existing content based on the interests of the users to modify the content to better align with the pr… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 8 pages

  31. arXiv:2407.09853  [pdf, other]

    cs.CV

    Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation

    Authors: Han Li, Shaohui Li, Shuangrui Ding, Wenrui Dai, Maida Cao, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Image compression for machine and human vision (ICMH) has gained increasing attention in recent years. Existing ICMH methods are limited by high training and storage overheads due to heavy design of task-specific networks. To address this issue, in this paper, we develop a novel lightweight adapter-based tuning framework for ICMH, named Adapt-ICMH, that better balances task performance and bitrate… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024, project: https://github.com/qingshi9974/ECCV2024-AdpatICMH

  32. arXiv:2407.09820  [pdf]

    cs.CY

    Mining individual daily commuting patterns of dockless bike-sharing users: a two-layer framework integrating spatiotemporal flow clustering and rule-based decision trees

    Authors: Caigang Zhuang, Shaoying Li, Xiaoping Liu

    Abstract: The rise of dockless bike-sharing systems has led to increased interest in using bike-sharing data for urban transportation and travel behavior research. However, few studies have focused on the individual daily mobility patterns, hindering their alignment with the increasingly refined needs of urban active transportation planning. To bridge this gap, this study presents a two-layer framework, int… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  33. arXiv:2407.09705  [pdf, other]

    cs.CV cs.AI cs.MM

    Diagnosing and Re-learning for Balanced Multimodal Learning

    Authors: Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu

    Abstract: To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal performance discrepancy as the basis. However, the intrinsic limitation of modality capacity is ignored. The scarcely informative modalities can be recognized as "…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  34. arXiv:2407.09618  [pdf, other]

    cs.LG cs.SI

    The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

    Authors: Sitao Luan, Chenqing Hua, Qincheng Lu, Liheng Ma, Lirong Wu, Xinyu Wang, Minkai Xu, Xiao-Wen Chang, Doina Precup, Rex Ying, Stan Z. Li, Jian Tang, Guy Wolf, Stefanie Jegelka

    Abstract: Homophily principle, i.e., nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance com…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Suggestions and comments are welcomed at sitao.luan@mail.mcgill.ca!

  35. arXiv:2407.09295  [pdf, other]

    cs.CR

    Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study

    Authors: Yulong Yang, Xinshan Yang, Shuaidong Li, Chenhao Lin, Zhengyu Zhao, Chao Shen, Tianwei Zhang

    Abstract: The rapid progress in the reasoning capability of the Multi-modal Large Language Models (MLLMs) has triggered the development of autonomous agent systems on mobile devices. MLLM-based mobile agent systems consist of perception, reasoning, memory, and multi-agent collaboration modules, enabling automatic analysis of user instructions and the design of task pipelines with only natural language and d… ▽ More

    Submitted 17 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Preprint. Work in progress

  36. arXiv:2407.09029  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework

    Authors: Haoqin Sun, Shiwan Zhao, Shaokai Li, Xiangyu Kong, Xuechen Wang, Aobo Kong, Jiaming Zhou, Yong Chen, Wenjia Zeng, Yong Qin

    Abstract: Multimodal emotion recognition systems rely heavily on the full availability of modalities, suffering significant performance declines when modal data is incomplete. To tackle this issue, we present the Cross-Modal Alignment, Reconstruction, and Refinement (CM-ARR) framework, an innovative approach that sequentially engages in cross-modal alignment, reconstruction, and refinement phases to handle… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  37. arXiv:2407.08954  [pdf, other]

    cs.CR

    PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning

    Authors: Sizai Hou, Songze Li, Tayyebeh Jahani-Nezhad, Giuseppe Caire

    Abstract: Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data while preserving user privacy. However, the typical paradigm of FL faces challenges of both privacy and robustness: the transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  38. arXiv:2407.08722  [pdf, other]

    cs.RO cs.CV cs.LG

    Unifying 3D Representation and Control of Diverse Robots with a Single Camera

    Authors: Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann

    Abstract: Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have dramatically expanded feasible hardware, yet deploying these systems requires control software to translate desired motions into actuator commands. While conventional robots can easily be modeled as rigid links connected via joints, it remains an… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Project Page: https://sizhe-li.github.io/publication/neural_jacobian_field

  39. arXiv:2407.08659  [pdf, other]

    cs.LG cs.CV

    Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density

    Authors: Shuangqi Li, Chen Liu, Tong Zhang, Hieu Le, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with either enhanced fidelity or increased diversity. Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density, which is based on the nearest-neighbor information from real samples. Our appr… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  40. OMR-NET: a two-stage octave multi-scale residual network for screen content image compression

    Authors: Shiqi Jiang, Ting Ren, Congrui Fu, Shuai Li, Hui Yuan

    Abstract: Screen content (SC) differs from natural scene (NS) with unique characteristics such as noise-free, repetitive patterns, and high contrast. Aiming at addressing the inadequacies of current learned image compression (LIC) methods for SC, we propose an improved two-stage octave convolutional residual blocks (IToRB) for high and low-frequency feature extraction and a cascaded two-stage multi-scale re… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 7 figures, 2 tables

    Journal ref: IEEE Signal Processing Letters, 2024

  41. arXiv:2407.08529  [pdf, other]

    cs.CR

    Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks

    Authors: Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

    Abstract: Spatiotemporal federated learning has recently raised intensive studies due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or texts. However, so far there has not been any systematic study of the gradient inversion a… ▽ More

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by DASFAA 2024, 16 pages

  42. arXiv:2407.08520  [pdf, other]

    eess.IV cs.CV cs.MM

    Enhancing context models for point cloud geometry compression with context feature residuals and multi-loss

    Authors: Chang Sun, Hui Yuan, Shuai Li, Xin Lu, Raouf Hamzaoui

    Abstract: In point cloud geometry compression, context models usually use the one-hot encoding of node occupancy as the label, and the cross-entropy between the one-hot encoding and the probability distribution predicted by the context model as the loss function. However, this approach has two main weaknesses. First, the differences between contexts of different nodes are not significant, making it difficul… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 11 pages, 8 figures

    Journal ref: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 14, no. 2, pp. 224-234, Jun. 2024

  43. arXiv:2407.08039  [pdf, other]

    cs.CL

    Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models

    Authors: Yuji Zhang, Sha Li, Jiateng Liu, Pengfei Yu, Yi R. Fung, Jing Li, Manling Li, Heng Ji

    Abstract: Hallucination is often regarded as a major impediment to using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We term this phenomenon "knowledge overshadowing": when we query knowledge from a language model wi…

    Submitted 10 July, 2024; originally announced July 2024.

  44. arXiv:2407.07835  [pdf, other]

    cs.CV cs.AI

    RoBus: A Multimodal Dataset for Controllable Road Networks and Building Layouts Generation

    Authors: Tao Li, Ruihang Li, Huangnan Zheng, Shanding Ye, Shijian Li, Zhijie Pan

    Abstract: Automated 3D city generation, focusing on road networks and building layouts, is in high demand for applications in urban design, multimedia games and autonomous driving simulations. The surge of generative AI facilitates designing city layouts based on deep learning models. However, the lack of high-quality datasets and benchmarks hinders the progress of these data-driven methods in generating ro…

    Submitted 10 July, 2024; originally announced July 2024.

  45. arXiv:2407.07525  [pdf, other]

    cs.CV cs.RO

    Incremental Multiview Point Cloud Registration with Two-stage Candidate Retrieval

    Authors: Shiqi Li, Jihua Zhu, Yifan Xie, Mingchen Zhu

    Abstract: Multiview point cloud registration serves as a cornerstone of various computer vision tasks. Previous approaches typically adhere to a global paradigm, where a pose graph is initially constructed followed by motion synchronization to determine the absolute pose. However, this separated approach may not fully leverage the characteristics of multiview registration and might struggle with low-overlap…

    Submitted 10 July, 2024; originally announced July 2024.

  46. arXiv:2407.07289  [pdf, other]

    cs.CV

    Deformable Feature Alignment and Refinement for Moving Infrared Dim-small Target Detection

    Authors: Dengyan Luo, Yanping Xiang, Hu Wang, Luping Ji, Shuai Li, Mao Ye

    Abstract: The detection of moving infrared dim-small targets has been a challenging and prevalent research topic. The current state-of-the-art methods are mainly based on ConvLSTM to aggregate information from adjacent frames to facilitate the detection of the current frame. However, these methods implicitly utilize motion information only in the training stage and fail to explicitly explore motion compensa…

    Submitted 9 July, 2024; originally announced July 2024.

  47. arXiv:2407.06691  [pdf, other]

    cs.IT eess.SP

    OFDM Achieves the Lowest Ranging Sidelobe Under Random ISAC Signaling

    Authors: Fan Liu, Ying Zhang, Yifeng Xiong, Shuangyang Li, Weijie Yuan, Feifei Gao, Shi Jin, Giuseppe Caire

    Abstract: This paper aims to answer a fundamental question in the area of Integrated Sensing and Communications (ISAC): What is the optimal communication-centric ISAC waveform for ranging? To that end, we first establish a generic framework to analyze the sensing performance of communication-centric ISAC waveforms built upon orthonormal signaling bases and random data symbols. Then, we evaluated thei…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 14 pages, 12 figures, submitted to IEEE for possible publication

  48. arXiv:2407.05575  [pdf, other]

    cs.CV

    Towards Reflected Object Detection: A Benchmark

    Authors: Zhongtian Wang, You Wu, Hui Zhou, Shuiwang Li

    Abstract: Object detection has greatly improved over the past decade thanks to advances in deep learning and large-scale datasets. However, detecting objects reflected in surfaces remains an underexplored area. Reflective surfaces are ubiquitous in daily life, appearing in homes, offices, public spaces, and natural environments. Accurate detection and interpretation of reflected objects are essential for va…

    Submitted 7 July, 2024; originally announced July 2024.

  49. arXiv:2407.05383  [pdf, other]

    cs.CV

    Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking

    Authors: You Wu, Xucheng Wang, Dan Zeng, Hengzhou Ye, Xiaolan Xie, Qijun Zhao, Shuiwang Li

    Abstract: The recent surge in the adoption of single-stream architectures utilizing pre-trained ViT backbones represents a promising advancement in the field of generic visual tracking. By integrating feature extraction and fusion into a cohesive framework, these architectures offer improved performance, efficiency, and robustness. However, there has been limited exploration into optimizing these framewo…

    Submitted 7 July, 2024; originally announced July 2024.

  50. arXiv:2407.05259  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Multi-scale Conditional Generative Modeling for Microscopic Image Restoration

    Authors: Luzhe Huang, Xiongye Xiao, Shixuan Li, Jiawen Sun, Yi Huang, Aydogan Ozcan, Paul Bogdan

    Abstract: The advance of diffusion-based generative models in recent years has revolutionized state-of-the-art (SOTA) techniques in a wide variety of image analysis and synthesis tasks, whereas their adaptation to image restoration, particularly within computational microscopy, remains theoretically and empirically underexplored. In this research, we introduce a multi-scale generative model that enhances con…

    Submitted 7 July, 2024; originally announced July 2024.