Showing 1–50 of 877 results for author: Luo, J

  1. arXiv:2407.11590  [pdf, other]

    eess.IV cs.CV

    Rethinking Learned Image Compression: Context is All You Need

    Authors: Jixiang Luo

    Abstract: Since LIC has made rapid progress recently compared to traditional methods, this paper attempts to discuss the question about 'Where is the boundary of Learned Image Compression (LIC)?' with regard to subjective metrics. Thus this paper splits the above problem into two sub-problems: 1) Where is the boundary of rate-distortion performance of PSNR? 2) How to further improve the compression gain and ach…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11409  [pdf, other]

    cs.CL

    Representation Bias in Political Sample Simulations with Large Language Models

    Authors: Weihong Qi, Hanjia Lyu, Jiebo Luo

    Abstract: This study seeks to identify and quantify biases in simulating political samples with Large Language Models, specifically focusing on vote choice and public opinion. Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao Dataset, and China Family Panel Studies to simulate voting behaviors and public opinions. This me…

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.10462  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features

    Authors: Jing Luo, Xinyu Yang, Dorien Herremans

    Abstract: Controllable music generation promotes the interaction between humans and composition systems by projecting the users' intent on their desired music. The challenge of introducing controllability is an increasingly important issue in the symbolic music generation field. When building controllable generative popular multi-instrument music systems, two main challenges typically present themselves, na…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Demo page: https://chinglohsiu.github.io/files/bandcontrolnet.html

  4. arXiv:2407.10402  [pdf, ps, other]

    cs.SE

    A Framework for QoS of Integration Testing in Satellite Edge Clouds

    Authors: Guogen Zeng, Juan Luo, Yufeng Zhang, Ying Qiao, Shuyang Teng

    Abstract: The diversification of satellite communication services imposes varied requirements on network service quality, making quality of service (QoS) testing for microservices running on satellites more complex. Existing testing tools have limitations, potentially offering only single-functionality testing, thus failing to meet the requirements of QoS testing for edge cloud services in mobile satellite…

    Submitted 16 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

  5. arXiv:2407.09811  [pdf, other]

    cs.AI cs.HC q-bio.GN

    CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis

    Authors: Yihang Xiao, Jinyi Liu, Yan Zheng, Xiaohan Xie, Jianye Hao, Mingzhi Li, Ruitao Wang, Fei Ni, Yuxiao Li, Jintian Luo, Shaoqing Jiao, Jiajie Peng

    Abstract: Single-cell RNA sequencing (scRNA-seq) data analysis is crucial for biological research, as it enables the precise characterization of cellular heterogeneity. However, manual manipulation of various tools to achieve desired outcomes can be labor-intensive for researchers. To address this, we introduce CellAgent (http://cell.agent4science.cn/), an LLM-driven multi-agent framework, specifically desi…

    Submitted 13 July, 2024; originally announced July 2024.

  6. arXiv:2407.06886  [pdf, other]

    cs.CV cs.AI cs.LG cs.MA cs.RO

    Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

    Authors: Yang Liu, Weixing Chen, Yongjie Bai, Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Zhida Li, Ganlong Zhao, Junyi Lin, Guanbin Li, Wen Gao, Liang Lin

    Abstract: Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) has attracted significant attention due to their remarkable perception, interaction, and reasoning capabilit…

    Submitted 18 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: The first comprehensive review of Embodied AI in the era of MLMs, 37 pages. We also provide the paper list for Embodied AI: https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List

  7. arXiv:2407.05125  [pdf, other]

    cs.DC cs.LG

    A Joint Approach to Local Updating and Gradient Compression for Efficient Asynchronous Federated Learning

    Authors: Jiajun Song, Jiajun Luo, Rongwei Lu, Shuzhao Xie, Bin Chen, Zhi Wang

    Abstract: Asynchronous Federated Learning (AFL) confronts inherent challenges arising from the heterogeneity of devices (e.g., their computation capacities) and low-bandwidth environments, both potentially causing stale model updates (e.g., local gradients) for global aggregation. Traditional approaches mitigating the staleness of updates typically focus on either adjusting the local updating or gradient co…

    Submitted 6 July, 2024; originally announced July 2024.

  8. arXiv:2407.04174  [pdf, other]

    cs.NI eess.SP

    Gemini: Integrating Full-fledged Sensing upon Millimeter Wave Communications

    Authors: Yilong Li, Zhe Chen, Jun Luo, Suman Banerjee

    Abstract: Integrating millimeter wave (mmWave) technology in both communication and sensing is promising as it enables the reuse of existing spectrum and infrastructure without draining resources. Most existing systems piggyback sensing onto conventional communication modes without fully exploiting the potential of integrated sensing and communication (ISAC) in mmWave radios (not full-fledged). In this paper…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 12 pages

  9. arXiv:2407.02483  [pdf, other]

    cs.CL cs.AI

    MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

    Authors: Binxu Li, Tiankai Yan, Yuanting Pan, Zhe Xu, Jie Luo, Ruiyang Ji, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang

    Abstract: Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To brid…

    Submitted 2 July, 2024; originally announced July 2024.

  10. arXiv:2407.02197  [pdf, other]

    cs.AI cs.CV cs.RO

    Research on Reliable and Safe Occupancy Grid Prediction in Underground Parking Lots

    Authors: JiaQi Luo

    Abstract: Against the backdrop of advancing science and technology, autonomous vehicle technology has emerged as a focal point of intense scrutiny within the academic community. Nevertheless, the challenge persists in guaranteeing the safety and reliability of this technology when navigating intricate scenarios. While a substantial portion of autonomous driving research is dedicated to testing in open-air e…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 15 pages, 19 figures

  11. arXiv:2407.01552  [pdf]

    cs.NI physics.optics

    High Spectral-Efficiency, Ultra-low MIMO SDM Transmission over a Field-Deployed Multi-Core OAM Fiber

    Authors: Junyi Liu, Zengquan Xu, Shuqi Mo, Yuming Huang, Yining Huang, Zhenhua Li, Yuying Guo, Lei Shen, Shuo Xu, Ran Gao, Cheng Du, Qian Feng, Jie Luo, Jie Liu, Siyuan Yu

    Abstract: Few-mode multi-core fiber (FM-MCF) based Space-Division Multiplexing (SDM) systems possess the potential to maximize the number of multiplexed spatial channels per fiber by harnessing both the space (fiber cores) and mode (optical mode per core) dimensions. However, to date, no SDM transmissions over field-deployed FM-MCFs in realistic outdoor settings have been reported, which contrasts with SDM…

    Submitted 29 April, 2024; originally announced July 2024.

    Comments: 17 pages, 8 figures

  12. arXiv:2406.19592  [pdf, other]

    cs.PL

    Dataflow-Based Optimization for Quantum Intermediate Representation Programs

    Authors: Junjie Luo, Haoyu Zhang, Jianjun Zhao

    Abstract: This paper proposes QDFO, a dataflow-based optimization approach to Microsoft QIR. QDFO consists of two main functions: one is to preprocess the QIR code so that the LLVM optimizer can capture more optimization opportunities, and the other is to optimize the QIR code so that duplicate loading and constructing of qubits and qubit arrays can be avoided. We evaluated our work on the IBM Challenge Dat…

    Submitted 27 June, 2024; originally announced June 2024.

  13. arXiv:2406.18522  [pdf, other]

    cs.CV cs.CL

    ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

    Authors: Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan

    Abstract: We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to evaluate the temporal and metamorphic capabilities of the T2V models (e.g. Sora and Lumiere) in time-lapse video generation. In contrast to existing benchmarks that focus on the visual quality and textual relevance of generated videos, ChronoMagic-Bench focuses on the model's ability to generate time-lapse videos wi…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 31 pages, 15 figures

  14. arXiv:2406.18270  [pdf, other]

    cs.IT cs.NI eess.SY

    Exploiting Data Significance in Remote Estimation of Discrete-State Markov Sources

    Authors: Jiping Luo, Nikolaos Pappas

    Abstract: We consider the semantics-aware remote estimation of a discrete-state Markov source with normal (low-priority) and alarm (high-priority) states. Erroneously announcing a normal state at the destination when the source is actually in an alarm state (i.e., missed alarm error) incurs a significantly higher cost than falsely announcing an alarm state when the source is in a normal state (i.e., false a…

    Submitted 26 June, 2024; originally announced June 2024.

  15. arXiv:2406.17005  [pdf, other]

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

    Abstract: The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, the Complex Video Object Segmentation Track based on the MOSE dataset and the Motion Expression guided Video Segmentation track based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  16. arXiv:2406.14054  [pdf, other]

    cs.LG

    Urban-Focused Multi-Task Offline Reinforcement Learning with Contrastive Data Sharing

    Authors: Xinbo Zhao, Yingxue Zhang, Xin Zhang, Yu Yang, Yiqun Xie, Yanhua Li, Jun Luo

    Abstract: Enhancing diverse human decision-making processes in an urban environment is a critical issue across various applications, including ride-sharing vehicle dispatching, public transportation management, and autonomous driving. Offline reinforcement learning (RL) is a promising approach to learn and optimize human urban strategies (or policies) from pre-collected human-generated spatial-temporal urba…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  17. arXiv:2406.10100  [pdf, other]

    cs.CV cs.AI

    SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

    Authors: Junwei Luo, Zhen Pang, Yongjun Zhang, Tingzhu Wang, Linlin Wang, Bo Dang, Jiangwei Lao, Jian Wang, Jingdong Chen, Yihua Tan, Yansheng Li

    Abstract: Remote Sensing Large Multi-Modal Models (RSLMMs) are developing rapidly and showcase significant capabilities in remote sensing imagery (RSI) comprehension. However, due to the limitations of existing datasets, RSLMMs have shortcomings in understanding the rich semantic relations among objects in complex remote sensing scenes. To unlock RSLMMs' complex comprehension ability, we propose a large-sca…

    Submitted 8 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 30 pages, 5 figures, 19 tables, dataset and code see https://github.com/Luo-Z13/SkySenseGPT

  18. arXiv:2406.09410  [pdf, other]

    cs.CV cs.AI

    STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery

    Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

    Abstract: Scene graph generation (SGG) in satellite imagery (SAI) helps promote understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it attractive to holistically conduct SGG in large-size very-high-resolution (VHR…

    Submitted 3 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 18 pages, 11 figures

  19. arXiv:2406.09105  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

    Authors: Chenwei Lin, Hanjia Lyu, Xian Xu, Jiebo Luo

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated outstanding performance in various general multimodal applications such as image recognition and visual reasoning, and have also shown promising potential in specialized domains. However, the application potential of LVLMs in the insurance domain, characterized by rich application scenarios and abundant multimodal data, has not been effectively…

    Submitted 13 June, 2024; originally announced June 2024.

  20. arXiv:2406.09031  [pdf, other]

    cs.LG cs.AI

    A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability

    Authors: Pengyun Wang, Junyu Luo, Yanxin Shen, Siyu Heng, Xiao Luo

    Abstract: Graph pooling has gained attention for its ability to obtain effective node and graph representations for various downstream tasks. Despite the recent surge in graph pooling approaches, there is a lack of standardized experimental settings and fair benchmarks to evaluate their performance. To address this issue, we have constructed a comprehensive benchmark that includes 15 graph pooling methods a…

    Submitted 16 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  21. arXiv:2406.08907  [pdf, other]

    cs.CV cs.MM

    Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding

    Authors: Yue Xu, Kaizhi Yang, Jiebo Luo, Xuejin Chen

    Abstract: 3D visual grounding is an emerging research area dedicated to making connections between the 3D physical world and natural language, which is crucial for achieving embodied intelligence. In this paper, we propose DASANet, a Dual Attribute-Spatial relation Alignment Network that separately models and aligns object attributes and spatial relation features between language and 3D vision modalities. W…

    Submitted 13 June, 2024; originally announced June 2024.

  22. arXiv:2406.07043  [pdf, other]

    cs.CV

    1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

    Authors: Mingqi Gao, Jingnan Luo, Jinyu Yang, Jungong Han, Feng Zheng

    Abstract: Motion Expression guided Video Segmentation (MeViS), as an emerging task, poses many new challenges to the field of referring video object segmentation (RVOS). In this technical report, we investigated and validated the effectiveness of static-dominant data and frame sampling on this challenging setting. Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeVi…

    Submitted 11 June, 2024; originally announced June 2024.

  23. arXiv:2406.04331  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    PaCE: Parsimonious Concept Engineering for Large Language Models

    Authors: Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Darshan Thaker, Aditya Chattopadhyay, Chris Callison-Burch, René Vidal

    Abstract: Large Language Models (LLMs) are being used for a wide variety of tasks. While they are capable of generating human-like responses, they can also produce undesirable output including potentially harmful information, racist or sexist language, and hallucinations. Alignment methods are designed to reduce such undesirable output, via techniques such as fine-tuning, prompt engineering, and representat…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 26 pages, 17 figures, 5 tables, dataset and code at https://github.com/peterljq/Parsimonious-Concept-Engineering

  24. arXiv:2406.02343  [pdf, other]

    cs.LG cs.CV

    Cluster-Aware Similarity Diffusion for Instance Retrieval

    Authors: Jifei Luo, Hantao Yao, Changsheng Xu

    Abstract: Diffusion-based re-ranking is a common method used for retrieving instances by performing similarity propagation in a nearest neighbor graph. However, existing techniques that construct the affinity graph based on pairwise instances can lead to the propagation of misinformation from outliers and other manifolds, resulting in inaccurate results. To overcome this issue, we propose a novel Cluster-Aw…

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by ICML2024

  25. arXiv:2406.01085  [pdf, other]

    cs.CR cs.AI

    FedAdOb: Privacy-Preserving Federated Deep Learning with Adaptive Obfuscation

    Authors: Hanlin Gu, Jiahuan Luo, Yan Kang, Yuan Yao, Gongxi Zhu, Bowen Li, Lixin Fan, Qiang Yang

    Abstract: Federated learning (FL) has emerged as a collaborative approach that allows multiple clients to jointly learn a machine learning model without sharing their private data. The concern about privacy leakage, albeit demonstrated under specific conditions, has triggered numerous follow-up studies on designing powerful attacking methods and effective defending mechanisms aiming to thwart these attacki…

    Submitted 3 June, 2024; originally announced June 2024.

  26. arXiv:2405.20064  [pdf, other]

    eess.AS cs.SD

    1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

    Authors: Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain

    Abstract: Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for t…

    Submitted 30 May, 2024; originally announced May 2024.

  27. arXiv:2405.17976  [pdf]

    cs.AI cs.CL

    Yuan 2.0-M32: Mixture of Experts with Attention Router

    Authors: Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen

    Abstract: Yuan 2.0-M32, with a base architecture similar to Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active. A new router network, Attention Router, is proposed and adopted for a more efficient selection of experts, which improves the accuracy compared to the model with a classical router network. Yuan 2.0-M32 is trained with 2000B tokens from scratch, and the…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages, 3 figures, 7 tables

  28. arXiv:2405.16785  [pdf, other]

    cs.CV

    PromptFix: You Prompt and We Fix the Photo

    Authors: Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo

    Abstract: Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks, allowing image processing to adhere to human instructions. However, the lack of diverse instruction-following data hampers the development of models that effectively recognize and execute user-customized instructions, particularly in low-level tasks. Moreover, the stochastic nature of th…

    Submitted 26 May, 2024; originally announced May 2024.

  29. arXiv:2405.15687  [pdf, other]

    cs.CV cs.LG

    Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models

    Authors: Yongsheng Yu, Jiebo Luo

    Abstract: Conventional demographic inference methods have predominantly operated under the supervision of accurately labeled data, yet struggle to adapt to shifting social landscapes and diverse cultural contexts, leading to narrow specialization and limited accuracy in applications. Recently, the emergence of large multimodal models (LMMs) has shown transformative potential across various research tasks, s…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted to ICME 2024

  30. arXiv:2405.15412  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    ORCA: A Global Ocean Emulator for Multi-year to Decadal Predictions

    Authors: Zijie Guo, Pumeng Lyu, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

    Abstract: Ocean dynamics play a crucial role in driving global weather and climate patterns. Accurate and efficient modeling of ocean dynamics is essential for improved understanding of complex ocean circulation and processes, for predicting climate variations and their associated teleconnections, and for addressing the challenges of climate change. While great efforts have been made to improve numerical O…

    Submitted 24 May, 2024; originally announced May 2024.

  31. arXiv:2405.12213  [pdf, other]

    cs.RO cs.LG

    Octo: An Open-Source Generalist Robot Policy

    Authors: Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

    Abstract: Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sen…

    Submitted 26 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Project website: https://octo-models.github.io

  32. arXiv:2405.11868  [pdf, other]

    cs.LG cs.AI cs.CE cs.IR cs.SI

    Towards Graph Contrastive Learning: A Survey and Beyond

    Authors: Wei Ju, Yifan Wang, Yifang Qin, Zhengyang Mao, Zhiping Xiao, Junyu Luo, Junwei Yang, Yiyang Gu, Dongjie Wang, Qingqing Long, Siyu Yi, Xiao Luo, Ming Zhang

    Abstract: In recent years, deep learning on graphs has achieved remarkable success in various domains. However, the reliance on annotated graph data remains a significant bottleneck due to its prohibitive cost and time-intensive nature. To address this challenge, self-supervised learning (SSL) on graphs has gained increasing attention and has made significant progress. SSL enables machine learning models to…

    Submitted 20 May, 2024; originally announced May 2024.

  33. arXiv:2405.11788  [pdf, other]

    cs.LG

    TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models

    Authors: Junlong Jia, Ying Hu, Xi Weng, Yiming Shi, Miao Li, Xingjian Zhang, Baichuan Zhou, Ziyu Liu, Jie Luo, Lei Huang, Ji Wu

    Abstract: We present TinyLLaVA Factory, an open-source modular codebase for small-scale large multimodal models (LMMs) with a focus on simplicity of code implementations, extensibility of new features, and reproducibility of training results. Following the design philosophy of the factory pattern in software engineering, TinyLLaVA Factory modularizes the entire system into interchangeable components, with e…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Our codebase is made public at https://github.com/TinyLLaVA/TinyLLaVA_Factory with documentation available at https://tinyllava-factory.readthedocs.io/en/latest/

  34. arXiv:2405.10497  [pdf, other]

    cs.MM cs.AI cs.CV cs.SI

    SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge

    Authors: Bo Wu, Peiye Liu, Wen-Huang Cheng, Bei Liu, Zhaoyang Zeng, Jia Wang, Qiushi Huang, Jiebo Luo

    Abstract: Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts, leveraging vast amounts of multimodal data available on social media platforms. Studying and investigating social media popularity becomes central to various online applications and requires novel methods of comprehensive analysis, multimodal comprehension, a…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: ACM Multimedia. arXiv admin note: text overlap with arXiv:1910.01795

  35. arXiv:2405.07966  [pdf, other]

    cs.CV cs.AI

    OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition

    Authors: Qiuchi Xiang, Jintao Cheng, Jiehao Luo, Jin Wu, Rui Fan, Xieyuanli Chen, Xiaoyu Tang

    Abstract: Place recognition is the foundation for enabling autonomous systems to achieve independent decision-making and safe operations. It is also crucial in tasks such as loop closure detection and global localization within SLAM. Previous methods utilize mundane point cloud representations as input and deep learning-based LiDAR-based Place Recognition (LPR) approaches employing different point cloud ima…

    Submitted 13 May, 2024; originally announced May 2024.

  36. arXiv:2405.06299  [pdf, other]

    eess.SP cs.AI

    Cross-domain Learning Framework for Tracking Users in RIS-aided Multi-band ISAC Systems with Sparse Labeled Data

    Authors: Jingzhi Hu, Dusit Niyato, Jun Luo

    Abstract: Integrated sensing and communications (ISAC) is pivotal for 6G communications and is boosted by the rapid development of reconfigurable intelligent surfaces (RISs). Using the channel state information (CSI) across multiple frequency bands, RIS-aided multi-band ISAC systems can potentially track users' positions with high precision. Though tracking with CSI is desirable as no communication overhead…

    Submitted 10 May, 2024; originally announced May 2024.

  37. arXiv:2405.05925  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    FuXi-ENS: A machine learning model for medium-range ensemble weather forecasting

    Authors: Xiaohui Zhong, Lei Chen, Hao Li, Jun Liu, Xu Fan, Jie Feng, Kan Dai, Jing-Jia Luo, Jie Wu, Yuan Qi, Bo Lu

    Abstract: Ensemble forecasting is crucial for improving weather predictions, especially for forecasts of extreme events. Constructing an ensemble prediction system (EPS) based on conventional NWP models is highly computationally expensive. ML models have emerged as valuable tools for deterministic weather forecasts, providing forecasts with significantly reduced computational requirements and even surpassin…

    Submitted 5 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  38. arXiv:2405.05616  [pdf, other]

    cs.CL cs.AI

    G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning

    Authors: Ruiting Dai, Yuqiao Tan, Lisi Mo, Shuang Liang, Guohao Huo, Jiayi Luo, Yao Cheng

    Abstract: Commonsense question answering has demonstrated considerable potential across various applications like assistants and social robots. Although fully fine-tuned pre-trained Language Models (LM) have achieved remarkable performance in commonsense reasoning, their tendency to excessively prioritize textual information hampers the precise transfer of structural knowledge and undermines interpretability…

    Submitted 9 May, 2024; originally announced May 2024.

  39. arXiv:2405.00361  [pdf, other]

    cs.CL

    AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts

    Authors: Zefang Liu, Jiahua Luo

    Abstract: We introduce AdaMoLE, a novel method for fine-tuning large language models (LLMs) through an Adaptive Mixture of Low-Rank Adaptation (LoRA) Experts. Moving beyond conventional methods that employ a static top-k strategy for activating experts, AdaMoLE dynamically adjusts the activation threshold using a dedicated threshold network, adaptively responding to the varying complexities of different tas…

    Submitted 1 May, 2024; originally announced May 2024.

  40. arXiv:2404.18528  [pdf, other]

    cs.LG

    Generation of Uncorrelated Residual Variables for Chemical Process Fault Diagnosis via Transfer Learning-based Input-Output Decoupled Network

    Authors: Zhuofu Pan, Qingkai Sui, Yalin Wang, Jiang Luo, Jie Chen, Hongtian Chen

    Abstract: Structural decoupling has played an essential role in model-based fault isolation and estimation in past decades, which facilitates accurate fault localization and reconstruction thanks to the diagonal transfer matrix design. However, traditional methods exhibit limited effectiveness in modeling high-dimensional nonlinearity and big data, and the decoupling idea has not been well-valued in data-dr…

    Submitted 29 April, 2024; originally announced April 2024.

  41. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  42. arXiv:2404.15895  [pdf, other]

    cs.CY cs.CR

    Global Trends in Cryptocurrency Regulation: An Overview

    Authors: Xihan Xiong, Junliang Luo

    Abstract: Cryptocurrencies have evolved into an important asset class, providing a variety of benefits. However, they also present significant risks, such as market volatility and the potential for misuse in illegal activities. These risks underline the urgent need for a comprehensive regulatory framework to ensure consumer protection, market integrity, and financial stability. Yet, the global landscape of…

    Submitted 29 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  43. arXiv:2404.15532  [pdf, other]

    cs.HC cs.AI cs.CL cs.CV cs.MA

    BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

    Authors: Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu Jin, Jiebo Luo, Yongfeng Zhang

    Abstract: This paper presents BattleAgent, an emulation system that combines the Large Vision-Language Model and Multi-agent System. This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time. It emulates both the decision-making processes of leaders and the viewpoints of ordinary participants, such as soldie…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 26 pages, 14 figures. The data and code for this project are accessible at https://github.com/agiresearch/battleagent

  44. arXiv:2404.15272  [pdf, other]

    cs.CV cs.AI cs.CL

    CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios

    Authors: Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang

    Abstract: Medical Vision-Language Pretraining (Med-VLP) establishes a connection between visual content from medical images and the relevant textual descriptions. Existing Med-VLP methods primarily focus on 2D images depicting a single body part, notably chest X-rays. In this paper, we extend the scope of Med-VLP to encompass 3D images, specifically targeting full-body scenarios, by using a multimodal datas…

    Submitted 28 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: 12 pages, 5 figures, 3 tables

  45. arXiv:2404.14715  [pdf, other]

    cs.CV cs.CL

    FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

    Authors: Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

    Abstract: Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Despite the impressive ability to perform complex reasoning for VLMs, current models often struggle to effectively and precisely capture the compositional information on both the image and text sides. To add…

    Submitted 22 April, 2024; originally announced April 2024.

  46. arXiv:2404.14581  [pdf, other]

    cs.CV cs.AI cs.CR

    The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking

    Authors: Yuying Li, Zeyan Liu, Junyi Zhao, Liangqin Ren, Fengjun Li, Jiebo Luo, Bo Luo

    Abstract: Generative AI models can produce high-quality images based on text prompts. The generated images often appear indistinguishable from images generated by conventional optical photography devices or created by human artists (i.e., real images). While the outstanding performance of such generative models is generally well received, security concerns arise. For instance, such image generators could be…

    Submitted 22 April, 2024; originally announced April 2024.

  47. arXiv:2404.14047  [pdf, other]

    cs.LG

    How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

    Authors: Wei Huang, Xudong Ma, Haotong Qin, Xingyu Zheng, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno

    Abstract: Meta's LLaMA family has become one of the most powerful open-source Large Language Model (LLM) series. Notably, LLaMA3 models have recently been released and achieve impressive performance across various tasks with super-large scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-limited scenarios, we explore LLaMA3's capabilities when qua…

    Submitted 22 April, 2024; originally announced April 2024.

  48. arXiv:2404.13820  [pdf, other]

    cs.CC cs.NE

    Prove Symbolic Regression is NP-hard by Symbol Graph

    Authors: Jinglu Song, Qiang Lu, Bozhou Tian, Jingwen Zhang, Jake Luo, Zhiguang Wang

    Abstract: Symbolic regression (SR) is the task of discovering a symbolic expression that fits a given data set from the space of mathematical expressions. Despite the abundance of research surrounding the SR problem, there's a scarcity of works that confirm its NP-hard nature. Therefore, this paper introduces the concept of a symbol graph as a comprehensive representation of the entire mathematical expressi…

    Submitted 21 April, 2024; originally announced April 2024.

  49. arXiv:2404.13441  [pdf]

    physics.app-ph cs.LG

    Machine Learning-Assisted Thermoelectric Cooling for On-Demand Multi-Hotspot Thermal Management

    Authors: Jiajian Luo, Jaeho Lee

    Abstract: Thermoelectric coolers (TECs) offer a promising solution for direct cooling of local hotspots and active thermal management in advanced electronic systems. However, TECs present significant trade-offs among spatial cooling, heating and power consumption. The optimization of TECs requires extensive simulations, which are impractical for managing actual systems with multiple hotspots under spatial a…

    Submitted 24 May, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

    Comments: This article has been submitted to Journal of Applied Physics under review

    Report number: JAP-1089-7550

    Journal ref: J. Appl. Phys. 135, 244503 (2024)

  50. arXiv:2404.12353  [pdf, other]

    cs.CV cs.AI

    V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

    Authors: Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

    Abstract: Video summarization aims to create short, accurate, and cohesive summaries of longer videos. Despite the existence of various video summarization datasets, a notable limitation is their limited amount of source videos, which hampers the effective fine-tuning of advanced large vision-language models (VLMs). Additionally, most existing datasets are created for video-to-video summarization, overlooki…

    Submitted 18 April, 2024; originally announced April 2024.