Showing 1–50 of 1,378 results for author: Gao, Y

  1. arXiv:2407.13217  [pdf, other]

    cs.CV

    LIDIA: Precise Liver Tumor Diagnosis on Multi-Phase Contrast-Enhanced CT via Iterative Fusion and Asymmetric Contrastive Learning

    Authors: Wei Huang, Wei Liu, Xiaoming Zhang, Xiaoli Yin, Xu Han, Chunli Li, Yuan Gao, Yu Shi, Le Lu, Ling Zhang, Lei Zhang, Ke Yan

    Abstract: The early detection and precise diagnosis of liver tumors are tasks of critical clinical value, yet they pose significant challenges due to the high heterogeneity and variability of liver tumors. In this work, a precise LIver tumor DIAgnosis network on multi-phase contrast-enhanced CT, named LIDIA, is proposed for real-world scenarios. To fully utilize all available phases in contrast-enhanced CT, L…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  2. arXiv:2407.13210  [pdf, other]

    eess.IV cs.CV

    Improved Esophageal Varices Assessment from Non-Contrast CT Scans

    Authors: Chunli Li, Xiaoming Zhang, Yuan Gao, Xiaoli Yin, Le Lu, Ling Zhang, Ke Yan, Yu Shi

    Abstract: Esophageal varices (EV), a serious health concern resulting from portal hypertension, are traditionally diagnosed through invasive endoscopic procedures. Despite non-contrast computed tomography (NC-CT) imaging being a less expensive and non-invasive imaging modality, it has yet to gain full acceptance as a primary clinical diagnostic tool for EV evaluation. To overcome existing diagnostic challen…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Early accepted to MICCAI 2024

  3. arXiv:2407.12038  [pdf, ps, other]

    eess.AS cs.AI

    ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

    Authors: Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Shuchen Shi, Xin Qi, Guanjun Li

    Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective percept…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: ISCSLP 2024 Challenge description

  4. arXiv:2407.11499  [pdf, other]

    cs.CV

    Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection

    Authors: Qijie Mo, Yipeng Gao, Shenghao Fu, Junkai Yan, Ancong Wu, Wei-Shi Zheng

    Abstract: In incremental object detection, knowledge distillation has been proven to be an effective way to alleviate catastrophic forgetting. However, previous works focused on preserving the knowledge of old models, ignoring that images could simultaneously contain categories from past, present, and future stages. The co-occurrence of objects makes the optimization objectives inconsistent across different…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  5. arXiv:2407.11464  [pdf, other]

    cs.CV

    Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes

    Authors: Zhi Cai, Yingjie Gao, Yaoyan Zheng, Nan Zhou, Di Huang

    Abstract: In computer vision, object detection is an important task that finds its application in many scenarios. However, obtaining extensive labels can be challenging, especially in crowded scenes. Recently, the Segment Anything Model (SAM) has been proposed as a powerful zero-shot segmenter, offering a novel approach to instance segmentation tasks. However, the accuracy and efficiency of SAM and its vari…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  6. arXiv:2407.11381  [pdf, other]

    cs.CV

    Leveraging Segment Anything Model in Identifying Buildings within Refugee Camps (SAM4Refugee) from Satellite Imagery for Humanitarian Operations

    Authors: Yunya Gao

    Abstract: Updated building footprints within refugee camps from high-resolution satellite imagery can support related humanitarian operations. This study explores the utilization of the "Segment Anything Model" (SAM) and one of its branches, SAM-Adapter, for semantic segmentation tasks in building extraction from satellite imagery. SAM-Adapter is a lightweight adaptation of the SAM and emerges as a powerf…

    Submitted 16 July, 2024; originally announced July 2024.

  7. arXiv:2407.11356  [pdf, other]

    cs.CV

    The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation

    Authors: Muyang Qiu, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

    Abstract: Despite the recent success of domain generalization in medical image segmentation, voxel-wise annotation for all source domains remains a huge burden. Semi-supervised domain generalization has been proposed very recently to combat this challenge by leveraging limited labeled data along with abundant unlabeled data collected from multiple medical institutions, depending on precisely harnessing unla…

    Submitted 15 July, 2024; originally announced July 2024.

  8. arXiv:2407.11046  [pdf, other]

    cs.LG cs.AI cs.CL

    A Survey on LoRA of Large Language Models

    Authors: Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao

    Abstract: Low-Rank Adaptation (LoRA), which updates the dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. Furthermore, it has significant advantages in cross-task generalization and privacy preservation. Hence, LoRA has gained much attention recently, and the volume of related literature has grown exponentially. It is…

    Submitted 8 July, 2024; originally announced July 2024.

  9. arXiv:2407.10353  [pdf, other]

    cs.RO

    UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers

    Authors: Huy Ha, Yihuai Gao, Zipeng Fu, Jie Tan, Shuran Song

    Abstract: We introduce UMI-on-Legs, a new framework that combines real-world and simulation data for quadruped manipulation systems. We scale task-centric data collection in the real world using a hand-held gripper (UMI), providing a cheap way to demonstrate task-relevant manipulation skills without a robot. Simultaneously, we scale robot-centric data in simulation by training a whole-body controller for task…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 18 pages, 7 figures, website: https://umi-on-legs.github.io/

    ACM Class: I.2.9

  10. arXiv:2407.09285  [pdf, other]

    cs.CV

    MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results

    Authors: Jiangpeng He, Yuhao Chen, Gautham Vinod, Talha Ibn Mahmud, Fengqing Zhu, Edward Delp, Alexander Wong, Pengcheng Xi, Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva, Jiadong Tang, Dianyi Yang, Yu Gao, Zhaoxiang Liang, Yawei Jueluo, Chengyu Shi, Pengyu Wang

    Abstract: The increasing interest in computer vision applications for nutrition and dietary monitoring has led to the development of advanced 3D reconstruction techniques for food items. However, the scarcity of high-quality data and limited collaboration between industry and academia have constrained progress in this field. Building on recent advancements in 3D reconstruction, we host the MetaFood Workshop…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Technical report for MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction. arXiv admin note: substantial text overlap with arXiv:2407.01717

  11. arXiv:2407.08953  [pdf, ps, other]

    q-fin.CP cs.LG

    Attribution Methods in Asset Pricing: Do They Account for Risk?

    Authors: Dangxing Chen, Yuan Gao

    Abstract: Over the past few decades, machine learning models have been extremely successful. As a result of axiomatic attribution methods, feature contributions have been explained more clearly and rigorously. There are, however, few studies that have examined domain knowledge in conjunction with the axioms. In this study, we examine asset pricing in finance, a field closely related to risk management. Cons…

    Submitted 11 July, 2024; originally announced July 2024.

    Journal ref: 2024 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr)

  12. arXiv:2407.08150  [pdf, other]

    cs.CV

    Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

    Authors: Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang

    Abstract: Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders. There is currently a lack of research in this area, and most existing benchmarks suffer from several drawbacks: 1) a limited number of modalities and answers with restrictive length; 2) the content and scenarios within…

    Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MULTIMEDIA 2024

  13. arXiv:2407.06358  [pdf, other]

    cs.CV

    MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

    Authors: Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan

    Abstract: Sora's high-motion intensity and long consistent videos have significantly impacted the field of video generation, attracting unprecedented attention. However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we propose MiraData, a high-quality video datase…

    Submitted 8 July, 2024; originally announced July 2024.

  14. arXiv:2407.05571  [pdf, other]

    cs.NI eess.SP

    Cost-Efficient Computation Offloading in SAGIN: A Deep Reinforcement Learning and Perception-Aided Approach

    Authors: Yulan Gao, Ziqiang Ye, Han Yu

    Abstract: The Space-Air-Ground Integrated Network (SAGIN), crucial to the advancement of sixth-generation (6G) technology, plays a key role in ensuring universal connectivity, particularly by addressing the communication needs of remote areas lacking cellular network infrastructure. This paper delves into the role of unmanned aerial vehicles (UAVs) within SAGIN, where they act as a control layer owing to th…

    Submitted 7 July, 2024; originally announced July 2024.

  15. arXiv:2407.05082  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    DMTG: One-Shot Differentiable Multi-Task Grouping

    Authors: Yuan Gao, Shuguo Jiang, Moran Li, Jin-Gang Yu, Gui-Song Xia

    Abstract: We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG). Given N tasks, we propose to simultaneously identify the best task groups from 2^N candidates and train the model weights in one shot, with the high-order task-affinity fully exploited. This is distinct from the pioneering methods which sequentially identify the groups and train th…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ICML 2024

    Journal ref: International Conference on Machine Learning (ICML), 2024

  16. arXiv:2407.04621  [pdf, other]

    cs.CV

    OneRestore: A Universal Restoration Framework for Composite Degradation

    Authors: Yu Guo, Yuan Gao, Yuxu Lu, Huilin Zhu, Ryan Wen Liu, Shengfeng He

    Abstract: In real-world scenarios, image impairments often manifest as composite degradations, presenting a complex interplay of elements such as low light, haze, rain, and snow. Despite this reality, existing restoration methods typically target isolated degradation types, thereby falling short in environments where multiple degrading factors coexist. To bridge this gap, our study proposes a versatile imag…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  17. arXiv:2407.04217  [pdf, other]

    cs.DB cs.IR

    An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models

    Authors: Mengzhao Wang, Haotian Wu, Xiangyu Ke, Yunjun Gao, Xiaoliang Xu, Lu Chen

    Abstract: Retrieval-augmented Large Language Models (LLMs) have reshaped traditional query-answering systems, offering unparalleled user experiences. However, existing retrieval techniques often struggle to handle multi-modal query contexts. In this paper, we present an interactive Multi-modal Query Answering (MQA) system, empowered by our newly developed multi-modal retrieval framework and navigation graph…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: This demo paper has been accepted by VLDB 2024

  18. arXiv:2407.04125  [pdf, other]

    cs.CL cs.AI cs.LG

    Query-Guided Self-Supervised Summarization of Nursing Notes

    Authors: Ya Gao, Hans Moen, Saila Koivusalo, Miika Koskinen, Pekka Marttinen

    Abstract: Nursing notes, an important component of Electronic Health Records (EHRs), keep track of the progression of a patient's health status during a care episode. Distilling the key information in nursing notes through text summarization techniques can improve clinicians' efficiency in understanding patients' conditions when reviewing nursing notes. However, existing abstractive summarization methods in…

    Submitted 4 July, 2024; originally announced July 2024.

  19. arXiv:2407.04066  [pdf, ps, other]

    cs.CV

    EMPL: A novel Efficient Meta Prompt Learning Framework for Few-shot Unsupervised Domain Adaptation

    Authors: Wanqi Yang, Haoran Wang, Lei Wang, Ge Song, Yang Gao

    Abstract: Few-shot unsupervised domain adaptation (FS-UDA) utilizes few-shot labeled source domain data to realize effective classification in an unlabeled target domain. However, current FS-UDA methods still suffer from two issues: 1) the data from different domains cannot be effectively aligned by few-shot labeled data due to the large domain gaps, 2) it is unstable and time-consuming to generalize to n…

    Submitted 4 July, 2024; originally announced July 2024.

  20. arXiv:2407.03320  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

    Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, et al. (2 additional authors not shown)

    Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely a 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. Th…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

  21. arXiv:2407.03178  [pdf, other]

    cs.MM cs.CV cs.LG

    Relating CNN-Transformer Fusion Network for Change Detection

    Authors: Yuhao Gao, Gensheng Pei, Mengmeng Sheng, Zeren Sun, Tao Chen, Yazhou Yao

    Abstract: While deep learning, particularly convolutional neural networks (CNNs), has revolutionized remote sensing (RS) change detection (CD), existing approaches often miss crucial features due to neglecting global context and incomplete change learning. Additionally, transformer networks struggle with low-level details. RCTNet addresses these limitations by introducing (1) an early fusion backbo…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: accepted by IEEE Conference on Multimedia Expo

  22. arXiv:2407.02911  [pdf, other]

    eess.IV cs.CV

    Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI

    Authors: Luyi Han, Tao Tan, Tianyu Zhang, Xin Wang, Yuan Gao, Chunyao Lu, Xinglong Liang, Haoran Dou, Yunzhi Huang, Ritse Mann

    Abstract: Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the rec…

    Submitted 3 July, 2024; originally announced July 2024.

  23. arXiv:2407.02547  [pdf, other]

    cs.AI cs.LG

    Domain Generalizable Knowledge Tracing via Concept Aggregation and Relation-Based Attention

    Authors: Yuquan Xie, Wanqi Yang, Jinyu Wei, Ming Yang, Yang Gao

    Abstract: Knowledge Tracing (KT) is a critical task in online education systems, aiming to monitor students' knowledge states throughout a learning period. Common KT approaches involve predicting the probability of a student correctly answering the next question based on their exercise history. However, these methods often suffer from performance degradation when faced with the scarcity of student interacti…

    Submitted 2 July, 2024; originally announced July 2024.

  24. arXiv:2407.01260  [pdf, other]

    cs.CR

    DeepiSign-G: Generic Watermark to Stamp Hidden DNN Parameters for Self-contained Tracking

    Authors: Alsharif Abuadbba, Nicholas Rhodes, Kristen Moore, Bushra Sabir, Shuo Wang, Yansong Gao

    Abstract: Deep learning solutions in critical domains like autonomous vehicles, facial recognition, and sentiment analysis require caution due to the severe consequences of errors. Research shows these models are vulnerable to adversarial attacks, such as data poisoning and neural trojaning, which can covertly manipulate model behavior, compromising reliability and safety. Current defense strategies like wa…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 13 pages

  25. arXiv:2407.01067  [pdf, other]

    cs.AI cs.CL cs.CV cs.HC cs.LG

    Human-like object concept representations emerge naturally in multimodal large language models

    Authors: Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, Jinpeng Li, Shuang Qiu, Le Chang, Huiguang He

    Abstract: The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition. Recently, the rapid development of Large Language Models (LLMs) has raised the attractive question of whether these models can also develop human-like object representations through exposure to vas…

    Submitted 1 July, 2024; originally announced July 2024.

  26. arXiv:2407.00952  [pdf, other]

    cs.LG cs.CL cs.DC

    SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

    Authors: Zheng Lin, Xuanjie Hu, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Ang Li, Praneeth Vepakomma, Yue Gao

    Abstract: The scalability of large language models (LLMs) in handling high-complexity models and large-scale datasets has led to tremendous successes in pivotal domains. While there is an urgent need to acquire more training data for LLMs, a concerning reality is the depletion of high-quality public datasets within a few years. In view of this, the federated learning (FL) LLM fine-tuning paradigm recently h…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures

  27. arXiv:2407.00949  [pdf, ps, other]

    cs.CV eess.IV

    SpectralKAN: Kolmogorov-Arnold Network for Hyperspectral Images Change Detection

    Authors: Yanheng Wang, Xiaohan Yu, Yongsheng Gao, Jianjun Sha, Jian Wang, Lianru Gao, Yonggang Zhang, Xianhui Rong

    Abstract: It has been verified that deep learning methods, including convolutional neural networks (CNNs), graph neural networks (GNNs), and transformers, can accurately extract features from hyperspectral images (HSIs). These algorithms perform exceptionally well on HSIs change detection (HSIs-CD). However, the downside of these impressive results is the enormous number of parameters, FLOPs, GPU memory, tr…

    Submitted 1 July, 2024; originally announced July 2024.

  28. arXiv:2406.19803  [pdf, other]

    cs.CL

    Scalable and Domain-General Abstractive Proposition Segmentation

    Authors: Mohammad Javad Hosseini, Yang Gao, Tim Baumgärtner, Alex Fabrikant, Reinald Kim Amplayo

    Abstract: Segmenting text into fine-grained units of meaning is important to a wide range of NLP applications. The default approach of segmenting text into sentences is often insufficient, especially since sentences are usually complex enough to include multiple units of meaning that merit separate treatment in the downstream task. We focus on the task of abstractive proposition segmentation: transforming t…

    Submitted 28 June, 2024; originally announced June 2024.

  29. arXiv:2406.19130  [pdf, other]

    cs.CV

    Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis

    Authors: Yibo Gao, Zheyao Gao, Xin Gao, Yuanye Liu, Bomin Wang, Xiahai Zhuang

    Abstract: Due to the high stakes in medical decision-making, there is a compelling demand for interpretable deep learning methods in medical image analysis. Concept Bottleneck Models (CBM) have emerged as an active interpretable framework incorporating human-interpretable concepts into decision-making. However, their concept predictions may lack reliability when applied to clinical diagnosis, impeding conce…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: accepted by MICCAI 2024

  30. arXiv:2406.18453  [pdf, other]

    cs.CV

    Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference

    Authors: Yuan Gao, Yajing Luo, Junhong Wang, Kui Jia, Gui-Song Xia

    Abstract: Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair. This is arguably achieved by incorporating (i) 3D/2.5D shape perception from a single image, (ii) render-and-compare simulation, and (iii) rich semantic cue awareness to furnish (coarse) reference-query correspondence. Existing methods implement (i) by a 3D CAD mo…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: The codes are available at https://github.com/ethanygao/training-free_generalizable_relative_pose

  31. arXiv:2406.18067  [pdf, other]

    cs.CL eess.AS

    Exploring Energy-Based Models for Out-of-Distribution Detection in Dialect Identification

    Authors: Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: The diverse nature of dialects presents challenges for models trained on specific linguistic patterns, rendering them susceptible to errors when confronted with unseen or out-of-distribution (OOD) data. This study introduces a novel margin-enhanced joint energy model (MEJEM) tailored specifically for OOD detection in dialects. By integrating a generative model and the energy margin loss, our appro…

    Submitted 26 June, 2024; originally announced June 2024.

  32. arXiv:2406.18065  [pdf, other]

    eess.AS cs.SD

    On Calibration of Speech Classification Models: Insights from Energy-Based Model Investigations

    Authors: Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: For speech classification tasks, deep learning models often achieve high accuracy but exhibit shortcomings in calibration, manifesting as classifiers exhibiting overconfidence. The significance of calibration lies in its critical role in guaranteeing the reliability of decision-making within deep learning systems. This study explores the effectiveness of Energy-Based Models in calibrating confiden…

    Submitted 26 June, 2024; originally announced June 2024.

  33. arXiv:2406.17841  [pdf, other]

    quant-ph cs.AI

    Probing many-body Bell correlation depth with superconducting qubits

    Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang, et al. (10 additional authors not shown)

    Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 6 figures + 14 pages, 6 figures

  34. arXiv:2406.16299  [pdf, other]

    cs.CL cs.AI

    Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other

    Authors: Yifei Gao, Jie Ou, Lei Wang, Yuting Xiao, Zhiyuan Xiang, Ruiting Dai, Jun Cheng

    Abstract: Emergent Large Language Models (LLMs) distinguish themselves from traditional language models through their extraordinary performance and powerful deduction capacity. However, the computational and storage costs of these LLMs are enormous, so quantization has become a prominent topic. To address accuracy decay caused by quantization, two streams of works in post-training quantization metho…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Efficient quantization method

    MSC Class: F.2.3

  35. arXiv:2406.14828  [pdf, other]

    cs.CL

    Word Matters: What Influences Domain Adaptation in Summarization?

    Authors: Yinghao Li, Siyu Miao, Heyan Huang, Yang Gao

    Abstract: Domain adaptation aims to enable Large Language Models (LLMs) to generalize effectively to domain datasets unseen during the training phase. However, factors such as the size of the model parameters and the scale of training data are general influencers and do not reflect the nuances of domain adaptation performance. This paper investigates the fine-grained factors affecting domain adaptation perform…

    Submitted 20 June, 2024; originally announced June 2024.

  36. arXiv:2406.14697  [pdf, other]

    cs.LG

    A Benchmark Study of Deep-RL Methods for Maximum Coverage Problems over Graphs

    Authors: Zhicheng Liang, Yu Yang, Xiangyu Ke, Xiaokui Xiao, Yunjun Gao

    Abstract: Recent years have witnessed a growing trend toward employing deep reinforcement learning (Deep-RL) to derive heuristics for combinatorial optimization (CO) problems on graphs. Maximum Coverage Problem (MCP) and its probabilistic variant on social networks, Influence Maximization (IM), have been particularly prominent in this line of research. In this paper, we present a comprehensive benchmark stu…

    Submitted 20 June, 2024; originally announced June 2024.

  37. arXiv:2406.14264  [pdf, other]

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, Jingyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  38. arXiv:2406.13498  [pdf, other]

    cs.CV

    Semantic Enhanced Few-shot Object Detection

    Authors: Zheng Wang, Yingjie Gao, Qingjie Liu, Yunhong Wang

    Abstract: Few-shot object detection (FSOD), which aims to detect novel objects with limited annotated instances, has made significant progress in recent years. However, existing methods still suffer from biased representations, especially for novel classes in extremely low-shot scenarios. During fine-tuning, a novel class may exploit knowledge from similar base classes to construct its own feature distribut…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by ICIP 2024

  39. arXiv:2406.13282  [pdf, other]

    cs.CL

    Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective

    Authors: Meizhi Zhong, Chen Zhang, Yikun Lei, Xikai Liu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang

    Abstract: Enabling LLMs to handle lengthy context is currently a research hotspot. Most LLMs are built upon rotary position embedding (RoPE), a popular position encoding method. Therefore, a prominent path is to extrapolate the RoPE trained on comparably short texts to far longer texts. Considerable effort has been dedicated to boosting the extrapolation via extending the formulations of the RoPE, how…

    Submitted 19 June, 2024; originally announced June 2024.

  40. arXiv:2406.13268  [pdf, other]

    eess.AS cs.SD

    CEC: A Noisy Label Detection Method for Speaker Recognition

    Authors: Yao Shen, Yingying Gao, Yaqian Hao, Chenguang Hu, Fulin Zhang, Junlan Feng, Shilei Zhang

    Abstract: Noisy labels are inevitable, even in well-annotated datasets. The detection of noisy labels is of significant importance to enhance the robustness of speaker recognition models. In this paper, we propose a novel noisy label detection approach based on two new statistical metrics: Continuous Inconsistent Counting (CIC) and Total Inconsistent Counting (TIC). These metrics are calculated through Cros…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: interspeech 2024

  41. arXiv:2406.13161  [pdf, other]

    cs.AI cs.CL cs.LG cs.PL

    APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts

    Authors: Honghua Dong, Qidong Su, Yubo Gao, Zhaoyu Li, Yangjun Ruan, Gennady Pekhimenko, Chris J. Maddison, Xujie Si

    Abstract: Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging to implement and maintain. To address this challenge, we propose APPL, A Prompt Programming Language that acts as a bridge between computer pr…

    Submitted 18 June, 2024; originally announced June 2024.

  42. arXiv:2406.13145  [pdf, other]

    eess.SY cs.LG

    Constructing and Evaluating Digital Twins: An Intelligent Framework for DT Development

    Authors: Longfei Ma, Nan Cheng, Xiucheng Wang, Jiong Chen, Yinjun Gao, Dongxiao Zhang, Jun-Jie Zhang

    Abstract: The development of Digital Twins (DTs) represents a transformative advance for simulating and optimizing complex systems in a controlled digital space. Despite their potential, the challenge of constructing DTs that accurately replicate and predict the dynamics of real-world systems remains substantial. This paper introduces an intelligent framework for the construction and evaluation of DTs, spec…

    Submitted 18 June, 2024; originally announced June 2024.

  43. Blitzcrank: Fast Semantic Compression for In-memory Online Transaction Processing

    Authors: Yiming Qiao, Yihan Gao, Huanchen Zhang

    Abstract: We present BLITZCRANK, a high-speed semantic compressor designed for OLTP databases. Previous solutions are inadequate for compressing row-stores: they suffer from either a low compression factor due to a coarse compression granularity or suboptimal performance due to inefficiency in handling dynamic data sets. To solve these problems, we first propose novel semantic models that support fast inf…

    Submitted 28 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: 18 pages, 19 figures

    Journal ref: PVLDB, 17(10): 2528 - 2540, 2024

  44. arXiv:2406.12641  [pdf, other]

    cs.CL

    DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

    Authors: Zhouhong Gu, Lin Zhang, Xiaoxuan Zhu, Jiangjie Chen, Wenhao Huang, Yikai Zhang, Shusen Wang, Zheyu Ye, Yan Gao, Hongwei Feng, Yanghua Xiao

    Abstract: Detecting evidence within the context is a key step in reasoning tasks. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice…

    Submitted 18 June, 2024; originally announced June 2024.

  45. arXiv:2406.12526  [pdf, other]

    cs.GT cs.MA math.OC

    On the Convergence of Tâtonnement for Linear Fisher Markets

    Authors: Tianlong Nan, Yuan Gao, Christian Kroer

    Abstract: Tâtonnement is a simple, intuitive market process where prices are iteratively adjusted based on the difference between demand and supply. Many variants under different market assumptions have been studied and shown to converge to a market equilibrium, in some cases at a fast rate. However, the classical case of linear Fisher markets has long eluded analysis, and it remains unclear whether tâ…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 31 pages, 16 figures

  46. arXiv:2406.12300  [pdf]

    eess.IV cs.CV q-bio.NC

    IR2QSM: Quantitative Susceptibility Mapping via Deep Neural Networks with Iterative Reverse Concatenations and Recurrent Modules

    Authors: Min Li, Chen Chen, Zhuang Xiong, Ying Liu, Pengfei Rong, Shanshan Shan, Feng Liu, Hongfu Sun, Yang Gao

    Abstract: Quantitative susceptibility mapping (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-bas…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 9 figures

  47. arXiv:2406.12030  [pdf, other]

    cs.CV cs.AI cs.CL

    SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

    Authors: Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao

    Abstract: The emergence of Vision Language Models (VLMs) has brought unprecedented advances in understanding multimodal information. The combination of textual and visual semantics in VLMs is highly complex and diverse, making the safety alignment of these models challenging. Furthermore, due to the limited study on the safety alignment of VLMs, there is a lack of large-scale, high-quality datasets. To addr…

    Submitted 17 June, 2024; originally announced June 2024.

  48. arXiv:2406.12018  [pdf, other]

    cs.CL

    CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling

    Authors: Yu Bai, Xiyuan Zou, Heyan Huang, Sanxing Chen, Marc-Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung

    Abstract: Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed evicted) without affecting the perplexity performance in generating long sequences. However, we show that these methods, despite preserving perplexit…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Work in progress

  49. arXiv:2406.11474  [pdf, other]

    cs.CL cs.AI

    How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment

    Authors: Heyan Huang, Yinghao Li, Huashan Sun, Yu Bai, Yang Gao

    Abstract: Recent studies have demonstrated that In-Context Learning (ICL), through the use of specific demonstrations, can align Large Language Models (LLMs) with human preferences, a process known as In-Context Alignment (ICA), indicating that models can comprehend human instructions without requiring parameter adjustments. However, the exploration of the mechanism and applicability of ICA remains limited. In this pa…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 22 pages, 6 figures, work in progress

  50. arXiv:2406.11258  [pdf, other]

    cs.CL

    Enhancing Biomedical Knowledge Retrieval-Augmented Generation with Self-Rewarding Tree Search and Proximal Policy Optimization

    Authors: Minda Hu, Licheng Zong, Hongru Wang, Jingyan Zhou, Jingjing Li, Yichen Gao, Kam-Fai Wong, Yu Li, Irwin King

    Abstract: Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG). However, existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries, resulting in sub-optimal performance. To address these limitations, we propose a novel plug-and-play LL…

    Submitted 17 June, 2024; originally announced June 2024.