Showing 1–50 of 443 results for author: Sun, Q

  1. arXiv:2407.12857  [pdf, other]

    cs.CL cs.DL cs.IR

    Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis

    Authors: Jianxiang Yu, Zichen Ding, Jiaqi Tan, Kangyang Luo, Zhenmin Weng, Chenghua Gong, Long Zeng, Renjing Cui, Chengcheng Han, Qiushi Sun, Zhiyong Wu, Yunshi Lan, Xiang Li

    Abstract: In recent years, the rapid increase in scientific papers has overwhelmed traditional review mechanisms, resulting in varying quality of publications. Although existing methods have explored the capabilities of Large Language Models (LLMs) for automated scientific reviewing, their generated contents are often generic or partial. To address the issues above, we introduce an automated paper reviewing…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2407.12260  [pdf, other]

    cs.HC

    HuBar: A Visual Analytics Tool to Explore Human Behaviour based on fNIRS in AR guidance systems

    Authors: Sonia Castelo, Joao Rulff, Parikshit Solunke, Erin McGowan, Guande Wu, Iran Roman, Roque Lopez, Bea Steers, Qi Sun, Juan Bello, Bradley Feest, Michael Middleton, Ryan Mckendrick, Claudio Silva

    Abstract: The concept of an intelligent augmented reality (AR) assistant has significant, wide-ranging applications, with potential uses in medicine, military, and mechanics domains. Such an assistant must be able to perceive the environment and actions, reason about the environment state in relation to a given task, and seamlessly interact with the task performer. These interactions typically involve an AR…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures. This is the author's version of the article that has been accepted for publication in IEEE Transactions on Visualization and Computer Graphics (TVCG)

  3. arXiv:2407.10810  [pdf, other]

    cs.CV cs.AI cs.AR cs.LG

    FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries

    Authors: Yuqi Jiang, Xudong Lu, Qian Jin, Qi Sun, Hanming Wu, Cheng Zhuo

    Abstract: Intelligence is key to advancing integrated circuit (IC) fabrication. Recent breakthroughs in Large Multimodal Models (LMMs) have unlocked unparalleled abilities in understanding images and text, fostering intelligent fabrication. Leveraging the power of LMMs, we introduce FabGPT, a customized IC fabrication large multimodal model for wafer defect knowledge query. FabGPT manifests expertise in con…

    Submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2407.10627  [pdf, other]

    cs.CL cs.AI cs.LG

    Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena

    Authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Qingwei Lin, Jianguang Lou, Shifeng Chen, Yansong Tang, Weizhu Chen

    Abstract: Assessing the effectiveness of large language models (LLMs) presents substantial challenges. The method of conducting human-annotated battles in an online Chatbot Arena is a highly effective evaluative technique. However, this approach is limited by the costs and time required for human annotation. In this paper, we introduce Arena Learning, an innovative offline strategy designed to simulate thes…

    Submitted 15 July, 2024; originally announced July 2024.

  5. arXiv:2407.10181  [pdf, other]

    cs.CV

    Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures

    Authors: Jiaqi He, Zhihua Wang, Leon Wang, Tsein-I Liu, Yuming Fang, Qilin Sun, Kede Ma

    Abstract: Contemporary color difference (CD) measures for photographic images typically operate by comparing co-located pixels, patches in a ``perceptually uniform'' color space, or features in a learned latent space. Consequently, these measures inadequately capture the human color perception of misaligned image pairs, which are prevalent in digital photography (e.g., the same scene captured by different s…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  6. arXiv:2407.09797  [pdf, other]

    cs.CV

    ScaleRAFT: Cross-Scale Recurrent All-Pairs Field Transforms for 3D Motion Estimation

    Authors: Han Ling, Quansen Sun

    Abstract: In this paper, we study the problem of estimating the 3D motion of dense pixels from continuous image pairs. Most previous methods are based on mature optical flow baselines and depth values, projecting the 2D motion on pixel planes into 3D space, and further optimizing the results by combining depth-motion-branch and other sub-modules. This stacked framework cannot leverage the complementarity be…

    Submitted 13 July, 2024; originally announced July 2024.

  7. arXiv:2407.09298  [pdf, other]

    cs.CL

    Transformer Layers as Painters

    Authors: Qi Sun, Marc Pickett, Aakash Kumar Nain, Llion Jones

    Abstract: Despite their nearly universal adoption for large language models, the internal workings of transformers are not well understood. We aim to better understand the impact of removing or reorganizing information throughout the layers of a pretrained transformer. Such an understanding could both yield better usage of existing models as well as to make architectural improvements to produce new variants…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 15 pages total, including references and appendices

  8. arXiv:2407.06653  [pdf, other]

    cs.CV

    Toward Motion Robustness: A masked attention regularization framework in remote photoplethysmography

    Authors: Pengfei Zhao, Qigong Sun, Xiaolin Tian, Yige Yang, Shuo Tao, Jie Cheng, Jiantong Chen

    Abstract: There has been growing interest in facial video-based remote photoplethysmography (rPPG) measurement recently, with a focus on assessing various vital signs such as heart rate and heart rate variability. Despite previous efforts on static datasets, their approaches have been hindered by inaccurate region of interest (ROI) localization and motion issues, and have shown limited generalization in rea…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: CVPR workshop 2024 accepted

  9. arXiv:2407.06513  [pdf, other]

    cs.CV

    Computer vision tasks for intelligent aerospace missions: An overview

    Authors: Huilin Chen, Qiyu Sun, Fangfei Li, Yang Tang

    Abstract: Computer vision tasks are crucial for aerospace missions as they help spacecraft to understand and interpret the space environment, such as estimating position and orientation, reconstructing 3D models, and recognizing objects, which have been extensively studied to successfully carry out the missions. However, traditional methods like Kalman Filtering, Structure from Motion, and Multi-View Stereo…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 23 pages, 7 figures, journal

  10. arXiv:2407.05552  [pdf, other]

    cs.CV

    Ada-adapter: Fast Few-shot Style Personalization of Diffusion Model with Pre-trained Image Encoder

    Authors: Jia Liu, Changlin Li, Qirui Sun, Jiahui Ming, Chen Fang, Jue Wang, Bing Zeng, Shuaicheng Liu

    Abstract: Fine-tuning advanced diffusion models for high-quality image stylization usually requires large training datasets and substantial computational resources, hindering their practical applicability. We propose Ada-Adapter, a novel framework for few-shot style personalization of diffusion models. Ada-Adapter leverages off-the-shelf diffusion models and pre-trained image feature encoders to learn a com…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 16 pages, 11 figures

    MSC Class: 68T07 ACM Class: I.4.0

  11. arXiv:2407.05388  [pdf, other]

    cs.CV

    Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis

    Authors: Qi Sun, Hang Zhou, Wengang Zhou, Li Li, Houqiang Li

    Abstract: Synthesizing realistic 3D indoor scenes is a challenging task that traditionally relies on manual arrangement and annotation by expert designers. Recent advances in autoregressive models have automated this process, but they often lack semantic understanding of the relationships and hierarchies present in real-world scenes, yielding limited performance. In this paper, we propose Forest2Seq, a fram…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  12. arXiv:2407.04191  [pdf, other]

    cs.CV cs.AI cs.GR

    GazeFusion: Saliency-guided Image Generation

    Authors: Yunxiang Zhang, Nan Wu, Connor Z. Lin, Gordon Wetzstein, Qi Sun

    Abstract: Diffusion models offer unprecedented image generation capabilities given just a text prompt. While emerging control mechanisms have enabled users to specify the desired spatial arrangements of the generated content, they cannot predict or control where viewers will pay more attention due to the complexity of human vision. Recognizing the critical necessity of attention-controllable image generatio…

    Submitted 16 March, 2024; originally announced July 2024.

  13. arXiv:2407.03535  [pdf, other]

    cs.CV

    BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

    Authors: Ruirui Lin, Nantheera Anantrasirichai, Guoxi Huang, Joanne Lin, Qi Sun, Alexandra Malyugina, David R Bull

    Abstract: Low-light videos often exhibit spatiotemporal incoherent noise, compromising visibility and performance in computer vision applications. One significant challenge in enhancing such content using deep learning is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions, inco…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.01970

  14. arXiv:2407.01866  [pdf, other]

    cs.CV cs.GR

    Image-GS: Content-Adaptive Image Representation via 2D Gaussians

    Authors: Yunxiang Zhang, Alexandr Kuznetsov, Akshay Jindal, Kenneth Chen, Anton Sochenov, Anton Kaplanyan, Qi Sun

    Abstract: Neural image representations have recently emerged as a promising technique for storing, streaming, and rendering visual data. Coupled with learning-based workflows, these novel representations have demonstrated remarkable visual fidelity and memory efficiency. However, existing neural image representations often rely on explicit uniform data structures without content adaptivity or computation-in…

    Submitted 1 July, 2024; originally announced July 2024.

  15. arXiv:2407.01601  [pdf, other]

    cs.LG cs.AI

    Unveiling and Controlling Anomalous Attention Distribution in Transformers

    Authors: Ruiqing Yan, Xingbo Du, Haoyu Deng, Linghan Zheng, Qiuzhuang Sun, Jifang Hu, Yuhang Shao, Penghao Jiang, Jinrong Jiang, Lian Zhao

    Abstract: With the advent of large models based on the Transformer architecture, researchers have observed an anomalous phenomenon in the Attention mechanism--there is a very high attention on the first element, which is prevalent across Transformer-based models. It is crucial to understand it for the development of techniques focusing on attention distribution, such as Key-Value (KV) Cache compression and…

    Submitted 3 July, 2024; v1 submitted 26 June, 2024; originally announced July 2024.

  16. arXiv:2407.00615  [pdf, other]

    cs.LG

    GC-Bench: An Open and Unified Benchmark for Graph Condensation

    Authors: Qingyun Sun, Ziying Chen, Beining Yang, Cheng Ji, Xingcheng Fu, Sheng Zhou, Hao Peng, Jianxin Li, Philip S. Yu

    Abstract: Graph condensation (GC) has recently garnered considerable attention due to its ability to reduce large-scale graph datasets while preserving their essential properties. The core concept of GC is to create a smaller, more manageable graph that retains the characteristics of the original graph. Despite the proliferation of graph condensation methods developed in recent years, there is no comprehens…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Preprint, under review)

  17. arXiv:2407.00497  [pdf, other]

    cs.CL

    LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

    Authors: Jiahao Ying, Mingbao Lin, Yixin Cao, Wei Tang, Bo Wang, Qianru Sun, Xuanjing Huang, Shuicheng Yan

    Abstract: This paper introduces the innovative "LLMs-as-Instructors" framework, which leverages the advanced Large Language Models (LLMs) to autonomously enhance the training of smaller target models. Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model, facilitating targeted and efficient training cycles…

    Submitted 29 June, 2024; originally announced July 2024.

  18. arXiv:2406.19644  [pdf, other]

    cs.AI

    Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs

    Authors: Zichao Shen, Tianchen Zhu, Qingyun Sun, Shiqi Gao, Jianxin Li

    Abstract: Reinforcement learning (RL) faces challenges in evaluating policy trajectories within intricate game tasks due to the difficulty in designing comprehensive and precise reward functions. This inherent difficulty curtails the broader application of RL within game environments characterized by diverse constraints. Preference-based reinforcement learning (PbRL) presents a pioneering framework that cap…

    Submitted 30 June, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: accepted by IJCAI 2024 GAAMAL

  19. arXiv:2406.18556  [pdf]

    eess.IV cs.CV cs.LG

    Renal digital pathology visual knowledge search platform based on language large model and book knowledge

    Authors: Xiaomin Lv, Chong Lai, Liya Ding, Maode Lai, Qingrong Sun

    Abstract: Large models have become mainstream, yet their applications in digital pathology still require exploration. Meanwhile renal pathology images play an important role in the diagnosis of renal diseases. We conducted image segmentation and paired corresponding text descriptions based on 60 books for renal pathology, clustering analysis for all image and text description features based on large models,…

    Submitted 26 May, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures

  20. arXiv:2406.17156  [pdf, other]

    cs.GR cs.HC

    Toward Ubiquitous 3D Object Digitization: A Wearable Computing Framework for Non-Invasive Physical Property Acquisition

    Authors: Yunxiang Zhang, Xin Sun, Dengfeng Li, Xinge Yu, Qi Sun

    Abstract: Accurately digitizing physical objects is central to many applications, including virtual/augmented reality, industrial design, and e-commerce. Prior research has demonstrated efficient and faithful reconstruction of objects' geometric shapes and visual appearances, which suffice for digitally representing rigid objects. In comparison, physical properties, such as elasticity and pressure, are also…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 10 pages, 6 figures

  21. arXiv:2406.11900  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Horizon-wise Learning Paradigm Promotes Gene Splicing Identification

    Authors: Qi-Jie Li, Qian Sun, Shao-Qun Zhang

    Abstract: Identifying gene splicing is a core and significant task confronted in modern collaboration between artificial intelligence and bioinformatics. Past decades have witnessed great efforts on this concern, such as the bio-plausible splicing pattern AT-CG and the famous SpliceAI. In this paper, we propose a novel framework for the task of gene splicing identification, named Horizon-wise Gene Splicing…

    Submitted 15 June, 2024; originally announced June 2024.

  22. arXiv:2406.11736  [pdf, other]

    cs.CL cs.AI

    Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models

    Authors: Fangzhi Xu, Qiushi Sun, Kanzhi Cheng, Jun Liu, Yu Qiao, Zhiyong Wu

    Abstract: One of the primary driving forces contributing to the superior performance of Large Language Models (LLMs) is the extensive availability of human-annotated natural language data, which is used for alignment fine-tuning. This inspired researchers to investigate self-training methods to mitigate the extensive reliance on human annotations. However, the current success of self-training has been prima…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 18 pages, 6 figures

  23. arXiv:2406.11235  [pdf, other]

    cs.LG

    QTIP: Quantization with Trellises and Incoherence Processing

    Authors: Albert Tseng, Qingyao Sun, David Hou, Christopher De Sa

    Abstract: Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing weights to low-precision datatypes. Since LLM inference is usually memory-bound, PTQ methods can improve inference throughput. Recent state-of-the-art PTQ approaches have converged on using vector quantization (VQ) to quantize multiple weights at once, which improves information utilization through better shaping.…

    Submitted 17 June, 2024; originally announced June 2024.

  24. arXiv:2406.11234  [pdf, other]

    cs.CL cs.AI

    MiniConGTS: A Near Ultimate Minimalist Contrastive Grid Tagging Scheme for Aspect Sentiment Triplet Extraction

    Authors: Qiao Sun, Liujia Yang, Minghao Ma, Nanyang Ye, Qinying Gu

    Abstract: Aspect Sentiment Triplet Extraction (ASTE) aims to co-extract the sentiment triplets in a given corpus. Existing approaches within the pretraining-finetuning paradigm tend to either meticulously craft complex tagging schemes and classification heads, or incorporate external semantic augmentation to enhance performance. In this study, we, for the first time, re-evaluate the redundancy in tagging sc…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.07342

  25. arXiv:2406.10527  [pdf, other]

    cs.CV

    Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center

    Authors: Zichen Yu, Changyong Shu, Qianpu Sun, Junjie Linghu, Xiaobao Wei, Jiangyong Yu, Zongdai Liu, Dawei Yang, Hui Li, Yan Chen

    Abstract: Panoptic occupancy poses a novel challenge by aiming to integrate instance occupancy and semantic occupancy within a unified framework. However, there is still a lack of efficient solutions for panoptic occupancy. In this paper, we propose Panoptic-FlashOcc, a straightforward yet robust 2D feature framework that enables realtime panoptic occupancy. Building upon the lightweight design of FlashOcc,…

    Submitted 15 June, 2024; originally announced June 2024.

  26. arXiv:2406.09870  [pdf, other]

    cs.LG cs.AI

    IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

    Authors: Jiawen Qin, Haonan Yuan, Qingyun Sun, Lyujin Xu, Jiaqi Yuan, Pengfeng Huang, Zhaonan Wang, Xingcheng Fu, Hao Peng, Jianxin Li, Philip S. Yu

    Abstract: Deep graph learning has gained grand popularity over the past years due to its versatility and success in representing graph data across a wide range of domains. However, the pervasive issue of imbalanced graph data distributions, where certain parts exhibit disproportionally abundant data while others remain sparse, undermines the efficacy of conventional graph learning algorithms, leading to bia…

    Submitted 19 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Preprint, under review)

  27. arXiv:2406.08897  [pdf, other]

    cs.LG

    Motif-driven Subgraph Structure Learning for Graph Classification

    Authors: Zhiyao Zhou, Sheng Zhou, Bochao Mao, Jiawei Chen, Qingyun Sun, Yan Feng, Chun Chen, Can Wang

    Abstract: To mitigate the suboptimal nature of graph structure, Graph Structure Learning (GSL) has emerged as a promising approach to improve graph structure and boost performance in downstream tasks. Despite the proposal of numerous GSL methods, the progresses in this field mostly concentrated on node-level tasks, while graph-level tasks (e.g., graph classification) remain largely unexplored. Notably, appl…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures

  28. arXiv:2406.08467  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    DafnyBench: A Benchmark for Formal Software Verification

    Authors: Chloe Loughridge, Qinyi Sun, Seth Ahrenbach, Federico Cassano, Chuyue Sun, Ying Sheng, Anish Mudide, Md Rakib Hossain Misu, Nada Amin, Max Tegmark

    Abstract: We introduce DafnyBench, the largest benchmark of its kind for training and evaluating machine learning systems for formal software verification. We test the ability of LLMs such as GPT-4 and Claude 3 to auto-generate enough hints for the Dafny formal verification engine to successfully verify over 750 programs with about 53,000 lines of code. The best model and prompting scheme achieved 68% succe…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Code & dataset available at: https://github.com/sun-wendy/DafnyBench

  29. arXiv:2406.07413  [pdf, other]

    cs.LG

    Holistic Memory Diversification for Incremental Learning in Growing Graphs

    Authors: Ziyue Qiao, Junren Xiao, Qingqiang Sun, Meng Xiao, Hui Xiong

    Abstract: This paper addresses the challenge of incremental learning in growing graphs with increasingly complex tasks. The goal is to continually train a graph model to handle new tasks while retaining its inference ability on previous tasks. Existing methods usually neglect the importance of memory diversity, limiting in effectively selecting high-quality memory from previous tasks and remembering broad p…

    Submitted 11 June, 2024; originally announced June 2024.

  30. arXiv:2406.02962  [pdf, other]

    cs.CL cs.AI cs.IR

    Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models

    Authors: Qiang Sun, Yuanyi Luo, Wenxiao Zhang, Sirui Li, Jichunyang Li, Kai Niu, Xiangrui Kong, Wei Liu

    Abstract: Even for a conservative estimate, 80% of enterprise data reside in unstructured files, stored in data lakes that accommodate heterogeneous formats. Classical search engines can no longer meet information seeking needs, especially when the task is to browse and explore for insight formulation. In other words, there are no obvious search keywords to use. Knowledge graphs, due to their natural visual…

    Submitted 5 June, 2024; originally announced June 2024.

  31. arXiv:2406.00672  [pdf, other]

    cs.CV

    Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification

    Authors: Xuenian Wang, Shanshan Shi, Renao Yan, Qiehe Sun, Lianghui Zhu, Tian Guan, Yonghong He

    Abstract: In the field of whole slide image (WSI) classification, multiple instance learning (MIL) serves as a promising approach, commonly decoupled into feature extraction and aggregation. In this paradigm, our observation reveals that discriminative embeddings are crucial for aggregation to the final prediction. Among all feature updating strategies, task-oriented ones can capture characteristics specifi…

    Submitted 2 June, 2024; originally announced June 2024.

  32. arXiv:2405.20836  [pdf, other]

    math.NA cs.LG

    Solving partial differential equations with sampled neural networks

    Authors: Chinmay Datar, Taniya Kapoor, Abhishek Chandra, Qing Sun, Iryna Burak, Erik Lien Bolager, Anna Veselovska, Massimo Fornasier, Felix Dietrich

    Abstract: Approximation of solutions to partial differential equations (PDE) is an important problem in computational science and engineering. Using neural networks as an ansatz for the solution has proven a challenge in terms of training time and approximation accuracy. In this contribution, we discuss how sampling the hidden weights and biases of the ansatz network from data-agnostic and data-dependent pr…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 16 pages, 15 figures

  33. arXiv:2405.18132  [pdf, other]

    cs.CV

    EG4D: Explicit Generation of 4D Object without Score Distillation

    Authors: Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li

    Abstract: In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and…

    Submitted 28 May, 2024; originally announced May 2024.

  34. arXiv:2405.16265  [pdf, other]

    cs.LG

    MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

    Authors: Jikun Kang, Xin Zhe Li, Xi Chen, Amirreza Kazemi, Qianyi Sun, Boxing Chen, Dong Li, Xu He, Quan He, Feng Wen, Jianye Hao, Jun Yao

    Abstract: Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datase…

    Submitted 26 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  35. arXiv:2405.12939  [pdf, other]

    cs.CL

    Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models

    Authors: Zhangyue Yin, Qiushi Sun, Qipeng Guo, Zhiyuan Zeng, Xiaonan Li, Tianxiang Sun, Cheng Chang, Qinyuan Cheng, Ding Wang, Xiaofeng Mou, Xipeng Qiu, XuanJing Huang

    Abstract: Recent advancements in Chain-of-Thought prompting have facilitated significant breakthroughs for Large Language Models (LLMs) in complex reasoning tasks. Current research enhances the reasoning performance of LLMs by sampling multiple reasoning chains and ensembling based on the answer frequency. However, this approach fails in scenarios where the correct answers are in the minority. We identify t…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 17 pages, 14 figures, accepted by LREC-COLING 2024

  36. arXiv:2405.07229  [pdf, other]

    cs.MM

    MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning Tasks

    Authors: Xiaocui Yang, Wenfang Wu, Shi Feng, Ming Wang, Daling Wang, Yang Li, Qi Sun, Yifei Zhang, Xiaoming Fu, Soujanya Poria

    Abstract: The rising popularity of multimodal large language models (MLLMs) has sparked a significant increase in research dedicated to evaluating these models. However, current evaluation studies predominantly concentrate on the ability of models to comprehend and reason within a unimodal (vision-only) context, overlooking critical performance evaluations in complex multimodal reasoning tasks that integrat…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Under review, the new version of MM-BigBench: arXiv:2310.09036

  37. arXiv:2405.03188  [pdf, other]

    cs.LG

    Hyperbolic Geometric Latent Diffusion Model for Graph Generation

    Authors: Xingcheng Fu, Yisen Gao, Yuecen Wei, Qingyun Sun, Hao Peng, Jianxin Li, Xianxian Li

    Abstract: Diffusion models have made significant contributions to computer vision, sparking a growing interest in the community recently regarding the application of them to graph generation. Existing discrete graph diffusion models exhibit heightened computational complexity and diminished training efficiency. A preferable and natural way is to directly diffuse the graph within the latent space. However, d…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by the 41st International Conference on Machine Learning (ICML 2024)

  38. arXiv:2405.02023  [pdf, other]

    cs.CV

    IFNet: Deep Imaging and Focusing for Handheld SAR with Millimeter-wave Signals

    Authors: Yadong Li, Dongheng Zhang, Ruixu Geng, Jincheng Wu, Yang Hu, Qibin Sun, Yan Chen

    Abstract: Recent advancements have showcased the potential of handheld millimeter-wave (mmWave) imaging, which applies synthetic aperture radar (SAR) principles in portable settings. However, existing studies addressing handheld motion errors either rely on costly tracking devices or employ simplified imaging models, leading to impractical deployment or limited performance. In this paper, we present IFNet,…

    Submitted 5 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  39. arXiv:2405.01558  [pdf, other]

    cs.CV cs.GR cs.LG eess.IV physics.optics

    Configurable Learned Holography

    Authors: Yicheng Zhan, Liang Shi, Wojciech Matusik, Qi Sun, Kaan Akşit

    Abstract: In the pursuit of advancing holographic display technology, we face a unique yet persistent roadblock: the inflexibility of learned holography in adapting to various hardware configurations. This is due to the variances in the complex optical components and system settings in existing holographic displays. Although the emerging learned approaches have enabled rapid and high-quality hologram genera…

    Submitted 6 May, 2024; v1 submitted 24 March, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures

  40. arXiv:2404.17092  [pdf, other]

    cs.CV

    Defending Spiking Neural Networks against Adversarial Attacks through Image Purification

    Authors: Weiran Chen, Qi Sun, Qi Xu

    Abstract: Spiking Neural Networks (SNNs) aim to bridge the gap between neuroscience and machine learning by emulating the structure of the human nervous system. However, like convolutional neural networks, SNNs are vulnerable to adversarial attacks. To tackle the challenge, we propose a biologically inspired methodology to enhance the robustness of SNNs, drawing insights from the visual masking effect and f…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 8 pages, 5 figures, ECAI2024 under review

  41. arXiv:2404.13391  [pdf, other]

    eess.SY cs.LG math.OC

    Online Planning of Power Flows for Power Systems Against Bushfires Using Spatial Context

    Authors: Jianyu Xu, Qiuzhuang Sun, Yang Yang, Huadong Mo, Daoyi Dong

    Abstract: The 2019-20 Australia bushfire incurred numerous economic losses and significantly affected the operations of power systems. A power station or transmission line can be significantly affected due to bushfires, leading to an increase in operational costs. We study a fundamental but challenging problem of planning the optimal power flow (OPF) for power systems subject to bushfires. Considering the s…

    Submitted 20 April, 2024; originally announced April 2024.

  42. arXiv:2404.12587  [pdf, other]

    cs.AI

    Reinforcement Learning Approach for Integrating Compressed Contexts into Knowledge Graphs

    Authors: Ngoc Quach, Qi Wang, Zijun Gao, Qifeng Sun, Bo Guan, Lillian Floyd

    Abstract: The widespread use of knowledge graphs in various fields has brought about a challenge in effectively integrating and updating information within them. When it comes to incorporating contexts, conventional methods often rely on rules or basic machine learning models, which may not fully grasp the complexity and fluidity of context information. This research suggests an approach based on reinforcem…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by the 2024 International Conference on Machine Learning and Neural Networks (MLNN 2024)

  43. arXiv:2404.09633  [pdf, other]

    cs.CV

    In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation

    Authors: Han Xue, Qianru Sun, Li Song, Wenjun Zhang, Zhiwu Huang

    Abstract: We propose In-Context Translation (ICT), a general learning framework to unify visual recognition (e.g., semantic segmentation), low-level image processing (e.g., denoising), and conditional image generation (e.g., edge-to-image synthesis). Thanks to unification, ICT significantly reduces the inherent inductive bias that comes with designing models for specific tasks, and it maximizes mutual enhan…

    Submitted 15 April, 2024; originally announced April 2024.

  44. arXiv:2404.08915  [pdf, other]

    cs.CV cs.LG

    PM2: A New Prompting Multi-modal Model Paradigm for Few-shot Medical Image Classification

    Authors: Zhenwei Wang, Qiule Sun, Bingbing Zhang, Pengfei Wang, Jianxin Zhang, Qiang Zhang

    Abstract: Few-shot learning has been successfully applied to medical image classification as only very few medical examples are available for training. Due to the challenging problem of limited number of annotated medical images, image representations should not be solely derived from a single image modality which is insufficient for characterizing concept classes. In this paper, we propose a new prompting…

    Submitted 25 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  45. arXiv:2404.07471  [pdf, other]

    cs.SE cs.AI cs.CL

    Structure-aware Fine-tuning for Code Pre-trained Models

    Authors: Jiayi Wu, Renyu Zhu, Nuo Chen, Qiushi Sun, Xiang Li, Ming Gao

    Abstract: Over the past few years, we have witnessed remarkable advancements in Code Pre-trained Models (CodePTMs). These models achieved excellent representation capabilities by designing structure-based pre-training tasks for code. However, how to enhance the absorption of structural knowledge when fine-tuning CodePTMs still remains a significant challenge. To fill this gap, in this paper, we present Stru…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by COLING 2024

  46. arXiv:2404.00399  [pdf, other]

    cs.CL cs.AI cs.LG

    Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

    Authors: Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, et al. (20 additional authors not shown)

    Abstract: Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, where…

    Submitted 23 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Preprint

  47. arXiv:2403.17934  [pdf, other]

    cs.CV

    AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

    Authors: Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi Sing Leung, Ziwei Liu, Lei Yang, Zhongang Cai

    Abstract: Expressive human pose and shape estimation (a.k.a. 3D whole-body mesh recovery) involves the human body, hand, and expression estimation. Most existing methods have tackled this task in a two-stage manner, first detecting the human body part with an off-the-shelf detection model and inferring the different human body parts individually. Despite the impressive results achieved, these methods suffer…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Homepage: https://ttxskk.github.io/AiOS/

  48. arXiv:2403.17870  [pdf, other]

    cs.CV cs.MM

    Boosting Diffusion Models with Moving Average Sampling in Frequency Domain

    Authors: Yurui Qian, Qi Cai, Yingwei Pan, Yehao Li, Ting Yao, Qibin Sun, Tao Mei

    Abstract: Diffusion models have recently brought a powerful revolution in image generation. Despite showing impressive generative capabilities, most of these models rely on the current sample to denoise the next one, possibly resulting in denoising instability. In this paper, we reinterpret the iterative denoising process as model optimization and leverage a moving average mechanism to ensemble all the prio…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  49. arXiv:2403.14734  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

    Authors: Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu

    Abstract: Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronol…

    Submitted 23 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 64 pages, 6 figures, 10 tables, 692 references

  50. arXiv:2403.14727  [pdf, other]

    cs.CY cs.CL cs.LG

    Protected group bias and stereotypes in Large Language Models

    Authors: Hadas Kotek, David Q. Sun, Zidi Xiu, Margit Bowler, Christopher Klein

    Abstract: As modern Large Language Models (LLMs) shatter many state-of-the-art benchmarks in a variety of domains, this paper investigates their behavior in the domains of ethics and fairness, focusing on protected group bias. We conduct a two-part study: first, we solicit sentence continuations describing the occupations of individuals from different protected groups, including gender, sexuality, religion,…

    Submitted 20 March, 2024; originally announced March 2024.