Showing 1–50 of 144 results for author: Jiang, Q

  1. arXiv:2407.04587  [pdf, other]

    cs.LG cs.CV

    Multimodal Classification via Modal-Aware Interactive Enhancement

    Authors: Qing-Yuan Jiang, Zhouyang Chi, Yang Yang

    Abstract: Due to the notorious modality imbalance problem, multimodal learning (MML) leads to the phenomenon of optimization imbalance, thus struggling to achieve satisfactory performance. Recently, some representative methods have been proposed to boost the performance, mainly focusing on adaptively adjusting the optimization of each modality to rebalance the learning speed of dominant and non-dominant modal…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2407.04404  [pdf]

    cs.AR

    Fixed and Movable Antenna Technology for 6G Integrated Sensing and Communication

    Authors: Yong Zeng, Zhenjun Dong, Huizhi Wang, Lipeng Zhu, Ziyao Hong, Qingji Jiang, Dongming Wang, Shi Jin, Rui Zhang

    Abstract: By deploying antenna arrays at the transmitter/receiver to provide additional spatial-domain degrees of freedom (DoFs), multi-antenna technology greatly improves the reliability and efficiency of wireless communication. Meanwhile, the application of multi-antenna technology in the radar field has achieved spatial angle resolution and improved sensing DoF, thus significantly enhancing wireless sens…

    Submitted 16 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: in Chinese language

  3. arXiv:2406.16641  [pdf, other]

    cs.CV cs.AI

    Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment

    Authors: Jun Fu, Wei Zhou, Qiuping Jiang, Hantao Liu, Guangtao Zhai

    Abstract: Recently, textual prompt tuning has shown inspirational performance in adapting Contrastive Language-Image Pre-training (CLIP) models to natural image quality assessment. However, such a uni-modal prompt learning method only tunes the language branch of CLIP models. This is not enough for adapting CLIP models to AI generated image quality assessment (AGIQA) since AGIs visually differ from natural im…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Signal Processing Letters

  4. arXiv:2406.13984  [pdf, other]

    cs.DC cs.LG

    Reducing Memory Contention and I/O Congestion for Disk-based GNN Training

    Authors: Qisheng Jiang, Lei Jia, Chundong Wang

    Abstract: Graph neural networks (GNNs) gain wide popularity. Large graphs with high-dimensional features become common and training GNNs on them is non-trivial on an ordinary machine. Given a gigantic graph, even sample-based GNN training cannot work efficiently, since it is difficult to keep the graph's entire data in memory during the training process. Leveraging a solid-state drive (SSD) or other storage…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: This is a full version for the paper with almost the same title accepted by the 53rd International Conference on Parallel Processing (ICPP 2024)

  5. arXiv:2406.13951  [pdf, other]

    cs.CV

    Towards the in-situ Trunk Identification and Length Measurement of Sea Cucumbers via Bézier Curve Modelling

    Authors: Shuaixin Liu, Kunqian Li, Yilin Ding, Kuangwei Xu, Qianli Jiang, Q. M. Jonathan Wu, Dalei Song

    Abstract: We introduce a novel vision-based framework for in-situ trunk identification and length measurement of sea cucumbers, which plays a crucial role in the monitoring of marine ranching resources and mechanized harvesting. To model sea cucumber trunk curves with varying degrees of bending, we utilize the parametric Bézier curve due to its computational simplicity, stability, and extensive range of tra…

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.12119  [pdf]

    cs.LG cs.AI cs.SI

    Deploying scalable traffic prediction models for efficient management in real-world large transportation networks during hurricane evacuations

    Authors: Qinhua Jiang, Brian Yueshuai He, Changju Lee, Jiaqi Ma

    Abstract: Accurate traffic prediction is vital for effective traffic management during hurricane evacuation. This paper proposes a predictive modeling system that integrates Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) models to capture both long-term congestion patterns and short-term speed patterns. Leveraging various input variables, including archived traffic data, spatial-temporal road…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE ITS Magazine and currently under review

  7. arXiv:2406.12019  [pdf]

    eess.SY cs.CR cs.ET eess.SP

    Hacking Encrypted Wireless Power: Cyber-Security of Dynamic Charging

    Authors: Hui Wang, Nima Tashakor, Wei Jiang, Wei Liu, C. Q. Jiang, Stefan M. Goetz

    Abstract: Recently, energy encryption for wireless power transfer has been developed for energy safety, which is important in public places to suppress unauthorized energy extraction. Most techniques vary the frequency so that unauthorized receivers cannot extract energy because of non-resonance. However, this strategy is unreliable. To stimulate the progress of energy encryption technology and point out se…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 17 figures

  8. arXiv:2406.10765  [pdf, other]

    cs.DC

    PWDFT-SW: Extending the Limit of Plane-Wave DFT Calculations to 16K Atoms on the New Sunway Supercomputer

    Authors: Qingcai Jiang, Zhenwei Cao, Junshi Chen, Xinming Qin, Wei Hu, Hong An, Jinlong Yang

    Abstract: First-principles density functional theory (DFT) with plane wave (PW) basis set is the most widely used method in quantum mechanical material simulations due to its advantages in accuracy and universality. However, a perceived drawback of PW-based DFT calculations is their substantial computational cost and memory usage, which currently limits their ability to simulate large-scale complex systems…

    Submitted 15 June, 2024; originally announced June 2024.

  9. arXiv:2406.05413  [pdf, other]

    cs.LG cs.AI cs.CV cs.MM

    Discover Your Neighbors: Advanced Stable Test-Time Adaptation in Dynamic World

    Authors: Qinting Jiang, Chuyang Ye, Dongyan Wei, Yuan Xue, Jingyan Jiang, Zhi Wang

    Abstract: Despite progress, deep neural networks still suffer performance declines under distribution shifts between training and test domains, leading to a substantial decrease in Quality of Experience (QoE) for multimedia applications. Existing test-time adaptation (TTA) methods are challenged by dynamic, multiple test distributions within batches. This work provides a new perspective on analyzing batch n…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages

  10. arXiv:2406.04165  [pdf, other]

    cs.LG

    Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe

    Authors: Alicja Ziarko, Albert Q. Jiang, Bartosz Piotrowski, Wenda Li, Mateja Jamnik, Piotr Miłoś

    Abstract: Text embeddings are essential for many tasks, such as document retrieval, clustering, and semantic similarity assessment. In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion, given a suite of pre-trained decoder-only language models. Our innovation is an algorithm that produces optimal configurations of model sizes, data quantities, and fine-tuning…

    Submitted 6 June, 2024; originally announced June 2024.

  11. arXiv:2405.21045  [pdf]

    cs.LG

    An Attention-Based Multi-Context Convolutional Encoder-Decoder Neural Network for Work Zone Traffic Impact Prediction

    Authors: Qinhua Jiang, Xishun Liao, Yaofa Gong, Jiaqi Ma

    Abstract: Work zones are one of the major causes of non-recurrent traffic congestion and road incidents. Despite the significance of their impact, studies on predicting the traffic impact of work zones remain scarce. In this paper, we propose a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms, and introduce a novel deep learning model to predict th…

    Submitted 31 May, 2024; originally announced May 2024.

  12. arXiv:2405.17468  [pdf, other]

    cs.LG cs.AI

    Deep Activity Model: A Generative Approach for Human Mobility Pattern Synthesis

    Authors: Xishun Liao, Brian Yueshuai He, Qinhua Jiang, Chenchen Kuai, Jiaqi Ma

    Abstract: Human mobility significantly impacts various aspects of society, including transportation, urban planning, and public health. The increasing availability of diverse mobility data and advancements in deep learning have revolutionized mobility modeling. Existing deep learning models, however, mainly study spatio-temporal patterns using trajectories and often fall short in capturing the underlying se…

    Submitted 23 May, 2024; originally announced May 2024.

  13. arXiv:2405.13731  [pdf, other]

    stat.ML cs.LG stat.CO

    Control, Transport and Sampling: Towards Better Loss Design

    Authors: Qijia Jiang, David Nabergoj

    Abstract: Leveraging connections between diffusion-based sampling, optimal transport, and optimal stochastic control through their shared links to the Schrödinger bridge problem, we propose novel objective functions that can be used to transport $ν$ to $μ$, consequently sample from the target $μ$, via optimally controlled dynamics. We highlight the importance of the pathwise perspective and the role various…

    Submitted 22 May, 2024; originally announced May 2024.

  14. arXiv:2405.10300  [pdf, other]

    cs.CV

    Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

    Authors: Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

    Abstract: This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model o…

    Submitted 31 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: homepage: https://deepdataspace.com/home

  15. arXiv:2405.06926  [pdf, other]

    cs.CV

    TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt

    Authors: Xiangyu Wu, Qing-Yuan Jiang, Yang Yang, Yi-Feng Wu, Qing-Guo Chen, Jianfeng Lu

    Abstract: The recent introduction of prompt tuning based on pre-trained vision-language models has dramatically improved the performance of multi-label image classification. However, some existing strategies that have been explored still have drawbacks, i.e., either exploiting massive labeled visual data at a high cost or using text data only for text prompt tuning and thus failing to learn the diversity of…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at IJCAI 2024; 13 pages; 11 figures

  16. arXiv:2405.02942  [pdf, other]

    physics.optics cs.CV cs.RO eess.IV

    Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens

    Authors: Shaohua Gao, Qi Jiang, Yiqi Liao, Yi Qiu, Wanglei Ying, Kailun Yang, Kaiwei Wang, Benhao Zhang, Jian Bai

    Abstract: We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360°x(35°~110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 len…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to Optics & Laser Technology

  17. arXiv:2404.19201  [pdf, other]

    eess.IV cs.CV cs.RO physics.optics

    Global Search Optics: Automatically Exploring Optimal Solutions to Compact Computational Imaging Systems

    Authors: Yao Gao, Qi Jiang, Shaohua Gao, Lei Sun, Kailun Yang, Kaiwei Wang

    Abstract: The popularity of mobile vision creates a demand for advanced compact computational imaging systems, which call for the development of both a lightweight optical system and an effective image reconstruction model. Recently, joint design pipelines have come to the research forefront, where the two significant components are simultaneously optimized via data-driven learning to realize the optimal system…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/wumengshenyou/GSO

  18. arXiv:2404.08347  [pdf, other]

    cs.CV cs.LG

    Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks

    Authors: Yang Yang, Hongpeng Pan, Qing-Yuan Jiang, Yi Xu, Jinghui Tang

    Abstract: Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the "modality imbalance" problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting its overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approa…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 17 pages; 6 figures

  19. arXiv:2403.17122  [pdf, other]

    cs.IT eess.SP

    6D Movable Antenna Enhanced Wireless Network Via Discrete Position and Rotation Optimization

    Authors: Xiaodan Shao, Rui Zhang, Qijun Jiang, Robert Schober

    Abstract: Six-dimensional movable antenna (6DMA) is an effective approach to improve wireless network capacity by adjusting the 3D positions and 3D rotations of distributed antenna surfaces based on the users' spatial distribution and statistical channel information. Although continuously positioning/rotating 6DMA surfaces can achieve the greatest flexibility and thus the highest capacity improvement, it is…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 13 pages, double column

  20. arXiv:2403.14610  [pdf, other]

    cs.CV

    T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

    Authors: Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang

    Abstract: We present T-Rex2, a highly practical model for open-set object detection. Previous open-set object detection methods relying on text prompts effectively encapsulate the abstract concept of common objects, but struggle with rare or complex object representation due to data scarcity and descriptive limitations. Conversely, visual prompts excel in depicting novel objects through concrete visual exam…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Technical Report

  21. arXiv:2403.13294  [pdf, other]

    cs.RO

    Map-Aware Human Pose Prediction for Robot Follow-Ahead

    Authors: Qingyuan Jiang, Burak Susam, Jun-Jee Chao, Volkan Isler

    Abstract: In the robot follow-ahead task, a mobile robot is tasked to maintain its relative position in front of a moving human actor while keeping the actor in sight. To accomplish this task, it is important that the robot understand the full 3D pose of the human (since the head orientation can be different than the torso) and predict future human poses so as to plan accordingly. This prediction task is es…

    Submitted 20 March, 2024; originally announced March 2024.

  22. arXiv:2403.10012  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    Real-World Computational Aberration Correction via Quantized Domain-Mixing Representation

    Authors: Qi Jiang, Zhonghua Yi, Shaohua Gao, Yao Gao, Xiaolong Qian, Hao Shi, Lei Sun, Zhijie Xu, Kailun Yang, Kaiwei Wang

    Abstract: Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervi…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Codes and datasets will be made publicly available at https://github.com/zju-jiangqi/QDMR

  23. arXiv:2403.08123  [pdf, other]

    cs.IT eess.SP

    6D Movable Antenna Based on User Distribution: Modeling and Optimization

    Authors: Xiaodan Shao, Qijun Jiang, Rui Zhang

    Abstract: In this paper, we propose a new six-dimensional (6D) movable antenna (6DMA) system for future wireless networks to improve the communication performance. Unlike the traditional fixed-position antenna (FPA) and existing fluid antenna/two-dimensional (2D) movable antenna (FA/2DMA) systems that adjust the positions of antennas only, the proposed 6DMA system consists of distributed antenna surfaces wi…

    Submitted 31 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Double column, 14 pages

  24. arXiv:2403.06120  [pdf, other]

    cs.AR cs.ET cs.OS

    I/O Transit Caching for PMem-based Block Device

    Authors: Qing Xu, Qisheng Jiang, Chundong Wang

    Abstract: Byte-addressable non-volatile memory (NVM) sitting on the memory bus is employed to make persistent memory (PMem) in general-purpose computing systems and embedded systems for data storage. Researchers develop software drivers such as the block translation table (BTT) to build block devices on PMem, so programmers can keep using the mature and reliable conventional storage stack while expecting high p…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by the Journal of Systems Architecture: Embedded Software Design (JSA)

  25. arXiv:2403.04256  [pdf, other]

    cs.IR cs.AI

    Federated Recommendation via Hybrid Retrieval Augmented Generation

    Authors: Huimin Zeng, Zhenrui Yue, Qian Jiang, Dong Wang

    Abstract: Federated Recommendation (FR) emerges as a novel paradigm that enables privacy-preserving recommendations. However, traditional FR systems usually represent users/items with discrete identities (IDs), suffering from performance degradation due to the data sparsity and heterogeneity in FR. On the other hand, Large Language Models (LLMs) as recommenders have proven effective across various recommend…

    Submitted 7 March, 2024; originally announced March 2024.

  26. arXiv:2403.00303  [pdf, other]

    cs.CV

    ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

    Authors: Chen Duan, Pei Fu, Shan Guo, Qianyi Jiang, Xiaoming Wei

    Abstract: In recent years, text-image joint pre-training techniques have shown promising results in various tasks. However, in Optical Character Recognition (OCR) tasks, aligning text instances with their corresponding text regions in images poses a challenge, as it requires effective alignment between text and OCR-Text (referring to the text in images as OCR-Text to distinguish from the text in natural lan…

    Submitted 17 April, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  27. arXiv:2402.18592  [pdf, other]

    cs.AR cs.PF

    A$^3$PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader

    Authors: Qingcai Jiang, Shaojie Tan, Junshi Chen, Hong An

    Abstract: The performance gap between memory and processor has grown rapidly. Consequently, the energy and wall-clock time costs associated with moving data between the CPU and main memory dominate the overall computational cost. The Processing-in-Memory (PIM) paradigm emerges as a promising architecture that mitigates the need for extensive data movements by strategically positioning computing units pro…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 6 pages, 4 figures, accepted for presentation at Design, Automation and Test in Europe Conference | The European Event for Electronic System Design & Test (DATE 2024), conference to be held in March 2024

  28. arXiv:2402.17168  [pdf, other]

    cs.AI cs.CL

    Benchmarking Data Science Agents

    Authors: Yuge Zhang, Qiyang Jiang, Xingyu Han, Nan Chen, Yuqing Yang, Kan Ren

    Abstract: In the era of data-driven decision-making, the complexity of data analysis necessitates advanced expertise and tools of data science, presenting significant challenges even for specialists. Large Language Models (LLMs) have emerged as promising aids as data science agents, assisting humans in data analysis and processing. Yet their practical efficacy remains constrained by the varied demands of re…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Source code and data are available at https://github.com/MetaCopilot/dseval

  29. arXiv:2402.09433  [pdf, other]

    eess.SP cs.AI cs.LG eess.SY

    Electrical Behavior Association Mining for Household ShortTerm Energy Consumption Forecasting

    Authors: Heyang Yu, Yuxi Sun, Yintao Liu, Guangchao Geng, Quanyuan Jiang

    Abstract: Accurate household short-term energy consumption forecasting (STECF) is crucial for home energy management, but it is technically challenging, due to highly random behaviors of individual residential users. To improve the accuracy of STECF on a day-ahead scale, this paper proposes a novel STECF methodology that leverages association mining in electrical behaviors. First, a probabilistic associati…

    Submitted 25 January, 2024; originally announced February 2024.

    Comments: 3 figures and 4 tables; This manuscript is submitted for possible publication

  30. arXiv:2401.14159  [pdf, other]

    cs.CV

    Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

    Authors: Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

    Abstract: We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM). This integration enables the detection and segmentation of any regions based on arbitrary text inputs and opens a door to connecting various vision models. As shown in Fig.1, a wide range of vision tasks can be achieved by using the versatile Grounded SAM pipeline.…

    Submitted 25 January, 2024; originally announced January 2024.

  31. arXiv:2401.04088  [pdf, other]

    cs.LG cs.CL

    Mixtral of Experts

    Authors: Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, et al. (1 additional author not shown)

    Abstract: We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected e…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: See more details at https://mistral.ai/news/mixtral-of-experts/

  32. arXiv:2312.06999  [pdf, other]

    cs.CV

    DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement

    Authors: Jingchun Zhou, Zongxin He, Qiuping Jiang, Kui Jiang, Xianping Fu, Xuelong Li

    Abstract: Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments. To solve this issue, previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features, limiting the generalization and adaptability of the model. Previous methods use the reference gradient…

    Submitted 8 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  33. arXiv:2311.13601  [pdf, other]

    cs.CV cs.AI cs.LG

    Visual In-Context Prompting

    Authors: Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

    Abstract: In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain. Existing visual prompting methods focus on referring segmentation to segment the most relevant object, falling short of addressing many generic vision tasks like open-set segmentation and detection. In this paper, we introduce…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: technical report

  34. arXiv:2311.13596  [pdf, other]

    cs.CV

    T-Rex: Counting by Visual Prompting

    Authors: Qing Jiang, Feng Li, Tianhe Ren, Shilong Liu, Zhaoyang Zeng, Kent Yu, Lei Zhang

    Abstract: We introduce T-Rex, an interactive object counting model designed to first detect and then count any objects. We formulate object counting as an open-set object detection task with the integration of visual prompts. Users can specify the objects of interest by marking points or boxes on a reference image, and T-Rex then detects all objects with a similar pattern. Guided by the visual feedback from…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Technical report. Work in progress

  35. arXiv:2311.07056  [pdf, ps, other]

    cs.NI cs.AI cs.CR

    Effective In-vehicle Intrusion Detection via Multi-view Statistical Graph Learning on CAN Messages

    Authors: Kai Wang, Qiguang Jiang, Bailing Wang, Yongzheng Zhang, Yulei Wu

    Abstract: As an important component of internet of vehicles (IoV), intelligent connected vehicles (ICVs) have to communicate with external networks frequently. In this case, the resource-constrained in-vehicle network (IVN) is facing a wide variety of complex and changing external cyber-attacks, especially the masquerade attack with high difficulty of detection while serious damaging effects that few counte…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 14 pages, 8 figures, 6 tables, 27 references

  36. arXiv:2311.03755  [pdf, other]

    cs.CL cs.LG

    Multilingual Mathematical Autoformalization

    Authors: Albert Q. Jiang, Wenda Li, Mateja Jamnik

    Abstract: Autoformalization is the task of translating natural language materials into machine-verifiable formalisations. Progress in autoformalization research is hindered by the lack of a sizeable dataset consisting of informal-formal pairs expressing the same essence. Existing methods tend to circumvent this challenge by manually curating small corpora or using few-shot learning with large language model…

    Submitted 9 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

  37. arXiv:2310.10631  [pdf, other]

    cs.CL cs.AI cs.LO

    Llemma: An Open Language Model For Mathematics

    Authors: Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck

    Abstract: We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool u…

    Submitted 15 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Updated references; corrected description of COPRA search budget

  38. arXiv:2310.08068  [pdf, other]

    eess.IV cs.CV

    Frequency-Aware Re-Parameterization for Over-Fitting Based Image Compression

    Authors: Yun Ye, Yanjie Pan, Qually Jiang, Ming Lu, Xiaoran Fang, Beryl Xu

    Abstract: Over-fitting-based image compression requires weights compactness for compression and fast convergence for practical use, posing challenges for deep convolutional neural networks (CNNs) based methods. This paper presents a simple re-parameterization method to train CNNs with reduced weights storage and accelerated convergence. The convolution kernels are re-parameterized as a weighted sum of discr…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: to be published at ICIP 2023, this version fixed a mistake in Eq. (1) in the proceeding version

  39. arXiv:2310.06825  [pdf, other]

    cs.CL cs.AI cs.LG

    Mistral 7B

    Authors: Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

    Abstract: We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences o…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Models and code are available at https://mistral.ai/news/announcing-mistral-7b/

  40. arXiv:2310.05564  [pdf]

    cs.NI

    A Novel Node Selection Method in Wireless Distributed Edge Storage Based on SDN and Multi-attribute Decision Model

    Authors: Yejin Yang, Miao Ye, Qiuxiang Jiang, Peng Wen

    Abstract: The distributed edge storage system can store data collected at the edge of the network in a decentralised manner, with low latency, high security, and flexibility. Traditional edge-distributed storage systems only consider one single factor, such as node capacity, when storing data, ignoring network and storage node load conditions that affect the system's read/write performance. At the same t…

    Submitted 9 October, 2023; originally announced October 2023.

  41. arXiv:2309.16639  [pdf, other]

    cs.CL cs.AI cs.HC

    MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention

    Authors: Ruolan Wu, Chun Yu, Xiaole Pan, Yujia Liu, Ningning Zhang, Yue Fu, Yuhan Wang, Zhi Zheng, Li Chen, Qiaolei Jiang, Xuhai Xu, Yuanchun Shi

    Abstract: Problematic smartphone use negatively affects physical and mental health. Despite the wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users' physical contexts and mental states. We first conducted a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone…

    Submitted 27 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Published at ACM CHI'24

    MSC Class: 68U35

    ACM Class: H.5.2; I.2.7

  42. arXiv:2309.07657  [pdf, other]

    cs.CR cs.OS

    Sync+Sync: A Covert Channel Built on fsync with Storage

    Authors: Qisheng Jiang, Chundong Wang

    Abstract: Scientists have built a variety of covert channels for secretive information transmission with CPU cache and main memory. In this paper, we turn to a lower level in the memory hierarchy, i.e., persistent storage. Most programs store intermediate or eventual results in the form of files and some of them call fsync to synchronously persist a file with storage device for orderly persistence. Our quan…

    Submitted 19 June, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: A full version for the paper with the same title accepted by the 33rd USENIX Security Symposium (USENIX Security 2024)

  43. arXiv:2308.07104  [pdf, other]

    cs.CV cs.RO eess.IV

    FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving

    Authors: Zhonghua Yi, Hao Shi, Kailun Yang, Qi Jiang, Yaozu Ye, Ze Wang, Huajian Ni, Kaiwei Wang

    Abstract: Key-point-based scene understanding is fundamental for autonomous driving applications. At the same time, optical flow plays an important role in many vision tasks. However, due to the implicit bias of equal attention on all points, classic data-driven optical flow estimation methods yield less satisfactory performance on key points, limiting their implementations in key-point-critical safety-rele…

    Submitted 22 September, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). The source code of FocusFlow will be available at https://github.com/ZhonghuaYi/FocusFlow_official

  44. arXiv:2308.00134  [pdf, other]

    cs.RO

    Onboard View Planning of a Flying Camera for High Fidelity 3D Reconstruction of a Moving Actor

    Authors: Qingyuan Jiang, Volkan Isler

    Abstract: Capturing and reconstructing a human actor's motion is important for filmmaking and gaming. Currently, motion capture systems with static cameras are used for pixel-level high-fidelity reconstructions. Such setups are costly, require installation and calibration and, more importantly, confine the user to a predetermined area. In this work, we present a drone-based motion capture system that can al…

    Submitted 31 July, 2023; originally announced August 2023.

  45. arXiv:2307.10782  [pdf, other]

    cs.CV

    See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data

    Authors: Yuhang Lu, Qi Jiang, Runnan Chen, Yuenan Hou, Xinge Zhu, Yuexin Ma

    Abstract: Zero-shot point cloud segmentation aims to make deep models capable of recognizing novel objects in point cloud that are unseen in the training phase. Recent trends favor the pipeline which transfers knowledge from seen classes with labels to unseen classes without labels. They typically align visual features with semantic features obtained from word embedding by the supervision of seen classes' a…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  46. arXiv:2307.08723  [pdf, other]

    cs.CV

    Revisiting Scene Text Recognition: A Data Perspective

    Authors: Qing Jiang, Jiapeng Wang, Dezhi Peng, Chongyu Liu, Lianwen Jin

    Abstract: This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective. We begin by revisiting the six commonly used benchmarks in STR and observe a trend of performance saturation, whereby only 2.91% of the benchmark images cannot be accurately recognized by an ensemble of 13 representative models. While these results are impressive and suggest that STR could be considered sol…

    Submitted 19 July, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023

  47. arXiv:2307.05935  [pdf]

    cs.RO

    GRAINS: Proximity Sensing of Objects in Granular Materials

    Authors: Zeqing Zhang, Ruixing Jia, Youcan Yan, Ruihua Han, Shijie Lin, Qian Jiang, Liangjun Zhang, Jia Pan

    Abstract: Proximity sensing detects an object's presence without contact. However, research has rarely explored proximity sensing in granular materials (GM) due to GM's lack of visual and complex properties. In this paper, we propose a granular-material-embedded autonomous proximity sensing system (GRAINS) based on three granular phenomena (fluidization, jamming, and failure wedge zone). GRAINS can automati…

    Submitted 18 July, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: 35 pages, 5 figures, 2 tables. Videos available at https://sites.google.com/view/grains2/home

  48. arXiv:2307.00924  [pdf, other]

    cs.LG cs.CV

    Semi-supervised multi-view concept decomposition

    Authors: Qi Jiang, Guoxu Zhou, Qibin Zhao

    Abstract: Concept Factorization (CF), as a novel paradigm of representation learning, has demonstrated superior performance in multi-view clustering tasks. It overcomes limitations such as the non-negativity constraint imposed by traditional matrix factorization methods and leverages kernel methods to learn latent representations that capture the underlying structure of the data, thereby improving data repr…

    Submitted 3 July, 2023; originally announced July 2023.

  49. arXiv:2306.12992  [pdf, other]

    cs.CV eess.IV physics.optics

    Minimalist and High-Quality Panoramic Imaging with PSF-aware Transformers

    Authors: Qi Jiang, Shaohua Gao, Yao Gao, Kailun Yang, Zhonghua Yi, Hao Shi, Lei Sun, Kaiwei Wang

    Abstract: High-quality panoramic images with a Field of View (FoV) of 360° are essential for contemporary panoramic computer vision tasks. However, conventional imaging systems come with sophisticated lens designs and heavy optical components. This disqualifies their usage in many mobile and wearable applications where thin and portable, minimalist imaging systems are desired. In this paper, we propose a Pa…

    Submitted 4 July, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: Accepted to IEEE Transactions on Image Processing (TIP). The dataset and code will be available at https://github.com/zju-jiangqi/PCIE-PART

  50. arXiv:2306.01694  [pdf, other]

    cs.LG cs.HC

    Evaluating Language Models for Mathematics through Interactions

    Authors: Katherine M. Collins, Albert Q. Jiang, Simon Frieder, Lionel Wong, Miri Zilka, Umang Bhatt, Thomas Lukasiewicz, Yuhuai Wu, Joshua B. Tenenbaum, William Hart, Timothy Gowers, Wenda Li, Adrian Weller, Mateja Jamnik

    Abstract: There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs, and is insufficient for making an informed decision about which LLMs and under which assistive settings can they be sensibly used. Static assessment fails to a…

    Submitted 5 November, 2023; v1 submitted 2 June, 2023; originally announced June 2023.