Showing 1–50 of 374 results for author: Tian, Q

  1. arXiv:2407.05554  [pdf, other]

    cs.CV

    PANS: Probabilistic Airway Navigation System for Real-time Robust Bronchoscope Localization

    Authors: Qingyao Tian, Zhen Chen, Huai Liao, Xinyan Huang, Bingyu Yang, Lujie Li, Hongbin Liu

    Abstract: Accurate bronchoscope localization is essential for pulmonary interventions, by providing six degrees of freedom (DOF) in airway navigation. However, the robustness of current vision-based methods is often compromised in clinical practice, and they struggle to perform in real-time and to generalize across cases unseen during training. To overcome these challenges, we propose a novel Probabilistic…

    Submitted 7 July, 2024; originally announced July 2024.

  2. arXiv:2407.04504  [pdf, other]

    cs.CV

    Segment Any 4D Gaussians

    Authors: Shengxiang Ji, Guanjun Wu, Jiemin Fang, Jiazhong Cen, Taoran Yi, Wenyu Liu, Qi Tian, Xinggang Wang

    Abstract: Modeling, understanding, and reconstructing the real world are crucial in XR/VR. Recently, 3D Gaussian Splatting (3D-GS) methods have shown remarkable success in modeling and understanding 3D scenes. Similarly, various 4D representations have demonstrated the ability to capture the dynamics of the 4D world. However, there is a dearth of research focusing on segmentation within 4D representations.…

    Submitted 12 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 22 pages

  3. arXiv:2407.02272  [pdf, other]

    cs.CV cs.GR

    Aligning Human Motion Generation with Human Perceptions

    Authors: Haoru Wang, Wentao Zhu, Luyi Miao, Yishu Xu, Feng Gao, Qi Tian, Yizhou Wang

    Abstract: Human motion generation is a critical task with a wide range of applications. Achieving high realism in generated motions requires naturalness, smoothness, and plausibility. Despite rapid advancements in the field, current generation methods often fall short of these goals. Furthermore, existing evaluation metrics typically rely on ground-truth-based errors, simple heuristics, or distribution dist…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Project page: https://motioncritic.github.io/

  4. arXiv:2406.18462  [pdf, other]

    cs.CV cs.GR

    GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

    Authors: Taoran Yi, Jiemin Fang, Zanwei Zhou, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Xinggang Wang, Qi Tian

    Abstract: Recently, 3D Gaussian splatting (3D-GS) has achieved great success in reconstructing and rendering real-world scenes. To transfer the high rendering quality to generation tasks, a series of research works attempt to generate 3D-Gaussian assets from text. However, the generated assets have not achieved the same quality as those in reconstruction tasks. We observe that Gaussians tend to grow without…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Project page: https://taoranyi.com/gaussiandreamerpro/

  5. arXiv:2406.17777  [pdf, other]

    cs.CV

    Text-Animator: Controllable Visual Text Video Generation

    Authors: Lin Liu, Quande Liu, Shengju Qian, Yuan Zhou, Wengang Zhou, Houqiang Li, Lingxi Xie, Qi Tian

    Abstract: Video generation is a challenging yet pivotal task in various industries, such as gaming, e-commerce, and advertising. One significant unresolved aspect within T2V is the effective visualization of text within generated videos. Despite the progress achieved in Text-to-Video (T2V) generation, current methods still cannot effectively visualize texts in videos directly, as they mainly focus on summar…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Project Page: https://laulampaul.github.io/text-animator.html

  6. arXiv:2406.17679  [pdf, other]

    cs.CV

    Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation

    Authors: Xuming Zhang, Naoto Yokoya, Xingfa Gu, Qingjiu Tian, Lorenzo Bruzzone

    Abstract: Hyperspectral image (HSI) classification has recently reached its performance bottleneck. Multimodal data fusion is emerging as a promising approach to overcome this bottleneck by providing rich complementary information from the supplementary modality (X-modality). However, achieving comprehensive cross-modal interaction and fusion that can be generalized across different sensing modalities is ch…

    Submitted 25 June, 2024; originally announced June 2024.

  7. arXiv:2406.13625  [pdf]

    cs.CV cs.AI physics.med-ph

    Enhance the Image: Super Resolution using Artificial Intelligence in MRI

    Authors: Ziyu Li, Zihan Li, Haoxiang Li, Qiuyun Fan, Karla L. Miller, Wenchuan Wu, Akshay S. Chaudhari, Qiyuan Tian

    Abstract: This chapter provides an overview of deep learning techniques for improving the spatial resolution of MRI, ranging from convolutional neural networks and generative adversarial networks to more advanced models including transformers, diffusion models, and implicit neural representations. Our exploration extends beyond the methodologies to scrutinize the impact of super-resolved images on clinical an…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: A book chapter in Machine Learning in MRI: From methods to clinical translation. Copyright may be transferred without notice, after which this version may no longer be accessible

  8. arXiv:2406.05982  [pdf]

    eess.IV cs.LG physics.med-ph

    Artificial Intelligence for Neuro MRI Acquisition: A Review

    Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

    Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potenti…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Magn Reson Mater Phy (2024)

  9. arXiv:2406.04675  [pdf, other]

    cs.CV

    OVMR: Open-Vocabulary Recognition with Multi-Modal References

    Authors: Zehong Ma, Shiliang Zhang, Longhui Wei, Qi Tian

    Abstract: The challenge of open-vocabulary recognition lies in the fact that the model has no clue about the new categories it is applied to. Existing works have proposed different methods to embed category cues into the model, e.g., through few-shot fine-tuning, providing category names or textual descriptions to Vision-Language Models. Fine-tuning is time-consuming and degrades the generalization capability. Textual descriptio…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: CVPR2024

  10. arXiv:2406.03035  [pdf, other]

    cs.CV

    Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

    Authors: Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo

    Abstract: Pose-controllable character video generation is in high demand with extensive applications for fields such as automatic advertising and content creation on social media platforms. While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple characte…

    Submitted 12 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  11. arXiv:2405.20071  [pdf]

    physics.med-ph cs.LG

    A Staged Approach using Machine Learning and Uncertainty Quantification to Predict the Risk of Hip Fracture

    Authors: Anjum Shaik, Kristoffer Larsen, Nancy E. Lane, Chen Zhao, Kuan-Jui Su, Joyce H. Keyak, Qing Tian, Qiuying Sha, Hui Shen, Hong-Wen Deng, Weihua Zhou

    Abstract: Despite advancements in medical care, hip fractures impose a significant burden on individuals and healthcare systems. This paper focuses on the prediction of hip fracture risk in older and middle-aged adults, where falls and compromised bone quality are predominant factors. We propose a novel staged model that combines advanced imaging and clinical data to improve predictive performance. By using…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 29 pages, 5 figures, 6 tables

  12. arXiv:2405.18840  [pdf, other]

    cs.CV

    Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation

    Authors: Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yaoming Wang, Lingxi Xie, Qi Tian, Wei Shen

    Abstract: Open-vocabulary semantic segmentation seeks to label each pixel in an image with arbitrary text descriptions. Vision-language foundation models, especially CLIP, have recently emerged as powerful tools for acquiring open-vocabulary capabilities. However, fine-tuning CLIP to equip it with pixel-level prediction ability often suffers from three issues: 1) high computational cost, 2) misalignment between…

    Submitted 29 May, 2024; originally announced May 2024.

  13. arXiv:2405.12328  [pdf, other]

    cs.CV

    Multi-dimension Transformer with Attention-based Filtering for Medical Image Segmentation

    Authors: Wentao Wang, Xi Xiao, Mingjie Liu, Qing Tian, Xuanyao Huang, Qizhen Lan, Swalpa Kumar Roy, Tianyang Wang

    Abstract: The accurate segmentation of medical images is crucial for diagnosing and treating diseases. Recent studies demonstrate that vision transformer-based methods have significantly improved performance in medical image segmentation, primarily due to their superior ability to establish global relationships among features and adaptability to various inputs. However, these methods struggle with the low s…

    Submitted 20 May, 2024; originally announced May 2024.

  14. arXiv:2405.11236  [pdf, other]

    cs.CV

    TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation

    Authors: Chengcheng Feng, Mu He, Qiuyu Tian, Haojie Yin, Xiaofang Zhao, Hongwei Tang, Xingqiang Wei

    Abstract: As deep learning technology continues to advance, image generation models, especially models like Stable Diffusion, are finding increasingly widespread application in visual arts creation. However, these models often face challenges such as overfitting, lack of stability in generated results, and difficulties in accurately capturing the features desired by creators during the fine-tuning process.…

    Submitted 13 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  15. arXiv:2405.09592  [pdf, other]

    cs.LG cs.AI cs.CE

    A Survey of Generative Techniques for Spatial-Temporal Data Mining

    Authors: Qianru Zhang, Haixin Wang, Cheng Long, Liangcai Su, Xingwei He, Jianlong Chang, Tailin Wu, Hongzhi Yin, Siu-Ming Yiu, Qi Tian, Christian S. Jensen

    Abstract: This paper focuses on the integration of generative techniques into spatial-temporal data mining, considering the significant growth and diverse nature of spatial-temporal data. With the advancements in RNNs, CNNs, and other non-generative techniques, researchers have explored their application in capturing temporal and spatial dependencies within spatial-temporal data. However, the emergence of g…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 19 pages

  16. arXiv:2405.05022  [pdf, other]

    cs.CR cs.SI

    Adversarial Threats to Automatic Modulation Open Set Recognition in Wireless Networks

    Authors: Yandie Yang, Sicheng Zhang, Kuixian Li, Qiao Tian, Yun Lin

    Abstract: Automatic Modulation Open Set Recognition (AMOSR) is a crucial technological approach for cognitive radio communications, wireless spectrum management, and interference monitoring within wireless networks. Numerous studies have shown that AMR is highly susceptible to minimal perturbations carefully designed by malicious attackers, leading to misclassification of signals. However, the adversarial s…

    Submitted 8 May, 2024; originally announced May 2024.

  17. arXiv:2404.06819  [pdf, other]

    cs.CR cs.DB

    Enc2DB: A Hybrid and Adaptive Encrypted Query Processing Framework

    Authors: Hui Li, Jingwen Shi, Qi Tian, Zheng Li, Yan Fu, Bingqing Shen, Yaofeng Tu

    Abstract: As cloud computing gains traction, data owners are outsourcing their data to cloud service providers (CSPs) for Database-as-a-Service (DBaaS), bringing in a deviation of data ownership and usage, and intensifying privacy concerns, especially with potential breaches by hackers or CSP insiders. To address that, encrypted database services propose encrypting every tuple and query statement before submitti…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 33 pages, 33 figures, DASFAA24

  18. arXiv:2404.05667  [pdf, other]

    cs.CV

    AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation

    Authors: Jiannan Ge, Lingxi Xie, Hongtao Xie, Pandeng Li, Xiaopeng Zhang, Yongdong Zhang, Qi Tian

    Abstract: A serious issue that harms the performance of zero-shot visual recognition is named objective misalignment, i.e., the learning objective prioritizes improving the recognition accuracy of seen classes rather than unseen classes, while the latter is the true target to pursue. This issue becomes more significant in zero-shot image segmentation because the stronger (i.e., pixel-level) supervision brin…

    Submitted 8 April, 2024; originally announced April 2024.

  19. arXiv:2404.00714  [pdf, other]

    cs.CV

    Neural Radiance Field-based Visual Rendering: A Comprehensive Review

    Authors: Mingyuan Yao, Yukang Huo, Yang Ran, Qingbin Tian, Ruifeng Wang, Haihua Wang

    Abstract: In recent years, Neural Radiance Fields (NeRF) has made remarkable progress in the field of computer vision and graphics, providing strong technical support for solving key tasks including 3D scene understanding, new perspective synthesis, human body reconstruction, robotics, and so on; academic attention to this research result is growing. As a revolutionary neural implicit field represen…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 35 pages, 22 figures, 14 tables, 18 formulas

  20. arXiv:2403.19600  [pdf, other]

    cs.CV

    Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

    Authors: Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian

    Abstract: Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications. However, the effective integration of T2I models into fundamental image classification tasks remains an open question. A prevalent strategy to bolster image classification performance is through augmenting the training set w…

    Submitted 28 March, 2024; originally announced March 2024.

  21. arXiv:2403.18435  [pdf, other]

    cs.IR cs.CL

    DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment

    Authors: Haitao Li, Qingyao Ai, Xinyan Han, Jia Chen, Qian Dong, Yiqun Liu, Chong Chen, Qi Tian

    Abstract: Recent research demonstrates the effectiveness of using pre-trained language models for legal case retrieval. Most of the existing works focus on improving the representation ability for the contextualized embedding of the [CLS] token and calculate relevance using textual semantic similarity. However, in the legal domain, textual semantic similarity does not always imply that the cases are relevan…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 11 pages

  22. arXiv:2403.18365  [pdf, other]

    cs.CL

    BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models

    Authors: Haitao Li, Qingyao Ai, Jia Chen, Qian Dong, Zhijing Wu, Yiqun Liu, Chong Chen, Qi Tian

    Abstract: Large Language Models (LLMs) like ChatGPT and GPT-4 are versatile and capable of addressing a diverse range of tasks. However, general LLMs, which are developed on open-domain data, may lack the domain-specific knowledge essential for tasks in vertical domains, such as legal, medical, etc. To address this issue, previous approaches either conduct continuous pre-training with domain-specific data o…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 11 pages

  23. arXiv:2403.16334  [pdf, other]

    cs.LG cs.AI

    Graphs Generalization under Distribution Shifts

    Authors: Qin Tian, Wenjun Wang, Chen Zhao, Minglai Shao, Wang Zhang, Dong Li

    Abstract: Traditional machine learning methods heavily rely on the independent and identically distributed assumption, which imposes limitations when the test distribution deviates from the training distribution. To address this crucial issue, out-of-distribution (OOD) generalization, which aims to achieve satisfactory generalization performance when faced with unknown distribution shifts, has made a signi…

    Submitted 24 March, 2024; originally announced March 2024.

  24. arXiv:2403.01683  [pdf, other]

    cs.CV

    DD-VNB: A Depth-based Dual-Loop Framework for Real-time Visually Navigated Bronchoscopy

    Authors: Qingyao Tian, Huai Liao, Xinyan Huang, Jian Chen, Zihui Zhang, Bingyu Yang, Sebastien Ourselin, Hongbin Liu

    Abstract: Real-time 6 DOF localization of bronchoscopes is crucial for enhancing intervention quality. However, current vision-based technologies struggle to balance between generalization to unseen data and computational speed. In this study, we propose a Depth-based Dual-Loop framework for real-time Visually Navigated Bronchoscopy (DD-VNB) that can generalize across patient cases without the need of re-tr…

    Submitted 15 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  25. arXiv:2403.01483  [pdf, other]

    cs.RO

    BronchoCopilot: Towards Autonomous Robotic Bronchoscopy via Multimodal Reinforcement Learning

    Authors: Jianbo Zhao, Hao Chen, Qingyao Tian, Jian Chen, Bingyu Yang, Hongbin Liu

    Abstract: Bronchoscopy plays a significant role in the early diagnosis and treatment of lung diseases. This process requires physicians to maneuver the flexible endoscope to reach distal lesions, demanding substantial expertise particularly when examining the airways of the upper lung lobe. With the development of artificial intelligence and robotics, reinforcement learning (RL) methods have been applied t…

    Submitted 3 March, 2024; originally announced March 2024.

  26. arXiv:2402.12763  [pdf, other]

    cs.CV

    BronchoTrack: Airway Lumen Tracking for Branch-Level Bronchoscopic Localization

    Authors: Qingyao Tian, Huai Liao, Xinyan Huang, Bingyu Yang, Jinlin Wu, Jian Chen, Lujie Li, Hongbin Liu

    Abstract: Localizing the bronchoscope in real time is essential for ensuring intervention quality. However, most existing methods struggle to balance between speed and generalization. To address these challenges, we present BronchoTrack, an innovative real-time framework for accurate branch-level localization, encompassing lumen detection, tracking, and airway association. To achieve real-time performance, w…

    Submitted 20 February, 2024; originally announced February 2024.

  27. arXiv:2402.10398  [pdf, ps, other]

    cs.SE

    Prompt Learning for Multi-Label Code Smell Detection: A Promising Approach

    Authors: Haiyang Liu, Yang Zhang, Vidya Saikrishna, Quanquan Tian, Kun Zheng

    Abstract: Code smells indicate the potential problems of software quality so that developers can identify refactoring opportunities by detecting code smells. State-of-the-art approaches leverage heuristics, machine learning, and deep learning to detect code smells. However, existing approaches have not fully explored the potential of large language models (LLMs). In this paper, we propose PromptSmel…

    Submitted 15 February, 2024; originally announced February 2024.

  28. arXiv:2402.10259  [pdf, other]

    cs.CV cs.GR

    GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

    Authors: Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

    Abstract: Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or hig…

    Submitted 20 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Project page: https://gaussianobject.github.io/

  29. arXiv:2402.01327  [pdf, other]

    cs.LG cs.AI cs.CY

    Supervised Algorithmic Fairness in Distribution Shifts: A Survey

    Authors: Minglai Shao, Dong Li, Chen Zhao, Xintao Wu, Yujie Lin, Qin Tian

    Abstract: Supervised fairness-aware machine learning under distribution shifts is an emerging field that addresses the challenge of maintaining equitable and unbiased predictions when faced with changes in data distributions from source to target domains. In real-world applications, machine learning models are often trained on a specific dataset but deployed in environments where the data distribution may s…

    Submitted 4 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: IJCAI 2024

  30. arXiv:2401.16410  [pdf, other]

    stat.ML cs.LG

    ReTaSA: A Nonparametric Functional Estimation Approach for Addressing Continuous Target Shift

    Authors: Hwanwoo Kim, Xin Zhang, Jiwei Zhao, Qinglong Tian

    Abstract: The presence of distribution shifts poses a significant challenge for deploying modern machine learning models in real-world applications. This work focuses on the target shift problem in a regression setting (Zhang et al., 2013; Nguyen et al., 2016). More specifically, the target variable y (also known as the response variable), which is continuous, has different marginal distributions in the tra…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR 2024

  31. arXiv:2401.13923  [pdf, other]

    cs.LG cs.IR q-bio.BM

    Towards 3D Molecule-Text Interpretation in Language Models

    Authors: Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian

    Abstract: Language Models (LMs) have greatly influenced diverse domains. However, their inherent limitation in comprehending 3D molecular structures has considerably constrained their potential in the biomolecular domain. To bridge this gap, we focus on 3D molecule-text interpretation, and propose 3D-MoLM: 3D-Molecular Language Modeling. Specifically, 3D-MoLM enables an LM to interpret and analyze 3D molecu…

    Submitted 17 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  32. arXiv:2401.13307  [pdf, other]

    cs.CV

    ChatterBox: Multi-round Multimodal Referring and Grounding

    Authors: Yunjie Tian, Tianren Ma, Lingxi Xie, Jihao Qiu, Xi Tang, Yuan Zhang, Jianbin Jiao, Qi Tian, Qixiang Ye

    Abstract: In this study, we establish a baseline for a new task named multimodal multi-round referring and grounding (MRG), opening up a promising direction for instance-level multimodal dialogues. We present a new benchmark and an efficient vision-language model for this purpose. The new benchmark, named CB-300K, spans challenges including multi-round dialogue, complex spatial relationships among multiple…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 17 pages, 6 tables, 9 figures. Code, data, and model are available at: https://github.com/sunsmarterjie/ChatterBox

  33. arXiv:2401.06397   

    cs.CV

    UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding

    Authors: Bowen Shi, Peisen Zhao, Zichen Wang, Yuhang Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian, Xiaopeng Zhang

    Abstract: Vision-language foundation models, represented by Contrastive language-image pre-training (CLIP), have gained increasing attention for jointly understanding both vision and textual tasks. However, existing approaches primarily focus on training models to match global image representations with textual descriptions, thereby overlooking the critical alignment between local regions and corresponding…

    Submitted 18 January, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: The paper is undergoing internal legal review and will be resubmitted once it passes the review

  34. arXiv:2401.06345  [pdf, other]

    cs.CV

    Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering

    Authors: Chang Yu, Junran Peng, Xiangyu Zhu, Zhaoxiang Zhang, Qi Tian, Zhen Lei

    Abstract: The text-to-image synthesis by diffusion models has recently shown remarkable performance in generating high-quality images. Although they perform well for simple texts, the models may get confused when faced with complex texts that contain multiple objects or spatial relationships. To get the desired images, a feasible way is to manually adjust the textual descriptions, i.e., narrating the texts or a…

    Submitted 11 January, 2024; originally announced January 2024.

  35. arXiv:2401.04749  [pdf, other]

    cs.LG cs.AI cs.SE

    LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

    Authors: Hongcheng Guo, Jian Yang, Jiaheng Liu, Jiaqi Bai, Boyang Wang, Zhoujun Li, Tieqiao Zheng, Bo Zhang, Junran Peng, Qi Tian

    Abstract: Log anomaly detection is a key component in the field of artificial intelligence for IT operations (AIOps). Considering log data of variant domains, retraining the whole network for unknown domains is inefficient in real industrial scenarios. However, previous deep models merely focused on extracting the semantics of log sequences in the same domain, leading to poor generalization on multi-domain…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: text overlap with arXiv:2201.00016

  36. arXiv:2401.03105  [pdf, other]

    cs.CV cs.MM

    Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models

    Authors: Xin He, Longhui Wei, Lingxi Xie, Qi Tian

    Abstract: Multimodal Large Language Models (MLLMs) are experiencing rapid growth, yielding a plethora of noteworthy contributions in recent months. The prevailing trend involves adopting data-driven methodologies, wherein diverse instruction-following datasets are collected. However, a prevailing challenge persists in these approaches, specifically in relation to the limited visual perception ability, as CL…

    Submitted 13 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  37. arXiv:2312.16931  [pdf, other]

    cs.CV

    DeLR: Active Learning for Detection with Decoupled Localization and Recognition Query

    Authors: Yuhang Zhang, Yuang Deng, Xiaopeng Zhang, Jie Li, Robert C. Qiu, Qi Tian

    Abstract: Active learning has been shown to reduce labeling cost effectively. While most progress has targeted image recognition, instance-level active learning for object detection is still lacking. In this paper, we rethink two key components, i.e., localization and recognition, for object detection, and find that their correctness is highly related; therefore, it is not necessary to…

    Submitted 28 December, 2023; originally announced December 2023.

  38. arXiv:2312.15599  [pdf, other]

    cs.IR

    Preliminary Study on Incremental Learning for Large Language Model-based Recommender Systems

    Authors: Tianhao Shi, Yang Zhang, Zhijian Xu, Chong Chen, Fuli Feng, Xiangnan He, Qi Tian

    Abstract: Adapting Large Language Models for recommendation (LLM4Rec) has garnered substantial attention and demonstrated promising results. However, the challenges of practically deploying LLM4Rec are largely unexplored, with the need for incremental adaptation to evolving user preferences being a critical concern. Nevertheless, the suitability of traditional incremental learning within LLM4Rec remains ambi…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: 8 pages, 8 figures

  39. arXiv:2312.12458  [pdf, other]

    cs.CL cs.AI

    When Parameter-efficient Tuning Meets General-purpose Vision-language Models

    Authors: Yihang Zhai, Haixin Wang, Jianlong Chang, Xinlong Yang, Jinan Sun, Shikun Zhang, Qi Tian

    Abstract: Instruction tuning has shown promising potential for developing general-purpose AI capabilities by using large-scale pre-trained models, and it has spurred growing research on integrating multimodal information for creative applications. However, existing works still face two main limitations: the high training costs and heavy computing resource dependence of full model fine-tuning, and the lack of semantic…

    Submitted 16 December, 2023; originally announced December 2023.

  40. arXiv:2312.07364  [pdf, other]

    cs.CV

    Collapse-Aware Triplet Decoupling for Adversarially Robust Image Retrieval

    Authors: Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Qian Li, Chao Shen

    Abstract: Adversarial training has achieved substantial performance in defending image retrieval against adversarial examples. However, existing studies in deep metric learning (DML) still suffer from two major limitations: weak adversary and model collapse. In this paper, we address these two limitations by proposing Collapse-Aware TRIplet DEcoupling (CA-TRIDE). Specifically, TRIDE yields a stronger advers…

    Submitted 6 June, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by ICML2024

  41. arXiv:2312.04424  [pdf, other]

    cs.CV cs.GR

    Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views

    Authors: Yabo Chen, Jiemin Fang, Yuyang Huang, Taoran Yi, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian

    Abstract: Synthesizing multi-view 3D from one single image is a significant and challenging task. For this goal, Zero-1-to-3 methods aim to extend a 2D latent diffusion model to the 3D scope. These approaches generate the target-view image with a single-view source image and the camera pose as condition information. However, the one-to-one manner adopted in Zero-1-to-3 incurs challenges for building geometr…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project page: https://cascadezero123.github.io/

  42. arXiv:2312.03628  [pdf, other]

    cs.CV

    Boosting Segment Anything Model Towards Open-Vocabulary Learning

    Authors: Xumeng Han, Longhui Wei, Xuehui Yu, Zhiyang Dou, Xin He, Kuiran Wang, Zhenjun Han, Qi Tian

    Abstract: The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model, showcasing potent zero-shot generalization and flexible prompting. Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics. In this paper, we present Sambor to seamlessly integrate SAM with the open-vocabulary object…

    Submitted 6 December, 2023; originally announced December 2023.

  43. arXiv:2312.00860  [pdf, other]

    cs.CV

    Segment Any 3D Gaussians

    Authors: Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

    Abstract: This paper presents SAGA (Segment Any 3D GAussians), a highly efficient 3D promptable segmentation method based on 3D Gaussian Splatting (3D-GS). Given 2D visual prompts as input, SAGA can segment the corresponding 3D target represented by 3D Gaussians within 4 ms. This is achieved by attaching a scale-gated affinity feature to each 3D Gaussian to endow it with a new property towards multi-granularity…

    Submitted 27 May, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Work in progress. Project page: https://jumpat.github.io/SAGA

  44. arXiv:2311.17112  [pdf, other]

    cs.CV

    Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model

    Authors: Zelin Peng, Zhengqin Xu, Zhilin Zeng, Lingxi Xie, Qi Tian, Wei Shen

    Abstract: Parameter-efficient fine-tuning (PEFT) is an effective methodology to unleash the potential of large foundation models in novel scenarios with limited training data. In the computer vision community, PEFT has shown effectiveness in image classification, but little research has studied its ability for image segmentation. Fine-tuning segmentation models usually requires a heavier adjustment of parame…

    Submitted 28 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR2024

  45. arXiv:2311.16037  [pdf, other]

    cs.CV cs.GR

    GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions

    Authors: Jiemin Fang, Junjie Wang, Xiaopeng Zhang, Lingxi Xie, Qi Tian

    Abstract: Recently, impressive results have been achieved in 3D scene editing with text instructions based on a 2D diffusion model. However, current diffusion models primarily generate images by predicting noise in the latent space, and the editing is usually applied to the whole image, which makes it challenging to perform delicate, especially localized, editing for 3D scenes. Inspired by recent 3D Gaussia…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Project page: https://GaussianEditor.github.io

  46. One-bit Supervision for Image Classification: Problem, Solution, and Beyond

    Authors: Hengtong Hu, Lingxi Xie, Xinyue Hue, Richang Hong, Qi Tian

    Abstract: This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification. Instead of training a model using the accurate label of each sample, our setting requires the model to interact with the system by predicting the class label of each sample and learn from the answer whether the guess is correct, which provides one bit (yes or no) of information. An intri…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: ACM TOMM. arXiv admin note: text overlap with arXiv:2009.06168

  47. arXiv:2311.13982  [pdf, other]

    cs.CL cs.AI

    Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions

    Authors: Shulin Cao, Jiajie Zhang, Jiaxin Shi, Xin Lv, Zijun Yao, Qi Tian, Juanzi Li, Lei Hou

    Abstract: Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning. However, they tend to generate factually incorrect reasoning steps when the required knowledge is not available or up-to-date in models' parameters. Recent works turn to retrieving external knowledge to augment CoT reasoning. Despite being promising, these chain-based…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: Accepted by EMNLP 2023

  48. arXiv:2311.13614  [pdf, other]

    cs.CV cs.AI

    HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

    Authors: Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang

    Abstract: Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks. However, the hallucinations inherent in machine-generated data, which could lead to hallucinatory outputs in MLLMs, remain under-explored. This work aims to investigate various hallucinations (i.e., objec…

    Submitted 24 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024

  49. arXiv:2311.11525  [pdf, other]

    cs.CV

    Generalized Category Discovery in Semantic Segmentation

    Authors: Zhengyuan Peng, Qijian Tian, Jianqing Xu, Yizhang Jin, Xuequan Lu, Xin Tan, Yuan Xie, Lizhuang Ma

    Abstract: This paper explores a novel setting called Generalized Category Discovery in Semantic Segmentation (GCDSS), aiming to segment unlabeled images given prior knowledge from a labeled set of base classes. The unlabeled images contain pixels of the base class or novel class. In contrast to Novel Category Discovery in Semantic Segmentation (NCDSS), there is no prerequisite for prior knowledge mandating…

    Submitted 19 November, 2023; originally announced November 2023.

  50. FlaCGEC: A Chinese Grammatical Error Correction Dataset with Fine-grained Linguistic Annotation

    Authors: Hanyue Du, Yike Zhao, Qingyuan Tian, Jiani Wang, Lei Wang, Yunshi Lan, Xuesong Lu

    Abstract: Chinese Grammatical Error Correction (CGEC) has been attracting growing attention from researchers recently. Although multiple CGEC datasets have been developed to support the research, these datasets lack the ability to provide a deep linguistic topology of grammar errors, which is critical for interpreting and diagnosing CGEC approaches. To address this limitation, we introduce…

    Submitted 26 September, 2023; originally announced November 2023.