Showing 1–50 of 200 results for author: Hu, D

  1. Market or Markets? Investigating Google Search's Market Shares Under Horizontal and Vertical Segmentation

    Authors: Desheng Hu, Muhammad Abu Bakar Aziz, Jeffrey Gleason, Alice Koeninger, Nikolas Guggenberger, Ronald E. Robertson, Christo Wilson

    Abstract: Is Google Search a monopoly with gatekeeping power? Regulators from the US, UK, and Europe have argued that it is, based on the assumption that Google Search dominates the market for horizontal (a.k.a. "general") web search. Google disputes this, claiming that competition extends to all vertical (a.k.a. "specialized") search engines, and that under this market definition it does not have monopoly p…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Extended version of Hu et al. paper that was published in Proceedings of the International AAAI Conference on Weblogs and Social Media (2024). Includes additional analysis of the horizontal search market that did not appear in the published manuscript

  2. arXiv:2407.11820  [pdf, other]

    cs.CV cs.AI

    Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation

    Authors: Juncheng Ma, Peiwen Sun, Yaoting Wang, Di Hu

    Abstract: Audio-Visual Segmentation (AVS) aims to achieve pixel-level localization of sound sources in videos, while Audio-Visual Semantic Segmentation (AVSS), as an extension of AVS, further pursues semantic understanding of audio-visual scenes. However, since the AVSS task requires the establishment of audio-visual correspondence and semantic understanding simultaneously, we observe that previous methods…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024 accepted. Project url: https://gewu-lab.github.io/stepping_stones

  3. arXiv:2407.10957  [pdf, other]

    cs.CV cs.AI

    Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

    Authors: Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu

    Abstract: Traditional reference segmentation tasks have predominantly focused on silent visual scenes, neglecting the integral role of multimodal perception and interaction in human experiences. In this work, we introduce a novel task called Reference Audio-Visual Segmentation (Ref-AVS), which seeks to segment objects within the visual domain based on expressions containing multimodal cues. Such expressions…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  4. arXiv:2407.10947  [pdf, other]

    cs.CV

    Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

    Authors: Yaoting Wang, Peiwen Sun, Yuanchao Li, Honggang Zhang, Di Hu

    Abstract: The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in the visual space using audio cues. However, in this work, it is recognized that previous AVS methods show a heavy reliance on detrimental segmentation preferences related to audible objects, rather than precise audio guidance. We argue that the primary reason is that audio lacks robust semantics compared to vision, especi…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  5. arXiv:2407.09894  [pdf, other]

    cs.SI cs.AI cs.CL

    Transferring Structure Knowledge: A New Task to Fake news Detection Towards Cold-Start Propagation

    Authors: Lingwei Wei, Dou Hu, Wei Zhou, Songlin Hu

    Abstract: Many fake news detection studies have achieved promising performance by extracting effective semantic and structure features from both content and propagation trees. However, it is challenging to apply them to practical situations, especially when using the trained propagation-based models to detect news with no propagation data. Towards this scenario, we study a new task named cold-start fake new…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: ICASSP 2024

  6. arXiv:2407.09705  [pdf, other]

    cs.CV cs.AI cs.MM

    Diagnosing and Re-learning for Balanced Multimodal Learning

    Authors: Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu

    Abstract: To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal performance discrepancy as the basis. However, the intrinsic limitation of modality capacity is ignored. The scarcely informative modalities can be recognized as ``…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  7. arXiv:2407.00167  [pdf, other]

    cs.CL cs.AI cs.ET cs.HC cs.SI

    Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach

    Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Wyatt Bellamy, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

    Abstract: In recent years, the United States has witnessed a significant surge in the popularity of vaping or e-cigarette use, leading to a notable rise in cases of e-cigarette and vaping use-associated lung injury (EVALI) that caused hospitalizations and fatalities during the EVALI outbreak in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. Due…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted for the AI Applications in Public Health and Social Services workshop at the 22nd International Conference on Artificial Intelligence in Medicine (AIME 2024)

  8. arXiv:2406.19853  [pdf, other]

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi…

    Submitted 28 June, 2024; originally announced June 2024.

  9. arXiv:2406.13272  [pdf, other]

    cs.CV

    AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models

    Authors: Ken Chen, Sachith Seneviratne, Wei Wang, Dongting Hu, Sanjay Saha, Md. Tarek Hasan, Sanka Rasnayaka, Tamasha Malepathirana, Mingming Gong, Saman Halgamuge

    Abstract: Face reenactment refers to the process of transferring the pose and facial expressions from a reference (driving) video onto a static facial (source) image while maintaining the original identity of the source image. Previous research in this domain has made significant progress by training controllable deep generative models to generate faces based on specific identity, pose and expression condit…

    Submitted 19 June, 2024; originally announced June 2024.

  10. arXiv:2406.05513  [pdf, ps, other]

    cs.CV

    A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+

    Authors: Jianzhao Wang, Yanyan Wei, Dehua Hu, Yilin Zhang, Shengeng Tang, Kun Li, Zhao Zhang

    Abstract: This technical report presents our team's solution for the WeatherProof Dataset Challenge: Semantic Segmentation in Adverse Weather at CVPR'24 UG2+. We propose a two-stage deep learning framework for this task. In the first stage, we preprocess the provided dataset by concatenating images into video sequences. Subsequently, we leverage a low-rank video deraining method to generate high-fidelity ps…

    Submitted 10 July, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  11. arXiv:2406.05510  [pdf, other]

    cs.LG cs.CL

    Representation Learning with Conditional Information Flow Maximization

    Authors: Dou Hu, Lingwei Wei, Wei Zhou, Songlin Hu

    Abstract: This paper proposes an information-theoretic representation learning framework, named conditional information flow maximization, to extract noise-invariant sufficient representations for the input data and target task. It encourages the learned representations to have good feature uniformity and sufficient predictive ability, which can enhance the generalization of pre-trained language models (PLMs) fo…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 16 pages, accepted to ACL 2024 (main conference)

  12. arXiv:2406.02827  [pdf, other]

    cs.LG cs.AI

    Stochastic Diffusion: A Diffusion Probabilistic Model for Stochastic Time Series Forecasting

    Authors: Yuansan Liu, Sudanthi Wijewickrema, Dongting Hu, Christofer Bester, Stephen O'Leary, James Bailey

    Abstract: Recent innovations in diffusion probabilistic models have paved the way for significant progress in image, text and audio generation, leading to their applications in generative time series forecasting. However, leveraging such abilities to model highly stochastic time series data remains a challenge. In this paper, we propose a novel Stochastic Diffusion (StochDiff) model which learns data-driven…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 15 pages, 4 figures

  13. arXiv:2406.02134  [pdf, other]

    cs.CL

    The current status of large language models in summarizing radiology report impressions

    Authors: Danqing Hu, Shanyuan Zhang, Qing Liu, Xiaofeng Zhu, Bing Liu

    Abstract: Large language models (LLMs) like ChatGPT show excellent capabilities in various natural language processing tasks, especially for text generation. The effectiveness of LLMs in summarizing radiology report impressions remains unclear. In this study, we explore the capability of eight LLMs on the radiology report impression summarization. Three types of radiology reports, i.e., CT, PET-CT, and Ultr…

    Submitted 4 June, 2024; originally announced June 2024.

  14. arXiv:2406.01767  [pdf, other]

    cs.RO

    Region-aware Grasp Framework with Normalized Grasp Space for 6-DoF Grasping in Cluttered Scene

    Authors: Siang Chen, Pengwei Xie, Wei Tang, Dingchang Hu, Guijin Wang

    Abstract: Regional geometric information is crucial for determining grasp poses. A series of region-based methods succeed in extracting regional features and enhancing grasp detection quality. However, faced with a cluttered scene with multiple objects and potential collision, the definition of the grasp-relevant region remains inconsistent among methods, and the relationship between grasps and regional spa…

    Submitted 3 June, 2024; originally announced June 2024.

  15. arXiv:2406.00439  [pdf, other]

    cs.RO cs.CV

    Learning Manipulation by Predicting Interaction

    Authors: Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

    Abstract: Representation learning approaches for robotic manipulation have boomed in recent years. Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large-scale human video datasets to extract generalizable features for visuomotor policy learning. Despite the progress achieved, prior endeavors disregard the interactive dynamics that capture behavior patterns and physical…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to RSS 2024. Project page: https://github.com/OpenDriveLab/MPI

  16. arXiv:2405.17730  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

    Authors: Yake Wei, Di Hu

    Abstract: Multimodal learning methods with targeted unimodal learning objectives have exhibited their superior efficacy in alleviating the imbalanced multimodal learning problem. However, in this paper, we identify the previously ignored gradient conflict between multimodal and unimodal learning objectives, potentially misleading the unimodal encoder optimization. To diminish these conflicts, we observ…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML2024

  17. arXiv:2405.14334  [pdf, other]

    cs.CV

    Hierarchical Salient Patch Identification for Interpretable Fundus Disease Localization

    Authors: Yitao Peng, Lianghua He, Die Hu

    Abstract: With the widespread application of deep learning technology in medical image analysis, how to effectively explain model decisions and improve diagnosis accuracy has become an urgent problem that needs to be solved. Attribution methods have become a key tool to help doctors better understand the diagnostic basis of models, and they are used to explain and localize diseases in medical images. Howeve…

    Submitted 23 May, 2024; originally announced May 2024.

  18. arXiv:2405.07027  [pdf, other]

    cs.CV cs.AI cs.RO

    TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization

    Authors: Zhen Tan, Zongtan Zhou, Yangbing Ge, Zi Wang, Xieyuanli Chen, Dewen Hu

    Abstract: The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. The existing method introduces monocular depth priors to jointly optimize the camera poses and NeRF, which fails to fully exploit the depth priors and neglects the impact of their inherent noise. In this paper, we propose Truncate…

    Submitted 11 May, 2024; originally announced May 2024.

  19. arXiv:2404.18947  [pdf, other]

    cs.LG cs.AI

    Multimodal Fusion on Low-quality Data: A Comprehensive Survey

    Authors: Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang

    Abstract: Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, which has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical diagnosis. However, the reliability of multimodal fusion remains largely unexplored especially under low-quality data settings. This paper surveys the common challenges…

    Submitted 5 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Feel free to comment on our manuscript: qingyangzhang@tju.edu.cn

  20. arXiv:2404.17607  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG cs.SI

    Utilizing Large Language Models to Identify Reddit Users Considering Vaping Cessation for Digital Interventions

    Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Caleb Henry, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

    Abstract: The widespread adoption of social media platforms globally not only enhances users' connectivity and communication but also emerges as a vital channel for the dissemination of health-related information, thereby establishing social media data as an invaluable organic data resource for public health research. The surge in popularity of vaping or e-cigarette use in the United States and other countr…

    Submitted 25 April, 2024; originally announced April 2024.

  21. arXiv:2404.16292  [pdf, other]

    cs.GR cs.CV cs.LG

    One Noise to Rule Them All: Learning a Unified Model of Spatially-Varying Noise Patterns

    Authors: Arman Maesumi, Dylan Hu, Krishi Saripalli, Vladimir G. Kim, Matthew Fisher, Sören Pirk, Daniel Ritchie

    Abstract: Procedural noise is a fundamental component of computer graphics pipelines, offering a flexible way to generate textures that exhibit "natural" random variation. Many different types of noise exist, each produced by a separate algorithm. In this paper, we present a single generative model which can learn to generate multiple types of noise as well as blend between them. In addition, it is capable…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: In ACM Transactions on Graphics (Proceedings of SIGGRAPH) 2024, 21 pages

  22. arXiv:2404.15028  [pdf, other]

    cs.CV

    PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts

    Authors: Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

    Abstract: In this paper, we present PRISM, a Promptable and Robust Interactive Segmentation Model, aiming for precise segmentation of 3D medical images. PRISM accepts various visual inputs, including points, boxes, and scribbles as sparse prompts, as well as masks as dense prompts. Specifically, PRISM is designed with four principles to achieve robustness: (1) Iterative learning. The model produces segmenta…

    Submitted 23 April, 2024; originally announced April 2024.

  23. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  24. arXiv:2404.00362  [pdf, other]

    cs.CV eess.IV

    STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario

    Authors: Renyang Liu, Kwok-Yan Lam, Wei Zhou, Sixing Wu, Jun Zhao, Dongting Hu, Mingming Gong

    Abstract: Many attack techniques have been proposed to explore the vulnerability of DNNs and further help to improve their robustness. Despite the significant progress made recently, existing black-box attack methods still suffer from unsatisfactory performance due to the vast number of queries needed to optimize desired perturbations. Besides, the other critical challenge is that adversarial examples built…

    Submitted 30 March, 2024; originally announced April 2024.

  25. arXiv:2403.19924  [pdf, other]

    cs.CV

    SceneTracker: Long-term Scene Flow Estimation Network

    Authors: Bo Wang, Jian Li, Yang Yu, Li Liu, Zhenping Sun, Dewen Hu

    Abstract: Considering the complementarity of scene flow estimation in the spatial domain's focusing capability and 3D object tracking in the temporal domain's coherence, this study aims to address a comprehensive new task that can simultaneously capture fine-grained and long-term 3D motion in an online manner: long-term scene flow estimation (LSFE). We introduce SceneTracker, a novel learning-based LSFE net…

    Submitted 6 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  26. arXiv:2403.15054  [pdf, other]

    cs.RO

    Rethinking 6-Dof Grasp Detection: A Flexible Framework for High-Quality Grasping

    Authors: Wei Tang, Siang Chen, Pengwei Xie, Dingchang Hu, Wenming Yang, Guijin Wang

    Abstract: Robotic grasping is a primitive skill for complex tasks and is fundamental to intelligence. For general 6-Dof grasping, most previous methods directly extract scene-level semantic or geometric information, while few of them consider the suitability for various downstream applications, such as target-oriented grasping. Addressing this issue, we rethink 6-Dof grasp detection from a grasp-centric vie…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 8 pages, 8 figures

  27. arXiv:2403.10044  [pdf, other]

    cs.CV

    SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

    Authors: Tao Wu, Xuewei Li, Zhongang Qi, Di Hu, Xintao Wang, Ying Shan, Xi Li

    Abstract: Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains. However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, resulting in low-quality content generation. In this paper, we introduce a novel framework of SphereDiffusion to address these unique challenges, for better generating high-…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI2024

  28. Pre-trained Transformer-Enabled Strategies with Human-Guided Fine-Tuning for End-to-end Navigation of Autonomous Vehicles

    Authors: Dong Hu, Chao Huang, Jingda Wu, Hongbo Gao

    Abstract: Autonomous driving (AD) technology, leveraging artificial intelligence, strives for vehicle automation. End-to-end strategies, emerging to simplify traditional driving systems by integrating perception, decision-making, and control, offer new avenues for advanced driving functionalities. Despite their potential, current challenges include data efficiency, training complexities, and poor generalizat…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 11 pages, 7 figures, references added

  29. arXiv:2402.08581  [pdf, other]

    cs.CL

    Improving Factual Error Correction for Abstractive Summarization via Data Distillation and Conditional-generation Cloze

    Authors: Yiyang Li, Lei Li, Dingxin Hu, Xueyi Hao, Marina Litvak, Natalia Vanetik, Yanquan Zhou

    Abstract: Improving factual consistency in abstractive summarization has been a focus of current research. One promising approach is the post-editing method. However, previous works have yet to make sufficient use of factual factors in summaries and suffer from the negative effect of the training datasets. In this paper, we first propose a novel factual error correction model FactCloze based on a condition…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: manuscript

  30. arXiv:2402.06244  [pdf, other]

    cs.CV cs.MM

    Quantifying and Enhancing Multi-modal Robustness with Modality Preference

    Authors: Zequn Yang, Yake Wei, Ce Liang, Di Hu

    Abstract: Multi-modal models have shown a promising capability to effectively integrate information from various sources, yet meanwhile, they are found vulnerable to pervasive perturbations, such as uni-modal attacks and missing conditions. To counter these perturbations, robust multi-modal representations are highly expected, which are positioned well away from the discriminative multi-modal decision bound…

    Submitted 18 April, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: Accepted to ICLR 2024

  31. arXiv:2312.13933  [pdf, other]

    cs.CL cs.LG

    Structured Probabilistic Coding

    Authors: Dou Hu, Lingwei Wei, Yaxin Liu, Wei Zhou, Songlin Hu

    Abstract: This paper presents a new supervised representation learning framework, namely structured probabilistic coding (SPC), to learn compact and informative representations from input related to the target task. SPC is an encoder-only probabilistic coding technology with a structured regularization from the target space. It can enhance the generalization ability of pre-trained language models for better…

    Submitted 2 May, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 11 pages, accepted by AAAI 2024 (Oral)

  32. arXiv:2312.10115  [pdf, other]

    cs.CV

    SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery

    Authors: Xin Guo, Jiangwei Lao, Bo Dang, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingxiang Hu, Huimei He, Jian Wang, Jingdong Chen, Ming Yang, Yongjun Zhang, Yansheng Li

    Abstract: Prior studies on Remote Sensing Foundation Model (RSFM) reveal immense potential towards a generic model for Earth Observation. Nevertheless, these works primarily focus on a single modality without temporal and geo-context modeling, hampering their capabilities for diverse tasks. In this study, we present SkySense, a generic billion-scale model, pre-trained on a curated multi-modal Remote Sensing…

    Submitted 22 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR2024

  33. arXiv:2312.08718  [pdf, other]

    cs.RO

    Trajectory Planning and Tracking of Hybrid Flying-Crawling Quadrotors

    Authors: Dongnan Hu, Ruihao Xia, Xin Jin, Yang Tang

    Abstract: Hybrid Flying-Crawling Quadrotors (HyFCQs) are transformable robots with the ability of terrestrial and aerial hybrid motion. This article presents a trajectory planning and tracking framework designed for HyFCQs. In this framework, a terrestrial-aerial path-searching method with the crawling limitation of HyFCQs is proposed to guarantee the dynamical feasibility of trajectories. Additionally, a t…

    Submitted 14 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  34. arXiv:2312.02358  [pdf, other]

    cs.HC cs.AI

    Peer attention enhances student learning

    Authors: Songlin Xu, Dongyin Hu, Ru Wang, Xinyu Zhang

    Abstract: Human visual attention is susceptible to social influences. In education, peer effects impact student learning, but their precise role in modulating attention remains unclear. Our experiment (N=311) demonstrates that displaying peer visual attention regions when students watch online course videos enhances their focus and engagement. However, students retain adaptability in following peer attentio…

    Submitted 4 December, 2023; originally announced December 2023.

  35. arXiv:2312.01871  [pdf, other]

    cs.CV

    FeaInfNet: Diagnosis in Medical Image with Feature-Driven Inference and Visual Explanations

    Authors: Yitao Peng, Lianghua He, Die Hu, Yihang Liu, Longzhen Yang, Shaohua Shang

    Abstract: Interpretable deep learning models have received widespread attention in the field of image recognition. Due to the unique multi-instance learning of medical images and the difficulty in identifying decision-making regions, many interpretability models that have been proposed still have problems of insufficient accuracy and interpretability in medical image disease diagnosis. To solve these proble…

    Submitted 4 December, 2023; originally announced December 2023.

  36. arXiv:2311.17104  [pdf, other]

    cs.LG cs.AI q-bio.MN

    Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types

    Authors: Dayu Hu, Ke Liang, Hao Yu, Xinwang Liu

    Abstract: In recent years, the field of single-cell data analysis has seen a marked advancement in the development of clustering methods. Despite advancements, most of these algorithms still concentrate on analyzing the provided single-cell matrix data. However, in medical applications, single-cell data often involves a wealth of exogenous information, including gene networks. Overlooking this aspect could…

    Submitted 15 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  37. arXiv:2311.17103  [pdf, other]

    q-bio.GN cs.AI cs.LG

    Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters

    Authors: Dayu Hu, Zhibin Dong, Ke Liang, Jun Wang, Siwei Wang, Xinwang Liu

    Abstract: Single-cell multi-view clustering enables the exploration of cellular heterogeneity within the same cell from different views. Despite the development of several multi-view clustering methods, two primary challenges persist. Firstly, most existing methods treat the information from both single-cell RNA (scRNA) and single-cell Assay of Transposase Accessible Chromatin (scATAC) views as equally sign…

    Submitted 28 November, 2023; originally announced November 2023.

  38. arXiv:2311.13052  [pdf, other]

    eess.IV cs.CV cs.LG

    Novel OCT mosaicking pipeline with Feature- and Pixel-based registration

    Authors: Jiacheng Wang, Hao Li, Dewei Hu, Yuankai K. Tao, Ipek Oguz

    Abstract: High-resolution Optical Coherence Tomography (OCT) images are crucial for ophthalmology studies but are limited by their relatively narrow field of view (FoV). Image mosaicking is a technique for aligning multiple overlapping images to obtain a larger FoV. Current mosaicking pipelines often struggle with substantial noise and considerable displacement between the input sub-fields. In this paper, w…

    Submitted 21 November, 2023; originally announced November 2023.

  39. arXiv:2311.07871  [pdf, other]

    cs.CV

    Dual-channel Prototype Network for few-shot Classification of Pathological Images

    Authors: Hao Quan, Xinjia Li, Dayu Hu, Tianhang Nan, Xiaoyu Cui

    Abstract: In pathology, the rarity of certain diseases and the complexity in annotating pathological images significantly hinder the creation of extensive, high-quality datasets. This limitation impedes the progress of deep learning-assisted diagnostic systems in pathology. Consequently, it becomes imperative to devise a technology that can discern new disease categories from a minimal number of annotated e…

    Submitted 13 November, 2023; originally announced November 2023.

  40. arXiv:2311.07806  [pdf, other]

    cs.CV

    Assessing Test-time Variability for Interactive 3D Medical Image Segmentation with Diverse Point Prompts

    Authors: Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

    Abstract: Interactive segmentation models leverage prompts from users to produce robust segmentation. This advancement is facilitated by prompt engineering, where interactive prompts serve as strong priors during test-time. However, this is an inherently subjective and hard-to-reproduce process. The variability in user expertise and inherently ambiguous boundaries in medical images can lead to inconsistent…

    Submitted 13 November, 2023; originally announced November 2023.

  41. arXiv:2311.05185  [pdf, other]

    cs.LG cs.AI

    Mixture of Weak & Strong Experts on Graphs

    Authors: Hanqing Zeng, Hanjia Lyu, Diyi Hu, Yinglong Xia, Jiebo Luo

    Abstract: Realistic graphs contain both (1) rich self-features of nodes and (2) informative structures of neighborhoods, jointly handled by a Graph Neural Network (GNN) in the typical setup. We propose to decouple the two modalities by Mixture of weak and strong experts (Mowst), where the weak expert is a light-weight Multi-layer Perceptron (MLP), and the strong expert is an off-the-shelf GNN. To adapt the…

    Submitted 22 June, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: Accepted for publication in ICLR 2024

  42. arXiv:2311.02847  [pdf, other]

    cs.RO cs.AI

    Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs

    Authors: Wenke Xia, Dong Wang, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li

    Abstract: Generalizable articulated object manipulation is essential for home-assistant robots. Recent efforts focus on imitation learning from demonstrations or reinforcement learning in simulation, however, due to the prohibitive costs of real-world data collection and precise object simulation, it still remains challenging for these works to achieve broad adaptability across diverse articulated objects.…

    Submitted 20 February, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: Accepted by ICRA 2024

  43. arXiv:2310.19721  [pdf, other]

    eess.IV cs.CV

    Promise: Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models

    Authors: Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

    Abstract: To address prevalent issues in medical imaging, such as data acquisition challenges and label availability, transfer learning from natural to medical image domains serves as a viable strategy to produce reliable segmentation results. However, several existing barriers between domains need to be broken down, including addressing contrast discrepancies, managing anatomical variability, and adapting…

    Submitted 13 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: updated acknowledgments and fixed typos

  44. arXiv:2309.16249  [pdf, other]

    cs.CV

    FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding

    Authors: Pengxiang Wu, Siman Wang, Kevin Dela Rosa, Derek Hao Hu

    Abstract: Image retrieval is a fundamental task in computer vision. Despite recent advances in this field, many techniques have been evaluated on a limited number of domains, with a small number of instance categories. Notably, most existing works only consider domains like 3D landmarks, making it difficult to generalize the conclusions made by these works to other domains, e.g., logo and other 2D flat obje…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  45. arXiv:2309.11845  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification

    Authors: Meng Liu, Ke Liang, Dayu Hu, Hao Yu, Yue Liu, Lingyuan Meng, Wenxuan Tu, Sihang Zhou, Xinwang Liu

    Abstract: Audiovisual data is everywhere in this digital age, which raises higher requirements for the deep learning models developed on them. To well handle the information of the multi-modal data is the key to a better audiovisual modal. We observe that these audiovisual data naturally have temporal attributes, such as the time information for each frame in the video. More concretely, such data is inheren…

    Submitted 26 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: This work has been accepted by ACM MM 2023 for publication

  46. arXiv:2309.07929  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer

    Authors: Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li

    Abstract: Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio? In this work, we concentrate on the Audio-Visual Localization and Segmentation tasks but under the demanding zero-shot and few-shot scenarios. To achieve this goal, different from existing approaches that mostly employ the encoder-fusion-decoder paradigm…

    Submitted 2 February, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by AAAI 2024

  47. arXiv:2309.06255  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Enhancing multimodal cooperation via sample-level modality valuation

    Authors: Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu

    Abstract: One primary topic of multimodal learning is to jointly incorporate heterogeneous information from different modalities. However most models often suffer from unsatisfactory multimodal cooperation which cannot jointly utilize all modalities well. Some methods are proposed to identify and enhance the worse learnt modality but they are often hard to provide the fine-grained observation of multimodal…

    Submitted 13 June, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: Accepted by CVPR 2024

  48. Zero-shot information extraction from radiological reports using ChatGPT

    Authors: Danqing Hu, Bing Liu, Xiaofeng Zhu, Xudong Lu, Nan Wu

    Abstract: Electronic health records contain an enormous amount of valuable information, but many are recorded in free text. Information extraction is the strategy to transform the sequence of characters into structured data, which can be employed for secondary analysis. However, the traditional information extraction components, such as named entity recognition and relation extraction, require annotated dat…

    Submitted 6 September, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

  49. arXiv:2309.01286  [pdf, other]

    cs.CV

    MAP: Domain Generalization via Meta-Learning on Anatomy-Consistent Pseudo-Modalities

    Authors: Dewei Hu, Hao Li, Han Liu, Xing Yao, Jiacheng Wang, Ipek Oguz

    Abstract: Deep models suffer from limited generalization capability to unseen domains, which has severely hindered their clinical applicability. Specifically for the retinal vessel segmentation task, although the model is supposed to learn the anatomy of the target, it can be distracted by confounding factors like intensity and contrast. We propose Meta learning on Anatomy-consistent Pseudo-modalities (MAP)…

    Submitted 3 September, 2023; originally announced September 2023.

  50. arXiv:2308.16383  [pdf, other]

    cs.CV cs.MM

    Separate and Locate: Rethink the Text in Text-based Visual Question Answering

    Authors: Chengyang Fang, Jiangnan Li, Liang Li, Can Ma, Dayong Hu

    Abstract: Text-based Visual Question Answering (TextVQA) aims at answering questions about the text in images. Most works in this field focus on designing network structures or pre-training tasks. All these methods list the OCR texts in reading order (from left to right and top to bottom) to form a sequence, which is treated as a natural language "sentence". However, they ignore the fact that most OCR wor…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023