
Showing 1–50 of 72 results for author: Dai, P

  1. arXiv:2407.04237  [pdf, other]

    cs.CV cs.GR

    GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction

    Authors: Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofeng Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng

    Abstract: We present GSD, a diffusion model approach based on Gaussian Splatting (GS) representation for 3D object reconstruction from a single view. Prior works suffer from inconsistent 3D geometry or mediocre rendering quality due to improper representations. We take a step towards resolving these shortcomings by utilizing the recent state-of-the-art 3D explicit representation, Gaussian Splatting, and an…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted for ECCV 2024

  2. arXiv:2407.00367  [pdf, other]

    cs.CV

    SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix

    Authors: Peng Dai, Feitong Tan, Qiangeng Xu, David Futschik, Ruofei Du, Sean Fanello, Xiaojuan Qi, Yinda Zhang

    Abstract: Video generation models have demonstrated great capabilities of producing impressive monocular videos; however, the generation of 3D stereoscopic video remains under-explored. We propose a pose-free and training-free approach for generating 3D stereoscopic videos using an off-the-shelf monocular video generation model. Our method warps a generated monocular video into camera views on stereoscopic…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 3D stereoscopic video generation, video diffusion, inpainting

  3. arXiv:2406.10569  [pdf, other]

    cs.LG cs.CV

    MDA: An Interpretable Multi-Modal Fusion with Missing Modalities and Intrinsic Noise

    Authors: Lin Fan, Yafei Ou, Cenyang Zheng, Pengyu Dai, Tamotsu Kamishima, Masayuki Ikebe, Kenji Suzuki, Xun Gong

    Abstract: Multi-modal fusion is crucial in medical data research, enabling a comprehensive understanding of diseases and improving diagnostic performance by combining diverse modalities. However, multi-modal fusion faces challenges, including capturing interactions between modalities, addressing missing modalities, handling erroneous modal information, and ensuring interpretability. Many existing researcher…

    Submitted 15 June, 2024; originally announced June 2024.

    ACM Class: I.5.2; I.2.7; I.2.10; J.3

  4. arXiv:2404.17774  [pdf, other]

    cs.CV cs.GR

    High-quality Surface Reconstruction using Gaussian Surfels

    Authors: Pinxuan Dai, Jiamin Xu, Wenxiang Xie, Xinguo Liu, Huamin Wang, Weiwei Xu

    Abstract: We propose a novel point-based representation, Gaussian surfels, to combine the advantages of the flexible optimization procedure in 3D Gaussian points and the surface alignment property of surfels. This is achieved by directly setting the z-scale of 3D Gaussian points to 0, effectively flattening the original 3D ellipsoid into a 2D ellipse. Such a design provides clear guidance to the optimizer.…

    Submitted 29 April, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Results added and improved
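    The flattening step in this abstract (setting the z-scale of a 3D Gaussian to 0) can be illustrated with a minimal pure-Python sketch. It assumes the covariance parameterization commonly used in 3D Gaussian Splatting, Sigma = R S S^T R^T with R built from a unit quaternion and S = diag(s_x, s_y, s_z); all function names are hypothetical and are not taken from the paper's code. With s_z = 0 the covariance drops to rank 2, and the rotated local z-axis (third column of R) has zero extent, so it can serve as the surfel normal.

```python
import math

# Hypothetical helpers illustrating the 3D Gaussian Splatting covariance
# Sigma = R S S^T R^T; not taken from the paper's implementation.

def quat_to_rot(w, x, y, z):
    """3x3 rotation matrix from a quaternion (normalized first)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(R, scales):
    """Sigma = (R S)(R S)^T with S = diag(scales)."""
    M = [[R[i][j] * scales[j] for j in range(3)] for i in range(3)]  # R S
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(A):
    """Determinant of a 3x3 matrix (cofactor expansion along row 0)."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def matvec(A, v):
    """Matrix-vector product for 3x3 A and length-3 v."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def surfel_normal(R):
    """Rotated local z-axis (third column of R): the flattened direction."""
    return [R[0][2], R[1][2], R[2][2]]

R = quat_to_rot(0.9, 0.1, 0.3, 0.2)
ellipsoid = covariance(R, [0.5, 0.3, 0.2])  # ordinary 3D Gaussian
surfel = covariance(R, [0.5, 0.3, 0.0])     # z-scale set to 0
normal = surfel_normal(R)

print(det3(ellipsoid))         # (0.5 * 0.3 * 0.2)**2 = 0.0009, up to rounding
print(det3(surfel))            # ~0: the covariance is rank 2 (a flat ellipse)
print(matvec(surfel, normal))  # ~[0, 0, 0]: no extent along the normal
```

    Because R^T applied to the normal gives the local z-axis and S S^T scales that axis by s_z^2 = 0, the product Sigma times the normal vanishes exactly in exact arithmetic; the printed values differ from zero only by floating-point rounding.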

  5. arXiv:2404.08886  [pdf, other]

    cs.CV cs.AI cs.CL cs.IR cs.LG

    EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM

    Authors: Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea

    Abstract: In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. To…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted by NAACL 2024 Industry Track

  6. arXiv:2403.19314  [pdf, other]

    cs.CV

    Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction

    Authors: Xiaoyang Lyu, Chirui Chang, Peng Dai, Yang-Tian Sun, Xiaojuan Qi

    Abstract: Scene reconstruction from multi-view images is a fundamental problem in computer vision and graphics. Recent neural implicit surface reconstruction methods have achieved high-quality results; however, editing and manipulating the 3D geometry of reconstructed scenes remains challenging due to the absence of naturally decomposed object entities and complex object/background compositions. In this pap…

    Submitted 30 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, accepted by CVPR 2024

  7. arXiv:2403.03561  [pdf, ps, other]

    cs.CV

    HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

    Authors: Peng Dai, Yang Zhang, Tao Liu, Zhen Fan, Tianyuan Du, Zhuo Su, Xiaozheng Zheng, Zeming Li

    Abstract: It is especially challenging to achieve real-time human motion tracking on a standalone VR Head-Mounted Display (HMD) such as Meta Quest and PICO. In this paper, we propose HMD-Poser, the first unified approach to recover full-body motions using scalable sparse observations from HMD and body-worn IMUs. In particular, it can support a variety of input scenarios, such as HMD, HMD+2IMUs, HMD+3IMUs, e…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: CVPR2024 Accepted

  8. arXiv:2402.04587  [pdf, other]

    cs.CV

    Sparse Anatomical Prompt Semi-Supervised Learning with Masked Image Modeling for CBCT Tooth Segmentation

    Authors: Pengyu Dai, Yafei Ou, Yang Liu, Yue Zhao

    Abstract: Accurate tooth identification and segmentation in Cone Beam Computed Tomography (CBCT) dental images can significantly enhance the efficiency and precision of manual diagnoses performed by dentists. However, existing segmentation methods are mainly developed based on large data volumes training, on which their annotations are extremely time-consuming. Meanwhile, the teeth of each class in CBCT den…

    Submitted 7 February, 2024; originally announced February 2024.

    ACM Class: I.4.6

  9. arXiv:2401.13505  [pdf, other]

    cs.CV

    Generative Human Motion Stylization in Latent Space

    Authors: Chuan Guo, Yuxuan Mu, Xinxin Zuo, Peng Dai, Youliang Yan, Juwei Lu, Li Cheng

    Abstract: Human motion stylization aims to revise the style of an input motion while keeping its content unaltered. Unlike existing works that operate directly in pose space, we leverage the latent space of pretrained autoencoders as a more expressive and robust representation for motion extraction and infusion. Building upon this, we present a novel generative model that produces diverse stylization result…

    Submitted 23 February, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted for ICLR2024

  10. arXiv:2401.11949  [pdf, other]

    cs.CV

    Feature Denoising Diffusion Model for Blind Image Quality Assessment

    Authors: Xudong Li, Jingyuan Zheng, Runze Hu, Yan Zhang, Ke Li, Yunhang Shen, Xiawu Zheng, Yutao Liu, ShengChuan Zhang, Pingyang Dai, Rongrong Ji

    Abstract: Blind Image Quality Assessment (BIQA) aims to evaluate image quality in line with human perception, without reference benchmarks. Currently, deep learning BIQA methods typically depend on using features from high-level tasks for transfer learning. However, the inherent differences between BIQA and these high-level tasks inevitably introduce noise into the quality-aware features. In this paper, we…

    Submitted 22 January, 2024; originally announced January 2024.

  11. arXiv:2401.05750  [pdf, other]

    cs.CV

    GO-NeRF: Generating Virtual Objects in Neural Radiance Fields

    Authors: Peng Dai, Feitong Tan, Xin Yu, Yinda Zhang, Xiaojuan Qi

    Abstract: Despite advances in 3D generation, the direct creation of 3D objects within an existing 3D scene represented as NeRF remains underexplored. This process requires not only high-quality 3D object generation but also seamless composition of the generated 3D content into the existing NeRF. To this end, we propose a new method, GO-NeRF, capable of utilizing scene context for high-quality and harmonious…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 12 pages

    MSC Class: ACM-class

  12. arXiv:2401.02173  [pdf, ps, other]

    cs.CV cs.AI

    Prompt Decoupling for Text-to-Image Person Re-identification

    Authors: Weihao Li, Lei Tan, Pingyang Dai, Yan Zhang

    Abstract: Text-to-image person re-identification (TIReID) aims to retrieve the target person from an image gallery via a textual description query. Recently, pre-trained vision-language models like CLIP have attracted significant attention and have been widely utilized for this task due to their robust capacity for semantic concept learning and rich multi-modal knowledge. However, recent CLIP-based TIReID m…

    Submitted 4 January, 2024; originally announced January 2024.

  13. arXiv:2312.09262  [pdf, other]

    cs.LG cs.AR

    Random resistive memory-based deep extreme point learning machine for unified visual processing

    Authors: Shaocong Wang, Yizhao Gao, Yi Li, Woyu Zhang, Yifei Yu, Bo Wang, Ning Lin, Hegan Chen, Yue Zhang, Yang Jiang, Dingchen Wang, Jia Chen, Peng Dai, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Xiaoxin Xu, Hayden So, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Visual sensors, including 3D LiDAR, neuromorphic DVS sensors, and conventional frame cameras, are increasingly integrated into edge-side intelligent machines. Realizing intensive multi-sensory data analysis directly on edge intelligent machines is crucial for numerous emerging edge applications, such as augmented and virtual reality and unmanned aerial vehicles, which necessitates unified data rep…

    Submitted 14 December, 2023; originally announced December 2023.

  14. arXiv:2312.06158  [pdf, other]

    cs.CV

    Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

    Authors: Xudong Li, Timin Gao, Runze Hu, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Jingyuan Zheng, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Rongrong Ji

    Abstract: The current state-of-the-art No-Reference Image Quality Assessment (NR-IQA) methods typically rely on feature extraction from upstream semantic backbone networks, assuming that all extracted features are relevant. However, we make a key observation that not all features are beneficial, and some may even be harmful, necessitating careful selection. Empirically, we find that many image pairs with sm…

    Submitted 26 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  15. arXiv:2312.00591  [pdf, other]

    cs.CV cs.AI

    Less is More: Learning Reference Knowledge Using No-Reference Image Quality Assessment

    Authors: Xudong Li, Jingyuan Zheng, Xiawu Zheng, Runze Hu, Enwei Zhang, Yuting Gao, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Yan Zhang, Rongrong Ji

    Abstract: Image Quality Assessment (IQA) with reference images has achieved great success by imitating the human vision system, in which the image quality is effectively assessed by comparing the query image with its pristine reference image. However, for images in the wild, it is quite difficult to access accurate reference images. We argue that it is possible to learn reference knowledge under the No…

    Submitted 1 December, 2023; originally announced December 2023.

  16. arXiv:2308.10490  [pdf, other]

    cs.CV cs.AI cs.GR

    Texture Generation on 3D Meshes with Point-UV Diffusion

    Authors: Xin Yu, Peng Dai, Wenbo Li, Lan Ma, Zhengzhe Liu, Xiaojuan Qi

    Abstract: In this work, we focus on synthesizing high-quality textures on 3D meshes. We present Point-UV diffusion, a coarse-to-fine pipeline that marries the denoising diffusion model with UV mapping to generate 3D consistent and high-quality texture images in UV space. We start with introducing a point diffusion model to synthesize low-frequency texture components with our tailored style guidance to tackl…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023, Oral

  17. arXiv:2306.15706  [pdf, other]

    cs.CV

    Approximated Prompt Tuning for Vision-Language Pre-trained Models

    Authors: Qiong Wu, Shubin Huang, Yiyi Zhou, Pingyang Dai, Annan Shu, Guannan Jiang, Rongrong Ji

    Abstract: Prompt tuning is a parameter-efficient way to deploy large-scale pre-trained models to downstream tasks by adding task-specific tokens. In terms of vision-language pre-trained (VLP) models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks, which greatly exacerbates the already high computational overhead. In this paper,…

    Submitted 21 August, 2023; v1 submitted 27 June, 2023; originally announced June 2023.
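    One way to see why many learnable prompt tokens add overhead: in a minimal sketch assuming standard full self-attention, prepending k prompt tokens to n input tokens grows each layer's query-key score matrix from n^2 to (n + k)^2 entries. The function names and sizes below are hypothetical illustrations; the approximation technique the paper proposes is not shown.

```python
# Hypothetical sketch: prompt tuning prepends k learnable prompt vectors to
# the n input token embeddings, while the pre-trained backbone stays frozen.

def prepend_prompts(prompts, tokens):
    """Return the extended sequence [p_1..p_k, x_1..x_n]."""
    return [list(p) for p in prompts] + [list(t) for t in tokens]

def attention_scores(seq_len, num_layers):
    """Query-key dot products in full self-attention: num_layers * seq_len**2."""
    return num_layers * seq_len * seq_len

dim, n, k, layers = 8, 200, 100, 12  # illustrative sizes, not from the paper
tokens = [[0.0] * dim for _ in range(n)]      # stand-in frozen input embeddings
prompts = [[0.01] * dim for _ in range(k)]    # stand-in learnable prompt vectors

seq = prepend_prompts(prompts, tokens)
base = attention_scores(n, layers)
tuned = attention_scores(len(seq), layers)
print(len(seq), tuned / base)  # 300 2.25
```

    Only the attention-score term is counted here; it grows quadratically in sequence length, so in this toy setting 100 prompt tokens on a 200-token input multiply that term by 2.25 in every layer.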

  18. arXiv:2304.12652  [pdf, other]

    cs.CV

    Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur

    Authors: Peng Dai, Yinda Zhang, Xin Yu, Xiaoyang Lyu, Xiaojuan Qi

    Abstract: Rendering novel view images is highly desirable for many applications. Despite recent progress, it remains challenging to render high-fidelity and view-consistent novel views of large-scale scenes from in-the-wild images with inevitable artifacts (e.g., motion blur). To this end, we develop a hybrid neural rendering model that makes image-based representation and neural 3D representation join forc…

    Submitted 9 July, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

  19. arXiv:2303.15181  [pdf, other]

    cs.CV

    DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation

    Authors: Zhengzhe Liu, Peng Dai, Ruihui Li, Xiaojuan Qi, Chi-Wing Fu

    Abstract: In this paper, we present a new text-guided 3D shape generation approach DreamStone that uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data. The core of our approach is a two-stage feature-space alignment strategy that leverages a pre-trained single-view reconstruction (SVR) model to map CLIP featur…

    Submitted 23 September, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  20. arXiv:2303.10976  [pdf, other]

    cs.CV

    Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-identification

    Authors: Jiaer Xia, Lei Tan, Pingyang Dai, Mingbo Zhao, Yongjian Wu, Liujuan Cao

    Abstract: Occluded person re-identification (Re-ID) aims to address the potential occlusion problem when matching occluded or holistic pedestrians from different camera views. Many methods use the background as artificial occlusion and rely on attention networks to exclude noisy interference. However, the significant discrepancy between simple background occlusion and realistic occlusion can negatively impa…

    Submitted 22 February, 2024; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: AAAI2024

  21. arXiv:2303.09152  [pdf, other]

    cs.CV

    Learning a Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation

    Authors: Xiaoyang Lyu, Peng Dai, Zizhang Li, Dongyu Yan, Yi Lin, Yifan Peng, Xiaojuan Qi

    Abstract: Implicit neural rendering, which uses signed distance function (SDF) representation with geometric priors (such as depth or surface normal), has led to impressive progress in the surface reconstruction of large-scale scenes. However, applying this method to reconstruct a room-level scene from images may miss structures in low-intensity areas or small and thin objects. We conducted experiments on t…

    Submitted 16 March, 2023; originally announced March 2023.

  22. arXiv:2303.04347  [pdf, ps, other]

    cs.NE

    Optimal ANN-SNN Conversion for High-accuracy and Ultra-low-latency Spiking Neural Networks

    Authors: Tong Bu, Wei Fang, Jianhao Ding, PengLin Dai, Zhaofei Yu, Tiejun Huang

    Abstract: Spiking Neural Networks (SNNs) have gained great attention due to their distinctive properties of low power consumption and fast inference on neuromorphic hardware. As the most effective method to get deep SNNs, ANN-SNN conversion has achieved performance comparable to ANNs on large-scale datasets. Despite this, it requires long time-steps to match the firing rates of SNNs to the activation of AN…

    Submitted 7 March, 2023; originally announced March 2023.

    Journal ref: International Conference on Learning Representations (2022)
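    The abstract's point that conversion needs long time-steps to match SNN firing rates to ANN activations can be illustrated with a minimal sketch under our own assumptions: a single integrate-and-fire neuron with reset-by-subtraction, a constant input, and a clipped ReLU standing in for the ANN activation. This is the generic rate-coding argument, not the conversion method the paper proposes, and the function names are hypothetical. Because a rate measured over T steps can only be a multiple of 1/T, for inputs between 0 and the threshold the mismatch stays below 1/T.

```python
# Hypothetical sketch: one integrate-and-fire (IF) neuron with
# reset-by-subtraction, driven by a constant input, compared against the
# clipped ReLU activation its firing rate is meant to approximate.

def if_firing_rate(current, threshold, T, v_init=0.0):
    """Run T time-steps and return spike count / T."""
    v = v_init
    spikes = 0
    for _ in range(T):
        v += current              # integrate the input
        if v >= threshold:
            spikes += 1
            v -= threshold        # soft reset keeps the residual charge
    return spikes / T

def clipped_relu(a, threshold):
    """ANN-side target: clip(a / threshold, 0, 1)."""
    return min(max(a / threshold, 0.0), 1.0)

a, theta = 0.37, 1.0
for T in (4, 32, 256):
    err = abs(if_firing_rate(a, theta, T) - clipped_relu(a, theta))
    print(T, round(err, 5))  # error stays below 1 / T
```

    Negative inputs never reach the threshold (rate 0) and inputs above the threshold fire on every step (rate 1), matching the two clipped regions of the activation; only the quantization error in between depends on T.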

  23. Multi-Behavior Graph Neural Networks for Recommender System

    Authors: Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Liefeng Bo

    Abstract: Recommender systems have been demonstrated to be effective at meeting users' personalized interests for many online services (e.g., E-commerce and online advertising platforms). Recent years have witnessed the emerging success of many deep learning-based recommendation models for augmenting collaborative filtering architectures with various neural network architectures, such as multi-layer perceptron…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: Published at IEEE Transactions on Neural Networks and Learning Systems, 2022

  24. arXiv:2302.01512  [pdf, other]

    cs.CV

    Spectral Aware Softmax for Visible-Infrared Person Re-Identification

    Authors: Lei Tan, Pingyang Dai, Qixiang Ye, Mingliang Xu, Yongjian Wu, Rongrong Ji

    Abstract: Visible-infrared person re-identification (VI-ReID) aims to match specific pedestrian images from different modalities. Although suffering an extra modality discrepancy, existing methods still follow the softmax loss training paradigm, which is widely used in single-modality classification tasks. The softmax loss lacks an explicit penalty for the apparent modality gap, which adversely limits the p…

    Submitted 2 February, 2023; originally announced February 2023.

  25. arXiv:2302.00884  [pdf, other]

    cs.CV

    Exploring Invariant Representation for Visible-Infrared Person Re-Identification

    Authors: Lei Tan, Yukang Zhang, Shengmei Shen, Yan Wang, Pingyang Dai, Xianming Lin, Yongjian Wu, Rongrong Ji

    Abstract: Cross-spectral person re-identification, which aims to associate identities to pedestrians across different spectra, faces a main challenge of the modality discrepancy. In this paper, we address the problem from both image-level and feature-level in an end-to-end hybrid learning framework named robust feature mining network (RFM). In particular, we observe that the reflective intensity of the same…

    Submitted 2 February, 2023; originally announced February 2023.

  26. arXiv:2301.12439  [pdf, other]

    cs.CV

    Unsupervised Domain Adaptation on Person Re-Identification via Dual-level Asymmetric Mutual Learning

    Authors: Qiong Wu, Jiahan Li, Pingyang Dai, Qixiang Ye, Liujuan Cao, Yongjian Wu, Rongrong Ji

    Abstract: Unsupervised domain adaptation person re-identification (Re-ID) aims to identify pedestrian images within an unlabeled target domain with an auxiliary labeled source-domain dataset. Many existing works attempt to recover reliable identity information by considering multiple homogeneous networks, and use the generated labels to train the model in the target domain. However, these homogeneous net…

    Submitted 29 January, 2023; originally announced January 2023.

  27. arXiv:2210.17386  [pdf, other]

    cs.NI eess.SP

    Cooperative Sensing and Uploading for Quality-Cost Tradeoff of Digital Twins in VEC

    Authors: Kai Liu, Xincao Xu, Penglin Dai, Biwen Chen

    Abstract: Recent advances in sensing technologies, wireless communications, and computing paradigms drive the evolution of vehicles into intelligent electronic consumer products. This paper investigates enabling digital twins in vehicular edge computing (DT-VEC) via cooperative sensing and uploading, and makes the first attempt to achieve the quality-cost tradeoff in DT-VEC. First, a DT-VEC ar…

    Submitted 27 January, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: text overlap with arXiv:2209.12265

  28. Joint Task Offloading and Resource Optimization in NOMA-based Vehicular Edge Computing: A Game-Theoretic DRL Approach

    Authors: Xincao Xu, Kai Liu, Penglin Dai, Feiyu Jin, Hualing Ren, Choujun Zhan, Songtao Guo

    Abstract: Vehicular edge computing (VEC) has become a promising paradigm for the development of emerging intelligent transportation systems. Nevertheless, the limited resources and massive transmission demands pose great challenges to implementing vehicular applications with stringent deadline requirements. This work presents a non-orthogonal multiple access (NOMA) based architecture in VEC, where heterogeneo…

    Submitted 24 October, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

    Journal ref: Journal of Systems Architecture 134 (2023) 102780

  29. arXiv:2209.12265  [pdf, other]

    cs.NI eess.SY

    Cooperative Sensing and Heterogeneous Information Fusion in VCPS: A Multi-agent Deep Reinforcement Learning Approach

    Authors: Xincao Xu, Kai Liu, Penglin Dai, Ruitao Xie, Jingjing Cao, Jiangtao Luo

    Abstract: Cooperative sensing and heterogeneous information fusion are critical to realize vehicular cyber-physical systems (VCPSs). This paper makes the first attempt to quantitatively measure the quality of VCPS by designing a new metric called Age of View (AoV). Specifically, we first present the system architecture where heterogeneous information can be cooperatively sensed and uploaded via vehicle-to-i…

    Submitted 27 January, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

  30. arXiv:2209.04145  [pdf, other]

    cs.CV

    ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation

    Authors: Zhengzhe Liu, Peng Dai, Ruihui Li, Xiaojuan Qi, Chi-Wing Fu

    Abstract: Text-guided 3D shape generation remains challenging due to the absence of large paired text-shape data, the substantial semantic gap between these two modalities, and the structural complexity of 3D shapes. This paper presents a new framework called Image as Stepping Stone (ISS) for the task by introducing 2D image as a stepping stone to connect the two modalities and to eliminate the need for pai…

    Submitted 23 February, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

    Comments: ICLR 2023 spotlight

  31. arXiv:2208.09844  [pdf, other]

    cs.CV

    CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification

    Authors: Qiong Wu, Jiaer Xia, Pingyang Dai, Yiyi Zhou, Yongjian Wu, Rongrong Ji

    Abstract: Visible-infrared person re-identification (VI-ReID) is a task of matching the same individuals across the visible and infrared modalities. Its main challenge lies in the modality gap caused by cameras operating on different spectra. Existing VI-ReID methods mainly focus on learning general features across modalities, often at the expense of feature discriminability. To address this issue, we prese…

    Submitted 21 August, 2022; originally announced August 2022.

  32. arXiv:2207.09935  [pdf, other]

    cs.CV

    Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing

    Authors: Xin Yu, Peng Dai, Wenbo Li, Lan Ma, Jiajun Shen, Jia Li, Xiaojuan Qi

    Abstract: With the rapid development of mobile devices, modern widely-used mobile phones typically allow users to capture 4K resolution (i.e., ultra-high-definition) images. However, for image demoireing, a challenging task in low-level vision, existing works are generally carried out on low-resolution or synthetic images. Hence, the effectiveness of these methods on 4K resolution images is still unknown. I…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  33. arXiv:2207.09046  [pdf, other]

    cs.CV

    Dynamic Prototype Mask for Occluded Person Re-Identification

    Authors: Lei Tan, Pingyang Dai, Rongrong Ji, Yongjian Wu

    Abstract: Although person re-identification has achieved an impressive improvement in recent years, the common occlusion case caused by different obstacles is still an unsettled issue in real application scenarios. Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part. Nevertheless, the inevitable domain gap between the assistant mode…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Accepted by ACM MM 2022

  34. arXiv:2205.15495  [pdf, other]

    cs.CV

    Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking

    Authors: Peng Dai, Yiqiang Feng, Renliang Weng, Changshui Zhang

    Abstract: The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. In this paper, we propose a novel solution named TransSTAM, which leverages Transformer to effectively model both the appearance features of each object and the spatial-temporal relationships among objects. TransSTAM consists of two major parts: (1) The encoder utilizes…

    Submitted 30 May, 2022; originally announced May 2022.

  35. arXiv:2205.09925  [pdf, other]

    cs.AI cs.LG cs.NI

    On Jointly Optimizing Partial Offloading and SFC Mapping: A Cooperative Dual-agent Deep Reinforcement Learning Approach

    Authors: Xinhan Wang, Huanlai Xing, Fuhong Song, Shouxi Luo, Penglin Dai, Bowen Zhao

    Abstract: Multi-access edge computing (MEC) and network function virtualization (NFV) are promising technologies to support emerging IoT applications, especially those computation-intensive. In NFV-enabled MEC environment, service function chain (SFC), i.e., a set of ordered virtual network functions (VNFs), can be mapped on MEC servers. Mobile devices (MDs) can offload computation-intensive applications, w…

    Submitted 19 May, 2022; originally announced May 2022.

  36. arXiv:2204.10513  [pdf]

    eess.IV cs.CV

    MIPR: Automatic Annotation of Medical Images with Pixel Rearrangement

    Authors: Pingping Dai, Haiming Zhu, Shuang Ge, Ruihan Zhang, Xiang Qian, Xi Li, Kehong Yuan

    Abstract: Most of the state-of-the-art semantic segmentation reported in recent years is based on fully supervised deep learning in the medical domain. However, the high-quality annotated datasets require intense labor and domain knowledge, consuming enormous time and cost. Previous works that adopt semi-supervised and unsupervised learning are proposed to address the lack of annotated data through assist…

    Submitted 22 April, 2022; originally announced April 2022.

  37. arXiv:2204.02957  [pdf, other]

    cs.CV

    Video Demoireing with Relation-Based Temporal Consistency

    Authors: Peng Dai, Xin Yu, Lan Ma, Baoheng Zhang, Jia Li, Wenbo Li, Jiajun Shen, Xiaojuan Qi

    Abstract: Moire patterns, appearing as color distortions, severely degrade image and video qualities when filming a screen with digital cameras. Considering the increasing demands for capturing videos, we study how to remove such undesirable moire patterns in videos, namely video demoireing. To this end, we introduce the first hand-held video demoireing dataset with a dedicated data collection pipeline to e…

    Submitted 6 April, 2022; originally announced April 2022.

  38. arXiv:2203.10507  [pdf]

    eess.IV cs.CV cs.LG

    Soft-CP: A Credible and Effective Data Augmentation for Semantic Segmentation of Medical Lesions

    Authors: Pingping Dai, Licong Dong, Ruihan Zhang, Haiming Zhu, Jie Wu, Kehong Yuan

    Abstract: The medical datasets are usually faced with the problem of scarcity and data imbalance. Moreover, annotating large datasets for semantic segmentation of medical lesions requires domain knowledge and is time-consuming. In this paper, we propose a new object-blend method (soft-CP for short) that combines the Copy-Paste augmentation method for semantic segmentation of medical lesions offline, ensuring the corre…

    Submitted 20 March, 2022; originally announced March 2022.

    Comments: 9 pages, 6 figures, 1 table

  39. arXiv:2202.12028  [pdf, other]

    eess.SP cs.LG

    Evolutionary Multi-Objective Reinforcement Learning Based Trajectory Control and Task Offloading in UAV-Assisted Mobile Edge Computing

    Authors: Fuhong Song, Huanlai Xing, Xinhan Wang, Shouxi Luo, Penglin Dai, Zhiwen Xiao, Bowen Zhao

    Abstract: This paper studies the trajectory control and task offloading (TCTO) problem in an unmanned aerial vehicle (UAV)-assisted mobile edge computing system, where a UAV flies along a planned trajectory to collect computation tasks from smart devices (SDs). We consider a scenario that SDs are not directly connected by the base station (BS) and the UAV has two roles to play: MEC server or wireless relay.…

    Submitted 24 February, 2022; originally announced February 2022.

  40. arXiv:2201.10761  [pdf, other]

    cs.LG cs.CR cs.DC

    An Efficient and Robust System for Vertically Federated Random Forest

    Authors: Houpu Yao, Jiazhou Wang, Peng Dai, Liefeng Bo, Yanqing Chen

    Abstract: As there is a growing interest in utilizing data across multiple resources to build better machine learning models, many vertically federated learning algorithms have been proposed to preserve the data privacy of the participating organizations. However, the efficiency of existing vertically federated learning algorithms remains a big problem, especially when applied to large-scale real-worl…

    Submitted 26 January, 2022; originally announced January 2022.

  41. Spatial-Temporal Sequential Hypergraph Network for Crime Prediction with Dynamic Multiplex Relation Learning

    Authors: Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Liefeng Bo, Xiyue Zhang, Tianyi Chen

    Abstract: Crime prediction is crucial for public safety and resource optimization, yet is very challenging due to two aspects: i) the dynamics of criminal patterns across time and space, crime events are distributed unevenly on both spatial and temporal domains; ii) time-evolving dependencies between different types of crimes (e.g., Theft, Robbery, Assault, Damage) which reveal fine-grained semantics of cri…

    Submitted 23 April, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: This paper has been published as a research paper at IJCAI 2021

  42. Multi-Behavior Enhanced Recommendation with Cross-Interaction Collaborative Relation Modeling

    Authors: Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Mengyin Lu, Liefeng Bo

    Abstract: Many previous studies aim to augment collaborative filtering with deep neural network techniques, so as to achieve better recommendation performance. However, most existing deep learning-based recommender systems are designed for modeling a single type of user-item interaction behavior, which can hardly distill the heterogeneous relations between user and item. In practical recommendation scenario…

    Submitted 6 January, 2022; originally announced January 2022.

    Comments: Published on ICDE 2021

  43. arXiv:2112.11547  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Decompose the Sounds and Pixels, Recompose the Events

    Authors: Varshanth R. Rao, Md Ibrahim Khalil, Haoda Li, Peng Dai, Juwei Lu

    Abstract: In this paper, we propose a framework centered on a novel architecture called the Event Decomposition Recomposition Network (EDRNet) to tackle the Audio-Visual Event (AVE) localization problem in the supervised and weakly supervised settings. AVEs in the real world exhibit common unravelling patterns (termed Event Progress Checkpoints (EPC)), which humans can perceive through the cooperati…

    Submitted 21 December, 2021; originally announced December 2021.

    Comments: Accepted at AAAI 2022

  44. arXiv:2112.05883  [pdf, other]

    cs.CV cs.LG

    Self-supervised Spatiotemporal Representation Learning by Exploiting Video Continuity

    Authors: Hanwen Liang, Niamul Quader, Zhixiang Chi, Lizhe Chen, Peng Dai, Juwei Lu, Yang Wang

    Abstract: Recent self-supervised video representation learning methods have found significant success by exploring essential properties of videos, e.g., speed, temporal order, etc. This work exploits an essential yet under-explored property of videos, video continuity, to obtain supervision signals for self-supervised representation learning. Specifically, we formulate three novel continuity-related pret…

    Submitted 12 January, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

  45. arXiv:2110.04038  [pdf, other]

    cs.LG cs.AI

    Traffic Flow Forecasting with Spatial-Temporal Graph Diffusion Network

    Authors: Xiyue Zhang, Chao Huang, Yong Xu, Lianghao Xia, Peng Dai, Liefeng Bo, Junbo Zhang, Yu Zheng

    Abstract: Accurate forecasting of citywide traffic flow plays a critical role in a variety of spatial-temporal mining applications, such as intelligent traffic control and public risk assessment. While previous work has made significant efforts to learn traffic temporal dynamics and spatial dependencies, two key limitations exist in current models. First, only the neighboring spatial correlations a…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Published as a paper at AAAI 2021

  46. Multiplex Behavioral Relation Learning for Recommendation via Memory Augmented Transformer Network

    Authors: Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Bo Zhang, Liefeng Bo

    Abstract: Capturing users' precise preferences is of great importance in various recommender systems (e.g., e-commerce platforms), as it is the basis for presenting personalized, interesting product lists to individual users. Although significant progress has been made in considering relations between users and items, most of the existing recommendation techniques solely focus on a single type of user-item…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Published as a full paper at SIGIR 2020

  47. arXiv:2110.04000  [pdf, other]

    cs.IR cs.AI

    Knowledge-Enhanced Hierarchical Graph Transformer Network for Multi-Behavior Recommendation

    Authors: Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Xiyue Zhang, Hongsheng Yang, Jian Pei, Liefeng Bo

    Abstract: Accurate user and item embedding learning is crucial for modern recommender systems. However, most existing recommendation techniques have thus far focused on modeling users' preferences over a single type of user-item interaction. Many practical recommendation scenarios involve multi-typed user interactive behaviors (e.g., page view, add-to-favorite and purchase), which presents unique challenge…

    Submitted 8 October, 2021; originally announced October 2021.

  48. arXiv:2110.03996  [pdf, other]

    cs.IR cs.AI

    Graph-Enhanced Multi-Task Learning of Multi-Level Transition Dynamics for Session-based Recommendation

    Authors: Chao Huang, Jiahui Chen, Lianghao Xia, Yong Xu, Peng Dai, Yanqing Chen, Liefeng Bo, Jiashu Zhao, Jimmy Xiangji Huang

    Abstract: Session-based recommendation plays a central role in a wide spectrum of online applications, ranging from e-commerce to online advertising services. However, the majority of existing session-based recommendation techniques (e.g., attention-based recurrent networks or graph neural networks) are not well-designed for capturing the complex transition dynamics exhibited by temporally-ordered and multi…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Published as a paper at AAAI 2021

  49. arXiv:2110.03987  [pdf, other]

    cs.IR cs.AI

    Knowledge-aware Coupled Graph Neural Network for Social Recommendation

    Authors: Chao Huang, Huance Xu, Yong Xu, Peng Dai, Lianghao Xia, Mengyin Lu, Liefeng Bo, Hao Xing, Xiaoping Lai, Yanfang Ye

    Abstract: The social recommendation task aims to predict users' preferences over items by incorporating social connections among users, so as to alleviate the data sparsity issue of collaborative filtering. While many recent efforts show the effectiveness of neural network-based social recommender systems, several important challenges have not yet been well addressed: (i) The majority of models only consider…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Published as a paper at AAAI 2021

  50. Graph Meta Network for Multi-Behavior Recommendation

    Authors: Lianghao Xia, Yong Xu, Chao Huang, Peng Dai, Liefeng Bo

    Abstract: Modern recommender systems often embed users and items into low-dimensional latent representations, based on their observed interactions. In practical recommendation scenarios, users often exhibit various intents which drive them to interact with items with multiple behavior types (e.g., click, tag-as-favorite, purchase). However, the diversity of user behaviors is ignored in most of the existing…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Published as a full paper at SIGIR 2021