subscribe to arXiv mailings

An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

Authors: Laifa Tao, Shangyu Li, Haifei Liu, Qixuan Huang, Liang Ma, Guoao Ning, Yiling Chen, Yunlong Wu, Bin Li, Weiwei Zhang, Zhengduo Zhao, Wenchao Zhan, Wenyan Cao, Chao Wang, Hongmei Liu, Jian Ma, Mingliang Suo, Yujie Cheng, Yu Ding, Dengwei Song, Chen Lu

Abstract: Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Larg… ▽ More Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Large Model, heralds a technological revolution with the potential to fundamentally reshape traditional technological fields and human production methods. Its capabilities, including strong generalization, reasoning, and generative attributes, present opportunities to address PHM's bottlenecks. To this end, based on a systematic analysis of the current challenges and bottlenecks in PHM, as well as the research status and advantages of Large Model, we propose a novel concept and three progressive paradigms of Prognosis and Health Management Large Model (PHM-LM) through the integration of the Large Model with PHM. Subsequently, we provide feasible technical approaches for PHM-LM to bolster PHM's core capabilities within the framework of the three paradigms. Moreover, to address core issues confronting PHM, we discuss a series of technical challenges of PHM-LM throughout the entire process of construction and application. This comprehensive effort offers a holistic PHM-LM technical framework, and provides avenues for new PHM technologies, methodologies, tools, platforms and applications, which also potentially innovates design, research & development, verification and application mode of PHM. And furthermore, a new generation of PHM with AI will also capably be realized, i.e., from custom to generalized, from discriminative to generative, and from theoretical conditions to practical applications. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.18204 [pdf, other]

Analysis of Channel Uncertainty in Trusted Wireless Services via Repeated Interactions

Authors: Bingwen Chen, Xintong Ling, Weihang Cao, Jiaheng Wang, Zhi Ding

Abstract: The coexistence of heterogeneous sub-networks in 6G poses new security and trust concerns and thus calls for a perimeterless-security model. Blockchain radio access network (B-RAN) provides a trust-building approach via repeated interactions rather than relying on pre-established trust or central authentication. Such a trust-building process naturally supports dynamic trusted services across vario… ▽ More The coexistence of heterogeneous sub-networks in 6G poses new security and trust concerns and thus calls for a perimeterless-security model. Blockchain radio access network (B-RAN) provides a trust-building approach via repeated interactions rather than relying on pre-established trust or central authentication. Such a trust-building process naturally supports dynamic trusted services across various service providers (SP) without the need for perimeter-based authentications; however, it remains vulnerable to environmental and system unreliability such as wireless channel uncertainty. In this study, we investigate channel unreliability in the trust-building framework based on repeated interactions for secure wireless services. We derive specific requirements for achieving cooperation between SP and client via a repeated game model and illustrate the implications of channel unreliability on sustaining trusted access services. We consider the framework optimization to guarantee SP-client cooperation, given a worst-case channel condition. Furthermore, we introduce the concept of cooperation region to represent the robustness of the trust-building process and explore the maximum cooperation area to enhance service resilience. Finally, we present simulations to demonstrate the system performance over fading channels and verify our results. △ Less

Submitted 2 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.16710 [pdf, other]

Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image

Authors: Jinkun Hao, Junshu Tang, Jiangning Zhang, Ran Yi, Yijia Hong, Moran Li, Weijian Cao, Yating Wang, Lizhuang Ma

Abstract: While recent works have achieved great success on one-shot 3D common object generation, high quality and fidelity 3D head generation from a single image remains a great challenge. Previous text-based methods for generating 3D heads were limited by text descriptions and image-based methods struggled to produce high-quality head geometry. To handle this challenging problem, we propose a novel framew… ▽ More While recent works have achieved great success on one-shot 3D common object generation, high quality and fidelity 3D head generation from a single image remains a great challenge. Previous text-based methods for generating 3D heads were limited by text descriptions and image-based methods struggled to produce high-quality head geometry. To handle this challenging problem, we propose a novel framework, Portrait3D, to generate high-quality 3D heads while preserving their identities. Our work incorporates the identity information of the portrait image into three parts: 1) geometry initialization, 2) geometry sculpting, and 3) texture generation stages. Given a reference portrait image, we first align the identity features with text features to realize ID-aware guidance enhancement, which contains the control signals representing the face information. We then use the canny map, ID features of the portrait image, and a pre-trained text-to-normal/depth diffusion model to generate ID-aware geometry supervision, and 3D-GAN inversion is employed to generate ID-aware geometry initialization. Furthermore, with the ability to inject identity information into 3D head generation, we use ID-aware guidance to calculate ID-aware Score Distillation (ISD) for geometry sculpting. For texture generation, we adopt the ID Consistent Texture Inpainting and Refinement which progressively expands the view for texture inpainting to obtain an initialization UV texture map. We then use the id-aware guidance to provide image-level supervision for noisy multi-view images to obtain a refined texture map. Extensive experiments demonstrate that we can generate high-quality 3D heads with accurate geometry and texture from single in-the-wild portrait images. The project page is at https://jinkun-hao.github.io/Portrait3D/. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: https://jinkun-hao.github.io/Portrait3D/

arXiv:2406.01380 [pdf, other]

Convolutional Unscented Kalman Filter for Multi-Object Tracking with Outliers

Authors: Shiqi Liu, Wenhan Cao, Chang Liu, Tianyi Zhang, Shengbo Eben Li

Abstract: Multi-object tracking (MOT) is an essential technique for navigation in autonomous driving. In tracking-by-detection systems, biases, false positives, and misses, which are referred to as outliers, are inevitable due to complex traffic scenarios. Recent tracking methods are based on filtering algorithms that overlook these outliers, leading to reduced tracking accuracy or even loss of the objects… ▽ More Multi-object tracking (MOT) is an essential technique for navigation in autonomous driving. In tracking-by-detection systems, biases, false positives, and misses, which are referred to as outliers, are inevitable due to complex traffic scenarios. Recent tracking methods are based on filtering algorithms that overlook these outliers, leading to reduced tracking accuracy or even loss of the objects trajectory. To handle this challenge, we adopt a probabilistic perspective, regarding the generation of outliers as misspecification between the actual distribution of measurement data and the nominal measurement model used for filtering. We further demonstrate that, by designing a convolutional operation, we can mitigate this misspecification. Incorporating this operation into the widely used unscented Kalman filter (UKF) in commonly adopted tracking algorithms, we derive a variant of the UKF that is robust to outliers, called the convolutional UKF (ConvUKF). We show that ConvUKF maintains the Gaussian conjugate property, thus allowing for real-time tracking. We also prove that ConvUKF has a bounded tracking error in the presence of outliers, which implies robust stability. The experimental results on the KITTI and nuScenes datasets show improved accuracy compared to representative baseline algorithms for MOT tasks. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: 11 pages, 5 figures

arXiv:2405.19027 [pdf, other]

A Dual-functional Blockchain Framework for Solving Distributed Optimization

Authors: Weihang Cao, Xintong Ling, Jiaheng Wang, Xiqi Gao, Zhi Ding

Abstract: Proof of Work (PoW) has been extensively utilized as the foundation of blockchain's security, consistency, and tamper-resistance. However, long has it been criticized for its tremendous and inefficient utilization of computational power and energy. In this work, we design a dual-functional blockchain framework that uses solving optimization problems to reach consensus as an alternative to PoW, cha… ▽ More Proof of Work (PoW) has been extensively utilized as the foundation of blockchain's security, consistency, and tamper-resistance. However, long has it been criticized for its tremendous and inefficient utilization of computational power and energy. In this work, we design a dual-functional blockchain framework that uses solving optimization problems to reach consensus as an alternative to PoW, channeling wasted resources into useful work. We model and analyze our framework by developing discrete Markov chains, and derive the security conditions to ensure that selfish miners behave honestly. Based on the security conditions, we derive a lower bound for the security overhead and analyze the trade-off between useful work efficiency and PoW safeguard. We further dive deep into the reward function design for the proposed dual-functional blockchain and provide practical design guidelines for reward functions assuming concavity and linearity respectively. Finally, simulation results are presented to validate and illustrate our analytical results. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18156 [pdf, other]

VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation

Authors: Qilin Wang, Zhengkai Jiang, Chengming Xu, Jiangning Zhang, Yabiao Wang, Xinyi Zhang, Yun Cao, Weijian Cao, Chengjie Wang, Yanwei Fu

Abstract: Human image animation involves generating a video from a static image by following a specified pose sequence. Current approaches typically adopt a multi-stage pipeline that separately learns appearance and motion, which often leads to appearance degradation and temporal inconsistencies. To address these issues, we propose VividPose, an innovative end-to-end pipeline based on Stable Video Diffusion… ▽ More Human image animation involves generating a video from a static image by following a specified pose sequence. Current approaches typically adopt a multi-stage pipeline that separately learns appearance and motion, which often leads to appearance degradation and temporal inconsistencies. To address these issues, we propose VividPose, an innovative end-to-end pipeline based on Stable Video Diffusion (SVD) that ensures superior temporal stability. To enhance the retention of human identity, we propose an identity-aware appearance controller that integrates additional facial information without compromising other appearance details such as clothing texture and background. This approach ensures that the generated videos maintain high fidelity to the identity of human subject, preserving key facial features across various poses. To accommodate diverse human body shapes and hand movements, we introduce a geometry-aware pose controller that utilizes both dense rendering maps from SMPL-X and sparse skeleton maps. This enables accurate alignment of pose and shape in the generated videos, providing a robust framework capable of handling a wide range of body shapes and dynamic hand movements. Extensive qualitative and quantitative experiments on the UBCFashion and TikTok benchmarks demonstrate that our method achieves state-of-the-art performance. Furthermore, VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset. Codes and models will be available. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.15763 [pdf, other]

FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis

Authors: Ke Fan, Junshu Tang, Weijian Cao, Ran Yi, Moran Li, Jingyu Gong, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Lizhuang Ma

Abstract: Text-to-motion synthesis is a crucial task in computer vision. Existing methods are limited in their universality, as they are tailored for single-person or two-person scenarios and can not be applied to generate motions for more individuals. To achieve the number-free motion synthesis, this paper reconsiders motion generation and proposes to unify the single and multi-person motion by the conditi… ▽ More Text-to-motion synthesis is a crucial task in computer vision. Existing methods are limited in their universality, as they are tailored for single-person or two-person scenarios and can not be applied to generate motions for more individuals. To achieve the number-free motion synthesis, this paper reconsiders motion generation and proposes to unify the single and multi-person motion by the conditional motion distribution. Furthermore, a generation module and an interaction module are designed for our FreeMotion framework to decouple the process of conditional motion generation and finally support the number-free motion synthesis. Besides, based on our framework, the current single-person motion spatial control method could be seamlessly integrated, achieving precise control of multi-person motion. Extensive experiments demonstrate the superior performance of our method and our capability to infer single and multi-human motions simultaneously. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.03008 [pdf, other]

DVMSR: Distillated Vision Mamba for Efficient Super-Resolution

Authors: Xiaoyan Lei, Wenlong Zhang, Weifeng Cao

Abstract: Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are based on convolutional neural networks. Few attempts have been made with Mamba to harness its long-range modeling capability and efficient computational comple… ▽ More Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are based on convolutional neural networks. Few attempts have been made with Mamba to harness its long-range modeling capability and efficient computational complexity, which have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight Image SR network that incorporates Vision Mamba and a distillation strategy. The network of DVMSR consists of three modules: feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several residual state space blocks (RSSB), each of which has several Vision Mamba Moudles(ViMM) together with a residual connection. To achieve efficiency improvement while maintaining comparable performance, we employ a distillation strategy to the vision Mamba network for superior performance. Specifically, we leverage the rich representation knowledge of teacher network as additional supervision for the output of lightweight student networks. Extensive experiments have demonstrated that our proposed DVMSR can outperform state-of-the-art efficient SR methods in terms of model parameters while maintaining the performance of both PSNR and SSIM. The source code is available at https://github.com/nathan66666/DVMSR.git △ Less

Submitted 11 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: 8 pages, 8 figures

arXiv:2405.00027 [pdf, other]

doi 10.5220/0012431300003660

Multidimensional Compressed Sensing for Spectral Light Field Imaging

Authors: Wen Cao, Ehsan Miandji, Jonas Unger

Abstract: This paper considers a compressive multi-spectral light field camera model that utilizes a one-hot spectralcoded mask and a microlens array to capture spatial, angular, and spectral information using a single monochrome sensor. We propose a model that employs compressed sensing techniques to reconstruct the complete multi-spectral light field from undersampled measurements. Unlike previous work wh… ▽ More This paper considers a compressive multi-spectral light field camera model that utilizes a one-hot spectralcoded mask and a microlens array to capture spatial, angular, and spectral information using a single monochrome sensor. We propose a model that employs compressed sensing techniques to reconstruct the complete multi-spectral light field from undersampled measurements. Unlike previous work where a light field is vectorized to a 1D signal, our method employs a 5D basis and a novel 5D measurement model, hence, matching the intrinsic dimensionality of multispectral light fields. We mathematically and empirically show the equivalence of 5D and 1D sensing models, and most importantly that the 5D framework achieves orders of magnitude faster reconstruction while requiring a small fraction of the memory. Moreover, our new multidimensional sensing model opens new research directions for designing efficient visual data acquisition algorithms and hardware. △ Less

Submitted 27 February, 2024; originally announced May 2024.

Comments: 8 pages, published of VISAPP 2024

Journal ref: In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP 2024, ISBN 978-989-758-679-8, ISSN 2184-4321, pages 349-356

arXiv:2404.16687 [pdf, other]

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte… ▽ More This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions are received in the development phase, and 221 submissions are received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions are received in the development phase, and 185 submissions are received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC. △ Less

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (ie runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/. △ Less

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2404.04936 [pdf, other]

Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

Authors: Weiwei Cao, Jianpeng Zhang, Yingda Xia, Tony C. W. Mok, Zi Li, Xianghua Ye, Le Lu, Jian Zheng, Yuxing Tang, Ling Zhang

Abstract: Radiologists highly desire fully automated versatile AI for medical imaging interpretation. However, the lack of extensively annotated large-scale multi-disease datasets has hindered the achievement of this goal. In this paper, we explore the feasibility of leveraging language as a naturally high-quality supervision for chest CT imaging. In light of the limited availability of image-report pairs,… ▽ More Radiologists highly desire fully automated versatile AI for medical imaging interpretation. However, the lack of extensively annotated large-scale multi-disease datasets has hindered the achievement of this goal. In this paper, we explore the feasibility of leveraging language as a naturally high-quality supervision for chest CT imaging. In light of the limited availability of image-report pairs, we bootstrap the understanding of 3D chest CT images by distilling chest-related diagnostic knowledge from an extensively pre-trained 2D X-ray expert model. Specifically, we propose a language-guided retrieval method to match each 3D CT image with its semantically closest 2D X-ray image, and perform pair-wise and semantic relation knowledge distillation. Subsequently, we use contrastive learning to align images and reports within the same patient while distinguishing them from the other patients. However, the challenge arises when patients have similar semantic diagnoses, such as healthy patients, potentially confusing if treated as negatives. We introduce a robust contrastive learning that identifies and corrects these false negatives. We train our model with over 12,000 pairs of chest CT images and radiology reports. Extensive experiments across multiple scenarios, including zero-shot learning, report generation, and fine-tuning processes, demonstrate the model's feasibility in interpreting chest CT images. △ Less

Submitted 7 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.00481 [pdf, other]

Convolutional Bayesian Filtering

Authors: Wenhan Cao, Shiqi Liu, Chang Liu, Zeyu He, Stephen S. -T. Yau, Shengbo Eben Li

Abstract: Bayesian filtering serves as the mainstream framework of state estimation in dynamic systems. Its standard version utilizes total probability rule and Bayes' law alternatively, where how to define and compute conditional probability is critical to state distribution inference. Previously, the conditional probability is assumed to be exactly known, which represents a measure of the occurrence proba… ▽ More Bayesian filtering serves as the mainstream framework of state estimation in dynamic systems. Its standard version utilizes total probability rule and Bayes' law alternatively, where how to define and compute conditional probability is critical to state distribution inference. Previously, the conditional probability is assumed to be exactly known, which represents a measure of the occurrence probability of one event, given the second event. In this paper, we find that by adding an additional event that stipulates an inequality condition, we can transform the conditional probability into a special integration that is analogous to convolution. Based on this transformation, we show that both transition probability and output probability can be generalized to convolutional forms, resulting in a more general filtering framework that we call convolutional Bayesian filtering. This new framework encompasses standard Bayesian filtering as a special case when the distance metric of the inequality condition is selected as Dirac delta function. It also allows for a more nuanced consideration of model mismatch by choosing different types of inequality conditions. For instance, when the distance metric is defined in a distributional sense, the transition probability and output probability can be approximated by simply rescaling them into fractional powers. Under this framework, a robust version of Kalman filter can be constructed by only altering the noise covariance matrix, while maintaining the conjugate nature of Gaussian distributions. Finally, we exemplify the effectiveness of our approach by reshaping classic filtering algorithms into convolutional versions, including Kalman filter, extended Kalman filter, unscented Kalman filter and particle filter. △ Less

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.17664 [pdf, other]

DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation

Authors: Qilin Wang, Jiangning Zhang, Chengming Xu, Weijian Cao, Ying Tai, Yue Han, Yanhao Ge, Hong Gu, Chengjie Wang, Yanwei Fu

Abstract: Facial Appearance Editing (FAE) aims to modify physical attributes, such as pose, expression and lighting, of human facial images while preserving attributes like identity and background, showing great importance in photograph. In spite of the great progress in this area, current researches generally meet three challenges: low generation fidelity, poor attribute preservation, and inefficient infer… ▽ More Facial Appearance Editing (FAE) aims to modify physical attributes, such as pose, expression and lighting, of human facial images while preserving attributes like identity and background, showing great importance in photograph. In spite of the great progress in this area, current researches generally meet three challenges: low generation fidelity, poor attribute preservation, and inefficient inference. To overcome above challenges, this paper presents DiffFAE, a one-stage and highly-efficient diffusion-based framework tailored for high-fidelity FAE. For high-fidelity query attributes transfer, we adopt Space-sensitive Physical Customization (SPC), which ensures the fidelity and generalization ability by utilizing rendering texture derived from 3D Morphable Model (3DMM). In order to preserve source attributes, we introduce the Region-responsive Semantic Composition (RSC). This module is guided to learn decoupled source-regarding features, thereby better preserving the identity and alleviating artifacts from non-facial attributes such as hair, clothes, and background. We further introduce a consistency regularization for our pipeline to enhance editing controllability by leveraging prior knowledge in the attention matrices of diffusion model. Extensive experiments demonstrate the superiority of DiffFAE over existing methods, achieving state-of-the-art performance in facial appearance editing. △ Less

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.12906 [pdf, other]

TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation

Authors: Yufei Liu, Junwei Zhu, Junshu Tang, Shijie Zhang, Jiangning Zhang, Weijian Cao, Chengjie Wang, Yunsheng Wu, Dongjin Huang

Abstract: Texturing 3D humans with semantic UV maps remains a challenge due to the difficulty of acquiring reasonably unfolded UV. Despite recent text-to-3D advancements in supervising multi-view renderings using large text-to-image (T2I) models, issues persist with generation speed, text consistency, and texture quality, resulting in data scarcity among existing datasets. We present TexDreamer, the first z… ▽ More Texturing 3D humans with semantic UV maps remains a challenge due to the difficulty of acquiring reasonably unfolded UV. Despite recent text-to-3D advancements in supervising multi-view renderings using large text-to-image (T2I) models, issues persist with generation speed, text consistency, and texture quality, resulting in data scarcity among existing datasets. We present TexDreamer, the first zero-shot multimodal high-fidelity 3D human texture generation model. Utilizing an efficient texture adaptation finetuning strategy, we adapt large T2I model to a semantic UV structure while preserving its original generalization capability. Leveraging a novel feature translator module, the trained model is capable of generating high-fidelity 3D human textures from either text or image within seconds. Furthermore, we introduce ArTicuLated humAn textureS (ATLAS), the largest high-resolution (1024 X 1024) 3D human texture dataset which contains 50k high-fidelity textures with text descriptions. △ Less

Submitted 19 March, 2024; originally announced March 2024.

Comments: Project Page: https://ggxxii.github.io/texdreamer/

arXiv:2403.07954 [pdf, other]

Optimizing Polynomial Graph Filters: A Novel Adaptive Krylov Subspace Approach

Authors: Keke Huang, Wencai Cao, Hoang Ta, Xiaokui Xiao, Pietro Liò

Abstract: Graph Neural Networks (GNNs), known as spectral graph filters, find a wide range of applications in web networks. To bypass eigendecomposition, polynomial graph filters are proposed to approximate graph filters by leveraging various polynomial bases for filter training. However, no existing studies have explored the diverse polynomial graph filters from a unified perspective for optimization. In… ▽ More Graph Neural Networks (GNNs), known as spectral graph filters, find a wide range of applications in web networks. To bypass eigendecomposition, polynomial graph filters are proposed to approximate graph filters by leveraging various polynomial bases for filter training. However, no existing studies have explored the diverse polynomial graph filters from a unified perspective for optimization. In this paper, we first unify polynomial graph filters, as well as the optimal filters of identical degrees into the Krylov subspace of the same order, thus providing equivalent expressive power theoretically. Next, we investigate the asymptotic convergence property of polynomials from the unified Krylov subspace perspective, revealing their limited adaptability in graphs with varying heterophily degrees. Inspired by those facts, we design a novel adaptive Krylov subspace approach to optimize polynomial bases with provable controllability over the graph spectrum so as to adapt various heterophily graphs. Subsequently, we propose AdaptKry, an optimized polynomial graph filter utilizing bases from the adaptive Krylov subspaces. Meanwhile, in light of the diverse spectral properties of complex graphs, we extend AdaptKry by leveraging multiple adaptive Krylov bases without incurring extra training costs. As a consequence, extended AdaptKry is able to capture the intricate characteristics of graphs and provide insights into their inherent complexity. We conduct extensive experiments across a series of real-world datasets. The experimental results demonstrate the superior filtering capability of AdaptKry, as well as the optimized efficacy of the adaptive Krylov basis. △ Less

Submitted 20 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.04268 [pdf]

Qubit-Wise Architecture Search Method for Variational Quantum Circuits

Authors: Jialin Chen, Zhiqiang Cai, Ke Xu, Di Wu, Wei Cao

Abstract: Considering the noise level limit, one crucial aspect for quantum machine learning is to design a high-performing variational quantum circuit architecture with small number of quantum gates. As the classical neural architecture search (NAS), quantum architecture search methods (QAS) employ methods like reinforcement learning, evolutionary algorithms and supernet optimiza-tion to improve the search… ▽ More Considering the noise level limit, one crucial aspect for quantum machine learning is to design a high-performing variational quantum circuit architecture with small number of quantum gates. As the classical neural architecture search (NAS), quantum architecture search methods (QAS) employ methods like reinforcement learning, evolutionary algorithms and supernet optimiza-tion to improve the search efficiency. In this paper, we propose a novel qubit-wise architec-ture search (QWAS) method, which progres-sively search one-qubit configuration per stage, and combine with Monte Carlo Tree Search al-gorithm to find good quantum architectures by partitioning the search space into several good and bad subregions. The numerical experimental results indicate that our proposed method can balance the exploration and exploitation of cir-cuit performance and size in some real-world tasks, such as MNIST, Fashion and MOSI. As far as we know, QWAS achieves the state-of-art re-sults of all tasks in the terms of accuracy and circuit size. △ Less

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.02905 [pdf, other]

MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model

Authors: Sen Wang, Jiangning Zhang, Weijian Cao, Xiaobin Hu, Moran Li, Xiaozhong Ji, Xin Tan, Mengtian Li, Zhifeng Xie, Chengjie Wang, Lizhuang Ma

Abstract: The body movements accompanying speech aid speakers in expressing their ideas. Co-speech motion generation is one of the important approaches for synthesizing realistic avatars. Due to the intricate correspondence between speech and motion, generating realistic and diverse motion is a challenging task. In this paper, we propose MMoFusion, a Multi-modal co-speech Motion generation framework based o… ▽ More The body movements accompanying speech aid speakers in expressing their ideas. Co-speech motion generation is one of the important approaches for synthesizing realistic avatars. Due to the intricate correspondence between speech and motion, generating realistic and diverse motion is a challenging task. In this paper, we propose MMoFusion, a Multi-modal co-speech Motion generation framework based on the diffusion model to ensure both the authenticity and diversity of generated motion. We propose a progressive fusion strategy to enhance the interaction of inter-modal and intra-modal, efficiently integrating multi-modal information. Specifically, we employ a masked style matrix based on emotion and identity information to control the generation of different motion styles. Temporal modeling of speech and motion is partitioned into style-guided specific feature encoding and shared feature encoding, aiming to learn both inter-modal and intra-modal features. Besides, we propose a geometric loss to enforce the joints' velocity and acceleration coherence among frames. Our framework generates vivid, diverse, and style-controllable motion of arbitrary length through inputting speech and editing identity and emotion. Extensive experiments demonstrate that our method outperforms current co-speech motion generation methods including upper body and challenging full body. △ Less

Submitted 17 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.17375 [pdf, other]

Impact of Computation in Integral Reinforcement Learning for Continuous-Time Control

Authors: Wenhan Cao, Wei Pan

Abstract: Integral reinforcement learning (IntRL) demands the precise computation of the utility function's integral at its policy evaluation (PEV) stage. This is achieved through quadrature rules, which are weighted sums of utility functions evaluated from state samples obtained in discrete time. Our research reveals a critical yet underexplored phenomenon: the choice of the computational method -- in this… ▽ More Integral reinforcement learning (IntRL) demands the precise computation of the utility function's integral at its policy evaluation (PEV) stage. This is achieved through quadrature rules, which are weighted sums of utility functions evaluated from state samples obtained in discrete time. Our research reveals a critical yet underexplored phenomenon: the choice of the computational method -- in this case, the quadrature rule -- can significantly impact control performance. This impact is traced back to the fact that computational errors introduced in the PEV stage can affect the policy iteration's convergence behavior, which in turn affects the learned controller. To elucidate how computation impacts control, we draw a parallel between IntRL's policy iteration and Newton's method applied to the Hamilton-Jacobi-Bellman equation. In this light, computational error in PEV manifests as an extra error term in each iteration of Newton's method, with its upper bound proportional to the computational error. Further, we demonstrate that when the utility function resides in a reproducing kernel Hilbert space (RKHS), the optimal quadrature is achievable by employing Bayesian quadrature with the RKHS-inducing kernel function. We prove that the local convergence rates for IntRL using the trapezoidal rule and Bayesian quadrature with a Matérn kernel to be $O(N^{-2})$ and $O(N^{-b})$, where $N$ is the number of evenly-spaced samples and $b$ is the Matérn kernel's smoothness parameter. These theoretical findings are finally validated by two canonical control tasks. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.16200 [pdf, other]

IR2: Information Regularization for Information Retrieval

Authors: Jianyou Wang, Kaicheng Wang, Xiaoyue Wang, Weili Cao, Ramamohan Paturi, Leon Bergen

Abstract: Effective information retrieval (IR) in settings with limited training data, particularly for complex queries, remains a challenging task. This paper introduces IR2, Information Regularization for Information Retrieval, a technique for reducing overfitting during synthetic data generation. This approach, representing a novel application of regularization techniques in synthetic data creation for I… ▽ More Effective information retrieval (IR) in settings with limited training data, particularly for complex queries, remains a challenging task. This paper introduces IR2, Information Regularization for Information Retrieval, a technique for reducing overfitting during synthetic data generation. This approach, representing a novel application of regularization techniques in synthetic data creation for IR, is tested on three recent IR tasks characterized by complex queries: DORIS-MAE, ArguAna, and WhatsThatBook. Experimental results indicate that our regularization techniques not only outperform previous synthetic query generation methods on the tasks considered but also reduce cost by up to 50%. Furthermore, this paper categorizes and explores three regularization methods at different stages of the query synthesis pipeline-input, prompt, and output-each offering varying degrees of performance improvement compared to models where no regularization is applied. This provides a systematic approach for optimizing synthetic data generation in data-limited, complex-query IR scenarios. All code, prompts and synthetic data are available at https://github.com/Info-Regularization/Information-Regularization. △ Less

Submitted 25 February, 2024; originally announced February 2024.

Comments: Accepted by LREC-COLING 2024 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation

arXiv:2402.16072 [pdf]

Demonstration of 3 V Programmable Josephson Junction Arrays Using Non-Integer-Multiple Logic

Authors: Wenhui Cao, Erkun Yang, Jinjin Li, Huan Qiao, Yuan Zhong, Qing Zhong, Da Xu, Xueshen Wang, Xiaolong Xu, Shijian Wang, Jian Chen

Abstract: This article demonstrates a new kind of programmable logic for the representation of an integer that can be used for the programmable Josephson voltage standard. It can enable the numbers of junctions in most bits to be variable integer values, which is different from normal binary logic or ternary logic. Consequently, missing junctions due to superconducting short circuits can be tolerated under… ▽ More This article demonstrates a new kind of programmable logic for the representation of an integer that can be used for the programmable Josephson voltage standard. It can enable the numbers of junctions in most bits to be variable integer values, which is different from normal binary logic or ternary logic. Consequently, missing junctions due to superconducting short circuits can be tolerated under this logic. This logic can also have nearly the same segmentation efficiency as ternary logic. The completeness of the sequences using this logic is proven by the recursive method in mathematics in this paper. After that, a new algorithm for the representation of integers is presented according to the proven process, and an analysis of the number of fault-tolerant junctions for each bit is provided. Although the first and second bits are not tolerant to missing junctions, bits beyond these can tolerate one to hundreds of missing junctions. Due to the non-fixed multiples between the bits of the sequence, this logic is called non-integer-multiple logic. Finally, the design and fabrication of a 3 V programmable Josephson junction array using this logic are described, and the measurements and analysis of the characteristic parameters are presented. △ Less

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.14151 [pdf, other]

BIRCO: A Benchmark of Information Retrieval Tasks with Complex Objectives

Authors: Xiaoyue Wang, Jianyou Wang, Weili Cao, Kaicheng Wang, Ramamohan Paturi, Leon Bergen

Abstract: We present the Benchmark of Information Retrieval (IR) tasks with Complex Objectives (BIRCO). BIRCO evaluates the ability of IR systems to retrieve documents given multi-faceted user objectives. The benchmark's complexity and compact size make it suitable for evaluating large language model (LLM)-based information retrieval systems. We present a modular framework for investigating factors that may… ▽ More We present the Benchmark of Information Retrieval (IR) tasks with Complex Objectives (BIRCO). BIRCO evaluates the ability of IR systems to retrieve documents given multi-faceted user objectives. The benchmark's complexity and compact size make it suitable for evaluating large language model (LLM)-based information retrieval systems. We present a modular framework for investigating factors that may influence LLM performance on retrieval tasks, and identify a simple baseline model which matches or outperforms existing approaches and more complex alternatives. No approach achieves satisfactory performance on all benchmark tasks, suggesting that stronger models and new retrieval protocols are necessary to address complex user needs. △ Less

Submitted 3 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.04059 [pdf, other]

Deep Learning for Multivariate Time Series Imputation: A Survey

Authors: Jun Wang, Wenjie Du, Wei Cao, Keli Zhang, Wenjia Wang, Yuxuan Liang, Qingsong Wen

Abstract: The ubiquitous missing values cause the multivariate time series data to be partially observed, destroying the integrity of time series and hindering the effective time series data analysis. Recently deep learning imputation methods have demonstrated remarkable success in elevating the quality of corrupted time series data, subsequently enhancing performance in downstream tasks. In this paper, we… ▽ More The ubiquitous missing values cause the multivariate time series data to be partially observed, destroying the integrity of time series and hindering the effective time series data analysis. Recently deep learning imputation methods have demonstrated remarkable success in elevating the quality of corrupted time series data, subsequently enhancing performance in downstream tasks. In this paper, we conduct a comprehensive survey on the recently proposed deep learning imputation methods. First, we propose a taxonomy for the reviewed methods, and then provide a structured review of these methods by highlighting their strengths and limitations. We also conduct empirical experiments to study different methods and compare their enhancement for downstream tasks. Finally, the open issues for future research on multivariate time series imputation are pointed out. All code and configurations of this work, including a regularly maintained multivariate time series imputation paper list, can be found in the GitHub repository~\url{https://github.com/WenjieDu/Awesome\_Imputation}. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: 9 pages, 1 figure, 5 tables, 58 referred papers

arXiv:2401.16827 [pdf]

3D-Printed Hydraulic Fluidic Logic Circuitry for Soft Robots

Authors: Yuxin Lin, Xinyi Zhou, Wenhan Cao

Abstract: Fluidic logic circuitry analogous to its electric counterpart could potentially provide soft robots with machine intelligence due to its supreme adaptability, dexterity, and seamless compatibility using state-of-the-art additive manufacturing processes. However, conventional microfluidic channel based circuitry suffers from limited driving force, while macroscopic pneumatic logic lacks timely resp… ▽ More Fluidic logic circuitry analogous to its electric counterpart could potentially provide soft robots with machine intelligence due to its supreme adaptability, dexterity, and seamless compatibility using state-of-the-art additive manufacturing processes. However, conventional microfluidic channel based circuitry suffers from limited driving force, while macroscopic pneumatic logic lacks timely responsivity and desirable accuracy. Producing heavy duty, highly responsive and integrated fluidic soft robotic circuitry for control and actuation purposes for biomedical applications has yet to be accomplished in a hydraulic manner. Here, we present a 3D printed hydraulic fluidic half-adder system, composing of three basic hydraulic fluidic logic building blocks: AND, OR, and NOT gates. Furthermore, a hydraulic soft robotic half-adder system is implemented using an XOR operation and modified dual NOT gate system based on an electrical oscillator structure. This half-adder system possesses binary arithmetic capability as a key component of arithmetic logic unit in modern computers. With slight modifications, it can realize the control over three different directions of deformation of a three degree-of-freedom soft actuation mechanism solely by changing the states of the two fluidic inputs. This hydraulic fluidic system utilizing a small number of inputs to control multiple distinct outputs, can alter the internal state of the circuit solely based on external inputs, holding significant promises for the development of microfluidics, fluidic logic, and intricate internal systems of untethered soft robots with machine intelligence. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Comments: 26 pages, 14 figures

arXiv:2401.06614 [pdf, other]

Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking

Authors: Wei Cao, Chang Luo, Biao Zhang, Matthias Nießner, Jiapeng Tang

Abstract: We introduce Motion2VecSets, a 4D diffusion model for dynamic surface reconstruction from point cloud sequences. While existing state-of-the-art methods have demonstrated success in reconstructing non-rigid objects using neural field representations, conventional feed-forward networks encounter challenges with ambiguous observations from noisy, partial, or sparse point clouds. To address these cha… ▽ More We introduce Motion2VecSets, a 4D diffusion model for dynamic surface reconstruction from point cloud sequences. While existing state-of-the-art methods have demonstrated success in reconstructing non-rigid objects using neural field representations, conventional feed-forward networks encounter challenges with ambiguous observations from noisy, partial, or sparse point clouds. To address these challenges, we introduce a diffusion model that explicitly learns the shape and motion distribution of non-rigid objects through an iterative denoising process of compressed latent representations. The diffusion-based priors enable more plausible and probabilistic reconstructions when handling ambiguous inputs. We parameterize 4D dynamics with latent sets instead of using global latent codes. This novel 4D representation allows us to learn local shape and deformation patterns, leading to more accurate non-linear motion capture and significantly improving generalizability to unseen motions and identities. For more temporally-coherent object tracking, we synchronously denoise deformation latent sets and exchange information across multiple frames. To avoid computational overhead, we designed an interleaved space and time attention block to alternately aggregate deformation latents along spatial and temporal domains. Extensive comparisons against state-of-the-art methods demonstrate the superiority of our Motion2VecSets in 4D reconstruction from various imperfect observations. More detailed information can be found at https://vveicao.github.io/projects/Motion2VecSets/. △ Less

Submitted 13 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2401.06196 [pdf]

VW-PINNs: A volume weighting method for PDE residuals in physics-informed neural networks

Authors: Jiahao Song, Wenbo Cao, Fei Liao, Weiwei Zhang

Abstract: Physics-informed neural networks (PINNs) have shown remarkable prospects in the solving the forward and inverse problems involving partial differential equations (PDEs). The method embeds PDEs into the neural network by calculating PDE loss at a series of collocation points, providing advantages such as meshfree and more convenient adaptive sampling. However, when solving PDEs using nonuniform col… ▽ More Physics-informed neural networks (PINNs) have shown remarkable prospects in the solving the forward and inverse problems involving partial differential equations (PDEs). The method embeds PDEs into the neural network by calculating PDE loss at a series of collocation points, providing advantages such as meshfree and more convenient adaptive sampling. However, when solving PDEs using nonuniform collocation points, PINNs still face challenge regarding inefficient convergence of PDE residuals or even failure. In this work, we first analyze the ill-conditioning of the PDE loss in PINNs under nonuniform collocation points. To address the issue, we define volume-weighted residual and propose volume-weighted physics-informed neural networks (VW-PINNs). Through weighting the PDE residuals by the volume that the collocation points occupy within the computational domain, we embed explicitly the spatial distribution characteristics of collocation points in the residual evaluation. The fast and sufficient convergence of the PDE residuals for the problems involving nonuniform collocation points is guaranteed. Considering the meshfree characteristics of VW-PINNs, we also develop a volume approximation algorithm based on kernel density estimation to calculate the volume of the collocation points. We verify the universality of VW-PINNs by solving the forward problems involving flow over a circular cylinder and flow over the NACA0012 airfoil under different inflow conditions, where conventional PINNs fail; By solving the Burgers' equation, we verify that VW-PINNs can enhance the efficiency of existing the adaptive sampling method in solving the forward problem by 3 times, and can reduce the relative error of conventional PINNs in solving the inverse problem by more than one order of magnitude. △ Less

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2312.13537 [pdf, other]

HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in Image Editing via Hypernetworks

Authors: Hai Zhang, Chunwei Wu, Guitao Cao, Hailing Wang, Wenming Cao

Abstract: Editing real images authentically while also achieving cross-domain editing remains a challenge. Recent studies have focused on converting real images into latent codes and accomplishing image editing by manipulating these codes. However, merely manipulating the latent codes would constrain the edited images to the generator's image domain, hindering the attainment of diverse editing goals. In res… ▽ More Editing real images authentically while also achieving cross-domain editing remains a challenge. Recent studies have focused on converting real images into latent codes and accomplishing image editing by manipulating these codes. However, merely manipulating the latent codes would constrain the edited images to the generator's image domain, hindering the attainment of diverse editing goals. In response, we propose an innovative image editing method called HyperEditor, which utilizes weight factors generated by hypernetworks to reassign the weights of the pre-trained StyleGAN2's generator. Guided by CLIP's cross-modal image-text semantic alignment, this innovative approach enables us to simultaneously accomplish authentic attribute editing and cross-domain style transfer, a capability not realized in previous methods. Additionally, we ascertain that modifying only the weights of specific layers in the generator can yield an equivalent editing result. Therefore, we introduce an adaptive layer selector, enabling our hypernetworks to autonomously identify the layers requiring output weight factors, which can further improve our hypernetworks' efficiency. Extensive experiments on abundant challenging datasets demonstrate the effectiveness of our method. △ Less

Submitted 20 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI2024

arXiv:2310.16491 [pdf]

TSONN: Time-stepping-oriented neural network for solving partial differential equations

Authors: Wenbo Cao, Weiwei Zhang

Abstract: Deep neural networks (DNNs), especially physics-informed neural networks (PINNs), have recently become a new popular method for solving forward and inverse problems governed by partial differential equations (PDEs). However, these methods still face challenges in achieving stable training and obtaining correct results in many problems, since minimizing PDE residuals with PDE-based soft constraint… ▽ More Deep neural networks (DNNs), especially physics-informed neural networks (PINNs), have recently become a new popular method for solving forward and inverse problems governed by partial differential equations (PDEs). However, these methods still face challenges in achieving stable training and obtaining correct results in many problems, since minimizing PDE residuals with PDE-based soft constraint make the problem ill-conditioned. Different from all existing methods that directly minimize PDE residuals, this work integrates time-stepping method with deep learning, and transforms the original ill-conditioned optimization problem into a series of well-conditioned sub-problems over given pseudo time intervals. The convergence of model training is significantly improved by following the trajectory of the pseudo time-stepping process, yielding a robust optimization-based PDE solver. Our results show that the proposed method achieves stable training and correct results in many problems that standard PINNs fail to solve, requiring only a simple modification on the loss function. In addition, we demonstrate several novel properties and advantages of time-stepping methods within the framework of neural network-based optimization approach, in comparison to traditional grid-based numerical method. Specifically, explicit scheme allows significantly larger time step, while implicit scheme can be implemented as straightforwardly as explicit scheme. △ Less

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.07402 [pdf, other]

NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining

Authors: Chenguo Lin, Xumeng Wen, Wei Cao, Congrui Huang, Jiang Bian, Stephen Lin, Zhirong Wu

Abstract: Recent research on time-series self-supervised models shows great promise in learning semantic representations. However, it has been limited to small-scale datasets, e.g., thousands of temporal sequences. In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequ… ▽ More Recent research on time-series self-supervised models shows great promise in learning semantic representations. However, it has been limited to small-scale datasets, e.g., thousands of temporal sequences. In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequences. We adopt the Transformer architecture by first partitioning the input into non-overlapping windows. Each window is then characterized by its normalized shape and two scalar values denoting the mean and standard deviation within each window. To embed scalar values that may possess arbitrary numerical amplitudes in a high-dimensional space, we propose a numerically multi-scaled embedding module enumerating all possible numerical scales for the scalars. The model undergoes pretraining with a simple contrastive objective on a large-scale dataset over a million sequences collected by merging existing public data. We study its transfer performance on a number of univariate and multivariate classification tasks, few shot learning, unsupervised clustering and anomaly detection benchmarks. Our method exhibits remarkable improvement against previous pretraining approaches and establishes the new state of the art, even compared with domain-specific non-learning-based methods. Code is available at: \url{https://github.com/chenguolin/NuTime}. △ Less

Submitted 10 July, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: Accepted by TMLR 2024

arXiv:2309.07808 [pdf, other]

What Matters to Enhance Traffic Rule Compliance of Imitation Learning for Automated Driving

Authors: Hongkuan Zhou, Aifen Sui, Wei Cao, Zhenshan Bing

Abstract: More research attention has recently been given to end-to-end autonomous driving technologies where the entire driving pipeline is replaced with a single neural network because of its simpler structure and faster inference time. Despite this appealing approach largely reducing the components in the driving pipeline, its simplicity also leads to interpretability problems and safety issues. The trai… ▽ More More research attention has recently been given to end-to-end autonomous driving technologies where the entire driving pipeline is replaced with a single neural network because of its simpler structure and faster inference time. Despite this appealing approach largely reducing the components in the driving pipeline, its simplicity also leads to interpretability problems and safety issues. The trained policy is not always compliant with the traffic rules and it is also hard to discover the reason for the misbehavior because of the lack of intermediate outputs. Meanwhile, sensors are also critical to autonomous driving's security and feasibility to perceive the surrounding environment under complex driving scenarios. In this paper, we proposed P-CSG, a penalty-based imitation learning approach with cross semantics generation sensor fusion technologies to increase the overall performance of end-to-end autonomous driving. In this method, we introduce three penalties - red light, stop sign, and curvature speed penalty to make the agent more sensitive to traffic rules. The proposed cross semantics generation helps to align the shared information from different input modalities. We assessed our model's performance using the CARLA leaderboard - Town 05 Long benchmark and Longest6 Benchmark, achieving an impressive driving score improvement. Furthermore, we conducted robustness evaluations against adversarial attacks like FGSM and Dot attacks, revealing a substantial increase in robustness compared to baseline models. More detailed information, such as code base resources, and videos can be found at https://hk-zh.github.io/p-csg-plus. △ Less

Submitted 20 February, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: 10 pages, 2 figures

arXiv:2308.16406 [pdf, other]

CktGNN: Circuit Graph Neural Network for Electronic Design Automation

Authors: Zehao Dong, Weidong Cao, Muhan Zhang, Dacheng Tao, Yixin Chen, Xuan Zhang

Abstract: The electronic design automation of analog circuits has been a longstanding challenge in the integrated circuit field due to the huge design space and complex design trade-offs among circuit specifications. In the past decades, intensive research efforts have mostly been paid to automate the transistor sizing with a given circuit topology. By recognizing the graph nature of circuits, this paper pr… ▽ More The electronic design automation of analog circuits has been a longstanding challenge in the integrated circuit field due to the huge design space and complex design trade-offs among circuit specifications. In the past decades, intensive research efforts have mostly been paid to automate the transistor sizing with a given circuit topology. By recognizing the graph nature of circuits, this paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing based on the encoder-dependent optimization subroutines. Particularly, CktGNN encodes circuit graphs using a two-level GNN framework (of nested GNN) where circuits are represented as combinations of subgraphs in a known subgraph basis. In this way, it significantly improves design efficiency by reducing the number of subgraphs to perform message passing. Nonetheless, another critical roadblock to advancing learning-assisted circuit design automation is a lack of public benchmarks to perform canonical assessment and reproducible research. To tackle the challenge, we introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains $10$K distinct operational amplifiers with carefully-extracted circuit specifications. OCB is also equipped with communicative circuit generation and evaluation capabilities such that it can help to generalize CktGNN to design various analog circuits by producing corresponding datasets. Experiments on OCB show the extraordinary advantages of CktGNN through representation-based optimization frameworks over other recent powerful GNN baselines and human experts' manual designs. Our work paves the way toward a learning-based open-sourced design automation for analog circuits. Our source code is available at \url{https://github.com/zehao-dong/CktGNN}. △ Less

Submitted 9 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Accepted by ICLR (International Conference on Learning Representations) 2023

arXiv:2307.13837 [pdf, other]

Scaling Integer Arithmetic in Probabilistic Programs

Authors: William X. Cao, Poorva Garg, Ryan Tjoa, Steven Holtzen, Todd Millstein, Guy Van den Broeck

Abstract: Distributions on integers are ubiquitous in probabilistic modeling but remain challenging for many of today's probabilistic programming languages (PPLs). The core challenge comes from discrete structure: many of today's PPL inference strategies rely on enumeration, sampling, or differentiation in order to scale, which fail for high-dimensional complex discrete distributions involving integers. Our… ▽ More Distributions on integers are ubiquitous in probabilistic modeling but remain challenging for many of today's probabilistic programming languages (PPLs). The core challenge comes from discrete structure: many of today's PPL inference strategies rely on enumeration, sampling, or differentiation in order to scale, which fail for high-dimensional complex discrete distributions involving integers. Our insight is that there is structure in arithmetic that these approaches are not using. We present a binary encoding strategy for discrete distributions that exploits the rich logical structure of integer operations like summation and comparison. We leverage this structured encoding with knowledge compilation to perform exact probabilistic inference, and show that this approach scales to much larger integer distributions with arithmetic. △ Less

Submitted 25 July, 2023; originally announced July 2023.

Comments: Accepted to UAI 2023

arXiv:2307.12309 [pdf, other]

Building Extraction from Remote Sensing Images via an Uncertainty-Aware Network

Authors: Wei He, Jiepan Li, Weinan Cao, Liangpei Zhang, Hongyan Zhang

Abstract: Building extraction aims to segment building pixels from remote sensing images and plays an essential role in many applications, such as city planning and urban dynamic monitoring. Over the past few years, deep learning methods with encoder-decoder architectures have achieved remarkable performance due to their powerful feature representation capability. Nevertheless, due to the varying scales and… ▽ More Building extraction aims to segment building pixels from remote sensing images and plays an essential role in many applications, such as city planning and urban dynamic monitoring. Over the past few years, deep learning methods with encoder-decoder architectures have achieved remarkable performance due to their powerful feature representation capability. Nevertheless, due to the varying scales and styles of buildings, conventional deep learning models always suffer from uncertain predictions and cannot accurately distinguish the complete footprints of the building from the complex distribution of ground objects, leading to a large degree of omission and commission. In this paper, we realize the importance of uncertain prediction and propose a novel and straightforward Uncertainty-Aware Network (UANet) to alleviate this problem. To verify the performance of our proposed UANet, we conduct extensive experiments on three public building datasets, including the WHU building dataset, the Massachusetts building dataset, and the Inria aerial image dataset. Results demonstrate that the proposed UANet outperforms other state-of-the-art algorithms by a large margin. △ Less

Submitted 23 July, 2023; originally announced July 2023.

arXiv:2307.03983 [pdf, ps, other]

Hybrid Successive Interference Cancellation and Power Adaptation: a Win-Win Strategy for Robust Uplink NOMA Transmission

Authors: Yanshi Sun, Wei Cao, Momiao Zhou, Zhiguo Ding

Abstract: The aim of this paper is to reveal the importance of hybrid successive interference cancellation (SIC) and power adaptation (PA) for improving transmission robustness of uplink non-orthogonal multiple access (NOMA). Particularly, a cognitive radio inspired uplink NOMA communication scenario is considered, where one primary user is allocated one dedicated resource block, while M secondary users com… ▽ More The aim of this paper is to reveal the importance of hybrid successive interference cancellation (SIC) and power adaptation (PA) for improving transmission robustness of uplink non-orthogonal multiple access (NOMA). Particularly, a cognitive radio inspired uplink NOMA communication scenario is considered, where one primary user is allocated one dedicated resource block, while M secondary users compete with each other to be opportunistically served by using the same resource block of the primary user. Two novel schemes are proposed for the considered scenario, namely hybrid SIC with PA (HSIC-PA) scheme and fixed SIC with PA (FSIC-PA) scheme. Both schemes can ensure that the secondary users are served without degrading the transmission reliability of the primary user compared to conventional orthogonal multiple access (OMA) based schemes. Rigorous analytical results are presented to evaluate the performance of the proposed two schemes. It is shown that both schemes can avoid outage probability error floors without any constraints on users' target rates in the high SNR regime. Furthermore, it is shown that the diversity gain achieved by the HSIC-PA scheme is M, while that of the FISC-PA scheme is only 1. Numerical results are provided to verify the developed analytical results and also demonstrate the superior performance achieved by the proposed schemes by comparing with the existing HSIC without PA (HSIC-NPA) scheme. The presented simulation results also show that HSIC-PA scheme performs the best among the three schemes, which indicates the importance of the combination of HSIC and PA for improving transmission robustness. △ Less

Submitted 8 July, 2023; originally announced July 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2307.01517

arXiv:2307.01517 [pdf, ps, other]

New Designs of Robust Uplink NOMA in Cognitive Radio Inspired Communications

Authors: Yanshi Sun, Wei Cao, Momiao Zhou, Zhiguo Ding

Abstract: This paper considers a cognitive radio inspired uplink communication scenario, where one primary user is allocated with one dedicated resource block, while $M$ secondary users compete with each other to opportunistically access the primary user's channel. Two new designs of NOMA schemes, namely hybrid successive interference cancellation with power adaptation (HSIC-PA) and fixed successive interfe… ▽ More This paper considers a cognitive radio inspired uplink communication scenario, where one primary user is allocated with one dedicated resource block, while $M$ secondary users compete with each other to opportunistically access the primary user's channel. Two new designs of NOMA schemes, namely hybrid successive interference cancellation with power adaptation (HSIC-PA) and fixed successive interference cancellation with power adaptation (FSIC-PA), are proposed. The significant advantages of the proposed schemes are two folds. First, the proposed two schemes can ensure that the secondary users are opportunistically served without degrading the transmission reliability of the primary user. Besides, the transmission robustness of the served secondary users can be guaranteed. Specifically, the outage probability error floors can be avoided for the secondary users, which is proved by asymptotic analysis in the paper. Extensive simulation results are also provided to demonstrate the superior performance of the proposed schemes. △ Less

Submitted 4 July, 2023; originally announced July 2023.

arXiv:2306.09368 [pdf, other]

doi 10.1145/3580305.3599543

Warpformer: A Multi-scale Modeling Approach for Irregular Clinical Time Series

Authors: Jiawen Zhang, Shun Zheng, Wei Cao, Jiang Bian, Jia Li

Abstract: Irregularly sampled multivariate time series are ubiquitous in various fields, particularly in healthcare, and exhibit two key characteristics: intra-series irregularity and inter-series discrepancy. Intra-series irregularity refers to the fact that time-series signals are often recorded at irregular intervals, while inter-series discrepancy refers to the significant variability in sampling rates… ▽ More Irregularly sampled multivariate time series are ubiquitous in various fields, particularly in healthcare, and exhibit two key characteristics: intra-series irregularity and inter-series discrepancy. Intra-series irregularity refers to the fact that time-series signals are often recorded at irregular intervals, while inter-series discrepancy refers to the significant variability in sampling rates among diverse series. However, recent advances in irregular time series have primarily focused on addressing intra-series irregularity, overlooking the issue of inter-series discrepancy. To bridge this gap, we present Warpformer, a novel approach that fully considers these two characteristics. In a nutshell, Warpformer has several crucial designs, including a specific input representation that explicitly characterizes both intra-series irregularity and inter-series discrepancy, a warping module that adaptively unifies irregular time series in a given scale, and a customized attention module for representation learning. Additionally, we stack multiple warping and attention modules to learn at different scales, producing multi-scale representations that balance coarse-grained and fine-grained signals for downstream tasks. We conduct extensive experiments on widely used datasets and a new large-scale benchmark built from clinical databases. The results demonstrate the superiority of Warpformer over existing state-of-the-art approaches. △ Less

Submitted 14 June, 2023; originally announced June 2023.

Comments: KDD23 Research Track

arXiv:2306.01997 [pdf, other]

UADB: Unsupervised Anomaly Detection Booster

Authors: Hangting Ye, Zhining Liu, Xinyi Shen, Wei Cao, Shun Zheng, Xiaofan Gui, Huishuai Zhang, Yi Chang, Jiang Bian

Abstract: Unsupervised Anomaly Detection (UAD) is a key data mining problem owing to its wide real-world applications. Due to the complete absence of supervision signals, UAD methods rely on implicit assumptions about anomalous patterns (e.g., scattered/sparsely/densely clustered) to detect anomalies. However, real-world data are complex and vary significantly across different domains. No single assumption… ▽ More Unsupervised Anomaly Detection (UAD) is a key data mining problem owing to its wide real-world applications. Due to the complete absence of supervision signals, UAD methods rely on implicit assumptions about anomalous patterns (e.g., scattered/sparsely/densely clustered) to detect anomalies. However, real-world data are complex and vary significantly across different domains. No single assumption can describe such complexity and be valid in all scenarios. This is also confirmed by recent research that shows no UAD method is omnipotent. Based on above observations, instead of searching for a magic universal winner assumption, we seek to design a general UAD Booster (UADB) that empowers any UAD models with adaptability to different data. This is a challenging task given the heterogeneous model structures and assumptions adopted by existing UAD methods. To achieve this, we dive deep into the UAD problem and find that compared to normal data, anomalies (i) lack clear structure/pattern in feature space, thus (ii) harder to learn by model without a suitable assumption, and finally, leads to (iii) high variance between different learners. In light of these findings, we propose to (i) distill the knowledge of the source UAD model to an imitation learner (booster) that holds no data assumption, then (ii) exploit the variance between them to perform automatic correction, and thus (iii) improve the booster over the original UAD model. We use a neural network as the booster for its strong expressive power as a universal approximator and ability to perform flexible post-hoc tuning. Note that UADB is a model-agnostic framework that can enhance heterogeneous UAD models in a unified way. Extensive experiments on over 80 tabular datasets demonstrate the effectiveness of UADB. △ Less

Submitted 26 December, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

Comments: IEEE 39th International Conference on Data Engineering (ICDE 2023)

arXiv:2306.00426 [pdf]

Speaker verification using attentive multi-scale convolutional recurrent network

Authors: Yanxiong Li, Zhongjie Jiang, Wenchang Cao, Qisheng Huang

Abstract: In this paper, we propose a speaker verification method by an Attentive Multi-scale Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local spatial information and global sequential information from the input speech recordings. In the proposed method, logarithm Mel spectrum is extracted from each speech recording and then fed to the proposed AMCRN for learning speaker em… ▽ More In this paper, we propose a speaker verification method by an Attentive Multi-scale Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local spatial information and global sequential information from the input speech recordings. In the proposed method, logarithm Mel spectrum is extracted from each speech recording and then fed to the proposed AMCRN for learning speaker embedding. Afterwards, the learned speaker embedding is fed to the back-end classifier (such as cosine similarity metric) for scoring in the testing stage. The proposed method is compared with state-of-the-art methods for speaker verification. Experimental data are three public datasets that are selected from two large-scale speech corpora (VoxCeleb1 and VoxCeleb2). Experimental results show that our method exceeds baseline methods in terms of equal error rate and minimal detection cost function, and has advantages over most of baseline methods in terms of computational complexity and memory requirement. In addition, our method generalizes well across truncated speech segments with different durations, and the speaker embedding learned by the proposed AMCRN has stronger generalization ability across two back-end classifiers. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: 21 pages, 6 figures, 8 tables. Accepted for publication in Applied Soft Computing

arXiv:2305.18045 [pdf, ps, other]

Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes

Authors: Wei Xie, Yanxiong Li, Qianhua He, Wenchang Cao, Tuomas Virtanen

Abstract: New classes of sounds constantly emerge with a few samples, making it challenging for models to adapt to dynamic acoustic environments. This challenge motivates us to address the new problem of few-shot class-incremental audio classification. This study aims to enable a model to continuously recognize new classes of sounds with a few training samples of new classes while remembering the learned on… ▽ More New classes of sounds constantly emerge with a few samples, making it challenging for models to adapt to dynamic acoustic environments. This challenge motivates us to address the new problem of few-shot class-incremental audio classification. This study aims to enable a model to continuously recognize new classes of sounds with a few training samples of new classes while remembering the learned ones. To this end, we propose a method to generate discriminative prototypes and use them to expand the model's classifier for recognizing sounds of new and learned classes. The model is first trained with a random episodic training strategy, and then its backbone is used to generate the prototypes. A dynamic relation projection module refines the prototypes to enhance their discriminability. Results on two datasets (derived from the corpora of Nsynth and FSD-MIX-CLIPS) show that the proposed method exceeds three state-of-the-art methods in average accuracy and performance dropping rate. △ Less

Submitted 29 May, 2023; originally announced May 2023.

Comments: 5 pages,2 figures, Accepted by Interspeech 2023

arXiv:2305.05320 [pdf, ps, other]

Minimal Linear Codes Constructed from partial spreads

Authors: W. Lu, X. Wu, X. W. Cao, G. J. Luo, X. P. Qin

Abstract: Partial spread is important in finite geometry and can be used to construct linear codes. From the results in (Designs, Codes and Cryptography 90:1-15, 2022) by Xia Li, Qin Yue and Deng Tang, we know that if the number of the elements in a partial spread is ``big enough", then the corresponding linear code is minimal. They used the sufficient condition in (IEEE Trans. Inf. Theory 44(5): 2010-201… ▽ More Partial spread is important in finite geometry and can be used to construct linear codes. From the results in (Designs, Codes and Cryptography 90:1-15, 2022) by Xia Li, Qin Yue and Deng Tang, we know that if the number of the elements in a partial spread is ``big enough", then the corresponding linear code is minimal. They used the sufficient condition in (IEEE Trans. Inf. Theory 44(5): 2010-2017, 1998) to prove the minimality of such linear codes. In this paper, we use the geometric approach to study the minimality of linear codes constructed from partial spreads in all cases. △ Less

Submitted 9 May, 2023; originally announced May 2023.

arXiv:2305.05317 [pdf, ps, other]

Minimal Linear Codes Constructed from hierarchical posets with two levels

Authors: X. Wu, W. Lu, X. P. Qin, X. W. Cao

Abstract: J. Y. Hyun, et al. (Des. Codes Cryptogr., vol. 88, pp. 2475-2492, 2020) constructed some optimal and minimal binary linear codes generated by one or two order ideals in hierarchical posets of two levels. At the end of their paper, they left an open problem: it also should be interesting to investigate the cases of more than two orders in hierarchical posets with two levels or many levels. In this… ▽ More J. Y. Hyun, et al. (Des. Codes Cryptogr., vol. 88, pp. 2475-2492, 2020) constructed some optimal and minimal binary linear codes generated by one or two order ideals in hierarchical posets of two levels. At the end of their paper, they left an open problem: it also should be interesting to investigate the cases of more than two orders in hierarchical posets with two levels or many levels. In this paper, we use the geometric method to determine the minimality of linear codes generated by any orders in hierarchical posets with two levels. We generalize their cases of one or two orders to any orders and determine the minimality of the linear codes completely. △ Less

Submitted 9 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:1911.11632, arXiv:1911.07648

arXiv:2303.02545 [pdf, other]

MINER: A Hybrid Data-Driven Approach for REST API Fuzzing

Authors: Chenyang Lyu, Jiacheng Xu, Shouling Ji, Xuhong Zhang, Qinying Wang, Binbin Zhao, Gaoning Pan, Wei Cao, Raheem Beyah

Abstract: In recent years, REST API fuzzing has emerged to explore errors on a cloud service. Its performance highly depends on the sequence construction and request generation. However, existing REST API fuzzers have trouble generating long sequences with well-constructed requests to trigger hard-to-reach states in a cloud service, which limits their performance of finding deep errors and security bugs. Fu… ▽ More In recent years, REST API fuzzing has emerged to explore errors on a cloud service. Its performance highly depends on the sequence construction and request generation. However, existing REST API fuzzers have trouble generating long sequences with well-constructed requests to trigger hard-to-reach states in a cloud service, which limits their performance of finding deep errors and security bugs. Further, they cannot find the specific errors caused by using undefined parameters during request generation. Therefore, in this paper, we propose a novel hybrid data-driven solution, named MINER, with three new designs working together to address the above limitations. First, MINER collects the valid sequences whose requests pass the cloud service's checking as the templates, and assigns more executions to long sequence templates. Second, to improve the generation quality of requests in a sequence template, MINER creatively leverages the state-of-the-art neural network model to predict key request parameters and provide them with appropriate parameter values. Third, MINER implements a new data-driven security rule checker to capture the new kind of errors caused by undefined parameters. We evaluate MINER against the state-of-the-art fuzzer RESTler on GitLab, Bugzilla, and WordPress via 11 REST APIs. The results demonstrate that the average pass rate of MINER is 23.42% higher than RESTler. MINER finds 97.54% more unique errors than RESTler on average and 142.86% more reproducible errors after manual analysis. We have reported all the newly found errors, and 7 of them have been confirmed as logic bugs by the corresponding vendors. △ Less

Submitted 4 March, 2023; originally announced March 2023.

Comments: Accepted as a full paper at USENIX Security '23

arXiv:2302.10959 [pdf, other]

Dealing with Collinearity in Large-Scale Linear System Identification Using Gaussian Regression

Authors: Wenqi Cao, Gianluigi Pillonetto

Abstract: Many problems arising in control require the determination of a mathematical model of the application. This has often to be performed starting from input-output data, leading to a task known as system identification in the engineering literature. One emerging topic in this field is estimation of networks consisting of several interconnected dynamic systems. We consider the linear setting assuming… ▽ More Many problems arising in control require the determination of a mathematical model of the application. This has often to be performed starting from input-output data, leading to a task known as system identification in the engineering literature. One emerging topic in this field is estimation of networks consisting of several interconnected dynamic systems. We consider the linear setting assuming that system outputs are the result of many correlated inputs, hence making system identification severely ill-conditioned. This is a scenario often encountered when modeling complex cybernetics systems composed by many sub-units with feedback and algebraic loops. We develop a strategy cast in a Bayesian regularization framework where any impulse response is seen as realization of a zero-mean Gaussian process. Any covariance is defined by the so called stable spline kernel which includes information on smooth exponential decay. We design a novel Markov chain Monte Carlo scheme able to reconstruct the impulse responses posterior by efficiently dealing with collinearity. Our scheme relies on a variation of the Gibbs sampling technique: beyond considering blocks forming a partition of the parameter space, some other (overlapping) blocks are also updated on the basis of the level of collinearity of the system inputs. Theoretical properties of the algorithm are studied obtaining its convergence rate. Numerical experiments are included using systems containing hundreds of impulse responses and highly correlated inputs. △ Less

Submitted 28 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2203.13633

arXiv:2301.12101 [pdf, other]

Non-Hermitian Physics-Inspired Voltage-Controlled Oscillators with Resistive Tuning

Authors: Weidong Cao, Hua Wang, Xuan Zhang

Abstract: This paper presents a non-Hermitian physics-inspired voltage-controlled oscillator (VCO) topology, which is termed parity-time-symmetric topology. The VCO consists of two coupled inductor-capacitor (LC) cores with a balanced gain and loss profile. Due to the interplay between the gain/loss and their coupling, an extra degree of freedom is enabled via resistive tuning, which can enhance the frequen… ▽ More This paper presents a non-Hermitian physics-inspired voltage-controlled oscillator (VCO) topology, which is termed parity-time-symmetric topology. The VCO consists of two coupled inductor-capacitor (LC) cores with a balanced gain and loss profile. Due to the interplay between the gain/loss and their coupling, an extra degree of freedom is enabled via resistive tuning, which can enhance the frequency tuning range (FTR) beyond the bounds of conventional capacitive or inductive tuning. A silicon prototype is implemented in a standard 130 nm bulk CMOS process with a core area of 0.15mm^2. Experimental results show that it achieves a 3.1x FTR improvement and 30% phase noise reduction of the baseline VCO with the same amount of capacitive tuning ability. △ Less

Submitted 28 January, 2023; originally announced January 2023.

Comments: 5 Pages, 6 figures, accepted by ISCAS 2023

arXiv:2301.08413 [pdf, other]

Chaos to Order: A Label Propagation Perspective on Source-Free Domain Adaptation

Authors: Chunwei Wu, Guitao Cao, Yan Li, Xidong Xi, Wenming Cao, Hong Wang

Abstract: Source-free domain adaptation (SFDA), where only a pre-trained source model is used to adapt to the target distribution, is a more general approach to achieving domain adaptation in the real world. However, it can be challenging to capture the inherent structure of the target features accurately due to the lack of supervised information on the target domain. By analyzing the clustering performance… ▽ More Source-free domain adaptation (SFDA), where only a pre-trained source model is used to adapt to the target distribution, is a more general approach to achieving domain adaptation in the real world. However, it can be challenging to capture the inherent structure of the target features accurately due to the lack of supervised information on the target domain. By analyzing the clustering performance of the target features, we show that they still contain core features related to discriminative attributes but lack the collation of semantic information. Inspired by this insight, we present Chaos to Order (CtO), a novel approach for SFDA that strives to constrain semantic credibility and propagate label information among target subpopulations. CtO divides the target data into inner and outlier samples based on the adaptive threshold of the learning state, customizing the learning strategy to fit the data properties best. Specifically, inner samples are utilized for learning intra-class structure thanks to their relatively well-clustered properties. The low-density outlier samples are regularized by input consistency to achieve high accuracy with respect to the ground truth labels. In CtO, by employing different learning strategies to propagate the labels from the inner local to outlier instances, it clusters the global samples from chaos to order. We further adaptively regulate the neighborhood affinity of the inner samples to constrain the local semantic credibility. In theoretical and empirical analyses, we demonstrate that our algorithm not only propagates from inner to outlier but also prevents local clustering from forming spurious clusters. Empirical evidence demonstrates that CtO outperforms the state of the arts on three public benchmarks: Office-31, Office-Home, and VisDA. △ Less

Submitted 14 August, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

Comments: Accepted by ACM MM2023

arXiv:2301.00089 [pdf, other]

Autonomous Driving Simulator based on Neurorobotics Platform

Authors: Wei Cao, Liguo Zhou, Yuhong Huang, Alois Knoll

Abstract: There are many artificial intelligence algorithms for autonomous driving, but directly installing these algorithms on vehicles is unrealistic and expensive. At the same time, many of these algorithms need an environment to train and optimize. Simulation is a valuable and meaningful solution with training and testing functions, and it can say that simulation is a critical link in the autonomous dri… ▽ More There are many artificial intelligence algorithms for autonomous driving, but directly installing these algorithms on vehicles is unrealistic and expensive. At the same time, many of these algorithms need an environment to train and optimize. Simulation is a valuable and meaningful solution with training and testing functions, and it can say that simulation is a critical link in the autonomous driving world. There are also many different applications or systems of simulation from companies or academies such as SVL and Carla. These simulators flaunt that they have the closest real-world simulation, but their environment objects, such as pedestrians and other vehicles around the agent-vehicle, are already fixed programmed. They can only move along the pre-setting trajectory, or random numbers determine their movements. What is the situation when all environmental objects are also installed by Artificial Intelligence, or their behaviors are like real people or natural reactions of other drivers? This problem is a blind spot for most of the simulation applications, or these applications cannot be easy to solve this problem. The Neurorobotics Platform from the TUM team of Prof. Alois Knoll has the idea about "Engines" and "Transceiver Functions" to solve the multi-agents problem. This report will start with a little research on the Neurorobotics Platform and analyze the potential and possibility of developing a new simulator to achieve the true real-world simulation goal. Then based on the NRP-Core Platform, this initial development aims to construct an initial demo experiment. The consist of this report starts with the basic knowledge of NRP-Core and its installation, then focus on the explanation of the necessary components for a simulation experiment, at last, about the details of constructions for the autonomous driving system, which is integrated object detection and autonomous control. △ Less

Submitted 30 December, 2022; originally announced January 2023.

Comments: 25 pages, 8 figures

MSC Class: 00-01 ACM Class: D.0

arXiv:2212.03183 [pdf]

doi 10.1063/5.0138863

A novel convergence enhancement method based on Online Dimension Reduction Optimization

Authors: Wenbo Cao, Yilang Liu, Xianglin Shan, Chuanqiang Gao, Weiwei Zhang

Abstract: Iterative steady-state solvers are widely used in computational fluid dynamics. Unfortunately, it is difficult to obtain steady-state solution for unstable problem caused by physical instability and numerical instability. Optimization is a better choice for solving unstable problem because steady-state solution is always the extreme point of optimization regardless of whether the problem is unstab… ▽ More Iterative steady-state solvers are widely used in computational fluid dynamics. Unfortunately, it is difficult to obtain steady-state solution for unstable problem caused by physical instability and numerical instability. Optimization is a better choice for solving unstable problem because steady-state solution is always the extreme point of optimization regardless of whether the problem is unstable or ill-conditioned, but it is difficult to solve partial differential equations (PDEs) due to too many optimization variables. In this study, we propose an Online Dimension Reduction Optimization (ODRO) method to enhance the convergence of the traditional iterative method to obtain the steady-state solution of unstable problem. This method performs proper orthogonal decomposition (POD) on the snapshots collected from a few iteration steps, optimizes PDE residual in the POD subspace to get a solution with lower residual, and then continues to iterate with the optimized solution as the initial value, repeating the above three steps until the residual converges. Several typical cases show that the proposed method can efficiently calculate the steady-state solution of unstable problem with both the high efficiency and robustness of the iterative method and the good convergence of the optimization method. In addition, this method is easy to implement in almost any iterative solver with minimal code modification. △ Less

Submitted 29 November, 2022; originally announced December 2022.

Journal ref: Physics of Fluids 35, 036124 (2023)

arXiv:2211.12507 [pdf, other]

OpenFE: Automated Feature Generation with Expert-level Performance

Authors: Tianping Zhang, Zheyu Zhang, Zhiyuan Fan, Haoyan Luo, Fengyuan Liu, Qian Liu, Wei Cao, Jian Li

Abstract: The goal of automated feature generation is to liberate machine learning experts from the laborious task of manual feature generation, which is crucial for improving the learning performance of tabular data. The major challenge in automated feature generation is to efficiently and accurately identify effective features from a vast pool of candidate features. In this paper, we present OpenFE, an au… ▽ More The goal of automated feature generation is to liberate machine learning experts from the laborious task of manual feature generation, which is crucial for improving the learning performance of tabular data. The major challenge in automated feature generation is to efficiently and accurately identify effective features from a vast pool of candidate features. In this paper, we present OpenFE, an automated feature generation tool that provides competitive results against machine learning experts. OpenFE achieves high efficiency and accuracy with two components: 1) a novel feature boosting method for accurately evaluating the incremental performance of candidate features and 2) a two-stage pruning algorithm that performs feature pruning in a coarse-to-fine manner. Extensive experiments on ten benchmark datasets show that OpenFE outperforms existing baseline methods by a large margin. We further evaluate OpenFE in two Kaggle competitions with thousands of data science teams participating. In the two competitions, features generated by OpenFE with a simple baseline model can beat 99.3% and 99.6% data science teams respectively. In addition to the empirical results, we provide a theoretical perspective to show that feature generation can be beneficial in a simple yet representative setting. The code is available at https://github.com/ZhangTP1996/OpenFE. △ Less

Submitted 5 June, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: 22 pages, 3 figures, accepted by ICML2023

arXiv:2210.04122 [pdf, other]

doi 10.3847/1538-4357/ac927e

Inferring Line-of-Sight Velocities and Doppler Widths from Stokes Profiles of GST/NIRIS Using Stacked Deep Neural Networks

Authors: Haodi Jiang, Qin Li, Yan Xu, Wynne Hsu, Kwangsu Ahn, Wenda Cao, Jason T. L. Wang, Haimin Wang

Abstract: Obtaining high-quality magnetic and velocity fields through Stokes inversion is crucial in solar physics. In this paper, we present a new deep learning method, named Stacked Deep Neural Networks (SDNN), for inferring line-of-sight (LOS) velocities and Doppler widths from Stokes profiles collected by the Near InfraRed Imaging Spectropolarimeter (NIRIS) on the 1.6 m Goode Solar Telescope (GST) at th… ▽ More Obtaining high-quality magnetic and velocity fields through Stokes inversion is crucial in solar physics. In this paper, we present a new deep learning method, named Stacked Deep Neural Networks (SDNN), for inferring line-of-sight (LOS) velocities and Doppler widths from Stokes profiles collected by the Near InfraRed Imaging Spectropolarimeter (NIRIS) on the 1.6 m Goode Solar Telescope (GST) at the Big Bear Solar Observatory (BBSO). The training data of SDNN is prepared by a Milne-Eddington (ME) inversion code used by BBSO. We quantitatively assess SDNN, comparing its inversion results with those obtained by the ME inversion code and related machine learning (ML) algorithms such as multiple support vector regression, multilayer perceptrons and a pixel-level convolutional neural network. Major findings from our experimental study are summarized as follows. First, the SDNN-inferred LOS velocities are highly correlated to the ME-calculated ones with the Pearson product-moment correlation coefficient being close to 0.9 on average. Second, SDNN is faster, while producing smoother and cleaner LOS velocity and Doppler width maps, than the ME inversion code. Third, the maps produced by SDNN are closer to ME's maps than those from the related ML algorithms, demonstrating the better learning capability of SDNN than the ML algorithms. Finally, comparison between the inversion results of ME and SDNN based on GST/NIRIS and those from the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory in flare-prolific active region NOAA 12673 is presented. We also discuss extensions of SDNN for inferring vector magnetic fields with empirical evaluation. △ Less

Submitted 8 October, 2022; originally announced October 2022.

Comments: 16 pages, 8 figures

Journal ref: The Astrophysical Journal, 2022

arXiv:2209.05042 [pdf, other]

doi 10.1109/CDC51059.2022.9992503

On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator

Authors: Jingliang Duan, Wenhan Cao, Yang Zheng, Lin Zhao

Abstract: The convergence of policy gradient algorithms in reinforcement learning hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies… ▽ More The convergence of policy gradient algorithms in reinforcement learning hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. At the core of our results is the uniqueness of the stationary point of dLQR when it is observable, which is in a concise form of an observer-based controller with the optimal similarity transformation. These results shed light on designing efficient algorithms for general decision-making problems with partially observed information. △ Less

Submitted 29 October, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2201.09598

Journal ref: 2022 IEEE 61st Conference on Decision and Control (CDC)

Showing 1–50 of 98 results for author: Cao, W