subscribe to arXiv mailings

EVA-Score: Evaluation of Long-form Summarization on Informativeness through Extraction and Validation

Authors: Yuchen Fan, Xin Zhong, Chengsi Wang, Gaoche Wu, Bowen Zhou

Abstract: Summarization is a fundamental task in natural language processing (NLP) and since large language models (LLMs), such as GPT-4 and Claude, come out, increasing attention has been paid to long-form summarization whose input sequences are much longer, indicating more information contained. The current evaluation metrics either use similarity-based metrics like ROUGE and BERTScore which rely on sim… ▽ More Summarization is a fundamental task in natural language processing (NLP) and since large language models (LLMs), such as GPT-4 and Claude, come out, increasing attention has been paid to long-form summarization whose input sequences are much longer, indicating more information contained. The current evaluation metrics either use similarity-based metrics like ROUGE and BERTScore which rely on similarity and fail to consider informativeness or LLM-based metrics, lacking quantitative analysis of information richness and are rather subjective. In this paper, we propose a new evaluation metric called EVA-Score using Atomic Fact Chain Generation and Document-level Relation Extraction together to automatically calculate the informativeness and give a definite number as an information score. Experiment results show that our metric shows a state-of-the-art correlation with humans. We also re-evaluate the performance of LLMs on long-form summarization comprehensively from the information aspect, forecasting future ways to use LLMs for long-form summarization. △ Less

Submitted 6 July, 2024; originally announced July 2024.

Comments: 16 pages, 3 figures, submitted to EMNLP

arXiv:2407.04948 [pdf, other]

Zero-shot Object Counting with Good Exemplars

Authors: Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Zheng Wang, Xian Zhong, Shengfeng He

Abstract: Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names of object classes during testing, without the need for manual annotations. However, a critical challenge in current ZOC methods lies in their inability to identify high-quality exemplars effectively. This deficiency hampers scalability across diverse classes and undermines the development of strong visual asso… ▽ More Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names of object classes during testing, without the need for manual annotations. However, a critical challenge in current ZOC methods lies in their inability to identify high-quality exemplars effectively. This deficiency hampers scalability across diverse classes and undermines the development of strong visual associations between the identified classes and image content. To this end, we propose the Visual Association-based Zero-shot Object Counting (VA-Count) framework. VA-Count consists of an Exemplar Enhancement Module (EEM) and a Noise Suppression Module (NSM) that synergistically refine the process of class exemplar identification while minimizing the consequences of incorrect object identification. The EEM utilizes advanced vision-language pretaining models to discover potential exemplars, ensuring the framework's adaptability to various classes. Meanwhile, the NSM employs contrastive learning to differentiate between optimal and suboptimal exemplar pairs, reducing the negative effects of erroneous exemplars. VA-Count demonstrates its effectiveness and scalability in zero-shot contexts with superior performance on two object counting datasets. △ Less

Submitted 9 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

arXiv:2406.10175 [pdf, other]

Enhancing Incomplete Multi-modal Brain Tumor Segmentation with Intra-modal Asymmetry and Inter-modal Dependency

Authors: Weide Liu, Jingwen Hou, Xiaoyang Zhong, Huijing Zhan, Jun Cheng, Yuming Fang, Guanghui Yue

Abstract: Deep learning-based brain tumor segmentation (BTS) models for multi-modal MRI images have seen significant advancements in recent years. However, a common problem in practice is the unavailability of some modalities due to varying scanning protocols and patient conditions, making segmentation from incomplete MRI modalities a challenging issue. Previous methods have attempted to address this by fus… ▽ More Deep learning-based brain tumor segmentation (BTS) models for multi-modal MRI images have seen significant advancements in recent years. However, a common problem in practice is the unavailability of some modalities due to varying scanning protocols and patient conditions, making segmentation from incomplete MRI modalities a challenging issue. Previous methods have attempted to address this by fusing accessible multi-modal features, leveraging attention mechanisms, and synthesizing missing modalities using generative models. However, these methods ignore the intrinsic problems of medical image segmentation, such as the limited availability of training samples, particularly for cases with tumors. Furthermore, these methods require training and deploying a specific model for each subset of missing modalities. To address these issues, we propose a novel approach that enhances the BTS model from two perspectives. Firstly, we introduce a pre-training stage that generates a diverse pre-training dataset covering a wide range of different combinations of tumor shapes and brain anatomy. Secondly, we propose a post-training stage that enables the model to reconstruct missing modalities in the prediction results when only partial modalities are available. To achieve the pre-training stage, we conceptually decouple the MRI image into two parts: `anatomy' and `tumor'. We pre-train the BTS model using synthesized data generated from the anatomy and tumor parts across different training samples. ... Extensive experiments demonstrate that our proposed method significantly improves the performance over the baseline and achieves new state-of-the-art results on three brain tumor segmentation datasets: BRATS2020, BRATS2018, and BRATS2015. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.08098 [pdf, other]

Scalable Defect Detection via Traversal on Code Graph

Authors: Zhengyao Liu, Xitong Zhong, Xingjing Deng, Shuo Hong, Xiang Gao, Hailong Sun

Abstract: Detecting defects and vulnerabilities in the early stage has long been a challenge in software engineering. Static analysis, a technique that inspects code without execution, has emerged as a key strategy to address this challenge. Among recent advancements, the use of graph-based representations, particularly Code Property Graph (CPG), has gained traction due to its comprehensive depiction of cod… ▽ More Detecting defects and vulnerabilities in the early stage has long been a challenge in software engineering. Static analysis, a technique that inspects code without execution, has emerged as a key strategy to address this challenge. Among recent advancements, the use of graph-based representations, particularly Code Property Graph (CPG), has gained traction due to its comprehensive depiction of code structure and semantics. Despite the progress, existing graph-based analysis tools still face performance and scalability issues. The main bottleneck lies in the size and complexity of CPG, which makes analyzing large codebases inefficient and memory-consuming. Also, query rules used by the current tools can be over-specific. Hence, we introduce QVoG, a graph-based static analysis platform for detecting defects and vulnerabilities. It employs a compressed CPG representation to maintain a reasonable graph size, thereby enhancing the overall query efficiency. Based on the CPG, it also offers a declarative query language to simplify the queries. Furthermore, it takes a step forward to integrate machine learning to enhance the generality of vulnerability detection. For projects consisting of 1,000,000+ lines of code, QVoG can complete analysis in approximately 15 minutes, as opposed to 19 minutes with CodeQL. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07949 [pdf, other]

Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection

Authors: Jie Feng, Xiaojian Zhong, Di Li, Weisheng Dong, Ronghua Shang, Licheng Jiao

Abstract: Band selection plays a crucial role in hyperspectral image classification by removing redundant and noisy bands and retaining discriminative ones. However, most existing deep learning-based methods are aimed at dealing with a specific band selection dataset, and need to retrain parameters for new datasets, which significantly limits their generalizability.To address this issue, a novel multi-teach… ▽ More Band selection plays a crucial role in hyperspectral image classification by removing redundant and noisy bands and retaining discriminative ones. However, most existing deep learning-based methods are aimed at dealing with a specific band selection dataset, and need to retrain parameters for new datasets, which significantly limits their generalizability.To address this issue, a novel multi-teacher multi-objective meta-learning network (M$^3$BS) is proposed for zero-shot hyperspectral band selection. In M$^3$BS, a generalizable graph convolution network (GCN) is constructed to generate dataset-agnostic base, and extract compatible meta-knowledge from multiple band selection tasks. To enhance the ability of meta-knowledge extraction, multiple band selection teachers are introduced to provide diverse high-quality experiences.strategy Finally, subsequent classification tasks are attached and jointly optimized with multi-teacher band selection tasks through multi-objective meta-learning in an end-to-end trainable way. Multi-objective meta-learning guarantees to coordinate diverse optimization objectives automatically and adapt to various datasets simultaneously. Once the optimization is accomplished, the acquired meta-knowledge can be directly transferred to unseen datasets without any retraining or fine-tuning. Experimental results demonstrate the effectiveness and efficiency of our proposed method on par with state-of-the-art baselines for zero-shot hyperspectral band selection. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.05704 [pdf, other]

Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation

Authors: Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Tao Dai, Meikang Qiu, Shu-Tao Xia

Abstract: Dataset distillation is an emerging dataset reduction method, which condenses large-scale datasets while maintaining task accuracy. Current methods have integrated parameterization techniques to boost synthetic dataset performance by shifting the optimization space from pixel to another informative feature domain. However, they limit themselves to a fixed optimization space for distillation, negle… ▽ More Dataset distillation is an emerging dataset reduction method, which condenses large-scale datasets while maintaining task accuracy. Current methods have integrated parameterization techniques to boost synthetic dataset performance by shifting the optimization space from pixel to another informative feature domain. However, they limit themselves to a fixed optimization space for distillation, neglecting the diverse guidance across different informative latent spaces. To overcome this limitation, we propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD), to systematically explore hierarchical layers within the generative adversarial networks (GANs). This allows us to progressively span from the initial latent space to the final pixel space. In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation, bridging the gap between synthetic and original datasets. Experimental results demonstrate that the proposed H-GLaD achieves a significant improvement in both same-architecture and cross-architecture performance with equivalent time consumption. △ Less

Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2405.05925 [pdf, other]

FuXi-ENS: A machine learning model for medium-range ensemble weather forecasting

Authors: Xiaohui Zhong, Lei Chen, Hao Li, Jun Liu, Xu Fan, Jie Feng, Kan Dai, Jing-Jia Luo, Jie Wu, Yuan Qi, Bo Lu

Abstract: Ensemble forecasting is crucial for improving weather predictions, especially for forecasts of extreme events. Constructing an ensemble prediction system (EPS) based on conventional NWP models is highly computationally expensive. ML models have emerged as valuable tools for deterministic weather forecasts, providing forecasts with significantly reduced computational requirements and even surpassin… ▽ More Ensemble forecasting is crucial for improving weather predictions, especially for forecasts of extreme events. Constructing an ensemble prediction system (EPS) based on conventional NWP models is highly computationally expensive. ML models have emerged as valuable tools for deterministic weather forecasts, providing forecasts with significantly reduced computational requirements and even surpassing the forecast performance of traditional NWP models. However, challenges arise when applying ML models to ensemble forecasting. Recent ML models, such as GenCast and SEEDS model, rely on the ERA5 EDA or operational NWP ensemble members for forecast generation. Their spatial resolution is also considered too coarse for many applications. To overcome these limitations, we introduce FuXi-ENS, an advanced ML model designed to deliver 6-hourly global ensemble weather forecasts up to 15 days. This model runs at a significantly increased spatial resolution of 0.25\textdegree, incorporating 5 atmospheric variables at 13 pressure levels, along with 13 surface variables. By leveraging the inherent probabilistic nature of Variational AutoEncoder (VAE), FuXi-ENS optimizes a loss function that combines the CRPS and the KL divergence between the predicted and target distribution, facilitating the incorporation of flow-dependent perturbations in both initial conditions and forecast. This innovative approach makes FuXi-ENS an advancement over the traditional ones that use L1 loss combined with the KL loss in standard VAE models for ensemble weather forecasting. Results demonstrate that FuXi-ENS outperforms ensemble forecasts from the ECMWF, a world leading NWP model, in the CRPS of 98.1% of 360 variable and forecast lead time combinations. This achievement underscores the potential of the FuXi-ENS model to enhance ensemble weather forecasts, offering a promising direction for further development in this field. △ Less

Submitted 5 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05075 [pdf, other]

Towards Efficient Training and Evaluation of Robust Models against $l_0$ Bounded Adversarial Perturbations

Authors: Xuyang Zhong, Yixiao Huang, Chen Liu

Abstract: This work studies sparse adversarial perturbations bounded by $l_0$ norm. We propose a white-box PGD-like attack method named sparse-PGD to effectively and efficiently generate such perturbations. Furthermore, we combine sparse-PGD with a black-box attack to comprehensively and more reliably evaluate the models' robustness against $l_0$ bounded adversarial perturbations. Moreover, the efficiency o… ▽ More This work studies sparse adversarial perturbations bounded by $l_0$ norm. We propose a white-box PGD-like attack method named sparse-PGD to effectively and efficiently generate such perturbations. Furthermore, we combine sparse-PGD with a black-box attack to comprehensively and more reliably evaluate the models' robustness against $l_0$ bounded adversarial perturbations. Moreover, the efficiency of sparse-PGD enables us to conduct adversarial training to build robust models against sparse perturbations. Extensive experiments demonstrate that our proposed attack algorithm exhibits strong performance in different scenarios. More importantly, compared with other robust models, our adversarially trained model demonstrates state-of-the-art robustness against various sparse attacks. Codes are available at https://github.com/CityU-MLO/sPGD. △ Less

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.03388 [pdf, other]

3D LiDAR Mapping in Dynamic Environments Using a 4D Implicit Neural Representation

Authors: Xingguang Zhong, Yue Pan, Cyrill Stachniss, Jens Behley

Abstract: Building accurate maps is a key building block to enable reliable localization, planning, and navigation of autonomous vehicles. We propose a novel approach for building accurate maps of dynamic environments utilizing a sequence of LiDAR scans. To this end, we propose encoding the 4D scene into a novel spatio-temporal implicit neural map representation by fitting a time-dependent truncated signed… ▽ More Building accurate maps is a key building block to enable reliable localization, planning, and navigation of autonomous vehicles. We propose a novel approach for building accurate maps of dynamic environments utilizing a sequence of LiDAR scans. To this end, we propose encoding the 4D scene into a novel spatio-temporal implicit neural map representation by fitting a time-dependent truncated signed distance function to each point. Using our representation, we extract the static map by filtering the dynamic parts. Our neural representation is based on sparse feature grids, a globally shared decoder, and time-dependent basis functions, which we jointly optimize in an unsupervised fashion. To learn this representation from a sequence of LiDAR scans, we design a simple yet efficient loss function to supervise the map optimization in a piecewise way. We evaluate our approach on various scenes containing moving objects in terms of the reconstruction quality of static maps and the segmentation of dynamic point clouds. The experimental results demonstrate that our method is capable of removing the dynamic part of the input point clouds while reconstructing accurate and complete 3D maps, outperforming several state-of-the-art methods. Codes are available at: https://github.com/PRBonn/4dNDF △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: 10 pages, CVPR 2024

arXiv:2404.13134 [pdf, other]

Deep Learning-based Text-in-Image Watermarking

Authors: Bishwa Karki, Chun-Hua Tsai, Pei-Chi Huang, Xin Zhong

Abstract: In this work, we introduce a novel deep learning-based approach to text-in-image watermarking, a method that embeds and extracts textual information within images to enhance data security and integrity. Leveraging the capabilities of deep learning, specifically through the use of Transformer-based architectures for text processing and Vision Transformers for image feature extraction, our method se… ▽ More In this work, we introduce a novel deep learning-based approach to text-in-image watermarking, a method that embeds and extracts textual information within images to enhance data security and integrity. Leveraging the capabilities of deep learning, specifically through the use of Transformer-based architectures for text processing and Vision Transformers for image feature extraction, our method sets new benchmarks in the domain. The proposed method represents the first application of deep learning in text-in-image watermarking that improves adaptivity, allowing the model to intelligently adjust to specific image characteristics and emerging threats. Through testing and evaluation, our method has demonstrated superior robustness compared to traditional watermarking techniques, achieving enhanced imperceptibility that ensures the watermark remains undetectable across various image contents. △ Less

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.08522 [pdf, other]

Fuxi-DA: A Generalized Deep Learning Data Assimilation Framework for Assimilating Satellite Observations

Authors: Xiaoze Xu, Xiuyu Sun, Wei Han, Xiaohui Zhong, Lei Chen, Hao Li

Abstract: Data assimilation (DA), as an indispensable component within contemporary Numerical Weather Prediction (NWP) systems, plays a crucial role in generating the analysis that significantly impacts forecast performance. Nevertheless, the development of an efficient DA system poses significant challenges, particularly in establishing intricate relationships between the background data and the vast amoun… ▽ More Data assimilation (DA), as an indispensable component within contemporary Numerical Weather Prediction (NWP) systems, plays a crucial role in generating the analysis that significantly impacts forecast performance. Nevertheless, the development of an efficient DA system poses significant challenges, particularly in establishing intricate relationships between the background data and the vast amount of multi-source observation data within limited time windows in operational settings. To address these challenges, researchers design complex pre-processing methods for each observation type, leveraging approximate modeling and the power of super-computing clusters to expedite solutions. The emergence of deep learning (DL) models has been a game-changer, offering unified multi-modal modeling, enhanced nonlinear representation capabilities, and superior parallelization. These advantages have spurred efforts to integrate DL models into various domains of weather modeling. Remarkably, DL models have shown promise in matching, even surpassing, the forecast accuracy of leading operational NWP models worldwide. This success motivates the exploration of DL-based DA frameworks tailored for weather forecasting models. In this study, we introduces FuxiDA, a generalized DL-based DA framework for assimilating satellite observations. By assimilating data from Advanced Geosynchronous Radiation Imager (AGRI) aboard Fengyun-4B, FuXi-DA consistently mitigates analysis errors and significantly improves forecast performance. Furthermore, through a series of single-observation experiments, Fuxi-DA has been validated against established atmospheric physics, demonstrating its consistency and reliability. △ Less

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.05832 [pdf, other]

Human-Machine Interaction in Automated Vehicles: Reducing Voluntary Driver Intervention

Authors: Xinzhi Zhong, Yang Zhou, Varshini Kamaraj, Zhenhao Zhou, Wissam Kontar, Dan Negrut, John D. Lee, Soyoung Ahn

Abstract: This paper develops a novel car-following control method to reduce voluntary driver interventions and improve traffic stability in Automated Vehicles (AVs). Through a combination of experimental and empirical analysis, we show how voluntary driver interventions can instigate substantial traffic disturbances that are amplified along the traffic upstream. Motivated by these findings, we present a fr… ▽ More This paper develops a novel car-following control method to reduce voluntary driver interventions and improve traffic stability in Automated Vehicles (AVs). Through a combination of experimental and empirical analysis, we show how voluntary driver interventions can instigate substantial traffic disturbances that are amplified along the traffic upstream. Motivated by these findings, we present a framework for driver intervention based on evidence accumulation (EA), which describes the evolution of the driver's distrust in automation, ultimately resulting in intervention. Informed through the EA framework, we propose a deep reinforcement learning (DRL)-based car-following control for AVs that is strategically designed to mitigate unnecessary driver intervention and improve traffic stability. Numerical experiments are conducted to demonstrate the effectiveness of the proposed control model. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.14690 [pdf]

Incorporating Graph Attention Mechanism into Geometric Problem Solving Based on Deep Reinforcement Learning

Authors: Xiuqin Zhong, Shengyuan Yan, Gongqi Lin, Hongguang Fu, Liang Xu, Siwen Jiang, Lei Huang, Wei Fang

Abstract: In the context of online education, designing an automatic solver for geometric problems has been considered a crucial step towards general math Artificial Intelligence (AI), empowered by natural language understanding and traditional logical inference. In most instances, problems are addressed by adding auxiliary components such as lines or points. However, adding auxiliary components automatical… ▽ More In the context of online education, designing an automatic solver for geometric problems has been considered a crucial step towards general math Artificial Intelligence (AI), empowered by natural language understanding and traditional logical inference. In most instances, problems are addressed by adding auxiliary components such as lines or points. However, adding auxiliary components automatically is challenging due to the complexity in selecting suitable auxiliary components especially when pivotal decisions have to be made. The state-of-the-art performance has been achieved by exhausting all possible strategies from the category library to identify the one with the maximum likelihood. However, an extensive strategy search have to be applied to trade accuracy for ef-ficiency. To add auxiliary components automatically and efficiently, we present deep reinforcement learning framework based on the language model, such as BERT. We firstly apply the graph attention mechanism to reduce the strategy searching space, called AttnStrategy, which only focus on the conclusion-related components. Meanwhile, a novel algorithm, named Automatically Adding Auxiliary Components using Reinforcement Learning framework (A3C-RL), is proposed by forcing an agent to select top strategies, which incorporates the AttnStrategy and BERT as the memory components. Results from extensive experiments show that the proposed A3C-RL algorithm can substantially enhance the average precision by 32.7% compared to the traditional MCTS. In addition, the A3C-RL algorithm outperforms humans on the geometric questions from the annual University Entrance Mathematical Examination of China. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.11099 [pdf, other]

Wait to be Faster: a Smart Pooling Framework for Dynamic Ridesharing

Authors: Xiaoyao Zhong, Jiabao Jin, Peng Cheng, Wangze Ni, Libin Zheng, Lei Chen, Xuemin Lin

Abstract: Ridesharing services, such as Uber or Didi, have attracted considerable attention in recent years due to their positive impact on environmental protection and the economy. Existing studies require quick responses to orders, which lack the flexibility to accommodate longer wait times for better grouping opportunities. In this paper, we address a NP-hard ridesharing problem, called Minimal Extra Tim… ▽ More Ridesharing services, such as Uber or Didi, have attracted considerable attention in recent years due to their positive impact on environmental protection and the economy. Existing studies require quick responses to orders, which lack the flexibility to accommodate longer wait times for better grouping opportunities. In this paper, we address a NP-hard ridesharing problem, called Minimal Extra Time RideSharing (METRS), which balances waiting time and group quality (i.e., detour time) to improve riders' satisfaction. To tackle this problem, we propose a novel approach called WATTER (WAit To be fasTER), which leverages an order pooling management algorithm allowing orders to wait until they can be matched with suitable groups. The key challenge is to customize the extra time threshold for each order by reducing the original optimization objective into a convex function of threshold, thus offering a theoretical guarantee to be optimized efficiently. We model the dispatch process using a Markov Decision Process (MDP) with a carefully designed value function to learn the threshold. Through extensive experiments on three real datasets, we demonstrate the efficiency and effectiveness of our proposed approaches. △ Less

Submitted 17 March, 2024; originally announced March 2024.

Comments: IEEE ICDE 2024

arXiv:2402.04557 [pdf]

doi 10.1021/acs.iecr.3c02520

An Artificial Intelligence (AI) workflow for catalyst design and optimization

Authors: Nung Siong Lai, Yi Shen Tew, Xialin Zhong, Jun Yin, Jiali Li, Binhang Yan, Xiaonan Wang

Abstract: In the pursuit of novel catalyst development to address pressing environmental concerns and energy demand, conventional design and optimization methods often fall short due to the complexity and vastness of the catalyst parameter space. The advent of Machine Learning (ML) has ushered in a new era in the field of catalyst optimization, offering potential solutions to the shortcomings of traditional… ▽ More In the pursuit of novel catalyst development to address pressing environmental concerns and energy demand, conventional design and optimization methods often fall short due to the complexity and vastness of the catalyst parameter space. The advent of Machine Learning (ML) has ushered in a new era in the field of catalyst optimization, offering potential solutions to the shortcomings of traditional techniques. However, existing methods fail to effectively harness the wealth of information contained within the burgeoning body of scientific literature on catalyst synthesis. To address this gap, this study proposes an innovative Artificial Intelligence (AI) workflow that integrates Large Language Models (LLMs), Bayesian optimization, and an active learning loop to expedite and enhance catalyst optimization. Our methodology combines advanced language understanding with robust optimization strategies, effectively translating knowledge extracted from diverse literature into actionable parameters for practical experimentation and optimization. In this article, we demonstrate the application of this AI workflow in the optimization of catalyst synthesis for ammonia production. The results underscore the workflow's ability to streamline the catalyst development process, offering a swift, resource-efficient, and high-precision alternative to conventional methods. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: 31 pages, 7 figures

Journal ref: Ind. Eng. Chem. Res. 2023, 62, 43, 17835-17848

arXiv:2401.16669 [pdf]

Improving Global Weather and Ocean Wave Forecast with Large Artificial Intelligence Models

Authors: Fenghua Ling, Lin Ouyang, Boufeniza Redouane Larbi, Jing-Jia Luo, Tao Han, Xiaohui Zhong, Lei Bai

Abstract: The rapid advancement of artificial intelligence technologies, particularly in recent years, has led to the emergence of several large parameter artificial intelligence weather forecast models. These models represent a significant breakthrough, overcoming the limitations of traditional numerical weather prediction models and indicating the emergence of profound potential tools for atmosphere-ocean… ▽ More The rapid advancement of artificial intelligence technologies, particularly in recent years, has led to the emergence of several large parameter artificial intelligence weather forecast models. These models represent a significant breakthrough, overcoming the limitations of traditional numerical weather prediction models and indicating the emergence of profound potential tools for atmosphere-ocean forecasts. This study explores the evolution of these advanced artificial intelligence forecast models, and based on the identified commonalities, proposes the "Three Large Rules" to measure their development. We discuss the potential of artificial intelligence in revolutionizing numerical weather prediction, and briefly outlining the underlying reasons for its great potential. While acknowledging the high accuracy, computational efficiency, and ease of deployment of large artificial intelligence forecast models, we also emphasize the irreplaceable values of traditional numerical forecasts and explore the challenges in the future development of large-scale artificial intelligence atmosphere-ocean forecast models. We believe that the optimal future of atmosphere-ocean weather forecast lies in achieving a seamless integration of artificial intelligence and traditional numerical models. Such a synthesis is anticipated to offer a more advanced and reliable approach for improved atmosphere-ocean forecasts. Additionally, we illustrate how forecasters can adapt and leverage the advanced artificial intelligence model through an example by building a large artificial intelligence model for global ocean wave forecast. △ Less

Submitted 18 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.14620 [pdf, other]

A RISC-V SOC for Terahertz IoT Devices: Implementation and design challenges

Authors: Xinchao Zhong, Sean Longyu Ma, Hong-fu Chou

Abstract: Terahertz (THz) communication is considered a viable approach to augmenting the communication capacity of prospective Internet-of-Things (IoT) resulting in enhanced spectral efficiency. This study first provides an outline of the design challenges encountered in developing THz transceivers. This paper introduces advanced approaches and a unique methodology known as Modified Pulse-width Modulation… ▽ More Terahertz (THz) communication is considered a viable approach to augmenting the communication capacity of prospective Internet-of-Things (IoT) resulting in enhanced spectral efficiency. This study first provides an outline of the design challenges encountered in developing THz transceivers. This paper introduces advanced approaches and a unique methodology known as Modified Pulse-width Modulation (MPWM) to address the issues in the THz domain. In this situation involving a transceiver that handles complex modulation schemes, the presence of a mixed signal through a high-resolution digital-to-analog converter (DAC) in the transmitter greatly contributes to the limitation in maintaining linearity at high frequencies. The utilization of Pulse-width Modulation-based Digital-to-Analog Converters (PWM-DACs) has garnered significant attention among scholars due to its efficiency and affordability. However, the converters' performance is restricted by insufficient conversion speed and precision, especially in the context of high-resolution, high-order modulation schemes for THz wireless communications. The MPWM framework offers a multitude of adjustable options, rendering the final MPWM-DAC highly adaptable for a diverse array of application scenarios. Comparative performance assessments indicate that MPWM-DACs have enhanced conversion speed compared to standard PWM-DACs, and they also provide greater accuracy in comparison to Pulse-count Modulation DACs (PCM-DACs). The study presents a comprehensive examination of the core principles, spectrum characteristics, and evaluation metrics, as well as the development and experimental validation of the MPWM method. Furthermore, we present a RISC-V System-on-Chip (SoC) that incorporates an MPWM-DAC, offering a highly favorable resolution for THz IoT communications. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: 18 pages, 17 figures, journal

arXiv:2401.12151 [pdf, other]

Uncoded Storage Coded Transmission Elastic Computing with Straggler Tolerance in Heterogeneous Systems

Authors: Xi Zhong, Joerg Kliewer, Mingyue Ji

Abstract: In 2018, Yang et al. introduced a novel and effective approach, using maximum distance separable (MDS) codes, to mitigate the impact of elasticity in cloud computing systems. This approach is referred to as coded elastic computing. Some limitations of this approach include that it assumes all virtual machines have the same computing speeds and storage capacities, and it cannot tolerate stragglers… ▽ More In 2018, Yang et al. introduced a novel and effective approach, using maximum distance separable (MDS) codes, to mitigate the impact of elasticity in cloud computing systems. This approach is referred to as coded elastic computing. Some limitations of this approach include that it assumes all virtual machines have the same computing speeds and storage capacities, and it cannot tolerate stragglers for matrix-matrix multiplications. In order to resolve these limitations, in this paper, we introduce a new combinatorial optimization framework, named uncoded storage coded transmission elastic computing (USCTEC), for heterogeneous speeds and storage constraints, aiming to minimize the expected computation time for matrix-matrix multiplications, under the consideration of straggler tolerance. Within this framework, we propose optimal solutions with straggler tolerance under relaxed storage constraints. Moreover, we propose a heuristic algorithm that considers the heterogeneous storage constraints. Our results demonstrate that the proposed algorithm outperforms baseline solutions utilizing cyclic storage placements, in terms of both expected computation time and storage size. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: 6 pages, 1 figure, accepted in ICC 2024

arXiv:2401.09101 [pdf, other]

doi 10.1109/TRO.2024.3422055

PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency

Authors: Yue Pan, Xingguang Zhong, Louis Wiesmann, Thorbjörn Posewsky, Jens Behley, Cyrill Stachniss

Abstract: Accurate and robust localization and mapping are essential components for most autonomous robots. In this paper, we propose a SLAM system for building globally consistent maps, called PIN-SLAM, that is based on an elastic and compact point-based implicit neural map representation. Taking range measurements as input, our approach alternates between incremental learning of the local implicit signed… ▽ More Accurate and robust localization and mapping are essential components for most autonomous robots. In this paper, we propose a SLAM system for building globally consistent maps, called PIN-SLAM, that is based on an elastic and compact point-based implicit neural map representation. Taking range measurements as input, our approach alternates between incremental learning of the local implicit signed distance field and the pose estimation given the current local map using a correspondence-free, point-to-implicit model registration. Our implicit map is based on sparse optimizable neural points, which are inherently elastic and deformable with the global pose adjustment when closing a loop. Loops are also detected using the neural point features. Extensive experiments validate that PIN-SLAM is robust to various environments and versatile to different range sensors such as LiDAR and RGB-D cameras. PIN-SLAM achieves pose estimation accuracy better or on par with the state-of-the-art LiDAR odometry or SLAM systems and outperforms the recent neural implicit SLAM approaches while maintaining a more consistent, and highly compact implicit map that can be reconstructed as accurate and complete meshes. Finally, thanks to the voxel hashing for efficient neural points indexing and the fast implicit map-based registration without closest point association, PIN-SLAM can run at the sensor frame rate on a moderate GPU. Codes will be available at: https://github.com/PRBonn/PIN_SLAM. △ Less

Submitted 2 July, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: 20 pages

arXiv:2312.09926 [pdf, other]

FuXi-S2S: A machine learning model that outperforms conventional global subseasonal forecast models

Authors: Lei Chen, Xiaohui Zhong, Hao Li, Jie Wu, Bo Lu, Deliang Chen, Shangping Xie, Qingchen Chao, Chensen Lin, Zixin Hu, Yuan Qi

Abstract: Skillful subseasonal forecasts are crucial for various sectors of society but pose a grand scientific challenge. Recently, machine learning based weather forecasting models outperform the most successful numerical weather predictions generated by the European Centre for Medium-Range Weather Forecasts (ECMWF), but have not yet surpassed conventional models at subseasonal timescales. This paper intr… ▽ More Skillful subseasonal forecasts are crucial for various sectors of society but pose a grand scientific challenge. Recently, machine learning based weather forecasting models outperform the most successful numerical weather predictions generated by the European Centre for Medium-Range Weather Forecasts (ECMWF), but have not yet surpassed conventional models at subseasonal timescales. This paper introduces FuXi Subseasonal-to-Seasonal (FuXi-S2S), a machine learning model that provides global daily mean forecasts up to 42 days, encompassing five upper-air atmospheric variables at 13 pressure levels and 11 surface variables. FuXi-S2S, trained on 72 years of daily statistics from ECMWF ERA5 reanalysis data, outperforms the ECMWF's state-of-the-art Subseasonal-to-Seasonal model in ensemble mean and ensemble forecasts for total precipitation and outgoing longwave radiation, notably enhancing global precipitation forecast. The improved performance of FuXi-S2S can be primarily attributed to its superior capability to capture forecast uncertainty and accurately predict the Madden-Julian Oscillation (MJO), extending the skillful MJO prediction from 30 days to 36 days. Moreover, FuXi-S2S not only captures realistic teleconnections associated with the MJO, but also emerges as a valuable tool for discovering precursor signals, offering researchers insights and potentially establishing a new paradigm in Earth system science research. △ Less

Submitted 5 July, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.01713 [pdf, other]

Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection

Authors: Xubin Zhong, Changxing Ding, Yupeng Hu, Dacheng Tao

Abstract: Human-Object Interaction (HOI) detection is a core task for human-centric image understanding. Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction; however, the interaction representations obtained using this method are entangled and lack interpretability. In contrast, traditional two-stage methods benefit significantly from th… ▽ More Human-Object Interaction (HOI) detection is a core task for human-centric image understanding. Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction; however, the interaction representations obtained using this method are entangled and lack interpretability. In contrast, traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner. In this paper, we improve the performance of one-stage methods by enabling them to extract disentangled interaction representations. First, we propose Shunted Cross-Attention (SCA) to extract human appearance, object appearance, and global context features using different cross-attention heads. This is achieved by imposing different masks on the cross-attention maps produced by the different heads. Second, we introduce the Interaction-aware Pose Estimation (IPE) task to learn interaction-relevant human pose features using a disentangled decoder. This is achieved with a novel attention module that accurately captures the human keypoints relevant to the current interaction category. Finally, our approach fuses the appearance feature and pose feature via element-wise addition to form the interaction representation. Experimental results show that our approach can be readily applied to existing one-stage HOI detectors. Moreover, we achieve state-of-the-art performance on two benchmarks: HICO-DET and V-COCO. △ Less

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.01554 [pdf, other]

Building Ears for Robots: Machine Hearing in the Age of Autonomy

Authors: Xuan Zhong

Abstract: This study explores the significance of robot hearing systems, emphasizing their importance for robots operating in diverse and uncertain environments. It introduces the hardware design principles using robotaxis as an example, where exterior microphone arrays are employed to detect sound events such as sirens. The challenges, goals, and test methods are discussed, focusing on achieving a suitable… ▽ More This study explores the significance of robot hearing systems, emphasizing their importance for robots operating in diverse and uncertain environments. It introduces the hardware design principles using robotaxis as an example, where exterior microphone arrays are employed to detect sound events such as sirens. The challenges, goals, and test methods are discussed, focusing on achieving a suitable signal-to-noise ratio (SNR). Additionally, it presents a preliminary software framework rooted in probabilistic robotics theory, advocating for the integration of robot hearing into the broader context of perception and decision-making. It discusses various models, including Bayes filters, partially observable Markov decision processes (POMDP), and multiagent systems, highlighting the multifaceted roles that robot hearing can play. In conclusion, as service robots continue to evolve, robot hearing research will expand, offering new perspectives and challenges for future development beyond simple sound event classification. △ Less

Submitted 5 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

Comments: 11 pages, 6 figures. The materials covered in this article were presented and discussed at the Hearing Seminar at Stanford University organized by Malcolm Slaney in October, 2023

arXiv:2311.16828 [pdf, other]

SARA: Controllable Makeup Transfer with Spatial Alignment and Region-Adaptive Normalization

Authors: Xiaojing Zhong, Xinyi Huang, Zhonghua Wu, Guosheng Lin, Qingyao Wu

Abstract: Makeup transfer is a process of transferring the makeup style from a reference image to the source images, while preserving the source images' identities. This technique is highly desirable and finds many applications. However, existing methods lack fine-level control of the makeup style, making it challenging to achieve high-quality results when dealing with large spatial misalignments. To addres… ▽ More Makeup transfer is a process of transferring the makeup style from a reference image to the source images, while preserving the source images' identities. This technique is highly desirable and finds many applications. However, existing methods lack fine-level control of the makeup style, making it challenging to achieve high-quality results when dealing with large spatial misalignments. To address this problem, we propose a novel Spatial Alignment and Region-Adaptive normalization method (SARA) in this paper. Our method generates detailed makeup transfer results that can handle large spatial misalignments and achieve part-specific and shade-controllable makeup transfer. Specifically, SARA comprises three modules: Firstly, a spatial alignment module that preserves the spatial context of makeup and provides a target semantic map for guiding the shape-independent style codes. Secondly, a region-adaptive normalization module that decouples shape and makeup style using per-region encoding and normalization, which facilitates the elimination of spatial misalignments. Lastly, a makeup fusion module blends identity features and makeup style by injecting learned scale and bias parameters. Experimental results show that our SARA method outperforms existing methods and achieves state-of-the-art performance on two public datasets. △ Less

Submitted 21 May, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.16818 [pdf, other]

DI-Net : Decomposed Implicit Garment Transfer Network for Digital Clothed 3D Human

Authors: Xiaojing Zhong, Yukun Su, Zhonghua Wu, Guosheng Lin, Qingyao Wu

Abstract: 3D virtual try-on enjoys many potential applications and hence has attracted wide attention. However, it remains a challenging task that has not been adequately solved. Existing 2D virtual try-on methods cannot be directly extended to 3D since they lack the ability to perceive the depth of each pixel. Besides, 3D virtual try-on approaches are mostly built on the fixed topological structure and wit… ▽ More 3D virtual try-on enjoys many potential applications and hence has attracted wide attention. However, it remains a challenging task that has not been adequately solved. Existing 2D virtual try-on methods cannot be directly extended to 3D since they lack the ability to perceive the depth of each pixel. Besides, 3D virtual try-on approaches are mostly built on the fixed topological structure and with heavy computation. To deal with these problems, we propose a Decomposed Implicit garment transfer network (DI-Net), which can effortlessly reconstruct a 3D human mesh with the newly try-on result and preserve the texture from an arbitrary perspective. Specifically, DI-Net consists of two modules: 1) A complementary warping module that warps the reference image to have the same pose as the source image through dense correspondence learning and sparse flow learning; 2) A geometry-aware decomposed transfer module that decomposes the garment transfer into image layout based transfer and texture based transfer, achieving surface and texture reconstruction by constructing pixel-aligned implicit functions. Experimental results show the effectiveness and superiority of our method in the 3D virtual try-on task, which can yield more high-quality results over other existing methods. △ Less

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.03652 [pdf, other]

Machine Learning Parameterization of the Multi-scale Kain-Fritsch (MSKF) Convection Scheme

Authors: Xiaohui Zhong, Xing Yu, Hao Li

Abstract: Warm-sector heavy rainfall often occurs along the coast of South China, and it is usually localized and long-lasting, making it challenging to predict. High-resolution numerical weather prediction (NWP) models are increasingly used to better resolve topographic features and forecast such high-impact weather events. However, when the grid spacing becomes comparable to the length scales of convectio… ▽ More Warm-sector heavy rainfall often occurs along the coast of South China, and it is usually localized and long-lasting, making it challenging to predict. High-resolution numerical weather prediction (NWP) models are increasingly used to better resolve topographic features and forecast such high-impact weather events. However, when the grid spacing becomes comparable to the length scales of convection, known as the gray zone, the turbulent eddies in the atmospheric boundary layer are only partially resolved and parameterized to some extent. Whether using a convection parameterization (CP) scheme in the gray zone remains controversial. Scale-aware CP schemes are developed to enhance the representation of convective transport within the gray zone. The multi-scale Kain-Fritsch (MSKF) scheme includes modifications that allow for its effective implementation at a grid resolution as high as 2 km. In recent years, there has been an increasing application of machine learning (ML) models to various domains of atmospheric sciences, including the replacement of physical parameterizations with ML models. This work proposes a multi-output bidirectional long short-term memory (Bi-LSTM) model as a replace the scale-aware MSKF CP scheme. The Weather Research and Forecast (WRF) model is used to generate training and testing data over South China at a horizontal resolution of 5 km. Furthermore, the WRF model is coupled with the ML based CP scheme and compared with WRF simulations with original MSKF scheme. The results demonstrate that the Bi-LSTM model can achieve high accuracy, indicating the potential use of ML models to substitute the MSKF scheme in the gray zone. △ Less

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.00979 [pdf]

Overhead Line Defect Recognition Based on Unsupervised Semantic Segmentation

Authors: Weixi Wang, Xichen Zhong, Xin Li, Sizhe Li, Xun Ma

Abstract: Overhead line inspection greatly benefits from defect recognition using visible light imagery. Addressing the limitations of existing feature extraction techniques and the heavy data dependency of deep learning approaches, this paper introduces a novel defect recognition framework. This is built on the Faster RCNN network and complemented by unsupervised semantic segmentation. The approach involve… ▽ More Overhead line inspection greatly benefits from defect recognition using visible light imagery. Addressing the limitations of existing feature extraction techniques and the heavy data dependency of deep learning approaches, this paper introduces a novel defect recognition framework. This is built on the Faster RCNN network and complemented by unsupervised semantic segmentation. The approach involves identifying the type and location of the target equipment, utilizing semantic segmentation to differentiate between the device and its backdrop, and finally employing similarity measures and logical rules to categorize the type of defect. Experimental results indicate that this methodology focuses more on the equipment rather than the defects when identifying issues in overhead lines. This leads to a notable enhancement in accuracy and exhibits impressive adaptability. Thus, offering a fresh perspective for automating the inspection of distribution network equipment. △ Less

Submitted 6 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.19822 [pdf, other]

FuXi-Extreme: Improving extreme rainfall and wind forecasts with diffusion model

Authors: Xiaohui Zhong, Lei Chen, Jun Liu, Chensen Lin, Yuan Qi, Hao Li

Abstract: Significant advancements in the development of machine learning (ML) models for weather forecasting have produced remarkable results. State-of-the-art ML-based weather forecast models, such as FuXi, have demonstrated superior statistical forecast performance in comparison to the high-resolution forecasts (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF). However, ML models f… ▽ More Significant advancements in the development of machine learning (ML) models for weather forecasting have produced remarkable results. State-of-the-art ML-based weather forecast models, such as FuXi, have demonstrated superior statistical forecast performance in comparison to the high-resolution forecasts (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF). However, ML models face a common challenge: as forecast lead times increase, they tend to generate increasingly smooth predictions, leading to an underestimation of the intensity of extreme weather events. To address this challenge, we developed the FuXi-Extreme model, which employs a denoising diffusion probabilistic model (DDPM) to restore finer-scale details in the surface forecast data generated by the FuXi model in 5-day forecasts. An evaluation of extreme total precipitation ($\textrm{TP}$), 10-meter wind speed ($\textrm{WS10}$), and 2-meter temperature ($\textrm{T2M}$) illustrates the superior performance of FuXi-Extreme over both FuXi and HRES. Moreover, when evaluating tropical cyclone (TC) forecasts based on International Best Track Archive for Climate Stewardship (IBTrACS) dataset, both FuXi and FuXi-Extreme shows superior performance in TC track forecasts compared to HRES, but they show inferior performance in TC intensity forecasts in comparison to HRES. △ Less

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.19378 [pdf, other]

Few-shot Hybrid Domain Adaptation of Image Generators

Authors: Hengjia Li, Yang Liu, Linxuan Xia, Yuqi Lin, Tu Zheng, Zheng Yang, Wenxiao Wang, Xiaohui Zhong, Xiaobo Ren, Xiaofei He

Abstract: Can a pre-trained generator be adapted to the hybrid of multiple target domains and generate images with integrated attributes of them? In this work, we introduce a new task -- Few-shot Hybrid Domain Adaptation (HDA). Given a source generator and several target domains, HDA aims to acquire an adapted generator that preserves the integrated attributes of all target domains, without overriding the s… ▽ More Can a pre-trained generator be adapted to the hybrid of multiple target domains and generate images with integrated attributes of them? In this work, we introduce a new task -- Few-shot Hybrid Domain Adaptation (HDA). Given a source generator and several target domains, HDA aims to acquire an adapted generator that preserves the integrated attributes of all target domains, without overriding the source domain's characteristics. Compared with Domain Adaptation (DA), HDA offers greater flexibility and versatility to adapt generators to more composite and expansive domains. Simultaneously, HDA also presents more challenges than DA as we have access only to images from individual target domains and lack authentic images from the hybrid domain. To address this issue, we introduce a discriminator-free framework that directly encodes different domains' images into well-separable subspaces. To achieve HDA, we propose a novel directional subspace loss comprised of a distance loss and a direction loss. Concretely, the distance loss blends the attributes of all target domains by reducing the distances from generated images to all target subspaces. The direction loss preserves the characteristics from the source domain by guiding the adaptation along the perpendicular to subspaces. Experiments show that our method can obtain numerous domain-specific attributes in a single adapted generator, which surpasses the baseline methods in semantic similarity, image fidelity, and cross-domain consistency. △ Less

Submitted 6 December, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.09492 [pdf, other]

Perception Reinforcement Using Auxiliary Learning Feature Fusion: A Modified Yolov8 for Head Detection

Authors: Jiezhou Chen, Guankun Wang, Weixiang Liu, Xiaopin Zhong, Yibin Tian, ZongZe Wu

Abstract: Head detection provides distribution information of pedestrian, which is crucial for scene statistical analysis, traffic management, and risk assessment and early warning. However, scene complexity and large-scale variation in the real world make accurate detection more difficult. Therefore, we present a modified Yolov8 which improves head detection performance through reinforcing target perceptio… ▽ More Head detection provides distribution information of pedestrian, which is crucial for scene statistical analysis, traffic management, and risk assessment and early warning. However, scene complexity and large-scale variation in the real world make accurate detection more difficult. Therefore, we present a modified Yolov8 which improves head detection performance through reinforcing target perception. An Auxiliary Learning Feature Fusion (ALFF) module comprised of LSTM and convolutional blocks is used as the auxiliary task to help the model perceive targets. In addition, we introduce Noise Calibration into Distribution Focal Loss to facilitate model fitting and improve the accuracy of detection. Considering the requirements of high accuracy and speed for the head detection task, our method is adapted with two kinds of backbone, namely Yolov8n and Yolov8m. The results demonstrate the superior performance of our approach in improving detection accuracy and robustness. △ Less

Submitted 14 October, 2023; originally announced October 2023.

arXiv:2310.06448 [pdf, other]

Asynchronous Federated Learning with Incentive Mechanism Based on Contract Theory

Authors: Danni Yang, Yun Ji, Zhoubin Kou, Xiaoxiong Zhong, Sheng Zhang

Abstract: To address the challenges posed by the heterogeneity inherent in federated learning (FL) and to attract high-quality clients, various incentive mechanisms have been employed. However, existing incentive mechanisms are typically utilized in conventional synchronous aggregation, resulting in significant straggler issues. In this study, we propose a novel asynchronous FL framework that integrates an… ▽ More To address the challenges posed by the heterogeneity inherent in federated learning (FL) and to attract high-quality clients, various incentive mechanisms have been employed. However, existing incentive mechanisms are typically utilized in conventional synchronous aggregation, resulting in significant straggler issues. In this study, we propose a novel asynchronous FL framework that integrates an incentive mechanism based on contract theory. Within the incentive mechanism, we strive to maximize the utility of the task publisher by adaptively adjusting clients' local model training epochs, taking into account factors such as time delay and test accuracy. In the asynchronous scheme, considering client quality, we devise aggregation weights and an access control algorithm to facilitate asynchronous aggregation. Through experiments conducted on the MNIST dataset, the simulation results demonstrate that the test accuracy achieved by our framework is 3.12% and 5.84% higher than that achieved by FedAvg and FedProx without any attacks, respectively. The framework exhibits a 1.35% accuracy improvement over the ideal Local SGD under attacks. Furthermore, aiming for the same target accuracy, our framework demands notably less computation time than both FedAvg and FedProx. △ Less

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.05395 [pdf, other]

Robust Image Watermarking based on Cross-Attention and Invariant Domain Learning

Authors: Agnibh Dasgupta, Xin Zhong

Abstract: Image watermarking involves embedding and extracting watermarks within a cover image, with deep learning approaches emerging to bolster generalization and robustness. Predominantly, current methods employ convolution and concatenation for watermark embedding, while also integrating conceivable augmentation in the training process. This paper explores a robust image watermarking methodology by harn… ▽ More Image watermarking involves embedding and extracting watermarks within a cover image, with deep learning approaches emerging to bolster generalization and robustness. Predominantly, current methods employ convolution and concatenation for watermark embedding, while also integrating conceivable augmentation in the training process. This paper explores a robust image watermarking methodology by harnessing cross-attention and invariant domain learning, marking two novel, significant advancements. First, we design a watermark embedding technique utilizing a multi-head cross attention mechanism, enabling information exchange between the cover image and watermark to identify semantically suitable embedding locations. Second, we advocate for learning an invariant domain representation that encapsulates both semantic and noise-invariant information concerning the watermark, shedding light on promising avenues for enhancing image watermarking techniques. △ Less

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2310.01208 [pdf, other]

Label Supervised LLaMA Finetuning

Authors: Zongxi Li, Xianming Li, Yuzhang Liu, Haoran Xie, Jing Li, Fu-lee Wang, Qing Li, Xiaoqin Zhong

Abstract: The recent success of Large Language Models (LLMs) has gained significant attention in both academia and industry. Substantial efforts have been made to enhance the zero- and few-shot generalization capabilities of open-source LLMs through finetuning. Currently, the prevailing approach is instruction-tuning, which trains LLMs to complete real-world tasks by generating responses guided by natural l… ▽ More The recent success of Large Language Models (LLMs) has gained significant attention in both academia and industry. Substantial efforts have been made to enhance the zero- and few-shot generalization capabilities of open-source LLMs through finetuning. Currently, the prevailing approach is instruction-tuning, which trains LLMs to complete real-world tasks by generating responses guided by natural language instructions. It is worth noticing that such an approach may underperform in sequence and token classification tasks. Unlike text generation tasks, classification tasks have a limited label space, where precise label prediction is more appreciated than generating diverse and human-like responses. Prior research has unveiled that instruction-tuned LLMs cannot outperform BERT, prompting us to explore the potential of leveraging latent representations from LLMs for supervised label prediction. In this paper, we introduce a label-supervised adaptation for LLMs, which aims to finetuning the model with discriminant labels. We evaluate this approach with Label Supervised LLaMA (LS-LLaMA), based on LLaMA-2-7B, a relatively small-scale LLM, and can be finetuned on a single GeForce RTX4090 GPU. We extract latent representations from the final LLaMA layer and project them into the label space to compute the cross-entropy loss. The model is finetuned by Low-Rank Adaptation (LoRA) to minimize this loss. Remarkably, without intricate prompt engineering or external knowledge, LS-LLaMA substantially outperforms LLMs ten times its size in scale and demonstrates consistent improvements compared to robust baselines like BERT-Large and RoBERTa-Large in text classification. Moreover, by removing the causal mask from decoders, LS-unLLaMA achieves the state-of-the-art performance in named entity recognition (NER). Our work will shed light on a novel approach to adapting LLMs for various downstream tasks. △ Less

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2310.01024 [pdf, other]

Joint Source-Channel Coding System for 6G Communication: Design, Prototype and Future Directions

Authors: Xinchao Zhong, Sean Longyu Ma, Hong-fu Chou, Arsham Mostaani, Thang X. Vu, Symeon Chatzinotas

Abstract: The goal of semantic communication is to surpass optimal Shannon's criterion regarding a notable problem for future communication which lies in the integration of collaborative efforts between the intelligence of the transmission source and the joint design of source coding and channel coding. The convergence of scholarly investigation and applicable products in the field of semantic communication… ▽ More The goal of semantic communication is to surpass optimal Shannon's criterion regarding a notable problem for future communication which lies in the integration of collaborative efforts between the intelligence of the transmission source and the joint design of source coding and channel coding. The convergence of scholarly investigation and applicable products in the field of semantic communication is facilitated by the utilization of flexible structural hardware design, which is constrained by the computational capabilities of edge devices. This characteristic represents a significant benefit of joint source-channel coding (JSCC), as it enables the generation of source alphabets with diverse lengths and achieves a code rate of unity. Moreover, JSCC exhibits near-capacity performance while maintaining low complexity. Therefore, we leverage not only quasi-cyclic (QC) characteristics to propose a QC-LDPC code-based JSCC scheme but also Unequal Error Protection (UEP) to ensure the recovery of semantic importance. In this study, the feasibility for using a semantic encoder/decoder that is aware of UEP can be explored based on the existing JSCC system. This approach is aimed at protecting the significance of semantic task-oriented information. Additionally, the deployment of a JSCC system can be facilitated by employing Low-Density Parity-Check (LDPC) codes on a reconfigurable device. This is achieved by reconstructing the LDPC codes as QC-LDPC codes. The QC-LDPC layered decoding technique, which has been specifically optimized for hardware parallelism and tailored for channel decoding applications, can be suitably adapted to accommodate the JSCC system. The performance of the proposed system is evaluated by conducting BER measurements using both floating-point and 6-bit quantization. △ Less

Submitted 2 October, 2023; originally announced October 2023.

Comments: 14 pages, 9 figures, Journal

arXiv:2309.08159 [pdf, other]

doi 10.1145/3580305.3599770

AdSEE: Investigating the Impact of Image Style Editing on Advertisement Attractiveness

Authors: Liyao Jiang, Chenglin Li, Haolan Chen, Xiaodong Gao, Xinwang Zhong, Yang Qiu, Shani Ye, Di Niu

Abstract: Online advertisements are important elements in e-commerce sites, social media platforms, and search engines. With the increasing popularity of mobile browsing, many online ads are displayed with visual information in the form of a cover image in addition to text descriptions to grab the attention of users. Various recent studies have focused on predicting the click rates of online advertisements… ▽ More Online advertisements are important elements in e-commerce sites, social media platforms, and search engines. With the increasing popularity of mobile browsing, many online ads are displayed with visual information in the form of a cover image in addition to text descriptions to grab the attention of users. Various recent studies have focused on predicting the click rates of online advertisements aware of visual features or composing optimal advertisement elements to enhance visibility. In this paper, we propose Advertisement Style Editing and Attractiveness Enhancement (AdSEE), which explores whether semantic editing to ads images can affect or alter the popularity of online advertisements. We introduce StyleGAN-based facial semantic editing and inversion to ads images and train a click rate predictor attributing GAN-based face latent representations in addition to traditional visual and textual features to click rates. Through a large collected dataset named QQ-AD, containing 20,527 online ads, we perform extensive offline tests to study how different semantic directions and their edit coefficients may impact click rates. We further design a Genetic Advertisement Editor to efficiently search for the optimal edit directions and intensity given an input ad cover image to enhance its projected click rates. Online A/B tests performed over a period of 5 days have verified the increased click-through rates of AdSEE-edited samples as compared to a control group of original ads, verifying the relation between image styles and ad popularity. We open source the code for AdSEE research at https://github.com/LiyaoJiang1998/adsee. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: Accepted to KDD 2023 Applied Data Science Track

arXiv:2309.04195 [pdf, other]

Towards Mitigating Architecture Overfitting in Dataset Distillation

Authors: Xuyang Zhong, Chen Liu

Abstract: Dataset distillation methods have demonstrated remarkable performance for neural networks trained with very limited training data. However, a significant challenge arises in the form of architecture overfitting: the distilled training data synthesized by a specific network architecture (i.e., training network) generates poor performance when trained by other network architectures (i.e., test netwo… ▽ More Dataset distillation methods have demonstrated remarkable performance for neural networks trained with very limited training data. However, a significant challenge arises in the form of architecture overfitting: the distilled training data synthesized by a specific network architecture (i.e., training network) generates poor performance when trained by other network architectures (i.e., test networks). This paper addresses this issue and proposes a series of approaches in both architecture designs and training schemes which can be adopted together to boost the generalization performance across different network architectures on the distilled training data. We conduct extensive experiments to demonstrate the effectiveness and generality of our methods. Particularly, across various scenarios involving different sizes of distilled data, our approaches achieve comparable or superior performance to existing methods when training on the distilled data using networks with larger capacities. △ Less

Submitted 8 September, 2023; originally announced September 2023.

arXiv:2309.03360 [pdf, other]

ViewMix: Augmentation for Robust Representation in Self-Supervised Learning

Authors: Arjon Das, Xin Zhong

Abstract: Joint Embedding Architecture-based self-supervised learning methods have attributed the composition of data augmentations as a crucial factor for their strong representation learning capabilities. While regional dropout strategies have proven to guide models to focus on lesser indicative parts of the objects in supervised methods, it hasn't been adopted by self-supervised methods for generating po… ▽ More Joint Embedding Architecture-based self-supervised learning methods have attributed the composition of data augmentations as a crucial factor for their strong representation learning capabilities. While regional dropout strategies have proven to guide models to focus on lesser indicative parts of the objects in supervised methods, it hasn't been adopted by self-supervised methods for generating positive pairs. This is because the regional dropout methods are not suitable for the input sampling process of the self-supervised methodology. Whereas dropping informative pixels from the positive pairs can result in inefficient training, replacing patches of a specific object with a different one can steer the model from maximizing the agreement between different positive pairs. Moreover, joint embedding representation learning methods have not made robustness their primary training outcome. To this end, we propose the ViewMix augmentation policy, specially designed for self-supervised learning, upon generating different views of the same image, patches are cut and pasted from one view to another. By leveraging the different views created by this augmentation strategy, multiple joint embedding-based self-supervised methodologies obtained better localization capability and consistently outperformed their corresponding baseline methods. It is also demonstrated that incorporating ViewMix augmentation policy promotes robustness of the representations in the state-of-the-art methods. Furthermore, our experimentation and analysis of compute times suggest that ViewMix augmentation doesn't introduce any additional overhead compared to other counterparts. △ Less

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.16870 [pdf, other]

Learning Driver Models for Automated Vehicles via Knowledge Sharing and Personalization

Authors: Wissam Kontar, Xinzhi Zhong, Soyoung Ahn

Abstract: This paper describes a framework for learning Automated Vehicles (AVs) driver models via knowledge sharing between vehicles and personalization. The innate variability in the transportation system makes it exceptionally challenging to expose AVs to all possible driving scenarios during empirical experimentation or testing. Consequently, AVs could be blind to certain encounters that are deemed detr… ▽ More This paper describes a framework for learning Automated Vehicles (AVs) driver models via knowledge sharing between vehicles and personalization. The innate variability in the transportation system makes it exceptionally challenging to expose AVs to all possible driving scenarios during empirical experimentation or testing. Consequently, AVs could be blind to certain encounters that are deemed detrimental to their safe and efficient operation. It is then critical to share knowledge across AVs that increase exposure to driving scenarios occurring in the real world. This paper explores a method to collaboratively train a driver model by sharing knowledge and borrowing strength across vehicles while retaining a personalized model tailored to the vehicle's unique conditions and properties. Our model brings a federated learning approach to collaborate between multiple vehicles while circumventing the need to share raw data between them. We showcase our method's performance in experimental simulations. Such an approach to learning finds several applications across transportation engineering including intelligent transportation systems, traffic management, and vehicle-to-vehicle communication. Code and sample dataset are made available at the project page https://github.com/wissamkontar. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: 10 pages, 8 figures

arXiv:2308.16552 [pdf, other]

Prompt-enhanced Hierarchical Transformer Elevating Cardiopulmonary Resuscitation Instruction via Temporal Action Segmentation

Authors: Yang Liu, Xiaoyun Zhong, Shiyao Zhai, Zhicheng Du, Zhenyuan Gao, Qiming Huang, Canyang Zhang, Bin Jiang, Vijay Kumar Pandey, Sanyang Han, Runming Wang, Yuxing Han, Peiwu Qin

Abstract: The vast majority of people who suffer unexpected cardiac arrest are performed cardiopulmonary resuscitation (CPR) by passersby in a desperate attempt to restore life, but endeavors turn out to be fruitless on account of disqualification. Fortunately, many pieces of research manifest that disciplined training will help to elevate the success rate of resuscitation, which constantly desires a seamle… ▽ More The vast majority of people who suffer unexpected cardiac arrest are performed cardiopulmonary resuscitation (CPR) by passersby in a desperate attempt to restore life, but endeavors turn out to be fruitless on account of disqualification. Fortunately, many pieces of research manifest that disciplined training will help to elevate the success rate of resuscitation, which constantly desires a seamless combination of novel techniques to yield further advancement. To this end, we collect a custom CPR video dataset in which trainees make efforts to behave resuscitation on mannequins independently in adherence to approved guidelines, thereby devising an auxiliary toolbox to assist supervision and rectification of intermediate potential issues via modern deep learning methodologies. Our research empirically views this problem as a temporal action segmentation (TAS) task in computer vision, which aims to segment an untrimmed video at a frame-wise level. Here, we propose a Prompt-enhanced hierarchical Transformer (PhiTrans) that integrates three indispensable modules, including a textual prompt-based Video Features Extractor (VFE), a transformer-based Action Segmentation Executor (ASE), and a regression-based Prediction Refinement Calibrator (PRC). The backbone of the model preferentially derives from applications in three approved public datasets (GTEA, 50Salads, and Breakfast) collected for TAS tasks, which accounts for the excavation of the segmentation pipeline on the CPR dataset. In general, we unprecedentedly probe into a feasible pipeline that genuinely elevates the CPR instruction qualification via action segmentation in conjunction with cutting-edge deep learning techniques. Associated experiments advocate our implementation with multiple metrics surpassing 91.0%. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: Transformer for Cardiopulmonary Resuscitation

arXiv:2308.15344 [pdf, other]

Imperceptible Adversarial Attack on Deep Neural Networks from Image Boundary

Authors: Fahad Alrasheedi, Xin Zhong

Abstract: Although Deep Neural Networks (DNNs), such as the convolutional neural networks (CNN) and Vision Transformers (ViTs), have been successfully applied in the field of computer vision, they are demonstrated to be vulnerable to well-sought Adversarial Examples (AEs) that can easily fool the DNNs. The research in AEs has been active, and many adversarial attacks and explanations have been proposed sinc… ▽ More Although Deep Neural Networks (DNNs), such as the convolutional neural networks (CNN) and Vision Transformers (ViTs), have been successfully applied in the field of computer vision, they are demonstrated to be vulnerable to well-sought Adversarial Examples (AEs) that can easily fool the DNNs. The research in AEs has been active, and many adversarial attacks and explanations have been proposed since they were discovered in 2014. The mystery of the AE's existence is still an open question, and many studies suggest that DNN training algorithms have blind spots. The salient objects usually do not overlap with boundaries; hence, the boundaries are not the DNN model's attention. Nevertheless, recent studies show that the boundaries can dominate the behavior of the DNN models. Hence, this study aims to look at the AEs from a different perspective and proposes an imperceptible adversarial attack that systemically attacks the input image boundary for finding the AEs. The experimental results have shown that the proposed boundary attacking method effectively attacks six CNN models and the ViT using only 32% of the input image content (from the boundaries) with an average success rate (SR) of 95.2% and an average peak signal-to-noise ratio of 41.37 dB. Correlation analyses are conducted, including the relation between the adversarial boundary's width and the SR and how the adversarial boundary changes the DNN model's attention. This paper's discoveries can potentially advance the understanding of AEs and provide a different perspective on how AEs can be constructed. △ Less

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.09259 [pdf, other]

FRGNN: Mitigating the Impact of Distribution Shift on Graph Neural Networks via Test-Time Feature Reconstruction

Authors: Rui Ding, Jielong Yang, Feng Ji, Xionghu Zhong, Linbo Xie

Abstract: Due to inappropriate sample selection and limited training data, a distribution shift often exists between the training and test sets. This shift can adversely affect the test performance of Graph Neural Networks (GNNs). Existing approaches mitigate this issue by either enhancing the robustness of GNNs to distribution shift or reducing the shift itself. However, both approaches necessitate retrain… ▽ More Due to inappropriate sample selection and limited training data, a distribution shift often exists between the training and test sets. This shift can adversely affect the test performance of Graph Neural Networks (GNNs). Existing approaches mitigate this issue by either enhancing the robustness of GNNs to distribution shift or reducing the shift itself. However, both approaches necessitate retraining the model, which becomes unfeasible when the model structure and parameters are inaccessible. To address this challenge, we propose FR-GNN, a general framework for GNNs to conduct feature reconstruction. FRGNN constructs a mapping relationship between the output and input of a well-trained GNN to obtain class representative embeddings and then uses these embeddings to reconstruct the features of labeled nodes. These reconstructed features are then incorporated into the message passing mechanism of GNNs to influence the predictions of unlabeled nodes at test time. Notably, the reconstructed node features can be directly utilized for testing the well-trained model, effectively reducing the distribution shift and leading to improved test performance. This remarkable achievement is attained without any modifications to the model structure or parameters. We provide theoretical guarantees for the effectiveness of our framework. Furthermore, we conduct comprehensive experiments on various public datasets. The experimental results demonstrate the superior performance of FRGNN in comparison to multiple categories of baseline methods. △ Less

Submitted 13 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.05311 [pdf, other]

doi 10.1145/3581783.3611793

DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting

Authors: Huilin Zhu, Jingling Yuan, Xian Zhong, Zhengwei Yang, Zheng Wang, Shengfeng He

Abstract: Domain adaptation is commonly employed in crowd counting to bridge the domain gaps between different datasets. However, existing domain adaptation methods tend to focus on inter-dataset differences while overlooking the intra-differences within the same dataset, leading to additional learning ambiguities. These domain-agnostic factors, e.g., density, surveillance perspective, and scale, can cause… ▽ More Domain adaptation is commonly employed in crowd counting to bridge the domain gaps between different datasets. However, existing domain adaptation methods tend to focus on inter-dataset differences while overlooking the intra-differences within the same dataset, leading to additional learning ambiguities. These domain-agnostic factors, e.g., density, surveillance perspective, and scale, can cause significant in-domain variations, and the misalignment of these factors across domains can lead to a drop in performance in cross-domain crowd counting. To address this issue, we propose a Domain-agnostically Aligned Optimal Transport (DAOT) strategy that aligns domain-agnostic factors between domains. The DAOT consists of three steps. First, individual-level differences in domain-agnostic factors are measured using structural similarity (SSIM). Second, the optimal transfer (OT) strategy is employed to smooth out these differences and find the optimal domain-to-domain misalignment, with outlier individuals removed via a virtual "dustbin" column. Third, knowledge is transferred based on the aligned domain-agnostic factors, and the model is retrained for domain adaptation to bridge the gap across domains. We conduct extensive experiments on five standard crowd-counting benchmarks and demonstrate that the proposed method has strong generalizability across diverse datasets. Our code will be available at: https://github.com/HopooLinZ/DAOT/. △ Less

Submitted 9 August, 2023; originally announced August 2023.

Comments: 11 pages, 12 figures, 5 tables

ACM Class: I.4.m

arXiv:2308.04603 [pdf, other]

A Brief Yet In-Depth Survey of Deep Learning-Based Image Watermarking

Authors: Xin Zhong, Arjon Das, Fahad Alrasheedi, Abdullah Tanvir

Abstract: This paper presents a comprehensive survey on deep learning-based image watermarking, a technique that entails the invisible embedding and extraction of watermarks within a cover image, aiming to offer a seamless blend of robustness and adaptability. We navigate the complex landscape of this interdisciplinary domain, linking historical foundations, current innovations, and prospective developments… ▽ More This paper presents a comprehensive survey on deep learning-based image watermarking, a technique that entails the invisible embedding and extraction of watermarks within a cover image, aiming to offer a seamless blend of robustness and adaptability. We navigate the complex landscape of this interdisciplinary domain, linking historical foundations, current innovations, and prospective developments. Unlike existing literature, our study concentrates exclusively on image watermarking with deep learning, delivering an in-depth, yet brief analysis enriched by three fundamental contributions. First, we introduce a refined categorization, segmenting the field into Embedder-Extractor, Deep Networks as a Feature Transformation, and Hybrid Methods. This taxonomy, inspired by the varied roles of deep learning across studies, is designed to infuse clarity, offering readers technical insights and directional guidance. Second, our exploration dives into representative methodologies, encapsulating the diverse research directions and inherent challenges within each category to provide a consolidated perspective. Lastly, we venture beyond established boundaries to outline emerging frontiers, offering a detailed insight into prospective research avenues. △ Less

Submitted 29 October, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: This paper was accepted for publication by the MDPI Applied Sciences journal

arXiv:2306.16991 [pdf, other]

Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion

Authors: Weide Liu, Xiaoyang Zhong, Jingwen Hou, Shaohua Li, Haozhe Huang, Yuming Fang

Abstract: Multimodal Named Entity Recognition (MNER) is a crucial task for information extraction from social media platforms such as Twitter. Most current methods rely on attention weights to extract information from both text and images but are often unreliable and lack interpretability. To address this problem, we propose incorporating uncertainty estimation into the MNER task, producing trustworthy pred… ▽ More Multimodal Named Entity Recognition (MNER) is a crucial task for information extraction from social media platforms such as Twitter. Most current methods rely on attention weights to extract information from both text and images but are often unreliable and lack interpretability. To address this problem, we propose incorporating uncertainty estimation into the MNER task, producing trustworthy predictions. Our proposed algorithm models the distribution of each modality as a Normal-inverse Gamma distribution, and fuses them into a unified distribution with an evidential fusion mechanism, enabling hierarchical characterization of uncertainties and promotion of prediction accuracy and trustworthiness. Additionally, we explore the potential of pre-trained large foundation models in MNER and propose an efficient fusion approach that leverages their robust feature representations. Experiments on two datasets demonstrate that our proposed method outperforms the baselines and achieves new state-of-the-art performance. △ Less

Submitted 29 June, 2023; originally announced June 2023.

arXiv:2306.12873 [pdf, other]

FuXi: A cascade machine learning forecasting system for 15-day global weather forecast

Authors: Lei Chen, Xiaohui Zhong, Feng Zhang, Yuan Cheng, Yinghui Xu, Yuan Qi, Hao Li

Abstract: Over the past few years, due to the rapid development of machine learning (ML) models for weather forecasting, state-of-the-art ML models have shown superior performance compared to the European Centre for Medium-Range Weather Forecasts (ECMWF)'s high-resolution forecast (HRES) in 10-day forecasts at a spatial resolution of 0.25 degree. However, the challenge remains to perform comparably to the E… ▽ More Over the past few years, due to the rapid development of machine learning (ML) models for weather forecasting, state-of-the-art ML models have shown superior performance compared to the European Centre for Medium-Range Weather Forecasts (ECMWF)'s high-resolution forecast (HRES) in 10-day forecasts at a spatial resolution of 0.25 degree. However, the challenge remains to perform comparably to the ECMWF ensemble mean (EM) in 15-day forecasts. Previous studies have demonstrated the importance of mitigating the accumulation of forecast errors for effective long-term forecasts. Despite numerous efforts to reduce accumulation errors, including autoregressive multi-time step loss, using a single model is found to be insufficient to achieve optimal performance in both short and long lead times. Therefore, we present FuXi, a cascaded ML weather forecasting system that provides 15-day global forecasts with a temporal resolution of 6 hours and a spatial resolution of 0.25 degree. FuXi is developed using 39 years of the ECMWF ERA5 reanalysis dataset. The performance evaluation, based on latitude-weighted root mean square error (RMSE) and anomaly correlation coefficient (ACC), demonstrates that FuXi has comparable forecast performance to ECMWF EM in 15-day forecasts, making FuXi the first ML-based weather forecasting system to accomplish this achievement. △ Less

Submitted 20 October, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

arXiv:2306.10056 [pdf, other]

Generate to Understand for Representation

Authors: Changshang Xue, Xiande Zhong, Xiaoqing Liu

Abstract: In recent years, a significant number of high-quality pretrained models have emerged, greatly impacting Natural Language Understanding (NLU), Natural Language Generation (NLG), and Text Representation tasks. Traditionally, these models are pretrained on custom domain corpora and finetuned for specific tasks, resulting in high costs related to GPU usage and labor. Unfortunately, recent trends in la… ▽ More In recent years, a significant number of high-quality pretrained models have emerged, greatly impacting Natural Language Understanding (NLU), Natural Language Generation (NLG), and Text Representation tasks. Traditionally, these models are pretrained on custom domain corpora and finetuned for specific tasks, resulting in high costs related to GPU usage and labor. Unfortunately, recent trends in language modeling have shifted towards enhancing performance through scaling, further exacerbating the associated costs. Introducing GUR: a pretraining framework that combines language modeling and contrastive learning objectives in a single training step. We select similar text pairs based on their Longest Common Substring (LCS) from raw unlabeled documents and train the model using masked language modeling and unsupervised contrastive learning. The resulting model, GUR, achieves impressive results without any labeled training data, outperforming all other pretrained baselines as a retriever at the recall benchmark in a zero-shot setting. Additionally, GUR maintains its language modeling ability, as demonstrated in our ablation experiment. Our code is available at \url{https://github.com/laohur/GUR}. △ Less

Submitted 14 June, 2023; originally announced June 2023.

MSC Class: 68T50 (Primary) 03B65; 91F20(Secondary) ACM Class: I.7

arXiv:2305.14150 [pdf, other]

WYWEB: A NLP Evaluation Benchmark For Classical Chinese

Authors: Bo Zhou, Qianglong Chen, Tianyu Wang, Xiaomi Zhong, Yin Zhang

Abstract: To fully evaluate the overall performance of different NLP models in a given domain, many evaluation benchmarks are proposed, such as GLUE, SuperGLUE and CLUE. The fi eld of natural language understanding has traditionally focused on benchmarks for various tasks in languages such as Chinese, English, and multilingua, however, there has been a lack of attention given to the area of classical Chines… ▽ More To fully evaluate the overall performance of different NLP models in a given domain, many evaluation benchmarks are proposed, such as GLUE, SuperGLUE and CLUE. The fi eld of natural language understanding has traditionally focused on benchmarks for various tasks in languages such as Chinese, English, and multilingua, however, there has been a lack of attention given to the area of classical Chinese, also known as "wen yan wen", which has a rich history spanning thousands of years and holds signifi cant cultural and academic value. For the prosperity of the NLP community, in this paper, we introduce the WYWEB evaluation benchmark, which consists of nine NLP tasks in classical Chinese, implementing sentence classifi cation, sequence labeling, reading comprehension, and machine translation. We evaluate the existing pre-trained language models, which are all struggling with this benchmark. We also introduce a number of supplementary datasets and additional tools to help facilitate further progress on classical Chinese NLU. The github repository is https://github.com/baudzhou/WYWEB. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted by ACL 2023

Report number: 2023.findings-acl.204

Journal ref: https://aclanthology.org/2023.findings-acl.204

arXiv:2305.12691 [pdf, other]

Hi-ResNet: A High-Resolution Remote Sensing Network for Semantic Segmentation

Authors: Yuxia Chen, Pengcheng Fang, Jianhui Yu, Xiaoling Zhong, Xiaoming Zhang, Tianrui Li

Abstract: High-resolution remote sensing (HRS) semantic segmentation extracts key objects from high-resolution coverage areas. However, objects of the same category within HRS images generally show significant differences in scale and shape across diverse geographical environments, making it difficult to fit the data distribution. Additionally, a complex background environment causes similar appearances of… ▽ More High-resolution remote sensing (HRS) semantic segmentation extracts key objects from high-resolution coverage areas. However, objects of the same category within HRS images generally show significant differences in scale and shape across diverse geographical environments, making it difficult to fit the data distribution. Additionally, a complex background environment causes similar appearances of objects of different categories, which precipitates a substantial number of objects into misclassification as background. These issues make existing learning algorithms sub-optimal. In this work, we solve the above-mentioned problems by proposing a High-resolution remote sensing network (Hi-ResNet) with efficient network structure designs, which consists of a funnel module, a multi-branch module with stacks of information aggregation (IA) blocks, and a feature refinement module, sequentially, and Class-agnostic Edge Aware (CEA) loss. Specifically, we propose a funnel module to downsample, which reduces the computational cost, and extract high-resolution semantic information from the initial input image. Secondly, we downsample the processed feature images into multi-resolution branches incrementally to capture image features at different scales and apply IA blocks, which capture key latent information by leveraging attention mechanisms, for effective feature aggregation, distinguishing image features of the same class with variant scales and shapes. Finally, our feature refinement module integrate the CEA loss function, which disambiguates inter-class objects with similar shapes and increases the data distribution distance for correct predictions. With effective pre-training strategies, we demonstrated the superiority of Hi-ResNet over state-of-the-art methods on three HRS segmentation benchmarks. △ Less

Submitted 23 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

arXiv:2305.09145 [pdf, other]

Deep ReLU Networks Have Surprisingly Simple Polytopes

Authors: Feng-Lei Fan, Wei Huang, Xiangru Zhong, Lecheng Ruan, Tieyong Zeng, Huan Xiong, Fei Wang

Abstract: A ReLU network is a piecewise linear function over polytopes. Figuring out the properties of such polytopes is of fundamental importance for the research and development of neural networks. So far, either theoretical or empirical studies on polytopes only stay at the level of counting their number, which is far from a complete characterization of polytopes. To upgrade the characterization to a new… ▽ More A ReLU network is a piecewise linear function over polytopes. Figuring out the properties of such polytopes is of fundamental importance for the research and development of neural networks. So far, either theoretical or empirical studies on polytopes only stay at the level of counting their number, which is far from a complete characterization of polytopes. To upgrade the characterization to a new level, here we propose to study the shapes of polytopes via the number of simplices obtained by triangulating the polytope. Then, by computing and analyzing the histogram of simplices across polytopes, we find that a ReLU network has relatively simple polytopes under both initialization and gradient descent, although these polytopes theoretically can be rather diverse and complicated. This finding can be appreciated as a novel implicit bias. Next, we use nontrivial combinatorial derivation to theoretically explain why adding depth does not create a more complicated polytope by bounding the average number of faces of polytopes with a function of the dimensionality. Our results concretely reveal what kind of simple functions a network learns and its space partition property. Also, by characterizing the shape of polytopes, the number of simplices be a leverage for other problems, \textit{e.g.}, serving as a generic functional complexity measure to explain the power of popular shortcut networks such as ResNet and analyzing the impact of different regularization strategies on a network's space partition. △ Less

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.05773 [pdf, other]

DeepTextMark: A Deep Learning-Driven Text Watermarking Approach for Identifying Large Language Model Generated Text

Authors: Travis Munyer, Abdullah Tanvir, Arjon Das, Xin Zhong

Abstract: The rapid advancement of Large Language Models (LLMs) has significantly enhanced the capabilities of text generators. With the potential for misuse escalating, the importance of discerning whether texts are human-authored or generated by LLMs has become paramount. Several preceding studies have ventured to address this challenge by employing binary classifiers to differentiate between human-writte… ▽ More The rapid advancement of Large Language Models (LLMs) has significantly enhanced the capabilities of text generators. With the potential for misuse escalating, the importance of discerning whether texts are human-authored or generated by LLMs has become paramount. Several preceding studies have ventured to address this challenge by employing binary classifiers to differentiate between human-written and LLM-generated text. Nevertheless, the reliability of these classifiers has been subject to question. Given that consequential decisions may hinge on the outcome of such classification, it is imperative that text source detection is of high caliber. In light of this, the present paper introduces DeepTextMark, a deep learning-driven text watermarking methodology devised for text source identification. By leveraging Word2Vec and Sentence Encoding for watermark insertion, alongside a transformer-based classifier for watermark detection, DeepTextMark epitomizes a blend of blindness, robustness, imperceptibility, and reliability. As elaborated within the paper, these attributes are crucial for universal text source detection, with a particular emphasis in this paper on text produced by LLMs. DeepTextMark offers a viable "add-on" solution to prevailing text generation frameworks, requiring no direct access or alterations to the underlying text generation mechanism. Experimental evaluations underscore the high imperceptibility, elevated detection accuracy, augmented robustness, reliability, and swift execution of DeepTextMark. △ Less

Submitted 11 March, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: The paper has been accpeted for publication by IEEE Access

arXiv:2305.04066 [pdf, other]

Semi-Asynchronous Federated Edge Learning Mechanism via Over-the-air Computation

Authors: Zhoubin Kou, Yun Ji, Xiaoxiong Zhong, Sheng Zhang

Abstract: Over-the-air Computation (AirComp) has been demonstrated as an effective transmission scheme to boost the efficiency of federated edge learning (FEEL). However, existing FEEL systems with AirComp scheme often employ traditional synchronous aggregation mechanisms for local model aggregation in each global round, which suffer from the stragglers issues. In this paper, we propose a semi-asynchronous… ▽ More Over-the-air Computation (AirComp) has been demonstrated as an effective transmission scheme to boost the efficiency of federated edge learning (FEEL). However, existing FEEL systems with AirComp scheme often employ traditional synchronous aggregation mechanisms for local model aggregation in each global round, which suffer from the stragglers issues. In this paper, we propose a semi-asynchronous aggregation FEEL mechanism with AirComp scheme (PAOTA) to improve the training efficiency of the FEEL system in the case of significant heterogeneity in data and devices. Taking the staleness and divergence of model updates from edge devices into consideration, we minimize the convergence upper bound of the FEEL global model by adjusting the uplink transmit power of edge devices at each aggregation period. The simulation results demonstrate that our proposed algorithm achieves convergence performance close to that of the ideal Local SGD. Furthermore, with the same target accuracy, the training time required for PAOTA is less than that of the ideal Local SGD and the synchronous FEEL algorithm via AirComp. △ Less

Submitted 29 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

Showing 1–50 of 160 results for author: Zhong, X