subscribe to arXiv mailings

FuXi-ENS: A machine learning model for medium-range ensemble weather forecasting

Authors: Xiaohui Zhong, Lei Chen, Hao Li, Jun Liu, Xu Fan, Jie Feng, Kan Dai, Jing-Jia Luo, Jie Wu, Yuan Qi, Bo Lu

Abstract: Ensemble forecasting is crucial for improving weather predictions, especially for forecasts of extreme events. Constructing an ensemble prediction system (EPS) based on conventional NWP models is highly computationally expensive. ML models have emerged as valuable tools for deterministic weather forecasts, providing forecasts with significantly reduced computational requirements and even surpassin… ▽ More Ensemble forecasting is crucial for improving weather predictions, especially for forecasts of extreme events. Constructing an ensemble prediction system (EPS) based on conventional NWP models is highly computationally expensive. ML models have emerged as valuable tools for deterministic weather forecasts, providing forecasts with significantly reduced computational requirements and even surpassing the forecast performance of traditional NWP models. However, challenges arise when applying ML models to ensemble forecasting. Recent ML models, such as GenCast and SEEDS model, rely on the ERA5 EDA or operational NWP ensemble members for forecast generation. Their spatial resolution is also considered too coarse for many applications. To overcome these limitations, we introduce FuXi-ENS, an advanced ML model designed to deliver 6-hourly global ensemble weather forecasts up to 15 days. This model runs at a significantly increased spatial resolution of 0.25\textdegree, incorporating 5 atmospheric variables at 13 pressure levels, along with 13 surface variables. By leveraging the inherent probabilistic nature of Variational AutoEncoder (VAE), FuXi-ENS optimizes a loss function that combines the CRPS and the KL divergence between the predicted and target distribution, facilitating the incorporation of flow-dependent perturbations in both initial conditions and forecast. This innovative approach makes FuXi-ENS an advancement over the traditional ones that use L1 loss combined with the KL loss in standard VAE models for ensemble weather forecasting. Results demonstrate that FuXi-ENS outperforms ensemble forecasts from the ECMWF, a world leading NWP model, in the CRPS of 98.1% of 360 variable and forecast lead time combinations. This achievement underscores the potential of the FuXi-ENS model to enhance ensemble weather forecasts, offering a promising direction for further development in this field. △ Less

Submitted 5 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

arXiv:2404.14132 [pdf, other]

CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

Authors: Kangzhen Yang, Tao Hu, Kexin Dai, Genggeng Chen, Yu Cao, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

Abstract: In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. Howev… ▽ More In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. However, merely performing a single type of image enhancement still cannot yield satisfactory images. In this paper, to deal with the challenge above, we propose the Composite Refinement Network (CRNet) to address this issue using multiple exposure images. By fully integrating information-rich multiple exposure inputs, CRNet can perform unified image restoration and enhancement. To improve the quality of image details, CRNet explicitly separates and strengthens high and low-frequency information through pooling layers, using specially designed Multi-Branch Blocks for effective fusion of these frequencies. To increase the receptive field and fully integrate input features, CRNet employs the High-Frequency Enhancement Module, which includes large kernel convolutions and an inverted bottleneck ConvFFN. Our model secured third place in the first track of the Bracketing Image Restoration and Enhancement Challenge, surpassing previous SOTA models in both testing metrics and visual quality. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: This paper is accepted by CVPR2024 Workshop, Code: https://github.com/CalvinYang0/CRNet

arXiv:2404.13537 [pdf, other]

Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

Authors: Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

Abstract: In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul… ▽ More In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resulting in less-than-ideal restoration outcomes. Inspired by the notion that high/low frequency information is applicable to different degradations, we introduce HLNet, a Bracketing Image Restoration and Enhancement method based on high-low frequency decomposition. Specifically, we employ two modules for feature extraction: shared weight modules and non-shared weight modules. In the shared weight modules, we use SCConv to extract common features from different degradations. In the non-shared weight modules, we introduce the High-Low Frequency Decomposition Block (HLFDB), which employs different methods to handle high-low frequency information, enabling the model to address different degradations more effectively. Compared to other networks, our method takes into account the characteristics of different degradations, thus achieving higher-quality image restoration. △ Less

Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: This paper is accepted by CVPR 2024 Workshop, code: https://github.com/chengeng0613/HLNet

arXiv:2404.10512 [pdf]

Four-hour thunderstorm nowcasting using deep diffusion models of satellite

Authors: Kuai Dai, Xutao Li, Junying Fang, Yunming Ye, Demin Yu, Di Xian, Danyu Qin

Abstract: Convection (thunderstorm) develops rapidly within hours and is highly destructive, posing a significant challenge for nowcasting and resulting in substantial losses to nature and society. After the emergence of artificial intelligence (AI)-based methods, convection nowcasting has experienced rapid advancements, with its performance surpassing that of physics-based numerical weather prediction and… ▽ More Convection (thunderstorm) develops rapidly within hours and is highly destructive, posing a significant challenge for nowcasting and resulting in substantial losses to nature and society. After the emergence of artificial intelligence (AI)-based methods, convection nowcasting has experienced rapid advancements, with its performance surpassing that of physics-based numerical weather prediction and other conventional approaches. However, the lead time and coverage of it still leave much to be desired and hardly meet the needs of disaster emergency response. Here, we propose a deep diffusion model of satellite (DDMS) to establish an AI-based convection nowcasting system. On one hand, it employs diffusion processes to effectively simulate complicated spatiotemporal evolution patterns of convective clouds, significantly improving the forecast lead time. On the other hand, it utilizes geostationary satellite brightness temperature data, thereby achieving planetary-scale forecast coverage. During long-term tests and objective validation based on the FengYun-4A satellite, our system achieves, for the first time, effective convection nowcasting up to 4 hours, with broad coverage (about 20,000,000 km2), remarkable accuracy, and high resolution (15 minutes; 4 km). Its performance reaches a new height in convection nowcasting compared to the existing models. In terms of application, our system operates efficiently (forecasting 4 hours of convection in 8 minutes), and is highly transferable with the potential to collaborate with multiple satellites for global convection nowcasting. Furthermore, our results highlight the remarkable capabilities of diffusion models in convective clouds forecasting, as well as the significant value of geostationary satellite data when empowered by AI technologies. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2402.00059 [pdf, other]

FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting

Authors: Tao Han, Song Guo, Fenghua Ling, Kang Chen, Junchao Gong, Jingjia Luo, Junxia Gu, Kan Dai, Wanli Ouyang, Lei Bai

Abstract: Kilometer-scale modeling of global atmosphere dynamics enables fine-grained weather forecasting and decreases the risk of disastrous weather and climate activity. Therefore, building a kilometer-scale global forecast model is a persistent pursuit in the meteorology domain. Active international efforts have been made in past decades to improve the spatial resolution of numerical weather models. Non… ▽ More Kilometer-scale modeling of global atmosphere dynamics enables fine-grained weather forecasting and decreases the risk of disastrous weather and climate activity. Therefore, building a kilometer-scale global forecast model is a persistent pursuit in the meteorology domain. Active international efforts have been made in past decades to improve the spatial resolution of numerical weather models. Nonetheless, developing the higher resolution numerical model remains a long-standing challenge due to the substantial consumption of computational resources. Recent advances in data-driven global weather forecasting models utilize reanalysis data for model training and have demonstrated comparable or even higher forecasting skills than numerical models. However, they are all limited by the resolution of reanalysis data and incapable of generating higher-resolution forecasts. This work presents FengWu-GHR, the first data-driven global weather forecasting model running at the 0.09$^{\circ}$ horizontal resolution. FengWu-GHR introduces a novel approach that opens the door for operating ML-based high-resolution forecasts by inheriting prior knowledge from a pretrained low-resolution model. The hindcast of weather prediction in 2022 indicates that FengWu-GHR is superior to the IFS-HRES. Furthermore, evaluations on station observations and case studies of extreme events support the competitive operational forecasting skill of FengWu-GHR at the high resolution. △ Less

Submitted 28 January, 2024; originally announced February 2024.

Comments: 19 pages

arXiv:2312.06734 [pdf, other]

DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting

Authors: Demin Yu, Xutao Li, Yunming Ye, Baoquan Zhang, Chuyao Luo, Kuai Dai, Rui Wang, Xunlai Chen

Abstract: Precipitation nowcasting is an important spatio-temporal prediction task to predict the radar echoes sequences based on current observations, which can serve both meteorological science and smart city applications. Due to the chaotic evolution nature of the precipitation systems, it is a very challenging problem. Previous studies address the problem either from the perspectives of deterministic mo… ▽ More Precipitation nowcasting is an important spatio-temporal prediction task to predict the radar echoes sequences based on current observations, which can serve both meteorological science and smart city applications. Due to the chaotic evolution nature of the precipitation systems, it is a very challenging problem. Previous studies address the problem either from the perspectives of deterministic modeling or probabilistic modeling. However, their predictions suffer from the blurry, high-value echoes fading away and position inaccurate issues. The root reason of these issues is that the chaotic evolutionary precipitation systems are not appropriately modeled. Inspired by the nature of the systems, we propose to decompose and model them from the perspective of global deterministic motion and local stochastic variations with residual mechanism. A unified and flexible framework that can equip any type of spatio-temporal models is proposed based on residual diffusion, which effectively tackles the shortcomings of previous methods. Extensive experimental results on four publicly available radar datasets demonstrate the effectiveness and superiority of the proposed framework, compared to state-of-the-art techniques. Our code is publicly available at https://github.com/DeminYu98/DiffCast. △ Less

Submitted 25 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: CVPR 2024; https://github.com/DeminYu98/DiffCast

arXiv:2310.13605 [pdf, other]

FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer

Authors: Xinyu Zhang, Li Wang, Zhiqiang Jiang, Kun Dai, Tao Xie, Lei Yang, Wenhao Yu, Yang Shen, Jun Li

Abstract: Local Feature Matching, an essential component of several computer vision tasks (e.g., structure from motion and visual localization), has been effectively settled by Transformer-based methods. However, these methods only integrate long-range context information among keypoints with a fixed receptive field, which constrains the network from reconciling the importance of features with different rec… ▽ More Local Feature Matching, an essential component of several computer vision tasks (e.g., structure from motion and visual localization), has been effectively settled by Transformer-based methods. However, these methods only integrate long-range context information among keypoints with a fixed receptive field, which constrains the network from reconciling the importance of features with different receptive fields to realize complete image perception, hence limiting the matching accuracy. In addition, these methods utilize a conventional handcrafted encoding approach to integrate the positional information of keypoints into the visual descriptors, which limits the capability of the network to extract reliable positional encoding message. In this study, we propose Feature Matching with Reconciliatory Transformer (FMRT), a novel Transformer-based detector-free method that reconciles different features with multiple receptive fields adaptively and utilizes parallel networks to realize reliable positional encoding. Specifically, FMRT proposes a dedicated Reconciliatory Transformer (RecFormer) that consists of a Global Perception Attention Layer (GPAL) to extract visual descriptors with different receptive fields and integrate global context information under various scales, Perception Weight Layer (PWL) to measure the importance of various receptive fields adaptively, and Local Perception Feed-forward Network (LPFFN) to extract deep aggregated multi-scale local feature representation. Extensive experiments demonstrate that FMRT yields extraordinary performance on multiple benchmarks, including pose estimation, visual localization, homography estimation, and image matching. △ Less

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2308.11928 [pdf, other]

OFVL-MS: Once for Visual Localization across Multiple Indoor Scenes

Authors: Tao Xie, Kun Dai, Siyi Lu, Ke Wang, Zhiqiang Jiang, Jinghan Gao, Dedong Liu, Jie Xu, Lijun Zhao, Ruifeng Li

Abstract: In this work, we seek to predict camera poses across scenes with a multi-task learning manner, where we view the localization of each scene as a new task. We propose OFVL-MS, a unified framework that dispenses with the traditional practice of training a model for each individual scene and relieves gradient conflict induced by optimizing multiple scenes collectively, enabling efficient storage yet… ▽ More In this work, we seek to predict camera poses across scenes with a multi-task learning manner, where we view the localization of each scene as a new task. We propose OFVL-MS, a unified framework that dispenses with the traditional practice of training a model for each individual scene and relieves gradient conflict induced by optimizing multiple scenes collectively, enabling efficient storage yet precise visual localization for all scenes. Technically, in the forward pass of OFVL-MS, we design a layer-adaptive sharing policy with a learnable score for each layer to automatically determine whether the layer is shared or not. Such sharing policy empowers us to acquire task-shared parameters for a reduction of storage cost and task-specific parameters for learning scene-related features to alleviate gradient conflict. In the backward pass of OFVL-MS, we introduce a gradient normalization algorithm that homogenizes the gradient magnitude of the task-shared parameters so that all tasks converge at the same pace. Furthermore, a sparse penalty loss is applied on the learnable scores to facilitate parameter sharing for all tasks without performance degradation. We conduct comprehensive experiments on multiple benchmarks and our new released indoor dataset LIVL, showing that OFVL-MS families significantly outperform the state-of-the-arts with fewer parameters. We also verify that OFVL-MS can generalize to a new scene with much few parameters while gaining superior localization performance. △ Less

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2307.16675 [pdf, other]

Poly-MOT: A Polyhedral Framework For 3D Multi-Object Tracking

Authors: Xiaoyu Li, Tao Xie, Dedong Liu, Jinghan Gao, Kun Dai, Zhiqiang Jiang, Lijun Zhao, Ke Wang

Abstract: 3D Multi-object tracking (MOT) empowers mobile robots to accomplish well-informed motion planning and navigation tasks by providing motion trajectories of surrounding objects. However, existing 3D MOT methods typically employ a single similarity metric and physical model to perform data association and state estimation for all objects. With large-scale modern datasets and real scenes, there are a… ▽ More 3D Multi-object tracking (MOT) empowers mobile robots to accomplish well-informed motion planning and navigation tasks by providing motion trajectories of surrounding objects. However, existing 3D MOT methods typically employ a single similarity metric and physical model to perform data association and state estimation for all objects. With large-scale modern datasets and real scenes, there are a variety of object categories that commonly exhibit distinctive geometric properties and motion patterns. In this way, such distinctions would enable various object categories to behave differently under the same standard, resulting in erroneous matches between trajectories and detections, and jeopardizing the reliability of downstream tasks (navigation, etc.). Towards this end, we propose Poly-MOT, an efficient 3D MOT method based on the Tracking-By-Detection framework that enables the tracker to choose the most appropriate tracking criteria for each object category. Specifically, Poly-MOT leverages different motion models for various object categories to characterize distinct types of motion accurately. We also introduce the constraint of the rigid structure of objects into a specific motion model to accurately describe the highly nonlinear motion of the object. Additionally, we introduce a two-stage data association strategy to ensure that objects can find the optimal similarity metric from three custom metrics for their categories and reduce missing matches. On the NuScenes dataset, our proposed method achieves state-of-the-art performance with 75.4\% AMOTA. The code is available at https://github.com/lixiaoyu2000/Poly-MOT △ Less

Submitted 31 July, 2023; originally announced July 2023.

Comments: Accepted to IROS 2023, 1st on the NuScenes Tracking benchmark with 75.4 AMOTA

arXiv:2302.06848 [pdf, ps, other]

YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection

Authors: Jianhua Yang, Kun Dai

Abstract: Designing a real-time framework for the spatio-temporal action detection task is still a challenge. In this paper, we propose a novel real-time action detection framework, YOWOv2. In this new framework, YOWOv2 takes advantage of both the 3D backbone and 2D backbone for accurate action detection. A multi-level detection pipeline is designed to detect action instances of different scales. To achieve… ▽ More Designing a real-time framework for the spatio-temporal action detection task is still a challenge. In this paper, we propose a novel real-time action detection framework, YOWOv2. In this new framework, YOWOv2 takes advantage of both the 3D backbone and 2D backbone for accurate action detection. A multi-level detection pipeline is designed to detect action instances of different scales. To achieve this goal, we carefully build a simple and efficient 2D backbone with a feature pyramid network to extract different levels of classification features and regression features. For the 3D backbone, we adopt the existing efficient 3D CNN to save development time. By combining 3D backbones and 2D backbones of different sizes, we design a YOWOv2 family including YOWOv2-Tiny, YOWOv2-Medium, and YOWOv2-Large. We also introduce the popular dynamic label assignment strategy and anchor-free mechanism to make the YOWOv2 consistent with the advanced model architecture design. With our improvement, YOWOv2 is significantly superior to YOWO, and can still keep real-time detection. Without any bells and whistles, YOWOv2 achieves 87.0 % frame mAP and 52.8 % video mAP with over 20 FPS on the UCF101-24. On the AVA, YOWOv2 achieves 21.7 % frame mAP with over 20 FPS. Our code is available on https://github.com/yjh0410/YOWOv2. △ Less

Submitted 7 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

arXiv:2302.05846 [pdf, other]

doi 10.1016/j.patcog.2023.110094

OAMatcher: An Overlapping Areas-based Network for Accurate Local Feature Matching

Authors: Kun Dai, Tao Xie, Ke Wang, Zhiqiang Jiang, Ruifeng Li, Lijun Zhao

Abstract: Local feature matching is an essential component in many visual applications. In this work, we propose OAMatcher, a Tranformer-based detector-free method that imitates humans behavior to generate dense and accurate matches. Firstly, OAMatcher predicts overlapping areas to promote effective and clean global context aggregation, with the key insight that humans focus on the overlapping areas instead… ▽ More Local feature matching is an essential component in many visual applications. In this work, we propose OAMatcher, a Tranformer-based detector-free method that imitates humans behavior to generate dense and accurate matches. Firstly, OAMatcher predicts overlapping areas to promote effective and clean global context aggregation, with the key insight that humans focus on the overlapping areas instead of the entire images after multiple observations when matching keypoints in image pairs. Technically, we first perform global information integration across all keypoints to imitate the humans behavior of observing the entire images at the beginning of feature matching. Then, we propose Overlapping Areas Prediction Module (OAPM) to capture the keypoints in co-visible regions and conduct feature enhancement among them to simulate that humans transit the focus regions from the entire images to overlapping regions, hence realizeing effective information exchange without the interference coming from the keypoints in non overlapping areas. Besides, since humans tend to leverage probability to determine whether the match labels are correct or not, we propose a Match Labels Weight Strategy (MLWS) to generate the coefficients used to appraise the reliability of the ground-truth match labels, while alleviating the influence of measurement noise coming from the data. Moreover, we integrate depth-wise convolution into Tranformer encoder layers to ensure OAMatcher extracts local and global feature representation concurrently. Comprehensive experiments demonstrate that OAMatcher outperforms the state-of-the-art methods on several benchmarks, while exhibiting excellent robustness to extreme appearance variants. The source code is available at https://github.com/DK-HU/OAMatcher. △ Less

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2301.02993 [pdf, other]

DeepMatcher: A Deep Transformer-based Network for Robust and Accurate Local Feature Matching

Authors: Tao Xie, Kun Dai, Ke Wang, Ruifeng Li, Lijun Zhao

Abstract: Local feature matching between images remains a challenging task, especially in the presence of significant appearance variations, e.g., extreme viewpoint changes. In this work, we propose DeepMatcher, a deep Transformer-based network built upon our investigation of local feature matching in detector-free methods. The key insight is that local feature matcher with deep layers can capture more huma… ▽ More Local feature matching between images remains a challenging task, especially in the presence of significant appearance variations, e.g., extreme viewpoint changes. In this work, we propose DeepMatcher, a deep Transformer-based network built upon our investigation of local feature matching in detector-free methods. The key insight is that local feature matcher with deep layers can capture more human-intuitive and simpler-to-match features. Based on this, we propose a Slimming Transformer (SlimFormer) dedicated for DeepMatcher, which leverages vector-based attention to model relevance among all keypoints and achieves long-range context aggregation in an efficient and effective manner. A relative position encoding is applied to each SlimFormer so as to explicitly disclose relative distance information, further improving the representation of keypoints. A layer-scale strategy is also employed in each SlimFormer to enable the network to assimilate message exchange from the residual block adaptively, thus allowing it to simulate the human behaviour that humans can acquire different matching cues each time they scan an image pair. To facilitate a better adaption of the SlimFormer, we introduce a Feature Transition Module (FTM) to ensure a smooth transition in feature scopes with different receptive fields. By interleaving the self- and cross-SlimFormer multiple times, DeepMatcher can easily establish pixel-wise dense matches at coarse level. Finally, we perceive the match refinement as a combination of classification and regression problems and design Fine Matches Module to predict confidence and offset concurrently, thereby generating robust and accurate matches. Experimentally, we show that DeepMatcher significantly outperforms the state-of-the-art methods on several benchmarks, demonstrating the superior matching capability of DeepMatcher. △ Less

Submitted 8 January, 2023; originally announced January 2023.

arXiv:2211.11843 [pdf, other]

SLUGBOT, an Aplysia-inspired Robotic Grasper for Studying Control

Authors: Kevin Dai, Ravesh Sukhnandan, Michael Bennington, Karen Whirley, Ryan Bao, Lu Li, Jeffrey P. Gill, Hillel J. Chiel, Victoria A. Webster-Wood

Abstract: Living systems can use a single periphery to perform a variety of tasks and adapt to a dynamic environment. This multifunctionality is achieved through the use of neural circuitry that adaptively controls the reconfigurable musculature. Current robotic systems struggle to flexibly adapt to unstructured environments. Through mimicry of the neuromechanical coupling seen in living organisms, robotic… ▽ More Living systems can use a single periphery to perform a variety of tasks and adapt to a dynamic environment. This multifunctionality is achieved through the use of neural circuitry that adaptively controls the reconfigurable musculature. Current robotic systems struggle to flexibly adapt to unstructured environments. Through mimicry of the neuromechanical coupling seen in living organisms, robotic systems could potentially achieve greater autonomy. The tractable neuromechanics of the sea slug $\textit{Aplysia californica's}$ feeding apparatus, or buccal mass, make it an ideal candidate for applying neuromechanical principles to the control of a soft robot. In this work, a robotic grasper was designed to mimic specific morphology of the $\textit{Aplysia}$ feeding apparatus. These include the use of soft actuators akin to biological muscle, a deformable grasping surface, and a similar muscular architecture. A previously developed Boolean neural controller was then adapted for the control of this soft robotic system. The robot was capable of qualitatively replicating swallowing behavior by cyclically ingesting a plastic tube. The robot's normalized translational and rotational kinematics of the odontophore followed profiles observed $\textit{in vivo}$ despite morphological differences. This brings $\textit{Aplysia}$-inspired control $\textit{in roboto}$ one step closer to multifunctional neural control schema $\textit{in vivo}$ and $\textit{in silico}$. Future additions may improve SLUGBOT's viability as a neuromechanical research platform. △ Less

Submitted 21 November, 2022; originally announced November 2022.

Comments: Submitted and accepted to Living Machines 2022 conference

arXiv:2206.14737 [pdf, other]

doi 10.1007/978-3-030-91387-8_16

A Consensus-Based Load-Balancing Algorithm for Sharded Blockchains

Authors: M. Toulouse, H. K. Dai, Q. L. Nguyen

Abstract: Public blockchains are decentralized networks where each participating node executes the same decision-making process. This form of decentralization does not scale well because the same data are stored on each network node, and because all nodes must validate each transaction prior to their confirmation. One solution approach decomposes the nodes of a blockchain network into subsets called "shards… ▽ More Public blockchains are decentralized networks where each participating node executes the same decision-making process. This form of decentralization does not scale well because the same data are stored on each network node, and because all nodes must validate each transaction prior to their confirmation. One solution approach decomposes the nodes of a blockchain network into subsets called "shards", each shard processing and storing disjoint sets of transactions in parallel. To fully benefit from the parallelism of sharded blockchains, the processing load of shards must be evenly distributed. However, the problem of computing balanced workloads is theoretically hard and further complicated in practice as transaction processing times are unknown prior to be assigned to shards. In this paper we introduce a dynamic workload-balancing algorithm where the allocation strategy of transactions to shards is periodically adapted based on the recent workload history of shards. Our algorithm is an adaptation to sharded blockchains of a consensus-based load-balancing algorithm. It is a fully distributed algorithm inline with network based applications such as blockchains. Some preliminary results are reported based on simulations that shard transactions of three well-known blockchain platforms. △ Less

Submitted 29 June, 2022; originally announced June 2022.

Journal ref: Dang, T.K., Kung, J., Chung, T.M., Takizawa, M. (eds) Future Data and Security Engineering. FDSE 2021. Lecture Notes in Computer Science(), vol 13076. Springer

arXiv:2206.00910 [pdf, other]

A Real-time Critical-scenario-generation Framework for Testing Autonomous Driving System

Authors: Yizhou Xie, Kunpeng Dai, Yong Zhang

Abstract: In order to find the most likely failure scenarios which may occur under certain given operation domain, critical-scenario-based test is supposed as an effective and widely used method, which gives suggestions for designers to improve the developing algorithm. However, for the state of art, critical-scenario generation approaches commonly utilize random-search or reinforcement learning methods to… ▽ More In order to find the most likely failure scenarios which may occur under certain given operation domain, critical-scenario-based test is supposed as an effective and widely used method, which gives suggestions for designers to improve the developing algorithm. However, for the state of art, critical-scenario generation approaches commonly utilize random-search or reinforcement learning methods to generate series of scenarios for a specific algorithm, which takes amounts of computing resource for testing a developing target that is always changing, and inapplicable for testing a real-time system. In this paper, we proposed a real-time critical-scenario-generation (RTCSG) framework to address the above challenges. In our framework, an aggressive-driving algorithm is proposed in controlling the virtual agent vehicles, a specially designed cost function is presented to guide scenarios to evolve towards critical conditions, and a self-adaptive coefficient iteration is designed that enable the approach to operate successfully in different conditions. With our proposed method, the critical-scenarios can be directly generated for the target under test which is a black-box system, and the real-time critical-scenario test can be brought into reality. The simulation results show that our approach is able to obtain more critical scenarios in most conditions than current methods, with a higher stability of success. For a real-time testing, our approach improves the efficiency around 16 times. △ Less

Submitted 2 June, 2022; originally announced June 2022.

Comments: 7pages,9 figures, 21 references

arXiv:2203.15941 [pdf, other]

Design of a Biomimetic Tactile Sensor for Material Classification

Authors: Kevin Dai, Xinyu Wang, Allison M. Rojas, Evan Harber, Yu Tian, Nicholas Paiva, Joseph Gnehm, Evan Schindewolf, Howie Choset, Victoria A. Webster-Wood, Lu Li

Abstract: Tactile sensing typically involves active exploration of unknown surfaces and objects, making it especially effective at processing the characteristics of materials and textures. A key property extracted by human tactile perception is surface roughness, which relies on measuring vibratory signals using the multi-layered fingertip structure. Existing robotic systems lack tactile sensors that are ab… ▽ More Tactile sensing typically involves active exploration of unknown surfaces and objects, making it especially effective at processing the characteristics of materials and textures. A key property extracted by human tactile perception is surface roughness, which relies on measuring vibratory signals using the multi-layered fingertip structure. Existing robotic systems lack tactile sensors that are able to provide high dynamic sensing ranges, perceive material properties, and maintain a low hardware cost. In this work, we introduce the reference design and fabrication procedure of a miniature and low-cost tactile sensor consisting of a biomimetic cutaneous structure, including the artificial fingerprint, dermis, epidermis, and an embedded magnet-sensor structure which serves as a mechanoreceptor for converting mechanical information to digital signals. The presented sensor is capable of detecting high-resolution magnetic field data through the Hall effect and creating high-dimensional time-frequency domain features for material texture classification. Additionally, we investigate the effects of different superficial sensor fingerprint patterns for classifying materials through both simulation and physical experimentation. After extracting time series and frequency domain features, we assess a k-nearest neighbors classifier for distinguishing between different materials. The results from our experiments show that our biomimetic tactile sensors with fingerprint ridges can classify materials with more than 8% higher accuracy and lower variability than ridge-less sensors. These results, along with the low cost and customizability of our sensor, demonstrate high potential for lowering the barrier to entry for a wide array of robotic applications, including model-less tactile sensing for texture classification, material inspection, and object recognition. △ Less

Submitted 29 March, 2022; originally announced March 2022.

Comments: To be published in ICRA 2022

arXiv:2202.11086 [pdf, other]

doi 10.1145/3491101.3519653

StickyLand: Breaking the Linear Presentation of Computational Notebooks

Authors: Zijie J. Wang, Katie Dai, W. Keith Edwards

Abstract: How can we better organize code in computational notebooks? Notebooks have become a popular tool among data scientists, as they seamlessly weave text and code together, supporting users to rapidly iterate and document code experiments. However, it is often challenging to organize code in notebooks, partially because there is a mismatch between the linear presentation of code and the non-linear pro… ▽ More How can we better organize code in computational notebooks? Notebooks have become a popular tool among data scientists, as they seamlessly weave text and code together, supporting users to rapidly iterate and document code experiments. However, it is often challenging to organize code in notebooks, partially because there is a mismatch between the linear presentation of code and the non-linear process of exploratory data analysis. We present StickyLand, a notebook extension for empowering users to freely organize their code in non-linear ways. With sticky cells that are always shown on the screen, users can quickly access their notes, instantly observe experiment results, and easily build interactive dashboards that support complex visual analytics. Case studies highlight how our tool can enhance notebook users's productivity and identify opportunities for future notebook designs. StickyLand is available at https://github.com/xiaohk/stickyland. △ Less

Submitted 22 February, 2022; originally announced February 2022.

Comments: Accepted at CHI 2022 (Late-Breaking Work). 7 pages, 6 figures. For a demo video, see https://youtu.be/OKaPmEBzEX0. For a live demo, visit https://zijie.wang/#stickyland-demo

arXiv:2108.03821 [pdf, other]

Video Annotation for Visual Tracking via Selection and Refinement

Authors: Kenan Dai, Jie Zhao, Lijun Wang, Dong Wang, Jianhua Li, Huchuan Lu, Xuesheng Qian, Xiaoyun Yang

Abstract: Deep learning based visual trackers entail offline pre-training on large volumes of video datasets with accurate bounding box annotations that are labor-expensive to achieve. We present a new framework to facilitate bounding box annotations for video sequences, which investigates a selection-and-refinement strategy to automatically improve the preliminary annotations generated by tracking algorith… ▽ More Deep learning based visual trackers entail offline pre-training on large volumes of video datasets with accurate bounding box annotations that are labor-expensive to achieve. We present a new framework to facilitate bounding box annotations for video sequences, which investigates a selection-and-refinement strategy to automatically improve the preliminary annotations generated by tracking algorithms. A temporal assessment network (T-Assess Net) is proposed which is able to capture the temporal coherence of target locations and select reliable tracking results by measuring their quality. Meanwhile, a visual-geometry refinement network (VG-Refine Net) is also designed to further enhance the selected tracking results by considering both target appearance and temporal geometry constraints, allowing inaccurate tracking results to be corrected. The combination of the above two networks provides a principled approach to ensure the quality of automatic video annotation. Experiments on large scale tracking benchmarks demonstrate that our method can deliver highly accurate bounding box annotations and significantly reduce human labor by 94.0%, yielding an effective means to further boost tracking performance with augmented training data. △ Less

Submitted 9 August, 2021; originally announced August 2021.

Comments: Accepted by ICCV2021

arXiv:2004.00305 [pdf, other]

High-Performance Long-Term Tracking with Meta-Updater

Authors: Kenan Dai, Yunhua Zhang, Dong Wang, Jianhua Li, Huchuan Lu, Xiaoyun Yang

Abstract: Long-term visual tracking has drawn increasing attention because it is much closer to practical applications than short-term tracking. Most top-ranked long-term trackers adopt the offline-trained Siamese architectures, thus, they cannot benefit from great progress of short-term trackers with online update. However, it is quite risky to straightforwardly introduce online-update-based trackers to so… ▽ More Long-term visual tracking has drawn increasing attention because it is much closer to practical applications than short-term tracking. Most top-ranked long-term trackers adopt the offline-trained Siamese architectures, thus, they cannot benefit from great progress of short-term trackers with online update. However, it is quite risky to straightforwardly introduce online-update-based trackers to solve the long-term problem, due to long-term uncertain and noisy observations. In this work, we propose a novel offline-trained Meta-Updater to address an important but unsolved problem: Is the tracker ready for updating in the current frame? The proposed meta-updater can effectively integrate geometric, discriminative, and appearance cues in a sequential manner, and then mine the sequential information with a designed cascaded LSTM module. Our meta-updater learns a binary output to guide the tracker's update and can be easily embedded into different trackers. This work also introduces a long-term tracking framework consisting of an online local tracker, an online verifier, a SiamRPN-based re-detector, and our meta-updater. Numerous experimental results on the VOT2018LT, VOT2019LT, OxUvALT, TLP, and LaSOT benchmarks show that our tracker performs remarkably better than other competing algorithms. Our project is available on the website: https://github.com/Daikenan/LTMU. △ Less

Submitted 1 April, 2020; originally announced April 2020.

arXiv:1912.07964 [pdf]

Nanoscale Microscopy Images Colorization Using Neural Networks

Authors: Israel Goytom, Qin Wang, Tianxiang Yu, Kunjie Dai, Kris Sankaran, Xinfei Zhou, Dongdong Lin

Abstract: Microscopy images are powerful tools and widely used in the majority of research areas, such as biology, chemistry, physics and materials fields by various microscopies (scanning electron microscope (SEM), atomic force microscope (AFM) and the optical microscope, et al.). However, most of the microscopy images are colorless due to the unique imaging mechanism. Though investigating on some popular… ▽ More Microscopy images are powerful tools and widely used in the majority of research areas, such as biology, chemistry, physics and materials fields by various microscopies (scanning electron microscope (SEM), atomic force microscope (AFM) and the optical microscope, et al.). However, most of the microscopy images are colorless due to the unique imaging mechanism. Though investigating on some popular solutions proposed recently about colorizing images, we notice the process of those methods are usually tedious, complicated, and time-consuming. In this paper, inspired by the achievement of machine learning algorithms on different science fields, we introduce two artificial neural networks for gray microscopy image colorization: An end-to-end convolutional neural network (CNN) with a pre-trained model for feature extraction and a pixel-to-pixel neural style transfer convolutional neural network (NST-CNN), which can colorize gray microscopy images with semantic information learned from a user-provided colorful image at inference time. The results demonstrate that our algorithm not only can colorize the microscopy images under complex circumstances precisely but also make the color naturally according to the training of a massive number of nature images with proper hue and saturation. △ Less

Submitted 22 February, 2020; v1 submitted 17 December, 2019; originally announced December 2019.

Comments: 27 pages, 6 figures

arXiv:1505.00989 [pdf]

doi 10.5121/csit.2015.50101

Scraping and Clustering Techniques for the Characterization of Linkedin Profiles

Authors: Kais Dai, Celia Gónzalez Nespereira, Ana Fernández Vilas, Rebeca P. Díaz Redondo

Abstract: The socialization of the web has undertaken a new dimension after the emergence of the Online Social Networks (OSN) concept. The fact that each Internet user becomes a potential content creator entails managing a big amount of data. This paper explores the most popular professional OSN: LinkedIn. A scraping technique was implemented to get around 5 Million public profiles. The application of natur… ▽ More The socialization of the web has undertaken a new dimension after the emergence of the Online Social Networks (OSN) concept. The fact that each Internet user becomes a potential content creator entails managing a big amount of data. This paper explores the most popular professional OSN: LinkedIn. A scraping technique was implemented to get around 5 Million public profiles. The application of natural language processing techniques (NLP) to classify the educational background and to cluster the professional background of the collected profiles led us to provide some insights about this OSN's users and to evaluate the relationships between educational degrees and professional careers. △ Less

Submitted 5 May, 2015; originally announced May 2015.

Comments: In proceedings of the Fourth International Conference on Information Technology Convergence and Services (ITCS 2015), pp. 1-15, January 2015, Zurich(Switzerland)

Showing 1–21 of 21 results for author: Dai, K