-
Automating Urban Soundscape Enhancements with AI: In-situ Assessment of Quality and Restorativeness in Traffic-Exposed Residential Areas
Authors:
Bhan Lam,
Zhen-Ting Ong,
Kenneth Ooi,
Wen-Hui Ong,
Trevor Wong,
Karn N. Watcharasupat,
Vanessa Boey,
Irene Lee,
Joo Young Hong,
Jian Kang,
Kar Fye Alvin Lee,
Georgios Christopoulos,
Woon-Seng Gan
Abstract:
Formalized in ISO 12913, the "soundscape" approach is a paradigmatic shift towards perception-based urban sound management, aiming to alleviate the substantial socioeconomic costs of noise pollution to advance the United Nations Sustainable Development Goals. Focusing on traffic-exposed outdoor residential sites, we implemented an automatic masker selection system (AMSS) utilizing natural sounds to mask (or augment) traffic soundscapes. We employed a pre-trained AI model to automatically select the optimal masker and adjust its playback level, adapting to changes over time in the ambient environment to maximize "Pleasantness", a perceptual dimension of soundscape quality in ISO 12913. Our validation study involving $N=68$ residents revealed a significant 14.6% enhancement in "Pleasantness" after intervention, correlating with increased restorativeness and positive affect. Perceptual enhancements at the traffic-exposed site matched those at a quieter control site with 6 dB(A) lower $L_\text{A,eq}$ and road traffic noise dominance, affirming the efficacy of AMSS as a soundscape intervention, while streamlining the labour-intensive assessment of "Pleasantness" with probabilistic AI prediction.
Submitted 8 July, 2024;
originally announced July 2024.
-
Enabling Multicast Transmission for Spatio-Temporally Asynchronous User Requests in Wireless Environments
Authors:
Hojung Lee,
Jun-Pyo Hong,
Wan Choi
Abstract:
The surge in wireless devices and data traffic volume necessitates more efficient transmission methods. Multicasting has garnered consistent attention as a means to fulfill the increasing demand for more efficient data transmission methods. Nevertheless, leveraging multicast wireless networks for spatio-temporally asynchronous data requests poses challenges. In this context, this paper introduces a new multicast mechanism called \emph{set-up based merged multicast (SMMC)} to minimize the delivery time of the requested file in wireless networks by considering the uncertainties inherent in wireless channels. The proposed mechanism comprises two phases. The first phase involves gathering asynchronous requests for a file from users experiencing diverse channel conditions. During this phase, packets of the requested file are transmitted individually in unicast mode within a specified set-up time. Following this, the second phase initiates multicast transmission, which sequentially handles the remaining packets of the file in multicast mode. In the proposed mechanism, we optimize the set-up time and transmission rates of both unicast and multicast modes to minimize the expected file delivery time by jointly taking into account the statistical characteristics of wireless channels, users' locations, and file popularity. Additionally, we also delve into a \emph{fine-tuned SMMC} by utilizing posterior information on the multicast group size and further improve the performance. Our performance evaluations reveal that the proposed SMMC outperforms conventional unicast methods, especially with high-demand data.
Submitted 4 July, 2024;
originally announced July 2024.
-
Using Photoplethysmography to Detect Real-time Blood Pressure Changes with a Calibration-free Deep Learning Model
Authors:
Jingyuan Hong,
Manasi Nandi,
Weiwei Jin,
Jordi Alastruey
Abstract:
Blood pressure (BP) changes are linked to individual health status in both clinical and non-clinical settings. This study developed a deep learning model to classify systolic (SBP), diastolic (DBP), and mean (MBP) BP changes using photoplethysmography (PPG) waveforms. Data from the Vital Signs Database (VitalDB) comprising 1,005 ICU patients with synchronized PPG and BP recordings was used. BP changes were categorized into three labels: Spike (increase above a threshold), Stable (change within a plus or minus threshold), and Dip (decrease below a threshold). Four time-series classification models were studied: multi-layer perceptron, convolutional neural network, residual network, and Encoder. A subset of 500 patients was randomly selected for training and validation, ensuring a uniform distribution across BP change labels. Two test datasets were compiled: Test-I (n=500) with a uniform distribution selection process, and Test-II (n=5) without. The study also explored the impact of including second-derivative PPG (sdPPG) waveforms as additional input information. The Encoder model with a Softmax weighting process using both PPG and sdPPG waveforms achieved the highest detection accuracy, exceeding 71.3% and 85.4% in Test-I and Test-II, respectively, with thresholds of 30 mmHg for SBP, 15 mmHg for DBP, and 20 mmHg for MBP. Corresponding F1-scores were over 71.8% and 88.5%. These findings confirm that PPG waveforms are effective for real-time monitoring of BP changes in ICU settings and suggest potential for broader applications.
Submitted 3 July, 2024;
originally announced July 2024.
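The three-way labelling described in the abstract above (Spike, Stable, Dip) can be illustrated with a minimal sketch. Only the thresholds (30, 15, and 20 mmHg) come from the abstract; the function name `label_bp_change`, the sign convention for the change, and the treatment of values exactly at the threshold as Stable are assumptions made here for illustration, not the authors' implementation.

```python
# Thresholds in mmHg as quoted in the abstract; everything else is illustrative.
THRESHOLDS_MMHG = {"SBP": 30.0, "DBP": 15.0, "MBP": 20.0}


def label_bp_change(delta_mmhg: float, kind: str) -> str:
    """Label a BP change as 'Spike', 'Stable', or 'Dip'.

    delta_mmhg: later BP minus earlier BP, in mmHg (assumed sign convention).
    kind: one of 'SBP', 'DBP', 'MBP'.
    Values exactly at +/- threshold are treated as Stable (an assumption).
    """
    threshold = THRESHOLDS_MMHG[kind]
    if delta_mmhg > threshold:
        return "Spike"   # increase above the threshold
    if delta_mmhg < -threshold:
        return "Dip"     # decrease below the negative threshold
    return "Stable"      # change within plus or minus the threshold
```

For example, under these assumptions a 35 mmHg rise in SBP would be labelled Spike, while a 16 mmHg fall in DBP would be labelled Dip.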
-
A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications
Authors:
Aydin Zaboli,
Seong Lok Choi,
Tai-Jin Song,
Junho Hong
Abstract:
Cybersecurity breaches in digital substations can pose significant challenges to the stability and reliability of power system operations. To address these challenges, defense and mitigation techniques are required. Identifying and detecting anomalies in information and communication technology (ICT) is crucial to ensure secure device interactions within digital substations. This paper proposes a task-oriented dialogue (ToD) system for anomaly detection (AD) in datasets of multicast messages, e.g., generic object oriented substation event (GOOSE) and sampled value (SV), in digital substations using large language models (LLMs). This model has a lower potential error and better scalability and adaptability than a process that considers the cybersecurity guidelines recommended by humans, known as the human-in-the-loop (HITL) process. Also, this methodology significantly reduces the effort required when addressing new cyber threats or anomalies compared with machine learning (ML) techniques, since it leaves the model's complexity and precision unaffected and offers a faster implementation. These findings present a comparative assessment, conducted utilizing standard and advanced performance evaluation metrics for the proposed AD framework and the HITL process. To generate and extract datasets of IEC 61850 communications, a hardware-in-the-loop (HIL) testbed was employed.
Submitted 8 June, 2024;
originally announced June 2024.
-
Online Ecological Gearshift Strategy via Neural Network with Soft-Argmax Operator
Authors:
Xi Luo,
Shiying Dong,
Jinlong Hong,
Bingzhao Gao,
Hong Chen
Abstract:
This paper presents a neural network optimizer with soft-argmax operator to achieve an ecological gearshift strategy in real-time. The strategy is reformulated as the mixed-integer model predictive control (MIMPC) problem to minimize energy consumption. Then the outer convexification is introduced to transform integer variables into relaxed binary controls. To approximate binary solutions properly within training, the soft-argmax operator is applied to the neural network with the fact that all the operations of this scheme are differentiable. Moreover, this operator can help push the relaxed binary variables close to 0 or 1. To evaluate the strategy effect, we deployed it to a 2-speed electric vehicle (EV). In contrast to the mature solver Bonmin, our proposed method not only achieves similar energy-saving effects but also significantly reduces the solution time to meet real-time requirements. This results in a notable energy savings of 6.02% compared to the rule-based method.
Submitted 28 February, 2024;
originally announced February 2024.
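The abstract above relies on a soft-argmax operator being differentiable while still pushing relaxed binary variables towards 0 or 1. A minimal sketch of the commonly used form (a softmax-weighted average of indices) is shown below. The abstract does not give the paper's exact formulation, so the sharpness parameter `beta` and the function names are assumptions made here for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], beta: float) -> List[float]:
    """Numerically stable softmax of beta * scores."""
    m = max(scores)
    exps = [math.exp(beta * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_argmax(scores: Sequence[float], beta: float = 10.0) -> float:
    """Softmax-weighted average of the indices 0..n-1.

    A larger beta (a hypothetical sharpness parameter) moves the result
    closer to the hard argmax; every step is a smooth function of scores.
    """
    weights = softmax(scores, beta)
    return sum(i * w for i, w in enumerate(weights))
```

With two candidates (for example, the two gears of a 2-speed EV), the output lies in [0, 1]: equal scores give 0.5, and increasing `beta` drives the value towards whichever index has the larger score.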
-
Reliable long timescale decision-directed channel estimation for OFDM system
Authors:
Xun Wang,
Xin Xie,
Cunqing Hua,
Jianan Hong,
Pengwenlong Gu
Abstract:
Decision-directed channel estimation (DDCE) is one kind of blind channel estimation method that tracks the channel blindly by an iterative algorithm without relying on the pilots, which can increase the utilization of wireless resources. However, one major problem of DDCE is the performance degradation caused by error accumulation during the tracking process. In this paper, we propose a reliable DDCE (RDDCE) scheme for an OFDM-based communication system in the time-varying deep fading environment. By combining the conventional DDCE and discrete Fourier transform (DFT) channel estimation method, the proposed RDDCE scheme selects the reliable estimated channels on the subcarriers which are less affected by deep fading, and then estimates the channel based on the selected subcarriers by an extended DFT channel estimation where the indices of selected subcarriers are not distributed evenly. Simulation results show that RDDCE can alleviate the performance degradation effectively, track the channel with high accuracy on a long time scale, and perform well under time-varying and noisy channel conditions.
Submitted 18 February, 2024;
originally announced February 2024.
-
Capturing Cancer as Music: Cancer Mechanisms Expressed through Musification
Authors:
Rostyslav Hnatyshyn,
Jiayi Hong,
Ross Maciejewski,
Christopher Norby,
Carlo C. Maley
Abstract:
The development of cancer is difficult to express on a simple and intuitive level due to its complexity. Since cancer is so widespread, raising public awareness about its mechanisms can help those affected cope with its realities, as well as inspire others to make lifestyle adjustments and screen for the disease. Unfortunately, studies have shown that cancer literature is too technical for the general public to understand. We found that musification, the process of turning data into music, remains an unexplored avenue for conveying this information. We explore the pedagogical effectiveness of musification through the use of an algorithm that manipulates a piece of music in a manner analogous to the development of cancer. We conducted two lab studies and found that our approach is marginally more effective at promoting cancer literacy when accompanied by a text-based article than text-based articles alone.
Submitted 9 February, 2024;
originally announced February 2024.
-
Deep-learning-driven end-to-end metalens imaging
Authors:
Joonhyuk Seo,
Jaegang Jo,
Joohoon Kim,
Joonho Kang,
Chanik Kang,
Seongwon Moon,
Eunji Lee,
Jehyeong Hong,
Junsuk Rho,
Haejun Chung
Abstract:
Recent advances in metasurface lenses (metalenses) have shown great potential for opening a new era in compact imaging, photography, light detection and ranging (LiDAR), and virtual reality/augmented reality (VR/AR) applications. However, the fundamental trade-off between broadband focusing efficiency and operating bandwidth limits the performance of broadband metalenses, resulting in chromatic aberration, angular aberration, and a relatively low efficiency. In this study, a deep-learning-based image restoration framework is proposed to overcome these limitations and realize end-to-end metalens imaging, thereby achieving aberration-free full-color imaging for mass-produced metalenses with 10-mm diameter. Neural-network-assisted metalens imaging achieved a high resolution comparable to that of the ground truth image.
Submitted 10 May, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
Machine Learning based Post Event Analysis for Cybersecurity of Cyber-Physical System
Authors:
Kuchan Park,
Junho Hong,
Wencong Su,
HyoJong Lee
Abstract:
As Information and Communication Technology (ICT) equipment continues to be integrated into power systems, issues related to cybersecurity are increasingly emerging. Particularly noteworthy is the transition to digital substations, which is shifting operations from traditional hardwired-based systems to communication-based Supervisory Control and Data Acquisition (SCADA) system operations. These changes in the power system have increased the vulnerability of the system to cyber-attacks and emphasized its importance. This paper proposes a machine learning (ML) based post event analysis of the power system in order to respond to these cybersecurity issues. An artificial neural network (ANN) and other ML models are trained using transient fault measurements and cyber-attack data on substations. The trained models can successfully distinguish between power system faults and cyber-attacks. Furthermore, the results of the proposed ML-based methods can also identify 10 different fault types and the location where the event occurred.
Submitted 7 March, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Joint Design of Coding and Modulation for Digital Over-the-Air Computation
Authors:
Xin Xie,
Cunqinq Hua,
Jianan Hong,
Yuejun Wei
Abstract:
Due to its high communication efficiency, over-the-air computation (AirComp) is expected to carry out various computing tasks in next-generation wireless networks. However, most applications of AirComp to date have been explored in the analog domain, which limits the ability of AirComp to withstand complex wireless environments and makes it hard to integrate into existing universal communication standards, most of which are based on digital systems. In this paper, we propose a joint design of channel coding and digital modulation for digital AirComp transmission, aiming to strengthen the foundation for applying AirComp in digital systems. Specifically, we first propose a non-binary LDPC-based channel coding scheme to enhance the error-correction capability of AirComp. Then, a digital modulation scheme is proposed to achieve the number summation from multiple transmitters via the lattice coding technique. We also provide simulation results to demonstrate the feasibility and the performance of the proposed design.
Submitted 12 November, 2023;
originally announced November 2023.
-
ChatGPT and Other Large Language Models for Cybersecurity of Smart Grid Applications
Authors:
Aydin Zaboli,
Seong Lok Choi,
Tai-Jin Song,
Junho Hong
Abstract:
Cybersecurity breaches targeting electrical substations constitute a significant threat to the integrity of the power grid, necessitating comprehensive defense and mitigation strategies. Any anomaly in information and communication technology (ICT) should be detected for secure communications between devices in digital substations. This paper proposes large language models (LLMs), e.g., ChatGPT, for the cybersecurity of IEC 61850-based digital substation communications. Multicast messages such as generic object oriented substation event (GOOSE) and sampled value (SV) are used for case studies. The proposed LLM-based cybersecurity framework includes, for the first time, data pre-processing of communication systems and human-in-the-loop (HITL) training (considering the cybersecurity guidelines recommended by humans). The results show a comparative analysis of detected anomaly data carried out based on the performance evaluation metrics for different LLMs. A hardware-in-the-loop (HIL) testbed is used to generate and extract the dataset of IEC 61850 communications.
Submitted 25 February, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
-
Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model
Authors:
Joanna Hong,
Se Jin Park,
Yong Man Ro
Abstract:
We present a novel approach to multilingual audio-visual speech recognition tasks by introducing a single model on a multilingual dataset. Motivated by a human cognitive system where humans can intuitively distinguish different languages without any conscious effort or guidance, we propose a model that can capture which language is given as an input speech by distinguishing the inherent similarities and differences between languages. To do so, we design a prompt fine-tuning technique into the largely pre-trained audio-visual representation model so that the network can recognize the language class as well as the speech with the corresponding language. Our work contributes to developing robust and efficient multilingual audio-visual speech recognition systems, reducing the need for language-specific models.
Submitted 23 October, 2023;
originally announced October 2023.
-
DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion
Authors:
Se Jin Park,
Joanna Hong,
Minsu Kim,
Yong Man Ro
Abstract:
Speech-driven 3D facial animation has gained significant attention for its ability to create realistic and expressive facial animations in 3D space based on speech. Learning-based methods have shown promising progress in achieving accurate facial motion synchronized with speech. However, the one-to-many nature of speech-to-3D facial synthesis has not been fully explored: while the lips accurately synchronize with the speech content, other facial attributes beyond speech-related motions are variable with respect to the speech. To account for the potential variance in the facial attributes within a single speech, we propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis. DF-3DFace captures the complex one-to-many relationships between speech and 3D face based on diffusion. It concurrently achieves aligned lip motion by exploiting audio-mesh synchronization and masked conditioning. Furthermore, the proposed method jointly models identity and pose in addition to facial motions so that it can generate 3D face animation without requiring a reference identity mesh and produce natural head poses. We contribute a new large-scale 3D facial mesh dataset, 3D-HDTF, to enable the synthesis of variations in identities, poses, and facial motions of 3D face mesh. Extensive experiments demonstrate that our method successfully generates highly variable facial shapes and motions from speech and simultaneously achieves more realistic facial animation than the state-of-the-art methods.
Submitted 23 August, 2023;
originally announced October 2023.
-
Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives
Authors:
Muhammad Kazim,
JunGee Hong,
Min-Gyeom Kim,
Kwang-Ki K. Kim
Abstract:
This paper presents a tutorial overview of path integral (PI) control approaches for stochastic optimal control and trajectory optimization. We concisely summarize the theoretical development of path integral control to compute a solution for stochastic optimal control and provide algorithmic descriptions of the cross-entropy (CE) method, an open-loop controller using the receding horizon scheme known as the model predictive path integral (MPPI), and a parameterized state feedback controller based on the path integral control theory. We discuss policy search methods based on path integral control, efficient and stable sampling strategies, extensions to multi-agent decision-making, and MPPI for the trajectory optimization on manifolds. For tutorial demonstrations, some PI-based controllers are implemented in Python, MATLAB and ROS2/Gazebo simulations for trajectory optimization. The simulation frameworks and source codes are publicly available at https://github.com/INHA-Autonomous-Systems-Laboratory-ASL/An-Overview-on-Recent-Advances-in-Path-Integral-Control.
Submitted 1 December, 2023; v1 submitted 21 September, 2023;
originally announced September 2023.
-
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
Authors:
Jeongsoo Choi,
Joanna Hong,
Yong Man Ro
Abstract:
Recent research has demonstrated impressive results in video-to-speech synthesis, which involves reconstructing speech solely from visual input. However, previous works have struggled to accurately synthesize speech due to a lack of sufficient guidance for the model to infer the correct content with the appropriate sound. To resolve the issue, they have adopted an extra speaker embedding as speaking-style guidance from reference audio information. Nevertheless, it is not always possible to obtain the audio information from the corresponding video input, especially during the inference time. In this paper, we present a novel vision-guided speaker embedding extractor using a self-supervised pre-trained model and a prompt tuning technique. In doing so, the rich speaker embedding information can be produced solely from input visual information, and the extra audio information is not necessary during the inference time. Using the extracted vision-guided speaker embedding representations, we further develop a diffusion-based video-to-speech synthesis model, called DiffV2S, conditioned on those speaker embeddings and the visual representation extracted from the input video. The proposed DiffV2S not only maintains phoneme details contained in the input video frames, but also creates a highly intelligible mel-spectrogram in which the speaker identities of the multiple speakers are all preserved. Our experimental results show that DiffV2S achieves state-of-the-art performance compared to previous video-to-speech synthesis techniques.
Submitted 15 August, 2023;
originally announced August 2023.
-
TranssionADD: A multi-frame reinforcement based sequence tagging model for audio deepfake detection
Authors:
Jie Liu,
Zhiba Su,
Hui Huang,
Caiyan Wan,
Quanxiu Wang,
Jiangli Hong,
Benlai Tang,
Fengjie Zhu
Abstract:
Thanks to recent advancements in end-to-end speech modeling technology, it has become increasingly feasible to imitate and clone a user's voice. This leads to a significant challenge in differentiating between authentic and fabricated audio segments. To address the issue of user voice abuse and misuse, the second Audio Deepfake Detection Challenge (ADD 2023) aims to detect and analyze deepfake speech utterances. Specifically, Track 2, named the Manipulation Region Location (RL), aims to pinpoint the location of manipulated regions in audio, which can be present in both real and generated audio segments. We propose our novel TranssionADD system as a solution to the challenging problem of model robustness and audio segment outliers in the trace competition. Our system provides three unique contributions: 1) we adapt the sequence tagging task for audio deepfake detection; 2) we improve model generalization by various data augmentation techniques; 3) we incorporate a multi-frame detection (MFD) module to overcome the limited representation provided by a single frame and use an isolated-frame penalty (IFP) loss to handle outliers in segments. Our best submission achieved 2nd place in Track 2, demonstrating the effectiveness and robustness of our proposed system.
Submitted 27 June, 2023;
originally announced June 2023.
-
Deep learning based ECG segmentation for delineation of diverse arrhythmias
Authors:
Chankyu Joung,
Mijin Kim,
Taejin Paik,
Seong-Ho Kong,
Seung-Young Oh,
Won Kyeong Jeon,
Jae-hu Jeon,
Joong-Sik Hong,
Wan-Joong Kim,
Woong Kook,
Myung-Jin Cha,
Otto van Koert
Abstract:
Accurate delineation of key waveforms in an ECG is a critical initial step in extracting relevant features to support the diagnosis and treatment of heart conditions. Although deep learning based methods using a segmentation model to locate the P, QRS, and T waves have shown promising results, their ability to handle signals exhibiting arrhythmia remains unclear. This study builds on existing research by introducing a U-Net-like segmentation model for ECG delineation, with a particular focus on diverse arrhythmias. For this purpose, we curate an internal dataset containing waveform boundary annotations for various arrhythmia types to train and validate our model. Our key contributions include identifying segmentation model failures in different arrhythmia types, developing a robust model using a diverse training set, achieving comparable performance on benchmark datasets, and introducing a classification guided strategy to reduce false P wave predictions for specific arrhythmias. This study advances deep learning based ECG delineation in the context of arrhythmias and highlights its challenges.
Submitted 6 September, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
Numerical Investigation of Airborne Infection Risk in an Elevator Cabin under Different Ventilation Designs
Authors:
Ata Nazari,
Changchang Wang,
Ruichen He,
Farzad Taghizadeh-Hesary,
Jiarong Hong
Abstract:
Airborne transmission of SARS-CoV-2 via virus-laden aerosols in enclosed spaces poses a significant concern. Elevators, commonly utilized enclosed spaces in modern tall buildings, present a challenge as the impact of varying heating, ventilation, and air conditioning (HVAC) systems on virus transmission within these cabins remains unclear. In this study, we employ computational modeling to examine aerosol transmission within an elevator cabin outfitted with diverse HVAC systems. Using a transport equation, we model aerosol concentration and assess infection risk distribution across passengers' breathing zones. We calculate particle removal efficiency for each HVAC design and introduce a suppression effect criterion to evaluate the effectiveness of the HVAC systems. Our findings reveal that mixing ventilation, featuring both inlet and outlet at the ceiling, proves most efficient in reducing particle spread, achieving a maximum removal efficiency of 79.40% during the exposure time. Conversely, the stratum ventilation model attains a mere removal efficiency of 3.97%. These results underscore the importance of careful HVAC system selection in mitigating the risk of SARS-CoV-2 transmission within elevator cabins.
Submitted 4 April, 2023;
originally announced April 2023.
-
Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
Authors:
Joanna Hong,
Minsu Kim,
Jeongsoo Choi,
Yong Man Ro
Abstract:
This paper deals with Audio-Visual Speech Recognition (AVSR) under multimodal input corruption, where audio inputs and visual inputs are both corrupted, a situation that is not well addressed in previous research. Previous studies have focused on how to complement corrupted audio inputs with clean visual inputs, assuming that clean visual inputs are available. However, in real life, clean visual inputs are not always accessible and can even be corrupted by occluded lip regions or noise. Thus, we first show that previous AVSR models are in fact not robust to the corruption of multimodal input streams, the audio and the visual inputs, compared to uni-modal models. Then, we design multimodal input corruption modeling to develop robust AVSR models. Lastly, we propose a novel AVSR framework, namely Audio-Visual Reliability Scoring module (AV-RelScore), that is robust to corrupted multimodal inputs. The AV-RelScore can determine which input modal stream is reliable for the prediction and can exploit the more reliable streams in prediction. The effectiveness of the proposed method is evaluated with comprehensive experiments on popular benchmark databases, LRS2 and LRS3. We also show that the reliability scores obtained by AV-RelScore reflect the degree of corruption well and make the proposed model focus on the reliable multimodal representations.
Submitted 20 March, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Securing Safety in Collaborative Cyber-Physical Systems through Fault Criticality Analysis
Authors:
Manzoor Hussain,
Nazakat Ali,
Jang-Eui Hong
Abstract:
Collaborative Cyber-Physical Systems (CCPS) are systems that contain tightly coupled physical and cyber components and massively interconnected subsystems, and that collaborate to achieve a common goal. The safety of a single Cyber-Physical System (CPS) can be achieved by following safety standards such as ISO 26262 and IEC 61508 or by applying hazard analysis techniques. However, due to the complex, highly interconnected, heterogeneous, and collaborative nature of CCPS, a fault in one CPS's components can trigger many other faults in other collaborating CPSs. Therefore, a safety assurance technique based on fault criticality analysis is required to ensure safety in CCPS. This paper presents a Fault Criticality Matrix (FCM) implemented in our tool called CPSTracer, which contains data such as the identified fault, fault criticality, safety guard, etc. The proposed FCM is based on composite hazard analysis and content-based relationships among the hazard analysis artifacts, and ensures that the safety guard controls the identified faults at design time; thus, we can effectively manage and control faults at the design phase to ensure the safe development of CPSs. To validate our approach, we introduce a case study on the Platooning system (a collaborative CPS). We perform the criticality analysis of the Platooning system using FCM in our developed tool. After the detailed fault criticality analysis, we investigate the results to check the appropriateness and effectiveness of the approach with two research questions. Also, by simulating the Platooning system, we show that the collision rate of the system without FCM was considerably higher than the collision rate after analyzing fault criticality using FCM.
Submitted 10 March, 2023;
originally announced March 2023.
-
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
Authors:
Minsu Kim,
Joanna Hong,
Yong Man Ro
Abstract:
Recent studies have shown impressive performance in lip-to-speech synthesis, which aims to reconstruct speech from visual information alone. However, they have struggled to synthesize accurate speech in the wild, due to insufficient supervision for guiding the model to infer the correct content. Distinct from the previous methods, in this paper, we develop a powerful Lip2Speech method that can reconstruct speech with correct content from the input lip movements, even in a wild environment. To this end, we design multi-task learning that guides the model using multimodal supervision, i.e., text and audio, to complement the insufficient word representations of acoustic feature reconstruction loss. Thus, the proposed framework brings the advantage of synthesizing speech containing the right content for multiple speakers with unconstrained sentences. We verify the effectiveness of the proposed method using the LRS2, LRS3, and LRW datasets.
Submitted 17 February, 2023;
originally announced February 2023.
-
In situ Biological Particle Analyzer based on Digital Inline Holography
Authors:
Delaney Sanborn,
Ruichen He,
Lei Feng,
Jiarong Hong
Abstract:
Obtaining in situ measurements of biological microparticles is crucial for both scientific research and numerous industrial applications (e.g., early detection of harmful algal blooms, monitoring yeast during fermentation). However, existing methods are limited in their ability to offer timely diagnostics of these particles with sufficient accuracy and information. Here, we introduce a novel method for real-time, in situ analysis using machine learning assisted digital inline holography (DIH). Our machine learning model uses a customized YOLO v5 architecture specialized for the detection and classification of small biological particles. We demonstrate the effectiveness of our method in the analysis of 10 plankton species with equivalent high accuracy and significantly reduced processing time compared to previous methods. We also applied our method to differentiate yeast cells under four metabolic states and from two strains. Our results show that the proposed method can accurately detect and differentiate cellular and subcellular features related to metabolic states and strains. This study demonstrates the potential of the machine learning driven DIH approach as a sensitive and versatile diagnostic tool for real-time, in situ analysis of both biotic and abiotic particles. This method can be readily deployed in a distributive manner for scientific research and manufacturing on an industrial scale.
Submitted 14 January, 2023;
originally announced January 2023.
-
Single Cell Training on Architecture Search for Image Denoising
Authors:
Bokyeung Lee,
Kyungdeuk Ko,
Jonghwan Hong,
Hanseok Ko
Abstract:
Neural Architecture Search (NAS) for automatically finding the optimal network architecture has shown some success, with competitive performance in various computer vision tasks. However, NAS in general requires a tremendous amount of computation. Thus, reducing computational cost has emerged as an important issue. Most of the attempts so far have been based on manual approaches, and often the architectures developed from such efforts dwell in the balance of the network optimality and the search cost. Additionally, recent NAS methods for image restoration generally do not consider dynamic operations that may transform dimensions of feature maps because of the dimensionality mismatch in tensor calculations. This can greatly limit NAS in its search for an optimal network structure. To address these issues, we re-frame the optimal search problem by focusing on the component block level. Previous work has shown that effective denoising blocks can be connected in series to further improve network performance. By focusing on the block level, the search space of reinforcement learning becomes significantly smaller and the evaluation process can be conducted more rapidly. In addition, we integrate innovative dimension matching modules for dealing with spatial and channel-wise mismatch that may occur in the optimal design search. This allows much flexibility in the optimal network search within the cell block. With these modules, we then employ reinforcement learning to search for an optimal image denoising network at the module level. The computational efficiency of our proposed Denoising Prior Neural Architecture Search (DPNAS) was demonstrated by having it complete an optimal architecture search for an image restoration task in just one day with a single GPU.
Submitted 12 December, 2022;
originally announced December 2022.
-
Cyber-Attack Event Analysis for EV Charging Stations
Authors:
Mansi Girdhar,
Junho Hong,
Yongsik You,
Tai-jin Song,
Manimaran Govindarasu
Abstract:
Safe and secure electric vehicle charging stations (EVCSs) are important in smart transportation infrastructure. The prevalence of EVCSs has rapidly increased over time in response to the rising demand for EV charging. However, developments in information and communication technologies (ICT) have made the cyber-physical system (CPS) of EVCSs susceptible to cyber-attacks, which might destabilize the infrastructure of the electric grid as well as the environment for charging. This study suggests a 5Ws \& 1H-based investigation approach to deal with cyber-attack-related incidents due to the incapacity of the current investigation frameworks to comprehend and handle these mishaps. Also, a stochastic anomaly detection system (ADS) is proposed to identify the anomalies, abnormal activities, and unusual operations of the station entities as a post cyber event analysis.
Submitted 15 November, 2022;
originally announced November 2022.
-
SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
Authors:
Se Jin Park,
Minsu Kim,
Joanna Hong,
Jeongsoo Choi,
Yong Man Ro
Abstract:
The challenge of talking face generation from speech lies in aligning information from two different modalities, audio and video, such that the mouth region corresponds to the input audio. Previous methods either exploit audio-visual representation learning or leverage intermediate structural information such as landmarks and 3D models. However, they struggle to synthesize fine details of the lips varying at the phoneme level, as they do not sufficiently provide visual information of the lips at the video synthesis step. To overcome this limitation, our work proposes Audio-Lip Memory, which brings in visual information of the mouth region corresponding to the input audio and enforces fine-grained audio-visual coherence. It stores lip motion features from sequential ground truth images in the value memory and aligns them with corresponding audio features so that they can be retrieved using audio input at inference time. Therefore, using the retrieved lip motion features as visual hints, it can easily correlate audio with visual dynamics in the synthesis step. By analyzing the memory, we demonstrate that unique lip features are stored in each memory slot at the phoneme level, capturing subtle lip motion based on memory addressing. In addition, we introduce a visual-visual synchronization loss, which can enhance lip-syncing performance when used along with the audio-visual synchronization loss in our model. Extensive experiments are performed to verify that our method generates high-quality video with mouth shapes that best align with the input audio, outperforming previous state-of-the-art methods.
Submitted 2 November, 2022; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Progressively refined deep joint registration segmentation (ProRSeg) of gastrointestinal organs at risk: Application to MRI and cone-beam CT
Authors:
Jue Jiang,
Jun Hong,
Kathryn Tringale,
Marsha Reyngold,
Christopher Crane,
Neelam Tyagi,
Harini Veeraraghavan
Abstract:
Method: ProRSeg was trained using 5-fold cross-validation with 110 T2-weighted MRI acquired at 5 treatment fractions from 10 different patients, taking care that same patient scans were not placed in training and testing folds. Segmentation accuracy was measured using Dice similarity coefficient (DSC) and Hausdorff distance at 95th percentile (HD95). Registration consistency was measured using coefficient of variation (CV) in displacement of OARs. Ablation tests and accuracy comparisons against multiple methods were done. Finally, applicability of ProRSeg to segment cone-beam CT (CBCT) scans was evaluated on 80 scans using 5-fold cross-validation. Results: ProRSeg processed 3D volumes (128 $\times$ 192 $\times$ 128) in 3 secs on an NVIDIA Tesla V100 GPU. Its segmentations were significantly more accurate ($p<0.001$) than compared methods, achieving a DSC of 0.94$\pm$0.02 for liver, 0.88$\pm$0.04 for large bowel, 0.78$\pm$0.03 for small bowel and 0.82$\pm$0.04 for stomach-duodenum from MRI. ProRSeg achieved a DSC of 0.72$\pm$0.01 for small bowel and 0.76$\pm$0.03 for stomach-duodenum from CBCT. ProRSeg registrations resulted in the lowest CV in displacement (stomach-duodenum $CV_{x}$: 0.75\%, $CV_{y}$: 0.73\%, and $CV_{z}$: 0.81\%; small bowel $CV_{x}$: 0.80\%, $CV_{y}$: 0.80\%, and $CV_{z}$: 0.68\%; large bowel $CV_{x}$: 0.71\%, $CV_{y}$: 0.81\%, and $CV_{z}$: 0.75\%). ProRSeg based dose accumulation accounting for intra-fraction (pre-treatment to post-treatment MRI scan) and inter-fraction motion showed that the organ dose constraints were violated in 4 patients for stomach-duodenum and in 3 patients for small bowel. Study limitations include lack of independent testing and ground truth phantom datasets to measure dose accumulation accuracy.
Submitted 25 October, 2022;
originally announced October 2022.
-
Machine Learning-Enabled Cyber Attack Prediction and Mitigation for EV Charging Stations
Authors:
Mansi Girdhar,
Junho Hong,
Yongsik Yoo,
Tai-Jin Song
Abstract:
Safe and reliable electric vehicle charging stations (EVCSs) have become imperative in an intelligent transportation infrastructure. Over the years, there has been a rapid increase in the deployment of EVCSs to address surging charging demands. However, advances in information and communication technologies (ICT) have rendered this cyber-physical system (CPS) vulnerable to cyber threats, which can destabilize the charging ecosystem and even the entire electric grid infrastructure. This paper develops an advanced cybersecurity framework, where STRIDE threat modeling is used to identify potential vulnerabilities in an EVCS. Further, the weighted attack defense tree approach is employed to create multiple attack scenarios, followed by developing Hidden Markov Model (HMM) and Partially Observable Monte-Carlo Planning (POMCP) algorithms for modeling the security attacks. Also, potential mitigation strategies are suggested for the identified threats.
Submitted 22 August, 2022;
originally announced August 2022.
-
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Authors:
Joanna Hong,
Minsu Kim,
Daehun Yoo,
Yong Man Ro
Abstract:
This paper focuses on designing a noise-robust end-to-end Audio-Visual Speech Recognition (AVSR) system. To this end, we propose a Visual Context-driven Audio Feature Enhancement module (V-CAFE) to enhance the input noisy audio speech with the help of audio-visual correspondence. The proposed V-CAFE is designed to capture the transition of lip movements, namely the visual context, and to generate a noise reduction mask by considering the obtained visual context. Through context-dependent modeling, the ambiguity in viseme-to-phoneme mapping can be refined for mask generation. The noisy representations are masked out with the noise reduction mask, resulting in enhanced audio features. The enhanced audio features are fused with the visual features and taken to an encoder-decoder model composed of Conformer and Transformer for speech recognition. We show that the proposed end-to-end AVSR with the V-CAFE can further improve the noise-robustness of AVSR. The effectiveness of the proposed method is evaluated in noisy speech recognition and overlapped speech recognition experiments using the two largest audio-visual datasets, LRS2 and LRS3.
Submitted 13 July, 2022;
originally announced July 2022.
-
ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes
Authors:
Kenneth Ooi,
Zhen-Ting Ong,
Karn N. Watcharasupat,
Bhan Lam,
Joo Young Hong,
Woon-Seng Gan
Abstract:
Choosing optimal maskers for existing soundscapes to effect a desired perceptual change via soundscape augmentation is non-trivial due to extensive varieties of maskers and a dearth of benchmark datasets with which to compare and develop soundscape augmentation models. To address this problem, we make publicly available the ARAUS (Affective Responses to Augmented Urban Soundscapes) dataset, which comprises a five-fold cross-validation set and independent test set totaling 25,440 unique subjective perceptual responses to augmented soundscapes presented as audio-visual stimuli. Each augmented soundscape is made by digitally adding "maskers" (bird, water, wind, traffic, construction, or silence) to urban soundscape recordings at fixed soundscape-to-masker ratios. Responses were then collected by asking participants to rate how pleasant, annoying, eventful, uneventful, vibrant, monotonous, chaotic, calm, and appropriate each augmented soundscape was, in accordance with ISO 12913-2:2018. Participants also provided relevant demographic information and completed standard psychological questionnaires. We perform exploratory and statistical analysis of the responses obtained to verify internal consistency and agreement with known results in the literature. Finally, we demonstrate the benchmarking capability of the dataset by training and comparing four baseline models for urban soundscape pleasantness: a low-parameter regression model, a high-parameter convolutional neural network, and two attention-based networks in the literature.
Submitted 2 July, 2024; v1 submitted 3 July, 2022;
originally announced July 2022.
-
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
Authors:
Joanna Hong,
Minsu Kim,
Yong Man Ro
Abstract:
The goal of this work is to reconstruct speech from a silent talking face video. Recent studies have shown impressive performance on synthesizing speech from silent talking face videos. However, they have not explicitly considered the varying identity characteristics of different speakers, which pose a challenge in video-to-speech synthesis, and this becomes more critical in unseen-speaker settings. Our approach is to separate the speech content and the visage-style from a given silent talking face video. By guiding the model to independently focus on modeling the two representations, we can obtain speech of high intelligibility from the model even when the input video of an unseen subject is given. To this end, we introduce speech-visage selection that separates the speech content and the speaker identity from the visual features of the input video. The disentangled representations are jointly incorporated to synthesize speech through a visage-style based synthesizer, which generates speech by coating the visage-styles while maintaining the speech content. Thus, the proposed framework brings the advantage of synthesizing speech containing the right content even with the silent talking face video of an unseen subject. We validate the effectiveness of the proposed framework on the GRID, TCD-TIMIT volunteer, and LRW datasets.
Submitted 20 July, 2022; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Singapore Soundscape Site Selection Survey (S5): Identification of Characteristic Soundscapes of Singapore via Weighted k-means Clustering
Authors:
Kenneth Ooi,
Bhan Lam,
Joo Young Hong,
Karn N. Watcharasupat,
Zhen-Ting Ong,
Woon-Seng Gan
Abstract:
The ecological validity of soundscape studies usually rests on a choice of soundscapes that are representative of the perceptual space under investigation. For example, a soundscape pleasantness study might investigate locations with soundscapes ranging from "pleasant" to "annoying". The choice of soundscapes is typically researcher-led, but a participant-led process can reduce selection bias and improve result reliability. Hence, we propose a robust participant-led method to pinpoint characteristic soundscapes possessing arbitrary perceptual attributes. We validate our method by identifying Singaporean soundscapes spanning the perceptual quadrants generated from the "Pleasantness" and "Eventfulness" axes of the ISO 12913-2 circumplex model of soundscape perception, as perceived by local experts. From memory and experience, 67 participants first selected locations corresponding to each perceptual quadrant in each major planning region of Singapore. We then performed weighted k-means clustering on the selected locations, with weights for each location derived from previous frequencies and durations spent in each location by each participant. Weights hence acted as proxies for participant confidence. In total, 62 locations were thereby identified as suitable locations with characteristic soundscapes for further research utilizing the ISO 12913-2 perceptual quadrants. Audio-visual recordings and acoustic characterization of the soundscapes will be made in a future study.
Submitted 7 June, 2022;
originally announced June 2022.
-
Lip to Speech Synthesis with Visual Context Attentional GAN
Authors:
Minsu Kim,
Joanna Hong,
Yong Man Ro
Abstract:
In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis. Specifically, the proposed VCA-GAN synthesizes speech from local lip visual features by finding a viseme-to-phoneme mapping function, while global visual context is embedded into the intermediate layers of the generator to clarify the ambiguity in the mapping induced by homophenes. To achieve this, a visual context attention module is proposed that encodes global representations from the local visual features and provides the desired global visual context corresponding to the given coarse speech representation to the generator through audio-visual attention. In addition to the explicit modelling of local and global visual representations, synchronization learning is introduced as a form of contrastive learning that guides the generator to synthesize speech in sync with the given input lip movements. Extensive experiments demonstrate that the proposed VCA-GAN outperforms the existing state of the art and is able to effectively synthesize speech from multiple speakers, a setting that has barely been handled in previous works.
Submitted 4 April, 2022;
originally announced April 2022.
-
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Authors:
Minsu Kim,
Joanna Hong,
Se Jin Park,
Yong Man Ro
Abstract:
In this paper, we introduce a novel audio-visual multi-modal bridging framework that can utilize both audio and visual information, even with uni-modal inputs. We exploit a memory network that stores source (i.e., visual) and target (i.e., audio) modal representations, where the source modal representations are what we are given, and the target modal representations are what we want to obtain from the memory network. We then construct an associative bridge between source and target memories that considers the interrelationship between the two memories. By learning the interrelationship through the associative bridge, the proposed bridging framework is able to obtain the target modal representations inside the memory network, even with the source modal input only, and it provides rich information for its downstream tasks. We apply the proposed framework to two tasks: lip reading and speech reconstruction from silent video. Through the proposed associative bridge and modality-specific memories, the knowledge for each task is enriched with the recalled audio context, achieving state-of-the-art performance. We also verify that the associative bridge properly relates the source and target memories.
Submitted 4 April, 2022;
originally announced April 2022.
-
Eco-Coasting Strategies Using Road Grade Preview: Evaluation and Online Implementation Based on Mixed Integer Model Predictive Control
Authors:
Yongjun Yan,
Nan Li,
Jinlong Hong,
Bingzhao Gao,
Hong Chen,
Jing Sun,
Ziyou Song
Abstract:
Coasting is widely recommended in eco-driving guidelines as a way to reduce fuel consumption by exploiting the vehicle's kinetic energy. However, a comprehensive comparison between different coasting strategies, and the online performance of an eco-coasting strategy using road grade preview, remain unclear because of the oversimplification and the integer variables in the optimal control problems. Herein, two different coasting strategies (fuel cut-off and engine start/stop) are proposed to reveal the potential benefit of eco-coasting using road grade preview. Engine drag torque and the energy cost of engine restart are considered in the modeling to give a fair evaluation of the offline and online performance. The offline performance of these two coasting methods is evaluated through dynamic programming (DP) under various driving scenarios with different slope profiles. Offline simulation shows that the engine start/stop method outperforms the fuel cut-off method in terms of fuel consumption and travel time by eliminating the engine drag torque. Then, the online performance of these two coasting methods is evaluated using Mixed Integer Model Predictive Control (MIMPC). A novel operational constraint on the minimum off steps is added to the MIMPC formulation to avoid frequent switching of the integer variables that represent the fuel cut-off and engine start/stop mechanisms. Simulation results show that, for both fuel cut-off and engine start/stop coasting methods, the MPC controller reduces fuel consumption to a level comparable to DP without sacrificing travel time.
Submitted 25 December, 2021; v1 submitted 14 November, 2021;
originally announced November 2021.
-
2020 CATARACTS Semantic Segmentation Challenge
Authors:
Imanol Luengo,
Maria Grammatikopoulou,
Rahim Mohammadi,
Chris Walsh,
Chinedu Innocent Nwoye,
Deepak Alapatt,
Nicolas Padoy,
Zhen-Liang Ni,
Chen-Chen Fan,
Gui-Bin Bian,
Zeng-Guang Hou,
Heonjin Ha,
Jiacheng Wang,
Haojie Wang,
Dong Guo,
Lu Wang,
Guotai Wang,
Mobarakol Islam,
Bharat Giddwani,
Ren Hongliang,
Theodoros Pissas,
Claudio Ravasio,
Martin Huber,
Jeremy Birch,
Joan M. Nunez Do Rio
, et al. (15 additional authors not shown)
Abstract:
Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.
Submitted 24 February, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Data-driven yaw misalignment correction for utility-scale wind turbines
Authors:
Linyue Gao,
Jiarong Hong
Abstract:
In recent years, wind turbine yaw misalignment, which tends to degrade turbine power production and affect blade fatigue loads, has attracted growing attention along with the rapid development of large-scale wind turbines. State-of-the-art correction methods require additional instruments such as LiDAR to provide the ground truth and, owing to their high cost, are not suitable for long-term operation and large-scale implementation. In the present study, we propose a framework that enables the effective and efficient detection and correction of static and dynamic yaw errors using only turbine SCADA data, suitable for low-cost regular inspection of large-scale wind farms at onshore, coastal, and offshore sites. This framework includes a short-period data collection of the turbine operating under multiple static yaw errors, a data mining correction for the static yaw error, and ultra-short-term dynamic yaw error forecasts with machine learning algorithms. Three regression algorithms, i.e., linear, support vector machine, and random forest, and a hybrid model based on the average prediction of the three, have been tested for dynamic yaw error prediction and compared using field measurement data from a 2.5 MW turbine. For the data collected in the present study, the hybrid method shows the best performance and can reduce total yaw error by up to 85% (71% on average) compared to the cases without static and dynamic yaw error corrections. In addition, we have tested the transferability of the proposed method to the detection of other static and dynamic yaw errors.
Submitted 18 September, 2021;
originally announced September 2021.
-
Unsupervised domain adaptation for cross-modality liver segmentation via joint adversarial learning and self-learning
Authors:
Jin Hong,
Simon Chun-Ho Yu,
Weitian Chen
Abstract:
Liver segmentation on images acquired using computed tomography (CT) and magnetic resonance imaging (MRI) plays an important role in clinical management of liver diseases. Compared to MRI, CT images of the liver are more abundant and readily available. However, MRI can provide richer quantitative information of the liver compared to CT. Thus, it is desirable to achieve unsupervised domain adaptation for transferring the learned knowledge from the source domain containing labeled CT images to the target domain containing unlabeled MR images. In this work, we report a novel unsupervised domain adaptation framework for cross-modality liver segmentation via joint adversarial learning and self-learning. We propose joint semantic-aware and shape-entropy-aware adversarial learning in a post-situ identification manner to implicitly align the distribution of task-related features extracted from the target domain with those from the source domain. In the proposed framework, a network is trained with the above two adversarial losses in an unsupervised manner, and then a mean completer of pseudo-label generation is employed to produce pseudo-labels to train the next network (desired model). Additionally, semantic-aware adversarial learning and two self-learning methods, including pixel-adaptive mask refinement and student-to-partner learning, are proposed to train the desired model. To improve the robustness of the desired model, a low-signal augmentation function is proposed to transform MRI images as the input of the desired model to handle hard samples. Using public data sets, our experiments demonstrated that the proposed unsupervised domain adaptation framework reached four supervised learning methods with a Dice score of 0.912 ± 0.037 (mean ± standard deviation).
Submitted 24 February, 2022; v1 submitted 12 September, 2021;
originally announced September 2021.
-
Blind Image Decomposition
Authors:
Junlin Han,
Weihao Li,
Pengfei Fang,
Chunyi Sun,
Jie Hong,
Mohammad Ali Armin,
Lars Petersson,
Hongdong Li
Abstract:
We propose and study a novel task named Blind Image Decomposition (BID), which requires separating a superimposed image into constituent underlying images in a blind setting, that is, both the source components involved in mixing as well as the mixing mechanism are unknown. For example, rain may consist of multiple components, such as rain streaks, raindrops, snow, and haze. Rainy images can be treated as an arbitrary combination of these components, some of them or all of them. How to decompose superimposed images, like rainy images, into distinct source components is a crucial step toward real-world vision systems. To facilitate research on this new task, we construct multiple benchmark datasets, including mixed image decomposition across multiple domains, real-scenario deraining, and joint shadow/reflection/watermark removal. Moreover, we propose a simple yet general Blind Image Decomposition Network (BIDeN) to serve as a strong baseline for future work. Experimental results demonstrate the tenability of our benchmarks and the effectiveness of BIDeN.
Submitted 18 July, 2022; v1 submitted 25 August, 2021;
originally announced August 2021.
-
Power Management of Nanogrid Cluster with P2P Electricity Trading Based on Future Trends of Load Demand and PV Power Production
Authors:
Sangkeum Lee,
Hojun Jin,
Luiz Felipe Vecchietti,
Junhee Hong,
Ki-Bum Park,
Dongsoo Har
Abstract:
This paper presents the power management of nanogrid clusters assisted by a novel peer-to-peer (P2P) electricity trading method. In our work, the imbalance of power consumption among clusters is mitigated by the proposed P2P trading method. For power management of individual clusters, we attempt a multi-objective optimization that simultaneously minimizes total power consumption, the portion of grid power consumption, and the total delay incurred by scheduling. A photovoltaic (PV) system, a renewable power source, is adopted for each cluster as a secondary source. The temporal surplus of self-supplied PV power of a cluster can be sold through P2P trading to other cluster(s) experiencing a temporal power shortage. The cluster in temporal shortage of electric power buys the PV power to reduce peak load and total delay. In P2P trading, a cooperative game model is used for buyers and sellers to maximize their welfare. To increase P2P trading efficiency, future trends of load demand and PV power production are considered for power management of each cluster to resolve the instantaneous imbalance between load demand and PV power production. To this end, a gated recurrent unit network is used to forecast future load demand and future PV power production. Simulations verify the effectiveness of the proposed P2P trading for nanogrid clusters.
Submitted 2 December, 2020; v1 submitted 2 September, 2020;
originally announced September 2020.
-
Machine learning shadowgraph for particle size and shape characterization
Authors:
Jiaqi Li,
Siyao Shao,
Jiarong Hong
Abstract:
Conventional image processing for particle shadow images is usually time-consuming and suffers from degraded image segmentation when dealing with images of complex-shaped and clustered particles with varying backgrounds. In this paper, we introduce a robust learning-based method using a single convolutional neural network (CNN) for analyzing particle shadow images. Our approach employs a two-channel-output U-net model to generate a binary particle image and a particle centroid image. The binary particle image is subsequently segmented through a marker-controlled watershed approach with the particle centroid image as the marker image. The assessment of this method on both synthetic and experimental bubble images has shown better performance compared to the state-of-the-art non-machine-learning method. The proposed machine learning shadow image processing approach provides a promising tool for real-time particle image analysis.
Submitted 31 March, 2020;
originally announced March 2020.
-
Experimental Demonstration of Location-aware Beam Alignment
Authors:
Junyeol Hong,
Hyeonjin Chung,
Sunwoo Kim
Abstract:
The main goal of beam alignment is to quickly find the optimal beam, i.e., the one that yields the largest received signal strength (RSS). In this paper, we demonstrate an efficient beam alignment scheme on our testbed. The algorithm we test uses location information for computationally efficient beam alignment. The testbed transmits and receives a 13.8 GHz signal and steers a beam at both the transmitter and the receiver using various radio frequency (RF) components. The location information is estimated with an indoor positioning module. The experiment shows that the location-aware algorithm significantly reduces the time required for beam alignment compared with exhaustive search.
Submitted 6 March, 2020;
originally announced March 2020.
-
Machine learning holography for measuring 3D particle size distribution
Authors:
Siyao Shao,
Kevin Mallery,
Jiarong Hong
Abstract:
Particle size measurement based on digital holography with conventional algorithms is usually time-consuming and susceptible to noise associated with hologram quality and particle complexity, limiting its usage in a broad range of engineering applications and fundamental research. We propose a learning-based hologram processing method to cope with the aforementioned issues. The proposed approach uses a modified U-net architecture with three input channels and two output channels, and specially designed loss functions. The proposed method has been assessed using synthetic, manually labeled experimental, and water tunnel bubbly flow data containing particles of different shapes. The results demonstrate that our approach can achieve better performance in comparison to the state-of-the-art non-machine-learning methods in terms of particle extraction rate and positioning accuracy with significantly improved processing speed. Our learning-based approach can be extended to other types of image-based particle size measurements.
Submitted 30 December, 2019;
originally announced December 2019.
-
Mcity Data Collection for Automated Vehicles Study
Authors:
Yiqun Dong,
Yuanxin Zhong,
Wenbo Yu,
Minghan Zhu,
Pingping Lu,
Yeyang Fang,
Jiajun Hong,
Huei Peng
Abstract:
The main goal of this paper is to introduce the data collection effort at Mcity targeting automated vehicle development. We captured a comprehensive set of data from a set of perception sensors (Lidars, Radars, Cameras) as well as vehicle steering/brake/throttle inputs and an RTK unit. Two in-cabin cameras record the human driver's behaviors for possible future use. Naturalistic driving on selected open roads is recorded at different times of day and under different weather conditions. We also perform designed choreography data collection inside the Mcity test facility, focusing on vehicle-to-vehicle and vehicle-to-vulnerable-road-user interactions, which are quite unique among existing open-source datasets. The vehicle platform, data content, tags/labels, and selected analysis results are shown in this paper.
Submitted 12 December, 2019;
originally announced December 2019.
-
W-Net: Two-stage U-Net with misaligned data for raw-to-RGB mapping
Authors:
Kwang-Hyun Uhm,
Seung-Wook Kim,
Seo-Won Ji,
Sung-Jin Cho,
Jun-Pyo Hong,
Sung-Jea Ko
Abstract:
Recent research on learning a mapping between raw Bayer images and RGB images has progressed with the development of deep convolutional neural networks. A challenging data set namely the Zurich Raw-to-RGB data set (ZRR) has been released in the AIM 2019 raw-to-RGB mapping challenge. In ZRR, input raw and target RGB images are captured by two different cameras and thus not perfectly aligned. Moreover, camera metadata such as white balance gains and color correction matrix are not provided, which makes the challenge more difficult. In this paper, we explore an effective network structure and a loss function to address these issues. We exploit a two-stage U-Net architecture and also introduce a loss function that is less variant to alignment and more sensitive to color differences. In addition, we show an ensemble of networks trained with different loss functions can bring a significant performance gain. We demonstrate the superiority of our method by achieving the highest score in terms of both the peak signal-to-noise ratio and the structural similarity and obtaining the second-best mean-opinion-score in the challenge.
Submitted 21 November, 2019; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Machine Learning Holography for 3D Particle Field Imaging
Authors:
Siyao Shao,
Kevin Mallery,
Santosh Kumar,
Jiarong Hong
Abstract:
We propose a new learning-based approach for 3D particle field imaging using holography. Our approach uses a U-net architecture incorporating residual connections, Swish activation, hologram preprocessing, and transfer learning to cope with challenges arising in particle holograms where accurate measurement of individual particles is crucial. Assessments on both synthetic and experimental holograms demonstrate a significant improvement in particle extraction rate, localization accuracy and speed compared to prior methods over a wide range of particle concentrations, including highly-dense concentrations where other methods are unsuitable. Our approach can be potentially extended to other types of computational imaging tasks with similar features.
Submitted 2 November, 2019;
originally announced November 2019.
-
Laser scanning reflection-matrix microscopy for label-free in vivo imaging of a mouse brain through an intact skull
Authors:
Seokchan Yoon,
Hojun Lee,
Jin Hee Hong,
Yong-Sik Lim,
Wonshik Choi
Abstract:
We present a laser scanning reflection-matrix microscopy combining the scanning of laser focus and the wide-field mapping of the electric field of the backscattered waves for eliminating higher-order aberrations even in the presence of strong multiple light scattering noise. Unlike conventional confocal laser scanning microscopy, we record the amplitude and phase maps of reflected waves from the sample not only at the confocal pinhole, but also at other non-confocal points. These additional measurements lead us to constructing a time-resolved reflection matrix, with which the sample-induced aberrations for the illumination and detection pathways are separately identified and corrected. We realized in vivo reflectance imaging of myelinated axons through an intact skull of a living mouse with the spatial resolution close to the ideal diffraction limit. Furthermore, we demonstrated near-diffraction-limited multiphoton imaging through an intact skull by physically correcting the aberrations identified from the reflection matrix. The proposed method is expected to extend the range of applications, where the knowledge of the detailed microscopic information deep within biological tissues is critical.
Submitted 8 October, 2019;
originally announced October 2019.
-
Regularized Inverse Holographic Volume Reconstruction for 3D Particle Tracking
Authors:
Kevin Mallery,
Jiarong Hong
Abstract:
The key limitations of digital inline holography (DIH) for particle tracking applications are poor longitudinal resolution, particle concentration limits, and case-specific processing. We utilize an inverse problem method with fused lasso regularization to perform full volumetric reconstructions of particle fields. By exploiting data sparsity in the solution and utilizing GPU processing, we dramatically reduce the computational cost usually associated with inverse reconstruction approaches. We demonstrate the accuracy of the proposed method using synthetic and experimental holograms. Finally, we present two practical applications (high concentration microorganism swimming and microfiber rotation) to extend the capabilities of DIH beyond what was possible using prior methods.
Submitted 9 April, 2019;
originally announced April 2019.
-
Substation One-Line Diagram Automatic Generation and Visualization
Authors:
Jing Hong,
Yue Li,
Yiran Xu,
Chen Yuan,
Hong Fan,
Guangyi Liu,
Renchang Dai
Abstract:
In Energy Management System (EMS) applications and many other off-line planning and study tools, the one-line diagram (OLND) of the whole system and its stations is a straightforward view for planners and operators to design, monitor, analyze, and control the power system. Large-scale power system OLNDs are usually developed and maintained manually. The work is tedious, time-consuming, and error-prone. Meanwhile, manually created diagrams are hard to share between on-line and off-line systems. To save the time and effort needed to draw and maintain OLNDs, and to make it possible to share them, a tool that automatically develops substation OLNDs based upon the Common Information Model (CIM) standard is needed. Currently, there is no standard rule for drawing the substation OLND. Besides, substation layouts can deviate from the typical formats in textbooks owing to factors such as economy, efficiency, and engineering practice. This paper presents a tool for automatic generation and visualization of substation OLNDs. The tool takes the substation CIM/E model as input, then automatically computes the coordinates of all components and generates the substation OLND based on its component attributes and connectivity relations. The proposed approach is evaluated using a real provincial power system. Over 95% of substation OLNDs are presented adequately; the rest are corner cases that need extra effort for specific reconfiguration.
Submitted 20 March, 2019;
originally announced March 2019.
-
A Multi-State Diagnosis and Prognosis Framework with Feature Learning for Tool Condition Monitoring
Authors:
Chong Zhang,
Geok Soon Hong,
Jun-Hong Zhou,
Kay Chen Tan,
Haizhou Li,
Huan Xu,
Jihoon Hong,
Hian-Leng Chan
Abstract:
In this paper, a multi-state diagnosis and prognosis (MDP) framework is proposed for tool condition monitoring via a deep belief network based multi-state approach (DBNMS). For fault diagnosis, a cost-sensitive deep belief network (namely ECS-DBN) is applied to deal with the imbalanced data problem for tool state estimation. An appropriate prognostic degradation model is then applied for tool wear estimation based on the different tool states. The proposed framework has the advantage of automatic feature representation learning and shows better performance in accuracy and robustness. The effectiveness of the proposed DBNMS is validated using a real-world dataset obtained from the gun drilling process. This dataset contains a large amount of measured signals involving different tool geometries under various operating conditions. The DBNMS is examined for both the tool state estimation and tool wear estimation tasks. In the experimental studies, the prediction results are evaluated and compared with popular machine learning approaches, which show the superior performance of the proposed DBNMS approach.
Submitted 30 April, 2018;
originally announced May 2018.