
arXiv:2407.05744 [pdf, other]

Automating Urban Soundscape Enhancements with AI: In-situ Assessment of Quality and Restorativeness in Traffic-Exposed Residential Areas

Authors: Bhan Lam, Zhen-Ting Ong, Kenneth Ooi, Wen-Hui Ong, Trevor Wong, Karn N. Watcharasupat, Vanessa Boey, Irene Lee, Joo Young Hong, Jian Kang, Kar Fye Alvin Lee, Georgios Christopoulos, Woon-Seng Gan

Abstract: Formalized in ISO 12913, the "soundscape" approach is a paradigmatic shift towards perception-based urban sound management, aiming to alleviate the substantial socioeconomic costs of noise pollution to advance the United Nations Sustainable Development Goals. Focusing on traffic-exposed outdoor residential sites, we implemented an automatic masker selection system (AMSS) utilizing natural sounds to mask (or augment) traffic soundscapes. We employed a pre-trained AI model to automatically select the optimal masker and adjust its playback level, adapting to changes over time in the ambient environment to maximize "Pleasantness", a perceptual dimension of soundscape quality in ISO 12913. Our validation study involving $N=68$ residents revealed a significant 14.6% enhancement in "Pleasantness" after the intervention, correlating with increased restorativeness and positive affect. Perceptual enhancements at the traffic-exposed site matched those at a quieter control site with a 6 dB(A) lower $L_\text{A,eq}$ and lower road traffic noise dominance, affirming the efficacy of AMSS as a soundscape intervention, while streamlining the labour-intensive assessment of "Pleasantness" with probabilistic AI prediction.
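The selection step described above (pick a masker and a playback level that maximize predicted "Pleasantness") can be sketched as an exhaustive search over candidate maskers and gains. This is a minimal illustration, not the AMSS implementation: `select_masker`, `toy_predictor`, the masker names, and the gain grid are all hypothetical, and the real system uses a pre-trained model that also conditions on the ambient audio, which is omitted here.

```python
from typing import Callable, Optional, Sequence, Tuple


def select_masker(
    predict_pleasantness: Callable[[str, float], float],
    maskers: Sequence[str],
    gains_db: Sequence[float],
) -> Tuple[str, float, float]:
    """Return (masker, gain in dB, predicted score) with the highest predicted Pleasantness."""
    best: Optional[Tuple[str, float, float]] = None
    for masker in maskers:
        for gain in gains_db:
            score = predict_pleasantness(masker, gain)
            if best is None or score > best[2]:
                best = (masker, gain, score)
    if best is None:
        raise ValueError("need at least one masker and one gain")
    return best


def toy_predictor(masker: str, gain_db: float) -> float:
    """Hypothetical stand-in for the pre-trained model: prefers 'birdsong' near +3 dB."""
    base = {"birdsong": 0.3, "water": 0.2}.get(masker, 0.0)
    return base - 0.01 * (gain_db - 3.0) ** 2
```

In a deployment that adapts over time, a search like this would be re-run whenever a new ambient recording arrives, so the chosen masker and level can follow changes in traffic noise.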

Submitted 8 July, 2024; originally announced July 2024.

Comments: 41 pages, 4 figures. Preprint submitted to an Elsevier journal

arXiv:2407.04239 [pdf, other]

Enabling Multicast Transmission for Spatio-Temporally Asynchronous User Requests in Wireless Environments

Authors: Hojung Lee, Jun-Pyo Hong, Wan Choi

Abstract: The surge in wireless devices and data traffic volume necessitates more efficient transmission methods. Multicasting has garnered consistent attention as a means to meet this demand. Nevertheless, leveraging multicast wireless networks for spatio-temporally asynchronous data requests poses challenges. In this context, this paper introduces a new multicast mechanism called set-up based merged multicast (SMMC) to minimize the delivery time of the requested file in wireless networks by considering the uncertainties inherent in wireless channels. The proposed mechanism comprises two phases. The first phase involves gathering asynchronous requests for a file from users experiencing diverse channel conditions. During this phase, packets of the requested file are transmitted individually in unicast mode within a specified set-up time. Following this, the second phase initiates multicast transmission, which sequentially handles the remaining packets of the file in multicast mode. In the proposed mechanism, we optimize the set-up time and the transmission rates of both unicast and multicast modes to minimize the expected file delivery time by jointly taking into account the statistical characteristics of wireless channels, users' locations, and file popularity. Additionally, we investigate a fine-tuned SMMC that utilizes posterior information on the multicast group size to further improve performance. Our performance evaluations reveal that the proposed SMMC outperforms conventional unicast methods, especially with high-demand data.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03274 [pdf, other]

Using Photoplethysmography to Detect Real-time Blood Pressure Changes with a Calibration-free Deep Learning Model

Authors: Jingyuan Hong, Manasi Nandi, Weiwei Jin, Jordi Alastruey

Abstract: Blood pressure (BP) changes are linked to individual health status in both clinical and non-clinical settings. This study developed a deep learning model to classify systolic (SBP), diastolic (DBP), and mean (MBP) BP changes using photoplethysmography (PPG) waveforms. Data from the Vital Signs Database (VitalDB) comprising 1,005 ICU patients with synchronized PPG and BP recordings were used. BP changes were categorized into three labels: Spike (increase exceeding a threshold), Stable (change within plus or minus the threshold), and Dip (decrease exceeding the threshold). Four time-series classification models were studied: multi-layer perceptron, convolutional neural network, residual network, and Encoder. A subset of 500 patients was randomly selected for training and validation, ensuring a uniform distribution across BP change labels. Two test datasets were compiled: Test-I (n=500) with a uniform distribution selection process, and Test-II (n=5) without. The study also explored the impact of including second-derivative PPG (sdPPG) waveforms as additional input information. The Encoder model with a Softmax weighting process using both PPG and sdPPG waveforms achieved the highest detection accuracy, exceeding 71.3% and 85.4% in Test-I and Test-II, respectively, with thresholds of 30 mmHg for SBP, 15 mmHg for DBP, and 20 mmHg for MBP. Corresponding F1-scores were over 71.8% and 88.5%. These findings confirm that PPG waveforms are effective for real-time monitoring of BP changes in ICU settings and suggest potential for broader applications.
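The three-way labeling and the sdPPG input described above can be sketched as follows, using the thresholds quoted in the abstract. This is an illustrative sketch under assumptions: how the BP change `delta_mmhg` is computed (for example, between consecutive windows) is not stated in the abstract, boundary values are assumed to count as Stable, and the central second difference is just one simple way to obtain a second-derivative waveform, not necessarily the authors' preprocessing.

```python
from typing import List

# Thresholds reported in the abstract, in mmHg
THRESHOLDS_MMHG = {"SBP": 30.0, "DBP": 15.0, "MBP": 20.0}


def label_bp_change(delta_mmhg: float, threshold_mmhg: float) -> str:
    """Map a BP change to Spike / Stable / Dip (boundary values count as Stable)."""
    if delta_mmhg > threshold_mmhg:
        return "Spike"
    if delta_mmhg < -threshold_mmhg:
        return "Dip"
    return "Stable"


def second_derivative(signal: List[float], fs_hz: float) -> List[float]:
    """Central second difference of a uniformly sampled signal (sdPPG-like waveform).

    The output has len(signal) - 2 samples, aligned with signal[1:-1].
    """
    dt = 1.0 / fs_hz
    return [
        (signal[i + 1] - 2 * signal[i] + signal[i - 1]) / dt ** 2
        for i in range(1, len(signal) - 1)
    ]
```

For example, a 35 mmHg rise in SBP is labeled Spike, while a 20 mmHg change in MBP sits exactly on the threshold and is labeled Stable under the boundary assumption above.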

Submitted 3 July, 2024; originally announced July 2024.

Comments: 8 pages, 5 figures, 7 tables, 1 supplementary material

arXiv:2406.05472 [pdf, other]

A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications

Authors: Aydin Zaboli, Seong Lok Choi, Tai-Jin Song, Junho Hong

Abstract: Cybersecurity breaches in digital substations can pose significant challenges to the stability and reliability of power system operations. To address these challenges, defense and mitigation techniques are required. Identifying and detecting anomalies in information and communication technology (ICT) is crucial to ensure secure device interactions within digital substations. This paper proposes a task-oriented dialogue (ToD) system for anomaly detection (AD) in datasets of multicast messages, e.g., generic object-oriented substation event (GOOSE) and sampled value (SV) messages, in digital substations using large language models (LLMs). This model has lower potential error and better scalability and adaptability than a process that applies cybersecurity guidelines recommended by humans, known as the human-in-the-loop (HITL) process. This methodology also significantly reduces the effort required to address new cyber threats or anomalies compared with machine learning (ML) techniques, since it leaves the model's complexity and precision unaffected and offers faster implementation. The results include a comparative assessment of the proposed AD framework and the HITL process, conducted using standard and advanced performance evaluation metrics. A hardware-in-the-loop (HIL) testbed was employed to generate and extract datasets of IEC 61850 communications.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 10 pages, 10 figures, Submitted to IEEE Transactions on Information Forensics and Security

arXiv:2402.18076 [pdf, other]

Online Ecological Gearshift Strategy via Neural Network with Soft-Argmax Operator

Authors: Xi Luo, Shiying Dong, Jinlong Hong, Bingzhao Gao, Hong Chen

Abstract: This paper presents a neural network optimizer with a soft-argmax operator to achieve an ecological gearshift strategy in real time. The strategy is reformulated as a mixed-integer model predictive control (MIMPC) problem to minimize energy consumption. Outer convexification is then introduced to transform the integer variables into relaxed binary controls. To approximate binary solutions properly during training, the soft-argmax operator is applied to the neural network, since all operations of this scheme are differentiable. Moreover, this operator helps push the relaxed binary variables close to 0 or 1. To evaluate the strategy, we deployed it on a 2-speed electric vehicle (EV). Compared with the mature solver Bonmin, our proposed method achieves similar energy-saving effects while significantly reducing the solution time to meet real-time requirements. It also yields a notable energy saving of 6.02% compared with the rule-based method.
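To illustrate why a soft-argmax pushes relaxed binary controls toward 0 or 1, here is a minimal sketch of one common definition: a temperature-scaled softmax over per-gear scores, whose output vector serves as the relaxed one-hot gear choice. Whether the paper uses exactly this form and which temperature it uses are assumptions; `soft_argmax` and `expected_index` are illustrative names.

```python
import math
from typing import List


def soft_argmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax: a differentiable stand-in for a one-hot argmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # shift by the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_index(weights: List[float]) -> float:
    """Weighted average of indices, the scalar form of soft-argmax."""
    return sum(i * w for i, w in enumerate(weights))
```

With hypothetical scores `[1.0, 2.0]` for the two gears, a temperature of 1.0 gives roughly `[0.27, 0.73]`, whereas a temperature of 0.1 gives roughly `[0.00005, 0.99995]`, i.e. nearly binary. Every operation stays differentiable, which is what allows such an operator to sit inside a network trained by gradient descent.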

Submitted 28 February, 2024; originally announced February 2024.

Comments: 6 pages, 5 figures, submitted to 8th IFAC Conference on Nonlinear Model Predictive Control

arXiv:2402.11632 [pdf, other]

Reliable long timescale decision-directed channel estimation for OFDM system

Authors: Xun Wang, Xin Xie, Cunqing Hua, Jianan Hong, Pengwenlong Gu

Abstract: Decision-directed channel estimation (DDCE) is a blind channel estimation method that tracks the channel with an iterative algorithm without relying on pilots, which can increase the utilization of wireless resources. However, one major problem of DDCE is the performance degradation caused by error accumulation during the tracking process. In this paper, we propose a reliable DDCE (RDDCE) scheme for an OFDM-based communication system in a time-varying deep fading environment. By combining conventional DDCE with the discrete Fourier transform (DFT) channel estimation method, the proposed RDDCE scheme selects the reliably estimated channels on the subcarriers that are less affected by deep fading, and then estimates the channel from the selected subcarriers with an extended DFT channel estimation in which the indices of the selected subcarriers are not evenly distributed. Simulation results show that RDDCE can effectively alleviate the performance degradation, track the channel with high accuracy over a long time scale, and perform well under time-varying and noisy channel conditions.
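The extended DFT step above, estimating the channel from an unevenly spaced subset of subcarriers, can be sketched as a least-squares fit of a short impulse response using only the DFT rows of the selected subcarriers, followed by evaluation on all subcarriers. This is a generic sketch, not the RDDCE algorithm: the reliability-based subcarrier selection is not shown (the indices are simply given), and the channel length `n_taps`, the DFT sign convention, and the noise-free test are assumptions.

```python
import numpy as np


def dft_channel_estimate(h_sel, sel_idx, n_subcarriers: int, n_taps: int) -> np.ndarray:
    """Fit an n_taps impulse response to channel estimates on an arbitrary
    (possibly unevenly spaced) subset of subcarriers, then return the
    frequency response on all n_subcarriers."""
    sel_idx = np.asarray(sel_idx)
    taps = np.arange(n_taps)
    # Rows of the N-point DFT matrix for the selected subcarriers only:
    # H[k] = sum_l h[l] * exp(-j 2 pi k l / N)
    f_sel = np.exp(-2j * np.pi * np.outer(sel_idx, taps) / n_subcarriers)
    # Least-squares solve; needs at least n_taps distinct selected indices
    h_time, *_ = np.linalg.lstsq(f_sel, np.asarray(h_sel, dtype=complex), rcond=None)
    # Evaluate the fitted impulse response on every subcarrier
    f_all = np.exp(-2j * np.pi * np.outer(np.arange(n_subcarriers), taps) / n_subcarriers)
    return f_all @ h_time
```

Because the selected rows form a Vandermonde matrix with distinct nodes, any set of at least `n_taps` distinct subcarriers determines the taps uniquely in the noise-free case; with noisy estimates, the least-squares fit also averages out part of the noise.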

Submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.06777 [pdf, other]

doi 10.1145/3613904.3642153

Capturing Cancer as Music: Cancer Mechanisms Expressed through Musification

Authors: Rostyslav Hnatyshyn, Jiayi Hong, Ross Maciejewski, Christopher Norby, Carlo C. Maley

Abstract: The development of cancer is difficult to express on a simple and intuitive level due to its complexity. Since cancer is so widespread, raising public awareness about its mechanisms can help those affected cope with its realities, as well as inspire others to make lifestyle adjustments and screen for the disease. Unfortunately, studies have shown that cancer literature is too technical for the general public to understand. We found that musification, the process of turning data into music, remains an unexplored avenue for conveying this information. We explore the pedagogical effectiveness of musification through the use of an algorithm that manipulates a piece of music in a manner analogous to the development of cancer. We conducted two lab studies and found that our approach is marginally more effective at promoting cancer literacy when accompanied by a text-based article than text-based articles alone.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2312.02669 [pdf, other]

Deep-learning-driven end-to-end metalens imaging

Authors: Joonhyuk Seo, Jaegang Jo, Joohoon Kim, Joonho Kang, Chanik Kang, Seongwon Moon, Eunji Lee, Jehyeong Hong, Junsuk Rho, Haejun Chung

Abstract: Recent advances in metasurface lenses (metalenses) have shown great potential for opening a new era in compact imaging, photography, light detection and ranging (LiDAR), and virtual reality/augmented reality (VR/AR) applications. However, the fundamental trade-off between broadband focusing efficiency and operating bandwidth limits the performance of broadband metalenses, resulting in chromatic aberration, angular aberration, and relatively low efficiency. In this study, a deep-learning-based image restoration framework is proposed to overcome these limitations and realize end-to-end metalens imaging, thereby achieving aberration-free full-color imaging for mass-produced metalenses with a 10-mm diameter. Neural-network-assisted metalens imaging achieved a high resolution comparable to that of the ground truth image.

Submitted 10 May, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: 17 pages, 7 figures, 1 table

arXiv:2311.13488 [pdf, other]

Machine Learning based Post Event Analysis for Cybersecurity of Cyber-Physical System

Authors: Kuchan Park, Junho Hong, Wencong Su, HyoJong Lee

Abstract: As Information and Communication Technology (ICT) equipment continues to be integrated into power systems, issues related to cybersecurity are increasingly emerging. Particularly noteworthy is the transition to digital substations, which is shifting operations from traditional hardwired-based systems to communication-based Supervisory Control and Data Acquisition (SCADA) system operations. These changes have increased the power system's vulnerability to cyber-attacks and made cybersecurity more important. This paper proposes a machine learning (ML) based post-event analysis of the power system to respond to these cybersecurity issues. An artificial neural network (ANN) and other ML models are trained using transient fault measurements and cyber-attack data on substations. The trained models can successfully distinguish between power system faults and cyber-attacks. Furthermore, the proposed ML-based methods can also identify 10 different fault types and the location where the event occurred.

Submitted 7 March, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: Submitted to 2024 IEEE Power and Energy Society General Meeting

arXiv:2311.06829 [pdf, ps, other]

Joint Design of Coding and Modulation for Digital Over-the-Air Computation

Authors: Xin Xie, Cunqing Hua, Jianan Hong, Yuejun Wei

Abstract: Due to its high communication efficiency, over-the-air computation (AirComp) is expected to carry out various computing tasks in next-generation wireless networks. However, most applications of AirComp to date have been explored in the analog domain, which limits the capability of AirComp to withstand complex wireless environments, not to mention integrating the AirComp technique into existing universal communication standards, most of which are based on digital systems. In this paper, we propose a joint design of channel coding and digital modulation for digital AirComp transmission to reinforce the foundation for applying AirComp in digital systems. Specifically, we first propose a non-binary LDPC-based channel coding scheme to enhance the error-correction capability of AirComp. Then, a digital modulation scheme is proposed to achieve number summation from multiple transmitters via the lattice coding technique. We also provide simulation results to demonstrate the feasibility and performance of the proposed design.

Submitted 12 November, 2023; originally announced November 2023.

Comments: This paper has been submitted to IEEE ICC 2024

arXiv:2311.05462 [pdf, other]

ChatGPT and Other Large Language Models for Cybersecurity of Smart Grid Applications

Authors: Aydin Zaboli, Seong Lok Choi, Tai-Jin Song, Junho Hong

Abstract: Cybersecurity breaches targeting electrical substations constitute a significant threat to the integrity of the power grid, necessitating comprehensive defense and mitigation strategies. Any anomaly in information and communication technology (ICT) should be detected to ensure secure communications between devices in digital substations. This paper proposes using large language models (LLMs), e.g., ChatGPT, for the cybersecurity of IEC 61850-based digital substation communications. Multicast messages such as generic object-oriented substation event (GOOSE) and sampled value (SV) messages are used for case studies. The proposed LLM-based cybersecurity framework includes, for the first time, data pre-processing of communication systems and human-in-the-loop (HITL) training (considering the cybersecurity guidelines recommended by humans). The results present a comparative analysis of detected anomaly data based on performance evaluation metrics for different LLMs. A hardware-in-the-loop (HIL) testbed is used to generate and extract a dataset of IEC 61850 communications.

Submitted 25 February, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: 5 pages, 2 figures, Accepted, 2024 IEEE Power & Energy Society General Meeting (PESGM), Seattle, WA, USA

arXiv:2310.14946 [pdf, other]

Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model

Authors: Joanna Hong, Se Jin Park, Yong Man Ro

Abstract: We present a novel approach to multilingual audio-visual speech recognition that introduces a single model trained on a multilingual dataset. Motivated by the human cognitive system, in which humans intuitively distinguish different languages without any conscious effort or guidance, we propose a model that can identify the language of the input speech by distinguishing the inherent similarities and differences between languages. To do so, we incorporate a prompt fine-tuning technique into a large-scale pre-trained audio-visual representation model so that the network can recognize the language class as well as the speech in the corresponding language. Our work contributes to developing robust and efficient multilingual audio-visual speech recognition systems, reducing the need for language-specific models.

Submitted 23 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Findings

arXiv:2310.05934 [pdf, other]

DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion

Authors: Se Jin Park, Joanna Hong, Minsu Kim, Yong Man Ro

Abstract: Speech-driven 3D facial animation has gained significant attention for its ability to create realistic and expressive facial animations in 3D space based on speech. Learning-based methods have shown promising progress in achieving accurate facial motion synchronized with speech. However, the one-to-many nature of speech-to-3D facial synthesis has not been fully explored: while the lips accurately synchronize with the speech content, other facial attributes beyond speech-related motions can vary for the same speech. To account for the potential variance in facial attributes within a single speech, we propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis method. DF-3DFace captures the complex one-to-many relationships between speech and 3D face based on diffusion. It concurrently achieves aligned lip motion by exploiting audio-mesh synchronization and masked conditioning. Furthermore, the proposed method jointly models identity and pose in addition to facial motions so that it can generate 3D face animation without requiring a reference identity mesh and produce natural head poses. We contribute a new large-scale 3D facial mesh dataset, 3D-HDTF, to enable the synthesis of variations in identities, poses, and facial motions of 3D face meshes. Extensive experiments demonstrate that our method successfully generates highly variable facial shapes and motions from speech and simultaneously achieves more realistic facial animation than state-of-the-art methods.

Submitted 23 August, 2023; originally announced October 2023.

arXiv:2309.12566 [pdf, other]

Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives

Authors: Muhammad Kazim, JunGee Hong, Min-Gyeom Kim, Kwang-Ki K. Kim

Abstract: This paper presents a tutorial overview of path integral (PI) control approaches for stochastic optimal control and trajectory optimization. We concisely summarize the theoretical development of path integral control to compute a solution for stochastic optimal control and provide algorithmic descriptions of the cross-entropy (CE) method, an open-loop controller using the receding horizon scheme known as the model predictive path integral (MPPI), and a parameterized state feedback controller based on the path integral control theory. We discuss policy search methods based on path integral control, efficient and stable sampling strategies, extensions to multi-agent decision-making, and MPPI for trajectory optimization on manifolds. For tutorial demonstrations, some PI-based controllers are implemented in Python, MATLAB, and ROS2/Gazebo simulations for trajectory optimization. The simulation frameworks and source code are publicly available at https://github.com/INHA-Autonomous-Systems-Laboratory-ASL/An-Overview-on-Recent-Advances-in-Path-Integral-Control.
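As a rough picture of the MPPI update mentioned above, the sketch below perturbs a nominal control sequence with Gaussian noise, rolls out each perturbed sequence, weights the rollouts by exp(-(cost - min cost) / lambda), and adds the weighted average perturbation to the nominal controls. It is a simplified sketch, not code from the paper's repository: the 1-D single-integrator dynamics, the quadratic cost, the parameter values, and the omission of MPPI's control-noise cost term are all assumptions made for brevity.

```python
import math
import random
from typing import List, Optional


def rollout_cost(x0: float, controls: List[float], target: float,
                 dt: float = 0.1, r: float = 0.01) -> float:
    """Simulate x_{t+1} = x_t + u_t * dt and sum a quadratic tracking + control cost."""
    x, cost = x0, 0.0
    for u in controls:
        x = x + u * dt
        cost += (x - target) ** 2 + r * u ** 2
    return cost


def mppi_step(x0: float, controls: List[float], target: float,
              n_samples: int = 256, sigma: float = 1.0, lam: float = 0.1,
              rng: Optional[random.Random] = None) -> List[float]:
    """One MPPI-style update of the nominal control sequence."""
    rng = rng if rng is not None else random.Random(0)
    noises, costs = [], []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, sigma) for _ in controls]
        noises.append(eps)
        costs.append(rollout_cost(x0, [u + e for u, e in zip(controls, eps)], target))
    c_min = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)
    # New control at each step = nominal + importance-weighted mean perturbation
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
            for t, u in enumerate(controls)]
```

In a receding-horizon loop, only the first control of the updated sequence would be applied; the sequence would then be shifted by one step and the update repeated from the new state. A smaller `lam` concentrates the weights on the best rollouts, while a larger one averages over more samples.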

Submitted 1 December, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 16 pages, 9 figures

MSC Class: 68T40; 13P25

ACM Class: I.2.9; I.2.8; G.1.6; G.4

arXiv:2308.07787 [pdf, other]

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

Authors: Jeongsoo Choi, Joanna Hong, Yong Man Ro

Abstract: Recent research has demonstrated impressive results in video-to-speech synthesis, which involves reconstructing speech solely from visual input. However, previous works have struggled to synthesize speech accurately due to a lack of sufficient guidance for the model to infer the correct content with the appropriate sound. To resolve this issue, they have adopted an extra speaker embedding as speaking-style guidance from reference audio. Nevertheless, it is not always possible to obtain audio corresponding to the video input, especially at inference time. In this paper, we present a novel vision-guided speaker embedding extractor using a self-supervised pre-trained model and a prompt tuning technique. In doing so, rich speaker embedding information can be produced solely from the input visual information, and extra audio information is not necessary at inference time. Using the extracted vision-guided speaker embedding representations, we further develop a diffusion-based video-to-speech synthesis model, called DiffV2S, conditioned on those speaker embeddings and the visual representation extracted from the input video. The proposed DiffV2S not only maintains the phoneme details contained in the input video frames, but also creates a highly intelligible mel-spectrogram in which the identities of multiple speakers are all preserved. Our experimental results show that DiffV2S achieves state-of-the-art performance compared to previous video-to-speech synthesis techniques.

Submitted 15 August, 2023; originally announced August 2023.

Comments: ICCV 2023

arXiv:2306.15212 [pdf, other]

TranssionADD: A multi-frame reinforcement based sequence tagging model for audio deepfake detection

Authors: Jie Liu, Zhiba Su, Hui Huang, Caiyan Wan, Quanxiu Wang, Jiangli Hong, Benlai Tang, Fengjie Zhu

Abstract: Thanks to recent advancements in end-to-end speech modeling technology, it has become increasingly feasible to imitate and clone a user's voice. This makes it significantly harder to differentiate between authentic and fabricated audio segments. To address the issue of user voice abuse and misuse, the second Audio Deepfake Detection Challenge (ADD 2023) aims to detect and analyze deepfake speech utterances. Specifically, Track 2, named Manipulation Region Location (RL), aims to pinpoint the location of manipulated regions in audio, which can be present in both real and generated audio segments. We propose our novel TranssionADD system as a solution to the challenging problems of model robustness and audio segment outliers in this track of the competition. Our system provides three unique contributions: 1) we adapt the sequence tagging task for audio deepfake detection; 2) we improve model generalization through various data augmentation techniques; 3) we incorporate a multi-frame detection (MFD) module to overcome the limited representation provided by a single frame and use an isolated-frame penalty (IFP) loss to handle outliers in segments. Our best submission achieved 2nd place in Track 2, demonstrating the effectiveness and robustness of our proposed system.

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2304.06237 [pdf, other]

Deep learning based ECG segmentation for delineation of diverse arrhythmias

Authors: Chankyu Joung, Mijin Kim, Taejin Paik, Seong-Ho Kong, Seung-Young Oh, Won Kyeong Jeon, Jae-hu Jeon, Joong-Sik Hong, Wan-Joong Kim, Woong Kook, Myung-Jin Cha, Otto van Koert

Abstract: Accurate delineation of key waveforms in an ECG is a critical initial step in extracting relevant features to support the diagnosis and treatment of heart conditions. Although deep learning based methods using a segmentation model to locate the P, QRS, and T waves have shown promising results, their ability to handle signals exhibiting arrhythmia remains unclear. This study builds on existing research by introducing a U-Net-like segmentation model for ECG delineation, with a particular focus on diverse arrhythmias. For this purpose, we curate an internal dataset containing waveform boundary annotations for various arrhythmia types to train and validate our model. Our key contributions include identifying segmentation model failures in different arrhythmia types, developing a robust model using a diverse training set, achieving comparable performance on benchmark datasets, and introducing a classification-guided strategy to reduce false P wave predictions for specific arrhythmias. This study advances deep learning based ECG delineation in the context of arrhythmias and highlights its challenges.
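Delineation with a segmentation model, as described above, ends with turning per-sample class labels into wave onsets and offsets. The sketch below shows one generic way to do that, plus a hypothetical filter in the spirit of a classification-guided strategy: dropping P-wave intervals when a separate rhythm classifier reports a rhythm without discrete P waves, such as atrial fibrillation. The function names, the label strings, and the filtering rule are illustrative assumptions, not the authors' pipeline.

```python
from typing import FrozenSet, List, Tuple

Interval = Tuple[str, int, int]  # (wave label, onset index, offset index), offset inclusive


def labels_to_intervals(labels: List[str], background: str = "none") -> List[Interval]:
    """Collapse per-sample segmentation labels into contiguous wave intervals."""
    intervals: List[Interval] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the signal or where the label changes
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != background:
                intervals.append((labels[start], start, i - 1))
            start = i
    return intervals


def suppress_waves(intervals: List[Interval], suppressed: FrozenSet[str]) -> List[Interval]:
    """Drop intervals of the given wave types, e.g. {"P"} when a rhythm classifier flags AF."""
    return [iv for iv in intervals if iv[0] not in suppressed]
```

Keeping the suppression separate from interval extraction means the segmentation output stays untouched, and the rhythm-dependent rule can be changed without retraining the model.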

Submitted 6 September, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

arXiv:2304.01544 [pdf]

doi 10.1063/5.0152878

Numerical Investigation of Airborne Infection Risk in an Elevator Cabin under Different Ventilation Designs

Authors: Ata Nazari, Changchang Wang, Ruichen He, Farzad Taghizadeh-Hesary, Jiarong Hong

Abstract: Airborne transmission of SARS-CoV-2 via virus-laden aerosols in enclosed spaces poses a significant concern. Elevators, commonly utilized enclosed spaces in modern tall buildings, present a challenge as the impact of varying heating, ventilation, and air conditioning (HVAC) systems on virus transmission within these cabins remains unclear. In this study, we employ computational modeling to examine aerosol transmission within an elevator cabin outfitted with diverse HVAC systems. Using a transport equation, we model aerosol concentration and assess infection risk distribution across passengers' breathing zones. We calculate particle removal efficiency for each HVAC design and introduce a suppression effect criterion to evaluate the effectiveness of the HVAC systems. Our findings reveal that mixing ventilation, featuring both inlet and outlet at the ceiling, proves most efficient in reducing particle spread, achieving a maximum removal efficiency of 79.40% during the exposure time. Conversely, the stratum ventilation model attains a mere removal efficiency of 3.97%. These results underscore the importance of careful HVAC system selection in mitigating the risk of SARS-CoV-2 transmission within elevator cabins.

Submitted 4 April, 2023; originally announced April 2023.

Comments: 38 pages, 14 figures
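The abstract compares HVAC designs by particle removal efficiency. A minimal sketch, assuming (hypothetically) that efficiency is the percentage of released particles extracted through the outlet by the end of the exposure time; the paper's exact definition may differ, and the counts in the usage section are illustrative, not results from the study.

```python
from typing import Dict, Tuple


def removal_efficiency(n_released: int, n_removed: int) -> float:
    """Percentage of released aerosol particles removed during the exposure time.

    Assumed definition (hypothetical): "removed" particles are those extracted
    through the HVAC outlet by the end of the exposure window.
    """
    if n_released <= 0:
        raise ValueError("n_released must be positive")
    if not 0 <= n_removed <= n_released:
        raise ValueError("n_removed must lie between 0 and n_released")
    return 100.0 * n_removed / n_released


if __name__ == "__main__":
    # Illustrative counts for two hypothetical designs, not data from the paper.
    designs: Dict[str, Tuple[int, int]] = {
        "design_A": (10_000, 7_500),
        "design_B": (10_000, 400),
    }
    ranked = sorted(designs, key=lambda d: removal_efficiency(*designs[d]), reverse=True)
    for name in ranked:
        print(name, f"{removal_efficiency(*designs[name]):.2f}%")
```

Ranking designs by this single number mirrors the comparison in the abstract, but a full analysis would also weigh the spatial distribution of infection risk across breathing zones.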

arXiv:2303.08536 [pdf, other]

Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring

Authors: Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro

Abstract: This paper deals with Audio-Visual Speech Recognition (AVSR) under multimodal input corruption, where both audio and visual inputs are corrupted, a situation not well addressed in previous research. Previous studies have focused on how to complement corrupted audio inputs with clean visual inputs, assuming that clean visual inputs are available. However, in real life, clean visual inputs are not always accessible and can even be corrupted by occluded lip regions or noise. Thus, we first show that previous AVSR models are in fact not robust to corruption of the multimodal input streams, the audio and the visual inputs, compared to uni-modal models. Then, we design multimodal input corruption modeling to develop robust AVSR models. Lastly, we propose a novel AVSR framework, namely the Audio-Visual Reliability Scoring module (AV-RelScore), that is robust to corrupted multimodal inputs. AV-RelScore can determine which input modal stream is reliable for the prediction and can exploit the more reliable streams in prediction. The effectiveness of the proposed method is evaluated with comprehensive experiments on popular benchmark databases, LRS2 and LRS3. We also show that the reliability scores obtained by AV-RelScore accurately reflect the degree of corruption and make the proposed model focus on the reliable multimodal representations.

Submitted 20 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: Accepted at CVPR 2023. Implementation available: https://github.com/joannahong/AV-RelScore
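AV-RelScore is described above as scoring how reliable each input stream is and relying more on the reliable one. A toy sketch of that general idea, assuming a hypothetical linear scorer followed by a sigmoid for each modality, whose score scales that stream's features before concatenation; the actual module operates on temporal feature sequences inside a learned network, and none of these names come from the paper's code.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reliability_weighted_fusion(
    audio: Sequence[float],
    visual: Sequence[float],
    w_audio: Sequence[float],
    w_visual: Sequence[float],
    b_audio: float = 0.0,
    b_visual: float = 0.0,
) -> Tuple[List[float], float, float]:
    """Score each stream's reliability from the joint features, scale it, and concatenate.

    The scorer is a hypothetical linear layer + sigmoid per modality; scores lie in (0, 1).
    """
    joint = list(audio) + list(visual)
    if len(w_audio) != len(joint) or len(w_visual) != len(joint):
        raise ValueError("scorer weights must match the joint feature length")
    s_audio = sigmoid(sum(w * x for w, x in zip(w_audio, joint)) + b_audio)
    s_visual = sigmoid(sum(w * x for w, x in zip(w_visual, joint)) + b_visual)
    fused = [s_audio * x for x in audio] + [s_visual * x for x in visual]
    return fused, s_audio, s_visual


if __name__ == "__main__":
    # A strongly negative audio bias mimics a scorer that judges the audio stream unreliable.
    fused, s_a, s_v = reliability_weighted_fusion(
        [1.0, 2.0], [3.0, 4.0], [0.0] * 4, [0.0] * 4, b_audio=-5.0)
    print(round(s_a, 4), s_v, [round(x, 4) for x in fused])
```

Scaling rather than discarding a stream keeps the fusion differentiable, which is why soft scores of this kind are a common design choice.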

arXiv:2303.05732 [pdf]

doi 10.3745/KTSDE.2021.10.8.287

Securing Safety in Collaborative Cyber-Physical Systems through Fault Criticality Analysis

Authors: Manzoor Hussain, Nazakat Ali, Jang-Eui Hong

Abstract: Collaborative Cyber-Physical Systems (CCPS) are systems that contain tightly coupled physical and cyber components and massively interconnected subsystems, which collaborate to achieve a common goal. The safety of a single Cyber-Physical System (CPS) can be achieved by following safety standards such as ISO 26262 and IEC 61508 or by applying hazard analysis techniques. However, due to the complex, highly interconnected, heterogeneous, and collaborative nature of CCPS, a fault in one CPS's components can trigger many other faults in other collaborating CPSs. Therefore, a safety assurance technique based on fault criticality analysis is required to ensure safety in CCPS. This paper presents a Fault Criticality Matrix (FCM), implemented in our tool CPSTracer, which contains data such as identified faults, fault criticality, and safety guards. The proposed FCM is based on composite hazard analysis and content-based relationships among the hazard analysis artifacts, and ensures that the safety guards control the identified faults at design time; thus, faults can be effectively managed and controlled at the design phase to ensure the safe development of CPSs. To validate our approach, we introduce a case study on a Platooning system (a collaborative CPS). We perform the criticality analysis of the Platooning system using FCM in our tool. After the detailed fault criticality analysis, we examine the results through two research questions to check the appropriateness and effectiveness of the approach.
Also, by simulating the Platooning system, we show that its collision rate without FCM was considerably higher than after analyzing fault criticality using FCM.

Submitted 10 March, 2023; originally announced March 2023.

Comments: This paper is an extended version of an article submitted to KCSE-2021

Journal ref: KIPS Transactions on Software and Data Engineering, vol. 10, no. 8, pp. 287-300, 2021

arXiv:2302.08841 [pdf, other]

Lip-to-Speech Synthesis in the Wild with Multi-task Learning

Authors: Minsu Kim, Joanna Hong, Yong Man Ro

Abstract: Recent studies have shown impressive performance in Lip-to-speech synthesis, which aims to reconstruct speech from visual information alone. However, they have struggled to synthesize accurate speech in the wild, due to insufficient supervision for guiding the model to infer the correct content. Distinct from previous methods, in this paper we develop a powerful Lip2Speech method that can reconstruct speech with correct contents from the input lip movements, even in a wild environment. To this end, we design multi-task learning that guides the model using multimodal supervision, i.e., text and audio, to complement the insufficient word representations of the acoustic feature reconstruction loss. Thus, the proposed framework can synthesize speech with the right content for multiple speakers and unconstrained sentences. We verify the effectiveness of the proposed method using the LRS2, LRS3, and LRW datasets.

Submitted 17 February, 2023; originally announced February 2023.

Comments: Accepted at ICASSP 2023. Demo available: https://github.com/joannahong/Lip-to-Speech-Synthesis-in-the-Wild

arXiv:2301.09638 [pdf]

In situ Biological Particle Analyzer based on Digital Inline Holography

Authors: Delaney Sanborn, Ruichen He, Lei Feng, Jiarong Hong

Abstract: Obtaining in situ measurements of biological microparticles is crucial for both scientific research and numerous industrial applications (e.g., early detection of harmful algal blooms, monitoring yeast during fermentation). However, existing methods are limited in their ability to offer timely diagnostics of these particles with sufficient accuracy and information. Here, we introduce a novel method for real-time, in situ analysis using machine learning-assisted digital inline holography (DIH). Our machine learning model uses a customized YOLO v5 architecture specialized for the detection and classification of small biological particles. We demonstrate the effectiveness of our method in the analysis of 10 plankton species, achieving equally high accuracy with significantly reduced processing time compared to previous methods. We also applied our method to differentiate yeast cells under four metabolic states and from two strains. Our results show that the proposed method can accurately detect and differentiate cellular and subcellular features related to metabolic states and strains. This study demonstrates the potential of a machine learning-driven DIH approach as a sensitive and versatile diagnostic tool for real-time, in situ analysis of both biotic and abiotic particles. This method can be readily deployed in a distributed manner for scientific research and manufacturing on an industrial scale.

Submitted 14 January, 2023; originally announced January 2023.

Comments: 18 pages, 9 figures

arXiv:2212.06368 [pdf, other]

Single Cell Training on Architecture Search for Image Denoising

Authors: Bokyeung Lee, Kyungdeuk Ko, Jonghwan Hong, Hanseok Ko

Abstract: Neural Architecture Search (NAS) for automatically finding the optimal network architecture has shown some success, with competitive performance in various computer vision tasks. However, NAS generally requires a tremendous amount of computation, so reducing computational cost has emerged as an important issue. Most attempts so far have been based on manual approaches, and the resulting architectures often have to strike a balance between network optimality and search cost. Additionally, recent NAS methods for image restoration generally do not consider dynamic operations that may transform the dimensions of feature maps, because of the dimensionality mismatch in tensor calculations. This can greatly limit NAS in its search for an optimal network structure. To address these issues, we re-frame the optimal search problem by focusing on the component block level. Previous work has shown that effective denoising blocks can be connected in series to further improve network performance. By focusing on the block level, the search space of reinforcement learning becomes significantly smaller and the evaluation process can be conducted more rapidly. In addition, we integrate innovative dimension-matching modules for dealing with spatial and channel-wise mismatches that may occur in the optimal design search. This allows much greater flexibility in optimal network search within the cell block. With these modules, we then employ reinforcement learning to search for an optimal image denoising network at the module level.
The computational efficiency of our proposed Denoising Prior Neural Architecture Search (DPNAS) was demonstrated by having it complete an optimal architecture search for an image restoration task within just one day on a single GPU.

Submitted 12 December, 2022; originally announced December 2022.

arXiv:2211.08530 [pdf, ps, other]

Cyber-Attack Event Analysis for EV Charging Stations

Authors: Mansi Girdhar, Junho Hong, Yongsik You, Tai-jin Song, Manimaran Govindarasu

Abstract: Safe and secure electric vehicle charging stations (EVCSs) are important in smart transportation infrastructure. The prevalence of EVCSs has rapidly increased over time in response to the rising demand for EV charging. However, developments in information and communication technologies (ICT) have made the cyber-physical system (CPS) of EVCSs susceptible to cyber-attacks, which might destabilize the electric grid infrastructure as well as the charging environment. Because current investigation frameworks are unable to adequately comprehend and handle such incidents, this study suggests a 5Ws & 1H-based investigation approach for dealing with cyber-attack-related incidents. In addition, a stochastic anomaly detection system (ADS) is proposed to identify anomalies, abnormal activities, and unusual operations of the station entities as part of post-cyber-event analysis.

Submitted 15 November, 2022; originally announced November 2022.

Comments: 5 Pages, 2 Figures, 2 Tables, 10 Mathematical Equations, PES GM Conference Paper

arXiv:2211.00924 [pdf, other]

SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory

Authors: Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro

Abstract: The challenge of talking face generation from speech lies in aligning information from two different modalities, audio and video, such that the mouth region corresponds to the input audio. Previous methods either exploit audio-visual representation learning or leverage intermediate structural information such as landmarks and 3D models. However, they struggle to synthesize fine details of the lips varying at the phoneme level, as they do not provide sufficient visual information of the lips at the video synthesis step. To overcome this limitation, our work proposes Audio-Lip Memory, which brings in visual information of the mouth region corresponding to the input audio and enforces fine-grained audio-visual coherence. It stores lip motion features from sequential ground truth images in the value memory and aligns them with corresponding audio features so that they can be retrieved using audio input at inference time. Therefore, using the retrieved lip motion features as visual hints, it can easily correlate audio with visual dynamics in the synthesis step. By analyzing the memory, we demonstrate that unique lip features are stored in each memory slot at the phoneme level, capturing subtle lip motion based on memory addressing. In addition, we introduce a visual-visual synchronization loss which can enhance lip-syncing performance when used along with the audio-visual synchronization loss in our model.
Extensive experiments verify that our method generates high-quality video with mouth shapes that best align with the input audio, outperforming previous state-of-the-art methods.

Submitted 2 November, 2022; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted at AAAI 2022 (Oral)
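The Audio-Lip Memory is described above as retrieving stored lip motion features with an audio query through memory addressing. The sketch below assumes a generic key-value memory read (cosine addressing, temperature-scaled softmax, weighted sum of value slots); the function names and the `temperature` parameter are illustrative assumptions, not the paper's code.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query: Sequence[float], keys: Sequence[Sequence[float]],
                values: Sequence[Sequence[float]],
                temperature: float = 1.0) -> Tuple[List[float], List[float]]:
    """Address key slots (audio side) with a query; return the weighted sum of value slots (lip side)."""
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty with the same number of slots")
    weights = softmax([cosine(query, k) / temperature for k in keys])
    dim = len(values[0])
    recalled = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return recalled, weights


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical audio-aligned key slots
    values = [[10.0, 0.0], [0.0, 10.0]]  # hypothetical lip-motion value slots
    recalled, weights = read_memory([1.0, 0.1], keys, values, temperature=0.01)
    print([round(x, 3) for x in recalled], [round(w, 3) for w in weights])
```

A low temperature makes addressing nearly one-hot, so each query recalls close to a single slot; a higher temperature blends several slots.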

arXiv:2210.14297 [pdf, other]

Progressively refined deep joint registration segmentation (ProRSeg) of gastrointestinal organs at risk: Application to MRI and cone-beam CT

Authors: Jue Jiang, Jun Hong, Kathryn Tringale, Marsha Reyngold, Christopher Crane, Neelam Tyagi, Harini Veeraraghavan

Abstract: Method: ProRSeg was trained using 5-fold cross-validation with 110 T2-weighted MRI scans acquired at 5 treatment fractions from 10 different patients, ensuring that scans from the same patient were not placed in both training and testing folds. Segmentation accuracy was measured using the Dice similarity coefficient (DSC) and the 95th-percentile Hausdorff distance (HD95). Registration consistency was measured using the coefficient of variation (CV) in displacement of organs at risk (OARs). Ablation tests and accuracy comparisons against multiple methods were performed. Finally, the applicability of ProRSeg to segmenting cone-beam CT (CBCT) scans was evaluated on 80 scans using 5-fold cross-validation. Results: ProRSeg processed 3D volumes (128 $\times$ 192 $\times$ 128) in 3 s on an NVIDIA Tesla V100 GPU. Its segmentations were significantly more accurate ($p<0.001$) than those of the compared methods, achieving a DSC of 0.94$\pm$0.02 for liver, 0.88$\pm$0.04 for large bowel, 0.78$\pm$0.03 for small bowel, and 0.82$\pm$0.04 for stomach-duodenum from MRI. ProRSeg achieved a DSC of 0.72$\pm$0.01 for small bowel and 0.76$\pm$0.03 for stomach-duodenum from CBCT. ProRSeg registrations resulted in the lowest CV in displacement (stomach-duodenum $CV_{x}$: 0.75\%, $CV_{y}$: 0.73\%, and $CV_{z}$: 0.81\%; small bowel $CV_{x}$: 0.80\%, $CV_{y}$: 0.80\%, and $CV_{z}$: 0.68\%; large bowel $CV_{x}$: 0.71\%, $CV_{y}$: 0.81\%, and $CV_{z}$: 0.75\%).
ProRSeg-based dose accumulation accounting for intra-fraction (pre-treatment to post-treatment MRI scan) and inter-fraction motion showed that the organ dose constraints were violated in 4 patients for stomach-duodenum and in 3 patients for small bowel. Study limitations include a lack of independent testing and of ground-truth phantom datasets to measure dose accumulation accuracy.

Submitted 25 October, 2022; originally announced October 2022.

Comments: This manuscript is currently under review at Medical Physics
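Two of the evaluation metrics named in the method summary, DSC and CV, have standard forms. A minimal sketch on flattened binary masks and on a list of displacement values; using the population standard deviation for CV, and scoring two empty masks as a perfect match, are assumptions, and HD95 is omitted.

```python
import statistics
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A and B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (assumed): two empty masks agree perfectly.
    return 1.0 if size == 0 else 2.0 * inter / size


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV in percent: population standard deviation divided by the absolute mean (assumed definition)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * statistics.pstdev(values) / abs(mean)


if __name__ == "__main__":
    print(dice([1, 1, 0, 0], [1, 0, 1, 0]))        # half of the labelled voxels overlap
    print(coefficient_of_variation([1.0, 3.0]))    # pstdev 1.0 over mean 2.0
```

For 3D volumes the masks would first be flattened (or the sums taken over all voxels); the formula itself does not change.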

arXiv:2208.10644 [pdf, other]

Machine Learning-Enabled Cyber Attack Prediction and Mitigation for EV Charging Stations

Authors: Mansi Girdhar, Junho Hong, Yongsik Yoo, Tai-Jin Song

Abstract: Safe and reliable electric vehicle charging stations (EVCSs) have become imperative in an intelligent transportation infrastructure. Over the years, the deployment of EVCSs has increased rapidly to address surging charging demand. However, advances in information and communication technologies (ICT) have rendered this cyber-physical system (CPS) vulnerable to cyber threats, which can destabilize the charging ecosystem and even the entire electric grid infrastructure. This paper develops an advanced cybersecurity framework in which STRIDE threat modeling is used to identify potential vulnerabilities in an EVCS. Further, the weighted attack defense tree approach is employed to create multiple attack scenarios, followed by the development of Hidden Markov Model (HMM) and Partially Observable Monte-Carlo Planning (POMCP) algorithms for modeling the security attacks. Potential mitigation strategies are also suggested for the identified threats.

Submitted 22 August, 2022; originally announced August 2022.

Comments: 5 pages, 4 figures, 11 mathematical equations
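The framework above models security attacks with a Hidden Markov Model. As an illustration of how such a model can track a hidden attack state from observable events, here is a normalized forward (filtering) algorithm. The two states, the observation alphabet, and every probability below are hypothetical, and POMCP planning is not shown.

```python
from typing import Dict, List, Sequence

Dist = Dict[str, float]

# Hypothetical two-state model: every number below is illustrative, not from the paper.
STATES = ["normal", "attack"]
START: Dist = {"normal": 0.9, "attack": 0.1}
TRANS: Dict[str, Dist] = {"normal": {"normal": 0.95, "attack": 0.05},
                          "attack": {"normal": 0.2, "attack": 0.8}}
EMIT: Dict[str, Dist] = {"normal": {"ok": 0.9, "alert": 0.1},
                         "attack": {"ok": 0.3, "alert": 0.7}}


def forward_filter(obs: Sequence[str], states: Sequence[str], start_p: Dist,
                   trans_p: Dict[str, Dist], emit_p: Dict[str, Dist]) -> List[Dist]:
    """Normalized forward algorithm: P(state_t | obs_1..obs_t) for every step t."""
    beliefs: List[Dist] = []
    alpha: Dist = {}
    for t, o in enumerate(obs):
        if t == 0:
            unnorm = {s: start_p[s] * emit_p[s][o] for s in states}
        else:
            # predict with the transition model, then weight by the emission likelihood
            unnorm = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                      for s in states}
        total = sum(unnorm.values())
        if total == 0:
            raise ValueError(f"observation {o!r} has zero likelihood under the model")
        alpha = {s: v / total for s, v in unnorm.items()}
        beliefs.append(alpha)
    return beliefs


if __name__ == "__main__":
    for t, belief in enumerate(forward_filter(["ok", "alert", "alert"], STATES, START, TRANS, EMIT)):
        print(t, round(belief["attack"], 3))
```

Normalizing at every step keeps the values as posterior probabilities and avoids underflow on long observation sequences.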

arXiv:2207.06020 [pdf, other]

Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition

Authors: Joanna Hong, Minsu Kim, Daehun Yoo, Yong Man Ro

Abstract: This paper focuses on designing a noise-robust end-to-end Audio-Visual Speech Recognition (AVSR) system. To this end, we propose the Visual Context-driven Audio Feature Enhancement module (V-CAFE) to enhance the input noisy audio speech with the help of audio-visual correspondence. The proposed V-CAFE is designed to capture the transition of lip movements, namely the visual context, and to generate a noise reduction mask by considering the obtained visual context. Through context-dependent modeling, the ambiguity in viseme-to-phoneme mapping can be refined for mask generation. The noisy representations are masked out with the noise reduction mask, resulting in enhanced audio features. The enhanced audio features are fused with the visual features and fed into an encoder-decoder model composed of Conformer and Transformer for speech recognition. We show that the proposed end-to-end AVSR with V-CAFE can further improve the noise robustness of AVSR. The effectiveness of the proposed method is evaluated in noisy speech recognition and overlapped speech recognition experiments using the two largest audio-visual datasets, LRS2 and LRS3.

Submitted 13 July, 2022; originally announced July 2022.

Comments: Accepted at Interspeech 2022
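V-CAFE is described above as masking noisy audio features with a noise reduction mask generated from the visual context. A minimal sketch of the masking step only, assuming the mask is a sigmoid of per-element logits applied multiplicatively to a T x D feature map; producing those logits from lip movements is out of scope, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def apply_noise_mask(noisy: Sequence[Sequence[float]],
                     mask_logits: Sequence[Sequence[float]]) -> List[List[float]]:
    """Soft-mask a T x D noisy feature map: enhanced[t][d] = sigmoid(logit[t][d]) * noisy[t][d]."""
    if len(noisy) != len(mask_logits) or any(
            len(row) != len(mrow) for row, mrow in zip(noisy, mask_logits)):
        raise ValueError("noisy features and mask logits must have the same T x D shape")
    return [[sigmoid(m) * x for x, m in zip(row, mrow)]
            for row, mrow in zip(noisy, mask_logits)]


if __name__ == "__main__":
    # Large positive logits keep a feature, large negative logits suppress it.
    print(apply_noise_mask([[2.0, 4.0, 6.0]], [[0.0, 100.0, -100.0]]))
```

A soft (sigmoid) mask, rather than a hard 0/1 mask, lets the network attenuate features partially when the visual evidence is ambiguous.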

arXiv:2207.01078 [pdf, other]

doi 10.1109/TAFFC.2023.3247914

ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes

Authors: Kenneth Ooi, Zhen-Ting Ong, Karn N. Watcharasupat, Bhan Lam, Joo Young Hong, Woon-Seng Gan

Abstract: Choosing optimal maskers for existing soundscapes to effect a desired perceptual change via soundscape augmentation is non-trivial due to the wide variety of maskers and a dearth of benchmark datasets with which to compare and develop soundscape augmentation models. To address this problem, we make publicly available the ARAUS (Affective Responses to Augmented Urban Soundscapes) dataset, which comprises a five-fold cross-validation set and an independent test set totaling 25,440 unique subjective perceptual responses to augmented soundscapes presented as audio-visual stimuli. Each augmented soundscape is made by digitally adding "maskers" (bird, water, wind, traffic, construction, or silence) to urban soundscape recordings at fixed soundscape-to-masker ratios. Responses were then collected by asking participants to rate how pleasant, annoying, eventful, uneventful, vibrant, monotonous, chaotic, calm, and appropriate each augmented soundscape was, in accordance with ISO 12913-2:2018. Participants also provided relevant demographic information and completed standard psychological questionnaires. We perform exploratory and statistical analyses of the responses obtained to verify internal consistency and agreement with known results in the literature. Finally, we demonstrate the benchmarking capability of the dataset by training and comparing four baseline models for urban soundscape pleasantness: a low-parameter regression model, a high-parameter convolutional neural network, and two attention-based networks from the literature.

Submitted 2 July, 2024; v1 submitted 3 July, 2022; originally announced July 2022.

Comments: [v1, v2] 25 pages, 11 figures. [v3] 33 pages, 18 figures. v3 updated with changes made after peer review. in IEEE Transactions on Affective Computing, 2023. [v4] 33 pages, 18 figures. Fixed inaccurate author list in citation #90

Journal ref: IEEE Trans. Affect. Comput., pp. 1-17, 2023
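Each ARAUS stimulus adds a masker to a recording at a fixed soundscape-to-masker ratio (SMR). A hedged sketch of that mixing step, assuming the ratio is taken between unweighted RMS levels of the digital signals; a calibrated setup may instead use A-weighted levels. The silence condition corresponds to leaving the soundscape unaugmented and is not handled by this hypothetical function.

```python
import math
from typing import List, Sequence, Tuple


def rms(x: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_smr(soundscape: Sequence[float], masker: Sequence[float],
               smr_db: float) -> Tuple[List[float], float]:
    """Scale the masker so that 20*log10(rms(soundscape) / rms(gain * masker)) == smr_db, then add.

    Returns the mixture and the linear gain applied to the masker.
    """
    if not masker or len(soundscape) != len(masker):
        raise ValueError("signals must be non-empty and equally long")
    if rms(soundscape) == 0 or rms(masker) == 0:
        raise ValueError("signals must not be silent")
    gain = rms(soundscape) / (rms(masker) * 10 ** (smr_db / 20))
    return [s + gain * m for s, m in zip(soundscape, masker)], gain


if __name__ == "__main__":
    scape = [1.0, -1.0, 1.0, -1.0]
    bird = [0.0, 2.0, 0.0, -2.0]  # toy stand-in for a masker recording
    mixture, g = mix_at_smr(scape, bird, smr_db=6.0)
    print(round(20 * math.log10(rms(scape) / rms([g * v for v in bird])), 6))
```

A positive SMR keeps the masker quieter than the original soundscape; a negative SMR makes it dominant.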

arXiv:2206.07458 [pdf, other]

VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection

Authors: Joanna Hong, Minsu Kim, Yong Man Ro

Abstract: The goal of this work is to reconstruct speech from a silent talking face video. Recent studies have shown impressive performance in synthesizing speech from silent talking face videos. However, they have not explicitly considered the varying identity characteristics of different speakers, which pose a challenge for video-to-speech synthesis and become more critical in unseen-speaker settings. Our approach is to separate the speech content and the visage-style from a given silent talking face video. By guiding the model to focus independently on modeling the two representations, we can obtain highly intelligible speech from the model even when given an input video of an unseen subject. To this end, we introduce speech-visage selection, which separates the speech content and the speaker identity from the visual features of the input video. The disentangled representations are jointly incorporated to synthesize speech through a visage-style based synthesizer, which generates speech by applying the visage-styles while maintaining the speech content. Thus, the proposed framework can synthesize speech with the correct content even from a silent talking face video of an unseen subject. We validate the effectiveness of the proposed framework on the GRID, TCD-TIMIT volunteer, and LRW datasets.

Submitted 20 July, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: Accepted by ECCV 2022

arXiv:2206.03112 [pdf]

doi 10.3390/su14127485

Singapore Soundscape Site Selection Survey (S5): Identification of Characteristic Soundscapes of Singapore via Weighted k-means Clustering

Authors: Kenneth Ooi, Bhan Lam, Joo Young Hong, Karn N. Watcharasupat, Zhen-Ting Ong, Woon-Seng Gan

Abstract: The ecological validity of soundscape studies usually rests on a choice of soundscapes that are representative of the perceptual space under investigation. For example, a soundscape pleasantness study might investigate locations with soundscapes ranging from "pleasant" to "annoying". The choice of soundscapes is typically researcher-led, but a participant-led process can reduce selection bias and improve result reliability. Hence, we propose a robust participant-led method to pinpoint characteristic soundscapes possessing arbitrary perceptual attributes. We validate our method by identifying Singaporean soundscapes spanning the perceptual quadrants generated from the "Pleasantness" and "Eventfulness" axes of the ISO 12913-2 circumplex model of soundscape perception, as perceived by local experts. From memory and experience, 67 participants first selected locations corresponding to each perceptual quadrant in each major planning region of Singapore. We then performed weighted k-means clustering on the selected locations, with each location's weight derived from how often and for how long each participant had previously spent time there. The weights hence acted as proxies for participant confidence. In total, 62 locations were thereby identified as suitable locations with characteristic soundscapes for further research utilizing the ISO 12913-2 perceptual quadrants. Audio-visual recordings and acoustic characterization of the soundscapes will be made in a future study.

Submitted 7 June, 2022; originally announced June 2022.

Comments: 23 pages, 8 figures. Submitted to Sustainability

Journal ref: MDPI Sustainability. 2022; 14(12):7485
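The site-selection step above runs weighted k-means on participant-selected locations. A minimal sketch, assuming 2D planar coordinates and nonnegative per-location weights (for example, the confidence proxies described in the abstract): points are assigned by squared Euclidean distance and each centroid is the weighted mean of its members. Initialization, the distance metric, and all parameter names are illustrative assumptions, not the authors' settings; latitude/longitude data would need projecting first.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float], k: int,
                    iters: int = 100, seed: int = 0) -> Tuple[List[Point], Optional[List[int]]]:
    """Lloyd-style k-means in which each centroid is the weighted mean of its members."""
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)  # k distinct initial points
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: weighted mean of each cluster's members.
        updated: List[Point] = []
        for c in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == c]
            total = sum(w for _, w in members)
            if total == 0:
                updated.append(centroids[c])  # empty or zero-weight cluster keeps its centroid
            else:
                updated.append((sum(w * p[0] for p, w in members) / total,
                                sum(w * p[1] for p, w in members) / total))
        centroids = updated
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    cents, labs = weighted_kmeans(pts, [1.0, 3.0, 1.0, 1.0], k=2)
    print(sorted(cents), labs)
```

Because the point at (0, 1) carries three times the weight of (0, 0), its cluster centroid is pulled toward it, which is how higher participant confidence would shift a representative location.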

arXiv:2204.01726 [pdf, other]

Lip to Speech Synthesis with Visual Context Attentional GAN

Authors: Minsu Kim, Joanna Hong, Yong Man Ro

Abstract: In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis. Specifically, the proposed VCA-GAN synthesizes speech from local lip visual features by finding a viseme-to-phoneme mapping function, while global visual context is embedded into the intermediate layers of the generator to clarify the ambiguity in the mapping induced by homophenes. To achieve this, a visual context attention module is proposed, which encodes global representations from the local visual features and, through audio-visual attention, provides the generator with the desired global visual context corresponding to the given coarse speech representation. In addition to the explicit modelling of local and global visual representations, synchronization learning is introduced as a form of contrastive learning that guides the generator to synthesize speech in sync with the given input lip movements. Extensive experiments demonstrate that the proposed VCA-GAN outperforms the existing state of the art and can effectively synthesize speech from multiple speakers, a setting that has barely been handled in previous works.

Submitted 4 April, 2022; originally announced April 2022.

Comments: Published at NeurIPS 2021
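
The synchronization learning in the VCA-GAN abstract is described as a form of contrastive learning. A generic InfoNCE-style loss illustrates the idea; it is an assumption that the paper uses this exact form. Here `sim[i][j]` is a hypothetical similarity between the visual features of clip i and the speech features of clip j, the in-sync (positive) pairs sit on the diagonal, and `temperature` is an illustrative hyperparameter.

```python
import math
from typing import Sequence


def info_nce(sim: Sequence[Sequence[float]], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over rows of a square similarity matrix.

    sim[i][i] scores the in-sync pair for clip i; every other entry in row i
    is treated as an out-of-sync (negative) pair.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        m = max(logits)
        # log-sum-exp with max subtraction for numerical stability
        lse = m + math.log(sum(math.exp(z - m) for z in logits))
        total += lse - logits[i]  # equals -log softmax(logits)[i]
    return total / n
```

The loss falls as the diagonal (in-sync) similarities grow relative to the off-diagonal ones, which is what pushes generated speech towards the timing of the input lip movements.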

arXiv:2204.01265 [pdf, other]

Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video

Authors: Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro

Abstract: In this paper, we introduce a novel audio-visual multi-modal bridging framework that can utilize both audio and visual information, even with uni-modal inputs. We exploit a memory network that stores source (i.e., visual) and target (i.e., audio) modal representations, where source modal representation is what we are given, and target modal representations are what we want to obtain from the memory network. We then construct an associative bridge between source and target memories that considers the interrelationship between the two memories. By learning the interrelationship through the associative bridge, the proposed bridging framework is able to obtain the target modal representations inside the memory network, even with the source modal input only, and it provides rich information for its downstream tasks. We apply the proposed framework to two tasks: lip reading and speech reconstruction from silent video. Through the proposed associative bridge and modality-specific memories, each task knowledge is enriched with the recalled audio context, achieving state-of-the-art performance. We also verify that the associative bridge properly relates the source and target memories.

Submitted 4 April, 2022; originally announced April 2022.

Comments: Published at ICCV 2021
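
The associative bridge in the abstract above recalls target (audio) representations from source (visual) inputs. A minimal key-value memory read conveys the general mechanism: the query addresses the source-memory slots by cosine similarity, and the resulting soft weights read out the paired target-memory slots. The function names, the `sharpness` parameter, and the one-to-one slot pairing are illustrative assumptions, not the paper's architecture.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recall_target(query: Vector, source_memory: Sequence[Vector],
                  target_memory: Sequence[Vector], sharpness: float = 10.0) -> List[float]:
    """Address source slots with the query, then read the paired target slots."""
    # Slot i of source_memory is assumed to be paired with slot i of target_memory.
    weights = softmax([sharpness * cosine(query, key) for key in source_memory])
    dim = len(target_memory[0])
    return [sum(w * value[d] for w, value in zip(weights, target_memory)) for d in range(dim)]
```

A visual query that resembles one source slot mostly retrieves that slot's audio counterpart; a query equally similar to two slots retrieves their average.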

arXiv:2111.07377 [pdf, other]

Eco-Coasting Strategies Using Road Grade Preview: Evaluation and Online Implementation Based on Mixed Integer Model Predictive Control

Authors: Yongjun Yan, Nan Li, Jinlong Hong, Bingzhao Gao, Hong Chen, Jing Sun, Ziyou Song

Abstract: Coasting has been widely used in the eco-driving guidelines to reduce fuel consumption by profiting from kinetic energy. However, the comprehensive comparison between different coasting strategies and online performance of the eco-coasting strategy using road grade preview are still unclear because of the oversimplification and the integer variable in the optimal control problems. Herein, two different coasting strategies (fuel cut-off and engine start/stop) are proposed to reveal the potential benefit of eco-coasting using the road grade preview. Engine drag torque and energy cost used for engine restart are considered in the modeling to give a fair evaluation of the offline and online performance. The offline performance of these two coasting methods is evaluated through dynamic programming (DP) under various driving scenarios with different slope profiles. Offline simulation shows that the engine start/stop method outperforms the fuel cut-off method in terms of fuel consumption and travel time by getting rid of the engine drag torque. Then, online performance of these two coasting methods is evaluated using Mixed Integer Model Predictive Control (MIMPC). A novel operational constraint on the minimum off steps is added in the MIMPC formulation to avoid frequent switch of the integer variables which represent the fuel cut-off and the engine start/stop mechanism. Simulation results show that, for both fuel cut-off and engine start/stop coasting methods, the MPC controller reduces fuel consumption to a level comparable to DP without sacrificing the travel time.

Submitted 25 December, 2021; v1 submitted 14 November, 2021; originally announced November 2021.

Comments: 13 pages, 18 figures
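
The minimum-off-steps constraint mentioned in the eco-coasting abstract is typically written as a linear inequality on binary variables so that it fits a mixed-integer formulation. The check below is a sketch assuming the common minimum-down-time form u[k-1] - u[k] + u[k+j] <= 1 for j = 0, ..., N_min - 1, where u[k] = 1 means the engine is fuelled (or running) and 0 means coasting; the paper's exact formulation and its treatment of the end of the horizon may differ.

```python
from typing import Sequence


def satisfies_min_off(u: Sequence[int], n_min: int, u_prev: int = 1) -> bool:
    """Check the linear minimum-off-time constraints u[k-1] - u[k] + u[k+j] <= 1.

    A switch-off at step k (u[k-1] = 1, u[k] = 0) forces u[k], ..., u[k+n_min-1] to 0.
    u_prev is the (assumed known) state just before the horizon starts; windows that
    run past the end of the horizon are truncated.
    """
    prev = u_prev
    for k, uk in enumerate(u):
        for j in range(n_min):
            if k + j < len(u) and prev - uk + u[k + j] > 1:
                return False
        prev = uk
    return True
```

In an MIMPC these same inequalities would be handed to the solver as constraints rather than checked afterwards; the function only makes their effect concrete.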

arXiv:2110.10965 [pdf, other]

2020 CATARACTS Semantic Segmentation Challenge

Authors: Imanol Luengo, Maria Grammatikopoulou, Rahim Mohammadi, Chris Walsh, Chinedu Innocent Nwoye, Deepak Alapatt, Nicolas Padoy, Zhen-Liang Ni, Chen-Chen Fan, Gui-Bin Bian, Zeng-Guang Hou, Heonjin Ha, Jiacheng Wang, Haojie Wang, Dong Guo, Lu Wang, Guotai Wang, Mobarakol Islam, Bharat Giddwani, Ren Hongliang, Theodoros Pissas, Claudio Ravasio, Martin Huber, Jeremy Birch, Joan M. Nunez Do Rio , et al. (15 additional authors not shown)

Abstract: Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.

Submitted 24 February, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

arXiv:2109.08998 [pdf]

doi 10.1063/5.0056671

Data-driven yaw misalignment correction for utility-scale wind turbines

Authors: Linyue Gao, Jiarong Hong

Abstract: In recent years, wind turbine yaw misalignment that tends to degrade the turbine power production and impact the blade fatigue loads raises more attention along with the rapid development of large-scale wind turbines. The state-of-the-art correction methods require additional instruments such as LiDAR to provide the ground truths and are not suitable for long-term operation and large-scale implementation due to the high costs. In the present study, we propose a framework that enables the effective and efficient detection and correction of static and dynamic yaw errors by using only turbine SCADA data, suitable for a low-cost regular inspection for large-scale wind farms in onshore, coastal, and offshore sites. This framework includes a short-period data collection of the turbine operating under multiple static yaw errors, a data mining correction for the static yaw error, and ultra-short-term dynamic yaw error forecasts with machine learning algorithms. Three regression algorithms, i.e., linear, support vector machine, and random forest, and a hybrid model based on the average prediction of the three, have been tested for dynamic yaw error prediction and compared using the field measurement data from a 2.5 MW turbine. For the data collected in the present study, the hybrid method shows the best performance and can reduce total yaw error by up to 85% (on average of 71%) compared to the cases without static and dynamic yaw error corrections. In addition, we have tested the transferability of the proposed method in the application of detecting other static and dynamic yaw errors.

Submitted 18 September, 2021; originally announced September 2021.

Comments: 24 pages, 9 figures
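
The hybrid model in the yaw-misalignment abstract averages the predictions of three regressors. The sketch below shows only that combination step and the percentage-reduction arithmetic behind figures such as "up to 85%"; the already-trained linear, support vector machine, and random forest models are represented by hypothetical callables, and equal weighting of the three is assumed.

```python
from statistics import fmean
from typing import Callable, Sequence

# Hypothetical stand-in for any trained regressor: features -> predicted yaw error.
Regressor = Callable[[Sequence[float]], float]


def hybrid_predict(models: Sequence[Regressor], features: Sequence[float]) -> float:
    """Equal-weight average of several regressors' predictions."""
    return fmean(model(features) for model in models)


def percent_reduction(error_before: float, error_after: float) -> float:
    """Relative reduction of a positive total error, in percent."""
    return 100.0 * (1.0 - error_after / error_before)
```

For example, three models predicting 1.0, 2.0 and 6.0 degrees give a hybrid prediction of 3.0 degrees, and a total error falling from 10 to 1.5 units is an 85% reduction.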

arXiv:2109.05664 [pdf]

Unsupervised domain adaptation for cross-modality liver segmentation via joint adversarial learning and self-learning

Authors: Jin Hong, Simon Chun-Ho Yu, Weitian Chen

Abstract: Liver segmentation on images acquired using computed tomography (CT) and magnetic resonance imaging (MRI) plays an important role in clinical management of liver diseases. Compared to MRI, CT images of liver are more abundant and readily available. However, MRI can provide richer quantitative information of the liver compared to CT. Thus, it is desirable to achieve unsupervised domain adaptation for transferring the learned knowledge from the source domain containing labeled CT images to the target domain containing unlabeled MR images. In this work, we report a novel unsupervised domain adaptation framework for cross-modality liver segmentation via joint adversarial learning and self-learning. We propose joint semantic-aware and shape-entropy-aware adversarial learning with post-situ identification manner to implicitly align the distribution of task-related features extracted from the target domain with those from the source domain. In proposed framework, a network is trained with the above two adversarial losses in an unsupervised manner, and then a mean completer of pseudo-label generation is employed to produce pseudo-labels to train the next network (desired model). Additionally, semantic-aware adversarial learning and two self-learning methods, including pixel-adaptive mask refinement and student-to-partner learning, are proposed to train the desired model. To improve the robustness of the desired model, a low-signal augmentation function is proposed to transform MRI images as the input of the desired model to handle hard samples. Using the public data sets, our experiments demonstrated the proposed unsupervised domain adaptation framework reached four supervised learning methods with a Dice score 0.912 plus or minus 0.037 (mean plus or minus standard deviation).

Submitted 24 February, 2022; v1 submitted 12 September, 2021; originally announced September 2021.
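
The Dice score reported in the liver-segmentation abstract measures the overlap between a predicted mask P and a reference mask T as 2|P ∩ T| / (|P| + |T|). A minimal version for flattened binary masks follows; the small `eps` term is an assumption added so that two empty masks score 1.0 instead of dividing by zero.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], eps: float = 1e-8) -> float:
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return (2.0 * intersection + eps) / (size_sum + eps)
```

A score such as 0.912 ± 0.037 would then be the mean and standard deviation of this per-volume value over the test cases.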

arXiv:2108.11364 [pdf, other]

Blind Image Decomposition

Authors: Junlin Han, Weihao Li, Pengfei Fang, Chunyi Sun, Jie Hong, Mohammad Ali Armin, Lars Petersson, Hongdong Li

Abstract: We propose and study a novel task named Blind Image Decomposition (BID), which requires separating a superimposed image into constituent underlying images in a blind setting, that is, both the source components involved in mixing as well as the mixing mechanism are unknown. For example, rain may consist of multiple components, such as rain streaks, raindrops, snow, and haze. Rainy images can be treated as an arbitrary combination of these components, some of them or all of them. How to decompose superimposed images, like rainy images, into distinct source components is a crucial step toward real-world vision systems. To facilitate research on this new task, we construct multiple benchmark datasets, including mixed image decomposition across multiple domains, real-scenario deraining, and joint shadow/reflection/watermark removal. Moreover, we propose a simple yet general Blind Image Decomposition Network (BIDeN) to serve as a strong baseline for future work. Experimental results demonstrate the tenability of our benchmarks and the effectiveness of BIDeN.

Submitted 18 July, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

Comments: ECCV 2022. Project page: https://junlinhan.github.io/projects/BID.html. Code: https://github.com/JunlinHan/BID

arXiv:2009.00863 [pdf]

doi 10.1049/rpg2.12195

Power Management of Nanogrid Cluster with P2P Electricity Trading Based on Future Trends of Load Demand and PV Power Production

Authors: Sangkeum Lee, Hojun Jin, Luiz Felipe Vecchietti, Junhee Hong, Ki-Bum Park, Dongsoo Har

Abstract: This paper presents the power management of nanogrid clusters assisted by a novel peer-to-peer (P2P) electricity trading scheme. In our work, imbalance of power consumption among clusters is mitigated by the proposed P2P trading method. For power management of individual clusters, multi-objective optimization simultaneously minimizing total power consumption, portion of grid power consumption, and total delay incurred by scheduling is attempted. A photovoltaic (PV) system is adopted for each cluster as a secondary, renewable power source. The temporary surplus of self-supplied PV power of a cluster can be sold through P2P trading to other clusters experiencing a temporary power shortage. A cluster in temporary shortage of electric power buys the PV power to reduce peak load and total delay. In P2P trading, a cooperative game model is used for buyers and sellers to maximize their welfare. To increase P2P trading efficiency, future trends of load demand and PV power production are considered for power management of each cluster to resolve instantaneous imbalance between load demand and PV power production. To this end, a gated recurrent unit network is used to forecast future load demand and future PV power production. Simulations verify the effectiveness of the proposed P2P trading for nanogrid clusters.

Submitted 2 December, 2020; v1 submitted 2 September, 2020; originally announced September 2020.

Comments: This article is submitted for publication in Sustainable Cities and Society

arXiv:2003.14373 [pdf]

doi 10.1088/1361-6501/abae90

Machine learning shadowgraph for particle size and shape characterization

Authors: Jiaqi Li, Siyao Shao, Jiarong Hong

Abstract: Conventional image processing for particle shadow images is usually time-consuming and suffers from degraded image segmentation when dealing with images consisting of complex-shaped and clustered particles with varying backgrounds. In this paper, we introduce a robust learning-based method using a single convolutional neural network (CNN) for analyzing particle shadow images. Our approach employs a two-channel-output U-net model to generate a binary particle image and a particle centroid image. The binary particle image is subsequently segmented through a marker-controlled watershed approach with the particle centroid image as the marker image. The assessment of this method on both synthetic and experimental bubble images has shown better performance compared to the state-of-the-art non-machine-learning method. The proposed machine learning shadow image processing approach provides a promising tool for real-time particle image analysis.

Submitted 31 March, 2020; originally announced March 2020.

Comments: 11 pages, 6 figures
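
The shadowgraph abstract segments the binary particle image by marker-controlled watershed, with the predicted centroid image supplying the markers. The pure-Python priority-flood below is a simplified sketch of that step, not the authors' code: it assumes integer marker labels (0 for unlabelled pixels), 4-connectivity, and an optional binary mask restricting flooding to particle pixels, and it does not draw explicit watershed lines between basins. A common choice of elevation map is the negated distance transform of the binary mask; that choice is an assumption here.

```python
import heapq
from typing import List, Optional

Grid = List[List[float]]


def marker_watershed(elevation: Grid, markers: List[List[int]],
                     mask: Optional[List[List[int]]] = None) -> List[List[int]]:
    """Flood `elevation` outward from labelled seeds; returns a label image."""
    rows, cols = len(elevation), len(elevation[0])
    labels = [row[:] for row in markers]  # 0 = not yet assigned
    heap = []
    order = 0  # FIFO tie-breaker so equal elevations are processed deterministically
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] > 0:
                heapq.heappush(heap, (elevation[r][c], order, r, c))
                order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)  # the lowest pending pixel floods next
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and labels[nr][nc] == 0 and (mask is None or mask[nr][nc]):
                labels[nr][nc] = labels[r][c]  # inherit the basin that reached it first
                heapq.heappush(heap, (elevation[nr][nc], order, nr, nc))
                order += 1
    return labels
```

Pixels outside the mask keep label 0, so touching particles are split only where their seeded basins meet.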

arXiv:2003.03053 [pdf, ps, other]

Experimental Demonstration of Location-aware Beam Alignment

Authors: Junyeol Hong, Hyeonjin Chung, Sunwoo Kim

Abstract: The main focus of beam alignment is to find, quickly, the optimal beam, i.e., the one that yields the largest received signal strength (RSS). In this paper, we demonstrate an efficient beam alignment scheme with our testbed. The algorithm we test uses location information for computationally efficient beam alignment. The testbed transmits and receives a 13.8 GHz signal and steers a beam on both the transmitter and the receiver with various radio frequency (RF) components. The location information is estimated with an indoor positioning module. The experiment shows that the location-aware algorithm significantly reduces the time required for beam alignment compared with exhaustive search.

Submitted 6 March, 2020; originally announced March 2020.

Comments: 4 pages, 6 figures
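
The location-aware beam alignment abstract replaces exhaustive beam search with a choice informed by estimated positions. A minimal 2D, line-of-sight sketch of that idea follows: compute the azimuth from transmitter to receiver and pick the codebook beam pointing closest to it. The codebook, coordinate frame, and function names are hypothetical, and the testbed's actual algorithm may also refine the result around the selected beam.

```python
import math
from typing import Sequence, Tuple

Position = Tuple[float, float]


def steering_angle_deg(tx: Position, rx: Position) -> float:
    """Line-of-sight azimuth from tx to rx, in degrees from the +x axis."""
    return math.degrees(math.atan2(rx[1] - tx[1], rx[0] - tx[0]))


def location_aided_beam(codebook_deg: Sequence[float], tx: Position, rx: Position) -> int:
    """Index of the codebook beam whose pointing angle is closest to the LOS direction."""
    target = steering_angle_deg(tx, rx)

    def angular_gap(i: int) -> float:
        # Wrap the difference into [-180, 180) so that, e.g., 179 and -179 are 2 degrees apart.
        return abs((codebook_deg[i] - target + 180.0) % 360.0 - 180.0)

    return min(range(len(codebook_deg)), key=angular_gap)
```

By contrast, an exhaustive search measures RSS for every transmit/receive beam pair, i.e. N_tx × N_rx measurements, which is where the time saving of a position-based choice comes from.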

arXiv:1912.13036 [pdf]

Machine learning holography for measuring 3D particle size distribution

Authors: Siyao Shao, Kevin Mallery, Jiarong Hong

Abstract: Particle size measurement based on digital holography with conventional algorithms is usually time-consuming and susceptible to noise associated with hologram quality and particle complexity, limiting its usage in a broad range of engineering applications and fundamental research. We propose a learning-based hologram processing method to cope with the aforementioned issues. The proposed approach uses a modified U-net architecture with three input channels and two output channels, and specially designed loss functions. The proposed method has been assessed using synthetic, manually labeled experimental, and water tunnel bubbly flow data containing particles of different shapes. The results demonstrate that our approach can achieve better performance in comparison to the state-of-the-art non-machine-learning methods in terms of particle extraction rate and positioning accuracy with significantly improved processing speed. Our learning-based approach can be extended to other types of image-based particle size measurements.

Submitted 30 December, 2019; originally announced December 2019.

Comments: 14 pages, 6 figures

arXiv:1912.06258 [pdf, other]

Mcity Data Collection for Automated Vehicles Study

Authors: Yiqun Dong, Yuanxin Zhong, Wenbo Yu, Minghan Zhu, Pingping Lu, Yeyang Fang, Jiajun Hong, Huei Peng

Abstract: The main goal of this paper is to introduce the data collection effort at Mcity targeting automated vehicle development. We captured a comprehensive set of data from a set of perception sensors (Lidars, Radars, Cameras) as well as vehicle steering/brake/throttle inputs and an RTK unit. Two in-cabin cameras record the human driver's behaviors for possible future use. The naturalistic driving on selected open roads is recorded at different times of day and in different weather conditions. We also perform designed choreography data collection inside the Mcity test facility focusing on vehicle-to-vehicle and vehicle-to-vulnerable-road-user interactions, which is quite unique among existing open-source datasets. The vehicle platform, data content, tags/labels, and selected analysis results are shown in this paper.

Submitted 12 December, 2019; originally announced December 2019.

arXiv:1911.08656 [pdf, other]

doi 10.1109/ICCVW.2019.00448

W-Net: Two-stage U-Net with misaligned data for raw-to-RGB mapping

Authors: Kwang-Hyun Uhm, Seung-Wook Kim, Seo-Won Ji, Sung-Jin Cho, Jun-Pyo Hong, Sung-Jea Ko

Abstract: Recent research on learning a mapping between raw Bayer images and RGB images has progressed with the development of deep convolutional neural networks. A challenging data set namely the Zurich Raw-to-RGB data set (ZRR) has been released in the AIM 2019 raw-to-RGB mapping challenge. In ZRR, input raw and target RGB images are captured by two different cameras and thus not perfectly aligned. Moreover, camera metadata such as white balance gains and color correction matrix are not provided, which makes the challenge more difficult. In this paper, we explore an effective network structure and a loss function to address these issues. We exploit a two-stage U-Net architecture and also introduce a loss function that is less variant to alignment and more sensitive to color differences. In addition, we show an ensemble of networks trained with different loss functions can bring a significant performance gain. We demonstrate the superiority of our method by achieving the highest score in terms of both the peak signal-to-noise ratio and the structural similarity and obtaining the second-best mean-opinion-score in the challenge.

Submitted 21 November, 2019; v1 submitted 19 November, 2019; originally announced November 2019.

Comments: Accepted by ICCVW 2019
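
The W-Net abstract reports results in terms of peak signal-to-noise ratio. For reference, PSNR is 10·log10(MAX² / MSE) in decibels, where MAX is the largest possible pixel value and MSE the mean squared error; the sketch below assumes flattened 8-bit images (MAX = 255) by default.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flattened images."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

An error of one grey level at every pixel (MSE = 1) gives 20·log10(255), about 48.1 dB.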

arXiv:1911.00805 [pdf]

doi 10.1364/OE.379480

Machine Learning Holography for 3D Particle Field Imaging

Authors: Siyao Shao, Kevin Mallery, Santosh Kumar, Jiarong Hong

Abstract: We propose a new learning-based approach for 3D particle field imaging using holography. Our approach uses a U-net architecture incorporating residual connections, Swish activation, hologram preprocessing, and transfer learning to cope with challenges arising in particle holograms where accurate measurement of individual particles is crucial. Assessments on both synthetic and experimental holograms demonstrate a significant improvement in particle extraction rate, localization accuracy and speed compared to prior methods over a wide range of particle concentrations, including highly-dense concentrations where other methods are unsuitable. Our approach can be potentially extended to other types of computational imaging tasks with similar features.

Submitted 2 November, 2019; originally announced November 2019.

Comments: 12 pages, 7 figures

arXiv:1910.04681 [pdf]

Laser scanning reflection-matrix microscopy for label-free in vivo imaging of a mouse brain through an intact skull

Authors: Seokchan Yoon, Hojun Lee, Jin Hee Hong, Yong-Sik Lim, Wonshik Choi

Abstract: We present a laser scanning reflection-matrix microscopy combining the scanning of laser focus and the wide-field mapping of the electric field of the backscattered waves for eliminating higher-order aberrations even in the presence of strong multiple light scattering noise. Unlike conventional confocal laser scanning microscopy, we record the amplitude and phase maps of reflected waves from the sample not only at the confocal pinhole, but also at other non-confocal points. These additional measurements lead us to constructing a time-resolved reflection matrix, with which the sample-induced aberrations for the illumination and detection pathways are separately identified and corrected. We realized in vivo reflectance imaging of myelinated axons through an intact skull of a living mouse with the spatial resolution close to the ideal diffraction limit. Furthermore, we demonstrated near-diffraction-limited multiphoton imaging through an intact skull by physically correcting the aberrations identified from the reflection matrix. The proposed method is expected to extend the range of applications, where the knowledge of the detailed microscopic information deep within biological tissues is critical.

Submitted 8 October, 2019; originally announced October 2019.

Comments: 14 pages, 4 figures

arXiv:1904.04884 [pdf, ps, other]

doi 10.1364/OE.27.018069

Regularized Inverse Holographic Volume Reconstruction for 3D Particle Tracking

Authors: Kevin Mallery, Jiarong Hong

Abstract: The key limitations of digital inline holography (DIH) for particle tracking applications are poor longitudinal resolution, particle concentration limits, and case-specific processing. We utilize an inverse problem method with fused lasso regularization to perform full volumetric reconstructions of particle fields. By exploiting data sparsity in the solution and utilizing GPU processing, we dramatically reduce the computational cost usually associated with inverse reconstruction approaches. We demonstrate the accuracy of the proposed method using synthetic and experimental holograms. Finally, we present two practical applications (high concentration microorganism swimming and microfiber rotation) to extend the capabilities of DIH beyond what was possible using prior methods.

Submitted 9 April, 2019; originally announced April 2019.

Comments: 15 pages, 6 figures
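
The fused lasso regularization in the holographic-reconstruction abstract combines a data-fidelity term with an L1 sparsity penalty and an L1 penalty on differences between neighbouring elements: F(x) = ½‖y − Ax‖² + λ₁‖x‖₁ + λ₂ Σᵢ |x_{i+1} − x_i|. The sketch below only evaluates that objective for a 1D signal, as an illustration; the paper's 3D neighbourhood, forward operator, weighting, and GPU solver are not reproduced, and `A`, `lam_sparse`, and `lam_fused` are hypothetical names.

```python
from typing import Sequence


def fused_lasso_objective(A: Sequence[Sequence[float]], x: Sequence[float], y: Sequence[float],
                          lam_sparse: float, lam_fused: float) -> float:
    """0.5*||y - A x||^2 + lam_sparse*||x||_1 + lam_fused*sum_i |x[i+1] - x[i]| (1D case)."""
    residual = [yi - sum(a * xj for a, xj in zip(row, x)) for row, yi in zip(A, y)]
    data_term = 0.5 * sum(r * r for r in residual)
    sparsity = lam_sparse * sum(abs(v) for v in x)                            # few occupied voxels
    fusion = lam_fused * sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))  # piecewise-constant
    return data_term + sparsity + fusion
```

With A the 3×3 identity, y = (1, 1, 0), λ₁ = 0.5 and λ₂ = 2, the exact fit x = y costs 3.0 while the all-zero solution costs only 1.0, which shows how strongly the penalties trade data fidelity for sparsity and smoothness.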

arXiv:1903.09495 [pdf, other]

Substation One-Line Diagram Automatic Generation and Visualization

Authors: Jing Hong, Yue Li, Yiran Xu, Chen Yuan, Hong Fan, Guangyi Liu, Renchang Dai

Abstract: In Energy Management System (EMS) applications and many other off-line planning and study tools, the one-line diagram (OLND) of the whole system and of its stations is a straightforward view for planners and operators to design, monitor, analyze, and control the power system. Large-scale power system OLNDs are usually developed and maintained manually. The work is tedious, time-consuming, and prone to mistakes. Meanwhile, manually created diagrams are hard to share between on-line and off-line systems. To save the time and effort needed to draw and maintain OLNDs, and to make OLNDs shareable, a tool that automatically generates substation OLNDs based on the Common Information Model (CIM) standard is needed. Currently, there is no standard rule for drawing a substation OLND. Besides, substation layouts can deviate from the typical formats in textbooks owing to factors of economy, efficiency, engineering practice, etc. This paper presents a tool for substation OLND automatic generation and visualization. This tool takes the substation CIM/E model as input, then automatically computes the coordinates of all components and generates the substation OLND based on its component attributes and connectivity relations. Evaluation of the proposed approach is presented using a real provincial power system. Over 95% of substation OLNDs are decently presented, and the rest are corner cases that need extra effort for specific reconfiguration.

Submitted 20 March, 2019; originally announced March 2019.

Comments: 6 pages, 6 figures, 1 table, accepted by 2019 IEEE PES ISGT ASIA

arXiv:1805.00367 [pdf, other]

A Multi-State Diagnosis and Prognosis Framework with Feature Learning for Tool Condition Monitoring

Authors: Chong Zhang, Geok Soon Hong, Jun-Hong Zhou, Kay Chen Tan, Haizhou Li, Huan Xu, Jihoon Hong, Hian-Leng Chan

Abstract: In this paper, a multi-state diagnosis and prognosis (MDP) framework is proposed for tool condition monitoring via a deep belief network based multi-state approach (DBNMS). For fault diagnosis, a cost-sensitive deep belief network (namely ECS-DBN) is applied to deal with the imbalanced data problem for tool state estimation. An appropriate prognostic degradation model is then applied for tool wear estimation based on the different tool states. The proposed framework has the advantage of automatic feature representation learning and shows better performance in accuracy and robustness. The effectiveness of the proposed DBNMS is validated using a real-world dataset obtained from the gun drilling process. This dataset contains a large amount of measured signals involving different tool geometries under various operating conditions. The DBNMS is examined for both the tool state estimation and tool wear estimation tasks. In the experimental studies, the prediction results are evaluated and compared with popular machine learning approaches, which show the superior performance of the proposed DBNMS approach.

Submitted 30 April, 2018; originally announced May 2018.

Comments: 14 pages, 12 figures, 10 tables, submitted to IEEE Transactions on Cybernetics
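
The ECS-DBN abstract handles imbalanced tool-state data with cost-sensitive learning. The paper's costs are specific to its method; as a simpler, clearly different stand-in, the sketch below uses fixed inverse-frequency class weights w_c = N / (C · n_c), where N is the number of samples, C the number of classes and n_c the count of class c, inside a weighted cross-entropy so that errors on rare tool states cost more.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """w_c = N / (C * n_c): rarer classes receive proportionally larger misclassification costs."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int],
                           weights: Dict[int, float]) -> float:
    """Mean of -w_y * log(p_y) over samples, where p_y is the predicted probability of the true class."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)
```

With three "normal" samples and one "worn" sample, the weights are 2/3 and 2, so a single misclassified worn tool costs as much as three misclassified normal ones.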

Showing 1–49 of 49 results for author: Hong, J