Using Photoplethysmography to Detect Real-time Blood Pressure Changes with a Calibration-free Deep Learning Model

Authors: Jingyuan Hong, Manasi Nandi, Weiwei Jin, Jordi Alastruey

Abstract: Blood pressure (BP) changes are linked to individual health status in both clinical and non-clinical settings. This study developed a deep learning model to classify systolic (SBP), diastolic (DBP), and mean (MBP) BP changes using photoplethysmography (PPG) waveforms. Data from the Vital Signs Database (VitalDB) comprising 1,005 ICU patients with synchronized PPG and BP recordings was used. BP changes were categorized into three labels: Spike (increase above a threshold), Stable (change within a plus or minus threshold), and Dip (decrease below a threshold). Four time-series classification models were studied: multi-layer perceptron, convolutional neural network, residual network, and Encoder. A subset of 500 patients was randomly selected for training and validation, ensuring a uniform distribution across BP change labels. Two test datasets were compiled: Test-I (n=500) with a uniform distribution selection process, and Test-II (n=5) without. The study also explored the impact of including second-deviation PPG (sdPPG) waveforms as additional input information. The Encoder model with a Softmax weighting process using both PPG and sdPPG waveforms achieved the highest detection accuracy--exceeding 71.3% and 85.4% in Test-I and Test-II, respectively, with thresholds of 30 mmHg for SBP, 15 mmHg for DBP, and 20 mmHg for MBP. Corresponding F1-scores were over 71.8% and 88.5%. These findings confirm that PPG waveforms are effective for real-time monitoring of BP changes in ICU settings and suggest potential for broader applications.
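The three-label scheme in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The thresholds (30, 15, and 20 mmHg) come from the abstract; the function name, the dictionary, and the idea that a signed BP change in mmHg is already available are assumptions made for illustration, since the abstract does not say how the change is computed or how values exactly at the threshold are handled.

```python
# Thresholds in mmHg as reported in the abstract.
THRESHOLDS_MMHG = {"SBP": 30.0, "DBP": 15.0, "MBP": 20.0}


def label_bp_change(delta_mmhg: float, bp_type: str) -> str:
    """Map a signed BP change (hypothetical input) to Spike, Stable, or Dip.

    Treating a change exactly equal to the threshold as Stable is an
    assumption; the abstract does not specify the boundary case.
    """
    threshold = THRESHOLDS_MMHG[bp_type]
    if delta_mmhg > threshold:
        return "Spike"
    if delta_mmhg < -threshold:
        return "Dip"
    return "Stable"
```

For example, under these assumptions a systolic change of +35 mmHg is labeled Spike, while a diastolic change of -10 mmHg is labeled Stable.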

Submitted 3 July, 2024; originally announced July 2024.

Comments: 8 pages, 5 figures, 7 tables, 1 supplementary material

arXiv:2405.09470 [pdf, other]

Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

Authors: Weifei Jin, Yuxin Cao, Junjie Su, Qi Shen, Kai Ye, Derui Wang, Jie Hao, Ziyao Liu

Abstract: In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of Style Transfer Attack (STA) which combines style transfer and adversarial attack in sequential order. And then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks, while keeping sound naturalness due to our user study.

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted to SecTL (AsiaCCS Workshop) 2024

arXiv:2402.16003 [pdf, other]

Exploring the Power of Pure Attention Mechanisms in Blind Room Parameter Estimation

Authors: Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao, Wenyu Jin

Abstract: Dynamic parameterization of acoustic environments has drawn widespread attention in the field of audio processing. Precise representation of local room acoustic characteristics is crucial when designing audio filters for various audio rendering applications. Key parameters in this context include reverberation time (RT60) and geometric room volume. In recent years, neural networks have been extensively applied in the task of blind room parameter estimation. However, there remains a question of whether pure attention mechanisms can achieve superior performance in this task. To address this issue, this study employs blind room parameter estimation based on monaural noisy speech signals. Various model architectures are investigated, including a proposed attention-based model. This model is a convolution-free Audio Spectrogram Transformer, utilizing patch splitting, attention mechanisms, and cross-modality transfer learning from a pretrained Vision Transformer. Experimental results suggest that the proposed attention mechanism-based model, relying purely on attention mechanisms without using convolution, exhibits significantly improved performance across various room parameter estimation tasks, especially with the help of dedicated pretraining and data augmentation schemes. Additionally, the model demonstrates more advantageous adaptability and robustness when handling variable-length audio inputs compared to existing methods.

Submitted 25 April, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

Comments: 28 pages, 9 figures, accepted for publishing to EURASIP Journal On Audio Speech And Music Processing

arXiv:2311.15313 [pdf, ps, other]

Low-Complexity Joint Beamforming for RIS-Assisted MU-MISO Systems Based on Model-Driven Deep Learning

Authors: Weijie Jin, Jing Zhang, Chao-Kai Wen, Shi Jin, Xiao Li, Shuangfeng Han

Abstract: Reconfigurable intelligent surfaces (RIS) can improve signal propagation environments by adjusting the phase of the incident signal. However, optimizing the phase shifts jointly with the beamforming vector at the access point is challenging due to the non-convex objective function and constraints. In this study, we propose an algorithm based on weighted minimum mean square error optimization and power iteration to maximize the weighted sum rate (WSR) of a RIS-assisted downlink multi-user multiple-input single-output system. To further improve performance, a model-driven deep learning (DL) approach is designed, where trainable variables and graph neural networks are introduced to accelerate the convergence of the proposed algorithm. We also extend the proposed method to include beamforming with imperfect channel state information and derive a two-timescale stochastic optimization algorithm. Simulation results show that the proposed algorithm outperforms state-of-the-art algorithms in terms of complexity and WSR. Specifically, the model-driven DL approach has a runtime that is approximately 3% of the state-of-the-art algorithm to achieve the same performance. Additionally, the proposed algorithm with 2-bit phase shifters outperforms the compared algorithm with continuous phase shift.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 14 pages, 9 figures, 2 tables. This paper has been accepted for publication by the IEEE Transactions on Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2309.13504 [pdf, other]

Attention Is All You Need For Blind Room Volume Estimation

Authors: Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao, Wenyu Jin

Abstract: In recent years, dynamic parameterization of acoustic environments has raised increasing attention in the field of audio processing. One of the key parameters that characterize the local room acoustics in isolation from orientation and directivity of sources and receivers is the geometric room volume. Convolutional neural networks (CNNs) have been widely selected as the main models for conducting blind room acoustic parameter estimation, which aims to learn a direct mapping from audio spectrograms to corresponding labels. With the recent trend of self-attention mechanisms, this paper introduces a purely attention-based model to blindly estimate room volumes based on single-channel noisy speech signals. We demonstrate the feasibility of eliminating the reliance on CNN for this task and the proposed Transformer architecture takes Gammatone magnitude spectral coefficients and phase spectrograms as inputs. To enhance the model performance given the task-specific dataset, cross-modality transfer learning is also applied. Experimental results demonstrate that the proposed model outperforms traditional CNN models across a wide range of real-world acoustics spaces, especially with the help of the dedicated pretraining and data augmentation schemes.

Submitted 27 December, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: 5 pages, 4 figures, to be published in proceedings of ICASSP 2024

arXiv:2309.01990 [pdf, other]

Dynamic distance-based pricing scheme for high-occupancy-toll lanes along a freeway corridor

Authors: Irene Martínez, Wen-Long Jin

Abstract: Single-occupancy vehicles (SOVs) are charged to use the high-occupancy-toll (HOT) lanes, while high-occupancy-vehicles (HOVs) can drive in them at no cost. The pricing scheme for HOT lanes has been extensively studied at local bottlenecks or at the network level through computationally expensive simulations. However, the HOT lane pricing study on a freeway corridor with multiple origins and destinations as well as multiple interacting bottlenecks is a challenging problem for which no analytical results are available. In this paper, we attempt to fill the gap by proposing to study the traffic dynamics in the corridor based on the relative space paradigm. In this new paradigm, the interaction of multiple bottlenecks and trips can be captured with Vickrey's bathtub model by a simple ordinary differential equation. We consider three types of lane choice behavior and analyze their properties. Then, we propose a distance-based dynamic pricing scheme based on a linear combination of I-controllers. This closed-loop controller is independent of the model and feeds back the travel time difference between HOT lanes and general-purpose lanes. Given the mathematical tractability of the system model, we analytically study the performance of the proposed closed-loop control under constant demand and show the existence and stability of the optimal equilibrium. Finally, we verify the results with numerical simulations considering a typical peak period demand pattern.
In the future, we are interested in extending this work and testing the performance of the proposed linear combination of I-controllers for other traffic flow models.

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2309.01970 [pdf, other]

Priority Queue Formulation of Agent-Based Bathtub Model for Network Trip Flows in the Relative Space

Authors: Irene Martinez, Wen-long Jin

Abstract: Agent-based models have been extensively used to simulate the behavior of travelers in transportation systems because they allow for realistic and versatile modeling of interactions. However, traditional agent-based models suffer from high computational costs and rely on tracking physical locations, raising privacy concerns. This paper proposes an efficient formulation for the agent-based bathtub model (AB2M) in the relative space, where each agent's trajectory is represented by a time series of the remaining distance to its destination. The AB2M can be understood as a microscopic model that tracks individual trips' initiation, progression, and completion and is an exact numerical solution of the bathtub model for generic (time-dependent) trip distance distributions. The model can be solved for a deterministic set of trips with a given demand pattern (defined by the start time of each trip and its distance), or it can be used to run Monte Carlo simulations to capture the average behavior and variation of stochastic demand patterns, described by probabilistic distributions of trip distances and departure times. To enhance the computational efficiency, we introduce a priority queue formulation, eliminating the need to update trip positions at each time step and allowing us to run large-scale scenarios with millions of individual trips in seconds. We systematically explore the scaling properties and discuss the introduction of biases and numerical errors.
The systematic exploration of scaling properties of the modeling of individual agents in the relative space with the AB2M further enhances its applicability to large-scale transportation systems and opens up opportunities for studying travel time reliability, scheduling, and mode choices.
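One way a priority queue can remove per-time-step position updates is sketched below. This is an illustrative sketch, not the AB2M implementation. It assumes that all active trips share one speed that depends only on the number of active trips, so that a single cumulative-distance counter can stand in for every trip's remaining distance. The names `simulate_bathtub` and `speed_of` are hypothetical.

```python
import heapq


def simulate_bathtub(trips, speed_of):
    """Event-driven sketch of a bathtub-style model (assumptions in lead-in).

    trips: list of (start_time, distance) pairs.
    speed_of: hypothetical callable giving the shared speed for n >= 1 active trips.
    Returns completion times in the same order as trips.
    """
    order = sorted(range(len(trips)), key=lambda i: trips[i][0])
    finish = [None] * len(trips)
    heap = []  # entries: (cumulative distance at which the trip ends, trip index)
    t = 0.0    # current time
    z = 0.0    # cumulative distance travelled by any continuously active trip
    k = 0      # index of the next departure in `order`
    while k < len(order) or heap:
        v = speed_of(len(heap)) if heap else 0.0
        next_start = trips[order[k]][0] if k < len(order) else float("inf")
        if heap and v > 0:
            next_finish = t + (heap[0][0] - z) / v
        else:
            next_finish = float("inf")
        if next_finish <= next_start:
            # Next event is a completion: pop the trip with the smallest target.
            target, i = heapq.heappop(heap)
            t, z = next_finish, target
            finish[i] = t
        else:
            if next_start == float("inf"):
                raise ValueError("speed is zero with active trips and no departures left")
            # Next event is a departure: advance the shared counter, then enqueue.
            z += v * (next_start - t)
            t = next_start
            i = order[k]
            k += 1
            heapq.heappush(heap, (z + trips[i][1], i))
    return finish
```

Each trip is pushed and popped once, so the cost grows with the number of trips and the logarithm of the number active at once, not with the number of time steps. With a constant speed of 1, `simulate_bathtub([(0.0, 10.0), (5.0, 2.0)], lambda n: 1.0)` returns `[10.0, 7.0]`.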

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2306.12562 [pdf, other]

Neural Spectro-polarimetric Fields

Authors: Youngchan Kim, Wonjoon Jin, Sunghyun Cho, Seung-Hwan Baek

Abstract: Modeling the spatial radiance distribution of light rays in a scene has been extensively explored for applications, including view synthesis. Spectrum and polarization, the wave properties of light, are often neglected due to their integration into three RGB spectral bands and their non-perceptibility to human vision. However, these properties are known to encompass substantial material and geometric information about a scene. Here, we propose to model spectro-polarimetric fields, the spatial Stokes-vector distribution of any light ray at an arbitrary wavelength. We present Neural Spectro-polarimetric Fields (NeSpoF), a neural representation that models the physically-valid Stokes vector at given continuous variables of position, direction, and wavelength. NeSpoF manages inherently noisy raw measurements, showcases memory efficiency, and preserves physically vital signals - factors that are crucial for representing the high-dimensional signal of a spectro-polarimetric field. To validate NeSpoF, we introduce the first multi-view hyperspectral-polarimetric image dataset, comprised of both synthetic and real-world scenes. These were captured using our compact hyperspectral-polarimetric imaging system, which has been calibrated for robustness against system imperfections. We demonstrate the capabilities of NeSpoF on diverse scenes.

Submitted 10 December, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

arXiv:2303.17614 [pdf, other]

Estimating Continuous Muscle Fatigue For Multi-Muscle Coordinated Exercise: A Pilot Study

Authors: Chunzhi Yi, Baichun Wei, Wei Jin, Jianfei Zhu, Seungmin Rho, Zhiyuan Chen, Feng Jiang

Abstract: Assessing the progression of muscle fatigue for daily exercises provides vital indicators for precise rehabilitation and personalized training dose, especially under the context of Metaverse. Assessing fatigue of multi-muscle coordination-involved daily exercises requires the neuromuscular features that represent the fatigue-induced characteristics of spatiotemporal adaptions of multiple muscles and the estimator that captures the time-evolving progression of fatigue. In this paper, we propose to depict fatigue by the features of muscle compensation and spinal module activation changes and estimate continuous fatigue by a physiological rationale model. First, we extract muscle synergy fractionation and the variance of spinal module spikings as features inspired by the prior of fatigue-induced neuromuscular adaptations. Second, we treat the features as observations and develop a Bayesian Gaussian process to capture the time-evolving progression. Third, we solve the issue of lacking supervision information by mathematically formulating the time-evolving characteristics of fatigue as the loss function. Finally, we adapt the metrics that follow the physiological principles of fatigue to quantitatively evaluate the performance. Our extensive experiments present a 0.99 similarity between days, a similarity of over 0.7 with other views of fatigue, and a weak monotonicity of nearly 1, which outperform other methods. This study aims at the objective assessment of muscle fatigue.

Submitted 29 March, 2023; originally announced March 2023.

Comments: submitted to IEEE JBHI

arXiv:2303.13032 [pdf]

V2V-based Collision-avoidance Decision Strategy for Autonomous Vehicles Interacting with Fully Occluded Pedestrians at Midblock on Multilane Roadways

Authors: Fengjiao Zou, Hsien-Wen Deng, Tsing-Un Iunn, Jennifer Harper Ogle, Weimin Jin

Abstract: Pedestrian occlusion is challenging for autonomous vehicles (AVs) at midblock locations on multilane roadways because an AV cannot detect crossing pedestrians that are fully occluded by downstream vehicles in adjacent lanes. This paper tests the capability of vehicle-to-vehicle (V2V) communication between an AV and its downstream vehicles to share midblock pedestrian crossings information. The researchers developed a V2V-based collision-avoidance decision strategy and compared it to a base scenario (i.e., decision strategy without the utilization of V2V). Simulation results showed that for the base scenario, the near-zero time-to-collision (TTC) indicated no time for the AV to take appropriate action and resulted in dramatic braking followed by collisions. But the V2V-based collision-avoidance decision strategy allowed for a proportional braking approach to increase the TTC allowing the pedestrian to cross safely. To conclude, the V2V-based collision-avoidance decision strategy has higher safety benefits for an AV interacting with fully occluded pedestrians at midblock locations on multilane roadways.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.07449 [pdf, other]

Blind Acoustic Room Parameter Estimation Using Phase Features

Authors: Christopher Ick, Adib Mehrabi, Wenyu Jin

Abstract: Modeling room acoustics in a field setting involves some degree of blind parameter estimation from noisy and reverberant audio. Modern approaches leverage convolutional neural networks (CNNs) in tandem with time-frequency representation. Using short-time Fourier transforms to develop these spectrogram-like features has shown promising results, but this method implicitly discards a significant amount of audio information in the phase domain. Inspired by recent works in speech enhancement, we propose utilizing novel phase-related features to extend recent approaches to blindly estimate the so-called "reverberation fingerprint" parameters, namely, volume and RT60. The addition of these features is shown to outperform existing methods that rely solely on magnitude-based spectral features across a wide range of acoustics spaces. We evaluate the effectiveness of the deployment of these novel features in both single-parameter and multi-parameter estimation strategies, using a novel dataset that consists of publicly available room impulse responses (RIRs), synthesized RIRs, and in-house measurements of real acoustic spaces.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 4 pages + 1 page bibliography, 3 figures, to be published in proceedings of ICASSP 2023

arXiv:2211.16657 [pdf, other]

Task-Driven Hybrid Model Reduction for Dexterous Manipulation

Authors: Wanxin Jin, Michael Posa

Abstract: In contact-rich tasks, like dexterous manipulation, the hybrid nature of making and breaking contact creates challenges for model representation and control. For example, choosing and sequencing contact locations for in-hand manipulation, where there are thousands of potential hybrid modes, is not generally tractable. In this paper, we are inspired by the observation that far fewer modes are actually necessary to accomplish many tasks. Building on our prior work learning hybrid models, represented as linear complementarity systems, we find a reduced-order hybrid model requiring only a limited number of task-relevant modes. This simplified representation, in combination with model predictive control, enables real-time control yet is sufficient for achieving high performance. We demonstrate the proposed method first on synthetic hybrid systems, reducing the mode count by multiple orders of magnitude while achieving task performance loss of less than 5%. We also apply the proposed method to a three-fingered robotic hand manipulating a previously unknown object. With no prior knowledge, we achieve state-of-the-art closed-loop performance within a few minutes of online learning, by collecting only a few thousand environment samples.

Submitted 28 February, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: Reproducing code: https://github.com/wanxinjin/Task-Driven-Hybrid-Reduction. This is a preprint. The published version can be accessed at IEEE Transactions on Robotics

arXiv:2210.09531

The Brain-Inspired Cooperative Shared Control for Brain-Machine Interface

Authors: Shengjie Zheng, Ling Liu, Junjie Yang, Lang Qian, Gang Gao, Xin Chen, Wenqi Jin, Chunshan Deng, Xiaojian Li

Abstract: In the practical application of brain-machine interface technology, the problem often faced is the low information content and high noise of the neural signals collected by the electrode and the difficulty of decoding by the decoder, which makes it difficult for the robot to obtain stable instructions to complete the task. The idea based on the principle of cooperative shared control can be achieved by extracting general motor commands from brain activity, while the fine details of the movement can be hosted to the robot for completion, or the brain can have complete control. This study proposes a brain-machine interface shared control system based on spiking neural networks for robotic arm movement control and wheeled robots wheel speed control and steering, respectively. The former can reliably control the robotic arm to move to the destination position, while the latter controls the wheeled robots for object tracking and map generation. The results show that the shared control based on brain-inspired intelligence can perform some typical tasks in complex environments and positively improve the fluency and ease of use of brain-machine interaction, and also demonstrate the potential of this control method in clinical applications of brain-machine interfaces.

Submitted 25 June, 2024; v1 submitted 17 October, 2022; originally announced October 2022.

Comments: This article need to update the corrected figure and data

arXiv:2209.15090 [pdf, other]

Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments

Authors: Yixuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, Qi Zhu

Abstract: It is quite challenging to ensure the safety of reinforcement learning (RL) agents in an unknown and stochastic environment under hard constraints that require the system state not to reach certain specified unsafe regions. Many popular safe RL methods such as those based on the Constrained Markov Decision Process (CMDP) paradigm formulate safety violations in a cost function and try to constrain the expectation of cumulative cost under a threshold. However, it is often difficult to effectively capture and enforce hard reachability-based safety constraints indirectly with such constraints on safety violation costs. In this work, we leverage the notion of barrier function to explicitly encode the hard safety constraints, and given that the environment is unknown, relax them to our design of \emph{generative-model-based soft barrier functions}. Based on such soft barriers, we propose a safe RL approach that can jointly learn the environment and optimize the control policy, while effectively avoiding unsafe regions with safety probability optimization. Experiments on a set of examples demonstrate that our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.

Submitted 13 June, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: Accepted to ICML 2023

arXiv:2209.12017 [pdf, other]

Cooperative Tuning of Multi-Agent Optimal Control Systems

Authors: Zehui Lu, Wanxin Jin, Shaoshuai Mou, Brian D. O. Anderson

Abstract: This paper investigates the problem of cooperative tuning of multi-agent optimal control systems, where a network of agents (i.e. multiple coupled optimal control systems) adjusts parameters in their dynamics, objective functions, or controllers in a coordinated way to minimize the sum of their loss functions. Different from classical techniques for tuning parameters in a controller, we allow tunable parameters appearing in both the system dynamics and the objective functions of each agent. A framework is developed to allow all agents to reach a consensus on the tunable parameter, which minimizes team loss. The key idea of the proposed algorithm rests on the integration of consensus-based distributed optimization for a multi-agent system and a gradient generator capturing the optimal performance as a function of the parameter in the feedback loop tuning the parameter for each agent. Both theoretical results and simulations for a synchronous multi-agent rendezvous problem are provided to validate the proposed method for cooperative tuning of multi-agent optimal control.

Submitted 24 September, 2022; originally announced September 2022.

arXiv:2207.08892 [pdf, other]

Distributed Differentiable Dynamic Game for Multi-robot Coordination

Authors: Xuan Wang, Yizhi Zhou, Wanxin Jin

Abstract: This paper develops a Distributed Differentiable Dynamic Game (D3G) framework, which can efficiently solve the forward and inverse problems in multi-robot coordination. We formulate multi-robot coordination as a dynamic game, where the behavior of a robot is dictated by its own dynamics and objective that also depends on others' behavior. In the forward problem, D3G enables all robots collaboratively to seek the Nash equilibrium of the game in a distributed manner, by developing a distributed shooting-based Nash solver. In the inverse problem, where each robot aims to find (learn) its objective (and dynamics) parameters to mimic given coordination demonstrations, D3G proposes a differentiation solver based on Differential Pontryagin's Maximum Principle, which allows each robot to update its parameters in a distributed and coordinated manner. We test the D3G in simulation with two types of robots given different task configurations. The results demonstrate the effectiveness of D3G for solving both forward and inverse problems in comparison with existing methods.

Submitted 27 November, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

arXiv:2206.15356 [pdf, other]

Acoustic Room Compensation Using Local PCA-based Room Average Power Response Estimation

Authors: Wenyu Jin, Patrick McPherson, Chris Pike, Adib Mehrabi

Abstract: Acoustic room compensation techniques, which allow a sound reproduction system to counteract undesired alteration to the sound scene due to excessive room resonances, have been widely studied. Extensive efforts have been reported to enlarge the region over which room equalization is effective and to contrast variations of room transfer functions in space. A speaker-tuning technology "Trueplay" allows users to compensate for undesired room effects over an extended listening area based on a spatially averaged power response of the room, which is conventionally measured using microphones on portable devices when users move around the room. In this work, we propose a novel system that leverages measured speaker echo path self-responses to predict the room average power responses using a local PCA based approach. Experimental results confirm the effectiveness of the proposed estimation method, which further leads to a room compensation filter design that achieves a good sound similarity compared to the reference system with the ground-truth room average power response while outperforming other systems that do not leverage the proposed estimator.

Submitted 23 July, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

Comments: 5 pages, 7 figures, to appear in IWAENC 2022

arXiv:2202.10553 [pdf, other]

doi 10.1016/j.media.2022.102684

Guidelines and Evaluation of Clinical Explainable AI in Medical Image Analysis

Authors: Weina Jin, Xiaoxiao Li, Mostafa Fatehi, Ghassan Hamarneh

Abstract: Explainable artificial intelligence (XAI) is essential for enabling clinical users to get informed decision support from AI and comply with evidence-based medical practice. Applying XAI in clinical settings requires proper evaluation criteria to ensure the explanation technique is both technically sound and clinically useful, but specific support is lacking to achieve this goal. To bridge the research gap, we propose the Clinical XAI Guidelines that consist of five criteria a clinical XAI needs to be optimized for. The guidelines recommend choosing an explanation form based on Guideline 1 (G1) Understandability and G2 Clinical relevance. For the chosen explanation form, its specific XAI technique should be optimized for G3 Truthfulness, G4 Informative plausibility, and G5 Computational efficiency. Following the guidelines, we conducted a systematic evaluation on a novel problem of multi-modal medical image explanation with two clinical tasks, and proposed new evaluation metrics accordingly. Sixteen commonly-used heatmap XAI techniques were evaluated and found to be insufficient for clinical use due to their failure in G3 and G4. Our evaluation demonstrated the use of Clinical XAI Guidelines to support the design and evaluation of clinically viable XAI.

Submitted 8 December, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Code: http://github.com/weinajin/multimodal_explanation, Supplementary Material S1 and S2: https://github.com/weinajin/multimodal_explanation/tree/main/paper

MSC Class: 92C55; 92C50; 68T45; 68T01

Journal ref: Medical Image Analysis, 2022

arXiv:2112.13284 [pdf, other]

Learning Linear Complementarity Systems

Authors: Wanxin Jin, Alp Aydinoglu, Mathew Halm, Michael Posa

Abstract: This paper investigates the learning, or system identification, of a class of piecewise-affine dynamical systems known as linear complementarity systems (LCSs). We propose a violation-based loss which enables efficient learning of the LCS parameterization, without prior knowledge of the hybrid mode boundaries, using gradient-based methods. The proposed violation-based loss incorporates both dynamics prediction loss and a novel complementarity - violation loss. We show several properties attained by this loss formulation, including its differentiability, the efficient computation of first- and second-order derivatives, and its relationship to the traditional prediction loss, which strictly enforces complementarity. We apply this violation-based loss formulation to learn LCSs with tens of thousands of (potentially stiff) hybrid modes. The results demonstrate a state-of-the-art ability to identify piecewise-affine dynamics, outperforming methods which must differentiate through non-smooth linear complementarity problems.

Submitted 25 December, 2021; originally announced December 2021.

Comments: 10 pages

arXiv:2111.12869 [pdf, other]

Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation

Authors: Wangkai Jin, Junyu Liu, Jianfeng Ren, Xiangjun Peng

Abstract: The challenges of polyphonic sound event detection (PSED) stem from the detection of multiple overlapping events in a time series. Recent efforts exploit Deep Neural Networks (DNNs) on Time-Frequency Representations (TFRs) of audio clips as model inputs to mitigate such issues. However, existing solutions often rely on a single type of TFR, which causes under-utilization of input features. To this end, we propose a novel PSED framework, which incorporates Multi-Type-Multi-Scale TFRs. Our key insight is that TFRs of different types or at different scales can reveal acoustic patterns in a complementary manner, so that the overlapped events can be best extracted by combining different TFRs. Moreover, our framework design applies a novel approach to adaptively fuse different models and TFRs symbiotically. Hence, the overall performance can be significantly improved. We quantitatively examine the benefits of our framework by using Capsule Neural Networks, a state-of-the-art approach for PSED. The experimental results show that our method achieves a reduction of 7% in error rate compared with the state-of-the-art solutions on the TUT-SED 2016 dataset.

Submitted 24 November, 2021; originally announced November 2021.

Comments: Under review at ICASSP 2022

arXiv:2110.04385 [pdf, other]

Individualized Hear-through For Acoustic Transparency Using PCA-Based Sound Pressure Estimation At The Eardrum

Authors: Wenyu Jin, Tim Schoof, Henning Schepker

Abstract: The hear-through functionality on hearing devices, which allows hearing equivalent to the open-ear while providing the possibility to modify the sound pressure at the eardrum in a desired manner, has drawn great attention from researchers in recent years. To this end, the output of the device is processed by means of an equalization filter, such that the transfer function between external sound sources and the eardrum is equivalent for the open-ear and the aided condition with the device in the ear. To achieve an ideal performance, the equalization filter design assumes the exact knowledge of all the relevant acoustic transfer functions. A particular challenge is the transfer function between the hearing device receiver and the eardrum, which is difficult to obtain in practice as it requires additional probe-tube measurements. In this work, we address this issue by proposing an individualized hear-through equalization filter design that leverages the measurement of the so-called secondary path to predict the sound pressure at the eardrum. Experimental results using real-ear measured transfer functions confirm that the proposed method achieves a good sound quality compared to the open-ear while outperforming filter designs that do not leverage the proposed estimator.

Submitted 17 February, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

Comments: 5 pages, 5 figures, accepted to ICASSP 2022

arXiv:2105.14937 [pdf, other]

Safe Pontryagin Differentiable Programming

Authors: Wanxin Jin, Shaoshuai Mou, George J. Pappas

Abstract: We propose a Safe Pontryagin Differentiable Programming (Safe PDP) methodology, which establishes a theoretical and algorithmic framework to solve a broad class of safety-critical learning and control tasks -- problems that require the guarantee of safety constraint satisfaction at any stage of the learning and control progress. In the spirit of interior-point methods, Safe PDP handles different types of system constraints on states and inputs by incorporating them into the cost or loss through barrier functions. We prove three fundamentals of the proposed Safe PDP: first, both the solution and its gradient in the backward pass can be approximated by solving their more efficient unconstrained counterparts; second, the approximation for both the solution and its gradient can be controlled for arbitrary accuracy by a barrier parameter; and third, importantly, all intermediate results throughout the approximation and optimization strictly respect the constraints, thus guaranteeing safety throughout the entire learning and control process. We demonstrate the capabilities of Safe PDP in solving various safety-critical tasks, including safe policy optimization, safe motion planning, and learning MPCs from demonstrations, on different challenging systems such as 6-DoF maneuvering quadrotor and 6-DoF rocket powered landing.

Submitted 25 October, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: This paper has been accepted by NeurIPS 2021

arXiv:2104.13656 [pdf, ps, other]

Adaptive Channel Estimation Based on Model-Driven Deep Learning for Wideband mmWave Systems

Authors: Weijie Jin, Hengtao He, Chao-Kai Wen, Shi Jin, Geoffrey Ye Li

Abstract: Channel estimation in wideband millimeter-wave (mmWave) systems is very challenging due to the beam squint effect. To solve the problem, we propose a learnable iterative shrinkage thresholding algorithm-based channel estimator (LISTA-CE) based on deep learning. The proposed channel estimator can learn to transform the beam-frequency mmWave channel into the domain with sparse features through training data. The transform domain enables us to adopt a simple denoiser with few trainable parameters. We further enhance the adaptivity of the estimator by introducing hypernetwork to automatically generate learnable parameters for LISTA-CE online. Simulation results show that the proposed approach can significantly outperform the state-of-the-art deep learning-based algorithms with lower complexity and fewer parameters and adapt to new scenarios rapidly.

Submitted 20 September, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: 6 pages, 8 figures, 1 table. Accepted by IEEE GLOBECOM 2021
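
Note: for readers unfamiliar with the iterative shrinkage thresholding algorithm that LISTA-CE builds on, a minimal sketch of the classical, unlearned ISTA iteration follows. It is not the authors' estimator: LISTA-style methods replace the fixed matrices and thresholds with trained parameters, and the tiny identity-matrix problem, the function names, and the parameter values here are assumptions for illustration only.

```python
def soft_threshold(v, tau):
    """Shrink v toward zero by tau (the proximal operator of tau * |v|)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def ista(A, y, lam, step, n_iter=200):
    """Minimise 0.5 * ||A x - y||^2 + lam * ||x||_1 with plain ISTA.

    `step` should not exceed 1 / (largest eigenvalue of A^T A).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual r = A x - y.
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the quadratic term, g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by element-wise shrinkage.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Toy problem (assumed): identity measurement matrix, one small entry in y.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_hat = ista(A, [3.0, 0.05, -2.0], lam=0.1, step=1.0)
```

In this toy case the small middle entry is driven exactly to zero while the two large entries are shrunk by the threshold, which is the sparsifying behaviour the shrinkage step provides.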

arXiv:2011.15014 [pdf, other]

Learning from Human Directional Corrections

Authors: Wanxin Jin, Todd D. Murphey, Zehui Lu, Shaoshuai Mou

Abstract: This paper proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; since a human needs to carefully choose the magnitude of each correction, those methods can easily lead to over-corrections and learning inefficiency. The proposed method only requires human directional corrections -- corrections that only indicate the direction of an input change without indicating its magnitude. We only assume that each correction, regardless of its magnitude, points in a direction that improves the robot's current motion relative to an unknown objective function. The allowable corrections satisfying this assumption account for half of the input space, as opposed to the magnitude corrections which have to lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function based on a cutting plane method, which has a geometric interpretation. We have established theoretical results to show the convergence of the learning process. The proposed method has been tested in numerical examples, a user study on two human-robot games, and a real-world quadrotor experiment. The results confirm the convergence of the proposed method and further show that the method is significantly more effective (higher success rate), efficient/effortless (less human corrections needed), and potentially more accessible (fewer early wasted trials) than the state-of-the-art robot learning frameworks.

Submitted 5 August, 2022; v1 submitted 30 November, 2020; originally announced November 2020.

Comments: This is a preprint. The published version can be accessed at IEEE Transactions on Robotics

arXiv:2008.03367 [pdf, other]

doi 10.21437/Interspeech.2018-2029

Classification of Huntington Disease using Acoustic and Lexical Features

Authors: Matthew Perez, Wenyu Jin, Duc Le, Noelle Carlozzi, Praveen Dayalu, Angela Roberts, Emily Mower Provost

Abstract: Speech is a critical biomarker for Huntington Disease (HD), with changes in speech increasing in severity as the disease progresses. Speech analyses are currently conducted using either transcriptions created manually by trained professionals or using global rating scales. Manual transcription is both expensive and time-consuming and global rating scales may lack sufficient sensitivity and fidelity. Ultimately, what is needed is an unobtrusive measure that can cheaply and continuously track disease progression. We present first steps towards the development of such a system, demonstrating the ability to automatically differentiate between healthy controls and individuals with HD using speech cues. The results provide evidence that objective analyses can be used to support clinical diagnoses, moving towards the tracking of symptomatology outside of laboratory and clinical environments.

Submitted 7 August, 2020; originally announced August 2020.

Comments: 4 pages

arXiv:2008.02159 [pdf, other]

Learning from Sparse Demonstrations

Authors: Wanxin Jin, Todd D. Murphey, Dana Kulić, Neta Ezer, Shaoshuai Mou

Abstract: This paper develops the method of Continuous Pontryagin Differentiable Programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with some time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can be different from the time of the robot's actual execution. The method jointly finds an objective function and a time-warping function such that the robot's resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. The Continuous PDP minimizes the discrepancy loss using projected gradient descent, by efficiently solving the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of objective learning into unseen motion conditions.

Submitted 8 August, 2022; v1 submitted 5 August, 2020; originally announced August 2020.

Comments: This is a preprint. The published version can be accessed at IEEE Transactions on Robotics

arXiv:2006.16628 [pdf, other]

Beamspace Channel Estimation for Wideband Millimeter-Wave MIMO: A Model-Driven Unsupervised Learning Approach

Authors: Hengtao He, Rui Wang, Weijie Jin, Shi Jin, Chao-Kai Wen, Geoffrey Ye Li

Abstract: Millimeter-wave (mmWave) communications have been one of the promising technologies for future wireless networks that integrate a wide range of data-demanding applications. To compensate for the large channel attenuation in mmWave band and avoid high hardware cost, a lens-based beamspace massive multiple-input multiple-output (MIMO) system is considered. However, the beam squint effect in wideband mmWave systems makes channel estimation very challenging, especially when the receiver is equipped with a limited number of radio-frequency (RF) chains. Furthermore, the real channel data cannot be obtained before the mmWave system is used in a new environment, which makes it impossible to train a deep learning (DL)-based channel estimator using real data set beforehand. To solve the problem, we propose a model-driven unsupervised learning network, named learned denoising-based generalized expectation consistent (LDGEC) signal recovery network. By utilizing the Stein's unbiased risk estimator loss, the LDGEC network can be trained only with limited measurements corresponding to the pilot symbols, instead of the real channel data. Even if designed for unsupervised learning, the LDGEC network can be supervisingly trained with the real channel via the denoiser-by-denoiser way. The numerical results demonstrate that the LDGEC-based channel estimator significantly outperforms state-of-the-art compressive sensing-based algorithms when the receiver is equipped with a small number of RF chains and low-resolution ADCs.

Submitted 12 May, 2021; v1 submitted 30 June, 2020; originally announced June 2020.

Comments: 28 Pages, 9 Figures. This paper has been submitted to the IEEE for possible publication

arXiv:2006.08532 [pdf, other]

Improved Conditional Flow Models for Molecule to Image Synthesis

Authors: Karren Yang, Samuel Goldman, Wengong Jin, Alex Lu, Regina Barzilay, Tommi Jaakkola, Caroline Uhler

Abstract: In this paper, we aim to synthesize cell microscopy images under different molecular interventions, motivated by practical applications to drug development. Building on the recent success of graph neural networks for learning molecular embeddings and flow-based models for image generation, we propose Mol2Image: a flow-based generative model for molecule to cell image synthesis. To generate cell features at different resolutions and scale to high-resolution images, we develop a novel multi-scale flow architecture based on a Haar wavelet image pyramid. To maximize the mutual information between the generated images and the molecular interventions, we devise a training strategy based on contrastive learning. To evaluate our model, we propose a new set of metrics for biological image generation that are robust, interpretable, and relevant to practitioners. We show quantitatively that our method learns a meaningful embedding of the molecular intervention, which is translated into an image representation reflecting the biological effects of the intervention.

Submitted 15 June, 2020; originally announced June 2020.

MSC Class: 92-08

arXiv:2006.08465 [pdf, other]

Neural Certificates for Safe Control Policies

Authors: Wanxin Jin, Zhaoran Wang, Zhuoran Yang, Shaoshuai Mou

Abstract: This paper develops an approach to learn a policy of a dynamical system that is guaranteed to be both provably safe and goal-reaching. Here, the safety means that a policy must not drive the state of the system to any unsafe region, while the goal-reaching requires the trajectory of the controlled system asymptotically converges to a goal region (a generalization of stability). We obtain the safe and goal-reaching policy by jointly learning two additional certificate functions: a barrier function that guarantees the safety and a developed Lyapunov-like function to fulfill the goal-reaching requirement, both of which are represented by neural networks. We show the effectiveness of the method to learn both safe and goal-reaching policies on various systems, including pendulums, cart-poles, and UAVs.

Submitted 15 June, 2020; originally announced June 2020.

arXiv:2004.13181 [pdf, ps, other]

EM-GAN: Fast Stress Analysis for Multi-Segment Interconnect Using Generative Adversarial Networks

Authors: Wentian Jin, Sheriff Sadiqbatcha, Jinwei Zhang, Sheldon X. -D. Tan

Abstract: In this paper, we propose a fast transient hydrostatic stress analysis for electromigration (EM) failure assessment for multi-segment interconnects using generative adversarial networks (GANs). Our work leverages the image synthesis feature of GAN-based generative deep neural networks. The stress evaluation of multi-segment interconnects, modeled by partial differential equations, can be viewed as a time-varying 2D-images-to-image problem where the input is the multi-segment interconnect topology with current densities and the output is the EM stress distribution in those wire segments at the given aging time. Based on this observation, we train a conditional GAN model using the images of many self-generated multi-segment wires, with wire current densities and aging time as conditions, against the COMSOL simulation results. Different hyperparameters of the GAN were studied and compared. The proposed algorithm, called EM-GAN, can quickly give an accurate stress distribution of a general multi-segment wire tree for a given aging time, which is important for full-chip fast EM failure assessment. Our experimental results show that EM-GAN has a 6.6% averaged error compared to COMSOL simulation results with orders of magnitude speedup. It also delivers an 8.3X speedup over a state-of-the-art analytic-based EM analysis solver.

Submitted 27 April, 2020; originally announced April 2020.

arXiv:1912.12970 [pdf, other]

Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework

Authors: Wanxin Jin, Zhaoran Wang, Zhuoran Yang, Shaoshuai Mou

Abstract: This paper develops a Pontryagin Differentiable Programming (PDP) methodology, which establishes a unified framework to solve a broad class of learning and control tasks. The PDP distinguishes from existing methods by two novel techniques: first, we differentiate through Pontryagin's Maximum Principle, and this allows to obtain the analytical derivative of a trajectory with respect to tunable parameters within an optimal control system, enabling end-to-end learning of dynamics, policies, or/and control objective functions; and second, we propose an auxiliary control system in the backward pass of the PDP framework, and the output of this auxiliary control system is the analytical derivative of the original system's trajectory with respect to the parameters, which can be iteratively solved using standard control tools. We investigate three learning modes of the PDP: inverse reinforcement learning, system identification, and control/planning. We demonstrate the capability of the PDP in each learning mode on different high-dimensional systems, including multi-link robot arm, 6-DoF maneuvering quadrotor, and 6-DoF rocket powered landing.

Submitted 12 January, 2021; v1 submitted 30 December, 2019; originally announced December 2019.

Comments: Published in NeurIPS 2020, Codes are at https://github.com/wanxinjin/Pontryagin-Differentiable-Programming

arXiv:1911.12886 [pdf, other]

doi 10.1088/1741-2552/ab8131

Artificial Intelligence in Glioma Imaging: Challenges and Advances

Authors: Weina Jin, Mostafa Fatehi, Kumar Abhishek, Mayur Mallya, Brian Toyota, Ghassan Hamarneh

Abstract: Primary brain tumors including gliomas continue to pose significant management challenges to clinicians. While the presentation, the pathology, and the clinical course of these lesions are variable, the initial investigations are usually similar. Patients who are suspected to have a brain tumor will be assessed with computed tomography (CT) and magnetic resonance imaging (MRI). The imaging findings are used by neurosurgeons to determine the feasibility of surgical resection and plan such an undertaking. Imaging studies are also an indispensable tool in tracking tumor progression or its response to treatment. As these imaging studies are non-invasive, relatively cheap and accessible to patients, there have been many efforts over the past two decades to increase the amount of clinically-relevant information that can be extracted from brain imaging. Most recently, artificial intelligence (AI) techniques have been employed to segment and characterize brain tumors, as well as to detect progression or treatment-response. However, the clinical utility of such endeavours remains limited due to challenges in data collection and annotation, model training, and the reliability of AI-generated information. We provide a review of recent advances in addressing the above challenges. First, to overcome the challenge of data paucity, different image imputation and synthesis techniques along with annotation collection efforts are summarized. Next, various training strategies are presented to meet multiple desiderata, such as model performance, generalization ability, data privacy protection, and learning with sparse annotations. Finally, standardized performance evaluation and model interpretability methods have been reviewed. We believe that these technical approaches will facilitate the development of a fully-functional AI tool in the clinical care of patients with gliomas.

Submitted 10 April, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

Comments: 31 pages, 6 figures. Accepted for publication in the Journal of Neural Engineering

arXiv:1906.08847 [pdf, ps, other]

A Signal Subspace Rotation Method for Localization of Multiple Wideband Sound Sources

Authors: Kainan Chen, Wenyu Jin, Bharadwaj Desikan

Abstract: In this paper, the problem of extending narrowband multichannel sound source localization algorithms to the wideband case is addressed. The DOA estimation of narrowband algorithms is based on the estimate of inter-channel phase differences (IPD) between microphones of the sound sources. A new method for wideband sound source DOA estimation based on signal subspace rotation is presented. The proposed algorithm normalizes the narrowband signal statistics by rotating the estimated signal subspace to the wideband counterpart in the eigenvector domain. Then the wideband DOA estimate can be obtained by estimating the normalized IPD from these wideband signal statistics. In addition to requiring less computational complexity compared to repeating the narrowband algorithms for all relevant frequencies of wideband signals, the proposed method also does not require any additional prior knowledge. The experimental results demonstrate the efficacy and the robustness of the proposed method.

Submitted 20 June, 2019; originally announced June 2019.

Comments: 5 pages, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

arXiv:1904.10341 [pdf, other]

doi 10.1364/PRJ.390091

Single-photon computational 3D imaging at 45 km

Authors: Zheng-Ping Li, Xin Huang, Yuan Cao, Bin Wang, Yu-Huai Li, Weijie Jin, Chao Yu, Jun Zhang, Qiang Zhang, Cheng-Zhi Peng, Feihu Xu, Jian-Wei Pan

Abstract: Long-range active imaging has a variety of applications in remote sensing and target recognition. Single-photon LiDAR (light detection and ranging) offers single-photon sensitivity and picosecond timing resolution, which is desirable for high-precision three-dimensional (3D) imaging over long distances. Despite important progress, further extending the imaging range presents enormous challenges because only weak echo photons return and are mixed with strong noise. Herein, we tackled these challenges by constructing a high-efficiency, low-noise confocal single-photon LiDAR system, and developing a long-range-tailored computational algorithm that provides high photon efficiency and super-resolution in the transverse domain. Using this technique, we experimentally demonstrated active single-photon 3D-imaging at a distance of up to 45 km in an urban environment, with a low return-signal level of $\sim$1 photon per pixel. Our system is feasible for imaging at a few hundreds of kilometers by refining the setup, and thus represents a significant milestone towards rapid, low-power, and high-resolution LiDAR over extra-long ranges.

Submitted 22 April, 2019; originally announced April 2019.

Comments: 22 pages, 5 figures

Journal ref: Photonics Research 8, 1532 (2020)

arXiv:1803.07696 [pdf, other]

doi 10.1177/0278364921996384

Inverse Optimal Control from Incomplete Trajectory Observations

Authors: Wanxin Jin, Dana Kulić, Shaoshuai Mou, Sandra Hirche

Abstract: This article develops a methodology that enables learning an objective function of an optimal control system from incomplete trajectory observations. The objective function is assumed to be a weighted sum of features (or basis functions) with unknown weights, and the observed data is a segment of a trajectory of system states and inputs. The proposed technique introduces the concept of the recovery matrix to establish the relationship between any available segment of the trajectory and the weights of given candidate features. The rank of the recovery matrix indicates whether a subset of relevant features can be found among the candidate features and the corresponding weights can be learned from the segment data. The recovery matrix can be obtained iteratively and its rank non-decreasing property shows that additional observations may contribute to the objective learning. Based on the recovery matrix, a method for using incomplete trajectory observations to learn the weights of selected features is established, and an incremental inverse optimal control algorithm is developed by automatically finding the minimal required observation. The effectiveness of the proposed method is demonstrated on a linear quadratic regulator system and a simulated robot manipulator.

Submitted 21 January, 2021; v1 submitted 20 March, 2018; originally announced March 2018.

Comments: Codes: https://github.com/wanxinjin/IOC-from-Incomplete-Trajectory-Observations

Journal ref: The International Journal of Robotics Research. 2021;40(6-7):848-865

Showing 1–35 of 35 results for author: Jin, W