-
Exploring Facial Biomarkers for Depression through Temporal Analysis of Action Units
Authors:
Aditya Parikh,
Misha Sadeghi,
Bjorn Eskofier
Abstract:
Depression is characterized by persistent sadness and loss of interest, significantly impairing daily functioning and now a widespread mental disorder. Traditional diagnostic methods rely on subjective assessments, necessitating objective approaches for accurate diagnosis. Our study investigates the use of facial action units (AUs) and emotions as biomarkers for depression. We analyzed facial expr…
▽ More
Depression is characterized by persistent sadness and loss of interest, significantly impairing daily functioning and now a widespread mental disorder. Traditional diagnostic methods rely on subjective assessments, necessitating objective approaches for accurate diagnosis. Our study investigates the use of facial action units (AUs) and emotions as biomarkers for depression. We analyzed facial expressions from video data of participants classified with or without depression. Our methodology involved detailed feature extraction, mean intensity comparisons of key AUs, and the application of time series classification models. Furthermore, we employed Principal Component Analysis (PCA) and various clustering algorithms to explore the variability in emotional expression patterns. Results indicate significant differences in the intensities of AUs associated with sadness and happiness between the groups, highlighting the potential of facial analysis in depression assessment.
△ Less
Submitted 18 July, 2024;
originally announced July 2024.
-
How Intermodal Interaction Affects the Performance of Deep Multimodal Fusion for Mixed-Type Time Series
Authors:
Simon Dietz,
Thomas Altstidl,
Dario Zanca,
Björn Eskofier,
An Nguyen
Abstract:
Mixed-type time series (MTTS) is a bimodal data type that is common in many domains, such as healthcare, finance, environmental monitoring, and social media. It consists of regularly sampled continuous time series and irregularly sampled categorical event sequences. The integration of both modalities through multimodal fusion is a promising approach for processing MTTS. However, the question of ho…
▽ More
Mixed-type time series (MTTS) is a bimodal data type that is common in many domains, such as healthcare, finance, environmental monitoring, and social media. It consists of regularly sampled continuous time series and irregularly sampled categorical event sequences. The integration of both modalities through multimodal fusion is a promising approach for processing MTTS. However, the question of how to effectively fuse both modalities remains open. In this paper, we present a comprehensive evaluation of several deep multimodal fusion approaches for MTTS forecasting. Our comparison includes three fusion types (early, intermediate, and late) and five fusion methods (concatenation, weighted mean, weighted mean with correlation, gating, and feature sharing). We evaluate these fusion approaches on three distinct datasets, one of which was generated using a novel framework. This framework allows for the control of key data properties, such as the strength and direction of intermodal interactions, modality imbalance, and the degree of randomness in each modality, providing a more controlled environment for testing fusion approaches. Our findings show that the performance of different fusion approaches can be substantially influenced by the direction and strength of intermodal interactions. The study reveals that early and intermediate fusion approaches excel at capturing fine-grained and coarse-grained cross-modal features, respectively. These findings underscore the crucial role of intermodal interactions in determining the most effective fusion strategy for MTTS forecasting.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Large-Scale Dataset Pruning in Adversarial Training through Data Importance Extrapolation
Authors:
Björn Nieth,
Thomas Altstidl,
Leo Schwinn,
Björn Eskofier
Abstract:
Their vulnerability to small, imperceptible attacks limits the adoption of deep learning models to real-world systems. Adversarial training has proven to be one of the most promising strategies against these attacks, at the expense of a substantial increase in training time. With the ongoing trend of integrating large-scale synthetic data this is only expected to increase even further. Thus, the n…
▽ More
Their vulnerability to small, imperceptible attacks limits the adoption of deep learning models to real-world systems. Adversarial training has proven to be one of the most promising strategies against these attacks, at the expense of a substantial increase in training time. With the ongoing trend of integrating large-scale synthetic data this is only expected to increase even further. Thus, the need for data-centric approaches that reduce the number of training samples while maintaining accuracy and robustness arises. While data pruning and active learning are prominent research topics in deep learning, they are as of now largely unexplored in the adversarial training literature. We address this gap and propose a new data pruning strategy based on extrapolating data importance scores from a small set of data to a larger set. In an empirical evaluation, we demonstrate that extrapolation-based pruning can efficiently reduce dataset size while maintaining robustness.
△ Less
Submitted 11 July, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Velocity-Based Channel Charting with Spatial Distribution Map Matching
Authors:
Maximilian Stahlke,
George Yammine,
Tobias Feigl,
Bjoern M. Eskofier,
Christopher Mutschler
Abstract:
Fingerprint-based localization improves the positioning performance in challenging, non-line-of-sight (NLoS) dominated indoor environments. However, fingerprinting models require an expensive life-cycle management including recording and labeling of radio signals for the initial training and regularly at environmental changes. Alternatively, channel-charting avoids this labeling effort as it impli…
▽ More
Fingerprint-based localization improves the positioning performance in challenging, non-line-of-sight (NLoS) dominated indoor environments. However, fingerprinting models require an expensive life-cycle management including recording and labeling of radio signals for the initial training and regularly at environmental changes. Alternatively, channel-charting avoids this labeling effort as it implicitly associates relative coordinates to the recorded radio signals. Then, with reference real-world coordinates (positions) we can use such charts for positioning tasks. However, current channel-charting approaches lag behind fingerprinting in their positioning accuracy and still require reference samples for localization, regular data recording and labeling to keep the models up to date. Hence, we propose a novel framework that does not require reference positions. We only require information from velocity information, e.g., from pedestrian dead reckoning or odometry to model the channel charts, and topological map information, e.g., a building floor plan, to transform the channel charts into real coordinates. We evaluate our approach on two different real-world datasets using 5G and distributed single-input/multiple-output system (SIMO) radio systems. Our experiments show that even with noisy velocity estimates and coarse map information, we achieve similar position accuracies
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
From Patches to Objects: Exploiting Spatial Reasoning for Better Visual Representations
Authors:
Toni Albert,
Bjoern Eskofier,
Dario Zanca
Abstract:
As the field of deep learning steadily transitions from the realm of academic research to practical application, the significance of self-supervised pretraining methods has become increasingly prominent. These methods, particularly in the image domain, offer a compelling strategy to effectively utilize the abundance of unlabeled image data, thereby enhancing downstream tasks' performance. In this…
▽ More
As the field of deep learning steadily transitions from the realm of academic research to practical application, the significance of self-supervised pretraining methods has become increasingly prominent. These methods, particularly in the image domain, offer a compelling strategy to effectively utilize the abundance of unlabeled image data, thereby enhancing downstream tasks' performance. In this paper, we propose a novel auxiliary pretraining method that is based on spatial reasoning. Our proposed method takes advantage of a more flexible formulation of contrastive learning by introducing spatial reasoning as an auxiliary task for discriminative self-supervised methods. Spatial Reasoning works by having the network predict the relative distances between sampled non-overlapping patches. We argue that this forces the network to learn more detailed and intricate internal representations of the objects and the relationships between their constituting parts. Our experiments demonstrate substantial improvement in downstream performance in linear evaluation compared to similar work and provide directions for further research into spatial reasoning.
△ Less
Submitted 21 May, 2023;
originally announced May 2023.
-
Contrastive Language-Image Pretrained Models are Zero-Shot Human Scanpath Predictors
Authors:
Dario Zanca,
Andrea Zugarini,
Simon Dietz,
Thomas R. Altstidl,
Mark A. Turban Ndjeuha,
Leo Schwinn,
Bjoern Eskofier
Abstract:
Understanding the mechanisms underlying human attention is a fundamental challenge for both vision science and artificial intelligence. While numerous computational models of free-viewing have been proposed, less is known about the mechanisms underlying task-driven image exploration. To address this gap, we present CapMIT1003, a database of captions and click-contingent image explorations collecte…
▽ More
Understanding the mechanisms underlying human attention is a fundamental challenge for both vision science and artificial intelligence. While numerous computational models of free-viewing have been proposed, less is known about the mechanisms underlying task-driven image exploration. To address this gap, we present CapMIT1003, a database of captions and click-contingent image explorations collected during captioning tasks. CapMIT1003 is based on the same stimuli from the well-known MIT1003 benchmark, for which eye-tracking data under free-viewing conditions is available, which offers a promising opportunity to concurrently study human attention under both tasks. We make this dataset publicly available to facilitate future research in this field. In addition, we introduce NevaClip, a novel zero-shot method for predicting visual scanpaths that combines contrastive language-image pretrained (CLIP) models with biologically-inspired neural visual attention (NeVA) algorithms. NevaClip simulates human scanpaths by aligning the representation of the foveated visual stimulus and the representation of the associated caption, employing gradient-driven visual exploration to generate scanpaths. Our experimental results demonstrate that NevaClip outperforms existing unsupervised computational models of human visual attention in terms of scanpath plausibility, for both captioning and free-viewing tasks. Furthermore, we show that conditioning NevaClip with incorrect or misleading captions leads to random behavior, highlighting the significant impact of caption guidance in the decision-making process. These findings contribute to a better understanding of mechanisms that guide human attention and pave the way for more sophisticated computational approaches to scanpath prediction that can integrate direct top-down guidance of downstream tasks.
△ Less
Submitted 23 May, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Raising the Bar for Certified Adversarial Robustness with Diffusion Models
Authors:
Thomas Altstidl,
David Dobre,
Björn Eskofier,
Gauthier Gidel,
Leo Schwinn
Abstract:
Certified defenses against adversarial attacks offer formal guarantees on the robustness of a model, making them more reliable than empirical methods such as adversarial training, whose effectiveness is often later reduced by unseen attacks. Still, the limited certified robustness that is currently achievable has been a bottleneck for their practical adoption. Gowal et al. and Wang et al. have sho…
▽ More
Certified defenses against adversarial attacks offer formal guarantees on the robustness of a model, making them more reliable than empirical methods such as adversarial training, whose effectiveness is often later reduced by unseen attacks. Still, the limited certified robustness that is currently achievable has been a bottleneck for their practical adoption. Gowal et al. and Wang et al. have shown that generating additional training data using state-of-the-art diffusion models can considerably improve the robustness of adversarial training. In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses. In addition, we provide a list of recommendations to scale the robustness of certified training approaches. One of our main insights is that the generalization gap, i.e., the difference between the training and test accuracy of the original model, is a good predictor of the magnitude of the robustness improvement when using additional generated data. Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the $\ell_2$ ($ε= 36/255$) and $\ell_\infty$ ($ε= 8/255$) threat models, outperforming the previous best results by $+3.95\%$ and $+1.39\%$, respectively. Furthermore, we report similar improvements for CIFAR-100.
△ Less
Submitted 17 May, 2023;
originally announced May 2023.
-
FastAMI -- a Monte Carlo Approach to the Adjustment for Chance in Clustering Comparison Metrics
Authors:
Kai Klede,
Leo Schwinn,
Dario Zanca,
Björn Eskofier
Abstract:
Clustering is at the very core of machine learning, and its applications proliferate with the increasing availability of data. However, as datasets grow, comparing clusterings with an adjustment for chance becomes computationally difficult, preventing unbiased ground-truth comparisons and solution selection. We propose FastAMI, a Monte Carlo-based method to efficiently approximate the Adjusted Mut…
▽ More
Clustering is at the very core of machine learning, and its applications proliferate with the increasing availability of data. However, as datasets grow, comparing clusterings with an adjustment for chance becomes computationally difficult, preventing unbiased ground-truth comparisons and solution selection. We propose FastAMI, a Monte Carlo-based method to efficiently approximate the Adjusted Mutual Information (AMI) and extend it to the Standardized Mutual Information (SMI). The approach is compared with the exact calculation and a recently developed variant of the AMI based on pairwise permutations, using both synthetic and real data. In contrast to the exact calculation our method is fast enough to enable these adjusted information-theoretic comparisons for large datasets while maintaining considerably more accurate results than the pairwise approach.
△ Less
Submitted 3 May, 2023;
originally announced May 2023.
-
Simulating Human Gaze with Neural Visual Attention
Authors:
Leo Schwinn,
Doina Precup,
Bjoern Eskofier,
Dario Zanca
Abstract:
Existing models of human visual attention are generally unable to incorporate direct task guidance and therefore cannot model an intent or goal when exploring a scene. To integrate guidance of any downstream visual task into attention modeling, we propose the Neural Visual Attention (NeVA) algorithm. To this end, we impose to neural networks the biological constraint of foveated vision and train a…
▽ More
Existing models of human visual attention are generally unable to incorporate direct task guidance and therefore cannot model an intent or goal when exploring a scene. To integrate guidance of any downstream visual task into attention modeling, we propose the Neural Visual Attention (NeVA) algorithm. To this end, we impose to neural networks the biological constraint of foveated vision and train an attention mechanism to generate visual explorations that maximize the performance with respect to the downstream task. We observe that biologically constrained neural networks generate human-like scanpaths without being trained for this objective. Extensive experiments on three common benchmark datasets show that our method outperforms state-of-the-art unsupervised human attention models in generating human-like scanpaths.
△ Less
Submitted 22 November, 2022;
originally announced November 2022.
-
Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks
Authors:
Thomas Altstidl,
An Nguyen,
Leo Schwinn,
Franz Köferl,
Christopher Mutschler,
Björn Eskofier,
Dario Zanca
Abstract:
The widespread success of convolutional neural networks may largely be attributed to their intrinsic property of translation equivariance. However, convolutions are not equivariant to variations in scale and fail to generalize to objects of different sizes. Despite recent advances in this field, it remains unclear how well current methods generalize to unobserved scales on real-world data and to w…
▽ More
The widespread success of convolutional neural networks may largely be attributed to their intrinsic property of translation equivariance. However, convolutions are not equivariant to variations in scale and fail to generalize to objects of different sizes. Despite recent advances in this field, it remains unclear how well current methods generalize to unobserved scales on real-world data and to what extent scale equivariance plays a role. To address this, we propose the novel Scaled and Translated Image Recognition (STIR) benchmark based on four different domains. Additionally, we introduce a new family of models that applies many re-scaled kernels with shared weights in parallel and then selects the most appropriate one. Our experimental results on STIR show that both the existing and proposed approaches can improve generalization across scales compared to standard convolutions. We also demonstrate that our family of models is able to generalize well towards larger scales and improve scale equivariance. Moreover, due to their unique design we can validate that kernel selection is consistent with input scale. Even so, none of the evaluated models maintain their performance for large differences in scale, demonstrating that a general understanding of how scale equivariance can improve generalization and robustness is still lacking.
△ Less
Submitted 18 November, 2022;
originally announced November 2022.
-
Indoor Localization with Robust Global Channel Charting: A Time-Distance-Based Approach
Authors:
Maximilian Stahlke,
George Yammine,
Tobias Feigl,
Bjoern M. Eskofier,
Christopher Mutschler
Abstract:
Fingerprinting-based positioning significantly improves the indoor localization performance in non-line-of-sight-dominated areas. However, its deployment and maintenance is cost-intensive as it needs ground-truth reference systems for both the initial training and the adaption to environmental changes. In contrast, channel charting (CC) works without explicit reference information and only require…
▽ More
Fingerprinting-based positioning significantly improves the indoor localization performance in non-line-of-sight-dominated areas. However, its deployment and maintenance is cost-intensive as it needs ground-truth reference systems for both the initial training and the adaption to environmental changes. In contrast, channel charting (CC) works without explicit reference information and only requires the spatial correlations of channel state information (CSI). While CC has shown promising results in modelling the geometry of the radio environment, a deeper insight into CC for localization using multi-anchor large-bandwidth measurements is still pending. We contribute a novel distance metric for time-synchronized single-input/single-output CSIs that approaches a linear correlation to the Euclidean distance. This allows to learn the environment's global geometry without annotations. To efficiently optimize the global channel chart we approximate the metric with a Siamese neural network. This enables full CC-assisted fingerprinting and positioning only using a linear transformation from the chart to the real-world coordinates. We compare our approach to the state-of-the-art of CC on two different real-world data sets recorded with a 5G and UWB radio setup. Our approach outperforms others with localization accuracies of 0.69m for the UWB and 1.4m for the 5G setup. We show that CC-assisted fingerprinting enables highly accurate localization and reduces (or eliminates) the need for annotated training data.
△ Less
Submitted 7 October, 2022;
originally announced October 2022.
-
Active Learning of Ordinal Embeddings: A User Study on Football Data
Authors:
Christoffer Loeffler,
Kion Fallah,
Stefano Fenu,
Dario Zanca,
Bjoern Eskofier,
Christopher John Rozell,
Christopher Mutschler
Abstract:
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function. Distance metrics can only serve as proxy for similarity in information retrieval of similar instances. Learning a good similarity function from human annotations improves the quality of retrievals. This work uses deep metric learning to learn these user-defined similarity functions from…
▽ More
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function. Distance metrics can only serve as proxy for similarity in information retrieval of similar instances. Learning a good similarity function from human annotations improves the quality of retrievals. This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset. We adapt an entropy-based active learning method with recent work from triplet mining to collect easy-to-answer but still informative annotations from human participants and use them to train a deep convolutional network that generalizes to unseen samples. Our user study shows that our approach improves the quality of the information retrieval compared to a previous deep metric learning approach that relies on a Siamese network. Specifically, we shed light on the strengths and weaknesses of passive sampling heuristics and active learners alike by analyzing the participants' response efficacy. To this end, we collect accuracy, algorithmic time complexity, the participants' fatigue and time-to-response, qualitative self-assessment and statements, as well as the effects of mixed-expertise annotators and their consistency on model performance and transfer-learning.
△ Less
Submitted 10 November, 2022; v1 submitted 26 July, 2022;
originally announced July 2022.
-
Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification
Authors:
Leo Schwinn,
Leon Bungert,
An Nguyen,
René Raab,
Falk Pulsmeyer,
Doina Precup,
Björn Eskofier,
Dario Zanca
Abstract:
The reliability of neural networks is essential for their use in safety-critical applications. Existing approaches generally aim at improving the robustness of neural networks to either real-world distribution shifts (e.g., common corruptions and perturbations, spatial transformations, and natural adversarial examples) or worst-case distribution shifts (e.g., optimized adversarial examples). In th…
▽ More
The reliability of neural networks is essential for their use in safety-critical applications. Existing approaches generally aim at improving the robustness of neural networks to either real-world distribution shifts (e.g., common corruptions and perturbations, spatial transformations, and natural adversarial examples) or worst-case distribution shifts (e.g., optimized adversarial examples). In this work, we propose the Decision Region Quantification (DRQ) algorithm to improve the robustness of any differentiable pre-trained model against both real-world and worst-case distribution shifts in the data. DRQ analyzes the robustness of local decision regions in the vicinity of a given data point to make more reliable predictions. We theoretically motivate the DRQ algorithm by showing that it effectively smooths spurious local extrema in the decision surface. Furthermore, we propose an implementation using targeted and untargeted adversarial attacks. An extensive empirical evaluation shows that DRQ increases the robustness of adversarially and non-adversarially trained models against real-world and worst-case distribution shifts on several computer vision benchmark datasets.
△ Less
Submitted 19 May, 2022;
originally announced May 2022.
-
Behind the Machine's Gaze: Neural Networks with Biologically-inspired Constraints Exhibit Human-like Visual Attention
Authors:
Leo Schwinn,
Doina Precup,
Björn Eskofier,
Dario Zanca
Abstract:
By and large, existing computational models of visual attention tacitly assume perfect vision and full access to the stimulus and thereby deviate from foveated biological vision. Moreover, modeling top-down attention is generally reduced to the integration of semantic features without incorporating the signal of a high-level visual tasks that have been shown to partially guide human attention. We…
▽ More
By and large, existing computational models of visual attention tacitly assume perfect vision and full access to the stimulus and thereby deviate from foveated biological vision. Moreover, modeling top-down attention is generally reduced to the integration of semantic features without incorporating the signal of a high-level visual tasks that have been shown to partially guide human attention. We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner. With our method, we explore the ability of neural networks on which we impose a biologically-inspired foveated vision constraint to generate human-like scanpaths without directly training for this objective. The loss of a neural network performing a downstream visual task (i.e., classification or reconstruction) flexibly provides top-down guidance to the scanpath. Extensive experiments show that our method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths. Additionally, the flexibility of the framework allows to quantitatively investigate the role of different tasks in the generated visual behaviors. Finally, we demonstrate the superiority of the approach in a novel experiment that investigates the utility of scanpaths in real-world applications, where imperfect viewing conditions are given.
△ Less
Submitted 19 November, 2022; v1 submitted 19 April, 2022;
originally announced April 2022.
-
How to Learn from Risk: Explicit Risk-Utility Reinforcement Learning for Efficient and Safe Driving Strategies
Authors:
Lukas M. Schmidt,
Sebastian Rietsch,
Axel Plinge,
Bjoern M. Eskofier,
Christopher Mutschler
Abstract:
Autonomous driving has the potential to revolutionize mobility and is hence an active area of research. In practice, the behavior of autonomous vehicles must be acceptable, i.e., efficient, safe, and interpretable. While vanilla reinforcement learning (RL) finds performant behavioral strategies, they are often unsafe and uninterpretable. Safety is introduced through Safe RL approaches, but they st…
▽ More
Autonomous driving has the potential to revolutionize mobility and is hence an active area of research. In practice, the behavior of autonomous vehicles must be acceptable, i.e., efficient, safe, and interpretable. While vanilla reinforcement learning (RL) finds performant behavioral strategies, they are often unsafe and uninterpretable. Safety is introduced through Safe RL approaches, but they still mostly remain uninterpretable as the learned behaviour is jointly optimized for safety and performance without modeling them separately. Interpretable machine learning is rarely applied to RL. This paper proposes SafeDQN, which allows to make the behavior of autonomous vehicles safe and interpretable while still being efficient. SafeDQN offers an understandable, semantic trade-off between the expected risk and the utility of actions while being algorithmically transparent. We show that SafeDQN finds interpretable and safe driving policies for a variety of scenarios and demonstrate how state-of-the-art saliency techniques can help to assess both risk and utility.
△ Less
Submitted 2 August, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Don't Get Me Wrong: How to Apply Deep Visual Interpretations to Time Series
Authors:
Christoffer Loeffler,
Wei-Cheng Lai,
Bjoern Eskofier,
Dario Zanca,
Lukas Schmidt,
Christopher Mutschler
Abstract:
The correct interpretation and understanding of deep learning models are essential in many applications. Explanatory visual interpretation approaches for image, and natural language processing allow domain experts to validate and understand almost any deep learning model. However, they fall short when generalizing to arbitrary time series, which is inherently less intuitive and more diverse. Wheth…
▽ More
The correct interpretation and understanding of deep learning models are essential in many applications. Explanatory visual interpretation approaches for image, and natural language processing allow domain experts to validate and understand almost any deep learning model. However, they fall short when generalizing to arbitrary time series, which is inherently less intuitive and more diverse. Whether a visualization explains valid reasoning or captures the actual features is difficult to judge. Hence, instead of blind trust, we need an objective evaluation to obtain trustworthy quality metrics. We propose a framework of six orthogonal metrics for gradient-, propagation- or perturbation-based post-hoc visual interpretation methods for time series classification and segmentation tasks. An experimental study includes popular neural network architectures for time series and nine visual interpretation methods. We evaluate the visual interpretation methods with diverse datasets from the UCR repository and a complex, real-world dataset and study the influence of standard regularization techniques during training. We show that none of the methods consistently outperforms others on all metrics, while some are sometimes ahead. Our insights and recommendations allow experts to choose suitable visualization techniques for the model and task.
△ Less
Submitted 15 September, 2023; v1 submitted 14 March, 2022;
originally announced March 2022.
-
An Introduction to Multi-Agent Reinforcement Learning and Review of its Application to Autonomous Mobility
Authors:
Lukas M. Schmidt,
Johanna Brosig,
Axel Plinge,
Bjoern M. Eskofier,
Christopher Mutschler
Abstract:
Many scenarios in mobility and traffic involve multiple different agents that need to cooperate to find a joint solution. Recent advances in behavioral planning use Reinforcement Learning to find effective and performant behavior strategies. However, as autonomous vehicles and vehicle-to-X communications become more mature, solutions that only utilize single, independent agents leave potential per…
▽ More
Many scenarios in mobility and traffic involve multiple different agents that need to cooperate to find a joint solution. Recent advances in behavioral planning use Reinforcement Learning to find effective and performant behavior strategies. However, as autonomous vehicles and vehicle-to-X communications become more mature, solutions that only utilize single, independent agents leave potential performance gains on the road. Multi-Agent Reinforcement Learning (MARL) is a research field that aims to find optimal solutions for multiple agents that interact with each other. This work aims to give an overview of the field to researchers in autonomous mobility. We first explain MARL and introduce important concepts. Then, we discuss the central paradigms that underlie MARL algorithms, and give an overview of state-of-the-art methods and ideas in each paradigm. With this background, we survey applications of MARL in autonomous mobility scenarios and give an overview of existing scenarios and implementations.
△ Less
Submitted 2 August, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
Digitizing Handwriting with a Sensor Pen: A Writer-Independent Recognizer
Authors:
Mohamad Wehbi,
Tim Hamann,
Jens Barth,
Bjoern Eskofier
Abstract:
Online handwriting recognition has been studied for a long time with only few practicable results when writing on normal paper. Previous approaches using sensor-based devices encountered problems that limited the usage of the developed systems in real-world applications. This paper presents a writer-independent system that recognizes characters written on plain paper with the use of a sensor-equip…
▽ More
Online handwriting recognition has been studied for a long time with only few practicable results when writing on normal paper. Previous approaches using sensor-based devices encountered problems that limited the usage of the developed systems in real-world applications. This paper presents a writer-independent system that recognizes characters written on plain paper with the use of a sensor-equipped pen. This system is applicable in real-world applications and requires no user-specific training for recognition. The pen provides linear acceleration, angular velocity, magnetic field, and force applied by the user, and acts as a digitizer that transforms the analogue signals of the sensors into timeseries data while writing on regular paper. The dataset we collected with this pen consists of Latin lower-case and upper-case alphabets. We present the results of a convolutional neural network model for letter classification and show that this approach is practical and achieves promising results for writer-independent character recognition. This work aims at providing a realtime handwriting recognition system to be used for writing on normal paper.
△ Less
Submitted 8 July, 2021;
originally announced July 2021.
-
Towards an IMU-based Pen Online Handwriting Recognizer
Authors:
Mohamad Wehbi,
Tim Hamann,
Jens Barth,
Peter Kaempf,
Dario Zanca,
Bjoern Eskofier
Abstract:
Most online handwriting recognition systems require the use of specific writing surfaces to extract positional data. In this paper we present a online handwriting recognition system for word recognition which is based on inertial measurement units (IMUs) for digitizing text written on paper. This is obtained by means of a sensor-equipped pen that provides acceleration, angular velocity, and magnet…
▽ More
Most online handwriting recognition systems require the use of specific writing surfaces to extract positional data. In this paper we present a online handwriting recognition system for word recognition which is based on inertial measurement units (IMUs) for digitizing text written on paper. This is obtained by means of a sensor-equipped pen that provides acceleration, angular velocity, and magnetic forces streamed via Bluetooth. Our model combines convolutional and bidirectional LSTM networks, and is trained with the Connectionist Temporal Classification loss that allows the interpretation of raw sensor data into words without the need of sequence segmentation. We use a dataset of words collected using multiple sensor-enhanced pens and evaluate our model on distinct test sets of seen and unseen words achieving a character error rate of 17.97% and 17.08%, respectively, without the use of a dictionary or language model
△ Less
Submitted 26 May, 2021;
originally announced May 2021.
-
Exploring Misclassifications of Robust Neural Networks to Enhance Adversarial Attacks
Authors:
Leo Schwinn,
René Raab,
An Nguyen,
Dario Zanca,
Bjoern Eskofier
Abstract:
Progress in making neural networks more robust against adversarial attacks is mostly marginal, despite the great efforts of the research community. Moreover, the robustness evaluation is often imprecise, making it difficult to identify promising approaches. We analyze the classification decisions of 19 different state-of-the-art neural networks trained to be robust against adversarial attacks. Our…
▽ More
Progress in making neural networks more robust against adversarial attacks is mostly marginal, despite the great efforts of the research community. Moreover, the robustness evaluation is often imprecise, making it difficult to identify promising approaches. We analyze the classification decisions of 19 different state-of-the-art neural networks trained to be robust against adversarial attacks. Our findings suggest that current untargeted adversarial attacks induce misclassification towards only a limited amount of different classes. Additionally, we observe that both over- and under-confidence in model predictions result in an inaccurate assessment of model robustness. Based on these observations, we propose a novel loss function for adversarial attacks that consistently improves attack success rate compared to prior loss functions for 19 out of 19 analyzed models.
△ Less
Submitted 25 May, 2021; v1 submitted 21 May, 2021;
originally announced May 2021.
-
Rigid and non-rigid motion compensation in weight-bearing cone-beam CT of the knee using (noisy) inertial measurements
Authors:
Jennifer Maier,
Marlies Nitschke,
Jang-Hwan Choi,
Garry Gold,
Rebecca Fahrig,
Bjoern M. Eskofier,
Andreas Maier
Abstract:
Involuntary subject motion is the main source of artifacts in weight-bearing cone-beam CT of the knee. To achieve image quality for clinical diagnosis, the motion needs to be compensated. We propose to use inertial measurement units (IMUs) attached to the leg for motion estimation. We perform a simulation study using real motion recorded with an optical tracking system. Three IMU-based correction…
▽ More
Involuntary subject motion is the main source of artifacts in weight-bearing cone-beam CT of the knee. To achieve image quality for clinical diagnosis, the motion needs to be compensated. We propose to use inertial measurement units (IMUs) attached to the leg for motion estimation. We perform a simulation study using real motion recorded with an optical tracking system. Three IMU-based correction approaches are evaluated, namely rigid motion correction, non-rigid 2D projection deformation and non-rigid 3D dynamic reconstruction. We present an initialization process based on the system geometry. With an IMU noise simulation, we investigate the applicability of the proposed methods in real applications. All proposed IMU-based approaches correct motion at least as good as a state-of-the-art marker-based approach. The structural similarity index and the root mean squared error between motion-free and motion corrected volumes are improved by 24-35% and 78-85%, respectively, compared with the uncorrected case. The noise analysis shows that the noise levels of commercially available IMUs need to be improved by a factor of $10^5$ which is currently only achieved by specialized hardware not robust enough for the application. The presented study confirms the feasibility of this novel approach and defines improvements necessary for a real application.
△ Less
Submitted 24 February, 2021;
originally announced February 2021.
-
Identifying Untrustworthy Predictions in Neural Networks by Geometric Gradient Analysis
Authors:
Leo Schwinn,
An Nguyen,
René Raab,
Leon Bungert,
Daniel Tenbrinck,
Dario Zanca,
Martin Burger,
Bjoern Eskofier
Abstract:
The susceptibility of deep neural networks to untrustworthy predictions, including out-of-distribution (OOD) data and adversarial examples, still prevent their widespread use in safety-critical applications. Most existing methods either require a re-training of a given model to achieve robust identification of adversarial attacks or are limited to out-of-distribution sample detection only. In this…
▽ More
The susceptibility of deep neural networks to untrustworthy predictions, including out-of-distribution (OOD) data and adversarial examples, still prevent their widespread use in safety-critical applications. Most existing methods either require a re-training of a given model to achieve robust identification of adversarial attacks or are limited to out-of-distribution sample detection only. In this work, we propose a geometric gradient analysis (GGA) to improve the identification of untrustworthy predictions without retraining of a given model. GGA analyzes the geometry of the loss landscape of neural networks based on the saliency maps of their respective input. To motivate the proposed approach, we provide theoretical connections between gradients' geometrical properties and local minima of the loss function. Furthermore, we demonstrate that the proposed method outperforms prior approaches in detecting OOD data and adversarial attacks, including state-of-the-art and adaptive attacks.
△ Less
Submitted 24 February, 2021;
originally announced February 2021.
-
System Design for a Data-driven and Explainable Customer Sentiment Monitor
Authors:
An Nguyen,
Stefan Foerstel,
Thomas Kittler,
Andrey Kurzyukov,
Leo Schwinn,
Dario Zanca,
Tobias Hipp,
Da Jun Sun,
Michael Schrapp,
Eva Rothgang,
Bjoern Eskofier
Abstract:
The most important goal of customer services is to keep the customer satisfied. However, service resources are always limited and must be prioritized. Therefore, it is important to identify customers who potentially become unsatisfied and might lead to escalations. Today this prioritization of customers is often done manually. Data science on IoT data (esp. log data) for machine health monitoring,…
▽ More
The most important goal of customer services is to keep the customer satisfied. However, service resources are always limited and must be prioritized. Therefore, it is important to identify customers who potentially become unsatisfied and might lead to escalations. Today this prioritization of customers is often done manually. Data science on IoT data (esp. log data) for machine health monitoring, as well as analytics on enterprise data for customer relationship management (CRM) have mainly been researched and applied independently. In this paper, we present a framework for a data-driven decision support system which combines IoT and enterprise data to model customer sentiment. Such decision support systems can help to prioritize customers and service resources to effectively troubleshoot problems or even avoid them. The framework is applied in a real-world case study with a major medical device manufacturer. This includes a fully automated and interpretable machine learning pipeline designed to meet the requirements defined with domain experts and end users. The overall framework is currently deployed, learns and evaluates predictive models from terabytes of IoT and enterprise data to actively monitor the customer sentiment for a fleet of thousands of high-end medical devices. Furthermore, we provide an anonymized industrial benchmark dataset for the research community.
△ Less
Submitted 11 January, 2021;
originally announced January 2021.
-
Dynamically Sampled Nonlocal Gradients for Stronger Adversarial Attacks
Authors:
Leo Schwinn,
An Nguyen,
René Raab,
Dario Zanca,
Bjoern Eskofier,
Daniel Tenbrinck,
Martin Burger
Abstract:
The vulnerability of deep neural networks to small and even imperceptible perturbations has become a central topic in deep learning research. Although several sophisticated defense mechanisms have been introduced, most were later shown to be ineffective. However, a reliable evaluation of model robustness is mandatory for deployment in safety-critical scenarios. To overcome this problem we propose…
▽ More
The vulnerability of deep neural networks to small and even imperceptible perturbations has become a central topic in deep learning research. Although several sophisticated defense mechanisms have been introduced, most were later shown to be ineffective. However, a reliable evaluation of model robustness is mandatory for deployment in safety-critical scenarios. To overcome this problem we propose a simple yet effective modification to the gradient calculation of state-of-the-art first-order adversarial attacks. Normally, the gradient update of an attack is directly calculated for the given data point. This approach is sensitive to noise and small local optima of the loss function. Inspired by gradient sampling techniques from non-convex optimization, we propose Dynamically Sampled Nonlocal Gradient Descent (DSNGD). DSNGD calculates the gradient direction of the adversarial attack as the weighted average over past gradients of the optimization history. Moreover, distribution hyperparameters that define the sampling operation are automatically learned during the optimization scheme. We empirically show that by incorporating this nonlocal gradient information, we are able to give a more accurate estimation of the global descent direction on noisy and non-convex loss surfaces. In addition, we show that DSNGD-based attacks are on average 35% faster while achieving 0.9% to 27.1% higher success rates compared to their gradient descent-based counterparts.
△ Less
Submitted 27 September, 2021; v1 submitted 5 November, 2020;
originally announced November 2020.
-
Conformance Checking for a Medical Training Process Using Petri net Simulation and Sequence Alignment
Authors:
An Nguyen,
Wenyu Zhang,
Leo Schwinn,
Bjoern Eskofier
Abstract:
Process Mining has recently gained popularity in healthcare due to its potential to provide a transparent, objective and data-based view on processes. Conformance checking is a sub-discipline of process mining that has the potential to answer how the actual process executions deviate from existing guidelines. In this work, we analyze a medical training process for a surgical procedure. Ten student…
▽ More
Process Mining has recently gained popularity in healthcare due to its potential to provide a transparent, objective and data-based view on processes. Conformance checking is a sub-discipline of process mining that has the potential to answer how the actual process executions deviate from existing guidelines. In this work, we analyze a medical training process for a surgical procedure. Ten students were trained to install a Central Venous Catheters (CVC) with ultrasound. Event log data was collected directly after instruction by the supervisors during a first test run and additionally after a subsequent individual training phase. In order to provide objective performance measures, we formulate an optimal, global sequence alignment problem inspired by approaches in bioinformatics. Therefore, we use the Petri net model representation of the medical process guideline to simulate a representative set of guideline conform sequences. Next, we calculate the optimal, global sequence alignment of the recorded and simulated event logs. Finally, the output measures and visualization of aligned sequences are provided for objective feedback.
△ Less
Submitted 21 October, 2020;
originally announced October 2020.
-
Time Matters: Time-Aware LSTMs for Predictive Business Process Monitoring
Authors:
An Nguyen,
Srijeet Chatterjee,
Sven Weinzierl,
Leo Schwinn,
Martin Matzner,
Bjoern Eskofier
Abstract:
Predictive business process monitoring (PBPM) aims to predict future process behavior during ongoing process executions based on event log data. Especially, techniques for the next activity and timestamp prediction can help to improve the performance of operational business processes. Recently, many PBPM solutions based on deep learning were proposed by researchers. Due to the sequential nature of…
▽ More
Predictive business process monitoring (PBPM) aims to predict future process behavior during ongoing process executions based on event log data. Especially, techniques for the next activity and timestamp prediction can help to improve the performance of operational business processes. Recently, many PBPM solutions based on deep learning were proposed by researchers. Due to the sequential nature of event log data, a common choice is to apply recurrent neural networks with long short-term memory (LSTM) cells. We argue, that the elapsed time between events is informative. However, current PBPM techniques mainly use 'vanilla' LSTM cells and hand-crafted time-related control flow features. To better model the time dependencies between events, we propose a new PBPM technique based on time-aware LSTM (T-LSTM) cells. T-LSTM cells incorporate the elapsed time between consecutive events inherently to adjust the cell memory. Furthermore, we introduce cost-sensitive learning to account for the common class imbalance in event logs. Our experiments on publicly available benchmark event logs indicate the effectiveness of the introduced techniques.
△ Less
Submitted 5 November, 2020; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Inertial Measurements for Motion Compensation in Weight-bearing Cone-beam CT of the Knee
Authors:
Jennifer Maier,
Marlies Nitschke,
Jang-Hwan Choi,
Garry Gold,
Rebecca Fahrig,
Bjoern M. Eskofier,
Andreas Maier
Abstract:
Involuntary motion during weight-bearing cone-beam computed tomography (CT) scans of the knee causes artifacts in the reconstructed volumes making them unusable for clinical diagnosis. Currently, image-based or marker-based methods are applied to correct for this motion, but often require long execution or preparation times. We propose to attach an inertial measurement unit (IMU) containing an acc…
▽ More
Involuntary motion during weight-bearing cone-beam computed tomography (CT) scans of the knee causes artifacts in the reconstructed volumes making them unusable for clinical diagnosis. Currently, image-based or marker-based methods are applied to correct for this motion, but often require long execution or preparation times. We propose to attach an inertial measurement unit (IMU) containing an accelerometer and a gyroscope to the leg of the subject in order to measure the motion during the scan and correct for it. To validate this approach, we present a simulation study using real motion measured with an optical 3D tracking system. With this motion, an XCAT numerical knee phantom is non-rigidly deformed during a simulated CT scan creating motion corrupted projections. A biomechanical model is animated with the same tracked motion in order to generate measurements of an IMU placed below the knee. In our proposed multi-stage algorithm, these signals are transformed to the global coordinate system of the CT scan and applied for motion compensation during reconstruction. Our proposed approach can effectively reduce motion artifacts in the reconstructed volumes. Compared to the motion corrupted case, the average structural similarity index and root mean squared error with respect to the no-motion case improved by 13-21% and 68-70%, respectively. These results are qualitatively and quantitatively on par with a state-of-the-art marker-based method we compared our approach to. The presented study shows the feasibility of this novel approach, and yields promising results towards a purely IMU-based motion compensation in C-arm CT.
△ Less
Submitted 9 July, 2020;
originally announced July 2020.
-
An empirical comparison of deep-neural-network architectures for next activity prediction using context-enriched process event logs
Authors:
S. Weinzierl,
S. Zilker,
J. Brunk,
K. Revoredo,
A. Nguyen,
M. Matzner,
J. Becker,
B. Eskofier
Abstract:
Researchers have proposed a variety of predictive business process monitoring (PBPM) techniques aiming to predict future process behaviour during the process execution. Especially, techniques for the next activity prediction anticipate great potential in improving operational business processes. To gain more accurate predictions, a plethora of these techniques rely on deep neural networks (DNNs) a…
▽ More
Researchers have proposed a variety of predictive business process monitoring (PBPM) techniques aiming to predict future process behaviour during the process execution. Especially, techniques for the next activity prediction anticipate great potential in improving operational business processes. To gain more accurate predictions, a plethora of these techniques rely on deep neural networks (DNNs) and consider information about the context, in which the process is running. However, an in-depth comparison of such techniques is missing in the PBPM literature, which prevents researchers and practitioners from selecting the best solution for a given event log. To remedy this problem, we empirically evaluate the predictive quality of three promising DNN architectures, combined with five proven encoding techniques and based on five context-enriched real-life event logs. We provide four findings that can support researchers and practitioners in designing novel PBPM techniques for predicting the next activities.
△ Less
Submitted 3 May, 2020;
originally announced May 2020.
-
Towards Rapid and Robust Adversarial Training with One-Step Attacks
Authors:
Leo Schwinn,
René Raab,
Björn Eskofier
Abstract:
Adversarial training is the most successful empirical method for increasing the robustness of neural networks against adversarial attacks. However, the most effective approaches, like training with Projected Gradient Descent (PGD) are accompanied by high computational complexity. In this paper, we present two ideas that, in combination, enable adversarial training with the computationally less exp…
▽ More
Adversarial training is the most successful empirical method for increasing the robustness of neural networks against adversarial attacks. However, the most effective approaches, like training with Projected Gradient Descent (PGD) are accompanied by high computational complexity. In this paper, we present two ideas that, in combination, enable adversarial training with the computationally less expensive Fast Gradient Sign Method (FGSM). First, we add uniform noise to the initial data point of the FGSM attack, which creates a wider variety of adversaries, thus prohibiting overfitting to one particular perturbation bound. Further, we add a learnable regularization step prior to the neural network, which we call Pixelwise Noise Injection Layer (PNIL). Inputs propagated trough the PNIL are resampled from a learned Gaussian distribution. The regularization induced by the PNIL prevents the model form learning to obfuscate its gradients, a factor that hindered prior approaches from successfully applying one-step methods for adversarial training. We show that noise injection in conjunction with FGSM-based adversarial training achieves comparable results to adversarial training with PGD while being considerably faster. Moreover, we outperform PGD-based adversarial training by combining noise injection and PNIL.
△ Less
Submitted 17 March, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Sensor-based Gait Parameter Extraction with Deep Convolutional Neural Networks
Authors:
Julius Hannink,
Thomas Kautz,
Cristian F. Pasluosta,
Karl-Günter Gaßmann,
Jochen Klucken,
Bjoern M. Eskofier
Abstract:
Measurement of stride-related, biomechanical parameters is the common rationale for objective gait impairment scoring. State-of-the-art double integration approaches to extract these parameters from inertial sensor data are, however, limited in their clinical applicability due to the underlying assumptions. To overcome this, we present a method to translate the abstract information provided by wea…
▽ More
Measurement of stride-related, biomechanical parameters is the common rationale for objective gait impairment scoring. State-of-the-art double integration approaches to extract these parameters from inertial sensor data are, however, limited in their clinical applicability due to the underlying assumptions. To overcome this, we present a method to translate the abstract information provided by wearable sensors to context-related expert features based on deep convolutional neural networks. Regarding mobile gait analysis, this enables integration-free and data-driven extraction of a set of 8 spatio-temporal stride parameters. To this end, two modelling approaches are compared: A combined network estimating all parameters of interest and an ensemble approach that spawns less complex networks for each parameter individually. The ensemble approach is outperforming the combined modelling in the current application. On a clinically relevant and publicly available benchmark dataset, we estimate stride length, width and medio-lateral change in foot angle up to ${-0.15\pm6.09}$ cm, ${-0.09\pm4.22}$ cm and ${0.13 \pm 3.78^\circ}$ respectively. Stride, swing and stance time as well as heel and toe contact times are estimated up to ${\pm 0.07}$, ${\pm0.05}$, ${\pm 0.07}$, ${\pm0.07}$ and ${\pm0.12}$ s respectively. This is comparable to and in parts outperforming or defining state-of-the-art. Our results further indicate that the proposed change in methodology could substitute assumption-driven double-integration methods and enable mobile assessment of spatio-temporal stride parameters in clinically critical situations as e.g. in the case of spastic gait impairments.
△ Less
Submitted 13 January, 2017; v1 submitted 12 September, 2016;
originally announced September 2016.
-
Stride Length Estimation with Deep Learning
Authors:
Julius Hannink,
Thomas Kautz,
Cristian F. Pasluosta,
Jens Barth,
Samuel Schülein,
Karl-Günter Gaßmann,
Jochen Klucken,
Bjoern M. Eskofier
Abstract:
Accurate estimation of spatial gait characteristics is critical to assess motor impairments resulting from neurological or musculoskeletal disease. Currently, however, methodological constraints limit clinical applicability of state-of-the-art double integration approaches to gait patterns with a clear zero-velocity phase. We describe a novel approach to stride length estimation that uses deep con…
▽ More
Accurate estimation of spatial gait characteristics is critical to assess motor impairments resulting from neurological or musculoskeletal disease. Currently, however, methodological constraints limit clinical applicability of state-of-the-art double integration approaches to gait patterns with a clear zero-velocity phase. We describe a novel approach to stride length estimation that uses deep convolutional neural networks to map stride-specific inertial sensor data to the resulting stride length. The model is trained on a publicly available and clinically relevant benchmark dataset consisting of 1220 strides from 101 geriatric patients. Evaluation is done in a 10-fold cross validation and for three different stride definitions. Even though best results are achieved with strides defined from mid-stance to mid-stance with average accuracy and precision of 0.01 $\pm$ 5.37 cm, performance does not strongly depend on stride definition. The achieved precision outperforms state-of-the-art methods evaluated on this benchmark dataset by 3.0 cm (36%). Due to the independence of stride definition, the proposed method is not subject to the methodological constrains that limit applicability of state-of-the-art double integration methods. Furthermore, precision on the benchmark dataset could be improved. With more precise mobile stride length estimation, new insights to the progression of neurological disease or early indications might be gained. Due to the independence of stride definition, previously uncharted diseases in terms of mobile gait analysis can now be investigated by re-training and applying the proposed method.
△ Less
Submitted 9 March, 2017; v1 submitted 12 September, 2016;
originally announced September 2016.