-
Towards Physics-informed Cyclic Adversarial Multi-PSF Lensless Imaging
Authors:
Abeer Banerjee,
Sanjay Singh
Abstract:
Lensless imaging has emerged as a promising field within inverse imaging, offering compact, cost-effective solutions with the potential to revolutionize the computational camera market. By circumventing traditional optical components like lenses and mirrors, novel approaches like mask-based lensless imaging eliminate the need for conventional hardware. However, advancements in lensless image recon…
▽ More
Lensless imaging has emerged as a promising field within inverse imaging, offering compact, cost-effective solutions with the potential to revolutionize the computational camera market. By circumventing traditional optical components like lenses and mirrors, novel approaches like mask-based lensless imaging eliminate the need for conventional hardware. However, advancements in lensless image reconstruction, particularly those leveraging Generative Adversarial Networks (GANs), are hindered by the reliance on data-driven training processes, resulting in network specificity to the Point Spread Function (PSF) of the imaging system. This necessitates a complete retraining for minor PSF changes, limiting adaptability and generalizability across diverse imaging scenarios. In this paper, we introduce a novel approach to multi-PSF lensless imaging, employing a dual discriminator cyclic adversarial framework. We propose a unique generator architecture with a sparse convolutional PSF-aware auxiliary branch, coupled with a forward model integrated into the training loop to facilitate physics-informed learning to handle the substantial domain gap between lensless and lensed images. Comprehensive performance evaluation and ablation studies underscore the effectiveness of our model, offering robust and adaptable lensless image reconstruction capabilities. Our method achieves comparable performance to existing PSF-agnostic generative methods for single PSF cases and demonstrates resilience to PSF changes without the need for retraining.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
Unified Anomaly Detection methods on Edge Device using Knowledge Distillation and Quantization
Authors:
Sushovan Jena,
Arya Pulkit,
Kajal Singh,
Anoushka Banerjee,
Sharad Joshi,
Ananth Ganesh,
Dinesh Singh,
Arnav Bhavsar
Abstract:
With the rapid advances in deep learning and smart manufacturing in Industry 4.0, there is an imperative for high-throughput, high-performance, and fully integrated visual inspection systems. Most anomaly detection approaches using defect detection datasets, such as MVTec AD, employ one-class models that require fitting separate models for each class. On the contrary, unified models eliminate the…
▽ More
With the rapid advances in deep learning and smart manufacturing in Industry 4.0, there is an imperative for high-throughput, high-performance, and fully integrated visual inspection systems. Most anomaly detection approaches using defect detection datasets, such as MVTec AD, employ one-class models that require fitting separate models for each class. On the contrary, unified models eliminate the need for fitting separate models for each class and significantly reduce cost and memory requirements. Thus, in this work, we experiment with considering a unified multi-class setup. Our experimental study shows that multi-class models perform at par with one-class models for the standard MVTec AD dataset. Hence, this indicates that there may not be a need to learn separate object/class-wise models when the object classes are significantly different from each other, as is the case of the dataset considered. Furthermore, we have deployed three different unified lightweight architectures on the CPU and an edge device (NVIDIA Jetson Xavier NX). We analyze the quantized multi-class anomaly detection models in terms of latency and memory requirements for deployment on the edge device while comparing quantization-aware training (QAT) and post-training quantization (PTQ) for performance at different precision widths. In addition, we explored two different methods of calibration required in post-training scenarios and show that one of them performs notably better, highlighting its importance for unsupervised tasks. Due to quantization, the performance drop in PTQ is further compensated by QAT, which yields at par performance with the original 32-bit Floating point in two of the models considered.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Achieving Observability on Fog Computing with the use of open-source tools
Authors:
Breno Costa,
Abhik Banerjee,
Prem Prakash Jayaraman,
Leonardo R. Carvalho,
João Bachiega Jr.,
Aleteia Araujo
Abstract:
Fog computing can provide computational resources and low-latency communication at the network edge. But with it comes uncertainties that must be managed in order to guarantee Service Level Agreements. Service observability can help the environment better deal with uncertainties, delivering relevant and up-to-date information in a timely manner to support decision making. Observability is consider…
▽ More
Fog computing can provide computational resources and low-latency communication at the network edge. But with it comes uncertainties that must be managed in order to guarantee Service Level Agreements. Service observability can help the environment better deal with uncertainties, delivering relevant and up-to-date information in a timely manner to support decision making. Observability is considered a superset of monitoring since it uses not only performance metrics, but also other instrumentation domains such as logs and traces. However, as Fog Computing is typically characterised by resource-constrained nodes and network uncertainties, increasing observability in fog can be risky due to the additional load injected into a restricted environment. There is no work in the literature that evaluated fog observability. In this paper, we first outline the challenges of achieving observability in a Fog environment, based on which we present a formal definition of fog observability. Subsequently, a real-world Fog Computing testbed running a smart city use case is deployed, and an empirical evaluation of fog observability using open-source tools is presented. The results show that under certain conditions, it is viable to provide observability in a Fog Computing environment using open-source tools, although it is necessary to control the overhead modifying their default configuration according to the application characteristics.
△ Less
Submitted 25 May, 2024;
originally announced July 2024.
-
A Hybrid Approach to Monitor Context Parameters for Optimising Caching for Context-Aware IoT Applications
Authors:
Ashish Manchanda,
Prem Prakash Jayaraman,
Abhik Banerjee,
Arkady Zaslavsky,
Shakthi Weerasinghe,
Guang-Li Huang
Abstract:
Internet of Things (IoT) has seen a prolific rise in recent times and provides the ability to solve several key challenges faced by our societies and environment. Data produced by IoT provides a significant opportunity to infer context that is key for IoT applications to make decisions/actuations. Context Management Platform (CMP) is a middleware to facilitate the exchange and management of such c…
▽ More
Internet of Things (IoT) has seen a prolific rise in recent times and provides the ability to solve several key challenges faced by our societies and environment. Data produced by IoT provides a significant opportunity to infer context that is key for IoT applications to make decisions/actuations. Context Management Platform (CMP) is a middleware to facilitate the exchange and management of such context information among IoT applications. In this paper, we propose a novel approach to monitoring context freshness as a key metric, to improving the CMP's caching performance to support the real-time context needs of IoT applications. Our proposed hybrid algorithm uses Analytic Hierarchy Process (AHP) and Sliding Window technique to ensure the most relevant (as needed by the IoT applications) context information is cached. By continuously monitoring and prioritizing context attributes, the strategy adapts to IoT environment changes, keeping cached context fresh and reliable. Through experimental evaluation and using mock data obtained from a real-world mobile IoT scenario in section~\ref{use case}, we demonstrate that the proposed algorithm can substantially enhance context cache performance, by monitoring the context attributes in real time.
△ Less
Submitted 30 April, 2024;
originally announced July 2024.
-
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Authors:
Jordy Van Landeghem,
Subhajit Maity,
Ayan Banerjee,
Matthew Blaschko,
Marie-Francine Moens,
Josep Lladós,
Sanket Biswas
Abstract:
This work explores knowledge distillation (KD) for visually-rich document (VRD) applications such as document layout analysis (DLA) and document image classification (DIC). While VRD research is dependent on increasingly sophisticated and cumbersome models, the field has neglected to study efficiency via model compression. Here, we design a KD experimentation methodology for more lean, performant…
▽ More
This work explores knowledge distillation (KD) for visually-rich document (VRD) applications such as document layout analysis (DLA) and document image classification (DIC). While VRD research is dependent on increasingly sophisticated and cumbersome models, the field has neglected to study efficiency via model compression. Here, we design a KD experimentation methodology for more lean, performant models on document understanding (DU) tasks that are integral within larger task pipelines. We carefully selected KD strategies (response-based, feature-based) for distilling knowledge to and from backbones with different architectures (ResNet, ViT, DiT) and capacities (base, small, tiny). We study what affects the teacher-student knowledge gap and find that some methods (tuned vanilla KD, MSE, SimKD with an apt projector) can consistently outperform supervised student training. Furthermore, we design downstream task setups to evaluate covariate shift and the robustness of distilled DLA models on zero-shot layout-aware document visual question answering (DocVQA). DLA-KD experiments result in a large mAP knowledge gap, which unpredictably translates to downstream robustness, accentuating the need to further explore how to efficiently obtain more semantic document layout awareness.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees
Authors:
Arindam Banerjee,
Qiaobo Li,
Yingxue Zhou
Abstract:
Generalization and optimization guarantees on the population loss in machine learning often rely on uniform convergence based analysis, typically based on the Rademacher complexity of the predictors. The rich representation power of modern models has led to concerns about this approach. In this paper, we present generalization and optimization guarantees in terms of the complexity of the gradients…
▽ More
Generalization and optimization guarantees on the population loss in machine learning often rely on uniform convergence based analysis, typically based on the Rademacher complexity of the predictors. The rich representation power of modern models has led to concerns about this approach. In this paper, we present generalization and optimization guarantees in terms of the complexity of the gradients, as measured by the Loss Gradient Gaussian Width (LGGW). First, we introduce generalization guarantees directly in terms of the LGGW under a flexible gradient domination condition, which we demonstrate to hold empirically for deep models. Second, we show that sample reuse in finite sum (stochastic) optimization does not make the empirical gradient deviate from the population gradient as long as the LGGW is small. Third, focusing on deep networks, we present results showing how to bound their LGGW under mild assumptions. In particular, we show that their LGGW can be bounded (a) by the $L_2$-norm of the loss Hessian eigenvalues, which has been empirically shown to be $\tilde{O}(1)$ for commonly used deep models; and (b) in terms of the Gaussian width of the featurizer, i.e., the output of the last-but-one layer. To our knowledge, our generalization and optimization guarantees in terms of LGGW are the first results of its kind, avoid the pitfalls of predictor Rademacher complexity based analysis, and hold considerable promise towards quantitatively tight bounds for deep models.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices
Authors:
Xingjian Yang,
Zhitao Yu,
Ashis G. Banerjee
Abstract:
As robotics and augmented reality applications increasingly rely on precise and efficient 6D object pose estimation, real-time performance on edge devices is required for more interactive and responsive systems. Our proposed Sparse Color-Code Net (SCCN) embodies a clear and concise pipeline design to effectively address this requirement. SCCN performs pixel-level predictions on the target object i…
▽ More
As robotics and augmented reality applications increasingly rely on precise and efficient 6D object pose estimation, real-time performance on edge devices is required for more interactive and responsive systems. Our proposed Sparse Color-Code Net (SCCN) embodies a clear and concise pipeline design to effectively address this requirement. SCCN performs pixel-level predictions on the target object in the RGB image, utilizing the sparsity of essential object geometry features to speed up the Perspective-n-Point (PnP) computation process. Additionally, it introduces a novel pixel-level geometry-based object symmetry representation that seamlessly integrates with the initial pose predictions, effectively addressing symmetric object ambiguities. SCCN notably achieves an estimation rate of 19 frames per second (FPS) and 6 FPS on the benchmark LINEMOD dataset and the Occlusion LINEMOD dataset, respectively, for an NVIDIA Jetson AGX Xavier, while consistently maintaining high estimation accuracy at these rates.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Efficient Trajectory Inference in Wasserstein Space Using Consecutive Averaging
Authors:
Amartya Banerjee,
Harlin Lee,
Nir Sharon,
Caroline Moosmüller
Abstract:
Capturing data from dynamic processes through cross-sectional measurements is seen in many fields such as computational biology. Trajectory inference deals with the challenge of reconstructing continuous processes from such observations. In this work, we propose methods for B-spline approximation and interpolation of point clouds through consecutive averaging that is instrinsic to the Wasserstein…
▽ More
Capturing data from dynamic processes through cross-sectional measurements is seen in many fields such as computational biology. Trajectory inference deals with the challenge of reconstructing continuous processes from such observations. In this work, we propose methods for B-spline approximation and interpolation of point clouds through consecutive averaging that is instrinsic to the Wasserstein space. Combining subdivision schemes with optimal transport-based geodesic, our methods carry out trajectory inference at a chosen level of precision and smoothness, and can automatically handle scenarios where particles undergo division over time. We rigorously evaluate our method by providing convergence guarantees and testing it on simulated cell data characterized by bifurcations and merges, comparing its performance against state-of-the-art trajectory inference and interpolation methods. The results not only underscore the effectiveness of our method in inferring trajectories, but also highlight the benefit of performing interpolation and approximation that respect the inherent geometric properties of the data.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Feasibility and benefits of joint learning from MRI databases with different brain diseases and modalities for segmentation
Authors:
Wentian Xu,
Matthew Moffat,
Thalia Seale,
Ziyun Liang,
Felix Wagner,
Daniel Whitehouse,
David Menon,
Virginia Newcombe,
Natalie Voets,
Abhirup Banerjee,
Konstantinos Kamnitsas
Abstract:
Models for segmentation of brain lesions in multi-modal MRI are commonly trained for a specific pathology using a single database with a predefined set of MRI modalities, determined by a protocol for the specific disease. This work explores the following open questions: Is it feasible to train a model using multiple databases that contain varying sets of MRI modalities and annotations for differen…
▽ More
Models for segmentation of brain lesions in multi-modal MRI are commonly trained for a specific pathology using a single database with a predefined set of MRI modalities, determined by a protocol for the specific disease. This work explores the following open questions: Is it feasible to train a model using multiple databases that contain varying sets of MRI modalities and annotations for different brain pathologies? Will this joint learning benefit performance on the sets of modalities and pathologies available during training? Will it enable analysis of new databases with different sets of modalities and pathologies? We develop and compare different methods and show that promising results can be achieved with appropriate, simple and practical alterations to the model and training framework. We experiment with 7 databases containing 5 types of brain pathologies and different sets of MRI modalities. Results demonstrate, for the first time, that joint training on multi-modal MRI databases with different brain pathologies and sets of modalities is feasible and offers practical benefits. It enables a single model to segment pathologies encountered during training in diverse sets of modalities, while facilitating segmentation of new types of pathologies such as via follow-up fine-tuning. The insights this study provides into the potential and limitations of this paradigm should prove useful for guiding future advances in the direction. Code and pretrained models: https://github.com/WenTXuL/MultiUnet
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Dynamic Model Predictive Shielding for Provably Safe Reinforcement Learning
Authors:
Arko Banerjee,
Kia Rahmani,
Joydeep Biswas,
Isil Dillig
Abstract:
Among approaches for provably safe reinforcement learning, Model Predictive Shielding (MPS) has proven effective at complex tasks in continuous, high-dimensional state spaces, by leveraging a backup policy to ensure safety when the learned policy attempts to take risky actions. However, while MPS can ensure safety both during and after training, it often hinders task progress due to the conservati…
▽ More
Among approaches for provably safe reinforcement learning, Model Predictive Shielding (MPS) has proven effective at complex tasks in continuous, high-dimensional state spaces, by leveraging a backup policy to ensure safety when the learned policy attempts to take risky actions. However, while MPS can ensure safety both during and after training, it often hinders task progress due to the conservative and task-oblivious nature of backup policies. This paper introduces Dynamic Model Predictive Shielding (DMPS), which optimizes reinforcement learning objectives while maintaining provable safety. DMPS employs a local planner to dynamically select safe recovery actions that maximize both short-term progress as well as long-term rewards. Crucially, the planner and the neural policy play a synergistic role in DMPS. When planning recovery actions for ensuring safety, the planner utilizes the neural policy to estimate long-term rewards, allowing it to observe beyond its short-term planning horizon. Conversely, the neural policy under training learns from the recovery plans proposed by the planner, converging to policies that are both high-performing and safe in practice. This approach guarantees safety during and after training, with bounded recovery regret that decreases exponentially with planning horizon depth. Experimental results demonstrate that DMPS converges to policies that rarely require shield interventions after training and achieve higher rewards compared to several state-of-the-art baselines.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
CPS-LLM: Large Language Model based Safe Usage Plan Generator for Human-in-the-Loop Human-in-the-Plant Cyber-Physical System
Authors:
Ayan Banerjee,
Aranyak Maity,
Payal Kamboj,
Sandeep K. S. Gupta
Abstract:
We explore the usage of large language models (LLM) in human-in-the-loop human-in-the-plant cyber-physical systems (CPS) to translate a high-level prompt into a personalized plan of actions, and subsequently convert that plan into a grounded inference of sequential decision-making automated by a real-world CPS controller to achieve a control goal. We show that it is relatively straightforward to c…
▽ More
We explore the usage of large language models (LLM) in human-in-the-loop human-in-the-plant cyber-physical systems (CPS) to translate a high-level prompt into a personalized plan of actions, and subsequently convert that plan into a grounded inference of sequential decision-making automated by a real-world CPS controller to achieve a control goal. We show that it is relatively straightforward to contextualize an LLM so it can generate domain-specific plans. However, these plans may be infeasible for the physical system to execute or the plan may be unsafe for human users. To address this, we propose CPS-LLM, an LLM retrained using an instruction tuning framework, which ensures that generated plans not only align with the physical system dynamics of the CPS but are also safe for human users. The CPS-LLM consists of two innovative components: a) a liquid time constant neural network-based physical dynamics coefficient estimator that can derive coefficients of dynamical models with some unmeasured state variables; b) the model coefficients are then used to train an LLM with prompts embodied with traces from the dynamical system and the corresponding model coefficients. We show that when the CPS-LLM is integrated with a contextualized chatbot such as BARD it can generate feasible and safe plans to manage external events such as meals for automated insulin delivery systems used by Type 1 Diabetes subjects.
△ Less
Submitted 19 May, 2024;
originally announced May 2024.
-
A User Interface Study on Sustainable City Trip Recommendations
Authors:
Ashmi Banerjee,
Tunar Mahmudov,
Wolfgang Wörndl
Abstract:
The importance of promoting sustainable and environmentally responsible practices is becoming increasingly recognized in all domains, including tourism. The impact of tourism extends beyond its immediate stakeholders and affects passive participants such as the environment, local businesses, and residents. City trips, in particular, offer significant opportunities to encourage sustainable tourism…
▽ More
The importance of promoting sustainable and environmentally responsible practices is becoming increasingly recognized in all domains, including tourism. The impact of tourism extends beyond its immediate stakeholders and affects passive participants such as the environment, local businesses, and residents. City trips, in particular, offer significant opportunities to encourage sustainable tourism practices by directing travelers towards destinations that minimize environmental impact while providing enriching experiences. Tourism Recommender Systems (TRS) can play a critical role in this. By integrating sustainability features in TRS, travelers can be guided towards destinations that meet their preferences and align with sustainability objectives.
This paper investigates how different user interface design elements affect the promotion of sustainable city trip choices. We explore the impact of various features on user decisions, including sustainability labels for transportation modes and their emissions, popularity indicators for destinations, seasonality labels reflecting crowd levels for specific months, and an overall sustainability composite score. Through a user study involving mockups, participants evaluated the helpfulness of these features in guiding them toward more sustainable travel options.
Our findings indicate that sustainability labels significantly influence users towards lower-carbon footprint options, while popularity and seasonality indicators guide users to less crowded and more seasonally appropriate destinations. This study emphasizes the importance of providing users with clear and informative sustainability information, which can help them make more sustainable travel choices. It lays the groundwork for future applications that can recommend sustainable destinations in real-time.
△ Less
Submitted 18 May, 2024;
originally announced May 2024.
-
Attend, Distill, Detect: Attention-aware Entropy Distillation for Anomaly Detection
Authors:
Sushovan Jena,
Vishwas Saini,
Ujjwal Shaw,
Pavitra Jain,
Abhay Singh Raihal,
Anoushka Banerjee,
Sharad Joshi,
Ananth Ganesh,
Arnav Bhavsar
Abstract:
Unsupervised anomaly detection encompasses diverse applications in industrial settings where a high-throughput and precision is imperative. Early works were centered around one-class-one-model paradigm, which poses significant challenges in large-scale production environments. Knowledge-distillation based multi-class anomaly detection promises a low latency with a reasonably good performance but w…
▽ More
Unsupervised anomaly detection encompasses diverse applications in industrial settings where a high-throughput and precision is imperative. Early works were centered around one-class-one-model paradigm, which poses significant challenges in large-scale production environments. Knowledge-distillation based multi-class anomaly detection promises a low latency with a reasonably good performance but with a significant drop as compared to one-class version. We propose a DCAM (Distributed Convolutional Attention Module) which improves the distillation process between teacher and student networks when there is a high variance among multiple classes or objects. Integrated multi-scale feature matching strategy to utilise a mixture of multi-level knowledge from the feature pyramid of the two networks, intuitively helping in detecting anomalies of varying sizes which is also an inherent problem in the multi-class scenario. Briefly, our DCAM module consists of Convolutional Attention blocks distributed across the feature maps of the student network, which essentially learns to masks the irrelevant information during student learning alleviating the "cross-class interference" problem. This process is accompanied by minimizing the relative entropy using KL-Divergence in Spatial dimension and a Channel-wise Cosine Similarity between the same feature maps of teacher and student. The losses enables to achieve scale-invariance and capture non-linear relationships. We also highlight that the DCAM module would only be used during training and not during inference as we only need the learned feature maps and losses for anomaly scoring and hence, gaining a performance gain of 3.92% than the multi-class baseline with a preserved latency.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
Learning label-label correlations in Extreme Multi-label Classification via Label Features
Authors:
Siddhant Kharbanda,
Devaansh Gupta,
Erik Schultheis,
Atmadeep Banerjee,
Cho-Jui Hsieh,
Rohit Babbar
Abstract:
Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign an input with a subset of most relevant labels from millions of label choices. Recent works in this domain have increasingly focused on a symmetric problem setting where both input instances and label features are short-text in nature. Short-text XMC with label features has found numerous applications in a…
▽ More
Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign an input with a subset of most relevant labels from millions of label choices. Recent works in this domain have increasingly focused on a symmetric problem setting where both input instances and label features are short-text in nature. Short-text XMC with label features has found numerous applications in areas such as query-to-ad-phrase matching in search ads, title-based product recommendation, prediction of related searches. In this paper, we propose Gandalf, a novel approach which makes use of a label co-occurrence graph to leverage label features as additional data points to supplement the training distribution. By exploiting the characteristics of the short-text XMC problem, it leverages the label features to construct valid training instances, and uses the label graph for generating the corresponding soft-label targets, hence effectively capturing the label-label correlations. Surprisingly, models trained on these new training instances, although being less than half of the original dataset, can outperform models trained on the original dataset, particularly on the PSP@k metric for tail labels. With this insight, we aim to train existing XMC algorithms on both, the original and new training instances, leading to an average 5% relative improvements for 6 state-of-the-art algorithms across 4 benchmark datasets consisting of up to 1.3M labels. Gandalf can be applied in a plug-and-play manner to various methods and thus forwards the state-of-the-art in the domain, without incurring any additional computational overheads.
△ Less
Submitted 3 May, 2024;
originally announced May 2024.
-
Toward Automated Formation of Composite Micro-Structures Using Holographic Optical Tweezers
Authors:
Tommy Zhang,
Nicole Werner,
Ashis G. Banerjee
Abstract:
Holographic Optical Tweezers (HOT) are powerful tools that can manipulate micro and nano-scale objects with high accuracy and precision. They are most commonly used for biological applications, such as cellular studies, and more recently, micro-structure assemblies. Automation has been of significant interest in the HOT field, since human-run experiments are time-consuming and require skilled oper…
▽ More
Holographic Optical Tweezers (HOT) are powerful tools that can manipulate micro and nano-scale objects with high accuracy and precision. They are most commonly used for biological applications, such as cellular studies, and more recently, micro-structure assemblies. Automation has been of significant interest in the HOT field, since human-run experiments are time-consuming and require skilled operator(s). Automated HOTs, however, commonly use point traps, which focus high intensity laser light at specific spots in fluid media to attract and move micro-objects. In this paper, we develop a novel automated system of tweezing multiple micro-objects more efficiently using multiplexed optical traps. Multiplexed traps enable the simultaneous trapping of multiple beads in various alternate multiplexing formations, such as annular rings and line patterns. Our automated system is realized by augmenting the capabilities of a commercially available HOT with real-time bead detection and tracking, and wavefront-based path planning. We demonstrate the usefulness of the system by assembling two different composite micro-structures, comprising 5 $μm$ polystyrene beads, using both annular and line shaped traps in obstacle-rich environments.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout
Authors:
Ayan Banerjee,
Nityanand Mathur,
Josep Lladós,
Umapada Pal,
Anjan Dutta
Abstract:
Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects, rather than comprehensive scenes comprising multiple elements. In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vect…
▽ More
Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects, rather than comprehensive scenes comprising multiple elements. In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vector graphics depicting entire scenes from textual descriptions. Utilizing a pre-trained LLM for layout generation from text prompts, this framework introduces a technique for producing masked latents in specified bounding boxes for accurate object placement. It introduces a fusion mechanism for integrating attention maps and employs a diffusion U-Net for coherent composition, speeding up the drawing process. The resulting SVG is optimized using a pre-trained encoder and LPIPS loss with opacity modulation to maximize similarity. Additionally, this work explores the potential of primitive shapes in facilitating canvas completion in constrained environments. Through both qualitative and quantitative assessments, SVGCraft is demonstrated to surpass prior works in abstraction, recognizability, and detail, as evidenced by its performance metrics (CLIP-T: 0.4563, Cosine Similarity: 0.6342, Confusion: 0.66, Aesthetic: 6.7832). The code will be available at https://github.com/ayanban011/SVGCraft.
△ Less
Submitted 30 March, 2024;
originally announced April 2024.
-
Modeling Sustainable City Trips: Integrating CO2e Emissions, Popularity, and Seasonality into Tourism Recommender Systems
Authors:
Ashmi Banerjee,
Tunar Mahmudov,
Emil Adler,
Fitri Nur Aisyah,
Wolfgang Wörndl
Abstract:
Tourism affects not only the tourism industry but also society and stakeholders such as the environment, local businesses, and residents. Tourism Recommender Systems (TRS) can be pivotal in promoting sustainable tourism by guiding travelers toward destinations with minimal negative impact. Our paper introduces a composite sustainability indicator for a city trip TRS based on the users' starting po…
▽ More
Tourism affects not only the tourism industry but also society and stakeholders such as the environment, local businesses, and residents. Tourism Recommender Systems (TRS) can be pivotal in promoting sustainable tourism by guiding travelers toward destinations with minimal negative impact. Our paper introduces a composite sustainability indicator for a city trip TRS based on the users' starting point and month of travel. This indicator integrates CO2e emissions for different transportation modes and analyses destination popularity and seasonal demand. We quantify city popularity based on user reviews, points of interest, and search trends from Tripadvisor and Google Trends data. To calculate a seasonal demand index, we leverage data from TourMIS and Airbnb. We conducted a user study to explore the fundamental trade-offs in travel decision-making and determine the weights for our proposed indicator. Finally, we demonstrate the integration of this indicator into a TRS, illustrating its ability to deliver sustainable city trip recommendations. This work lays the foundation for future research by integrating sustainability measures and contributing to responsible recommendations by TRS.
△ Less
Submitted 3 May, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Large Language Model-informed ECG Dual Attention Network for Heart Failure Risk Prediction
Authors:
Chen Chen,
Lei Li,
Marcel Beetz,
Abhirup Banerjee,
Ramneek Gupta,
Vicente Grau
Abstract:
Heart failure (HF) poses a significant public health challenge, with a rising global mortality rate. Early detection and prevention of HF could significantly reduce its impact. We introduce a novel methodology for predicting HF risk using 12-lead electrocardiograms (ECGs). We present a novel, lightweight dual-attention ECG network designed to capture complex ECG features essential for early HF ris…
▽ More
Heart failure (HF) poses a significant public health challenge, with a rising global mortality rate. Early detection and prevention of HF could significantly reduce its impact. We introduce a novel methodology for predicting HF risk using 12-lead electrocardiograms (ECGs). We present a novel, lightweight dual-attention ECG network designed to capture complex ECG features essential for early HF risk prediction, despite the notable imbalance between low and high-risk groups. This network incorporates a cross-lead attention module and twelve lead-specific temporal attention modules, focusing on cross-lead interactions and each lead's local dynamics. To further alleviate model overfitting, we leverage a large language model (LLM) with a public ECG-Report dataset for pretraining on an ECG-report alignment task. The network is then fine-tuned for HF risk prediction using two specific cohorts from the UK Biobank study, focusing on patients with hypertension (UKB-HYP) and those who have had a myocardial infarction (UKB-MI).The results reveal that LLM-informed pre-training substantially enhances HF risk prediction in these cohorts. The dual-attention design not only improves interpretability but also predictive accuracy, outperforming existing competitive methods with C-index scores of 0.6349 for UKB-HYP and 0.5805 for UKB-MI. This demonstrates our method's potential in advancing HF risk assessment with clinical complex ECG data.
△ Less
Submitted 22 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Data-Driven Ergonomic Risk Assessment of Complex Hand-intensive Manufacturing Processes
Authors:
Anand Krishnan,
Xingjian Yang,
Utsav Seth,
Jonathan M. Jeyachandran,
Jonathan Y. Ahn,
Richard Gardner,
Samuel F. Pedigo,
Adriana,
Blom-Schieber,
Ashis G. Banerjee,
Krithika Manohar
Abstract:
Hand-intensive manufacturing processes, such as composite layup and textile draping, require significant human dexterity to accommodate task complexity. These strenuous hand motions often lead to musculoskeletal disorders and rehabilitation surgeries. We develop a data-driven ergonomic risk assessment system with a special focus on hand and finger activity to better identify and address ergonomic…
▽ More
Hand-intensive manufacturing processes, such as composite layup and textile draping, require significant human dexterity to accommodate task complexity. These strenuous hand motions often lead to musculoskeletal disorders and rehabilitation surgeries. We develop a data-driven ergonomic risk assessment system with a special focus on hand and finger activity to better identify and address ergonomic issues related to hand-intensive manufacturing processes. The system comprises a multi-modal sensor testbed to collect and synchronize operator upper body pose, hand pose and applied forces; a Biometric Assessment of Complete Hand (BACH) formulation to measure high-fidelity hand and finger risks; and industry-standard risk scores associated with upper body posture, RULA, and hand activity, HAL. Our findings demonstrate that BACH captures injurious activity with a higher granularity in comparison to the existing metrics. Machine learning models are also used to automate RULA and HAL scoring, and generalize well to unseen participants. Our assessment system, therefore, provides ergonomic interpretability of the manufacturing processes studied, and could be used to mitigate risks through minor workplace optimization and posture corrections.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks
Authors:
Abeer Banerjee,
Naval K. Mehta,
Shyam S. Prasad,
Himanshu,
Sumeet Saurav,
Sanjay Singh
Abstract:
In this paper, we address the intricate challenge of gaze vector prediction, a pivotal task with applications ranging from human-computer interaction to driver monitoring systems. Our innovative approach is designed for the demanding setting of extremely low-light conditions, leveraging a novel temporal event encoding scheme, and a dedicated neural network architecture. The temporal encoding metho…
▽ More
In this paper, we address the intricate challenge of gaze vector prediction, a pivotal task with applications ranging from human-computer interaction to driver monitoring systems. Our innovative approach is designed for the demanding setting of extremely low-light conditions, leveraging a novel temporal event encoding scheme, and a dedicated neural network architecture. The temporal encoding method seamlessly integrates Dynamic Vision Sensor (DVS) events with grayscale guide frames, generating consecutively encoded images for input into our neural network. This unique solution not only captures diverse gaze responses from participants within the active age group but also introduces a curated dataset tailored for low-light conditions. The encoded temporal frames paired with our network showcase impressive spatial localization and reliable gaze direction in their predictions. Achieving a remarkable 100-pixel accuracy of 100%, our research underscores the potency of our neural network to work with temporally consecutive encoded images for precise gaze vector predictions in challenging low-light videos, contributing to the advancement of gaze prediction technologies.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis
Authors:
Agam Shah,
Arnav Hiray,
Pratvi Shah,
Arkaprabha Banerjee,
Anushka Singh,
Dheeraj Eidnani,
Bhaskar Chaudhury,
Sudheer Chava
Abstract:
In this paper, we investigate the influence of claims in analyst reports and earnings calls on financial market returns, considering them as significant quarterly events for publicly traded companies. To facilitate a comprehensive analysis, we construct a new financial dataset for the claim detection task in the financial domain. We benchmark various language models on this dataset and propose a n…
▽ More
In this paper, we investigate the influence of claims in analyst reports and earnings calls on financial market returns, considering them as significant quarterly events for publicly traded companies. To facilitate a comprehensive analysis, we construct a new financial dataset for the claim detection task in the financial domain. We benchmark various language models on this dataset and propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function, outperforming existing approaches. Furthermore, we demonstrate the practical utility of our proposed model by constructing a novel measure ``optimism". Furthermore, we observed the dependence of earnings surprise and return on our optimism measure. Our dataset, models, and code will be made publicly (under CC BY 4.0 license) available on GitHub and Hugging Face.
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation
Authors:
Ayan Banerjee,
Sanket Biswas,
Josep Lladós,
Umapada Pal
Abstract:
Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource constr…
▽ More
Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource constrained devices. Knowledge distillation allows us to create small and more efficient models that retain much of the performance of their larger counterparts. Here we present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image. Here, we design a structured graph with nodes containing proposal-level features and edges representing the relationship between the different proposal regions. Also, to reduce text bias an adaptive node sampling strategy is designed to prune the weight distribution and put more weightage on non-text nodes. We encode the complete graph as a knowledge representation and transfer it from the teacher to the student through the proposed distillation loss by effectively capturing both local and global information concurrently. Extensive experimentation on competitive benchmarks demonstrates that the proposed framework outperforms the current state-of-the-art approaches. The code will be available at: https://github.com/ayanban011/GraphKD.
△ Less
Submitted 20 February, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
On Experimental Emulation of Printability and Fleet Aware Generic Mesh Decomposition for Enabling Aerial 3D Printing
Authors:
Marios-Nektarios Stamatopoulos,
Avijit Banerjee,
George Nikolakopoulos
Abstract:
This article introduces an experimental emulation of a novel chunk-based flexible multi-DoF aerial 3D printing framework. The experimental demonstration of the overall autonomy focuses on precise motion planning and task allocation for a UAV, traversing through a series of planned space-filling paths involved in the aerial 3D printing process without physically depositing the overlaying material.…
▽ More
This article introduces an experimental emulation of a novel chunk-based flexible multi-DoF aerial 3D printing framework. The experimental demonstration of the overall autonomy focuses on precise motion planning and task allocation for a UAV, traversing through a series of planned space-filling paths involved in the aerial 3D printing process without physically depositing the overlaying material. The flexible multi-DoF aerial 3D printing is a newly developed framework and has the potential to strategically distribute the envisioned 3D model to be printed into small, manageable chunks suitable for distributed 3D printing. Moreover, by harnessing the dexterous flexibility due to the 6 DoF motion of UAV, the framework enables the provision of integrating the overall autonomy stack, potentially opening up an entirely new frontier in additive manufacturing. However, it's essential to note that the feasibility of this pioneering concept is still in its very early stage of development, which yet needs to be experimentally verified. Towards this direction, experimental emulation serves as the crucial stepping stone, providing a pseudo mockup scenario by virtual material deposition, helping to identify technological gaps from simulation to reality. Experimental emulation results, supported by critical analysis and discussion, lay the foundation for addressing the technological and research challenges to significantly push the boundaries of the state-of-the-art 3D printing mechanism.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Aalap: AI Assistant for Legal & Paralegal Functions in India
Authors:
Aman Tiwari,
Prathamesh Kalamkar,
Atreyo Banerjee,
Saurabh Karn,
Varun Hemachandran,
Smita Gupta
Abstract:
Using proprietary Large Language Models on legal tasks poses challenges due to data privacy issues, domain data heterogeneity, domain knowledge sophistication, and domain objectives uniqueness. We created Aalalp, a fine-tuned Mistral 7B model on instructions data related to specific Indian legal tasks. The performance of Aalap is better than gpt-3.5-turbo in 31\% of our test data and obtains an eq…
▽ More
Using proprietary Large Language Models on legal tasks poses challenges due to data privacy issues, domain data heterogeneity, domain knowledge sophistication, and domain objectives uniqueness. We created Aalalp, a fine-tuned Mistral 7B model on instructions data related to specific Indian legal tasks. The performance of Aalap is better than gpt-3.5-turbo in 31\% of our test data and obtains an equivalent score in 34\% of the test data as evaluated by GPT4. Training Aalap mainly focuses on teaching legal reasoning rather than legal recall. Aalap is definitely helpful for the day-to-day activities of lawyers, judges, or anyone working in legal systems.
△ Less
Submitted 30 January, 2024;
originally announced February 2024.
-
Correcting a Single Deletion in Reads from a Nanopore Sequencer
Authors:
Anisha Banerjee,
Yonatan Yehezkeally,
Antonia Wachter-Zeh,
Eitan Yaakobi
Abstract:
Owing to its several merits over other DNA sequencing technologies, nanopore sequencers hold an immense potential to revolutionize the efficiency of DNA storage systems. However, their higher error rates necessitate further research to devise practical and efficient coding schemes that would allow accurate retrieval of the data stored. Our work takes a step in this direction by adopting a simplifi…
▽ More
Owing to its several merits over other DNA sequencing technologies, nanopore sequencers hold an immense potential to revolutionize the efficiency of DNA storage systems. However, their higher error rates necessitate further research to devise practical and efficient coding schemes that would allow accurate retrieval of the data stored. Our work takes a step in this direction by adopting a simplified model of the nanopore sequencer inspired by Mao \emph{et al.}, which incorporates some of its physical aspects. This channel model can be viewed as a sliding window of length $\ell$ that passes over the incoming input sequence and produces the Hamming weight of the enclosed $\ell$ bits, while shifting by one position at each time step. The resulting $(\ell+1)$-ary vector, referred to as the $\ell$-\emph{read vector}, is susceptible to deletion errors due to imperfections inherent in the sequencing process. We establish that at least $\log n - \ell$ bits of redundancy are needed to correct a single deletion. An error-correcting code that is optimal up to an additive constant, is also proposed. Furthermore, we find that for $\ell \geq 2$, reconstruction from two distinct noisy $\ell$-read vectors can be accomplished without any redundancy, and provide a suitable reconstruction algorithm to this effect.
△ Less
Submitted 7 May, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images
Authors:
Jia Wan,
Wanhua Li,
Jason Ken Adhinarta,
Atmadeep Banerjee,
Evelina Sjostedt,
Jingpeng Wu,
Jeff Lichtman,
Hanspeter Pfister,
Donglai Wei
Abstract:
While imaging techniques at macro and mesoscales have garnered substantial attention and resources, microscale Volume Electron Microscopy (vEM) imaging, capable of revealing intricate vascular details, has lacked the necessary benchmarking infrastructure. In this paper, we address a significant gap in this field of neuroimaging by introducing the first-in-class public benchmark, BvEM, designed spe…
▽ More
While imaging techniques at macro and mesoscales have garnered substantial attention and resources, microscale Volume Electron Microscopy (vEM) imaging, capable of revealing intricate vascular details, has lacked the necessary benchmarking infrastructure. In this paper, we address a significant gap in this field of neuroimaging by introducing the first-in-class public benchmark, BvEM, designed specifically for cortical blood vessel segmentation in vEM images. Our BvEM benchmark is based on vEM image volumes from three mammals: adult mouse, macaque, and human. We standardized the resolution, addressed imaging variations, and meticulously annotated blood vessels through semi-automatic, manual, and quality control processes, ensuring high-quality 3D segmentation. Furthermore, we developed a zero-shot cortical blood vessel segmentation method named TriSAM, which leverages the powerful segmentation model SAM for 3D segmentation. To extend SAM from 2D to 3D volume segmentation, TriSAM employs a multi-seed tracking framework, leveraging the reliability of certain image planes for tracking while using others to identify potential turning points. This approach effectively achieves long-term 3D blood vessel segmentation without model training or fine-tuning. Experimental results show that TriSAM achieved superior performances on the BvEM benchmark across three species. Our dataset, code, and model are available online at \url{https://jia-wan.github.io/bvem}.
△ Less
Submitted 17 June, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Decentralized Multi-Agent Active Search and Tracking when Targets Outnumber Agents
Authors:
Arundhati Banerjee,
Jeff Schneider
Abstract:
Multi-agent multi-target tracking has a wide range of applications, including wildlife patrolling, security surveillance or environment monitoring. Such algorithms often make restrictive assumptions: the number of targets and/or their initial locations may be assumed known, or agents may be pre-assigned to monitor disjoint partitions of the environment, reducing the burden of exploration. This als…
▽ More
Multi-agent multi-target tracking has a wide range of applications, including wildlife patrolling, security surveillance or environment monitoring. Such algorithms often make restrictive assumptions: the number of targets and/or their initial locations may be assumed known, or agents may be pre-assigned to monitor disjoint partitions of the environment, reducing the burden of exploration. This also limits applicability when there are fewer agents than targets, since agents are unable to continuously follow the targets in their fields of view. Multi-agent tracking algorithms additionally assume inter-agent synchronization of observations, or the presence of a central controller to coordinate joint actions. Instead, we focus on the setting of decentralized multi-agent, multi-target, simultaneous active search-and-tracking with asynchronous inter-agent communication. Our proposed algorithm DecSTER uses a sequential monte carlo implementation of the probability hypothesis density filter for posterior inference combined with Thompson sampling for decentralized multi-agent decision making. We compare different action selection policies, focusing on scenarios where targets outnumber agents. In simulation, we demonstrate that DecSTER is robust to unreliable inter-agent communication and outperforms information-greedy baselines in terms of the Optimal Sub-Pattern Assignment (OSPA) metric for different numbers of targets and varying teamsizes.
△ Less
Submitted 9 January, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
An Implantable Piezofilm Middle Ear Microphone: Performance in Human Cadaveric Temporal Bones
Authors:
John Z. Zhang,
Lukas Graf,
Annesya Banerjee,
Aaron Yeiser,
Christopher I. McHugh,
Ioannis Kymissis,
Jeffrey H. Lang,
Elizabeth S. Olson,
Hideko Heidi Nakajima
Abstract:
Purpose: One of the major reasons that totally implantable cochlear microphones are not readily available is the lack of good implantable microphones. An implantable microphone has the potential to provide a range of benefits over external microphones for cochlear implant users including the filtering ability of the outer ear, cosmetics, and usability in all situations. This paper presents results…
▽ More
Purpose: One of the major reasons that totally implantable cochlear microphones are not readily available is the lack of good implantable microphones. An implantable microphone has the potential to provide a range of benefits over external microphones for cochlear implant users including the filtering ability of the outer ear, cosmetics, and usability in all situations. This paper presents results from experiments in human cadaveric ears of a piezofilm microphone concept under development as a possible component of a future implantable microphone system for use with cochlear implants. This microphone is referred to here as a drum microphone (DrumMic) that senses the robust and predictable motion of the umbo, the tip of the malleus. Methods: The performance was measured of five DrumMics inserted in four different human cadaveric temporal bones. Sensitivity, linearity, bandwidth, and equivalent input noise were measured during these experiments using a sound stimulus and measurement setup. Results: The sensitivity of the DrumMics was found to be tightly clustered across different microphones and ears despite differences in umbo and middle ear anatomy. The DrumMics were shown to behave linearly across a large dynamic range (46 dB SPL to 100 dB SPL) across a wide bandwidth (100 Hz to 8 kHz). The equivalent input noise (0.1-10 kHz) of the DrumMic and amplifier referenced to the ear canal was measured to be 54 dB SPL and estimated to be 46 dB SPL after accounting for the pressure gain of the outer ear. Conclusion: The results demonstrate that the DrumMic behaves robustly across ears and fabrication. The equivalent input noise performance was shown to approach that of commercial hearing aid microphones. To advance this demonstration of the DrumMic concept to a future prototype implantable in humans, work on encapsulation, biocompatibility, connectorization will be required.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Anatomical basis of human sex differences in ECG identified by automated torso-cardiac three-dimensional reconstruction
Authors:
Hannah J. Smith,
Blanca Rodriguez,
Yuling Sang,
Marcel Beetz,
Robin Choudhury,
Vicente Grau,
Abhirup Banerjee
Abstract:
Background and Aims: The electrocardiogram (ECG) is routinely used for diagnosis and risk stratification following myocardial infarction (MI), though its interpretation is confounded by anatomical variability and sex differences. Women have a higher incidence of missed MI diagnosis and poorer outcomes following infarction. Sex differences in ECG biomarkers and torso-ventricular anatomy have not be…
▽ More
Background and Aims: The electrocardiogram (ECG) is routinely used for diagnosis and risk stratification following myocardial infarction (MI), though its interpretation is confounded by anatomical variability and sex differences. Women have a higher incidence of missed MI diagnosis and poorer outcomes following infarction. Sex differences in ECG biomarkers and torso-ventricular anatomy have not been well characterised, largely due to the absence of high-throughput torso reconstruction methods.
Methods: This work presents quantification of sex differences in ECG versus anatomical biomarkers in healthy and post-MI subjects, enabled by a novel, end-to-end automated pipeline for torso-ventricular anatomical reconstruction from clinically standard cardiac magnetic resonance imaging. Personalised 3D torso-ventricular reconstructions were generated for 425 post-MI subjects and 1051 healthy controls from the UK Biobank. Regression models were created relating the extracted torso-ventricular and ECG parameters.
Results: Half the sex difference in QRS durations is explained by smaller ventricles in women both in healthy ($3.4 \pm 1.3$ms of $6.0 \pm 1.5$ms) and post-MI ($4.5 \pm 1.4$ms of $8.3 \pm 2.5$ms) subjects. Lower baseline STj amplitude in women is also associated with smaller ventricles, and more superior and posterior cardiac position. Post-MI T wave amplitude and R axis deviations are more strongly associated with a more posterior and horizontal cardiac position in women rather than electrophysiology as in men.
Conclusion: A novel computational pipeline enables the three-dimensional reconstruction of 1476 torso-cardiac geometries of healthy and post-myocardial infarction subjects, quantification of sex and BMI-related differences and association with ECG biomarkers. Any ECG-based tool should be reviewed considering anatomical sex differences to avoid sex-biased outcomes.
△ Less
Submitted 17 July, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge
Authors:
Yang Nan,
Xiaodan Xing,
Shiyi Wang,
Zeyu Tang,
Federico N Felder,
Sheng Zhang,
Roberta Eufrasia Ledda,
Xiaoliu Ding,
Ruiqi Yu,
Weiping Liu,
Feng Shi,
Tianyang Sun,
Zehong Cao,
Minghui Zhang,
Yun Gu,
Hanxiao Zhang,
Jian Gao,
Pingyu Wang,
Wen Tang,
Pengxin Yu,
Han Kang,
Junqiang Chen,
Xing Lu,
Boyu Zhang,
Michail Mamalakis
, et al. (16 additional authors not shown)
Abstract:
Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current public-available datasets concentrate on lung diseases with moderate morphological variations. The intric…
▽ More
Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current public-available datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns present in the lung tissues of fibrotic lung disease patients exacerbate the challenges, often leading to various prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the official 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization abilities, followed by exploring the most correlated QIB of mortality prediction. A training set of 120 high-resolution computerised tomography (HRCT) scans were publicly released with expert annotations and mortality status. The online validation set incorporated 52 HRCT scans from patients with fibrotic lung disease and the offline test set included 140 cases from fibrosis and COVID-19 patients. The results have shown that the capacity of extracting airway trees from patients with fibrotic lung disease could be enhanced by introducing voxel-wise weighted general union loss and continuity loss. In addition to the competitive image biomarkers for prognosis, a strong airway-derived biomarker (Hazard ratio>1.5, p<0.0001) was revealed for survival prognostication compared with existing clinical measurements, clinician assessment and AI-based biomarkers.
△ Less
Submitted 16 April, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
The Expert Knowledge combined with AI outperforms AI Alone in Seizure Onset Zone Localization using resting state fMRI
Authors:
Payal Kamboj,
Ayan Banerjee,
Varina L. Boerwinkle,
Sandeep K. S. Gupta
Abstract:
We evaluated whether integration of expert guidance on seizure onset zone (SOZ) identification from resting state functional MRI (rs-fMRI) connectomics combined with deep learning (DL) techniques enhances the SOZ delineation in patients with refractory epilepsy (RE), compared to utilizing DL alone. Rs-fMRI were collected from 52 children with RE who had subsequently undergone ic-EEG and then, if i…
▽ More
We evaluated whether integration of expert guidance on seizure onset zone (SOZ) identification from resting state functional MRI (rs-fMRI) connectomics combined with deep learning (DL) techniques enhances the SOZ delineation in patients with refractory epilepsy (RE), compared to utilizing DL alone. Rs-fMRI were collected from 52 children with RE who had subsequently undergone ic-EEG and then, if indicated, surgery for seizure control (n = 25). The resting state functional connectomics data were previously independently classified by two expert epileptologists, as indicative of measurement noise, typical resting state network connectivity, or SOZ. An expert knowledge integrated deep network was trained on functional connectomics data to identify SOZ. Expert knowledge integrated with DL showed a SOZ localization accuracy of 84.8& and F1 score, harmonic mean of positive predictive value and sensitivity, of 91.7%. Conversely, a DL only model yielded an accuracy of less than 50% (F1 score 63%). Activations that initiate in gray matter, extend through white matter and end in vascular regions are seen as the most discriminative expert identified SOZ characteristics. Integration of expert knowledge of functional connectomics can not only enhance the performance of DL in localizing SOZ in RE, but also lead toward potentially useful explanations of prevalent co-activation patterns in SOZ. RE with surgical outcomes and pre-operative rs-fMRI studies can yield expert knowledge most salient for SOZ identification.
△ Less
Submitted 14 December, 2023;
originally announced December 2023.
-
Contextual Bandits with Online Neural Regression
Authors:
Rohan Deb,
Yikun Ban,
Shiliang Zuo,
Jingrui He,
Arindam Banerjee
Abstract:
Recent works have shown a reduction from contextual bandits to online regression under a realizability assumption [Foster and Rakhlin, 2020, Foster and Krishnamurthy, 2021]. In this work, we investigate the use of neural networks for such online regression and associated Neural Contextual Bandits (NeuCBs). Using existing results for wide networks, one can readily show a ${\mathcal{O}}(\sqrt{T})$ r…
▽ More
Recent works have shown a reduction from contextual bandits to online regression under a realizability assumption [Foster and Rakhlin, 2020, Foster and Krishnamurthy, 2021]. In this work, we investigate the use of neural networks for such online regression and associated Neural Contextual Bandits (NeuCBs). Using existing results for wide networks, one can readily show a ${\mathcal{O}}(\sqrt{T})$ regret for online regression with square loss, which via the reduction implies a ${\mathcal{O}}(\sqrt{K} T^{3/4})$ regret for NeuCBs. Departing from this standard approach, we first show a $\mathcal{O}(\log T)$ regret for online regression with almost convex losses that satisfy QG (Quadratic Growth) condition, a generalization of the PL (Polyak-Łojasiewicz) condition, and that have a unique minima. Although not directly applicable to wide networks since they do not have unique minima, we show that adding a suitable small random perturbation to the network predictions surprisingly makes the loss satisfy QG with unique minima. Based on such a perturbed prediction, we show a ${\mathcal{O}}(\log T)$ regret for online regression with both squared loss and KL loss, and subsequently convert these respectively to $\tilde{\mathcal{O}}(\sqrt{KT})$ and $\tilde{\mathcal{O}}(\sqrt{KL^*} + K)$ regret for NeuCB, where $L^*$ is the loss of the best policy. Separately, we also show that existing regret bounds for NeuCBs are $Ω(T)$ or assume i.i.d. contexts, unlike this work. Finally, our experimental results on various datasets demonstrate that our algorithms, especially the one based on KL loss, persistently outperform existing algorithms.
△ Less
Submitted 12 December, 2023;
originally announced December 2023.
-
Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning
Authors:
Amartya Banerjee,
Christopher J. Hazard,
Jacob Beel,
Cade Mack,
Jack Xia,
Michael Resnick,
Will Goddin
Abstract:
Nonparametric learning is a fundamental concept in machine learning that aims to capture complex patterns and relationships in data without making strong assumptions about the underlying data distribution. Owing to simplicity and familiarity, one of the most well-known algorithms under this paradigm is the $k$-nearest neighbors ($k$-NN) algorithm. Driven by the usage of machine learning in safety-…
▽ More
Nonparametric learning is a fundamental concept in machine learning that aims to capture complex patterns and relationships in data without making strong assumptions about the underlying data distribution. Owing to simplicity and familiarity, one of the most well-known algorithms under this paradigm is the $k$-nearest neighbors ($k$-NN) algorithm. Driven by the usage of machine learning in safety-critical applications, in this work, we shed new light on the traditional nearest neighbors algorithm from the perspective of information theory and propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model. We can determine data point weights as well as feature contributions by calculating the conditional entropy for adding a feature without the need for explicit model training. This allows us to compute feature contributions by providing detailed data point influence weights with perfect attribution and can be used to query counterfactuals. Instead of using a traditional distance measure which needs to be scaled and contextualized, we use a novel formulation of $\textit{surprisal}$ (amount of information required to explain the difference between the observed and expected result). Finally, our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection, while also attaining competitive results for regression across a statistically significant number of datasets.
△ Less
Submitted 2 February, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Set Augmented Finite Automata over Infinite Alphabets
Authors:
Ansuman Banerjee,
Kingshuk Chatterjee,
Shibashis Guha
Abstract:
A data language is a set of finite words defined on an infinite alphabet. Data languages are used to express properties associated with data values (domain defined over a countably infinite set). In this paper, we introduce set augmented finite automata (SAFA), a new class of automata for expressing data languages. We investigate the decision problems, closure properties, and expressiveness of SAF…
▽ More
A data language is a set of finite words defined on an infinite alphabet. Data languages are used to express properties associated with data values (domain defined over a countably infinite set). In this paper, we introduce set augmented finite automata (SAFA), a new class of automata for expressing data languages. We investigate the decision problems, closure properties, and expressiveness of SAFA. We also study the deterministic variant of these automata.
△ Less
Submitted 11 November, 2023;
originally announced November 2023.
-
Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance
Authors:
Alloy Das,
Sanket Biswas,
Ayan Banerjee,
Josep Lladós,
Umapada Pal,
Saumik Bhattacharya
Abstract:
The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here…
▽ More
The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data such that it can directly adapt to target domains rather than being specialized for a specific domain or scenario. Further, we investigate a transformer baseline called Swin-TESTR to focus on solving scene-text spotting for both regular and arbitrary-shaped scene text along with an exhaustive evaluation. The results clearly demonstrate the potential of intermediate representations to achieve significant performance on text spotting benchmarks across multiple domains (e.g. language, synth-to-real, and documents). both in terms of accuracy and efficiency.
△ Less
Submitted 1 November, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Active Anomaly Detection in Confined Spaces Using Ergodic Traversal of Directed Region Graphs
Authors:
Benjamin Wong,
Tyler M. Paine,
Santosh Devasia,
Ashis G. Banerjee
Abstract:
We provide the first step toward developing a hierarchical control-estimation framework to actively plan robot trajectories for anomaly detection in confined spaces. The space is represented globally using a directed region graph, where a region is a landmark that needs to be visited (inspected). We devise a fast mixing Markov chain to find an ergodic route that traverses this graph so that the re…
▽ More
We provide the first step toward developing a hierarchical control-estimation framework to actively plan robot trajectories for anomaly detection in confined spaces. The space is represented globally using a directed region graph, where a region is a landmark that needs to be visited (inspected). We devise a fast mixing Markov chain to find an ergodic route that traverses this graph so that the region visitation frequency is proportional to its anomaly detection uncertainty, while satisfying the edge directionality (region transition) constraint(s). Preliminary simulation results show fast convergence to the ergodic solution and confident estimation of the presence of anomalies in the inspected regions.
△ Less
Submitted 1 October, 2023;
originally announced October 2023.
-
Human-Inspired Topological Representations for Visual Object Recognition in Unseen Environments
Authors:
Ekta U. Samani,
Ashis G. Banerjee
Abstract:
Visual object recognition in unseen and cluttered indoor environments is a challenging problem for mobile robots. Toward this goal, we extend our previous work to propose the TOPS2 descriptor, and an accompanying recognition framework, THOR2, inspired by a human reasoning mechanism known as object unity. We interleave color embeddings obtained using the Mapper algorithm for topological soft cluste…
▽ More
Visual object recognition in unseen and cluttered indoor environments is a challenging problem for mobile robots. Toward this goal, we extend our previous work to propose the TOPS2 descriptor, and an accompanying recognition framework, THOR2, inspired by a human reasoning mechanism known as object unity. We interleave color embeddings obtained using the Mapper algorithm for topological soft clustering with the shape-based TOPS descriptor to obtain the TOPS2 descriptor. THOR2, trained using synthetic data, achieves substantially higher recognition accuracy than the shape-based THOR framework and outperforms RGB-D ViT on two real-world datasets: the benchmark OCID dataset and the UW-IS Occluded dataset. Therefore, THOR2 is a promising step toward achieving robust recognition in low-cost robots.
△ Less
Submitted 15 September, 2023;
originally announced September 2023.
-
High Fidelity Fast Simulation of Human in the Loop Human in the Plant (HIL-HIP) Systems
Authors:
Ayan Banerjee,
Payal Kamboj,
Aranyak Maity,
Riya Sudhakar Salian,
Sandeep K. S. Gupta
Abstract:
Non-linearities in simulation arise from the time variance in wireless mobile networks when integrated with human in the loop, human in the plant (HIL-HIP) physical systems under dynamic contexts, leading to simulation slowdown. Time variance is handled by deriving a series of piece wise linear time invariant simulations (PLIS) in intervals, which are then concatenated in time domain. In this pape…
▽ More
Non-linearities in simulation arise from the time variance in wireless mobile networks when integrated with human in the loop, human in the plant (HIL-HIP) physical systems under dynamic contexts, leading to simulation slowdown. Time variance is handled by deriving a series of piece wise linear time invariant simulations (PLIS) in intervals, which are then concatenated in time domain. In this paper, we conduct a formal analysis of the impact of discretizing time-varying components in wireless network-controlled HIL-HIP systems on simulation accuracy and speedup, and evaluate trade-offs with reliable guarantees. We develop an accurate simulation framework for an artificial pancreas wireless network system that controls blood glucose in Type 1 Diabetes patients with time varying properties such as physiological changes associated with psychological stress and meal patterns. PLIS approach achieves accurate simulation with greater than 2.1 times speedup than a non-linear system simulation for the given dataset.
△ Less
Submitted 10 September, 2023;
originally announced September 2023.
-
AmbientFlow: Invertible generative models from incomplete, noisy measurements
Authors:
Varun A. Kelkar,
Rucha Deshpande,
Arindam Banerjee,
Mark A. Anastasio
Abstract:
Generative models have gained popularity for their potential applications in imaging science, such as image reconstruction, posterior sampling and data sharing. Flow-based generative models are particularly attractive due to their ability to tractably provide exact density estimates along with fast, inexpensive and diverse samples. Training such models, however, requires a large, high quality data…
▽ More
Generative models have gained popularity for their potential applications in imaging science, such as image reconstruction, posterior sampling and data sharing. Flow-based generative models are particularly attractive due to their ability to tractably provide exact density estimates along with fast, inexpensive and diverse samples. Training such models, however, requires a large, high quality dataset of objects. In applications such as computed imaging, it is often difficult to acquire such data due to requirements such as long acquisition time or high radiation dose, while acquiring noisy or partially observed measurements of these objects is more feasible. In this work, we propose AmbientFlow, a framework for learning flow-based generative models directly from noisy and incomplete data. Using variational Bayesian methods, a novel framework for establishing flow-based generative models from noisy, incomplete data is proposed. Extensive numerical studies demonstrate the effectiveness of AmbientFlow in learning the object distribution. The utility of AmbientFlow in a downstream inference task of image reconstruction is demonstrated.
△ Less
Submitted 13 December, 2023; v1 submitted 9 September, 2023;
originally announced September 2023.
-
Detection of Unknown-Unknowns in Human-in-Plant Human-in-Loop Systems Using Physics Guided Process Models
Authors:
Aranyak Maity,
Ayan Banerjee,
Sandeep Gupta
Abstract:
Unknown-unknowns are operational scenarios in systems that are not accounted for in the design and test phase. In such scenarios, the operational behavior of the Human-in-loop (HIL) Human-in-Plant (HIP) systems is not guaranteed to meet requirements such as safety and efficacy. We propose a novel framework for analyzing the operational output characteristics of safety-critical HIL-HIP systems that…
▽ More
Unknown-unknowns are operational scenarios in systems that are not accounted for in the design and test phase. In such scenarios, the operational behavior of the Human-in-loop (HIL) Human-in-Plant (HIP) systems is not guaranteed to meet requirements such as safety and efficacy. We propose a novel framework for analyzing the operational output characteristics of safety-critical HIL-HIP systems that can discover unknown-unknown scenarios and evaluate potential safety hazards. We propose dynamics-induced hybrid recurrent neural networks (DiH-RNN) to mine a physics-guided surrogate model (PGSM) that checks for deviation of the cyber-physical system (CPS) from safety-certified operational characteristics. The PGSM enables early detection of unknown-unknowns based on the physical laws governing the system. We demonstrate the detection of operational changes in an Artificial Pancreas(AP) due to unknown insulin cartridge errors.
△ Less
Submitted 12 December, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Towards Individual and Multistakeholder Fairness in Tourism Recommender Systems
Authors:
Ashmi Banerjee,
Paromita Banik,
Wolfgang Wörndl
Abstract:
This position paper summarizes our published review on individual and multistakeholder fairness in Tourism Recommender Systems (TRS). Recently, there has been growing attention to fairness considerations in recommender systems (RS). It has been acknowledged in research that fairness in RS is often closely tied to the presence of multiple stakeholders, such as end users, item providers, and platfor…
▽ More
This position paper summarizes our published review on individual and multistakeholder fairness in Tourism Recommender Systems (TRS). Recently, there has been growing attention to fairness considerations in recommender systems (RS). It has been acknowledged in research that fairness in RS is often closely tied to the presence of multiple stakeholders, such as end users, item providers, and platforms, as it raises concerns for the fair treatment of all parties involved. Hence, fairness in RS is a multi-faceted concept that requires consideration of the perspectives and needs of the different stakeholders to ensure fair outcomes for them. However, there may often be instances where achieving the goals of one stakeholder could conflict with those of another, resulting in trade-offs.
In this paper, we emphasized addressing the unique challenges of ensuring fairness in RS within the tourism domain. We aimed to discuss potential strategies for mitigating the aforementioned challenges and examine the applicability of solutions from other domains to tackle fairness issues in tourism. By exploring cross-domain approaches and strategies for incorporating S-Fairness, we can uncover valuable insights and determine how these solutions can be adapted and implemented effectively in the context of tourism to enhance fairness in RS.
△ Less
Submitted 5 September, 2023;
originally announced September 2023.
-
Learning for Interval Prediction of Electricity Demand: A Cluster-based Bootstrapping Approach
Authors:
Rohit Dube,
Natarajan Gautam,
Amarnath Banerjee,
Harsha Nagarajan
Abstract:
Accurate predictions of electricity demands are necessary for managing operations in a small aggregation load setting like a Microgrid. Due to low aggregation, the electricity demands can be highly stochastic and point estimates would lead to inflated errors. Interval estimation in this scenario, would provide a range of values within which the future values might lie and helps quantify the errors…
▽ More
Accurate predictions of electricity demands are necessary for managing operations in a small aggregation load setting like a Microgrid. Due to low aggregation, the electricity demands can be highly stochastic and point estimates would lead to inflated errors. Interval estimation in this scenario, would provide a range of values within which the future values might lie and helps quantify the errors around the point estimates. This paper introduces a residual bootstrap algorithm to generate interval estimates of day-ahead electricity demand. A machine learning algorithm is used to obtain the point estimates of electricity demand and respective residuals on the training set. The obtained residuals are stored in memory and the memory is further partitioned. Days with similar demand patterns are grouped in clusters using an unsupervised learning algorithm and these clusters are used to partition the memory. The point estimates for test day are used to find the closest cluster of similar days and the residuals are bootstrapped from the chosen cluster. This algorithm is evaluated on the real electricity demand data from EULR(End Use Load Research) and is compared to other bootstrapping methods for varying confidence intervals.
△ Less
Submitted 3 September, 2023;
originally announced September 2023.
-
Flexible Multi-DoF Aerial 3D Printing Supported with Automated Optimal Chunking
Authors:
Marios-Nektarios Stamatopoulos,
Avijit Banerjee,
George Nikolakopoulos
Abstract:
The future of 3D printing utilizing unmanned aerial vehicles (UAVs) presents a promising capability to revolutionize manufacturing and to enable the creation of large-scale structures in remote and hard- to-reach areas e.g. in other planetary systems. Nevertheless, the limited payload capacity of UAVs and the complexity in the 3D printing of large objects pose significant challenges. In this artic…
▽ More
The future of 3D printing utilizing unmanned aerial vehicles (UAVs) presents a promising capability to revolutionize manufacturing and to enable the creation of large-scale structures in remote and hard- to-reach areas e.g. in other planetary systems. Nevertheless, the limited payload capacity of UAVs and the complexity in the 3D printing of large objects pose significant challenges. In this article we propose a novel chunk-based framework for distributed 3D printing using UAVs that sets the basis for a fully collaborative aerial 3D printing of challenging structures. The presented framework, through a novel proposed optimisation process, is able to divide the 3D model to be printed into small, manageable chunks and to assign them to a UAV for partial printing of the assigned chunk, in a fully autonomous approach. Thus, we establish the algorithms for chunk division, allocation, and printing, and we also introduce a novel algorithm that efficiently partitions the mesh into planar chunks, while accounting for the inter-connectivity constraints of the chunks. The efficiency of the proposed framework is demonstrated through multiple physics based simulations in Gazebo, where a CAD construction mesh is printed via multiple UAVs carrying materials whose volume is proportionate to a fraction of the total mesh volume.
△ Less
Submitted 22 August, 2023;
originally announced August 2023.
-
Two-Dimensional Z-Complementary Array Quads with Low Column Sequence PMEPRs
Authors:
Shibsankar Das,
Adrish Banerjee,
Udaya Parampalli
Abstract:
In this paper, we first propose a new design strategy of 2D $Z$-complementary array quads (2D-ZCAQs) with feasible array sizes. A 2D-ZCAQ consists of four distinct unimodular arrays satisfying zero 2D auto-correlation sums for non-trivial 2D time-shifts within certain zone. Then, we obtain the upper bounds on the column sequence peak-to-mean envelope power ratio (PMEPR) of the constructed 2D-ZCAQs…
▽ More
In this paper, we first propose a new design strategy of 2D $Z$-complementary array quads (2D-ZCAQs) with feasible array sizes. A 2D-ZCAQ consists of four distinct unimodular arrays satisfying zero 2D auto-correlation sums for non-trivial 2D time-shifts within certain zone. Then, we obtain the upper bounds on the column sequence peak-to-mean envelope power ratio (PMEPR) of the constructed 2D-ZCAQs by using specific auto-correlation properties of some seed sequences. The constructed 2D-ZCAQs with bounded column sequence PMEPR can be used as a potential alternative to 2D Golay complementary array sets for practical applications
△ Less
Submitted 13 August, 2023;
originally announced August 2023.
-
Root Cross Z-Complementary Pairs with Large ZCZ Width
Authors:
Shibsankar Das,
Adrish Banerjee,
Zilong Liu
Abstract:
In this paper, we present a new family of cross $Z$-complementary pairs (CZCPs) based on generalized Boolean functions and two roots of unity. Our key idea is to consider an arbitrary partition of the set $\{1,2,\cdots, n\}$ with two subsets corresponding to two given roots of unity for which two truncated sequences of new alphabet size determined by the two roots of unity are obtained. We show th…
▽ More
In this paper, we present a new family of cross $Z$-complementary pairs (CZCPs) based on generalized Boolean functions and two roots of unity. Our key idea is to consider an arbitrary partition of the set $\{1,2,\cdots, n\}$ with two subsets corresponding to two given roots of unity for which two truncated sequences of new alphabet size determined by the two roots of unity are obtained. We show that these two truncated sequences form a new $q$-ary CZCP with flexible sequence length and large zero-correlation zone width. Furthermore, we derive an enumeration formula by considering the Stirling number of the second kind for the partitions and show that the number of constructed CZCPs increases significantly compared to the existing works.
△ Less
Submitted 13 August, 2023;
originally announced August 2023.
-
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion
Authors:
Siyuan Shan,
Yang Li,
Amartya Banerjee,
Junier B. Oliva
Abstract:
Voice conversion (VC) aims at altering a person's voice to make it sound similar to the voice of another person while preserving linguistic content. Existing methods suffer from a dilemma between content intelligibility and speaker similarity; i.e., methods with higher intelligibility usually have a lower speaker similarity, while methods with higher speaker similarity usually require plenty of ta…
▽ More
Voice conversion (VC) aims at altering a person's voice to make it sound similar to the voice of another person while preserving linguistic content. Existing methods suffer from a dilemma between content intelligibility and speaker similarity; i.e., methods with higher intelligibility usually have a lower speaker similarity, while methods with higher speaker similarity usually require plenty of target speaker voice data to achieve high intelligibility. In this work, we propose a novel method \textit{Phoneme Hallucinator} that achieves the best of both worlds. Phoneme Hallucinator is a one-shot VC model; it adopts a novel model to hallucinate diversified and high-fidelity target speaker phonemes based just on a short target speaker voice (e.g. 3 seconds). The hallucinated phonemes are then exploited to perform neighbor-based voice conversion. Our model is a text-free, any-to-any VC model that requires no text annotations and supports conversion to any unseen speaker. Objective and subjective evaluations show that \textit{Phoneme Hallucinator} outperforms existing VC methods for both intelligibility and speaker similarity.
△ Less
Submitted 30 December, 2023; v1 submitted 11 August, 2023;
originally announced August 2023.
-
A Distance Correlation-Based Approach to Characterize the Effectiveness of Recurrent Neural Networks for Time Series Forecasting
Authors:
Christopher Salazar,
Ashis G. Banerjee
Abstract:
Time series forecasting has received a lot of attention, with recurrent neural networks (RNNs) being one of the widely used models due to their ability to handle sequential data. Previous studies on RNN time series forecasting, however, show inconsistent outcomes and offer few explanations for performance variations among the datasets. In this paper, we provide an approach to link time series char…
▽ More
Time series forecasting has received a lot of attention, with recurrent neural networks (RNNs) being one of the widely used models due to their ability to handle sequential data. Previous studies on RNN time series forecasting, however, show inconsistent outcomes and offer few explanations for performance variations among the datasets. In this paper, we provide an approach to link time series characteristics with RNN components via the versatile metric of distance correlation. This metric allows us to examine the information flow through the RNN activation layers to be able to interpret and explain their performance. We empirically show that the RNN activation layers learn the lag structures of time series well. However, they gradually lose this information over the span of a few consecutive layers, thereby worsening the forecast quality for series with large lag structures. We also show that the activation layers cannot adequately model moving average and heteroskedastic time series processes. Last, we generate heatmaps for visual comparisons of the activation layers for different choices of the network hyperparameters to identify which of them affect the forecast performance. Our findings can, therefore, aid practitioners in assessing the effectiveness of RNNs for given time series data without actually training and evaluating the networks.
△ Less
Submitted 25 April, 2024; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Multi-objective point cloud autoencoders for explainable myocardial infarction prediction
Authors:
Marcel Beetz,
Abhirup Banerjee,
Vicente Grau
Abstract:
Myocardial infarction (MI) is one of the most common causes of death in the world. Image-based biomarkers commonly used in the clinic, such as ejection fraction, fail to capture more complex patterns in the heart's 3D anatomy and thus limit diagnostic accuracy. In this work, we present the multi-objective point cloud autoencoder as a novel geometric deep learning approach for explainable infarctio…
▽ More
Myocardial infarction (MI) is one of the most common causes of death in the world. Image-based biomarkers commonly used in the clinic, such as ejection fraction, fail to capture more complex patterns in the heart's 3D anatomy and thus limit diagnostic accuracy. In this work, we present the multi-objective point cloud autoencoder as a novel geometric deep learning approach for explainable infarction prediction, based on multi-class 3D point cloud representations of cardiac anatomy and function. Its architecture consists of multiple task-specific branches connected by a low-dimensional latent space to allow for effective multi-objective learning of both reconstruction and MI prediction, while capturing pathology-specific 3D shape information in an interpretable latent space. Furthermore, its hierarchical branch design with point cloud-based deep learning operations enables efficient multi-scale feature learning directly on high-resolution anatomy point clouds. In our experiments on a large UK Biobank dataset, the multi-objective point cloud autoencoder is able to accurately reconstruct multi-temporal 3D shapes with Chamfer distances between predicted and input anatomies below the underlying images' pixel resolution. Our method outperforms multiple machine learning and deep learning benchmarks for the task of incident MI prediction by 19% in terms of Area Under the Receiver Operating Characteristic curve. In addition, its task-specific compact latent space exhibits easily separable control and MI clusters with clinically plausible associations between subject encodings and corresponding 3D shapes, thus demonstrating the explainability of the prediction.
△ Less
Submitted 20 July, 2023;
originally announced July 2023.
-
Modeling 3D cardiac contraction and relaxation with point cloud deformation networks
Authors:
Marcel Beetz,
Abhirup Banerjee,
Vicente Grau
Abstract:
Global single-valued biomarkers of cardiac function typically used in clinical practice, such as ejection fraction, provide limited insight on the true 3D cardiac deformation process and hence, limit the understanding of both healthy and pathological cardiac mechanics. In this work, we propose the Point Cloud Deformation Network (PCD-Net) as a novel geometric deep learning approach to model 3D car…
▽ More
Global single-valued biomarkers of cardiac function typically used in clinical practice, such as ejection fraction, provide limited insight on the true 3D cardiac deformation process and hence, limit the understanding of both healthy and pathological cardiac mechanics. In this work, we propose the Point Cloud Deformation Network (PCD-Net) as a novel geometric deep learning approach to model 3D cardiac contraction and relaxation between the extreme ends of the cardiac cycle. It employs the recent advances in point cloud-based deep learning into an encoder-decoder structure, in order to enable efficient multi-scale feature learning directly on multi-class 3D point cloud representations of the cardiac anatomy. We evaluate our approach on a large dataset of over 10,000 cases from the UK Biobank study and find average Chamfer distances between the predicted and ground truth anatomies below the pixel resolution of the underlying image acquisition. Furthermore, we observe similar clinical metrics between predicted and ground truth populations and show that the PCD-Net can successfully capture subpopulation-specific differences between normal subjects and myocardial infarction (MI) patients. We then demonstrate that the learned 3D deformation patterns outperform multiple clinical benchmarks by 13% and 7% in terms of area under the receiver operating characteristic curve for the tasks of prevalent MI detection and incident MI prediction and by 7% in terms of Harrell's concordance index for MI survival analysis.
△ Less
Submitted 20 July, 2023;
originally announced July 2023.
-
Alignment complete relational Hoare logics for some and all
Authors:
Ramana Nagasamudram,
Anindya Banerjee,
David A. Naumann
Abstract:
In relational verification, judicious alignment of computational steps facilitates proof of relations between programs using simple relational assertions. Relational Hoare logics (RHL) provide compositional rules that embody various alignments of executions. Seemingly more flexible alignments can be expressed in terms of product automata based on program transition relations. A single degenerate a…
▽ More
In relational verification, judicious alignment of computational steps facilitates proof of relations between programs using simple relational assertions. Relational Hoare logics (RHL) provide compositional rules that embody various alignments of executions. Seemingly more flexible alignments can be expressed in terms of product automata based on program transition relations. A single degenerate alignment rule (self-composition), atop a complete Hoare logic, comprises a RHL for $\forall\forall$ properties that is complete in the ordinary logical sense (Cook'78). The notion of alignment completeness was previously proposed as a more satisfactory measure, and some rules were shown to be alignment complete with respect to a few ad hoc forms of alignment automata. This paper proves alignment completeness with respect to a general class of $\forall\forall$ alignment automata, for a RHL comprised of standard rules together with a rule of semantics-preserving rewrites based on Kleene algebra with tests. A new logic for $\forall\exists$ properties is introduced and shown to be alignment complete. The $\forall\forall$ and $\forall\exists$ automata are shown to be semantically complete. Thus the logics are both complete in the ordinary sense. Recent work by D'Osualdo et al highlights the importance of completeness relative to assumptions (which we term entailment completeness), and presents $\forall\forall$ examples seemingly beyond the scope of RHLs. Additional rules enable these examples to be proved in our RHL, shedding light on the open problem of entailment completeness.
△ Less
Submitted 9 July, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.