Verifying Peephole Rewriting In SSA Compiler IRs

Authors: Siddharth Bhat, Alex Keizer, Chris Hughes, Andrés Goens, Tobias Grosser

Abstract: There is an increasing need for domain-specific reasoning in modern compilers. This has fueled the use of tailored intermediate representations (IRs) based on static single assignment (SSA), like in the MLIR compiler framework. Interactive theorem provers (ITPs) provide strong guarantees for the end-to-end verification of compilers (e.g., CompCert). However, modern compilers and their IRs evolve at a rate that makes proof engineering alongside them prohibitively expensive. Nevertheless, well-scoped push-button automated verification tools such as the Alive peephole verifier for LLVM-IR gained recognition in domains where SMT solvers offer efficient (semi) decision procedures. In this paper, we aim to combine the convenience of automation with the versatility of ITPs for verifying peephole rewrites across domain-specific IRs. We formalize a core calculus for SSA-based IRs that is generic over the IR and covers so-called regions (nested scoping used by many domain-specific IRs in the MLIR ecosystem). Our mechanization in the Lean proof assistant provides a user-friendly frontend for translating MLIR syntax into our calculus. We provide scaffolding for defining and verifying peephole rewrites, offering tactics to eliminate the abstraction overhead of our SSA calculus. We prove correctness theorems about peephole rewriting, as well as two classical program transformations. To evaluate our framework, we consider three use cases from the MLIR ecosystem that cover different levels of abstraction: (1) bitvector rewrites from LLVM, (2) structured control flow, and (3) fully homomorphic encryption. We envision that our mechanization provides a foundation for formally verified rewrites on new domain-specific IRs.

Submitted 4 July, 2024; originally announced July 2024.

Comments: accepted at ITP 2024

arXiv:2406.16143 [pdf, other]

Review of Zero-Shot and Few-Shot AI Algorithms in The Medical Domain

Authors: Maged Badawi, Mohammedyahia Abushanab, Sheethal Bhat, Andreas Maier

Abstract: In this paper, different techniques of few-shot, zero-shot, and regular object detection have been investigated. The need for few-shot learning and zero-shot learning techniques is crucial and arises from the limitations and challenges in traditional machine learning, deep learning, and computer vision methods, which require large amounts of data and generalize poorly. Those techniques can give us prominent results by using only a few training sets, reducing the required amounts of data and improving the generalization. This survey will highlight the recent papers of the last three years that introduce the usage of few-shot learning and zero-shot learning techniques in addressing the challenges mentioned earlier. In this paper we reviewed the zero-shot, few-shot, and regular object detection methods and categorized them in an understandable manner. Based on the comparison made within each category, it has been found that the approaches are quite impressive. This integrated review of diverse papers on few-shot, zero-shot, and regular object detection reveals a shared focus on advancing the field through novel frameworks and techniques. A noteworthy observation is the scarcity of detailed discussions regarding the difficulties encountered during the development phase. Contributions include the introduction of innovative models, such as ZSD-YOLO and GTNet, often showcasing improvements with various metrics such as mean average precision (mAP), Recall@100 (RE@100), the area under the receiver operating characteristic curve (AUROC), and precision. These findings underscore a collective move towards leveraging vision-language models for versatile applications, with potential areas for future research including a more thorough exploration of limitations and domain-specific adaptations.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.06679 [pdf, other]

PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation

Authors: Zhenyu Li, Shariq Farooq Bhat, Peter Wonka

Abstract: This paper introduces PatchRefiner, an advanced framework for metric single image depth estimation aimed at high-resolution real-domain inputs. While depth estimation is crucial for applications such as autonomous driving, 3D generative modeling, and 3D reconstruction, achieving accurate high-resolution depth in real-world scenarios is challenging due to the constraints of existing architectures and the scarcity of detailed real-world depth data. PatchRefiner adopts a tile-based methodology, reconceptualizing high-resolution depth estimation as a refinement process, which results in notable performance enhancements. Utilizing a pseudo-labeling strategy that leverages synthetic data, PatchRefiner incorporates a Detail and Scale Disentangling (DSD) loss to enhance detail capture while maintaining scale accuracy, thus facilitating the effective transfer of knowledge from synthetic to real-world data. Our extensive evaluations demonstrate PatchRefiner's superior performance, significantly outperforming existing benchmarks on the Unreal4KStereo dataset by 18.1% in terms of the root mean squared error (RMSE) and showing marked improvements in detail accuracy and consistent scale estimation on diverse real-world datasets like CityScape, ScanNet++, and ETH3D.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.19376 [pdf, other]

PureEBM: Universal Poison Purification via Mid-Run Dynamics of Energy-Based Models

Authors: Omead Pooladzandi, Jeffrey Jiang, Sunay Bhat, Gregory Pottie

Abstract: Data poisoning attacks pose a significant threat to the integrity of machine learning models by leading to misclassification of target distribution data by injecting adversarial examples during training. Existing state-of-the-art (SoTA) defense methods suffer from limitations, such as significantly reduced generalization performance and significant overhead during training, making them impractical or limited for real-world applications. In response to this challenge, we introduce a universal data purification method that defends naturally trained classifiers from malicious white-, gray-, and black-box image poisons by applying a universal stochastic preprocessing step $Ψ_{T}(x)$, realized by iterative Langevin sampling of a convergent Energy Based Model (EBM) initialized with an image $x$. Mid-run dynamics of $Ψ_{T}(x)$ purify poison information with minimal impact on features important to the generalization of a classifier network. We show that EBMs remain universal purifiers, even in the presence of poisoned EBM training data, and achieve SoTA defense on leading triggered and triggerless poisons. This work is a subset of a larger framework introduced in PureGen with a more detailed focus on EBM purification and poison defense.

Submitted 2 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2405.18627

arXiv:2405.18627 [pdf, other]

PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

Authors: Sunay Bhat, Jeffrey Jiang, Omead Pooladzandi, Alexander Branch, Gregory Pottie

Abstract: Train-time data poisoning attacks threaten machine learning models by introducing adversarial examples during training, leading to misclassification. Current defense methods often reduce generalization performance, are attack-specific, and impose significant training overhead. To address this, we introduce a set of universal data purification methods using a stochastic transform, $Ψ(x)$, realized via iterative Langevin dynamics of Energy-Based Models (EBMs), Denoising Diffusion Probabilistic Models (DDPMs), or both. These approaches purify poisoned data with minimal impact on classifier generalization. Our specially trained EBMs and DDPMs provide state-of-the-art defense against various attacks (including Narcissus, Bullseye Polytope, Gradient Matching) on CIFAR-10, Tiny-ImageNet, and CINIC-10, without needing attack or classifier-specific information. We discuss performance trade-offs and show that our methods remain highly effective even with poisoned or distributionally shifted generative model training data.

Submitted 2 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.18324 [pdf, other]

Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study

Authors: Shreyas Bhat, Joseph B. Lyons, Cong Shi, X. Jessie Yang

Abstract: With the advent of AI technologies, humans and robots are increasingly teaming up to perform collaborative tasks. To enable smooth and effective collaboration, the topic of value alignment (operationalized herein as the degree of dynamic goal alignment within a task) between the robot and the human is gaining increasing research attention. Prior literature on value alignment makes an inherent assumption that aligning the values of the robot with that of the human benefits the team. This assumption, however, has not been empirically verified. Moreover, prior literature does not account for human's trust in the robot when analyzing human-robot value alignment. Thus, a research gap needs to be bridged by answering two questions: How does alignment of values affect trust? Is it always beneficial to align the robot's values with that of the human? We present a simulation study and a human-subject study to answer these questions. Results from the simulation study show that alignment of values is important for trust when the overall risk level of the task is high. We also present an adaptive strategy for the robot that uses Inverse Reinforcement Learning (IRL) to match the values of the robot with those of the human during interaction. Our simulations suggest that such an adaptive strategy is able to maintain trust across the full spectrum of human values. We also present results from an empirical study that validate these findings from simulation. Results indicate that real-time personalized value alignment is beneficial to trust and perceived performance by the human when the robot does not have a good prior on the human's values.

Submitted 28 May, 2024; originally announced May 2024.

Comments: This is a preprint of the following chapter: Bhat et al., Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study, published in "Emerging Frontiers in Human-Robot Interaction", edited by Ramana Kumar Vinjamuri, 2024, Springer Nature reproduced with permission of Springer Nature. The final authenticated version is available online at: [INSERT LINK HERE]

arXiv:2405.01787 [pdf, other]

Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming

Authors: Saikat Chakraborty, Gabriel Ebner, Siddharth Bhat, Sarah Fakhoury, Sakina Fatima, Shuvendu Lahiri, Nikhil Swamy

Abstract: Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*. Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs, including software used in production systems ranging from Windows and Linux, to Python and Firefox. Our dataset includes around 32K top-level F* definitions, each representing a type-directed program and proof synthesis problem -- producing a definition given a formal specification expressed as an F* type. We provide a program-fragment checker that queries F* to check the correctness of candidate solutions. We believe this is the largest corpus of SMT-assisted program proofs coupled with a reproducible program-fragment checker. Grounded in this dataset, we investigate the use of AI to synthesize programs and their proofs in F*, with promising results. Our main finding is that the performance of fine-tuned smaller language models (such as Phi-2 or StarCoder) compares favorably with large language models (such as GPT-4), at a much lower computational cost. We also identify various type-based retrieval augmentation techniques and find that they boost performance significantly. With detailed error analysis and case studies, we identify potential strengths and weaknesses of models and techniques and suggest directions for future improvements.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.06405 [pdf, other]

Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry

Authors: Shiven Sinha, Ameya Prabhu, Ponnurangam Kumaraguru, Siddharth Bhat, Matthias Bethge

Abstract: Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 25 of 30 International Mathematical Olympiad (IMO) problems whereas the reported baseline based on Wu's method solved only ten. In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong. Wu's method alone can solve 15 problems, and some of them are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu's method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 problems by just using a CPU-only laptop with a time limit of 5 minutes per problem. Essentially, this classic method solves just 4 problems fewer than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist.

Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: Work in Progress. Released for wider feedback

arXiv:2403.17536 [pdf, other]

ILLUMINER: Instruction-tuned Large Language Models as Few-shot Intent Classifier and Slot Filler

Authors: Paramita Mirza, Viju Sudhi, Soumya Ranjan Sahoo, Sinchana Ramakanth Bhat

Abstract: State-of-the-art intent classification (IC) and slot filling (SF) methods often rely on data-intensive deep learning models, limiting their practicality for industry applications. Large language models on the other hand, particularly instruction-tuned models (Instruct-LLMs), exhibit remarkable zero-shot performance across various natural language tasks. This study evaluates Instruct-LLMs on popular benchmark datasets for IC and SF, emphasizing their capacity to learn from fewer examples. We introduce ILLUMINER, an approach framing IC and SF as language generation tasks for Instruct-LLMs, with a more efficient SF-prompting method compared to prior work. A comprehensive comparison with multiple baselines shows that our approach, using the FLAN-T5 11B model, outperforms the state-of-the-art joint IC+SF method and in-context learning with GPT3.5 (175B), particularly in slot filling by 11.1--32.2 percentage points. Additionally, our in-depth ablation study demonstrates that parameter-efficient fine-tuning requires less than 6% of training data to yield comparable performance with traditional full-weight fine-tuning.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted at LREC-COLING 2024

arXiv:2403.15596 [pdf, other]

Incorporating Memory into Propagation of 1-Electron Reduced Density Matrices

Authors: Harish S. Bhat, Hardeep Bassi, Karnamohit Ranka, Christine M. Isborn

Abstract: For any linear system where the unreduced dynamics are governed by unitary propagators, we derive a closed, time-delayed, linear system for a reduced-dimensional quantity of interest. We apply this method to understand the memory-dependence of $1$-electron reduced density matrices in time-dependent configuration interaction (TDCI), a scheme to solve for the correlated dynamics of electrons in molecules. Though time-dependent density functional theory has established that the $1$-electron reduced density possesses memory-dependence, the precise nature of this memory-dependence has not been understood. We derive a self-contained, symmetry/constraint-preserving method to propagate reduced TDCI electron density matrices. Our method preserves properties of density matrices such as Hermitian symmetry and constant trace. In numerical tests on two model systems ($\text{H}_2$ and $\text{HeH}^+$), we show that with sufficiently large time-delay (or memory-dependence), our method propagates reduced TDCI density matrices with high quantitative accuracy. We study the dependence of our results on time step and basis set. To implement our method, we derive the $4$-index tensor that relates reduced and full TDCI density matrices. Our derivation applies to any TDCI system, regardless of basis set, number of electrons, or choice of Slater determinants in the wave function. This derivation enables a proof that the trace of the reduced TDCI density matrix is constant and equals the number of electrons.

Submitted 24 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: 26 pages, 7 figures

MSC Class: 81V55; 34K06; 81S22; 37N20

arXiv:2402.16034 [pdf, other]

Emotion Classification in Short English Texts using Deep Learning Techniques

Authors: Siddhanth Bhat

Abstract: Detecting emotions in limited text datasets from under-resourced languages presents a formidable obstacle, demanding specialized frameworks and computational strategies. This study conducts a thorough examination of deep learning techniques for discerning emotions in short English texts. Deep learning approaches employ transfer learning and word embedding, notably BERT, to attain superior accuracy. To evaluate these methods, we introduce the "SmallEnglishEmotions" dataset, comprising 6372 varied short English texts annotated with five primary emotion categories. Our experiments reveal that transfer learning and BERT-based text embedding outperform alternative methods in accurately categorizing the text in the dataset.

Submitted 10 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.05428 [pdf, other]

Mixture Density Networks for Classification with an Application to Product Bundling

Authors: Narendhar Gugulothu, Sanjay P. Bhat, Tejas Bodas

Abstract: While mixture density networks (MDNs) have been extensively used for regression tasks, they have not been used much for classification tasks. One reason for this is that the usability of MDNs for classification is not clear and straightforward. In this paper, we propose two MDN-based models for classification tasks. Both models fit mixtures of Gaussians to the data and use the fitted distributions to classify a given sample by evaluating the learnt cumulative distribution function for the given input features. While the proposed MDN-based models perform slightly better than, or on par with, five baseline classification models on three publicly available datasets, the real utility of our models comes out through a real-world product bundling application. Specifically, we use our MDN-based models to learn the willingness-to-pay (WTP) distributions for two products from synthetic sales data of the individual products. The Gaussian mixture representation of the learnt WTP distributions is then exploited to obtain the WTP distribution of the bundle consisting of both the products. The proposed MDN-based models are able to approximate the true WTP distributions of both products and the bundle well.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2401.03912 [pdf, other]

Attention-Guided Erasing: A Novel Augmentation Method for Enhancing Downstream Breast Density Classification

Authors: Adarsh Bhandary Panambur, Hui Yu, Sheethal Bhat, Prathmesh Madhu, Siming Bayer, Andreas Maier

Abstract: The assessment of breast density is crucial in the context of breast cancer screening, especially in populations with a higher percentage of dense breast tissues. This study introduces a novel data augmentation technique termed Attention-Guided Erasing (AGE), devised to enhance the downstream classification of four distinct breast density categories in mammography following the BI-RADS recommendation in the Vietnamese cohort. The proposed method integrates supplementary information during transfer learning, utilizing visual attention maps derived from a vision transformer backbone trained using the self-supervised DINO method. These maps are utilized to erase background regions in the mammogram images, unveiling only the potential areas of dense breast tissues to the network. Through the incorporation of AGE during transfer learning with varying random probabilities, we consistently surpass the classification performance of scenarios without AGE and with the traditional random erasing transformation. We validate our methodology using the publicly available VinDr-Mammo dataset. Specifically, we attain a mean F1-score of 0.5910, outperforming values of 0.5594 and 0.5691 corresponding to scenarios without AGE and with random erasing (RE), respectively. This superiority is further substantiated by t-tests, revealing a p-value of p<0.0001, underscoring the statistical significance of our approach.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.08548 [pdf, other]

EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment

Authors: Mykola Lavreniuk, Shariq Farooq Bhat, Matthias Müller, Peter Wonka

Abstract: This work presents the network architecture EVP (Enhanced Visual Perception). EVP builds on the previous work VPD which paved the way to use the Stable Diffusion network for computer vision tasks. We propose two major enhancements. First, we develop the Inverse Multi-Attentive Feature Refinement (IMAFR) module which enhances feature learning capabilities by aggregating spatial information from higher pyramid levels. Second, we propose a novel image-text alignment module for improved feature extraction of the Stable Diffusion backbone. The resulting architecture is suitable for a wide variety of tasks and we demonstrate its performance in the context of single-image depth estimation with a specialized decoder using classification-based bins and referring segmentation with an off-the-shelf decoder. Comprehensive experiments conducted on established datasets show that EVP achieves state-of-the-art results in single-image depth estimation for indoor (NYU Depth v2, 11.8% RMSE improvement over VPD) and outdoor (KITTI) environments, as well as referring segmentation (RefCOCO, 2.53 IoU improvement over ReLA). The code and pre-trained models are publicly available at https://github.com/Lavreniuk/EVP.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.06053 [pdf, other]

IEKG: A Commonsense Knowledge Graph for Idiomatic Expressions

Authors: Ziheng Zeng, Kellen Tan Cheng, Srihari Venkat Nanniyur, Jianing Zhou, Suma Bhat

Abstract: Idiomatic expression (IE) processing and comprehension have challenged pre-trained language models (PTLMs) because their meanings are non-compositional. Unlike prior works that enable IE comprehension through fine-tuning PTLMs with sentences containing IEs, in this work, we construct IEKG, a commonsense knowledge graph for figurative interpretations of IEs. This extends the established ATOMIC2020 graph, converting PTLMs into knowledge models (KMs) that encode and infer commonsense knowledge related to IE use. Experiments show that various PTLMs can be converted into KMs with IEKG. We verify the quality of IEKG and the ability of the trained KMs with automatic and human evaluation. Through applications in natural language understanding, we show that a PTLM injected with knowledge from IEKG exhibits improved IE comprehension ability and can generalize to IEs unseen during training.

Submitted 10 December, 2023; originally announced December 2023.

Comments: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

arXiv:2312.03079 [pdf, other]

LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Authors: Shariq Farooq Bhat, Niloy J. Mitra, Peter Wonka

Abstract: We present LooseControl to allow generalized depth conditioning for diffusion-based image generation. ControlNet, the SOTA for depth-conditioned image generation, produces remarkable results but relies on having access to detailed depth maps for guidance. Creating such exact depth maps, in many scenarios, is challenging. This paper introduces a generalized version of depth conditioning that enables many new content-creation workflows. Specifically, we allow (C1) scene boundary control for loosely specifying scenes with only boundary conditions, and (C2) 3D box control for specifying layout locations of the target objects rather than the exact shape and appearance of the objects. Using LooseControl, along with text guidance, users can create complex environments (e.g., rooms, street views, etc.) by specifying only scene boundaries and locations of primary objects. Further, we provide two editing mechanisms to refine the results: (E1) 3D box editing enables the user to refine images by changing, adding, or removing boxes while freezing the style of the image. This yields minimal changes apart from changes induced by the edited boxes. (E2) Attribute editing proposes possible editing directions to change one particular aspect of the scene, such as the overall object density or a particular object. Extensive tests and comparisons with baselines demonstrate the generality of our method. We believe that LooseControl can become an important design tool for easily creating complex environments and be extended to other forms of guidance channels. Code and more information are available at https://shariqfarooq123.github.io/loose-control/ .

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.02284 [pdf, other]

PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

Authors: Zhenyu Li, Shariq Farooq Bhat, Peter Wonka

Abstract: Single image depth estimation is a foundational task in computer vision and generative modeling. However, prevailing depth estimation models grapple with accommodating the increasing resolutions commonplace in today's consumer cameras and devices. Existing high-resolution strategies show promise, but they often face limitations, ranging from error propagation to the loss of high-frequency details. We present PatchFusion, a novel tile-based framework with three key components to improve the current state of the art: (1) A patch-wise fusion network that fuses a globally-consistent coarse prediction with finer, inconsistent tiled predictions via high-level feature guidance, (2) A Global-to-Local (G2L) module that adds vital context to the fusion network, discarding the need for patch selection heuristics, and (3) A Consistency-Aware Training (CAT) and Inference (CAI) approach, emphasizing patch overlap consistency and thereby eradicating the necessity for post-processing. Experiments on UnrealStereo4K, MVS-Synth, and Middlebury 2014 demonstrate that our framework can generate high-resolution depth maps with intricate details. PatchFusion is independent of the base model for depth estimation. Notably, our framework built on top of SOTA ZoeDepth brings improvements for a total of 17.3% and 29.4% in terms of the root mean squared error (RMSE) on UnrealStereo4K and MVS-Synth, respectively.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.16051 [pdf, other]

Evaluating the Impact of Personalized Value Alignment in Human-Robot Interaction: Insights into Trust and Team Performance Outcomes

Authors: Shreyas Bhat, Joseph B. Lyons, Cong Shi, X. Jessie Yang

Abstract: This paper examines the effect of real-time, personalized alignment of a robot's reward function to the human's values on trust and team performance. We present and compare three distinct robot interaction strategies: a non-learner strategy where the robot presumes the human's reward function mirrors its own, a non-adaptive-learner strategy in which the robot learns the human's reward function for trust estimation and human behavior modeling, but still optimizes its own reward function, and an adaptive-learner strategy in which the robot learns the human's reward function and adopts it as its own. Two human-subject experiments with a total number of 54 participants were conducted. In both experiments, the human-robot team searches for potential threats in a town. The team sequentially goes through search sites to look for threats. We model the interaction between the human and the robot as a trust-aware Markov Decision Process (trust-aware MDP) and use Bayesian Inverse Reinforcement Learning (IRL) to estimate the reward weights of the human as they interact with the robot. In Experiment 1, we start our learning algorithm with an informed prior of the human's values/goals. In Experiment 2, we start the learning algorithm with an uninformed prior. Results indicate that when starting with a good informed prior, personalized value alignment does not seem to benefit trust or team performance. On the other hand, when an informed prior is unavailable, alignment to the human's values leads to high trust and higher perceived performance while maintaining the same objective team performance.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 10 pages, 9 figures, to be published in ACM/IEEE International Conference on Human Robot Interaction. arXiv admin note: text overlap with arXiv:2309.05179

arXiv:2310.19127 [pdf, other]

Unified Representation for Non-compositional and Compositional Expressions

Authors: Ziheng Zeng, Suma Bhat

Abstract: Accurate processing of non-compositional language relies on generating good representations for such expressions. In this work, we study the representation of language non-compositionality by proposing a language model, PIER, that builds on BART and can create semantically meaningful and contextually appropriate representations for English potentially idiomatic expressions (PIEs). PIEs are characterized by their non-compositionality and contextual ambiguity in their literal and idiomatic interpretations. Via intrinsic evaluation on embedding quality and extrinsic evaluation on PIE processing and NLU tasks, we show that representations generated by PIER result in 33% higher homogeneity score for embedding clustering than BART, whereas 3.12% and 3.29% gains in accuracy and sequence accuracy for PIE sense classification and span detection compared to the state-of-the-art IE representation model, GIEA. These gains are achieved without sacrificing PIER's performance on NLU tasks (+/- 1% accuracy) compared to BART.

Submitted 29 October, 2023; originally announced October 2023.

Comments: This work is accepted to EMNLP 2023 Findings

arXiv:2310.18743 [pdf, other]

Optimization of utility-based shortfall risk: A non-asymptotic viewpoint

Authors: Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat

Abstract: We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR. Next, in the context of UBSR optimization, we derive an expression for the UBSR gradient under a smooth parameterization. This expression is a ratio of expectations, both of which involve the UBSR. We use SAA for the numerator as well as denominator in the UBSR gradient expression to arrive at a biased gradient estimator. We derive non-asymptotic bounds on the estimation error, which show that our gradient estimator is asymptotically unbiased. We incorporate the aforementioned gradient estimator into a stochastic gradient (SG) algorithm for UBSR optimization. Finally, we derive non-asymptotic bounds that quantify the rate of convergence of our SG algorithm for UBSR optimization.

Submitted 30 March, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

arXiv:2310.11389 [pdf, ps, other]

Risk Estimation in a Markov Cost Process: Lower and Upper Bounds

Authors: Gugan Thoppe, L. A. Prashanth, Sanjay Bhat

Abstract: We tackle the problem of estimating risk measures of the infinite-horizon discounted cost within a Markov cost process. The risk measures we study include variance, Value-at-Risk (VaR), and Conditional Value-at-Risk (CVaR). First, we show that estimating any of these risk measures with $ε$-accuracy, either in expected or high-probability sense, requires at least $Ω(1/ε^2)$ samples. Then, using a truncation scheme, we derive an upper bound for the CVaR and variance estimation. This bound matches our lower bound up to logarithmic factors. Finally, we discuss an extension of our estimation scheme that covers more general risk measures satisfying a certain continuity criterion, e.g., spectral risk measures, utility-based shortfall risk. To the best of our knowledge, our work is the first to provide lower and upper bounds for estimating any risk measure beyond the mean within a Markovian setting. Our lower bounds also extend to the infinite-horizon discounted costs' mean. Even in that case, our lower bound of $Ω(1/ε^2)$ improves upon the existing $Ω(1/ε)$ bound [13].

Submitted 11 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.10640 [pdf, other]

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

Authors: Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

Abstract: Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects. While excelling in generating images from short, single-object descriptions, these models often struggle to faithfully capture all the nuanced details within longer and more elaborate textual inputs. In response, we present a novel approach leveraging Large Language Models (LLMs) to extract critical components from text prompts, including bounding box coordinates for foreground objects, detailed textual descriptions for individual objects, and a succinct background context. These components form the foundation of our layout-to-image generation model, which operates in two phases. The initial Global Scene Generation utilizes object layouts and background context to create an initial scene but often falls short in faithfully representing object characteristics as specified in the prompts. To address this limitation, we introduce an Iterative Refinement Scheme that iteratively evaluates and refines box-level content to align them with their textual descriptions, recomposing objects as needed to ensure consistency. Our evaluation on complex prompts featuring multiple objects demonstrates a substantial improvement in recall compared to baseline diffusion models. This is further validated by a user study, underscoring the efficacy of our approach in generating coherent and detailed scenes from intricate textual inputs.

Submitted 25 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted at ICLR 2024

arXiv:2310.09536 [pdf, other]

CarExpert: Leveraging Large Language Models for In-Car Conversational Question Answering

Authors: Md Rashad Al Hasan Rony, Christian Suess, Sinchana Ramakanth Bhat, Viju Sudhi, Julia Schneider, Maximilian Vogel, Roman Teucher, Ken E. Friedl, Soumya Sahoo

Abstract: Large language models (LLMs) have demonstrated remarkable performance by following natural language instructions without fine-tuning them on domain-specific tasks and data. However, leveraging LLMs for domain-specific question answering suffers from severe limitations. The generated answer tends to hallucinate due to the training data collection time (when using off-the-shelf), complex user utterance and wrong retrieval (in retrieval-augmented generation). Furthermore, due to the lack of awareness about the domain and expected output, such LLMs may generate unexpected and unsafe answers that are not tailored to the target domain. In this paper, we propose CarExpert, an in-car retrieval-augmented conversational question-answering system leveraging LLMs for different tasks. Specifically, CarExpert employs LLMs to control the input, provide domain-specific documents to the extractive and generative answering components, and controls the output to ensure safe and domain-specific answers. A comprehensive empirical evaluation exhibits that CarExpert outperforms state-of-the-art LLMs in generating natural, safe and car-specific answers.

Submitted 14 October, 2023; originally announced October 2023.

Comments: Accepted into EMNLP 2023 (industry track), corresponding Author: Md Rashad Al Hasan Rony

arXiv:2309.05179 [pdf, other]

Effect of Adapting to Human Preferences on Trust in Human-Robot Teaming

Authors: Shreyas Bhat, Joseph B. Lyons, Cong Shi, X. Jessie Yang

Abstract: We present the effect of adapting to human preferences on trust in a human-robot teaming task. The team performs a task in which the robot acts as an action recommender to the human. It is assumed that the behavior of the human and the robot is based on some reward function they try to optimize. We use a new human trust-behavior model that enables the robot to learn and adapt to the human's preferences in real-time during their interaction using Bayesian Inverse Reinforcement Learning. We present three strategies for the robot to interact with a human: a non-learner strategy, in which the robot assumes that the human's reward function is the same as the robot's, a non-adaptive learner strategy that learns the human's reward function for performance estimation, but still optimizes its own reward function, and an adaptive-learner strategy that learns the human's reward function for performance estimation and also optimizes this learned reward function. Results show that adapting to the human's reward function results in the highest trust in the robot.

Submitted 10 September, 2023; originally announced September 2023.

Comments: 6 pages, 6 figures, AAAI Fall Symposium on Agent Teaming in Mixed-Motive Situations

arXiv:2309.01050 [pdf, other]

Efficient Curriculum based Continual Learning with Informative Subset Selection for Remote Sensing Scene Classification

Authors: S Divakar Bhat, Biplab Banerjee, Subhasis Chaudhuri, Avik Bhattacharya

Abstract: We tackle the problem of class incremental learning (CIL) in the realm of landcover classification from optical remote sensing (RS) images in this paper. The paradigm of CIL has recently gained much prominence given the fact that data are generally obtained in a sequential manner for real-world phenomenon. However, CIL has not been extensively considered yet in the domain of RS irrespective of the fact that the satellites tend to discover new classes at different geographical locations temporally. With this motivation, we propose a novel CIL framework inspired by the recent success of replay-memory based approaches and tackling two of their shortcomings. In order to reduce the effect of catastrophic forgetting of the old classes when a new stream arrives, we learn a curriculum of the new classes based on their similarity with the old classes. This is found to limit the degree of forgetting substantially. Next while constructing the replay memory, instead of randomly selecting samples from the old streams, we propose a sample selection strategy which ensures the selection of highly confident samples so as to reduce the effects of noise. We observe a sharp improvement in the CIL performance with the proposed components. Experimental results on the benchmark NWPU-RESISC45, PatternNet, and EuroSAT datasets confirm that our method offers improved stability-plasticity trade-off than the literature.

Submitted 2 September, 2023; originally announced September 2023.

arXiv:2307.16562 [pdf, other]

SAKSHI: Decentralized AI Platforms

Authors: Suma Bhat, Canhui Chen, Zerui Cheng, Zhixuan Fang, Ashwin Hebbar, Sreeram Kannan, Ranvir Rana, Peiyao Sheng, Himanshu Tyagi, Pramod Viswanath, Xuechao Wang

Abstract: Large AI models (e.g., Dall-E, GPT4) have electrified the scientific, technological and societal landscape through their superhuman capabilities. These services are offered largely in a traditional web2.0 format (e.g., OpenAI's GPT4 service). As more large AI models proliferate (personalizing and specializing to a variety of domains), there is a tremendous need to have a neutral trust-free platform that allows the hosting of AI models, clients receiving AI services efficiently, yet in a trust-free, incentive compatible, Byzantine behavior resistant manner. In this paper we propose SAKSHI, a trust-free decentralized platform specifically suited for AI services. The key design principles of SAKSHI are the separation of the data path (where AI query and service is managed) and the control path (where routers and compute and storage hosts are managed) from the transaction path (where the metering and billing of services are managed over a blockchain). This separation is enabled by a "proof of inference" layer which provides cryptographic resistance against a variety of misbehaviors, including poor AI service, nonpayment for service, copying of AI models. This is joint work between multiple universities (Princeton University, University of Illinois at Urbana-Champaign, Tsinghua University, HKUST) and two startup companies (Witness Chain and Eigen Layer).

Submitted 31 July, 2023; originally announced July 2023.

Comments: 23 pages, 9 figures

arXiv:2307.03926 [pdf, ps, other]

Enhancing Room Security and Automating Class Attendance Using ID Cards

Authors: Shravan Bhat, Nithin R, Pranav S

Abstract: With the rapid advancements in technology, automation has emerged as the future of human endeavors. From simple tasks like attendance management to complex security systems, automation has the potential to revolutionize various aspects of our lives. This research paper explores the implementation of a method aimed at enhancing room security in hostels and automating class attendance using ID cards. In this study, we propose a system that utilizes the unique identity information stored in ID cards for various security and check-in tasks. By integrating RFID (Radio-Frequency Identification) reader technology, GSM modules, Node MCU, and Arduino, we create a comprehensive solution. The RFID reader scans the ID card, extracting the relevant information and verifying the user's identity. The data is then transmitted via the GSM module to a central database, ensuring real-time monitoring and security measures. Moreover, the system also enables the automation of class attendance. By utilizing the same ID cards, students can simply tap their cards on a reader placed in the classroom. This information is recorded automatically, eliminating the need for manual attendance taking and reducing errors and time consumption. This research project highlights the practical implementation of ID card technology to enhance room security in hostels and automate class attendance processes. By leveraging the power of automation, we aim to streamline administrative tasks, improve security measures, and optimize efficiency in educational institutions and other relevant settings.

Submitted 8 July, 2023; originally announced July 2023.

Comments: 7 pages, 5 figures

MSC Class: none; ACM Class: J.7

arXiv:2306.08701 [pdf, other]

Transpiling RTL Pseudo-code of the POWER Instruction Set Architecture to C for Real-time Performance Analysis on Cavatools Simulator

Authors: Kinar S, Prashanth K V, Adithya Hegde, Aditya Subrahmanya Bhat, Narender M

Abstract: This paper presents a transpiler framework for converting RTL pseudo code of the POWER Instruction Set Architecture (ISA) to C code, enabling its execution on the Cavatools simulator. The transpiler consists of a lexer and parser, which parse the RTL pseudo code and generate corresponding C code representations. The lexer tokenizes the input code, while the parser applies grammar rules to build an abstract syntax tree (AST). The transpiler ensures compatibility with the Cavatools simulator by generating C code that adheres to its requirements. The resulting C code can be executed on the Cavatools simulator, allowing developers to analyze the instruction-level performance of the Power ISA in real time. The proposed framework facilitates the seamless integration of RTL pseudo code into the Cavatools ecosystem, enabling comprehensive performance analysis and optimization of Power ISA-based code.

Submitted 14 June, 2023; originally announced June 2023.

ACM Class: B.5.2

arXiv:2305.13675 [pdf, other]

Polyglot or Not? Measuring Multilingual Encyclopedic Knowledge in Foundation Models

Authors: Tim Schott, Daniel Furman, Shreshta Bhat

Abstract: In this work, we assess the ability of foundation models to recall encyclopedic knowledge across a wide range of linguistic contexts. To support this, we: 1) produce a 20-language dataset that contains 303k factual associations paired with counterfactuals, 2) evaluate 5 models in a multilingual test, and 3) benchmark a diverse set of 24 models in an English-only test. Meta's LLaMA achieves the highest scores in both multilingual and English-only evaluations. Yet, an analysis of LLaMA's errors reveals significant limitations in its ability to recall facts in languages other than English, plus difficulties related to the location and gender of fact subjects. Overall, our findings suggest that today's foundation models are far from polyglots.

Submitted 5 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: EMNLP 2023 (Main)

arXiv:2303.12496 [pdf, ps, other]

On the Bit Error Performance of OTFS Modulation using Discrete Zak Transform

Authors: Vineetha Yogesh, Vighnesh S Bhat, Sandesh Rao Mattu, A. Chockalingam

Abstract: In orthogonal time frequency space (OTFS) modulation, Zak transform approach is a natural approach for converting information symbols multiplexed in the DD domain directly to time domain for transmission, and vice versa at the receiver. Past research on OTFS has primarily considered a two-step approach where DD domain symbols are first converted to time-frequency domain which are then converted to time domain for transmission, and vice versa at the receiver. The Zak transform approach can offer performance and complexity benefits compared to the two-step approach. This paper presents an early investigation on the bit error performance of OTFS realized using discrete Zak transform (DZT). We develop a compact DD domain input-output relation for DZT-OTFS using matrix decomposition that is valid for both integer and fractional delay-Dopplers. We analyze the bit error performance of DZT-OTFS using pairwise error probability analysis and simulations. Simulation results show that 1) both DZT-OTFS and two-step OTFS perform better than OFDM, and 2) DZT-OTFS achieves better performance compared to two-step OTFS over a wide range of Doppler spreads.

Submitted 22 March, 2023; originally announced March 2023.

Comments: ICC'2023. Copyright IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2303.03462 [pdf, other]

Towards Composable Distributions of Latent Space Augmentations

Authors: Omead Pooladzandi, Jeffrey Jiang, Sunay Bhat, Gregory Pottie

Abstract: We propose a composable framework for latent space image augmentation that allows for easy combination of multiple augmentations. Image augmentation has been shown to be an effective technique for improving the performance of a wide variety of image classification and generation tasks. Our framework is based on the Variational Autoencoder architecture and uses a novel approach for augmentation via linear transformation within the latent space itself. We explore losses and augmentation latent geometry to enforce the transformations to be composable and involuntary, thus allowing the transformations to be readily combined or inverted. Finally, we show these properties are better performing with certain pairs of augmentations, but we can transfer the latent space to other sets of augmentations to modify performance, effectively constraining the VAE's bottleneck to preserve the variance of specific augmentations and features of the image which we care about. We demonstrate the effectiveness of our approach with initial results on the MNIST dataset against both a standard VAE and a Conditional VAE. This latent augmentation method allows for much greater control and geometric interpretability of the latent space, making it a valuable tool for researchers and practitioners in the field.

Submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted at 2023 Information Theory and Applications Workshop (Feb, San Diego)

arXiv:2302.12288 [pdf, other]

ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

Authors: Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller

Abstract: This paper tackles the problem of depth estimation from a single image. Existing work either focuses on generalization performance disregarding metric scale, i.e. relative depth estimation, or state-of-the-art results on specific datasets, i.e. metric depth estimation. We propose the first approach that combines both worlds, leading to a model with excellent generalization performance while maintaining metric scale. Our flagship model, ZoeD-M12-NK, is pre-trained on 12 datasets using relative depth and fine-tuned on two datasets using metric depth. We use a lightweight head with a novel bin adjustment design called metric bins module for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier. Our framework admits multiple configurations depending on the datasets used for relative depth pre-training and metric fine-tuning. Without pre-training, we can already significantly improve the state of the art (SOTA) on the NYU Depth v2 indoor dataset. Pre-training on twelve datasets and fine-tuning on the NYU Depth v2 indoor dataset, we can further improve SOTA for a total of 21% in terms of relative absolute error (REL). Finally, ZoeD-M12-NK is the first model that can jointly train on multiple datasets (NYU Depth v2 and KITTI) without a significant drop in performance and achieve unprecedented zero-shot generalization performance to eight unseen datasets from both indoor and outdoor domains. The code and pre-trained models are publicly available at https://github.com/isl-org/ZoeDepth .

Submitted 23 February, 2023; originally announced February 2023.
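
Note: The abstract above reports gains in relative absolute error (REL). As a rough illustration only, the sketch below assumes the commonly used definition, the mean of |predicted - true| / true over pixels with valid ground truth. The function name `abs_rel` and the "ground truth > 0" validity rule are assumptions made for this example and are not taken from the paper or its code.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error between predicted and true depths.

    Assumed definition: mean(|pred - gt| / gt) over entries where gt > 0.
    `pred` and `gt` are equal-length sequences of depths in the same unit.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Keep only entries with usable ground truth (assumption: gt > 0 is valid).
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # Two valid pixels; the third is skipped because its ground truth is 0.
    print(abs_rel([2.0, 4.0, 3.0], [1.0, 5.0, 0.0]))
```

Lower values are better under this definition, so an "improvement" in REL means a reduction of the metric.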

arXiv:2302.09182 [pdf, other]

Safe Networked Robotics with Probabilistic Verification

Authors: Sai Shankar Narasimhan, Sharachchandra Bhat, Sandeep P. Chinchali

Abstract: Autonomous robots must utilize rich sensory data to make safe control decisions. To process this data, compute-constrained robots often require assistance from remote computation, or the cloud, that runs compute-intensive deep neural network perception or control models. However, this assistance comes at the cost of a time delay due to network latency, resulting in past observations being used in the cloud to compute the control commands for the present robot state. Such communication delays could potentially lead to the violation of essential safety properties, such as collision avoidance. This paper develops methods to ensure the safety of robots operated over communication networks with stochastic latency. To do so, we use tools from formal verification to construct a shield, i.e., a run-time monitor, that provides a list of safe actions for any delayed sensory observation, given the expected and maximum network latency. Our shield is minimally intrusive and enables networked robots to satisfy key safety constraints, expressed as temporal logic specifications, with desired probability. We demonstrate our approach on a real F1/10th autonomous vehicle that navigates in indoor environments and transmits rich LiDAR sensory data over congested WiFi links.

Submitted 12 July, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

arXiv:2212.05378 [pdf, other]

Neural Continuous-Time Markov Models

Authors: Majerle Reeves, Harish S. Bhat

Abstract: Continuous-time Markov chains are used to model stochastic systems where transitions can occur at irregular times, e.g., birth-death processes, chemical reaction networks, population dynamics, and gene regulatory networks. We develop a method to learn a continuous-time Markov chain's transition rate functions from fully observed time series. In contrast with existing methods, our method allows for transition rates to depend nonlinearly on both state variables and external covariates. The Gillespie algorithm is used to generate trajectories of stochastic systems where propensity functions (reaction rates) are known. Our method can be viewed as the inverse: given trajectories of a stochastic reaction network, we generate estimates of the propensity functions. While previous methods used linear or log-linear methods to link transition rates to covariates, we use neural networks, increasing the capacity and potential accuracy of learned models. In the chemical context, this enables the method to learn propensity functions from non-mass-action kinetics. We test our method with synthetic data generated from a variety of systems with known transition rates. We show that our method learns these transition rates with considerably more accuracy than log-linear methods, in terms of mean absolute error between ground truth and predicted transition rates. We also demonstrate an application of our methods to open-loop control of a continuous-time Markov chain.

Submitted 10 December, 2022; originally announced December 2022.

Comments: 8 pages, 6 figures
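
Note: The abstract above refers to the Gillespie algorithm, which simulates a trajectory when the propensity functions are known. The following is a minimal sketch of that forward direction for a birth-death process, one of the example systems the abstract names. The constant birth propensity, the death propensity proportional to the population, and all function and parameter names are assumptions chosen for illustration; this is not the authors' code and does not implement their learning method.

```python
import random


def gillespie_birth_death(n0, birth_rate, death_rate, t_max, seed=0):
    """Simulate one birth-death trajectory with the Gillespie algorithm.

    Assumed propensities (illustrative only):
      birth: birth_rate        (constant)
      death: death_rate * n    (proportional to the current population n)
    Returns (times, states) for all jumps with time <= t_max.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, states = [t], [n]
    while True:
        a_birth = birth_rate
        a_death = death_rate * n
        a_total = a_birth + a_death
        if a_total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_max:
            break
        # Choose which event fires, with probability proportional to propensity.
        if rng.random() * a_total < a_birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        states.append(n)
    return times, states


if __name__ == "__main__":
    times, states = gillespie_birth_death(5, 2.0, 0.5, 10.0, seed=1)
    print(len(times), states[-1])
```

The inverse problem described in the abstract would start from trajectories such as `(times, states)` and estimate the propensities that produced them.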

arXiv:2212.03317 [pdf, other]

Drift Identification for Lévy alpha-Stable Stochastic Systems

Authors: Harish S. Bhat

Abstract: This paper focuses on a stochastic system identification problem: given time series observations of a stochastic differential equation (SDE) driven by Lévy $α$-stable noise, estimate the SDE's drift field. For $α$ in the interval $[1,2)$, the noise is heavy-tailed, leading to computational difficulties for methods that compute transition densities and/or likelihoods in physical space. We propose a Fourier space approach that centers on computing time-dependent characteristic functions, i.e., Fourier transforms of time-dependent densities. Parameterizing the unknown drift field using Fourier series, we formulate a loss consisting of the squared error between predicted and empirical characteristic functions. We minimize this loss with gradients computed via the adjoint method. For a variety of one- and two-dimensional problems, we demonstrate that this method is capable of learning drift fields in qualitative and/or quantitative agreement with ground truth fields.

Submitted 6 December, 2022; originally announced December 2022.

Comments: 22 pages, 6 figures

arXiv:2211.15175 [pdf, ps, other]

Automating and Mechanizing Cutoff-based Verification of Distributed Protocols

Authors: Shreesha G. Bhat, Kartik Nagar

Abstract: Distributed protocols are generally parametric and can be executed on a system with any number of nodes, and hence proving their correctness becomes an infinite state verification problem. The most popular approach for verifying distributed protocols is to find an inductive invariant which is strong enough to prove the required safety property. However, finding inductive invariants is known to be notoriously hard, and is especially harder in the context of distributed protocols which are quite complex due to their asynchronous nature. In this work, we investigate an orthogonal cut-off based approach to verifying distributed protocols which sidesteps the problem of finding an inductive invariant, and instead reduces checking correctness to a finite state verification problem. The main idea is to find a finite, fixed protocol instance called the cutoff instance, such that if the cutoff instance is safe, then any protocol instance would also be safe. Previous cutoff based approaches have only been applied to a restricted class of protocols and specifications. We formalize the cutoff approach in the context of a general protocol modeling language (RML), and identify sufficient conditions which can be efficiently encoded in SMT to check whether a given protocol instance is a cutoff instance. Further, we propose a simple static analysis-based algorithm to automatically synthesize a cut-off instance. We have applied our approach successfully on a number of complex distributed protocols, providing the first known cut-off results for many of them.

Submitted 28 November, 2022; originally announced November 2022.

Comments: 27 pages

arXiv:2210.13989 [pdf, ps, other]

Input-Output Relation and Performance of RIS-Aided OTFS with Fractional Delay-Doppler

Authors: Vighnesh S Bhat, Gandhodi Harshavardhan, A. Chockalingam

Abstract: Reconfigurable intelligent surfaces (RIS) and orthogonal time-frequency space (OTFS) modulation have gained attention in recent wireless research. RIS technology aids communication by reflecting the incident electromagnetic waves towards the receiver, and OTFS modulation is effective in high-Doppler channels. This paper presents an early investigation of RIS-aided OTFS in high-Doppler channels. We derive the end-to-end delay-Doppler (DD) domain input-output relation of a RIS-aided OTFS system, considering rectangular pulses and fractional delay-Doppler values. We also consider a Zak receiver for RIS-aided OTFS that converts the received time-domain signal to DD domain in one step using Zak transform, and derive its end-to-end input-output relation. Our simulation results show that $i)$ RIS-aided OTFS performs better than OTFS without RIS, $ii)$ Zak receiver performs better than a two-step receiver, and $iii)$ RIS-aided OTFS achieves superior performance compared to RIS-aided OFDM.

Submitted 25 October, 2022; originally announced October 2022.

Comments: Comm Lett. Copyright IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2210.11275 [pdf, other]

Causal Structural Hypothesis Testing and Data Generation Models

Authors: Jeffrey Jiang, Omead Pooladzandi, Sunay Bhat, Gregory Pottie

Abstract: A vast amount of expert and domain knowledge is captured by causal structural priors, yet there has been little research on testing such priors for generalization and data synthesis purposes. We propose a novel model architecture, Causal Structural Hypothesis Testing, that can use nonparametric, structural causal knowledge and approximate a causal model's functional relationships using deep neural networks. We use these architectures for comparing structural priors, akin to hypothesis testing, using a deliberate (non-random) split of training and testing data. Extensive simulations demonstrate the effectiveness of out-of-distribution generalization error as a proxy for causal structural prior hypothesis testing and offers a statistical baseline for interpreting results. We show that the variational version of the architecture, Causal Structural Variational Hypothesis Testing can improve performance in low SNR regimes. Due to the simplicity and low parameter count of the models, practitioners can test and compare structural prior hypotheses on small dataset and use the priors with the best generalization capacity to synthesize much larger, causally-informed datasets. Finally, we validate our methods on a synthetic pendulum dataset, and show a use-case on a real-world trauma surgery ground-level falls dataset.

Submitted 4 November, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research

arXiv:2210.00313 [pdf, other]

CRISP: Curriculum based Sequential Neural Decoders for Polar Code Family

Authors: S Ashwin Hebbar, Viraj Nadkarni, Ashok Vardhan Makkuva, Suma Bhat, Sewoong Oh, Pramod Viswanath

Abstract: Polar codes are widely used state-of-the-art codes for reliable communication that have recently been included in the 5th generation wireless standards (5G). However, there remains room for the design of polar decoders that are both efficient and reliable in the short blocklength regime. Motivated by recent successes of data-driven channel decoders, we introduce a novel $\textbf{C}$ur$\textbf{RI}$culum based $\textbf{S}$equential neural decoder for $\textbf{P}$olar codes (CRISP). We design a principled curriculum, guided by information-theoretic insights, to train CRISP and show that it outperforms the successive-cancellation (SC) decoder and attains near-optimal reliability performance on the Polar(32,16) and Polar(64,22) codes. The choice of the proposed curriculum is critical in achieving the accuracy gains of CRISP, as we show by comparing against other curricula. More notably, CRISP can be readily extended to Polarization-Adjusted-Convolutional (PAC) codes, where existing SC decoders are significantly less reliable. To the best of our knowledge, CRISP constructs the first data-driven decoder for PAC codes and attains near-optimal performance on the PAC(32,16) code.

Submitted 29 May, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

Comments: 23 pages, 23 figures. ICML 2023

arXiv:2209.04430 [pdf, other]

doi 10.1007/s12036-023-09920-4

Investigation of a Machine learning methodology for the SKA pulsar search pipeline

Authors: Shashank Sanjay Bhat, Thiagaraj Prabu, Ben Stappers, Atul Ghalame, Snehanshu Saha, T. S. B Sudarshan, Zafiirah Hosenie

Abstract: The SKA pulsar search pipeline will be used for real time detection of pulsars. Modern radio telescopes such as SKA will be generating petabytes of data in their full scale of operation. Hence experience-based and data-driven algorithms become indispensable for applications such as candidate detection. Here we describe our findings from testing a state of the art object detection algorithm called Mask R-CNN to detect candidate signatures in the SKA pulsar search pipeline. We have trained the Mask R-CNN model to detect candidate images. A custom annotation tool was developed to mark the regions of interest in large datasets efficiently. We have successfully demonstrated this algorithm by detecting candidate signatures on a simulation dataset. The paper presents details of this work with a highlight on the future prospects.

Submitted 17 January, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

Journal ref: Journal of Astronomy and Astrophysics SKA special issue 2022-23 (Under review)

arXiv:2209.02275 [pdf, ps, other]

Multi-class Classifier based Failure Prediction with Artificial and Anonymous Training for Data Privacy

Authors: Dibakar Das, Vikram Seshasai, Vineet Sudhir Bhat, Pushkal Juneja, Jyotsna Bapat, Debabrata Das

Abstract: This paper proposes a novel non-intrusive system failure prediction technique using available information from developers and minimal information from raw logs (rather than mining entire logs) but keeping the data entirely private with the data owners. A neural network based multi-class classifier is developed for failure prediction, using artificially generated anonymous data set, applying a combination of techniques, viz., genetic algorithm (steps), pattern repetition, etc., to train and test the network. The proposed mechanism completely decouples the data set used for training process from the actual data which is kept private. Moreover, multi-criteria decision making (MCDM) schemes are used to prioritize failures meeting business requirements. Results show high accuracy in failure prediction under different parameter configurations. On a broader context, any classification problem, beyond failure prediction, can be performed using the proposed mechanism with artificially generated data set without looking into the actual data as long as the input features can be translated to binary values (e.g. output from private binary classifiers) and can provide classification-as-a-service.

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2207.10511 [pdf, other]

A cost effective eye movement tracker based wheel chair control algorithm for people with paraplegia

Authors: Skanda Upadhyaya, Shravan Bhat, Siddhanth P. Rao, V Ashwin, Krishnan Chemmangat

Abstract: Spinal cord injuries can often lead to quadriplegia in patients limiting their mobility. Wheelchairs could be a good proposition for patients, but most of them operate either manually or with the help of electric motors operated with a joystick. This, however, requires the use of hands, making it unsuitable for quadriplegic patients. Controlling eye movement, on the other hand, is retained even by people who undergo brain injury. Monitoring the movements in the eye can be a helpful tool in generating control signals for the wheelchair. This paper is an approach to converting obtained signals from the eye into meaningful signals by trying to control a bot that imitates a wheelchair. The overall system is cost-effective and uses simple image processing and pattern recognition to control the bot. An android application is developed, which could be used by the patients' aid for more refined control of the wheelchair in the actual scenario.

Submitted 21 July, 2022; originally announced July 2022.

Comments: 5 pages, 6 figures

ACM Class: I.4.8; I.2.9

arXiv:2207.03679 [pdf, other]

Getting BART to Ride the Idiomatic Train: Learning to Represent Idiomatic Expressions

Authors: Ziheng Zeng, Suma Bhat

Abstract: Idiomatic expressions (IEs), characterized by their non-compositionality, are an important part of natural language. They have been a classical challenge to NLP, including pre-trained language models that drive today's state-of-the-art. Prior work has identified deficiencies in their contextualized representation stemming from the underlying compositional paradigm of representation. In this work, we take a first-principles approach to build idiomaticity into BART using an adapter as a lightweight non-compositional language expert trained on idiomatic sentences. The improved capability over baselines (e.g., BART) is seen via intrinsic and extrinsic methods, where idiom embeddings score 0.19 points higher in homogeneity score for embedding clustering, and up to 25% higher sequence accuracy on the idiom processing tasks of IE sense disambiguation and span detection.

Submitted 8 July, 2022; originally announced July 2022.

Comments: This paper is accepted by Transactions of the Association for Computational Linguistics (TACL)

arXiv:2207.01575 [pdf, other]

De-Biasing Generative Models using Counterfactual Methods

Authors: Sunay Bhat, Jeffrey Jiang, Omead Pooladzandi, Gregory Pottie

Abstract: Variational autoencoders (VAEs) and other generative methods have garnered growing interest not just for their generative properties but also for the ability to dis-entangle a low-dimensional latent variable space. However, few existing generative models take causality into account. We propose a new decoder based framework named the Causal Counterfactual Generative Model (CCGM), which includes a partially trainable causal layer in which a part of a causal model can be learned without significantly impacting reconstruction fidelity. By learning the causal relationships between image semantic labels or tabular variables, we can analyze biases, intervene on the generative model, and simulate new scenarios. Furthermore, by modifying the causal structure, we can generate samples outside the domain of the original training data and use such counterfactual models to de-bias datasets. Thus, datasets with known biases can still be used to train the causal generative model and learn the causal relationships, but we can produce de-biased datasets on the generative side. Our proposed method combines a causal latent space VAE model with specific modification to emphasize causal fidelity, enabling finer control over the causal layer and the ability to learn a robust intervention framework. We explore how better disentanglement of causal learning and encoding/decoding generates higher causal intervention quality. We also compare our model against similar research to demonstrate the need for explicit generative de-biasing beyond interventions. Our initial experiments show that our model can generate images and tabular data with high fidelity to the causal framework and accommodate explicit de-biasing to ignore undesired relationships in the causal data compared to the baseline.

Submitted 10 February, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Submitted to: Information Theory and Applications Workshop

arXiv:2206.01645 [pdf, other]

Clustering Trust Dynamics in a Human-Robot Sequential Decision-Making Task

Authors: Shreyas Bhat, Joseph B. Lyons, Cong Shi, X. Jessie Yang

Abstract: In this paper, we present a framework for trust-aware sequential decision-making in a human-robot team. We model the problem as a finite-horizon Markov Decision Process with a reward-based performance metric, allowing the robotic agent to make trust-aware recommendations. Results of a human-subject experiment show that the proposed trust update model is able to accurately capture the human agent's moment-to-moment trust changes. Moreover, we cluster the participants' trust dynamics into three categories, namely, Bayesian decision makers, oscillators, and disbelievers, and identify personal characteristics that could be used to predict which type of trust dynamics a person will belong to. We find that the disbelievers are less extroverted, less agreeable, and have lower expectations toward the robotic agent, compared to the Bayesian decision makers and oscillators. The oscillators are significantly more frustrated than the Bayesian decision makers.

Submitted 3 June, 2022; originally announced June 2022.

Comments: 4 pages, 4 figures

arXiv:2203.15132 [pdf, other]

LocalBins: Improving Depth Estimation by Learning Local Distributions

Authors: Shariq Farooq Bhat, Ibraheem Alhashim, Peter Wonka

Abstract: We propose a novel architecture for depth estimation from a single image. The architecture itself is based on the popular encoder-decoder architecture that is frequently used as a starting point for all dense regression tasks. We build on AdaBins which estimates a global distribution of depth values for the input image and evolve the architecture in two ways. First, instead of predicting global depth distributions, we predict depth distributions of local neighborhoods at every pixel. Second, instead of predicting depth distributions only towards the end of the decoder, we involve all layers of the decoder. We call this new architecture LocalBins. Our results demonstrate a clear improvement over the state-of-the-art in all metrics on the NYU-Depth V2 dataset. Code and pretrained models will be made publicly available.

Submitted 28 March, 2022; originally announced March 2022.

Comments: 19 pages

arXiv:2202.12578 [pdf, other]

Learning to Liquidate Forex: Optimal Stopping via Adaptive Top-K Regression

Authors: Diksha Garg, Pankaj Malhotra, Anil Bhatia, Sanjay Bhat, Lovekesh Vig, Gautam Shroff

Abstract: We consider learning a trading agent acting on behalf of the treasury of a firm earning revenue in a foreign currency (FC) and incurring expenses in the home currency (HC). The goal of the agent is to maximize the expected HC at the end of the trading episode by deciding to hold or sell the FC at each time step in the trading episode. We pose this as an optimization problem, and consider a broad spectrum of approaches with the learning component ranging from supervised to imitation to reinforcement learning. We observe that most of the approaches considered struggle to improve upon simple heuristic baselines. We identify two key aspects of the problem that render standard solutions ineffective - i) while good forecasts of future FX rates can be highly effective in guiding good decisions, forecasting FX rates is difficult, and erroneous estimates tend to degrade the performance of trading agents instead of improving it, ii) the inherent non-stationary nature of FX rates renders a fixed decision-threshold highly ineffective. To address these problems, we propose a novel supervised learning approach that learns to forecast the top-K future FX rates instead of forecasting all the future FX rates, and bases the hold-versus-sell decision on the forecasts (e.g. hold if future FX rate is higher than current FX rate, sell otherwise). Furthermore, to handle the non-stationarity in the FX rates data which poses challenges to the i.i.d. assumption in supervised learning methods, we propose to adaptively learn decision-thresholds based on recent historical episodes. Through extensive empirical evaluation, we show that our approach is the only approach which is able to consistently improve upon a simple heuristic baseline. Further experiments show the inefficacy of state-of-the-art statistical and deep-learning-based forecasting methods as they degrade the performance of the trading agent.

Submitted 25 February, 2022; originally announced February 2022.

Comments: Published at Workshop on AI in Financial Services: Adaptiveness, Resilience & Governance, AAAI-22

arXiv:2202.05517 [pdf, other]

Electricity Consumption Forecasting for Out-of-distribution Time-of-Use Tariffs

Authors: Jyoti Narwariya, Chetan Verma, Pankaj Malhotra, Lovekesh Vig, Easwara Subramanian, Sanjay Bhat

Abstract: In electricity markets, retailers or brokers want to maximize profits by allocating tariff profiles to end consumers. One of the objectives of such demand response management is to incentivize the consumers to adjust their consumption so that the overall electricity procurement in the wholesale markets is minimized, e.g. it is desirable that consumers consume less during peak hours when cost of procurement for brokers from wholesale markets are high. We consider a greedy solution to maximize the overall profit for brokers by optimal tariff profile allocation. This in-turn requires forecasting electricity consumption for each user for all tariff profiles. This forecasting problem is challenging compared to standard forecasting problems due to following reasons: i. the number of possible combinations of hourly tariffs is high and retailers may not have considered all combinations in the past resulting in a biased set of tariff profiles tried in the past, ii. the profiles allocated in the past to each user is typically based on certain policy. These reasons violate the standard i.i.d. assumptions, as there is a need to evaluate new tariff profiles on existing customers and historical data is biased by the policies used in the past for tariff allocation. In this work, we consider several scenarios for forecasting and optimization under these conditions. We leverage the underlying structure of how consumers respond to variable tariff rates by comparing tariffs across hours and shifting loads, and propose suitable inductive biases in the design of deep neural network based architectures for forecasting under such scenarios. More specifically, we leverage attention mechanisms and permutation equivariant networks that allow desirable processing of tariff profiles to learn tariff representations that are insensitive to the biases in the data and still representative of the task.

Submitted 11 February, 2022; originally announced February 2022.

Comments: Accepted paper at AAAI workshop AIBSD 2022

arXiv:2201.10127 [pdf, other]

Multi-unit Double Auctions: Equilibrium Analysis and Bidding Strategy using DDPG in Smart-grids

Authors: Sanjay Chandlekar, Easwar Subramanian, Sanjay Bhat, Praveen Paruchuri, Sujit Gujar

Abstract: Periodic double auctions (PDA) have applications in many areas such as e-commerce, intra-day equity markets, and day-ahead energy markets in smart-grids. While the trades accomplished using PDAs are worth trillions of dollars, finding a reliable bidding strategy in such auctions is still a challenge as it requires the consideration of future auctions. A participating buyer in a PDA has to design its bidding strategy by planning for current and future auctions. Many of the proposed equilibrium-based bidding strategies are complex to use in real time. In the current exposition, we propose a scale-based bidding strategy for buyers participating in PDA. We first present an equilibrium analysis for single-buyer single-seller multi-unit single-shot k-Double auctions. Specifically, we analyze the situation when a seller and a buyer trade two identical units of quantity in a double auction where both the buyer and the seller deploy a simple, scale-based bidding strategy. The equilibrium analysis becomes intractable as the number of participants increases. To be useful in more complex settings such as wholesale markets in smart-grids, we model the equilibrium bidding strategy as a learning problem. We develop a deep deterministic policy gradient (DDPG) based learning strategy, DDPGBBS, for a participating agent in PDAs to suggest an action at any auction instance. DDPGBBS, which empirically follows the obtained theoretical equilibrium, is easily extendable when the number of buyers/sellers increases.
We take Power Trading Agent Competition's (PowerTAC) wholesale market PDA as a testbed to evaluate our novel bidding strategy. We benchmark our DDPG-based strategy against several baselines and state-of-the-art bidding strategies of the PowerTAC wholesale market PDA and demonstrate the efficacy of DDPGBBS against several benchmarked strategies.

Submitted 22 February, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

Comments: Accepted for publication in the proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS-22)

arXiv:2201.07272 [pdf, other]

Lambda the Ultimate SSA: Optimizing Functional Programs in SSA

Authors: Siddharth Bhat, Tobias Grosser

Abstract: Static Single Assignment (SSA) is the workhorse of modern optimizing compilers for imperative programming languages. However, functional languages have been slow to adopt SSA and prefer to use intermediate representations based on minimal lambda calculi due to SSA's inability to express higher order constructs. We exploit a new SSA construct -- regions -- in order to express functional optimizations via classical SSA based reasoning. Region optimization currently relies on ad-hoc analyses and transformations on imperative programs. These ad-hoc transformations are sufficient for imperative languages as regions are used in a limited fashion. In contrast, we use regions pervasively to model sub-expressions in our functional IR. This motivates us to systematize region optimizations. We extend classical SSA reasoning to regions for functional-style analyses and transformations. We implement a new SSA+regions based backend for LEAN4, a theorem prover that implements a purely functional, dependently typed programming language. Our backend is feature-complete and handles all constructs of LEAN4's functional intermediate representation λrc within the SSA framework. We evaluate our proposed region optimizations by optimizing λrc within an SSA+regions based framework implemented in MLIR and demonstrating performance parity with the current LEAN4 backend. We believe our work will pave the way for a unified optimization framework capable of representing, analyzing, and optimizing both functional and imperative languages.

Submitted 18 January, 2022; originally announced January 2022.

ACM Class: D.3

Showing 1–50 of 120 results for author: Bhat, S