Showing 1–50 of 643 results for author: Singh, K

  1. arXiv:2407.13522

    cs.LG

    INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages

    Authors: Abhishek Kumar Singh, Rudra Murthy, Vishwajeet Kumar, Jaydeep Sen, Ganesh Ramakrishnan

    Abstract: Large Language Models (LLMs) have demonstrated remarkable zero-shot and few-shot capabilities in unseen tasks, including context-grounded question answering (QA) in English. However, the evaluation of LLMs' capabilities in non-English languages for context-based QA is limited by the scarcity of benchmarks in non-English languages. To address this gap, we introduce Indic-QA, the largest publicly av…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.11306

    cs.CV

    PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer

    Authors: Pierre-David Letourneau, Manish Kumar Singh, Hsin-Pai Cheng, Shizhong Han, Yunxiao Shi, Dalton Jones, Matthew Harper Langston, Hong Cai, Fatih Porikli

    Abstract: We present Polynomial Attention Drop-in Replacement (PADRe), a novel and unifying framework designed to replace the conventional self-attention mechanism in transformer models. Notably, several recent alternative attention mechanisms, including Hyena, Mamba, SimA, Conv2Former, and Castling-ViT, can be viewed as specific instances of our PADRe framework. PADRe leverages polynomial functions and dra…

    Submitted 15 July, 2024; originally announced July 2024.

  3. arXiv:2407.09481

    cs.CY cs.HC

    ChatGPT and Vaccine Hesitancy: A Comparison of English, Spanish, and French Responses Using a Validated Scale

    Authors: Saubhagya Joshi, Eunbin Ha, Yonaira Rivera, Vivek K. Singh

    Abstract: ChatGPT is a popular information system (over 1 billion visits in August 2023) that can generate natural language responses to user queries. It is important to study the quality and equity of its responses on health-related topics, such as vaccination, as they may influence public health decision-making. We use the Vaccine Hesitancy Scale (VHS) proposed by Shapiro et al. to measure the hesitancy…

    Submitted 6 May, 2024; originally announced July 2024.

    Comments: 11 pages. Appeared in the Proceedings of the AMIA Informatics Summit, 2024

  4. arXiv:2407.08328

    cs.AI cs.LG

    Unveiling Disparities in Maternity Care: A Topic Modelling Approach to Analysing Maternity Incident Investigation Reports

    Authors: Georgina Cosma, Mohit Kumar Singh, Patrick Waterson, Gyuchan Thomas Jun, Jonathan Back

    Abstract: This study applies Natural Language Processing techniques, including Latent Dirichlet Allocation, to analyse anonymised maternity incident investigation reports from the Healthcare Safety Investigation Branch. The reports underwent preprocessing, annotation using the Safety Intelligence Research taxonomy, and topic modelling to uncover prevalent topics and detect differences in maternity care acro…

    Submitted 11 July, 2024; originally announced July 2024.

  5. arXiv:2407.08322

    cs.AI

    Intelligent Multi-Document Summarisation for Extracting Insights on Racial Inequalities from Maternity Incident Investigation Reports

    Authors: Georgina Cosma, Mohit Kumar Singh, Patrick Waterson, Gyuchan Thomas Jun, Jonathan Back

    Abstract: In healthcare, thousands of safety incidents occur every year, but learning from these incidents is not effectively aggregated. Analysing incident reports using AI could uncover critical insights to prevent harm by identifying recurring patterns and contributing factors. To aggregate and extract valuable information, natural language processing (NLP) and machine learning techniques can be employed…

    Submitted 11 July, 2024; originally announced July 2024.

  6. arXiv:2407.04398

    cs.NI

    CBL: Compact Encoding of JSON-LD Data using CBOR and Bitmaps for Web of Things

    Authors: Prudhvi Gudla, Kamal Singh

    Abstract: The concept of Web of Things (WoT) merges web technologies with knowledge graphs in the context of Internet of Things. Given its widespread adoption in representing and exchanging structured data online, JSON-LD could be an effective format for WoT. Nevertheless, its verbose nature may present challenges for resource-constrained IoT devices with limited bandwidth and memory capacities. In this p…

    Submitted 5 July, 2024; originally announced July 2024.

  7. arXiv:2407.03454

    cs.NE math.OC

    Decomposition of Difficulties in Complex Optimization Problems Using a Bilevel Approach

    Authors: Ankur Sinha, Dhaval Pujara, Hemant Kumar Singh

    Abstract: Practical optimization problems may contain different kinds of difficulties that are often not tractable if one relies on a particular optimization method. Different optimization approaches offer different strengths that are good at tackling one or more difficulties in an optimization problem. For instance, evolutionary algorithms have a niche in handling complexities like discontinuity, non-differe…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 9 pages

    MSC Class: 90C30 ACM Class: G.0

  8. arXiv:2407.02968

    cs.CV cs.AI cs.CC cs.ET

    Unified Anomaly Detection methods on Edge Device using Knowledge Distillation and Quantization

    Authors: Sushovan Jena, Arya Pulkit, Kajal Singh, Anoushka Banerjee, Sharad Joshi, Ananth Ganesh, Dinesh Singh, Arnav Bhavsar

    Abstract: With the rapid advances in deep learning and smart manufacturing in Industry 4.0, there is an imperative for high-throughput, high-performance, and fully integrated visual inspection systems. Most anomaly detection approaches using defect detection datasets, such as MVTec AD, employ one-class models that require fitting separate models for each class. On the contrary, unified models eliminate the…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 20 pages

    MSC Class: 68T07 ACM Class: I.2.10

  9. arXiv:2407.00434

    cs.CL

    Brevity is the soul of wit: Pruning long files for code generation

    Authors: Aaditya K. Singh, Yu Yang, Kushal Tirumala, Mostafa Elhoushi, Ari S. Morcos

    Abstract: Data curation is commonly considered a "secret-sauce" for LLM training, with higher quality data usually leading to better LLM performance. Given the scale of internet-scraped corpora, data pruning has become a larger and larger focus. Specifically, many have shown that de-duplicating data, or sub-selecting higher quality data, can lead to efficiency or performance improvements. Generally, three t…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 15 pages, 5 figures

  10. arXiv:2406.17720

    cs.CV

    Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity

    Authors: Chih-Hsuan Yang, Benjamin Feuer, Zaki Jubery, Zi K. Deng, Andre Nakkab, Md Zahid Hasan, Shivani Chiranjeevi, Kelly Marshall, Nirmal Baishnab, Asheesh K Singh, Arti Singh, Soumik Sarkar, Nirav Merchant, Chinmay Hegde, Baskar Ganapathysubramanian

    Abstract: We introduce Arboretum, the largest publicly accessible dataset designed to advance AI for biodiversity applications. This dataset, curated from the iNaturalist community science platform and vetted by domain experts to ensure accuracy, includes 134.6 million images, surpassing existing datasets in scale by an order of magnitude. The dataset encompasses image-language paired data for a diverse set…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Preprint under review

  11. arXiv:2406.17339

    cs.IT eess.SP

    Optimizing Configuration Selection in Reconfigurable-Antenna MIMO Systems: Physics-Inspired Heuristic Solvers

    Authors: I. Krikidis, C. Psomas, A. K. Singh, K. Jamieson

    Abstract: Reconfigurable antenna multiple-input multiple-output (MIMO) is a foundational technology for the continuing evolution of cellular systems, including upcoming 6G communication systems. In this paper, we address the problem of flexible/reconfigurable antenna configuration selection for point-to-point MIMO antenna systems by using physics-inspired heuristics. Firstly, we optimize the antenna configu…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.12571

    Journal ref: IEEE Transactions on Communications, 2004

  12. arXiv:2406.16176

    cs.AI cs.CL cs.LG

    GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets

    Authors: Qiming Wu, Zichen Chen, Will Corcoran, Misha Sra, Ambuj K. Singh

    Abstract: Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs' ability to reason about graph-structured data. To address this gap, we introduce GraphEval2000, the first comprehensive graph dataset, comprising 40 graph da…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Submitted to NeurIPs 2024 Dataset and Benchmark track, under review

    MSC Class: H.2.8; I.2.6; I.2.7

  13. arXiv:2406.15586

    cs.CL

    TinyStyler: Efficient Few-Shot Text Style Transfer with Authorship Embeddings

    Authors: Zachary Horvitz, Ajay Patel, Kanishk Singh, Chris Callison-Burch, Kathleen McKeown, Zhou Yu

    Abstract: The goal of text style transfer is to transform the style of texts while preserving their original meaning, often with only a few examples of the target style. Existing style transfer methods generally rely on the few-shot capabilities of large language models or on complex controllable text generation approaches that are inefficient and underperform on fluency metrics. We introduce TinyStyler, a…

    Submitted 21 June, 2024; originally announced June 2024.

  14. arXiv:2406.14639

    cs.RO

    Differentiable-Optimization Based Neural Policy for Occlusion-Aware Target Tracking

    Authors: Houman Masnavi, Arun Kumar Singh, Farrokh Janabi-Sharifi

    Abstract: Tracking a target in cluttered and dynamic environments is challenging but forms a core component in applications like aerial cinematography. The obstacles in the environment not only pose collision risk but can also occlude the target from the field-of-view of the robot. Moreover, the target future trajectory may be unknown and only its current state can be estimated. In this paper, we propose a…

    Submitted 20 June, 2024; originally announced June 2024.

  15. arXiv:2406.13081

    cs.CV

    Class-specific Data Augmentation for Plant Stress Classification

    Authors: Nasla Saleem, Aditya Balu, Talukder Zaki Jubery, Arti Singh, Asheesh K. Singh, Soumik Sarkar, Baskar Ganapathysubramanian

    Abstract: Data augmentation is a powerful tool for improving deep learning-based image classifiers for plant stress identification and classification. However, selecting an effective set of augmentations from a large pool of candidates remains a key challenge, particularly in imbalanced and confounding datasets. We propose an approach for automated class-specific data augmentation using a genetic algorithm.…

    Submitted 18 June, 2024; originally announced June 2024.

  16. arXiv:2406.10229

    cs.LG cs.AI

    Quantifying Variance in Evaluation Benchmarks

    Authors: Lovish Madaan, Aaditya K. Singh, Rylan Schaeffer, Andrew Poulton, Sanmi Koyejo, Pontus Stenetorp, Sharan Narang, Dieuwke Hupkes

    Abstract: Evaluation benchmarks are the cornerstone of measuring capabilities of large language models (LLMs), as well as driving progress in said capabilities. Originally designed to make claims about capabilities (or lack thereof) in fully pretrained models, evaluation benchmarks are now also extensively used to decide between various training choices. Despite this widespread usage, we rarely quantify the…

    Submitted 14 June, 2024; originally announced June 2024.

  17. arXiv:2406.08913

    math.CO cs.CG math.MG

    Maximizing the Maximum Degree in Ordered Yao Graphs

    Authors: Péter Ágoston, Adrian Dumitrescu, Arsenii Sagdeev, Karamjeet Singh, Ji Zeng

    Abstract: For an ordered point set in a Euclidean space or, more generally, in an abstract metric space, the ordered Yao graph is obtained by connecting each of the points to its closest predecessor by a directed edge. We show that for every set of $n$ points in $\mathbb{R}^d$, there exists an order such that the corresponding ordered Yao graph has maximum degree at least $\log{n}/(4d)$. Apart from the…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 9 pages, 1 figure

    MSC Class: 05C07; 05D10; 52C10

  18. arXiv:2406.08816

    cs.CV

    ToSA: Token Selective Attention for Efficient Vision Transformers

    Authors: Manish Kumar Singh, Rajeev Yasarla, Hong Cai, Mingu Lee, Fatih Porikli

    Abstract: In this paper, we propose a novel token selective attention approach, ToSA, which can identify tokens that need to be attended as well as those that can skip a transformer layer. More specifically, a token selector parses the current attention maps and predicts the attention maps for the next layer, which are then used to select the important tokens that should participate in the attention operati…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted at CVPRW 2024

  19. arXiv:2406.05505

    cs.IR cs.AI

    I-SIRch: AI-Powered Concept Annotation Tool For Equitable Extraction And Analysis Of Safety Insights From Maternity Investigations

    Authors: Mohit Kumar Singh, Georgina Cosma, Patrick Waterson, Jonathan Back, Gyuchan Thomas Jun

    Abstract: Maternity care is a complex system involving treatments and interactions between patients, providers, and the care environment. To improve patient safety and outcomes, understanding the human factors (e.g. individuals' decisions, local facilities) influencing healthcare delivery is crucial. However, most current tools for analysing healthcare data focus only on biomedical concepts (e.g. health cond…

    Submitted 8 June, 2024; originally announced June 2024.

  20. arXiv:2406.03822

    cs.SD cs.CR eess.AS

    SilentCipher: Deep Audio Watermarking

    Authors: Mayank Kumar Singh, Naoya Takahashi, Weihsiang Liao, Yuki Mitsufuji

    Abstract: In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and robustness over traditional methods, the encoded messages introduce audible artefacts that restrict their usage in professional settings. In this study…

    Submitted 6 June, 2024; originally announced June 2024.

  21. arXiv:2405.20469

    cs.CV

    Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images

    Authors: Krishnakant Singh, Thanush Navaratnam, Jannik Holmer, Simone Schaub-Meyer, Stefan Roth

    Abstract: A long-standing challenge in developing machine learning approaches has been the lack of high-quality labeled data. Recently, models trained with purely synthetic data, here termed synthetic clones, generated using large-scale pre-trained diffusion models have shown promising results in overcoming this annotation bottleneck. As these synthetic clone models progress, they are likely to be deployed…

    Submitted 30 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPR 2024 Workshop: SyntaGen-Harnessing Generative Models for Synthetic Visual Datasets. Project page at https://synbenchmark.github.io/SynCloneBenchmark Comments: Fix typo in Fig. 1

  22. arXiv:2405.15766

    cs.AI cs.CL cs.CV

    Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development

    Authors: Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Aman Chadha, Samrat Mondal

    Abstract: The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety by identifying potential risks associated with medications, facilitating early detection of adverse events, and guiding regulatory decision-making. Traditional ADE detection methods are reliable but slow, not easily adaptable to large-scale operations, and offer limited information. With the exponent…

    Submitted 26 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: ACL Findings 2024

  23. arXiv:2405.13546

    cs.CL cs.IR

    Knowledge-Driven Cross-Document Relation Extraction

    Authors: Monika Jain, Raghava Mutharaju, Kuldeep Singh, Ramakanth Kavuluru

    Abstract: Relation extraction (RE) is a well-known NLP application often treated as a sentence- or document-level task. However, a handful of recent efforts explore it across documents or in the cross-document setting (CrossDocRE). This is distinct from the single document case because different documents often focus on disparate themes, while text within a document tends to have a single goal. Linking find…

    Submitted 18 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted in ACL 2024 Findings

  24. arXiv:2405.11487

    cs.CV

    "Previously on ..." From Recaps to Story Summarization

    Authors: Aditya Kumar Singh, Dhruv Srivastava, Makarand Tapaswi

    Abstract: We introduce multimodal story summarization by leveraging TV episode recaps - short video sequences interweaving key story moments from previous episodes to bring viewers up to speed. We propose PlotSnap, a dataset featuring two crime thriller TV shows with rich recaps and long episodes of 40 minutes. Story summarization labels are unlocked by matching recap shots to corresponding sub-stories in t…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: CVPR 2024; Project page: https://katha-ai.github.io/projects/recap-story-summ/

  25. arXiv:2405.11200

    cs.CL

    LexGen: Domain-aware Multilingual Lexicon Generation

    Authors: Karthika NJ, Ayush Maheshwari, Atul Kumar Singh, Preethi Jyothi, Ganesh Ramakrishnan, Krishnakant Bhatt

    Abstract: Lexicon or dictionary generation across domains is of significant societal importance, as it can potentially enhance information accessibility for a diverse user base while preserving language identity. Prior work in the field primarily focuses on bilingual lexical induction, which deals with word alignments using mapping-based or corpora-based approaches. Though initiated by researchers, the rese…

    Submitted 18 May, 2024; originally announced May 2024.

  26. arXiv:2405.10206

    cs.GT

    A Participatory Budgeting based Truthful Budget-Limited Incentive Mechanism for Time-Constrained Tasks in Crowdsensing Systems

    Authors: Chattu Bhargavi, Vikash Kumar Singh

    Abstract: Crowdsensing, also known as participatory sensing, is a method of data collection that involves gathering information from a large number of common people (or individuals), often using mobile devices or other personal technologies. This paper considers the set-up with multiple task requesters and several task executors in a strategic setting. Each task requester has multiple heterogeneous tasks an…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 17 pages, 25 figures

  27. arXiv:2405.03356

    cs.NI eess.SP

    An Overview of Intelligent Meta-surfaces for 6G and Beyond: Opportunities, Trends, and Challenges

    Authors: Mayur Katwe, Aryan Kaushik, Lina Mohjazi, Mohammad Abualhayja'a, Davide Dardari, Keshav Singh, Muhammad Ali Imran, M. Majid Butt, Octavia A. Dobre

    Abstract: With the impending arrival of the sixth generation (6G) of wireless communication technology, the telecommunications landscape is poised for another revolutionary transformation. At the forefront of this evolution are intelligent meta-surfaces (IS), emerging as a disruptive physical layer technology with the potential to redefine the capabilities and performance metrics of future wireless networks…

    Submitted 6 May, 2024; originally announced May 2024.

  28. arXiv:2405.00080

    cs.LG cs.IR cs.NI

    Recommenadation aided Caching using Combinatorial Multi-armed Bandits

    Authors: Pavamana K J, Chandramani Kishore Singh

    Abstract: We study content caching with recommendations in a wireless network where the users are connected through a base station equipped with a finite-capacity cache. We assume a fixed set of contents with unknown user preferences and content popularities. We can recommend a subset of the contents to the users which encourages the users to request these contents. Recommendation can thus be used to increa…

    Submitted 3 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

  29. arXiv:2404.18591

    cs.CV cs.AI

    FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion

    Authors: Abhishek Kumar Singh, Ioannis Patras

    Abstract: The rapid evolution of the fashion industry increasingly intersects with technological advancements, particularly through the integration of generative AI. This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models. Utilizing ControlNet and LoRA fine-tuning, our approach generates high-quality images from multimodal input…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 9 pages, 8 figures

  30. arXiv:2404.13252

    cs.CV cs.LG eess.IV

    3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification

    Authors: Shyam Varahagiri, Aryaman Sinha, Shiv Ram Dubey, Satish Kumar Singh

    Abstract: In recent years, Vision Transformers (ViTs) have shown promising classification performance over Convolutional Neural Networks (CNNs) due to their self-attention mechanism. Many researchers have incorporated ViTs for Hyperspectral Image (HSI) classification. HSIs are characterised by narrow contiguous spectral bands, providing rich spectral data. Although ViTs excel with sequential data, they cann…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted in IEEE Conference on Artificial Intelligence, 2024

  31. arXiv:2404.07129

    cs.LG

    What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation

    Authors: Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe

    Abstract: In-context learning is a powerful emergent ability in transformer models. Prior work in mechanistic interpretability has identified a circuit element that may be critical for in-context learning -- the induction head (IH), which performs a match-and-copy operation. During training of large transformers on natural language data, IHs emerge around the same time as a notable phase change in the loss.…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 26 pages, 18 figures

  32. TrajPRed: Trajectory Prediction with Region-based Relation Learning

    Authors: Chen Zhou, Ghassan AlRegib, Armin Parchami, Kunjan Singh

    Abstract: Forecasting human trajectories in traffic scenes is critical for safety within mixed or fully autonomous systems. Human future trajectories are driven by two major stimuli, social interactions, and stochastic goals. Thus, reliable forecasting needs to capture these two stimuli. Edge-based relation modeling represents social interactions using pairwise correlations from precise individual states. N…

    Submitted 10 April, 2024; originally announced April 2024.

  33. arXiv:2404.05631

    cs.ET

    Multi Digit Ising Mapping for Low Precision Ising Solvers

    Authors: Abhishek Kumar Singh, Kyle Jamieson

    Abstract: The last couple of years have seen an ever-increasing interest in using different Ising solvers, like Quantum annealers, Coherent Ising machines, and Oscillator-based Ising machines, for solving tough computational problems in various domains. Although the simulations predict massive performance improvements for several tough computational problems, the real implementations of the Ising solvers te…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: version 1.0

  34. arXiv:2404.04377

    cs.RO cs.CV

    LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping

    Authors: Kurran Singh, Tim Magoun, John J. Leonard

    Abstract: Enabling robots to understand the world in terms of objects is a critical building block towards higher level autonomy. The success of foundation models in vision has created the ability to segment and identify nearly all objects in the world. However, utilizing such objects to localize the robot and build an open-set semantic map of the world remains an open research question. In this work, a sys…

    Submitted 5 April, 2024; originally announced April 2024.

  35. arXiv:2404.03307

    cs.RO eess.SY

    Bi-level Trajectory Optimization on Uneven Terrains with Differentiable Wheel-Terrain Interaction Model

    Authors: Amith Manoharan, Aditya Sharma, Himani Belsare, Kaustab Pal, K. Madhava Krishna, Arun Kumar Singh

    Abstract: Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6dof variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-int…

    Submitted 11 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: 8 pages, 7 figures, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  36. arXiv:2403.20116

    cs.RO

    LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving

    Authors: Pranjal Paul, Anant Garg, Tushar Choudhary, Arun Kumar Singh, K. Madhava Krishna

    Abstract: Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a set of control actions as a reactive solution for closed-loop planning based on their rich scene comprehension. However, these estimations are coarse and are subjective to their "world understanding" which may generate sub-optimal decisions due to perception errors. In this paper, we introduce LeGo-Drive, wh…

    Submitted 29 March, 2024; originally announced March 2024.

  37. arXiv:2403.19461

    cs.RO

    Learning Sampling Distribution and Safety Filter for Autonomous Driving with VQ-VAE and Differentiable Optimization

    Authors: Simon Idoko, Basant Sharma, Arun Kumar Singh

    Abstract: Sampling trajectories from a distribution followed by ranking them based on a specified cost function is a common approach in autonomous driving. Typically, the sampling distribution is hand-crafted (e.g., a Gaussian, or a grid). Recently, there have been efforts towards learning the sampling distribution through generative models such as Conditional Variational Autoencoder (CVAE). However, these ap…

    Submitted 25 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  38. arXiv:2403.17223

    cs.CV cs.AI cs.LG

    Co-Occurring of Object Detection and Identification towards unlabeled object discovery

    Authors: Binay Kumar Singh, Niels Da Vitoria Lobo

    Abstract: In this paper, we propose a novel deep learning-based approach for identifying co-occurring objects in conjunction with base objects in multilabel object categories. Nowadays, with the advancement in computer vision-based techniques, we need to know about co-occurring objects with respect to a base object for various purposes. The pipeline of the proposed work is composed of two stages: in the first…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 6 pages, 2 figures

  39. arXiv:2403.16592

    cs.CL

    TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques

    Authors: Ashok Urlana, Aditya Saibewar, Bala Mallikarjunarao Garlapati, Charaka Vinayak Kumar, Ajeet Kumar Singh, Srinivasa Rao Chalamala

    Abstract: Large Language Models (LLMs) exhibit remarkable ability to generate fluent content across a wide spectrum of user queries. However, this capability has raised concerns regarding misinformation and personal information leakage. In this paper, we present our methods for the SemEval2024 Task8, aiming to detect machine-generated text across various domains in both mono-lingual and multi-lingual co…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 8 pages, 1 Figure

    ACM Class: I.2.7

  40. arXiv:2403.12953

    cs.CV

    FutureDepth: Learning to Predict the Future Improves Video Depth Estimation

    Authors: Rajeev Yasarla, Manish Kumar Singh, Hong Cai, Yunxiao Shi, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Risheek Garrepalli, Fatih Porikli

    Abstract: In this paper, we propose a novel video depth estimation approach, FutureDepth, which enables the model to implicitly leverage multi-frame and motion cues to improve depth estimation by making it learn to predict the future at training. More specifically, we propose a future prediction network, F-Net, which takes the features of multiple consecutive frames and is trained to predict multi-frame fea…

    Submitted 19 March, 2024; originally announced March 2024.

  41. arXiv:2403.12837

    cs.RO

    Opti-Acoustic Semantic SLAM with Unknown Objects in Underwater Environments

    Authors: Kurran Singh, Jungseok Hong, Nicholas R. Rypkema, John J. Leonard

    Abstract: Despite recent advances in semantic Simultaneous Localization and Mapping (SLAM) for terrestrial and aerial applications, underwater semantic SLAM remains an open and largely unaddressed research problem due to the unique sensing modalities and the object classes found underwater. This paper presents an object-based semantic SLAM method for underwater environments that can identify, localize, clas…

    Submitted 19 March, 2024; originally announced March 2024.

  42. arXiv:2403.12571

    cs.IT eess.SP

    Optimizing Reconfigurable Antenna MIMO Systems with Coherent Ising Machines

    Authors: Ioannis Krikidis, Abhishek Kumar Singh, Kyle Jamieson

    Abstract: Reconfigurable antenna multiple-input multiple-output (MIMO) is a promising technology for upcoming 6G communication systems. In this paper, we deal with the problem of configuration selection for reconfigurable antenna MIMO by leveraging Coherent Ising Machines (CIMs). By adopting the CIM as a heuristic solver for the Ising problem, the optimal antenna configuration that maximizes the received si…

    Submitted 19 March, 2024; originally announced March 2024.

    Journal ref: IEEE International Conference on Communications (ICC), June 2024

  43. A Hybrid Transformer-Sequencer approach for Age and Gender classification from in-wild facial images

    Authors: Aakash Singh, Vivek Kumar Singh

    Abstract: The advancements in computer vision and image processing techniques have led to emergence of new application in the domain of visual surveillance, targeted advertisement, content-based searching, and human-computer interaction etc. Out of the various techniques in computer vision, face analysis, in particular, has gained much attention. Several previous studies have tried to explore different appl…

    Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 22 pages

    Journal ref: Neural Computing and Applications. 2024 Jan;36(3):1149-65

  44. arXiv:2403.12202  [pdf, other]

    cs.CV

    DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions

    Authors: Yunxiao Shi, Manish Kumar Singh, Hong Cai, Fatih Porikli

    Abstract: In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  45. arXiv:2403.11228  [pdf]

    cs.NI

    Routing Algorithms

    Authors: Ujjwal Sinha, Vikas Kumar, Shubham Kumar Singh

    Abstract: Routing algorithms play a crucial role in the efficient transmission of data within computer networks by determining the optimal paths for packet forwarding. This paper presents a comprehensive exploration of routing algorithms, focusing on their fundamental principles, classification, challenges, recent advancements, and practical applications. Beginning with an overview of the significance of ro…

    Submitted 17 March, 2024; originally announced March 2024.

  46. arXiv:2403.09611  [pdf, other]

    cs.CV cs.CL cs.LG

    MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

    Authors: Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, et al. (7 additional authors not shown)

    Abstract: In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for la…

    Submitted 18 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  47. arXiv:2403.08901  [pdf, other]

    cs.CE cs.LG

    A Framework for Strategic Discovery of Credible Neural Network Surrogate Models under Uncertainty

    Authors: Pratyush Kumar Singh, Kathryn A. Farrell-Maupin, Danial Faghihi

    Abstract: The widespread integration of deep neural networks in developing data-driven surrogate models for high-fidelity simulations of complex physical systems highlights the critical necessity for robust uncertainty quantification techniques and credibility assessment methodologies, ensuring the reliable deployment of surrogate models in consequential decision-making. This study presents the Occam Plausi…

    Submitted 13 May, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  48. arXiv:2403.04379  [pdf, other]

    cs.NI

    Performance evaluation of conditional handover in 5G systems under fading scenario

    Authors: Souvik Deb, Megh Rathod, Rishi Balamurugan, Shankar K. Ghosh, Rajeev K. Singh, Samriddha Sanyal

    Abstract: To enhance the handover performance in fifth generation (5G) cellular systems, conditional handover (CHO) has been evolved as a promising solution. Unlike A3 based handover where handover execution is certain after receiving handover command from the serving access network, in CHO, handover execution is conditional on the RSRP measurements from both current and target access networks, as well as o…

    Submitted 7 March, 2024; originally announced March 2024.

  49. arXiv:2402.18778  [pdf, other]

    cs.NI quant-ph

    X-ResQ: Reverse Annealing for Quantum MIMO Detection with Flexible Parallelism

    Authors: Minsung Kim, Abhishek Kumar Singh, Davide Venturelli, John Kaewell, Kyle Jamieson

    Abstract: Quantum Annealing (QA)-accelerated MIMO detection is an emerging research approach in the context of NextG wireless networks. The opportunity is to enable large MIMO systems and thus improve wireless performance. The approach aims to leverage QA to expedite the computation required for theoretically optimal but computationally-demanding Maximum Likelihood detection to overcome the limitations of t…

    Submitted 9 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 22 pages

  50. arXiv:2402.18751  [pdf, other]

    cs.LG cs.CV

    Multi-Sensor and Multi-temporal High-Throughput Phenotyping for Monitoring and Early Detection of Water-Limiting Stress in Soybean

    Authors: Sarah E. Jones, Timilehin Ayanlade, Benjamin Fallen, Talukder Z. Jubery, Arti Singh, Baskar Ganapathysubramanian, Soumik Sarkar, Asheesh K. Singh

    Abstract: Soybean production is susceptible to biotic and abiotic stresses, exacerbated by extreme weather events. Water limiting stress, i.e. drought, emerges as a significant risk for soybean production, underscoring the need for advancements in stress monitoring for crop breeding and production. This project combines multi-modal information to identify the most effective and efficient automated methods t…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 25 pages, 5 figures