Showing 1–32 of 32 results for author: Khare, A

  1. arXiv:2407.06167  [pdf, other]

    cs.CV cs.LG

    DεpS: Delayed ε-Shrinking for Faster Once-For-All Training

    Authors: Aditya Annavajjala, Alind Khare, Animesh Agrawal, Igor Fedorov, Hugo Latapie, Myungjin Lee, Alexey Tumanov

    Abstract: CNNs are increasingly deployed across different hardware, dynamic environments, and low-power embedded devices. This has led to the design and training of CNN architectures with the goal of maximizing accuracy subject to such variable deployment constraints. As the number of deployment scenarios grows, there is a need to find scalable solutions to design and train specialized CNNs. Once-for-all tr…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to the 18th European Conference on Computer Vision (ECCV 2024)

  2. arXiv:2407.00075  [pdf, other]

    cs.AI cs.CL cs.CR cs.LG

    Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference

    Authors: Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong

    Abstract: We study how to subvert language models from following the rules. We model rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form "if $P$ and $Q$, then $R$" for some propositions $P$, $Q$, and $R$. We prove that although transformers can faithfully abide by such rules, maliciously crafted prompts can nevertheless mislead even theoretically const…

    Submitted 21 June, 2024; originally announced July 2024.

  3. arXiv:2403.19822  [pdf, other]

    cs.CL cs.AI

    Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

    Authors: Yash Jain, David Chan, Pranav Dheram, Aparna Khare, Olabanji Shonibare, Venkatesh Ravichandran, Shalini Ghosh

    Abstract: Recent advances in machine learning have demonstrated that multi-modal pre-training can improve automatic speech recognition (ASR) performance compared to randomly initialized models, even when models are fine-tuned on uni-modal tasks. Existing multi-modal pre-training methods for the ASR task have primarily focused on single-stage pre-training where a single unsupervised task is used for pre-trai…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted in LREC-COLING 2024 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation

  4. arXiv:2401.14717  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

    Authors: Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

    Abstract: We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms the baseline models with single modality. We also develop a novel multi-task instruction fine-tuning…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: To appear in IEEE ICASSP 2024

  5. Two-pass Endpoint Detection for Speech Recognition

    Authors: Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow

    Abstract: Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands. The endpoint detector has to trade off between accuracy and latency, since waiting longer reduces the cases of users being cut off early. We propose a novel two-pass solution for endpointing, where the utterance endpoint detected from a first pass endpointer is verified b…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: ASRU 2023

  6. arXiv:2312.16733  [pdf, other]

    cs.DC cs.LG

    SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads

    Authors: Alind Khare, Dhruv Garg, Sukrit Kalra, Snigdha Grandhi, Ion Stoica, Alexey Tumanov

    Abstract: The increasing deployment of ML models on the critical path of production applications in both datacenter and the edge requires ML inference serving systems to serve these models under unpredictable and bursty request arrival rates. Serving models under such conditions requires these systems to strike a careful balance between the latency and accuracy requirements of the application and the overal…

    Submitted 27 December, 2023; originally announced December 2023.

  7. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  8. arXiv:2311.16169  [pdf, other]

    cs.CR cs.PL cs.SE

    Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

    Authors: Avishree Khare, Saikat Dutta, Ziyang Li, Alaia Solko-Breslin, Rajeev Alur, Mayur Naik

    Abstract: Security vulnerabilities in modern software are prevalent and harmful. While automated vulnerability detection tools have made promising progress, their scalability and applicability remain challenging. Recently, Large Language Models (LLMs), such as GPT-4 and CodeLlama, have demonstrated remarkable performance on code-related tasks. However, it is unknown whether such LLMs can do complex reasonin…

    Submitted 9 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  9. arXiv:2310.15938  [pdf, other]

    cs.LG

    ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation

    Authors: Anshul Ahluwalia, Rohit Das, Payman Behnam, Alind Khare, Pan Li, Alexey Tumanov

    Abstract: Graph Neural Networks (GNNs) have proven to be quite versatile for a variety of applications, including recommendation systems, fake news detection, drug discovery, and even computer vision. Due to the expanding size of graph-structured data, GNN models have also increased in complexity, leading to substantial latency issues. This is primarily attributed to the irregular structure of graph data an…

    Submitted 24 October, 2023; originally announced October 2023.

  10. arXiv:2307.10577  [pdf, other]

    cs.CV cs.AI

    Ethosight: A Reasoning-Guided Iterative Learning System for Nuanced Perception based on Joint-Embedding & Contextual Label Affinity

    Authors: Hugo Latapie, Shan Yu, Patrick Hammer, Kristinn R. Thorisson, Vahagn Petrosyan, Brandon Kynoch, Alind Khare, Payman Behnam, Alexey Tumanov, Aksheit Saxena, Anish Aralikatti, Hanning Chen, Mohsen Imani, Mike Archbold, Tangrui Li, Pei Wang, Justin Hart

    Abstract: Traditional computer vision models often necessitate extensive data acquisition, annotation, and validation. These models frequently struggle in real-world applications, resulting in high false positive and negative rates, and exhibit poor adaptability to new scenarios, often requiring costly retraining. To address these issues, we present Ethosight, a flexible and adaptable zero-shot video analyt…

    Submitted 20 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

  11. arXiv:2306.17266  [pdf, other]

    cs.DC cs.LG

    Subgraph Stationary Hardware-Software Inference Co-Design

    Authors: Payman Behnam, Jianming Tong, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexey Tumanov

    Abstract: A growing number of applications depend on Machine Learning (ML) functionality and benefit from both higher quality ML predictions and better timeliness (latency) at the same time. A growing body of research in computer architecture, ML, and systems software literature focuses on reaching better latency-accuracy tradeoffs for ML models. Efforts include compression, quantization, pruning, early-ex…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: 16 pages; MLSYS 2023

  12. arXiv:2305.14129  [pdf, other]

    cs.SE cs.LG

    GrACE: Generation using Associated Code Edits

    Authors: Priyanshu Gupta, Avishree Khare, Yasharth Bajpai, Saikat Chakraborty, Sumit Gulwani, Aditya Kanade, Arjun Radhakrishna, Gustavo Soares, Ashish Tiwari

    Abstract: Developers expend a significant amount of time in editing code for a variety of reasons such as bug fixing or adding new features. Designing effective methods to predict code edits has been an active yet challenging area of research due to the diversity of code edits and the difficulty of capturing the developer intent. In this work, we address these challenges by endowing pre-trained large langua…

    Submitted 20 September, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  13. arXiv:2304.06167  [pdf]

    cs.CR cs.AR

    CoVE: Towards Confidential Computing on RISC-V Platforms

    Authors: Ravi Sahita, Atish Patra, Vedvyas Shanbhogue, Samuel Ortiz, Andrew Bresticker, Dylan Reid, Atul Khare, Rajnesh Kanwal

    Abstract: Multi-tenant computing platforms are typically comprised of several software and hardware components including platform firmware, host operating system kernel, virtualization monitor, and the actual tenant payloads that run on them (typically in a virtual machine, container, or application). This model is well established in large scale commercial deployment, but the downside is that all platform…

    Submitted 12 April, 2023; originally announced April 2023.

    ACM Class: D.4.6

  14. arXiv:2303.15132  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Cross-utterance ASR Rescoring with Graph-based Label Propagation

    Authors: Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran

    Abstract: We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity. In contrast to conventional neural language model (LM) based ASR rescoring/reranking models, our approach focuses on acoustic information and conducts the rescoring collaboratively among utterances, instead of individually. Experiments on the VCTK da…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: To appear in IEEE ICASSP 2023

    Journal ref: Proc. IEEE ICASSP, June 2023

  15. arXiv:2301.10879  [pdf, other]

    cs.LG cs.DC

    SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference

    Authors: Alind Khare, Animesh Agrawal, Aditya Annavajjala, Payman Behnam, Myungjin Lee, Hugo Latapie, Alexey Tumanov

    Abstract: Neural Architecture Search (NAS) for Federated Learning (FL) is an emerging field. It automates the design and training of Deep Neural Networks (DNNs) when data cannot be centralized due to privacy, communication costs, or regulatory restrictions. Recent federated NAS methods not only reduce manual effort but also help achieve higher accuracy than traditional FL methods like FedAvg. Despite the su…

    Submitted 11 July, 2024; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: Accepted at ECCV 2024

  16. arXiv:2210.15056  [pdf, other]

    cs.LG

    UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification

    Authors: Yanbo Xu, Alind Khare, Glenn Matlin, Monish Ramadoss, Rishikesan Kamaleswaran, Chao Zhang, Alexey Tumanov

    Abstract: Machine Learning (ML) research has focused on maximizing the accuracy of predictive tasks. ML models, however, are increasingly more complex, resource intensive, and costlier to deploy in resource-constrained environments. These issues are exacerbated for prediction tasks with sequential classification on progressively transitioned stages with "happens-before" relation between them. We argue that…

    Submitted 27 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: To be published in NeurIPS'22

  17. Guided contrastive self-supervised pre-training for automatic speech recognition

    Authors: Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas

    Abstract: Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model. It can be used to effectively initialize the encoder of an Automatic Speech Recognition (ASR) model. We present a novel modification of CPC called Guided Contrastive Predictive Coding (GCPC). Our proposed method m…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: To appear in SLT 2022

  18. arXiv:2205.05882  [pdf]

    cs.LG cs.AI cs.HC cs.RO

    E-Mail Assistant -- Automation of E-Mail Handling and Management using Robotic Process Automation

    Authors: Arpit Khare, Sudhakar Singh, Richa Mishra, Shiv Prakash, Pratibha Dixit

    Abstract: In this paper, a workflow for designing a bot using Robotic Process Automation (RPA), associated with Artificial Intelligence (AI) that is used for information extraction, classification, etc., is proposed. The bot is equipped with many features that make email handling a stress-free job. It automatically logs in to the mailbox through secured channels, distinguishes between the useful and not use…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: 7 pages, 4 figures, Accepted in DASA 2022

    Report number: 1570792902 ACM Class: I.2.1

  19. ASR-Aware End-to-end Neural Diarization

    Authors: Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke

    Abstract: We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model. Two categories of features are explored: features derived directly from ASR output (phones, position-in-word and word boundaries) and features derived from a lexical speaker change detection model, trained by fine-tuning a pret…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: To appear in ICASSP 2022

    Journal ref: Proc. IEEE ICASSP, May 2022, pp. 8092-8096

  20. arXiv:2201.02127  [pdf]

    cs.IR cs.CL cs.LG

    Sentiment Analysis and Sarcasm Detection of Indian General Election Tweets

    Authors: Arpit Khare, Amisha Gangwar, Sudhakar Singh, Shiv Prakash

    Abstract: Social Media usage has increased to an all-time high level in today's digital world. The majority of the population uses social media tools (like Twitter, Facebook, YouTube, etc.) to share their thoughts and experiences with the community. Analysing the sentiments and opinions of the common public is very important for both the government and the business people. This is the reason behind the acti…

    Submitted 3 January, 2022; originally announced January 2022.

    Comments: 17 pages, 9 figures, ANTIC-2021

    Report number: 533 MSC Class: 68T50 ACM Class: I.2.7

  21. arXiv:2104.12642  [pdf, other]

    cs.CV cs.LG

    CompOFA: Compound Once-For-All Networks for Faster Multi-Platform Deployment

    Authors: Manas Sahni, Shreya Varshini, Alind Khare, Alexey Tumanov

    Abstract: The emergence of CNNs in mainstream deployment has necessitated methods to design and train efficient architectures tailored to maximize the accuracy under diverse hardware & latency constraints. To scale these resource-intensive tasks with an increasing number of deployment targets, Once-For-All (OFA) proposed an approach to jointly train several models at once with a constant training cost. Howe…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: Published as a conference paper at ICLR 2021

  22. arXiv:2104.05421  [pdf, other]

    cs.LG cs.AI

    NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic

    Authors: Mahdi Nazemi, Arash Fayyazi, Amirhossein Esmaili, Atharva Khare, Soheil Nazar Shahsavani, Massoud Pedram

    Abstract: While there is a large body of research on efficient processing of deep neural networks (DNNs), ultra-low-latency realization of these models for applications with stringent, sub-microsecond latency requirements continues to be an unresolved, challenging problem. Field-programmable gate array (FPGA)-based DNN accelerators are gaining traction as a serious contender to replace graphics processing u…

    Submitted 6 April, 2021; originally announced April 2021.

  23. arXiv:2102.05811  [pdf, other]

    cs.CV eess.IV

    Audiovisual Highlight Detection in Videos

    Authors: Karel Mundnich, Alexandra Fenster, Aparna Khare, Shiva Sundaram

    Abstract: In this paper, we test the hypothesis that interesting events in unstructured videos are inherently audiovisual. We combine deep image representations for object recognition and scene understanding with representations from an audiovisual affect recognition model. To this set, we include content agnostic audio-visual synchrony representations and mel-frequency cepstral coefficients to capture othe…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, conference paper

  24. Automated Crop Field Surveillance using Computer Vision

    Authors: Tejas Atul Khare, Anuradha C. Phadke

    Abstract: Artificial Intelligence is everywhere today. But unfortunately, Agriculture has not been able to get that much attention from Artificial Intelligence (AI). A lack of automation persists in the agriculture industry. For many years, farmers and crop field owners have been facing a problem of trespassing of wild animals for which no feasible solution has been provided. Installing a fence or barr…

    Submitted 27 January, 2021; originally announced January 2021.

    Comments: 6 Pages, 10 Figures

    Journal ref: Proceedings reference - 978-1-7281-9885-9/20/$31.00 ©2020 IEEE

  25. arXiv:2011.14691  [pdf, ps, other]

    cs.LG

    KD-Lib: A PyTorch library for Knowledge Distillation, Pruning and Quantization

    Authors: Het Shah, Avishree Khare, Neelay Shah, Khizir Siddiqui

    Abstract: In recent years, the growing size of neural networks has led to a vast amount of research concerning compression techniques to mitigate the drawbacks of such large sizes. Most of these research works can be categorized into three broad families: Knowledge Distillation, Pruning, and Quantization. While there has been steady research in this domain, adoption and commercial usage of the proposed tec…

    Submitted 30 November, 2020; originally announced November 2020.

  26. Self-Supervised learning with cross-modal transformers for emotion recognition

    Authors: Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

    Abstract: Emotion recognition is a challenging task due to limited availability of in-the-wild labeled datasets. Self-supervised learning has shown improvements on tasks with limited labeled datasets in domains like speech and natural language. Models such as BERT learn to incorporate context in word embeddings, which translates to improved performance in downstream tasks like question answering. In this wo…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: To appear in SLT2020

  27. Multi-modal embeddings using multi-task learning for emotion recognition

    Authors: Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

    Abstract: General embeddings like word2vec, GloVe and ELMo have shown a lot of success in natural language tasks. The embeddings are typically extracted from models that are built on general tasks such as skip-gram models and natural language generation. In this paper, we extend the work from natural language understanding to multi-modal architectures that use audio, visual and textual information for machi…

    Submitted 10 September, 2020; originally announced September 2020.

    Comments: To appear in Interspeech, 2020

  28. arXiv:2008.04063  [pdf, other]

    cs.LG stat.ML

    HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units

    Authors: Shenda Hong, Yanbo Xu, Alind Khare, Satria Priambada, Kevin Maher, Alaa Aljiffry, Jimeng Sun, Alexey Tumanov

    Abstract: Deep learning models have achieved expert-level performance in healthcare with an exclusive focus on training accurate models. However, in many clinical environments such as intensive care unit (ICU), real-time model serving is equally if not more important than accuracy, because in ICU patient care is simultaneously more urgent and more expensive. Clinical decisions and their timeliness, therefor…

    Submitted 10 August, 2020; originally announced August 2020.

  29. arXiv:2004.14840  [pdf, other]

    eess.AS cs.CV cs.LG cs.SD stat.ML

    Multiresolution and Multimodal Speech Recognition with Transformers

    Authors: Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram

    Abstract: This paper presents an audio visual automatic speech recognition (AV-ASR) system using a Transformer-based architecture. We particularly focus on the scene context provided by the visual information, to ground the ASR. We extract representations for audio features in the encoder layers of the transformer and fuse video features using an additional crossmodal multihead attention layer. Additionally…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted for ACL 2020

  30. Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

    Authors: Sanna Wager, Aparna Khare, Minhua Wu, Kenichi Kumatani, Shiva Sundaram

    Abstract: In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system. For the student, both multi-channel feature extraction layers an…

    Submitted 31 January, 2020; originally announced February 2020.

    Comments: To appear in ICASSP 2020

  31. arXiv:2002.00122  [pdf, ps, other]

    cs.SD eess.AS

    Multi-channel Acoustic Modeling using Mixed Bitrate OPUS Compression

    Authors: Aparna Khare, Shiva Sundaram, Minhua Wu

    Abstract: Recent literature has shown that a learned front end with multi-channel audio input can outperform traditional beam-forming algorithms for automatic speech recognition (ASR). In this paper, we present our study on multi-channel acoustic modeling using OPUS compression with different bitrates for the different channels. We analyze the degradation in word error rate (WER) as a function of the audio…

    Submitted 31 January, 2020; originally announced February 2020.

  32. arXiv:1910.11605  [pdf, other]

    cs.LG cs.CV stat.ML

    A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs

    Authors: Koyel Mukherjee, Alind Khare, Ashish Verma

    Abstract: Training neural networks on image datasets generally requires extensive experimentation to find the optimal learning rate regime. Especially for the cases of adversarial training or for training a newly synthesized model, one would not know the best learning rate regime beforehand. We propose an automated algorithm for determining the learning rate trajectory, that works across datasets and models…

    Submitted 25 October, 2019; originally announced October 2019.