Showing 1–50 of 53 results for author: Jindal, A

  1. arXiv:2407.01866  [pdf, other]

    cs.CV cs.GR

    Image-GS: Content-Adaptive Image Representation via 2D Gaussians

    Authors: Yunxiang Zhang, Alexandr Kuznetsov, Akshay Jindal, Kenneth Chen, Anton Sochenov, Anton Kaplanyan, Qi Sun

    Abstract: Neural image representations have recently emerged as a promising technique for storing, streaming, and rendering visual data. Coupled with learning-based workflows, these novel representations have demonstrated remarkable visual fidelity and memory efficiency. However, existing neural image representations often rely on explicit uniform data structures without content adaptivity or computation-in…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2405.11559  [pdf, ps, other]

    cs.CL cs.AI

    DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning

    Authors: Suyash Vardhan Mathur, Akshett Rai Jindal, Manish Shrivastava

    Abstract: While significant work has been done in the field of NLP on vertical thinking, which involves primarily logical thinking, little work has been done towards lateral thinking, which involves looking at problems from an unconventional perspective and defying existing conceptions and notions. Towards this direction, SemEval 2024 introduces the task of BRAINTEASER, which involves two types of questions…

    Submitted 19 May, 2024; originally announced May 2024.

  3. arXiv:2404.02088  [pdf, other]

    cs.CL cs.SD eess.AS

    LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task

    Authors: Suyash Vardhan Mathur, Akshett Rai Jindal, Hardik Mittal, Manish Shrivastava

    Abstract: Conversation is the most natural form of human communication, where each utterance can range over a variety of possible emotions. While significant work has been done towards the detection of emotions in text, relatively little work has been done towards finding the cause of the said emotions, especially in multimodal settings. SemEval 2024 introduces the task of Multimodal Emotion Cause Analysis…

    Submitted 2 April, 2024; originally announced April 2024.

  4. arXiv:2403.09827  [pdf, other]

    eess.IV cs.CV

    FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images

    Authors: Yiqing Shen, Jingxing Li, Xinyuan Shao, Blanca Inigo Romillo, Ankush Jindal, David Dreizin, Mathias Unberath

    Abstract: Segment anything models (SAMs) are gaining attention for their zero-shot generalization capability in segmenting objects of unseen classes and in unseen domains when properly prompted. Interactivity is a key strength of SAMs, allowing users to iteratively provide prompts that specify objects of interest to refine outputs. However, to realize the interactive use of SAMs for 3D medical imaging tasks…

    Submitted 14 March, 2024; originally announced March 2024.

  5. arXiv:2403.02247  [pdf, ps, other]

    cs.CL

    Birbal: An efficient 7B instruct-model fine-tuned with curated datasets

    Authors: Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh

    Abstract: LLMOps incur significant costs due to hardware requirements, hindering their widespread accessibility. Additionally, a lack of transparency in model training methods and data contributes to the majority of models being non-reproducible. To tackle these challenges, the LLM Efficiency Challenge was introduced at NeurIPS Workshop, aiming to adapt foundation models on a diverse set of tasks via fine-t…

    Submitted 4 March, 2024; originally announced March 2024.

  6. arXiv:2401.03723  [pdf, other]

    cs.DB cs.LG

    Sibyl: Forecasting Time-Evolving Query Workloads

    Authors: Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesus Camacho-Rodriguez, Yuanyuan Tian

    Abstract: Database systems often rely on historical query traces to perform workload-based performance tuning. However, real production workloads are time-evolving, making historical queries ineffective for optimizing future workloads. To address this challenge, we propose SIBYL, an end-to-end machine learning-based framework that accurately forecasts a sequence of future queries, with the entire query stat…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: The paper has been accepted by SIGMOD 2024

  7. arXiv:2401.01280  [pdf, other]

    cs.DB cs.LG

    GEqO: ML-Accelerated Semantic Equivalence Detection

    Authors: Brandon Haynes, Rana Alotaibi, Anna Pavlenko, Jyoti Leeka, Alekh Jindal, Yuanyuan Tian

    Abstract: Large scale analytics engines have become a core dependency for modern data-driven enterprises to derive business insights and drive actions. These engines support a large number of analytic jobs processing huge volumes of data on a daily basis, and workloads are often inundated with overlapping computations across multiple jobs. Reusing common computation is crucial for efficient cluster resource…

    Submitted 2 January, 2024; originally announced January 2024.

    Journal ref: Proceedings of the ACM on Management of Data (2024) Volume 1 Issue 4

  8. arXiv:2312.15272  [pdf, other]

    cs.CL cs.CY cs.LG cs.SD eess.AS

    Detecting anxiety from short clips of free-form speech

    Authors: Prabhat Agarwal, Akshat Jindal, Shreya Singh

    Abstract: Barriers to accessing mental health assessments, including cost and stigma, continue to be an impediment in mental health diagnosis and treatment. Machine learning approaches based on speech samples could help in this direction. In this work, we develop machine learning solutions to diagnose anxiety disorders from audio journals of patients. We work on a novel anxiety dataset (provided through coll…

    Submitted 23 December, 2023; originally announced December 2023.

  9. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  10. arXiv:2312.02957  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Classification for everyone : Building geography agnostic models for fairer recognition

    Authors: Akshat Jindal, Shreya Singh, Soham Gadgil

    Abstract: In this paper, we analyze different methods to mitigate inherent geographical biases present in state-of-the-art image classification models. We first quantitatively present this bias in two datasets, the Dollar Street Dataset and ImageNet, using images with location information. We then present different methods which can be employed to reduce this bias. Finally, we analyze the effectiveness of…

    Submitted 2 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: typos corrected, references added

  11. arXiv:2311.04588  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection

    Authors: Akshit Jindal, Vikram Goyal, Saket Anand, Chetan Arora

    Abstract: Machine Learning (ML) models become vulnerable to Model Stealing Attacks (MSA) when they are deployed as a service. In such attacks, the deployed model is queried repeatedly to build a labelled dataset. This dataset allows the attacker to train a thief model that mimics the original model. To maximize query efficiency, the attacker has to select the most informative subset of data points from the…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures, paper accepted to WACV 2024

  12. arXiv:2311.00176  [pdf, other]

    cs.CL

    ChipNeMo: Domain-Adapted LLMs for Chip Design

    Authors: Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Ankit Jindal, Brucek Khailany, George Kokai, et al. (17 additional authors not shown)

    Abstract: ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models. We e…

    Submitted 4 April, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: Updated results for ChipNeMo-70B model

  13. arXiv:2308.14089  [pdf, other]

    cs.CL cs.AI cs.LG

    MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records

    Authors: Scott L. Fleming, Alejandro Lozano, William J. Haberkorn, Jenelle A. Jindal, Eduardo P. Reis, Rahul Thapa, Louis Blankemeier, Julian Z. Genkins, Ethan Steinberg, Ashwin Nayak, Birju S. Patel, Chia-Chun Chiang, Alison Callahan, Zepeng Huo, Sergios Gatidis, Scott J. Adams, Oluseyi Fayanju, Shreya J. Shah, Thomas Savage, Ethan Goh, Akshay S. Chaudhari, Nima Aghaeepour, Christopher Sharp, Michael A. Pfeffer, Percy Liang, et al. (5 additional authors not shown)

    Abstract: The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture…

    Submitted 24 December, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

  14. Exploring the Use of WebAssembly in HPC

    Authors: Mohak Chadha, Nils Krueger, Jophin John, Anshul Jindal, Michael Gerndt, Shajulin Benedict

    Abstract: Containerization approaches based on namespaces offered by the Linux kernel have seen increasing popularity in the HPC community both as a means to isolate applications and as a format to package and distribute them. However, their adoption and usage in HPC systems face several challenges. These include difficulties in unprivileged running and building of scientific application container image…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: ACM SIGPLAN PPoPP 2023

  15. FedLesScan: Mitigating Stragglers in Serverless Federated Learning

    Authors: Mohamed Elzohairy, Mohak Chadha, Anshul Jindal, Andreas Grafberger, Jianfeng Gu, Michael Gerndt, Osama Abboud

    Abstract: Federated Learning (FL) is a machine learning paradigm that enables the training of a shared global model across distributed clients while keeping the training data local. While most prior work on designing systems for FL has focused on using stateful always running components, recent work has shown that components in an FL system can greatly benefit from the usage of serverless computing and Func…

    Submitted 28 November, 2022; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: IEEE BigData 2022

  16. Deploying a Steered Query Optimizer in Production at Microsoft

    Authors: Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal

    Abstract: Modern analytical workloads are highly heterogeneous and massively complex, making generic query optimizers untenable for many customers and scenarios. As a result, it is important to specialize these optimizers to instances of the workloads. In this paper, we continue a recent line of work in steering a query optimizer towards better plans for a given workload, and make major strides in pushing p…

    Submitted 24 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of the 2022 International Conference on Management of Data 2022 Jun 10 (pp. 2299-2311)

  17. arXiv:2210.04212  [pdf, other]

    cs.DC cs.PF cs.SE

    Migrating from Microservices to Serverless: An IoT Platform Case Study

    Authors: Mohak Chadha, Victor Pacyna, Anshul Jindal, Jianfeng Gu, Michael Gerndt

    Abstract: Microservice architecture is the common choice for developing cloud applications these days since each individual microservice can be independently modified, replaced, and scaled. As a result, application development and operating cloud infrastructure were bundled together into what is now commonly called DevOps. However, with the increasing popularity of the serverless computing paradigm and its…

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: ACM International Workshop on Serverless Computing 2022 (WoSC@Middleware 2022)

  18. arXiv:2207.06811  [pdf, other]

    cs.SE cs.DC

    Bunk8s: Enabling Easy Integration Testing of Microservices in Kubernetes

    Authors: Christoph Reile, Mohak Chadha, Valentin Hauner, Anshul Jindal, Benjamin Hofmann, Michael Gerndt

    Abstract: Microservice architecture is the common choice for cloud applications these days since each individual microservice can be independently modified, replaced, and scaled. However, the complexity of microservice applications requires automated testing with a focus on the interactions between the services. While this is achievable with end-to-end tests, they are error-prone, brittle, expensive to writ…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)

  19. arXiv:2207.06183  [pdf, other]

    cs.DC

    SLAM: SLO-Aware Memory Optimization for Serverless Applications

    Authors: Gor Safaryan, Anshul Jindal, Mohak Chadha, Michael Gerndt

    Abstract: The serverless computing paradigm has become more ingrained into the industry, as it offers a cheap alternative for application development and deployment. This new paradigm has also created new kinds of problems for the developer, who needs to tune memory configurations for balancing cost and performance. Many researchers have addressed the issue of minimizing cost and meeting Service Level Objective…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: This is the preprint version of the accepted paper at IEEE CLOUD'22

  20. Scalable Infrastructure for Workload Characterization of Cluster Traces

    Authors: Thomas van Loo, Anshul Jindal, Shajulin Benedict, Mohak Chadha, Michael Gerndt

    Abstract: In the recent past, characterizing workloads has been attempted to gain a foothold in the emerging serverless cloud market, especially in the large production cloud clusters of Google, AWS, and so forth. While analyzing and characterizing real workloads from a large production cloud cluster benefits cloud providers, researchers, and daily users, analyzing the workload traces of these clusters has…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: 9 pages, CLOSER 2022

    Journal ref: In Proceedings of the 12th International Conference on Cloud Computing and Services Science - CLOSER 2022, ISBN 978-989-758-570-8; ISSN 2184-5042, pages 254-263

  21. arXiv:2205.10519  [pdf, other]

    cs.IT eess.SP

    Impact of Multiple Fully-Absorbing Receivers in Molecular Communications

    Authors: Nithin V. Sabu, Abhishek K. Gupta, Neeraj Varshney, Anshuman Jindal

    Abstract: Molecular communication is a promising solution to enable intra-body communications among nanomachines. However, malicious and non-cooperative receivers can degrade the performance, compromising these systems' security. Analyzing the communication and security performance of these systems requires accurate channel models. However, such models are not present in the literature. In this work, we dev…

    Submitted 21 May, 2022; originally announced May 2022.

  22. arXiv:2203.03541  [pdf, other]

    cs.CL cs.AI

    Fairness for Text Classification Tasks with Identity Information Data Augmentation Methods

    Authors: Mohit Wadhwa, Mohan Bhambhani, Ashvini Jindal, Uma Sawant, Ramanujam Madhavan

    Abstract: Counterfactual fairness methods address the question: How would the prediction change if the sensitive identity attributes referenced in the text instance were different? These methods are entirely based on generating counterfactuals for the given training and test set instances. Counterfactual instances are commonly prepared by replacing sensitive identity terms, i.e., the identity terms present…

    Submitted 4 February, 2022; originally announced March 2022.

  23. arXiv:2201.11454  [pdf, other]

    cs.DC stat.OT

    Estimating the Capacities of Function-as-a-Service Functions

    Authors: Anshul Jindal, Mohak Chadha, Shajulin Benedict, Michael Gerndt

    Abstract: Serverless computing is a cloud computing paradigm that allows developers to focus exclusively on business logic as cloud service providers manage resource management tasks. Serverless applications follow this model, where the application is decomposed into a set of fine-grained Function-as-a-Service (FaaS) functions. However, the obscurities of the underlying system infrastructure and dependencie…

    Submitted 27 January, 2022; originally announced January 2022.

    Comments: 8 pages, Accepted at CloudAM'21 Workshop (UCC)

  24. arXiv:2201.05232  [pdf, other]

    cs.AR

    FARSI: Facebook AR System Investigator for Agile Domain-Specific System-on-Chip Exploration

    Authors: Behzad Boroujerdian, Ying Jing, Amit Kumar, Lavanya Subramanian, Luke Yen, Vincent Lee, Vivek Venkatesan, Amit Jindal, Robert Shearer, Vijay Janapa Reddi

    Abstract: Domain-specific SoCs (DSSoCs) are attractive solutions for domains with stringent power/performance/area constraints; however, they suffer from two fundamental complexities. On the one hand, their many specialized hardware blocks result in complex systems and thus high development effort. On the other, their many system knobs expand the complexity of design space, making the search for the optimal…

    Submitted 17 January, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

  25. arXiv:2112.09549  [pdf, other]

    cs.IT

    Channel Characterization and Performance of a 3-D Molecular Communication System with Multiple Fully-Absorbing Receivers

    Authors: Nithin V. Sabu, Abhishek K. Gupta, Neeraj Varshney, Anshuman Jindal

    Abstract: Molecular communication (MC) can enable the transfer of information between nanomachines using molecules as the information carrier. In MC systems, multiple receiver nanomachines often co-exist in the same communication channel to serve common or different purposes. However, the analytical channel model for a system with multiple fully absorbing receivers (FARs) does not exist in the literature, w…

    Submitted 6 December, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

  26. arXiv:2112.08572  [pdf, other]

    cs.DB cs.LG

    Predictive Price-Performance Optimization for Serverless Query Processing

    Authors: Rathijit Sen, Abhishek Roy, Alekh Jindal

    Abstract: We present an efficient, parametric modeling framework for predictive resource allocations, focusing on the amount of computational resources, that can optimize for a range of price-performance objectives for data analytics in serverless query processing settings. We discuss and evaluate in depth how our system, AutoExecutor, can use this framework to automatically select near-optimal executor and…

    Submitted 15 December, 2021; originally announced December 2021.

  27. arXiv:2111.11052  [pdf, other]

    cs.DC stat.ML

    IAD: Indirect Anomalous VMMs Detection in the Cloud-based Environment

    Authors: Anshul Jindal, Ilya Shakhat, Jorge Cardoso, Michael Gerndt, Vladimir Podolskiy

    Abstract: Server virtualization in the form of virtual machines (VMs) with the use of a hypervisor or a Virtual Machine Monitor (VMM) is an essential part of cloud computing technology to provide infrastructure-as-a-service (IaaS). A fault or an anomaly in the VMM can propagate to the VMs hosted on it and ultimately affect the availability and reliability of the applications running on those VMs. Therefore,…

    Submitted 22 November, 2021; originally announced November 2021.

    Comments: Accepted at AIOps 2021 workshop (ICSOC 2021)

  28. arXiv:2111.10690  [pdf, other]

    cs.NI cs.CG eess.SY

    Network Graph Generation through Adaptive Clustering and Infection Dynamics: A Step Towards Global Connectivity

    Authors: Aniq Ur Rahman, Fares Fourati, Khac-Hoang Ngo, Anish Jindal, Mohamed-Slim Alouini

    Abstract: More than 40% of the world's population is not connected to the internet, largely due to the lack of adequate infrastructure. Our work aims to bridge this digital divide by proposing solutions for network deployment in remote areas. Specifically, a number of access points (APs) are deployed as an interface between the users and backhaul nodes (BNs). The main challenges include designing the number…

    Submitted 20 November, 2021; originally announced November 2021.

    Comments: 6 pages, 8 figures, 2 algorithms

  29. arXiv:2111.03396  [pdf, other]

    cs.CR cs.DC cs.LG cs.PF

    FedLess: Secure and Scalable Federated Learning Using Serverless Computing

    Authors: Andreas Grafberger, Mohak Chadha, Anshul Jindal, Jianfeng Gu, Michael Gerndt

    Abstract: The traditional cloud-centric approach for Deep Learning (DL) requires training data to be collected and processed at a central server which is often challenging in privacy-sensitive domains like healthcare. Towards this, a new learning paradigm called Federated Learning (FL) has been proposed that brings the potential of DL to these domains while addressing privacy and data ownership issues. FL e…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: IEEE BigData 2021

  30. arXiv:2110.02313  [pdf, other]

    cs.DB cs.AI cs.LG

    Phoebe: A Learning-based Checkpoint Optimizer

    Authors: Yiwen Zhu, Matteo Interlandi, Abhishek Roy, Krishnadhan Das, Hiren Patel, Malay Bag, Hitesh Sharma, Alekh Jindal

    Abstract: Easy-to-use programming interfaces paired with cloud-scale processing engines have enabled big data system users to author arbitrarily complex analytical jobs over massive volumes of data. However, as the complexity and scale of analytical jobs increase, they encounter a number of unforeseen problems, hotspots with large intermediate data on temporary storage, longer job recovery time after failur…

    Submitted 5 October, 2021; originally announced October 2021.

    Journal ref: Proceedings of the VLDB Endowment 14 (11), 2505-2518, 2021

  31. arXiv:2108.09457  [pdf, other]

    cs.AI cs.DC

    DeepEdgeBench: Benchmarking Deep Neural Networks on Edge Devices

    Authors: Stephan Patrick Baller, Anshul Jindal, Mohak Chadha, Michael Gerndt

    Abstract: EdgeAI (Edge computing based Artificial Intelligence) has been actively researched over the last few years to handle a variety of massively distributed AI applications and meet their strict latency requirements. Meanwhile, many companies have released edge devices with smaller form factors (low power consumption and limited resources) like the popular Raspberry Pi and Nvidia's Jetson Nano for ac…

    Submitted 21 August, 2021; originally announced August 2021.

    Comments: 12 pages, accepted at IC2E'21

  32. arXiv:2107.10008  [pdf, other]

    cs.DC cs.PF

    Architecture-Specific Performance Optimization of Compute-Intensive FaaS Functions

    Authors: Mohak Chadha, Anshul Jindal, Michael Gerndt

    Abstract: FaaS allows an application to be decomposed into functions that are executed on a FaaS platform. The FaaS platform is responsible for the resource provisioning of the functions. Recently, there is a growing trend towards the execution of compute-intensive FaaS functions that run for several seconds. However, due to the billing policies followed by commercial FaaS offerings, the execution of these…

    Submitted 21 July, 2021; originally announced July 2021.

    Comments: Extended version IEEE CLOUD 2021

  33. arXiv:2107.08594  [pdf, other]

    cs.DB cs.LG

    Optimal Resource Allocation for Serverless Queries

    Authors: Anish Pimpley, Shuo Li, Anubha Srivastava, Vishal Rohra, Yi Zhu, Soundararajan Srinivasan, Alekh Jindal, Hiren Patel, Shi Qiao, Rathijit Sen

    Abstract: Optimizing resource allocation for analytical workloads is vital for reducing costs of cloud-data services. At the same time, it is incredibly hard for users to allocate resources per query in serverless processing systems, and they frequently misallocate by orders of magnitude. Unfortunately, prior work focused on predicting peak allocation while ignoring aggressive trade-offs between resource al…

    Submitted 18 July, 2021; originally announced July 2021.

  34. arXiv:2106.08938  [pdf, other]

    cs.DC

    Memory Leak Detection Algorithms in the Cloud-based Infrastructure

    Authors: Anshul Jindal, Paul Staab, Pooja Kulkarni, Jorge Cardoso, Michael Gerndt, Vladimir Podolskiy

    Abstract: A memory leak in an application deployed on the cloud can affect the availability and reliability of the application. Therefore, identifying and ultimately resolving it quickly is highly important. However, in the production environment running on the cloud, memory leak detection is a challenge without the knowledge of the application or its internal object allocation details. This paper addresses…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 10 pages. arXiv admin note: substantial text overlap with arXiv:2101.09799

  35. arXiv:2104.03071  [pdf, other]

    cs.CL

    BreakingBERT@IITK at SemEval-2021 Task 9 : Statement Verification and Evidence Finding with Tables

    Authors: Aditya Jindal, Ankur Gupta, Jaya Srivastava, Preeti Menghwani, Vijit Malik, Vishesh Kaushik, Ashutosh Modi

    Abstract: Recently, there has been an interest in factual verification and prediction over structured data like tables and graphs. To circumvent any false news incident, it is necessary to not only model and predict over structured data efficiently but also to explain those predictions. In this paper, as part of the SemEval-2021 Task 9, we tackle the problem of fact verification and evidence finding over ta…

    Submitted 10 April, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted at SemEval 2021 Task 9, 11 pages (8 pages main content + 1 page for references + 2 pages appendix)

  36. Function Delivery Network: Extending Serverless Computing for Heterogeneous Platforms

    Authors: Anshul Jindal, Michael Gerndt, Mohak Chadha, Vladimir Podolskiy, Pengfei Chen

    Abstract: Serverless computing has rapidly grown following the launch of Amazon's Lambda platform. Function-as-a-Service (FaaS), a key enabler of serverless computing, allows an application to be decomposed into simple, standalone functions that are executed on a FaaS platform. The FaaS platform is responsible for deploying and facilitating resources to the functions. Several of today's cloud applications spr…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: Accepted at Journal of Software: Practice and Experience

  37. Online Memory Leak Detection in the Cloud-based Infrastructures

    Authors: Anshul Jindal, Paul Staab, Jorge Cardoso, Michael Gerndt, Vladimir Podolskiy

    Abstract: A memory leak in an application deployed on the cloud can affect the availability and reliability of the application. Therefore, to identify and ultimately resolve it quickly is highly important. However, in the production environment running on the cloud, memory leak detection is a challenge without the knowledge of the application or its internal object allocation details. This paper addresses…

    Submitted 24 January, 2021; originally announced January 2021.

    Comments: 12 pages

    Journal ref: International Workshop on Artificial Intelligence for IT Operations (AIOPS) 2020

  38. arXiv:2101.09796  [pdf, other]

    cs.SE cs.DC

    The Ifs and Buts of the Development Approaches for IoT Applications

    Authors: Saitel Daniela Agudelo-Sanabria, Anshul Jindal

    Abstract: The recent growth of the Internet of Things (IoT) devices has led to the rise of various complex applications where these applications involve interactions among large numbers of heterogeneous devices. An important challenge that needs to be addressed is to facilitate the agile development of IoT applications with minimal effort by the various parties involved in the process. However, IoT applica…

    Submitted 24 January, 2021; originally announced January 2021.

    Comments: 7 pages

  39. arXiv:2011.03729  [pdf, other]

    cs.LG stat.ML

    Enhash: A Fast Streaming Algorithm For Concept Drift Detection

    Authors: Aashi Jindal, Prashant Gupta, Debarka Sengupta, Jayadeva

    Abstract: We propose Enhash, a fast ensemble learner that detects concept drift in a data stream. A stream may consist of abrupt, gradual, virtual, or recurring events, or a mixture of various types of drift. Enhash employs projection hash to insert an incoming sample. We show empirically that the proposed method has competitive performance to existing ensemble learners in much less time. Also, E…

    Submitted 7 November, 2020; originally announced November 2020.

  40. arXiv:2010.09808  [pdf, other]

    cs.LG cs.AI stat.ML

    Imitation with Neural Density Models

    Authors: Kuno Kim, Akshat Jindal, Yang Song, Jiaming Song, Yanan Sui, Stefano Ermon

    Abstract: We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Maximum Occupancy Entropy Reinforcement Learning (RL) using the density as a reward. Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator. We prese…

    Submitted 19 October, 2020; originally announced October 2020.

  41. arXiv:2009.12922  [pdf, other]

    cs.DC cs.DB cs.LG cs.PF

    Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation

    Authors: Olga Poppe, Tayo Amuneke, Dalitso Banda, Aritra De, Ari Green, Manon Knoertzer, Ehi Nosakhare, Karthik Rajendran, Deepak Shankargouda, Meina Wang, Alan Au, Carlo Curino, Qun Guo, Alekh Jindal, Ajay Kalhan, Morgan Oslake, Sonia Parchani, Vijay Ramani, Raj Sellappan, Saikat Sen, Sheetal Shrotri, Soundararajan Srinivasan, Ping Xia, Shize Xu, Alicia Yang, et al. (1 additional author not shown)

    Abstract: Microsoft Azure is dedicated to guarantee high quality of service to its customers, in particular, during periods of high customer activity, while controlling cost. We employ a Data Science (DS) driven solution to predict user load and leverage these predictions to optimize resource allocation. To this end, we built the Seagull infrastructure that processes per-server telemetry, validates the data…

    Submitted 16 October, 2020; v1 submitted 27 September, 2020; originally announced September 2020.

    Comments: Technical report for the paper in VLDB 2021

  42. arXiv:2009.12156  [pdf, other]

    cs.SE

    An Empirical Study on the Impact of Deep Parameters on Mobile App Energy Usage

    Authors: Qiang Xu, James C. Davis, Y. Charlie Hu, Abhilash Jindal

    Abstract: Improving software performance through configuration parameter tuning is a common activity during software maintenance. Beyond traditional performance metrics like latency, mobile app developers are interested in reducing app energy usage. Some mobile apps have centralized locations for parameter tuning, similar to databases and operating systems, but it is common for mobile apps to have hundreds…

    Submitted 16 January, 2022; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: 12 pages, 9 figures, to be published in SANER 2022, camera-ready

  43. arXiv:2002.12393  [pdf, other]

    cs.DB

    Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings

    Authors: Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le

    Abstract: Query processing over big data is ubiquitous in modern clouds, where the system takes care of picking both the physical query execution plans and the resources needed to run those plans, using a cost-based query optimizer. A good cost model, therefore, is akin to better resource efficiency and lower operational costs. Unfortunately, the production workloads at Microsoft show that costs are very co…

    Submitted 27 February, 2020; originally announced February 2020.

    Comments: To appear at SIGMOD 2020

  44. arXiv:1909.00659  [pdf, other]

    cs.LG stat.ML

    Guided Random Forest and its application to data approximation

    Authors: Prashant Gupta, Aashi Jindal, Jayadeva, Debarka Sengupta

    Abstract: We present a new way of constructing an ensemble classifier, named the Guided Random Forest (GRAF) in the sequel. GRAF extends the idea of building oblique decision trees with localized partitioning to obtain a global partitioning. We show that global partitioning bridges the gap between decision trees and boosting algorithms. We empirically demonstrate that global partitioning reduces the general…

    Submitted 2 September, 2019; originally announced September 2019.

  45. arXiv:1909.00084  [pdf, other]

    cs.DB cs.DC cs.LG

    Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML

    Authors: Ashvin Agrawal, Rony Chatterjee, Carlo Curino, Avrilia Floratou, Neha Gowdal, Matteo Interlandi, Alekh Jindal, Konstantinos Karanasos, Subru Krishnan, Brian Kroth, Jyoti Leeka, Kwanghyun Park, Hiren Patel, Olga Poppe, Fotis Psallidas, Raghu Ramakrishnan, Abhishek Roy, Karla Saur, Rathijit Sen, Markus Weimer, Travis Wright, Yiwen Zhu

    Abstract: Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios including voice recognition and conversational understanding for customer support, autotuning for videoconferencing, intelligent feedback loops in large-scale sysops, manufacturing and autonomous vehicle management, complex…

    Submitted 27 December, 2019; v1 submitted 30 August, 2019; originally announced September 2019.

  46. arXiv:1906.06590  [pdf, other]

    cs.DB

    Query and Resource Optimizations: A Case for Breaking the Wall in Big Data Systems

    Authors: Alekh Jindal, Lalitha Viswanathan, Konstantinos Karanasos

    Abstract: Modern big data systems run on cloud environments where resources are shared amongst several users and applications. As a result, declarative user queries in these environments need to be optimized and executed over resources that constantly change and are provisioned on demand for each job. This requires us to rethink traditional query optimizers designed for systems that run on dedicated resourc…

    Submitted 15 June, 2019; originally announced June 2019.

  47. arXiv:1901.03614  [pdf, other]

    cs.IT

    Jammer-Assisted Resource Allocation in Secure OFDMA With Untrusted Users

    Authors: Ravikant Saini, Abhishek Jindal, Swades De

    Abstract: In this paper, we consider the problem of resource allocation in the orthogonal frequency division multiple access system with single source and M untrusted users in presence of a friendly jammer. The jammer is used to improve either the weighted sum secure rate or the overall system fairness. The formulated optimization problem in both the cases is a mixed integer non-linear programming problem,…

    Submitted 11 January, 2019; originally announced January 2019.

  48. arXiv:1701.06093  [pdf, other]

    cs.DB

    INGESTBASE: A Declarative Data Ingestion System

    Authors: Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Samuel Madden

    Abstract: Big data applications have fast arriving data that must be quickly ingested. At the same time, they have specific needs to preprocess and transform the data before it could be put to use. The current practice is to do these preparatory transformations once the data is already ingested, however, this is expensive to run and cumbersome to manage. As a result, there is a need to push data preprocessi…

    Submitted 21 January, 2017; originally announced January 2017.

  49. arXiv:1412.5263  [pdf, other]

    cs.DB

    Graph Analytics using the Vertica Relational Database

    Authors: Alekh Jindal, Samuel Madden, Malu Castellanos, Meichun Hsu

    Abstract: Graph analytics is becoming increasingly popular, with a deluge of new systems for graph analytics having been proposed in the past few years. These systems often start from the assumption that a new storage or query processing system is needed, in spite of graph data being often collected and stored in a relational database in the first place. In this paper, we study Vertica relational database a…

    Submitted 17 December, 2014; originally announced December 2014.

    ACM Class: H.2.4

  50. arXiv:1208.0287  [pdf, other]

    cs.DB

    Only Aggressive Elephants are Fast Elephants

    Authors: Jens Dittrich, Jorge-Arnulfo Quiané-Ruiz, Stefan Richter, Stefan Schuh, Alekh Jindal, Jörg Schad

    Abstract: Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider's orders. Some clever riders have trained their yellow elephants to only consume parts of the inputs before responding. However, the teaching time to make an elephant do that is high. So high that the teaching lessons often do not pay off. We take a different approach. We mak…

    Submitted 1 August, 2012; originally announced August 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1591-1602 (2012)