Image-GS: Content-Adaptive Image Representation via 2D Gaussians

Authors: Yunxiang Zhang, Alexandr Kuznetsov, Akshay Jindal, Kenneth Chen, Anton Sochenov, Anton Kaplanyan, Qi Sun

Abstract: Neural image representations have recently emerged as a promising technique for storing, streaming, and rendering visual data. Coupled with learning-based workflows, these novel representations have demonstrated remarkable visual fidelity and memory efficiency. However, existing neural image representations often rely on explicit uniform data structures without content adaptivity or computation-intensive implicit models, limiting their adoption in real-time graphics applications. Inspired by recent advances in radiance field rendering, we propose Image-GS, a content-adaptive image representation. Using anisotropic 2D Gaussians as the basis, Image-GS shows high memory efficiency, supports fast random access, and offers a natural level of detail stack. Leveraging a tailored differentiable renderer, Image-GS fits a target image by adaptively allocating and progressively optimizing a set of 2D Gaussians. The generalizable efficiency and fidelity of Image-GS are validated against several recent neural image representations and industry-standard texture compressors on a diverse set of images. Notably, its memory and computation requirements solely depend on and linearly scale with the number of 2D Gaussians, providing flexible controls over the trade-off between visual fidelity and run-time efficiency. We hope this research offers insights for developing new applications that require adaptive quality and resource control, such as machine perception, asset streaming, and content generation.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2405.11559 [pdf, ps, other]

DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning

Authors: Suyash Vardhan Mathur, Akshett Rai Jindal, Manish Shrivastava

Abstract: While significant work has been done in the field of NLP on vertical thinking, which involves primarily logical thinking, little work has been done towards lateral thinking, which involves looking at problems from an unconventional perspective and defying existing conceptions and notions. Towards this direction, SemEval 2024 introduces the task of BRAINTEASER, which involves two types of questions -- Sentence Puzzles and Word Puzzles that defy conventional common-sense reasoning and constraints. In this paper, we tackle both types of questions using few-shot prompting on GPT-3.5 and gain insights regarding the difference in the nature of the two types. Our prompting strategy placed us 26th on the leaderboard for the Sentence Puzzle and 15th on the Word Puzzle task.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2404.02088 [pdf, other]

LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task

Authors: Suyash Vardhan Mathur, Akshett Rai Jindal, Hardik Mittal, Manish Shrivastava

Abstract: Conversation is the most natural form of human communication, where each utterance can range over a variety of possible emotions. While significant work has been done towards the detection of emotions in text, relatively little work has been done towards finding the cause of the said emotions, especially in multimodal settings. SemEval 2024 introduces the task of Multimodal Emotion Cause Analysis in Conversations, which aims to extract emotions reflected in individual utterances in a conversation involving multiple modalities (textual, audio, and visual modalities) along with the corresponding utterances that were the cause for the emotion. In this paper, we propose models that tackle this task as an utterance labeling and a sequence labeling problem and perform a comparative study of these models, involving baselines using different encoders, using BiLSTM for adding contextual information of the conversation, and finally adding a CRF layer to try to model the inter-dependencies between adjacent utterances more effectively. In the official leaderboard for the task, our architecture was ranked 8th, achieving an F1-score of 0.1759 on the leaderboard.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.09827 [pdf, other]

FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images

Authors: Yiqing Shen, Jingxing Li, Xinyuan Shao, Blanca Inigo Romillo, Ankush Jindal, David Dreizin, Mathias Unberath

Abstract: Segment anything models (SAMs) are gaining attention for their zero-shot generalization capability in segmenting objects of unseen classes and in unseen domains when properly prompted. Interactivity is a key strength of SAMs, allowing users to iteratively provide prompts that specify objects of interest to refine outputs. However, to realize the interactive use of SAMs for 3D medical imaging tasks, rapid inference times are necessary. High memory requirements and long processing delays remain constraints that hinder the adoption of SAMs for this purpose. Specifically, while 2D SAMs applied to 3D volumes contend with repetitive computation to process all slices independently, 3D SAMs suffer from an exponential increase in model parameters and FLOPS. To address these challenges, we present FastSAM3D which accelerates SAM inference to 8 milliseconds per 128*128*128 3D volumetric image on an NVIDIA A100 GPU. This speedup is accomplished through 1) a novel layer-wise progressive distillation scheme that enables knowledge transfer from a complex 12-layer ViT-B to a lightweight 6-layer ViT-Tiny variant encoder without training from scratch; and 2) a novel 3D sparse flash attention to replace vanilla attention operators, substantially reducing memory needs and improving parallelization. Experiments on three diverse datasets reveal that FastSAM3D achieves a remarkable speedup of 527.38x compared to 2D SAMs and 8.75x compared to 3D SAMs on the same volumes without significant performance decline. Thus, FastSAM3D opens the door for low-cost truly interactive SAM-based 3D medical imaging segmentation with commonly used GPU hardware. Code is available at https://github.com/arcadelab/FastSAM3D.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.02247 [pdf, ps, other]

Birbal: An efficient 7B instruct-model fine-tuned with curated datasets

Authors: Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh

Abstract: LLMOps incur significant costs due to hardware requirements, hindering their widespread accessibility. Additionally, a lack of transparency in model training methods and data contributes to the majority of models being non-reproducible. To tackle these challenges, the LLM Efficiency Challenge was introduced at NeurIPS Workshop, aiming to adapt foundation models on a diverse set of tasks via fine-tuning on a single GPU (RTX 4090 or A100 with 40GB) within a 24-hour timeframe. In this system description paper, we introduce Birbal, our Mistral-7B based winning model, fine-tuned on a single RTX 4090 for 16 hours. Birbal's success lies in curating high-quality instructions covering diverse tasks, resulting in a 35% performance improvement over second-best Qwen-14B based submission.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2401.03723 [pdf, other]

doi 10.1145/3639308

Sibyl: Forecasting Time-Evolving Query Workloads

Authors: Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesus Camacho-Rodriguez, Yuanyuan Tian

Abstract: Database systems often rely on historical query traces to perform workload-based performance tuning. However, real production workloads are time-evolving, making historical queries ineffective for optimizing future workloads. To address this challenge, we propose SIBYL, an end-to-end machine learning-based framework that accurately forecasts a sequence of future queries, with the entire query statements, in various prediction windows. Drawing insights from real-workloads, we propose template-based featurization techniques and develop a stacked-LSTM with an encoder-decoder architecture for accurate forecasting of query workloads. We also develop techniques to improve forecasting accuracy over large prediction windows and achieve high scalability over large workloads with high variability in arrival rates of queries. Finally, we propose techniques to handle workload drifts. Our evaluation on four real workloads demonstrates that SIBYL can forecast workloads with an $87.3\%$ median F1 score, and can result in $1.7\times$ and $1.3\times$ performance improvement when applied to materialized view selection and index selection applications, respectively.

Submitted 8 January, 2024; originally announced January 2024.

Comments: The paper has been accepted by SIGMOD 2024

arXiv:2401.01280 [pdf, other]

doi 10.1145/3626710

GEqO: ML-Accelerated Semantic Equivalence Detection

Authors: Brandon Haynes, Rana Alotaibi, Anna Pavlenko, Jyoti Leeka, Alekh Jindal, Yuanyuan Tian

Abstract: Large scale analytics engines have become a core dependency for modern data-driven enterprises to derive business insights and drive actions. These engines support a large number of analytic jobs processing huge volumes of data on a daily basis, and workloads are often inundated with overlapping computations across multiple jobs. Reusing common computation is crucial for efficient cluster resource utilization and reducing job execution time. Detecting common computation is the first and key step for reducing this computational redundancy. However, detecting equivalence on large-scale analytics engines requires efficient and scalable solutions that are fully automated. In addition, to maximize computation reuse, equivalence needs to be detected at the semantic level instead of just the syntactic level (i.e., the ability to detect semantic equivalence of seemingly different-looking queries). Unfortunately, existing solutions fall short of satisfying these requirements. In this paper, we take a major step towards filling this gap by proposing GEqO, a portable and lightweight machine-learning-based framework for efficiently identifying semantically equivalent computations at scale. GEqO introduces two machine-learning-based filters that quickly prune out nonequivalent subexpressions and employs a semi-supervised learning feedback loop to iteratively improve its model with an intelligent sampling mechanism. Further, with its novel database-agnostic featurization method, GEqO can transfer the learning from one workload and database to another. Our extensive empirical evaluation shows that, on TPC-DS-like queries, GEqO yields significant performance gains (up to 200x faster than automated verifiers) and finds up to 2x more equivalences than optimizer and signature-based equivalence detection approaches.

Submitted 2 January, 2024; originally announced January 2024.

Journal ref: Proceedings of the ACM on Management of Data (2024) Volume 1 Issue 4

arXiv:2312.15272 [pdf, other]

Detecting anxiety from short clips of free-form speech

Authors: Prabhat Agarwal, Akshat Jindal, Shreya Singh

Abstract: Barriers to accessing mental health assessments, including cost and stigma, continue to be an impediment in mental health diagnosis and treatment. Machine learning approaches based on speech samples could help in this direction. In this work, we develop machine learning solutions to diagnose anxiety disorders from audio journals of patients. We work on a novel anxiety dataset (provided through collaboration with Kintsugi Mindful Wellness Inc.) and experiment with several models of varying complexity utilizing audio, text and a combination of multiple modalities. We show that the multi-modal and audio embeddings based approaches achieve good performance in the task, achieving an AUC ROC score of 0.68-0.69.

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.02957 [pdf, other]

Classification for everyone: Building geography agnostic models for fairer recognition

Authors: Akshat Jindal, Shreya Singh, Soham Gadgil

Abstract: In this paper, we analyze different methods to mitigate inherent geographical biases present in state of the art image classification models. We first quantitatively present this bias in two datasets - The Dollar Street Dataset and ImageNet, using images with location information. We then present different methods which can be employed to reduce this bias. Finally, we analyze the effectiveness of the different techniques on making these models more robust to geographical locations of the images.

Submitted 2 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: typos corrected, references added

arXiv:2311.04588 [pdf, other]

Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection

Authors: Akshit Jindal, Vikram Goyal, Saket Anand, Chetan Arora

Abstract: Machine Learning (ML) models become vulnerable to Model Stealing Attacks (MSA) when they are deployed as a service. In such attacks, the deployed model is queried repeatedly to build a labelled dataset. This dataset allows the attacker to train a thief model that mimics the original model. To maximize query efficiency, the attacker has to select the most informative subset of data points from the pool of available data. Existing attack strategies utilize approaches like Active Learning and Semi-Supervised learning to minimize costs. However, in the black-box setting, these approaches may select sub-optimal samples as they train only one thief model. Depending on the thief model's capacity and the data it was pretrained on, the model might even select noisy samples that harm the learning process. In this work, we explore the usage of an ensemble of deep learning models as our thief model. We call our attack Army of Thieves (AOT) as we train multiple models with varying complexities to leverage the crowd's wisdom. Based on the ensemble's collective decision, uncertain samples are selected for querying, while the most confident samples are directly included in the training data. Our approach is the first one to utilize an ensemble of thief models to perform model extraction. We outperform the base approaches of existing state-of-the-art methods by at least 3% and achieve a 21% higher adversarial sample transferability than previous work for models trained on the CIFAR-10 dataset.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 10 pages, 5 figures, paper accepted to WACV 2024

arXiv:2311.00176 [pdf, other]

ChipNeMo: Domain-Adapted LLMs for Chip Design

Authors: Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Ankit Jindal, Brucek Khailany, George Kokai, et al. (17 additional authors not shown)

Abstract: ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our evaluations demonstrate that domain-adaptive pretraining of language models can lead to superior performance in domain related downstream tasks compared to their base LLaMA2 counterparts, without degradations in generic capabilities. In particular, our largest model, ChipNeMo-70B, outperforms the highly capable GPT-4 on two of our use cases, namely engineering assistant chatbot and EDA scripts generation, while exhibiting competitive performance on bug summarization and analysis. These results underscore the potential of domain-specific customization for enhancing the effectiveness of large language models in specialized applications.

Submitted 4 April, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: Updated results for ChipNeMo-70B model

arXiv:2308.14089 [pdf, other]

MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records

Authors: Scott L. Fleming, Alejandro Lozano, William J. Haberkorn, Jenelle A. Jindal, Eduardo P. Reis, Rahul Thapa, Louis Blankemeier, Julian Z. Genkins, Ethan Steinberg, Ashwin Nayak, Birju S. Patel, Chia-Chun Chiang, Alison Callahan, Zepeng Huo, Sergios Gatidis, Scott J. Adams, Oluseyi Fayanju, Shreya J. Shah, Thomas Savage, Ethan Goh, Akshay S. Chaudhari, Nima Aghaeepour, Christopher Sharp, Michael A. Pfeffer, Percy Liang, et al. (5 additional authors not shown)

Abstract: The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs. We used MedAlign to evaluate 6 general domain LLMs, having clinicians rank the accuracy and quality of each LLM response. We found high error rates, ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct), and an 8.3% drop in accuracy moving from 32k to 2k context lengths for GPT-4. Finally, we report correlations between clinician rankings and automated natural language generation metrics as a way to rank LLMs without human review. We make MedAlign available under a research data use agreement to enable LLM evaluations on tasks aligned with clinician needs and preferences.

Submitted 24 December, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

arXiv:2301.03982 [pdf, other]

doi 10.1145/3572848.3577436

Exploring the Use of WebAssembly in HPC

Authors: Mohak Chadha, Nils Krueger, Jophin John, Anshul Jindal, Michael Gerndt, Shajulin Benedict

Abstract: Containerization approaches based on namespaces offered by the Linux kernel have seen an increasing popularity in the HPC community both as a means to isolate applications and as a format to package and distribute them. However, their adoption and usage in HPC systems faces several challenges. These include difficulties in unprivileged running and building of scientific application container images directly on HPC resources, increasing heterogeneity of HPC architectures, and access to specialized networking libraries available only on HPC systems. These challenges of container-based HPC application development closely align with the several advantages that a new universal intermediate binary format called WebAssembly (Wasm) has to offer. These include a lightweight userspace isolation mechanism and portability across operating systems and processor architectures. In this paper, we explore the usage of Wasm as a distribution format for MPI-based HPC applications. To this end, we present MPIWasm, a novel Wasm embedder for MPI-based HPC applications that enables high-performance execution of Wasm code, has low-overhead for MPI calls, and supports high-performance networking interconnects present on HPC systems. We evaluate the performance and overhead of MPIWasm on a production HPC system and AWS Graviton2 nodes using standardized HPC benchmarks. Results from our experiments demonstrate that MPIWasm delivers competitive native application performance across all scenarios. Moreover, we observe that Wasm binaries are 139.5x smaller on average as compared to the statically-linked binaries for the different standardized benchmarks.

Submitted 10 January, 2023; originally announced January 2023.

Comments: ACM SIGPLAN PPoPP 2023

arXiv:2211.05739 [pdf, other]

doi 10.1109/BigData55660.2022.10021037

FedLesScan: Mitigating Stragglers in Serverless Federated Learning

Authors: Mohamed Elzohairy, Mohak Chadha, Anshul Jindal, Andreas Grafberger, Jianfeng Gu, Michael Gerndt, Osama Abboud

Abstract: Federated Learning (FL) is a machine learning paradigm that enables the training of a shared global model across distributed clients while keeping the training data local. While most prior work on designing systems for FL has focused on using stateful always running components, recent work has shown that components in an FL system can greatly benefit from the usage of serverless computing and Function-as-a-Service technologies. To this end, distributed training of models with serverless FL systems can be more resource-efficient and cheaper than conventional FL systems. However, serverless FL systems still suffer from the presence of stragglers, i.e., slow clients due to their resource and statistical heterogeneity. While several strategies have been proposed for mitigating stragglers in FL, most methodologies do not account for the particular characteristics of serverless environments, i.e., cold-starts, performance variations, and the ephemeral stateless nature of the function instances. Towards this, we propose FedLesScan, a novel clustering-based semi-asynchronous training strategy, specifically tailored for serverless FL. FedLesScan dynamically adapts to the behaviour of clients and minimizes the effect of stragglers on the overall system. We implement our strategy by extending an open-source serverless FL system called FedLess. Moreover, we comprehensively evaluate our strategy using the 2nd generation Google Cloud Functions with four datasets and varying percentages of stragglers. Results from our experiments show that compared to other approaches FedLesScan reduces training time and cost by an average of 8% and 20% respectively while utilizing clients better with an average increase in the effective update ratio of 17.75%.

Submitted 28 November, 2022; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: IEEE BigData 2022

arXiv:2210.13625 [pdf, other]

doi 10.1145/3514221.3526052

Deploying a Steered Query Optimizer in Production at Microsoft

Authors: Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal

Abstract: Modern analytical workloads are highly heterogeneous and massively complex, making generic query optimizers untenable for many customers and scenarios. As a result, it is important to specialize these optimizers to instances of the workloads. In this paper, we continue a recent line of work in steering a query optimizer towards better plans for a given workload, and make major strides in pushing previous research ideas to production deployment. Along the way we solve several operational challenges, including making steering actions more manageable, keeping the costs of steering within budget, and avoiding unexpected performance regressions in production. Our resulting system, QO-Advisor, essentially externalizes the query planner to a massive offline pipeline for better exploration and specialization. We discuss various aspects of our design and show detailed results over production SCOPE workloads at Microsoft, where the system is currently enabled by default.

Submitted 24 October, 2022; originally announced October 2022.

Journal ref: Proceedings of the 2022 International Conference on Management of Data 2022 Jun 10 (pp. 2299-2311)

arXiv:2210.04212 [pdf, other]

Migrating from Microservices to Serverless: An IoT Platform Case Study

Authors: Mohak Chadha, Victor Pacyna, Anshul Jindal, Jianfeng Gu, Michael Gerndt

Abstract: Microservice architecture is the common choice for developing cloud applications these days since each individual microservice can be independently modified, replaced, and scaled. As a result, application development and operating cloud infrastructure were bundled together into what is now commonly called DevOps. However, with the increasing popularity of the serverless computing paradigm and its several advantages such as no infrastructure management, a pay-per-use billing policy, and on-demand fine-grained autoscaling, there is a growing interest in utilizing FaaS and serverless CaaS technologies for refactoring microservices-based applications. Towards this, we migrate a complex IoT platform application onto OpenWhisk (OW) and Google Cloud Run (GCR). We comprehensively evaluate the performance of the different deployment strategies, i.e., Google Kubernetes Engine (GKE)-Standard, OW, and GCR for the IoT platform using different load testing scenarios. Results from our experiments show that while GKE-Standard performs best for most scenarios, GCR is always cheaper.

Submitted 9 October, 2022; originally announced October 2022.

Comments: ACM International Workshop on Serverless Computing 2022 (WoSC@Middleware 2022)

arXiv:2207.06811 [pdf, other]

Bunk8s: Enabling Easy Integration Testing of Microservices in Kubernetes

Authors: Christoph Reile, Mohak Chadha, Valentin Hauner, Anshul Jindal, Benjamin Hofmann, Michael Gerndt

Abstract: Microservice architecture is the common choice for cloud applications these days since each individual microservice can be independently modified, replaced, and scaled. However, the complexity of microservice applications requires automated testing with a focus on the interactions between the services. While this is achievable with end-to-end tests, they are error-prone, brittle, expensive to write, time-consuming to run, and require the entire application to be deployed. Integration tests are an alternative to end-to-end tests since they have a smaller test scope and require the deployment of significantly fewer services. The de-facto standard for deploying microservice applications in the cloud is containers, with Kubernetes being the most widely used container orchestration platform. To support the integration testing of microservices in Kubernetes, several tools such as Octopus, Istio, and Jenkins exist. However, each of these tools either lacks crucial functionality or leads to a substantial increase in the complexity and growth of the tool landscape when introduced into a project. To this end, we present Bunk8s, a tool for integration testing of microservice applications in Kubernetes that overcomes the limitations of these existing tools. Bunk8s is independent of the test framework used for writing integration tests, independent of the used CI/CD infrastructure, and supports test result publishing. A video demonstrating the functioning of our tool is available from https://www.youtube.com/watch?v=e8wbS25O4Bo.

Submitted 14 July, 2022; originally announced July 2022.

Comments: 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)

arXiv:2207.06183 [pdf, other]

SLAM: SLO-Aware Memory Optimization for Serverless Applications

Authors: Gor Safaryan, Anshul Jindal, Mohak Chadha, Michael Gerndt

Abstract: The serverless computing paradigm has become more ingrained into the industry, as it offers a cheap alternative for application development and deployment. This new paradigm has also created new kinds of problems for the developer, who needs to tune memory configurations for balancing cost and performance. Many researchers have addressed the issue of minimizing cost and meeting Service Level Objective (SLO) requirements for a single FaaS function, but there has been a gap in solving the same problem for an application consisting of many FaaS functions, creating complex application workflows. In this work, we designed a tool called SLAM to address the issue. SLAM uses distributed tracing to detect the relationship among the FaaS functions within a serverless application. By modeling each of them, it estimates the execution time for the application at different memory configurations. Using these estimations, SLAM determines the optimal memory configuration for the given serverless application based on the specified SLO requirements and user-specified objectives (minimum cost or minimum execution time). We demonstrate the functionality of SLAM on AWS Lambda by testing on four applications. Our results show that the suggested memory configurations guarantee that more than 95% of requests are completed within the predefined SLOs.
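The selection step described in this abstract (pick a memory configuration that meets the SLO under a cost or execution-time objective) can be sketched roughly as follows. This is an illustrative sketch only, not SLAM's implementation: the function name `pick_memory`, the estimate table, and the cost model (memory in GB multiplied by estimated seconds) are all assumptions made for the example.

```python
from typing import Dict, Optional


def pick_memory(est_time_s: Dict[int, float], slo_s: float,
                objective: str = "cost") -> Optional[int]:
    """Pick a memory size (MB) whose estimated execution time meets the SLO.

    Cost is modelled as memory (GB) x time (s); this is a simplifying
    assumption for illustration, not a provider's actual billing formula.
    """
    # Keep only the configurations whose estimated time satisfies the SLO.
    feasible = {m: t for m, t in est_time_s.items() if t <= slo_s}
    if not feasible:
        return None  # no tested configuration can meet the SLO
    if objective == "time":
        return min(feasible, key=lambda m: feasible[m])
    return min(feasible, key=lambda m: (m / 1024) * feasible[m])


# Hypothetical estimates: memory (MB) -> estimated execution time (s)
estimates = {128: 9.0, 256: 4.0, 512: 2.5, 1024: 1.5, 2048: 1.2}
print(pick_memory(estimates, slo_s=5.0))           # cost objective
print(pick_memory(estimates, slo_s=5.0, objective="time"))
```

With a cost objective the sketch favours the smallest memory-time product among feasible configurations, while the time objective simply takes the fastest one.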

Submitted 13 July, 2022; originally announced July 2022.

Comments: This is the preprint version of the accepted paper at IEEE CLOUD'22

arXiv:2205.11582 [pdf, other]

doi 10.5220/0011080300003200

Scalable Infrastructure for Workload Characterization of Cluster Traces

Authors: Thomas van Loo, Anshul Jindal, Shajulin Benedict, Mohak Chadha, Michael Gerndt

Abstract: In the recent past, characterizing workloads has been attempted to gain a foothold in the emerging serverless cloud market, especially in the large production cloud clusters of Google, AWS, and so forth. While analyzing and characterizing real workloads from a large production cloud cluster benefits cloud providers, researchers, and daily users, analyzing the workload traces of these clusters has been an arduous task due to the heterogeneous nature of data. This article proposes a scalable infrastructure based on Google's Dataproc for analyzing the workload traces of cloud environments. We evaluated the functioning of the proposed infrastructure using the workload traces of Google cloud cluster-usage-traces-v3. We perform the workload characterization on this dataset, focusing on the heterogeneity of the workload, the variations in job durations, aspects of resource consumption, and the overall availability of resources provided by the cluster. The findings reported in the paper will be beneficial for cloud infrastructure providers and users while managing the cloud computing resources, especially serverless platforms.

Submitted 23 May, 2022; originally announced May 2022.

Comments: 9 pages, CLOSER 2022

Journal ref: In Proceedings of the 12th International Conference on Cloud Computing and Services Science - CLOSER 2022, ISBN 978-989-758-570-8; ISSN 2184-5042, pages 254-263

arXiv:2205.10519 [pdf, other]

Impact of Multiple Fully-Absorbing Receivers in Molecular Communications

Authors: Nithin V. Sabu, Abhishek K. Gupta, Neeraj Varshney, Anshuman Jindal

Abstract: Molecular communication is a promising solution to enable intra-body communications among nanomachines. However, malicious and non-cooperative receivers can degrade the performance, compromising these systems' security. Analyzing the communication and security performance of these systems requires accurate channel models. However, such models are not present in the literature. In this work, we develop an analytical framework to derive the hitting probability of a molecule on a fully absorbing receiver (FAR) in the presence of other FARs, which can be either cooperative or malicious. We first present an approximate hitting probability expression for the 3-FARs case. A simplified expression is obtained for the case when FARs are symmetrically positioned. Using the derived expressions, we study the impact of malicious receivers on the intended receiver and discuss how to minimize this impact to obtain a secure communication channel. We also study the gain that can be obtained by the cooperation of these FARs. We then present an approach to extend the analysis for a system with N FARs. The derived expressions can be used to analyze and design multiple input/output and secure molecular communication systems.

Submitted 21 May, 2022; originally announced May 2022.

arXiv:2203.03541 [pdf, other]

Fairness for Text Classification Tasks with Identity Information Data Augmentation Methods

Authors: Mohit Wadhwa, Mohan Bhambhani, Ashvini Jindal, Uma Sawant, Ramanujam Madhavan

Abstract: Counterfactual fairness methods address the question: How would the prediction change if the sensitive identity attributes referenced in the text instance were different? These methods are entirely based on generating counterfactuals for the given training and test set instances. Counterfactual instances are commonly prepared by replacing sensitive identity terms, i.e., the identity terms present in the instance are replaced with other identity terms that fall under the same sensitive category. Therefore, the efficacy of these methods depends heavily on the quality and comprehensiveness of identity pairs. In this paper, we offer a two-step data augmentation process where (1) the former stage consists of a novel method for preparing a comprehensive list of identity pairs with word embeddings, and (2) the latter consists of leveraging the prepared identity pairs list to enhance the training instances by applying three simple operations (namely identity pair replacement, identity term blindness, and identity pair swap). We empirically show that the two-stage augmentation process leads to diverse identity pairs and an enhanced training set, with an improved counterfactual token-based fairness metric score on two well-known text classification tasks.

Submitted 4 February, 2022; originally announced March 2022.

arXiv:2201.11454 [pdf, other]

Estimating the Capacities of Function-as-a-Service Functions

Authors: Anshul Jindal, Mohak Chadha, Shajulin Benedict, Michael Gerndt

Abstract: Serverless computing is a cloud computing paradigm that allows developers to focus exclusively on business logic as cloud service providers manage resource management tasks. Serverless applications follow this model, where the application is decomposed into a set of fine-grained Function-as-a-Service (FaaS) functions. However, the obscurities of the underlying system infrastructure and dependencies between FaaS functions within the application pose a challenge for estimating the performance of FaaS functions. To characterize the performance of a FaaS function that is relevant for the user, we define Function Capacity (FC) as the maximal number of concurrent invocations the function can serve in a time without violating the Service-Level Objective (SLO). The paper addresses the challenge of quantifying the FC individually for each FaaS function within a serverless application. This challenge is addressed by sandboxing a FaaS function and building its performance model. To this end, we develop FnCapacitor, an end-to-end automated Function Capacity estimation tool. We demonstrate the functioning of our tool on Google Cloud Functions (GCF) and AWS Lambda. FnCapacitor estimates the FCs on different deployment configurations (allocated memory and maximum function instances) by conducting time-framed load tests and building various models on the acquired performance data using statistical methods (linear, ridge, and polynomial regression) and Deep Neural Network (DNN) methods. Our evaluation of different FaaS functions shows relatively accurate predictions, with an accuracy greater than 75% using DNN for both cloud providers.
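The Function Capacity definition in this abstract can be illustrated with a minimal sketch. It assumes hypothetical load-test samples of (concurrent invocations, observed latency) and a latency SLO, and returns the largest tested concurrency reached before the first SLO violation. The name `function_capacity` and the sample data are invented for illustration; this is not FnCapacitor's implementation, which builds regression and DNN models rather than reading the value off raw samples.

```python
from typing import Iterable, Tuple


def function_capacity(samples: Iterable[Tuple[int, float]],
                      slo_ms: float) -> int:
    """Return the largest tested concurrency whose latency, and that of
    every lower tested concurrency, stays within the SLO (0 if none does)."""
    capacity = 0
    for concurrency, latency_ms in sorted(samples):
        if latency_ms > slo_ms:
            break  # first violation: higher levels are not considered
        capacity = concurrency
    return capacity


# Hypothetical load-test results: (concurrent invocations, latency in ms)
load_test = [(10, 120.0), (20, 150.0), (40, 210.0), (80, 480.0), (160, 900.0)]
print(function_capacity(load_test, slo_ms=250.0))
```

Stopping at the first violation is a deliberate simplification: it treats capacity as the end of a contiguous range of SLO-compliant load levels rather than any isolated level that happens to pass.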

Submitted 27 January, 2022; originally announced January 2022.

Comments: 8 pages, Accepted at CloudAM'21 Workshop (UCC)

arXiv:2201.05232 [pdf, other]

FARSI: Facebook AR System Investigator for Agile Domain-Specific System-on-Chip Exploration

Authors: Behzad Boroujerdian, Ying Jing, Amit Kumar, Lavanya Subramanian, Luke Yen, Vincent Lee, Vivek Venkatesan, Amit Jindal, Robert Shearer, Vijay Janapa Reddi

Abstract: Domain-specific SoCs (DSSoCs) are attractive solutions for domains with stringent power/performance/area constraints; however, they suffer from two fundamental complexities. On the one hand, their many specialized hardware blocks result in complex systems and thus high development effort. On the other, their many system knobs expand the complexity of design space, making the search for the optimal design difficult. Thus, to reach prevalence, taming such complexities is necessary. This work identifies necessary features of an early-stage design space exploration (DSE) framework that targets the complex design space of DSSoCs and further provides an instance of one called FARSI, (F)acebook (AR) (S)ystem (I)nvestigator. Concretely, FARSI provides an agile system-level simulator with a speedup of 8,400X and an accuracy of 98.5% compared to Synopsys Platform Architect. FARSI also provides an efficient exploration heuristic and achieves up to 16X improvement in convergence time compared to naive simulated annealing (SA). This is done by augmenting SA with architectural reasoning such as locality exploitation and bottleneck relaxation. Furthermore, we embed various co-design capabilities and show that on average, they have a 32% impact on the convergence rate. Finally, we demonstrate that using simple development-cost-aware policies can lower the system complexity, both in terms of the component count and variation, by as much as 150% and 118% (e.g., for the Network-on-a-Chip subsystem).

Submitted 17 January, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

arXiv:2112.09549 [pdf, other]

Channel Characterization and Performance of a 3-D Molecular Communication System with Multiple Fully-Absorbing Receivers

Authors: Nithin V. Sabu, Abhishek K. Gupta, Neeraj Varshney, Anshuman Jindal

Abstract: Molecular communication (MC) can enable the transfer of information between nanomachines using molecules as the information carrier. In MC systems, multiple receiver nanomachines often co-exist in the same communication channel to serve common or different purposes. However, the analytical channel model for a system with multiple fully absorbing receivers (FARs) does not exist in the literature, and it differs significantly from that of the single FAR system due to the mutual influence of FARs. The analytical channel model is essential in analyzing systems with multiple FARs, including MIMO, SIMO, and cognitive molecular communication systems. In this work, we derive an approximate analytical expression for the hitting probability of a molecule emitted from a point source on each FAR in a diffusion-based MC system with three or more FARs. Using these expressions, we derive the channel model for a SIMO system with a single transmitter and multiple FARs arranged in a uniform circular array (UCA). We then analyze the communication performance of this SIMO system under different cooperative receiver schemes and develop several interesting insights.

Submitted 6 December, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

arXiv:2112.08572 [pdf, other]

Predictive Price-Performance Optimization for Serverless Query Processing

Authors: Rathijit Sen, Abhishek Roy, Alekh Jindal

Abstract: We present an efficient, parametric modeling framework for predictive resource allocations, focusing on the amount of computational resources, that can optimize for a range of price-performance objectives for data analytics in serverless query processing settings. We discuss and evaluate in depth how our system, AutoExecutor, can use this framework to automatically select near-optimal executor and core counts for Spark SQL queries running on Azure Synapse. Our techniques improve upon Spark's in-built, reactive, dynamic executor allocation capabilities by substantially reducing the total executors allocated and executor occupancy while running queries, thereby freeing up executors that can potentially be used by other concurrent queries or in reducing the overall cluster provisioning needs. In contrast with post-execution analysis tools such as Sparklens, we predict resource allocations for queries before executing them and can also account for changes in input data sizes for predicting the desired allocations.

Submitted 15 December, 2021; originally announced December 2021.

arXiv:2111.11052 [pdf, other]

IAD: Indirect Anomalous VMMs Detection in the Cloud-based Environment

Authors: Anshul Jindal, Ilya Shakhat, Jorge Cardoso, Michael Gerndt, Vladimir Podolskiy

Abstract: Server virtualization in the form of virtual machines (VMs) with the use of a hypervisor or a Virtual Machine Monitor (VMM) is an essential part of cloud computing technology to provide infrastructure-as-a-service (IaaS). A fault or an anomaly in the VMM can propagate to the VMs hosted on it and ultimately affect the availability and reliability of the applications running on those VMs. Therefore, identifying and eventually resolving it quickly is highly important. However, anomalous VMM detection is a challenge in the cloud environment since the user does not have access to the VMM. This paper addresses this challenge of anomalous VMM detection in the cloud-based environment without having any knowledge or data from the VMM by introducing a novel machine learning-based algorithm called IAD: Indirect Anomalous VMMs Detection. This algorithm solely uses the resource utilization data of the VMs hosted on those VMMs for the anomalous VMMs detection. The developed algorithm's accuracy was tested on four datasets comprising synthetic and real data and compared against four other popular algorithms, which can also be applied to the described problem. It was found that the proposed IAD algorithm has an average F1-score of 83.7% across the four datasets, and also outperforms the other algorithms by an average F1-score of 11%.

Submitted 22 November, 2021; originally announced November 2021.

Comments: Accepted at AIOps 2021 workshop (ICSOC 2021)

arXiv:2111.10690 [pdf, other]

Network Graph Generation through Adaptive Clustering and Infection Dynamics: A Step Towards Global Connectivity

Authors: Aniq Ur Rahman, Fares Fourati, Khac-Hoang Ngo, Anish Jindal, Mohamed-Slim Alouini

Abstract: More than 40% of the world's population is not connected to the internet, mainly due to the lack of adequate infrastructure. Our work aims to bridge this digital divide by proposing solutions for network deployment in remote areas. Specifically, a number of access points (APs) are deployed as an interface between the users and backhaul nodes (BNs). The main challenges include designing the number and location of the APs, and connecting them to the BNs. In order to address these challenges, we first propose a metric called connectivity ratio to assess the quality of the deployment. Next, we propose an agile search algorithm to determine the number of APs that maximizes this metric and perform clustering to find the optimal locations of the APs. Furthermore, we propose a novel algorithm inspired by infection dynamics to connect all the deployed APs to the existing BNs economically. To support the existing terrestrial BNs, we investigate the deployment of non-terrestrial BNs, which further improves the network performance in terms of average hop count, traffic distribution, and backhaul length. Finally, we use real datasets from a remote village to test our solution.

Submitted 20 November, 2021; originally announced November 2021.

Comments: 6 pages, 8 figures, 2 algorithms

arXiv:2111.03396 [pdf, other]

FedLess: Secure and Scalable Federated Learning Using Serverless Computing

Authors: Andreas Grafberger, Mohak Chadha, Anshul Jindal, Jianfeng Gu, Michael Gerndt

Abstract: The traditional cloud-centric approach for Deep Learning (DL) requires training data to be collected and processed at a central server which is often challenging in privacy-sensitive domains like healthcare. Towards this, a new learning paradigm called Federated Learning (FL) has been proposed that brings the potential of DL to these domains while addressing privacy and data ownership issues. FL enables remote clients to learn a shared ML model while keeping the data local. However, conventional FL systems face several challenges such as scalability, complex infrastructure management, and wasted compute and incurred costs due to idle clients. These challenges of FL systems closely align with the core problems that serverless computing and Function-as-a-Service (FaaS) platforms aim to solve. These include rapid scalability, no infrastructure management, automatic scaling to zero for idle clients, and a pay-per-use billing model. To this end, we present a novel system and framework for serverless FL, called FedLess. Our system supports multiple commercial and self-hosted FaaS providers and can be deployed in the cloud, on-premise in institutional data centers, and on edge devices. To the best of our knowledge, we are the first to enable FL across a large fabric of heterogeneous FaaS providers while providing important features like security and Differential Privacy. We demonstrate with comprehensive experiments that the successful training of DNNs for different tasks across up to 200 client functions and more is easily possible using our system. Furthermore, we demonstrate the practical viability of our methodology by comparing it against a traditional FL system and show that it can be cheaper and more resource-efficient.

Submitted 5 November, 2021; originally announced November 2021.

Comments: IEEE BigData 2021

arXiv:2110.02313 [pdf, other]

doi 10.14778/3476249.3476298

Phoebe: A Learning-based Checkpoint Optimizer

Authors: Yiwen Zhu, Matteo Interlandi, Abhishek Roy, Krishnadhan Das, Hiren Patel, Malay Bag, Hitesh Sharma, Alekh Jindal

Abstract: Easy-to-use programming interfaces paired with cloud-scale processing engines have enabled big data system users to author arbitrarily complex analytical jobs over massive volumes of data. However, as the complexity and scale of analytical jobs increase, they encounter a number of unforeseen problems; hotspots with large intermediate data on temporary storage, longer job recovery time after failures, and worse query optimizer estimates are examples of issues that we are facing at Microsoft. To address these issues, we propose Phoebe, an efficient learning-based checkpoint optimizer. Given a set of constraints and an objective function at compile-time, Phoebe is able to determine the decomposition of job plans, and the optimal set of checkpoints to preserve their outputs to durable global storage. Phoebe consists of three machine learning predictors and one optimization module. For each stage of a job, Phoebe makes accurate predictions for: (1) the execution time, (2) the output size, and (3) the start/end time taking into account the inter-stage dependencies. Using these predictions, we formulate checkpoint optimization as an integer programming problem and propose a scalable heuristic algorithm that meets the latency requirement of the production environment. We demonstrate the effectiveness of Phoebe in production workloads, and show that we can free the temporary storage on hotspots by more than 70% and restart failed jobs 68% faster on average with minimum performance impact. Phoebe also illustrates that adding multiple sets of checkpoints is not cost-efficient, which dramatically reduces the complexity of the optimization.

Submitted 5 October, 2021; originally announced October 2021.

Journal ref: Proceedings of the VLDB Endowment 14 (11), 2505-2518, 2021

arXiv:2108.09457 [pdf, other]

DeepEdgeBench: Benchmarking Deep Neural Networks on Edge Devices

Authors: Stephan Patrick Baller, Anshul Jindal, Mohak Chadha, Michael Gerndt

Abstract: EdgeAI (Edge computing based Artificial Intelligence) has been most actively researched for the last few years to handle a variety of massively distributed AI applications and meet their strict latency requirements. Meanwhile, many companies have released edge devices with smaller form factors (low power consumption and limited resources) like the popular Raspberry Pi and Nvidia's Jetson Nano for acting as compute nodes in edge computing environments. Although the edge devices are limited in terms of computing power and hardware resources, they are powered by accelerators to enhance their performance behavior. Therefore, it is interesting to see how AI-based Deep Neural Networks perform on such devices with limited resources. In this work, we present and compare the performance in terms of inference time and power consumption of the four Systems on a Chip (SoCs): Asus Tinker Edge R, Raspberry Pi 4, Google Coral Dev Board, Nvidia Jetson Nano, and one microcontroller: Arduino Nano 33 BLE, on different deep learning models and frameworks. We also provide a method for measuring power consumption, inference time and accuracy for the devices, which can be easily extended to other devices. Our results showcase that, for a TensorFlow-based quantized model, the Google Coral Dev Board delivers the best performance, both for inference time and power consumption. For a low fraction of inference computation time, i.e. less than 29.3% of the time for MobileNetV2, the Jetson Nano performs faster than the other devices.

Submitted 21 August, 2021; originally announced August 2021.

Comments: 12 pages, accepted at IC2E'21

arXiv:2107.10008 [pdf, other]

Architecture-Specific Performance Optimization of Compute-Intensive FaaS Functions

Authors: Mohak Chadha, Anshul Jindal, Michael Gerndt

Abstract: FaaS allows an application to be decomposed into functions that are executed on a FaaS platform. The FaaS platform is responsible for the resource provisioning of the functions. Recently, there is a growing trend towards the execution of compute-intensive FaaS functions that run for several seconds. However, due to the billing policies followed by commercial FaaS offerings, the execution of these functions can incur significantly higher costs. Moreover, due to the abstraction of underlying processor architectures on which the functions are executed, the performance optimization of these functions is challenging. As a result, most FaaS functions use pre-compiled libraries generic to x86-64, leading to performance degradation. In this paper, we examine the underlying processor architectures for Google Cloud Functions (GCF) and determine their prevalence across the 19 available GCF regions. We modify, adapt, and optimize three compute-intensive FaaS workloads written in Python using Numba, a JIT compiler based on LLVM, and present results with respect to performance, memory consumption, and costs on GCF. Results from our experiments show that the optimization of FaaS functions can improve performance by 12.8x (geometric mean) and save costs by 73.4% on average for the three functions. Our results show that optimization of the FaaS functions for the specific architecture is very important. We achieved a maximum speedup of 1.79x by tuning the function specifically for the instruction set of the underlying processor architecture.

Submitted 21 July, 2021; originally announced July 2021.

Comments: Extended version IEEE CLOUD 2021

arXiv:2107.08594 [pdf, other]

Optimal Resource Allocation for Serverless Queries

Authors: Anish Pimpley, Shuo Li, Anubha Srivastava, Vishal Rohra, Yi Zhu, Soundararajan Srinivasan, Alekh Jindal, Hiren Patel, Shi Qiao, Rathijit Sen

Abstract: Optimizing resource allocation for analytical workloads is vital for reducing costs of cloud-data services. At the same time, it is incredibly hard for users to allocate resources per query in serverless processing systems, and they frequently misallocate by orders of magnitude. Unfortunately, prior work focused on predicting peak allocation while ignoring aggressive trade-offs between resource allocation and run-time. Additionally, these methods fail to predict allocation for queries that have not been observed in the past. In this paper, we tackle both these problems. We introduce a system for optimal resource allocation that can predict performance with aggressive trade-offs, for both new and past observed queries. We introduce the notion of a performance characteristic curve (PCC) as a parameterized representation that can compactly capture the relationship between resources and performance. To tackle training data sparsity, we introduce a novel data augmentation technique to efficiently synthesize the entire PCC using a single run of the query. Lastly, we demonstrate the advantages of a constrained loss function coupled with GNNs, over traditional ML methods, for capturing the domain-specific behavior through an extensive experimental evaluation over SCOPE big data workloads at Microsoft.

Submitted 18 July, 2021; originally announced July 2021.

arXiv:2106.08938 [pdf, other]

Memory Leak Detection Algorithms in the Cloud-based Infrastructure

Authors: Anshul Jindal, Paul Staab, Pooja Kulkarni, Jorge Cardoso, Michael Gerndt, Vladimir Podolskiy

Abstract: A memory leak in an application deployed on the cloud can affect the availability and reliability of the application. Therefore, identifying and ultimately resolving it quickly is highly important. However, in the production environment running on the cloud, memory leak detection is a challenge without the knowledge of the application or its internal object allocation details. This paper addresses this challenge of detection of memory leaks in cloud-based infrastructure without having any internal knowledge by introducing two novel machine learning-based algorithms, Linear Backward Regression (LBR) and Precog, and their two variants: Linear Backward Regression with Change Points Detection (LBRCPD) and Precog with Maximum Filteration (PrecogMF). These algorithms use only one metric for detecting a memory leak, i.e., the memory utilization of the system on which the application is deployed. The developed algorithms' accuracy was tested on manually labeled memory utilization data from 60 virtual machines, and it was found that the proposed PrecogMF algorithm achieves the highest accuracy score of 85%. The same algorithm also achieves this while decreasing the overall compute time by 80% when compared to LBR's compute time. The paper also presents the different memory leak patterns found in the various memory leak applications, which are further classified into different classes based on their visual representation.

Submitted 16 June, 2021; originally announced June 2021.

Comments: 10 pages. arXiv admin note: substantial text overlap with arXiv:2101.09799

arXiv:2104.03071 [pdf, other]

BreakingBERT@IITK at SemEval-2021 Task 9 : Statement Verification and Evidence Finding with Tables

Authors: Aditya Jindal, Ankur Gupta, Jaya Srivastava, Preeti Menghwani, Vijit Malik, Vishesh Kaushik, Ashutosh Modi

Abstract: Recently, there has been an interest in factual verification and prediction over structured data like tables and graphs. To circumvent any false news incident, it is necessary to not only model and predict over structured data efficiently but also to explain those predictions. In this paper, as part of the SemEval-2021 Task 9, we tackle the problem of fact verification and evidence finding over tabular data. There are two subtasks. Given a table and a statement/fact, subtask A determines whether the statement is inferred from the tabular data, and subtask B determines which cells in the table provide evidence for the former subtask. We make a comparison of the baselines and state-of-the-art approaches over the given SemTabFact dataset. We also propose a novel approach CellBERT to solve evidence finding as a form of the Natural Language Inference task. We obtain a 3-way F1 score of 0.69 on subtask A and an F1 score of 0.65 on subtask B.

Submitted 10 April, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: Accepted at SemEval 2021 Task 9, 11 pages (8 pages main content + 1 page for references + 2 pages appendix)

arXiv:2102.02330 [pdf, other]

doi 10.1002/spe.2966

Function Delivery Network: Extending Serverless Computing for Heterogeneous Platforms

Authors: Anshul Jindal, Michael Gerndt, Mohak Chadha, Vladimir Podolskiy, Pengfei Chen

Abstract: Serverless computing has rapidly grown following the launch of Amazon's Lambda platform. Function-as-a-Service (FaaS), a key enabler of serverless computing, allows an application to be decomposed into simple, standalone functions that are executed on a FaaS platform. The FaaS platform is responsible for deploying and facilitating resources to the functions. Several of today's cloud applications spread over heterogeneous connected computing resources and are highly dynamic in their structure and resource requirements. However, FaaS platforms are limited to homogeneous clusters and homogeneous functions and do not account for the data access behavior of functions before scheduling. We introduce an extension of FaaS to heterogeneous clusters and to support heterogeneous functions through a network of distributed heterogeneous target platforms called Function Delivery Network (FDN). A target platform is a combination of a cluster of homogeneous nodes and a FaaS platform on top of it. FDN provides Function-Delivery-as-a-Service (FDaaS), delivering the function to the right target platform. We showcase the opportunities that the FDN offers, such as varied target platform characteristics, the possibility of collaborative execution between multiple target platforms, and localization of data, in fulfilling two objectives when scheduling functions: Service Level Objective (SLO) requirements and energy efficiency. We do so by evaluating over five distributed target platforms using the FDNInspector, a tool developed by us for benchmarking distributed target platforms. Scheduling functions on an edge target platform in our evaluation reduced the overall energy consumption by 17x without violating the SLO requirements in comparison to scheduling on a high-end target platform.

Submitted 3 February, 2021; originally announced February 2021.

Comments: Accepted at Journal of Software: Practice and Experience

arXiv:2101.09799 [pdf, other]

doi 10.1007/978-3-030-76352-7_21

Online Memory Leak Detection in the Cloud-based Infrastructures

Authors: Anshul Jindal, Paul Staab, Jorge Cardoso, Michael Gerndt, Vladimir Podolskiy

Abstract: A memory leak in an application deployed on the cloud can affect the availability and reliability of the application. Therefore, identifying and ultimately resolving it quickly is highly important. However, in the production environment running on the cloud, memory leak detection is a challenge without the knowledge of the application or its internal object allocation details. This paper addresses this challenge of online detection of memory leaks in cloud-based infrastructure without having any internal application knowledge by introducing a novel machine learning-based algorithm, Precog. This algorithm uses only one metric for the detection of a memory leak, i.e., the memory utilization of the system on which the application is deployed. The developed algorithm's accuracy was tested on manually labeled memory utilization data from 60 virtual machines provided by our industry partner Huawei Munich Research Center, and it was found that the proposed algorithm achieves an accuracy score of 85% with less than half a second prediction time per virtual machine.

Submitted 24 January, 2021; originally announced January 2021.

Comments: 12 pages

Journal ref: International Workshop on Artificial Intelligence for IT Operations (AIOPS) 2020

arXiv:2101.09796 [pdf, other]

The Ifs and Buts of the Development Approaches for IoT Applications

Authors: Saitel Daniela Agudelo-Sanabria, Anshul Jindal

Abstract: The recent growth of Internet of Things (IoT) devices has led to the rise of various complex applications that involve interactions among large numbers of heterogeneous devices. An important challenge that needs to be addressed is to facilitate the agile development of IoT applications with minimal effort by the various parties involved in the process. However, IoT application development is challenging due to the wide variety of hardware and software technologies that interact in an IoT system. Moreover, it involves dealing with issues that are attributed to different software life-cycle phases: development, deployment, and progression. In this paper, we examine three IoT application development approaches: Mashup-based development, Model-based development, and Function-as-a-Service based development. The advantages and disadvantages of each approach are discussed from different perspectives, including reliability, deployment expeditiousness, ease of use, and targeted audience. Finally, we propose a simple solution where these techniques are combined to deliver reliable applications while reducing costs and time to release.

Submitted 24 January, 2021; originally announced January 2021.

Comments: 7 pages

arXiv:2011.03729 [pdf, other]

Enhash: A Fast Streaming Algorithm For Concept Drift Detection

Authors: Aashi Jindal, Prashant Gupta, Debarka Sengupta, Jayadeva

Abstract: We propose Enhash, a fast ensemble learner that detects concept drift in a data stream. A stream may consist of abrupt, gradual, virtual, or recurring events, or a mixture of various types of drift. Enhash employs projection hash to insert an incoming sample. We show empirically that the proposed method has competitive performance to existing ensemble learners in much less time. Also, Enhash has moderate resource requirements. Experiments relevant to performance comparison were performed on 6 artificial and 4 real data sets consisting of various types of drifts.

Submitted 7 November, 2020; originally announced November 2020.

arXiv:2010.09808 [pdf, other]

Imitation with Neural Density Models

Authors: Kuno Kim, Akshat Jindal, Yang Song, Jiaming Song, Yanan Sui, Stefano Ermon

Abstract: We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Maximum Occupancy Entropy Reinforcement Learning (RL) using the density as a reward. Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator. We present a practical IL algorithm, Neural Density Imitation (NDI), which obtains state-of-the-art demonstration efficiency on benchmark control tasks.

Submitted 19 October, 2020; originally announced October 2020.

arXiv:2009.12922 [pdf, other]

Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation

Authors: Olga Poppe, Tayo Amuneke, Dalitso Banda, Aritra De, Ari Green, Manon Knoertzer, Ehi Nosakhare, Karthik Rajendran, Deepak Shankargouda, Meina Wang, Alan Au, Carlo Curino, Qun Guo, Alekh Jindal, Ajay Kalhan, Morgan Oslake, Sonia Parchani, Vijay Ramani, Raj Sellappan, Saikat Sen, Sheetal Shrotri, Soundararajan Srinivasan, Ping Xia, Shize Xu, Alicia Yang, et al. (1 additional author not shown)

Abstract: Microsoft Azure is dedicated to guaranteeing high quality of service to its customers, in particular, during periods of high customer activity, while controlling cost. We employ a Data Science (DS) driven solution to predict user load and leverage these predictions to optimize resource allocation. To this end, we built the Seagull infrastructure that processes per-server telemetry, validates the data, and trains and deploys ML models. The models are used to predict customer load per server (24h into the future), and optimize service operations. Seagull continually re-evaluates accuracy of predictions, falls back to previously known good models, and triggers alerts as appropriate. We deployed this infrastructure in production for PostgreSQL and MySQL servers across all Azure regions, and applied it to the problem of scheduling server backups during low-load time. This minimizes interference with user-induced load and improves customer experience.

Submitted 16 October, 2020; v1 submitted 27 September, 2020; originally announced September 2020.

Comments: Technical report for the paper in VLDB 2021

arXiv:2009.12156 [pdf, other]

An Empirical Study on the Impact of Deep Parameters on Mobile App Energy Usage

Authors: Qiang Xu, James C. Davis, Y. Charlie Hu, Abhilash Jindal

Abstract: Improving software performance through configuration parameter tuning is a common activity during software maintenance. Beyond traditional performance metrics like latency, mobile app developers are interested in reducing app energy usage. Some mobile apps have centralized locations for parameter tuning, similar to databases and operating systems, but it is common for mobile apps to have hundreds of parameters scattered around the source code. The correlation between these "deep" parameters and app energy usage is unclear. Researchers have studied the energy effects of deep parameters in specific modules, but we lack a systematic understanding of the energy impact of mobile deep parameters. In this paper we empirically investigate this topic, combining a developer survey with systematic energy measurements. Our motivational survey of 25 Android developers suggests that developers do not understand, and largely ignore, the energy impact of deep parameters. To assess the potential implications of this practice, we propose a deep parameter energy profiling framework that can analyze the energy impact of deep parameters in an app. Our framework identifies deep parameters, mutates them based on our parameter value selection scheme, and performs reliable energy impact analysis. Applying the framework to 16 popular Android apps, we discovered that deep parameter-induced energy inefficiency is rare. We found only 2 out of 1644 deep parameters for which a different value would significantly improve its app's energy efficiency. A detailed analysis found that most deep parameters have either no energy impact, limited energy impact, or an energy impact only under extreme values. Our study suggests that it is generally safe for developers to ignore the energy impact when choosing deep parameter values in mobile apps.

Submitted 16 January, 2022; v1 submitted 22 September, 2020; originally announced September 2020.

Comments: 12 pages, 9 figures, to be published in SANER 2022, camera-ready

arXiv:2002.12393 [pdf, other]

Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings

Authors: Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le

Abstract: Query processing over big data is ubiquitous in modern clouds, where the system takes care of picking both the physical query execution plans and the resources needed to run those plans, using a cost-based query optimizer. A good cost model, therefore, is akin to better resource efficiency and lower operational costs. Unfortunately, the production workloads at Microsoft show that costs are very complex to model for big data systems. In this work, we investigate two key questions: (i) can we learn accurate cost models for big data systems, and (ii) can we integrate the learned models within the query optimizer. To answer these, we make three core contributions. First, we exploit workload patterns to learn a large number of individual cost models and combine them to achieve high accuracy and coverage over a long period. Second, we propose extensions to the Cascades framework to pick optimal resources, i.e., number of containers, during query planning. And third, we integrate the learned cost models within the Cascade-style query optimizer of SCOPE at Microsoft. We evaluate the resulting system, Cleo, in a production environment using both production and TPC-H workloads. Our results show that the learned cost models are 2 to 3 orders of magnitude more accurate, and 20X more correlated with the actual runtimes, with a large majority (70%) of the plan changes leading to substantial improvements in latency as well as resource usage.

Submitted 27 February, 2020; originally announced February 2020.

Comments: To appear at SIGMOD 2020

arXiv:1909.00659 [pdf, other]

Guided Random Forest and its application to data approximation

Authors: Prashant Gupta, Aashi Jindal, Jayadeva, Debarka Sengupta

Abstract: We present a new way of constructing an ensemble classifier, named the Guided Random Forest (GRAF) in the sequel. GRAF extends the idea of building oblique decision trees with localized partitioning to obtain a global partitioning. We show that global partitioning bridges the gap between decision trees and boosting algorithms. We empirically demonstrate that global partitioning reduces the generalization error bound. Results on 115 benchmark datasets show that GRAF yields comparable or better results on a majority of datasets. We also present a new way of approximating the datasets in the framework of random forests.

Submitted 2 September, 2019; originally announced September 2019.

arXiv:1909.00084 [pdf, other]

Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML

Authors: Ashvin Agrawal, Rony Chatterjee, Carlo Curino, Avrilia Floratou, Neha Gowdal, Matteo Interlandi, Alekh Jindal, Konstantinos Karanasos, Subru Krishnan, Brian Kroth, Jyoti Leeka, Kwanghyun Park, Hiren Patel, Olga Poppe, Fotis Psallidas, Raghu Ramakrishnan, Abhishek Roy, Karla Saur, Rathijit Sen, Markus Weimer, Travis Wright, Yiwen Zhu

Abstract: Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios including voice recognition and conversational understanding for customer support, autotuning for videoconferencing, intelligent feedback loops in large-scale sysops, manufacturing and autonomous vehicle management, and complex financial predictions, just to name a few. Meanwhile, as the value of data is increasingly recognized and monetized, concerns about securing valuable data and risks to individual privacy have been growing. Consequently, rigorous data management has emerged as a key requirement in enterprise settings. How will these trends (ML's growing popularity and stricter data governance) intersect? What are the unmet requirements for applying ML in enterprise settings? What are the technical challenges for the DB community to solve? In this paper, we present our vision of how ML and database systems are likely to come together, and early steps we take towards making this vision a reality.

Submitted 27 December, 2019; v1 submitted 30 August, 2019; originally announced September 2019.

arXiv:1906.06590 [pdf, other]

Query and Resource Optimizations: A Case for Breaking the Wall in Big Data Systems

Authors: Alekh Jindal, Lalitha Viswanathan, Konstantinos Karanasos

Abstract: Modern big data systems run on cloud environments where resources are shared amongst several users and applications. As a result, declarative user queries in these environments need to be optimized and executed over resources that constantly change and are provisioned on demand for each job. This requires us to rethink traditional query optimizers designed for systems that run on dedicated resources. In this paper, we show evidence that the choice of query plans depends heavily on the available resources, and the current practice of choosing query plans before picking the resources could lead to significant performance loss in two popular big data systems, namely Hive and SparkSQL. Therefore, we make a case for Resource and Query Optimization (or RAQO), i.e., choosing both the query plan and the resource configuration at the same time. We describe rule-based RAQO and present alternate decision trees to make resource-aware query planning in Hive and Spark. We further present cost-based RAQO that integrates resource planning within a query planner, and show techniques to significantly reduce the resource planning overheads. We evaluate cost-based RAQO using the state-of-the-art System R query planner as well as a recently proposed multi-objective query planner. Our evaluation on TPC-H and randomly generated schemas shows that: (i) we can reduce the resource planning overhead by up to 16x, and (ii) RAQO can scale to schemas as large as 100 table joins as well as clusters as big as 100K containers with 100GB each.

Submitted 15 June, 2019; originally announced June 2019.

arXiv:1901.03614 [pdf, other]

Jammer-Assisted Resource Allocation in Secure OFDMA With Untrusted Users

Authors: Ravikant Saini, Abhishek Jindal, Swades De

Abstract: In this paper, we consider the problem of resource allocation in the orthogonal frequency division multiple access system with a single source and M untrusted users in the presence of a friendly jammer. The jammer is used to improve either the weighted sum secure rate or the overall system fairness. The formulated optimization problem in both cases is a mixed integer non-linear programming problem, belonging to the class of NP-hard problems. In the sum secure rate maximization scenario, we decouple the problem and first obtain the subcarrier allocation at source and the decision for jammer power utilization on a per-subcarrier basis. Then, we do joint source and jammer power allocation using primal decomposition and an alternating optimization framework. Next, we consider fair resource allocation by introducing a novel concept of subcarrier snatching with the help of the jammer. We propose two schemes for jammer power utilization, called proactively fair allocation (PFA) and on-demand allocation (ODA). PFA considers equitable distribution of jammer power among the subcarriers, while ODA distributes jammer power based on the user demand. In both cases of jammer usage, we also present suboptimal solutions that solve the power allocation at a highly reduced complexity. Asymptotically optimal solutions are derived to benchmark optimality of the proposed schemes. We compare the performance of our proposed schemes with equal power allocation at source and jammer. Our simulation results demonstrate that the jammer can indeed help in improving either the sum secure rate or the overall system fairness.

Submitted 11 January, 2019; originally announced January 2019.

arXiv:1701.06093 [pdf, other]

INGESTBASE: A Declarative Data Ingestion System

Authors: Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Samuel Madden

Abstract: Big data applications have fast arriving data that must be quickly ingested. At the same time, they have specific needs to preprocess and transform the data before it can be put to use. The current practice is to do these preparatory transformations once the data is already ingested; however, this is expensive to run and cumbersome to manage. As a result, there is a need to push data preprocessing down to the ingestion itself. In this paper, we present a declarative data ingestion system, called INGESTBASE, to allow application developers to plan and specify their data ingestion logic in a more systematic manner. We introduce the notion of ingestion plans, analogous to query plans, and present a declarative ingestion language to help developers easily build sophisticated ingestion plans. INGESTBASE provides an extensible ingestion optimizer to rewrite and optimize ingestion plans by applying rules such as operator reordering and pipelining. Finally, the INGESTBASE runtime engine runs the optimized ingestion plan in a distributed and fault-tolerant manner. Later, at query processing time, INGESTBASE supports ingestion-aware data access and interfaces with upstream query processors, such as Hadoop MapReduce and Spark, to post-process the ingested data. We demonstrate through a number of experiments that INGESTBASE: (i) is flexible enough to express a variety of ingestion techniques, (ii) incurs a low ingestion overhead, (iii) provides efficient access to the ingested data, and (iv) has much better performance, up to 6 times, than preparing data as an afterthought, via a query processor.

Submitted 21 January, 2017; originally announced January 2017.

arXiv:1412.5263 [pdf, other]

Graph Analytics using the Vertica Relational Database

Authors: Alekh Jindal, Samuel Madden, Malu Castellanos, Meichun Hsu

Abstract: Graph analytics is becoming increasingly popular, with a deluge of new systems for graph analytics having been proposed in the past few years. These systems often start from the assumption that a new storage or query processing system is needed, in spite of graph data being often collected and stored in a relational database in the first place. In this paper, we study the Vertica relational database as a platform for graph analytics. We show that vertex-centric graph analysis can be translated to SQL queries, typically involving table scans and joins, and that modern column-oriented databases are very well suited to running such queries. Specifically, we present an experimental evaluation of the Vertica relational database system on a variety of graph analytics, including iterative analysis, a combination of graph and relational analyses, and more complex 1-hop neighborhood graph analytics, showing that it is competitive with two popular vertex-centric graph analytics systems, namely Giraph and GraphLab.

Submitted 17 December, 2014; originally announced December 2014.

ACM Class: H.2.4

arXiv:1208.0287 [pdf, other]

Only Aggressive Elephants are Fast Elephants

Authors: Jens Dittrich, Jorge-Arnulfo Quiané-Ruiz, Stefan Richter, Stefan Schuh, Alekh Jindal, Jörg Schad

Abstract: Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider's orders. Some clever riders have trained their yellow elephants to only consume parts of the inputs before responding. However, the teaching time to make an elephant do that is high. So high that the teaching lessons often do not pay off. We take a different approach. We make elephants aggressive; only this will make them very fast. We propose HAIL (Hadoop Aggressive Indexing Library), an enhancement of HDFS and Hadoop MapReduce that dramatically improves runtimes of several classes of MapReduce jobs. HAIL changes the upload pipeline of HDFS in order to create different clustered indexes on each data block replica. An interesting feature of HAIL is that we typically create a win-win situation: we improve both data upload to HDFS and the runtime of the actual Hadoop MapReduce job. In terms of data upload, HAIL improves over HDFS by up to 60% with the default replication factor of three. In terms of query execution, we demonstrate that HAIL runs up to 68x faster than Hadoop. In our experiments, we use six clusters including physical and EC2 clusters of up to 100 nodes. A series of scalability experiments also demonstrates the superiority of HAIL.

Submitted 1 August, 2012; originally announced August 2012.

Comments: VLDB2012

Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1591-1602 (2012)
