DRPC: Distributed Reinforcement Learning Approach for Scalable Resource Provisioning in Container-based Clusters

Authors: Haoyu Bai, Minxian Xu, Kejiang Ye, Rajkumar Buyya, Chengzhong Xu

Abstract: Microservices have transformed monolithic applications into lightweight, self-contained, and isolated application components, establishing themselves as a dominant paradigm for application development and deployment in public clouds such as Google and Alibaba. Autoscaling emerges as an efficient strategy for managing resources allocated to microservices' replicas. However, the dynamic and intricate dependencies within microservice chains present challenges to the effective management of scaled microservices. Additionally, the centralized autoscaling approach can encounter scalability issues, especially in the management of large-scale microservice-based clusters. To address these challenges and enhance scalability, we propose an innovative distributed resource provisioning approach for microservices based on the Twin Delayed Deep Deterministic Policy Gradient algorithm. This approach enables effective autoscaling decisions and decentralizes responsibilities from a central node to distributed nodes. Comparative results with state-of-the-art approaches, obtained from a realistic testbed and traces, indicate that our approach reduces the average response time by 15% and the number of failed requests by 24%, validating improved scalability as the number of requests increases.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 12 pages

Journal ref: IEEE Transactions on Services Computing, 2024

arXiv:2407.07267 [pdf, other]

Service Colonies: A Novel Architectural Style for Developing Software Systems with Autonomous and Cooperative Services

Authors: Thakshila Imiya Mohottige, Artem Polyvyanyy, Rajkumar Buyya, Colin Fidge, Alistair Barros

Abstract: This paper presents the concept of a service colony and its characteristics. A service colony is a novel architectural style for developing a software system as a group of autonomous software services co-operating to fulfill the objectives of the system. Each inhabitant service in the colony implements a specific system functionality, collaborates with the other services, and makes proactive decisions that impact its performance and interaction patterns with other inhabitants. By increasing the level of self-awareness and autonomy available to individual system components, the resulting system is increasingly decentralized, distributed, flexible, adaptable, modular, robust, and fault-tolerant.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 8 pages, 7 figures, SE 2030, July 2024, Puerto Galinhas (Brazil)

arXiv:2407.02828 [pdf]

Quantum Serverless Paradigm and Application Development using the QFaaS Framework

Authors: Hoa T. Nguyen, Bui Binh An Pham, Muhammad Usman, Rajkumar Buyya

Abstract: Quantum computing has the potential to solve complex problems beyond the capabilities of classical computers. However, its practical use is currently limited due to early-stage quantum software engineering and the constraints of Noisy Intermediate-Scale Quantum (NISQ) devices. To address this issue, this chapter introduces the concept of serverless quantum computing with examples using QFaaS, a practical Quantum Function-as-a-Service framework. This framework utilizes the serverless computing model to simplify quantum application development and deployment by abstracting the complexities of quantum hardware and enhancing application portability across different quantum software development kits and quantum backends. The chapter provides comprehensive documentation and guidelines for deploying and using QFaaS, detailing the setup, component deployment, and examples of service-oriented quantum applications. This framework offers a promising approach to overcoming current limitations and advancing the practical software engineering of quantum computing.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Guidelines for deploying and using the QFaaS Framework (for the original paper, see https://doi.org/10.1016/j.future.2024.01.018)

arXiv:2407.02748 [pdf, other]

DRLQ: A Deep Reinforcement Learning-based Task Placement for Quantum Cloud Computing

Authors: Hoa T. Nguyen, Muhammad Usman, Rajkumar Buyya

Abstract: The quantum cloud computing paradigm presents unique challenges in task placement due to the dynamic and heterogeneous nature of quantum computation resources. Traditional heuristic approaches fall short in adapting to the rapidly evolving landscape of quantum computing. This paper proposes DRLQ, a novel Deep Reinforcement Learning (DRL)-based technique for task placement in quantum cloud computing environments, addressing the optimization of task completion time and quantum task scheduling efficiency. It leverages the Deep Q Network (DQN) architecture, enhanced with the Rainbow DQN approach, to create a dynamic task placement strategy. This approach is one of the first in the field of quantum cloud resource management, enabling adaptive learning and decision-making for quantum cloud environments and effectively optimizing task placement based on changing conditions and resource availability. We conduct extensive experiments using the QSimPy simulation toolkit to evaluate the performance of our method, demonstrating substantial improvements in task execution efficiency and a reduction in the need to reschedule quantum tasks. Our results show that utilizing the DRLQ approach for task placement can significantly reduce total quantum task completion time by 37.81% to 72.93% and prevent task rescheduling attempts compared to other heuristic approaches.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted paper at IEEE CLOUD 2024 conference

arXiv:2406.05517 [pdf]

Blockchain Integrated Federated Learning in Edge-Fog-Cloud Systems for IoT based Healthcare Applications: A Survey

Authors: Shinu M. Rajagopal, Supriya M., Rajkumar Buyya

Abstract: Modern Internet of Things (IoT) applications generate enormous amounts of data, making data-driven machine learning essential for developing precise and reliable statistical models. However, data is often stored in silos, and strict user-privacy legislation complicates data utilization, limiting machine learning's potential in traditional centralized paradigms due to diverse data probability distributions and lack of personalization. Federated learning, a new distributed paradigm, supports collaborative learning while preserving privacy, making it ideal for IoT applications. By employing cryptographic techniques, IoT systems can securely store and transmit data, ensuring consistency. The integration of federated learning and blockchain is particularly advantageous for handling sensitive data, such as in healthcare. Despite the potential of these technologies, a comprehensive examination of their integration in edge-fog-cloud-based IoT computing systems and healthcare applications is needed. This survey article explores the architecture, structure, functions, and characteristics of federated learning and blockchain, their applications in various computing paradigms, and evaluates their implementations in healthcare.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2405.04490 [pdf, other]

Resource-Efficient and Self-Adaptive Quantum Search in a Quantum-Classical Hybrid System

Authors: Zihao Jiang, Zefan Du, Shaolun Ruan, Juntao Chen, Yong Wang, Long Cheng, Rajkumar Buyya, Ying Mao

Abstract: Over the past decade, the rapid advancement of deep learning and big data applications has been driven by vast datasets and high-performance computing systems. However, as we approach the physical limits of semiconductor fabrication in the post-Moore's Law era, questions arise about the future of these applications. In parallel, quantum computing has made significant progress with the potential to break limits. Major companies like IBM, Google, and Microsoft provide access to noisy intermediate-scale quantum (NISQ) computers. Despite the theoretical promise of Shor's and Grover's algorithms, practical implementation on current quantum devices faces challenges, such as demanding additional resources and a high number of controlled operations. To tackle these challenges and optimize the utilization of limited onboard qubits, we introduce ReSaQuS, a resource-efficient index-value searching system within a quantum-classical hybrid framework. Building on Grover's algorithm, ReSaQuS employs an automatically managed iterative search approach. This method analyzes problem size, filters out less probable data points, and progressively reduces the dataset with decreasing qubit requirements. Implemented using Qiskit and evaluated through extensive experiments, ReSaQuS has demonstrated a substantial reduction of up to 86.36% in cumulative qubit consumption and 72.72% in active periods, reinforcing its potential in optimizing quantum computing application deployment.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.01021 [pdf, other]

QSimPy: A Learning-centric Simulation Framework for Quantum Cloud Resource Management

Authors: Hoa T. Nguyen, Muhammad Usman, Rajkumar Buyya

Abstract: Quantum cloud computing is an emerging computing paradigm that allows seamless access to quantum hardware as cloud-based services. However, effective use of quantum resources is challenging and necessitates robust simulation frameworks for effective resource management design and evaluation. To address this need, we propose QSimPy, a novel discrete-event simulation framework designed with the main focus of facilitating learning-centric approaches for quantum resource management problems in cloud environments. Underpinned by extensibility, compatibility, and reusability principles, QSimPy provides a lightweight simulation environment based on SimPy, a well-known Python-based simulation engine for modeling dynamics of quantum cloud resources and task operations. We integrate the Gymnasium environment into our framework to support the creation of simulated environments for developing and evaluating reinforcement learning-based techniques for optimizing quantum cloud resource management. The QSimPy framework encapsulates the operational intricacies of quantum cloud environments, supporting research in dynamic task allocation and optimization through DRL approaches. We also demonstrate the use of QSimPy in developing reinforcement learning policies for quantum task placement problems, showing its potential as a useful framework for future quantum cloud research.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.11420 [pdf, other]

Quantum Cloud Computing: A Review, Open Problems, and Future Directions

Authors: Hoa T. Nguyen, Prabhakar Krishnan, Dilip Krishnaswamy, Muhammad Usman, Rajkumar Buyya

Abstract: Quantum cloud computing is an emerging paradigm of computing that empowers quantum applications and their deployment on quantum computing resources without the need for a specialized environment to host and operate physical quantum computers. This paper reviews recent advances, identifies open problems, and proposes future directions in quantum cloud computing. It discusses the state-of-the-art quantum cloud advances, including the various cloud-based models, platforms, and recently developed technologies and software use cases. Furthermore, it discusses different aspects of the quantum cloud, including resource management, quantum serverless, security, and privacy problems. Finally, the paper examines open problems and proposes the future directions of quantum cloud computing, including potential opportunities and ongoing research in this emerging field.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.18622 [pdf, other]

qIoV: A Quantum-Driven Internet-of-Vehicles-Based Approach for Environmental Monitoring and Rapid Response Systems

Authors: Ankur Nahar, Koustav Kumar Mondal, Debasis Das, Rajkumar Buyya

Abstract: This research addresses the critical necessity for advanced rapid response operations in managing a spectrum of environmental hazards. We propose a novel framework, qIoV, that integrates quantum computing with the Internet-of-Vehicles (IoV) to leverage the computational efficiency, parallelism, and entanglement properties of quantum mechanics. Our approach involves the use of environmental sensors mounted on vehicles for precise air quality assessment. These sensors are designed to be highly sensitive and accurate, leveraging the principles of quantum mechanics to detect and measure environmental parameters. A salient feature of our proposal is the Quantum Mesh Network Fabric (QMF), a system designed to dynamically adjust the quantum network topology in accordance with vehicular movements. This capability is critical to maintaining the integrity of quantum states against environmental and vehicular disturbances, thereby ensuring reliable data transmission and processing. Moreover, our methodology is further augmented by the incorporation of a variational quantum classifier (VQC) with advanced quantum entanglement techniques. This integration offers a significant reduction in latency for hazard alert transmission, thus enabling expedited communication of crucial data to emergency response teams and the public. Our study on the IBM OpenQASM 3 platform, utilizing a 127-qubit system, revealed significant advancements in pair plot analysis, achieving over 90% in precision, recall, and F1-Score metrics and an 83% increase in the speed of toxic gas detection compared to conventional methods. Additionally, theoretical analyses validate the efficiency of quantum rotation, teleportation protocols, and the fidelity of quantum entanglement, further underscoring the potential of quantum computing in enhancing analytical performance.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2402.17117 [pdf]

Deep Reinforcement Learning (DRL)-based Methods for Serverless Stream Processing Engines: A Vision, Architectural Elements, and Future Directions

Authors: Maria R. Read, Chinmaya Dehury, Satish Narayana Srirama, Rajkumar Buyya

Abstract: Streaming applications are becoming widespread across an extensive range of business domains as an increasing number of sources continuously produce data that need to be processed and analysed in real time. Modern businesses are aggressively using streaming data to generate valuable knowledge that can be used to automate processes, help decision-making, optimize resource usage, and ultimately generate revenue for the organization. Despite their increased adoption and tangible benefits, support for the automated deployment and management of streaming applications is yet to emerge. Although a plethora of stream management systems have flooded the open source community in recent years, all of the existing frameworks demand a considerably challenging and lengthy effort from human operators to manually and continuously tune their configuration and deployment environment in order to reach and maintain the desired performance goals. To address these challenges, this article proposes a vision for creating Deep Reinforcement Learning (DRL)-based methods for transforming stream processing engines into self-managed serverless solutions. This will lead to an increase in productivity as engineers can focus on the actual development process, an increase in application performance potentially leading to reduced response times and more accurate and meaningful results, and a considerable decrease in operational costs for organizations.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 21 pages, 10 figures

arXiv:2402.00356 [pdf, other]

Securing Cloud-Based Internet of Things: Challenges and Mitigations

Authors: Nivedita Singh, Rajkumar Buyya, Hyoungshick Kim

Abstract: The Internet of Things (IoT) has seen remarkable advancements in recent years, leading to a paradigm shift in the digital landscape. However, these technological strides have also brought new challenges, particularly in terms of cybersecurity. IoT devices are inherently connected to the internet, which makes them more vulnerable to attack. In addition, IoT services often handle sensitive user data, which could be misused by malicious actors or unauthorized service providers. As more mainstream service providers emerge without uniform regulations, these security risks are expected to escalate exponentially. The task of maintaining the security of IoT devices while they interact with cloud services is also challenging. Newer IoT services, especially those developed and deployed via Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS) models, pose additional security threats. Although IoT devices are becoming more affordable and ubiquitous, their growing complexity could expose users to heightened security and privacy risks. This paper highlights these pressing security concerns associated with the widespread adoption of IoT devices and services. We propose potential solutions to bridge the existing security gaps and anticipate future challenges. Our approach entails a comprehensive exploration of the key security challenges that IoT services are currently facing. We also suggest proactive strategies to mitigate these risks, strengthening the overall security of IoT devices and services.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.02469 [pdf, other]

doi 10.1016/j.teler.2024.100116

Modern Computing: Vision and Challenges

Authors: Sukhpal Singh Gill, Huaming Wu, Panos Patros, Carlo Ottaviani, Priyansh Arora, Victor Casamayor Pujol, David Haunschild, Ajith Kumar Parlikad, Oktay Cetinkaya, Hanan Lutfiyya, Vlado Stankovski, Ruidong Li, Yuemin Ding, Junaid Qadir, Ajith Abraham, Soumya K. Ghosh, Houbing Herbert Song, Rizos Sakellariou, Omer Rana, Joel J. P. C. Rodrigues, Salil S. Kanhere, Schahram Dustdar, Steve Uhlig, Kotagiri Ramamohanarao, Rajkumar Buyya

Abstract: Over the past six decades, the computing systems field has experienced significant transformations, profoundly impacting society with transformational developments, such as the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have been continuously evolving and adapting to cover multifaceted societal niches. This has led to new paradigms such as cloud, fog, edge computing, and the Internet of Things (IoT), which offer fresh economic and creative opportunities. Nevertheless, this rapid change poses complex research challenges, especially in maximizing potential and enhancing functionality. As such, to maintain an economical level of performance that meets ever-tighter requirements, one must understand the drivers of new model emergence and expansion, and how contemporary challenges differ from past ones. To that end, this article investigates and assesses the factors influencing the evolution of computing systems, covering established systems and architectures as well as newer developments, such as serverless computing, quantum computing, and on-device AI on edge devices. Trends emerge when one traces technological trajectory, which includes the rapid obsolescence of frameworks due to business and technical constraints, a move towards specialized systems and models, and varying approaches to centralized and decentralized control. This comprehensive review of modern computing systems looks ahead to the future of research in the field, highlighting key challenges and emerging trends, and underscoring their importance in cost-effectively driving technological progress.

Submitted 4 January, 2024; originally announced January 2024.

Comments: Preprint submitted to Telematics and Informatics Reports, Elsevier (2024)

Journal ref: Elsevier Telematics and Informatics Reports, Volume 13, March 2024

arXiv:2312.11739 [pdf, other]

TPTO: A Transformer-PPO based Task Offloading Solution for Edge Computing Environments

Authors: Niloofar Gholipour, Marcos Dias de Assuncao, Pranav Agarwal, Julien Gascon-Samson, Rajkumar Buyya

Abstract: Emerging applications in healthcare, autonomous vehicles, and wearable assistance require interactive and low-latency data analysis services. Unfortunately, cloud-centric architectures cannot fulfill the low-latency demands of these applications, as user devices are often distant from cloud data centers. Edge computing aims to reduce the latency by enabling processing tasks to be offloaded to resources located at the network's edge. However, determining which tasks must be offloaded to edge servers to reduce the latency of application requests is not trivial, especially if the tasks present dependencies. This paper proposes a DRL approach called TPTO, which leverages Transformer Networks and PPO to offload dependent tasks of IoT applications in edge computing. We consider users with various preferences, where devices can offload computation to an edge server via wireless channels. Performance evaluation results demonstrate that under fat application graphs, TPTO is more effective than state-of-the-art methods, such as Greedy, HEFT, and MRLCO, by reducing latency by 30.24%, 29.61%, and 12.41%, respectively. In addition, TPTO presents a training time approximately 2.5 times faster than an existing DRL approach.

Submitted 18 December, 2023; originally announced December 2023.

Journal ref: 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS)

arXiv:2311.00974 [pdf, other]

CloudSim Express: A Novel Framework for Rapid Low Code Simulation of Cloud Computing Environments

Authors: Tharindu B. Hewage, Shashikant Ilager, Maria A. Rodriguez, Rajkumar Buyya

Abstract: Cloud computing environment simulators enable cost-effective experimentation of novel infrastructure designs and management approaches by avoiding significant costs incurred from repetitive deployments in real Cloud platforms. However, widely used Cloud environment simulators compromise on usability due to complexities in design and configuration, along with the added overhead of programming language expertise. Existing approaches attempting to reduce this overhead, such as script-based simulators and Graphical User Interface (GUI) based simulators, often compromise on the extensibility of the simulator. Simulator extensibility allows for customization at a fine-grained level, so reducing it significantly affects flexibility in creating simulations. To address these challenges, we propose an architectural framework to enable human-readable script-based simulations in existing Cloud environment simulators while minimizing the impact on simulator extensibility. We implement the proposed framework for the widely used Cloud environment simulator, the CloudSim toolkit, and compare it against state-of-the-art baselines using a practical use case. The resulting framework, called CloudSim Express, achieves extensible simulations while surpassing baselines with over a 71.43% reduction in code complexity and an 89.42% reduction in lines of code.

Submitted 10 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.09003 [pdf, other]

μ-DDRL: A QoS-Aware Distributed Deep Reinforcement Learning Technique for Service Offloading in Fog computing Environments

Authors: Mohammad Goudarzi, Maria A. Rodriguez, Majid Sarvi, Rajkumar Buyya

Abstract: Fog and Edge computing extend cloud services to the proximity of end users, allowing many Internet of Things (IoT) use cases, particularly latency-critical applications. Smart devices, such as traffic and surveillance cameras, often do not have sufficient resources to process computation-intensive and latency-critical services. Hence, the constituent parts of services can be offloaded to nearby Edge/Fog resources for processing and storage. However, making offloading decisions for complex services in highly stochastic and dynamic environments is an important, yet difficult task. Recently, Deep Reinforcement Learning (DRL) has been used in many complex service offloading problems; however, existing techniques are most suitable for centralized environments, and their convergence to the most suitable solutions is slow. In addition, constituent parts of services often have predefined data dependencies and quality of service constraints, which further intensify the complexity of service offloading. To solve these issues, we propose a distributed DRL technique following the actor-critic architecture based on Asynchronous Proximal Policy Optimization (APPO) to achieve efficient and diverse distributed experience trajectory generation. Also, we employ PPO clipping and V-trace techniques for off-policy correction for faster convergence to the most suitable service offloading solutions. The results obtained demonstrate that our technique converges quickly, offers high scalability and adaptability, and outperforms its counterparts by improving the execution time of heterogeneous services.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2309.12592 [pdf, other]

ChainsFormer: A Chain Latency-aware Resource Provisioning Approach for Microservices Cluster

Authors: Chenghao Song, Minxian Xu, Kejiang Ye, Huaming Wu, Sukhpal Singh Gill, Rajkumar Buyya, Chengzhong Xu

Abstract: The trend towards transitioning from monolithic applications to microservices has been widely embraced in modern distributed systems and applications. This shift has resulted in the creation of lightweight, fine-grained, and self-contained microservices. Multiple microservices can be linked together via calls and inter-dependencies to form complex functions. One of the challenges in managing microservices is provisioning the optimal amount of resources for microservices in the chain to ensure application performance while improving resource usage efficiency. This paper presents ChainsFormer, a framework that analyzes microservice inter-dependencies to identify critical chains and nodes, and provisions resources based on reinforcement learning. To analyze chains, ChainsFormer utilizes light-weight machine learning techniques to address the dynamic nature of microservice chains and workloads. For resource provisioning, a reinforcement learning approach is used that combines vertical and horizontal scaling to determine the amount of allocated resources and the number of replicas. We evaluate the effectiveness of ChainsFormer using realistic applications and traces on a real testbed based on Kubernetes. Our experimental results demonstrate that ChainsFormer can reduce response time by up to 26% and improve processed requests per second by 8% compared with state-of-the-art techniques.

Submitted 7 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 15 pages

Journal ref: In the Proceedings of International Conference on Service Oriented Computing (ICSOC 2023)

arXiv:2309.10671 [pdf, other]

CloudSimSC: A Toolkit for Modeling and Simulation of Serverless Computing Environments

Authors: Anupama Mampage, Rajkumar Buyya

Abstract: Serverless computing is gaining traction as an attractive model for the deployment of a multitude of workloads in the cloud. Designing and building effective resource management solutions for any computing environment requires extensive long-term testing, experimentation and analysis of the achieved performance metrics. Utilizing real test beds and serverless platforms for such experimentation work is often not possible due to resource, time and cost constraints. Thus, employing simulators to model these environments is key to overcoming the challenge of examining the viability of such novel ideas for resource management. Existing simulation software developed for serverless environments lacks generalizability in terms of its architecture as well as the various aspects of resource management, where most are purely focused on modeling function performance under a specific platform architecture. In contrast, we have developed a serverless simulation model with induced flexibility in its architecture as well as the key resource management aspects of function scheduling and scaling. Further, we incorporate techniques for easily deriving monitoring metrics required for evaluating any implemented solutions by users. Our work is presented as CloudSimSC, a modular extension to CloudSim which is a simulator tool extensively used for modeling cloud environments by the research community. We discuss the implemented features in our simulation tool using multiple use cases.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.07407 [pdf, other]

Deep Reinforcement Learning-based Scheduling for Optimizing System Load and Response Time in Edge and Fog Computing Environments

Authors: Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya

Abstract: Edge/fog computing, as a distributed computing paradigm, satisfies the low-latency requirements of an ever-increasing number of IoT applications and has become the mainstream computing paradigm behind IoT applications. However, because a large number of IoT applications require execution on the edge/fog resources, the servers may be overloaded. Hence, it may disrupt the edge/fog servers and also negatively affect IoT applications' response time. Moreover, many IoT applications are composed of dependent components incurring extra constraints for their execution. Besides, edge/fog computing environments and IoT applications are inherently dynamic and stochastic. Thus, efficient and adaptive scheduling of IoT applications in heterogeneous edge/fog computing environments is of paramount importance. However, limited computational resources on edge/fog servers impose an extra burden for applying optimal but computationally demanding techniques. To overcome these challenges, we propose a Deep Reinforcement Learning-based IoT application Scheduling algorithm, called DRLIS, to adaptively and efficiently optimize the response time of heterogeneous IoT applications and balance the load of the edge/fog servers. We implemented DRLIS as a practical scheduler in the FogBus2 function-as-a-service framework for creating an edge-fog-cloud integrated serverless computing environment. Results obtained from extensive experiments show that DRLIS significantly reduces the execution cost of IoT applications by up to 55%, 37%, and 50% in terms of load balancing, response time, and weighted cost, respectively, compared with metaheuristic algorithms and other reinforcement learning techniques.

Submitted 22 October, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

arXiv:2308.11209 [pdf, other]

A Deep Reinforcement Learning based Algorithm for Time and Cost Optimized Scaling of Serverless Applications

Authors: Anupama Mampage, Shanika Karunasekera, Rajkumar Buyya

Abstract: Serverless computing has gained strong traction in the cloud computing community in recent years. Among the many benefits of this novel computing model, the rapid auto-scaling capability of user applications takes prominence. However, the offer of ad hoc scaling of user deployments at function level introduces many complications to serverless systems. The added delay and failures in function request executions caused by the time consumed for dynamically creating new resources to suit function workloads, known as the cold-start delay, is one such very prevalent shortcoming. Maintaining idle resource pools to alleviate this issue often results in wasted resources from the cloud provider perspective. Existing solutions to address this limitation mostly focus on predicting and understanding function load levels in order to proactively create required resources. Although these solutions improve function performance, the lack of understanding of the overall system characteristics in making these scaling decisions often leads to the sub-optimal usage of system resources. Further, the multi-tenant nature of serverless systems requires a scalable solution adaptable for multiple co-existing applications, a limitation seen in most current solutions. In this paper, we introduce a novel multi-agent Deep Reinforcement Learning based intelligent solution for both horizontal and vertical scaling of function resources, based on a comprehensive understanding of both function and system requirements. Our solution elevates function performance by reducing cold starts, while also offering the flexibility for optimizing resource maintenance cost to the service providers. Experiments conducted considering varying workload scenarios show improvements of up to 23% and 34% in terms of application latency and request failures, while also saving up to 45% in infrastructure cost for the service providers.

Submitted 22 August, 2023; originally announced August 2023.

Comments: 15 pages, 22 figures

arXiv:2308.10559 [pdf]

Metaverse: A Vision, Architectural Elements, and Future Directions for Scalable and Realtime Virtual Worlds

Authors: Leila Ismail, Rajkumar Buyya

Abstract: With the emergence of Cloud computing, Internet of Things-enabled Human-Computer Interfaces, Generative Artificial Intelligence, and highly accurate Machine and Deep-learning recognition and predictive models, along with the post-Covid-19 proliferation of social networking and remote communications, the Metaverse has gained a lot of popularity. The Metaverse has the prospect of extending the physical world using virtual and augmented reality so that users can interact seamlessly with the real and virtual worlds using avatars and holograms. It has the potential to impact people in the way they interact on social media, collaborate in their work, perform marketing and business, teach, learn, and even access personalized healthcare. Several works in the literature examine the Metaverse in terms of hardware wearable devices and virtual reality gaming applications. However, the requirements of realizing the Metaverse in realtime and at a large scale have yet to be examined for the technology to be usable. To address this limitation, this paper presents the temporal evolution of Metaverse definitions and captures its evolving requirements. Consequently, we provide insights into Metaverse requirements. In addition to enabling technologies, we lay out architectural elements for scalable, reliable, and efficient Metaverse systems, and a classification of existing Metaverse applications along with proposing required future research directions.

Submitted 24 August, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

MSC Class: cs.HC; cs.DC

arXiv:2308.07541 [pdf, other]

Reinforcement Learning (RL) Augmented Cold Start Frequency Reduction in Serverless Computing

Authors: Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya

Abstract: Function-as-a-Service is a cloud computing paradigm offering an event-driven execution model to applications. It features serverless attributes by eliminating resource management responsibilities from developers and offers transparent and on-demand scalability of applications. Typical serverless applications have stringent response time and scalability requirements and therefore rely on deployed services to provide quick and fault-tolerant feedback to clients. However, the FaaS paradigm suffers from cold starts as there is a non-negligible delay associated with on-demand function initialization. This work focuses on reducing the frequency of cold starts on the platform by using Reinforcement Learning. Our approach uses Q-learning and considers metrics such as function CPU utilization, existing function instances, and response failure rate to proactively initialize functions in advance based on the expected demand. The proposed solution was implemented on Kubeless and was evaluated using a normalised real-world function demand trace with matrix multiplication as the workload. The results demonstrate a favourable performance of the RL-based agent when compared to Kubeless' default policy and function keep-alive policy by improving throughput by up to 8.81% and reducing computation load and resource wastage by up to 55% and 37%, respectively, which is a direct outcome of reduced cold starts.

Submitted 14 August, 2023; originally announced August 2023.

Comments: 13 figures, 10 pages, 3 tables

arXiv:2308.05937 [pdf, other]

A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions

Authors: Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya

Abstract: Function-as-a-Service (FaaS) introduces a lightweight, function-based cloud execution model that finds its relevance in applications like IoT-edge data processing and anomaly detection. While CSPs offer near-infinite function elasticity, these applications often experience fluctuating workloads and stricter performance constraints. A typical CSP strategy is to empirically determine and adjust desired function instances, "autoscaling", based on monitoring-based thresholds such as CPU or memory, to cope with demand and performance. However, threshold configuration either requires expert knowledge, historical data or a complete view of the environment, making autoscaling a performance bottleneck lacking an adaptable solution. RL algorithms are proven to be beneficial in analysing complex cloud environments and result in an adaptable policy that maximizes the expected objectives. Most realistic cloud environments usually involve operational interference and have limited visibility, making them partially observable. A general solution to tackle observability in highly dynamic settings is to integrate Recurrent units with model-free RL algorithms and model a decision process as a POMDP. Therefore, in this paper, we investigate a model-free Recurrent RL agent for function autoscaling and compare it against the model-free Proximal Policy Optimisation (PPO) algorithm. We explore the integration of an LSTM network with the state-of-the-art PPO algorithm to find that under our experimental and evaluation settings, recurrent policies were able to capture the environment parameters and show promising results for function autoscaling. We further compare a PPO-based autoscaling agent with commercially used threshold-based function autoscaling and posit that an LSTM-based autoscaling agent is able to improve throughput by 18%, function execution by 13% and account for 8.4% more function instances.

Submitted 11 August, 2023; originally announced August 2023.

Comments: 12 pages, 13 figures, 4 tables

arXiv:2308.02834 [pdf, other]

FLight: A Lightweight Federated Learning Framework in Edge and Fog Computing

Authors: Wuji Zhu, Mohammad Goudarzi, Rajkumar Buyya

Abstract: The number of Internet of Things (IoT) applications, especially latency-sensitive ones, has increased significantly. So, Cloud computing, as one of the main enablers of the IoT that offers centralized services, cannot solely satisfy the requirements of IoT applications. Edge/Fog computing, as a distributed computing paradigm, processes and stores IoT data at the edge of the network, offering low latency, reduced network traffic, and higher bandwidth. The Edge/Fog resources are often less powerful compared to Cloud, and IoT data is dispersed among many geo-distributed servers. Hence, Federated Learning (FL), which is a machine learning approach that enables multiple distributed servers to collaborate on building models without exchanging the raw data, is well-suited to Edge/Fog computing environments, where data privacy is of paramount importance. Besides, to manage different FL tasks on Edge/Fog computing environments, a lightweight resource management framework is required to manage different incoming FL tasks while not incurring significant overhead on the system. Accordingly, in this paper, we propose a lightweight FL framework, called FLight, to be deployed on a diverse range of devices, ranging from resource-limited Edge/Fog devices to powerful Cloud servers. FLight is implemented based on the FogBus2 framework, which is a containerized distributed resource management framework. Moreover, FLight integrates both synchronous and asynchronous models of FL. Besides, we propose a lightweight heuristic-based worker selection algorithm to select a suitable set of available workers to participate in the training step to obtain higher training time efficiency. The obtained results demonstrate the efficiency of FLight.

Submitted 5 August, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2211.07238

arXiv:2306.03823 [pdf]

doi 10.1016/j.iotcps.2023.06.002

Transformative Effects of ChatGPT on Modern Education: Emerging Era of AI Chatbots

Authors: Sukhpal Singh Gill, Minxian Xu, Panos Patros, Huaming Wu, Rupinder Kaur, Kamalpreet Kaur, Stephanie Fuller, Manmeet Singh, Priyansh Arora, Ajith Kumar Parlikad, Vlado Stankovski, Ajith Abraham, Soumya K. Ghosh, Hanan Lutfiyya, Salil S. Kanhere, Rami Bahsoon, Omer Rana, Schahram Dustdar, Rizos Sakellariou, Steve Uhlig, Rajkumar Buyya

Abstract: ChatGPT, an AI-based chatbot, was released to provide coherent and useful replies based on analysis of large volumes of data. In this article, leading scientists, researchers and engineers discuss the transformative effects of ChatGPT on modern education. This research seeks to improve our knowledge of ChatGPT capabilities and its use in the education sector, identifying potential concerns and challenges. Our preliminary evaluation concludes that ChatGPT performed differently in each subject area including finance, coding and maths. While ChatGPT has the ability to help educators by creating instructional content, offering suggestions and acting as an online educator to learners by answering questions and promoting group work, there are clear drawbacks in its use, such as the possibility of producing inaccurate or false data and circumventing duplicate content (plagiarism) detectors where originality is essential. The often reported hallucinations within Generative AI in general, and also relevant for ChatGPT, can render its use of limited benefit where accuracy is essential. What ChatGPT lacks is a stochastic measure to help provide sincere and sensitive communication with its users. Academic regulations and evaluation practices used in educational institutions need to be updated, should ChatGPT be used as a tool in education. To address the transformative effects of ChatGPT on the learning environment, educating teachers and students alike about its capabilities and limitations will be crucial.

Submitted 25 May, 2023; originally announced June 2023.

Comments: Preprint submitted to IoTCPS Elsevier (2023)

Journal ref: Internet of Things and Cyber-Physical Systems (Elsevier), Volume 4, 2024, Pages 19-23

arXiv:2304.04450 [pdf]

doi 10.1002/spe.3340

Sustainable Edge Computing: Challenges and Future Directions

Authors: Patricia Arroba, Rajkumar Buyya, Román Cárdenas, José L. Risco-Martín, José M. Moya

Abstract: An increasing amount of data is being injected into the network from IoT (Internet of Things) applications. Many of these applications, developed to improve society's quality of life, are latency-critical and inject large amounts of data into the network. These requirements of IoT applications trigger the emergence of the Edge computing paradigm. Currently, data centers are responsible for between 2% and 3% of global energy use. However, this trend is difficult to maintain, as bringing computing infrastructures closer to the edge of the network comes with its own set of challenges for energy efficiency. In this paper, we propose our approach for the sustainability of future computing infrastructures to provide (i) an energy-efficient and economically viable deployment, (ii) a fault-tolerant automated operation, and (iii) a collaborative resource management to improve resource efficiency. We identify the main limitations of applying Cloud-based approaches close to the data sources and present the research challenges to Edge sustainability arising from these constraints. We propose two-phase immersion cooling, formal modeling, machine learning, and energy-centric federated management as Edge-enabling technologies. We present our early results towards the sustainability of an Edge infrastructure to demonstrate the benefits of our approach for future computing environments and deployments.

Submitted 10 April, 2023; originally announced April 2023.

Comments: 26 pages, 16 figures

arXiv:2303.15729 [pdf, other]

doi 10.1109/QSW59989.2023.00013

iQuantum: A Case for Modeling and Simulation of Quantum Computing Environments

Authors: Hoa T. Nguyen, Muhammad Usman, Rajkumar Buyya

Abstract: Today's quantum computers are primarily accessible through the cloud and potentially shifting to the edge network in the future. With the rapid advancement and proliferation of quantum computing research worldwide, there has been a considerable increase in demand for using cloud-based quantum computation resources. This demand has highlighted the need for designing efficient and adaptable resource management strategies and service models for quantum computing. However, the limited quantity, quality, and accessibility of quantum resources pose significant challenges to practical research in quantum software and systems. To address these challenges, we propose iQuantum, a first-of-its-kind simulation toolkit that can model hybrid quantum-classical computing environments for prototyping and evaluating system design and scheduling algorithms. This paper presents the quantum computing system model, architectural design, proof-of-concept implementation, potential use cases, and future development of iQuantum. Our proposed iQuantum simulator is anticipated to boost research in quantum software and systems, particularly in the creation and evaluation of policies and algorithms for resource management, job scheduling, and hybrid quantum-classical task orchestration in quantum computing environments integrating edge and cloud resources.

Submitted 28 March, 2023; originally announced March 2023.

Comments: 10 pages, 8 figures

arXiv:2303.10572 [pdf]

Energy-Efficiency and Sustainability in New Generation Cloud Computing: A Vision and Directions for Integrated Management of Data Centre Resources and Workloads

Authors: Rajkumar Buyya, Shashikant Ilager, Patricia Arroba

Abstract: Cloud computing has become a critical infrastructure for modern society, like electric power grids and roads. As the backbone of the modern economy, it offers subscription-based computing services anytime, anywhere, on a pay-as-you-go basis. Its use is growing exponentially with the continued development of new classes of applications driven by a huge number of emerging networked devices. However, the success of Cloud computing has created a new global energy challenge, as it comes at the cost of vast energy usage. Currently, data centres hosting Cloud services world-wide consume more energy than most countries. Globally, by 2025, they are projected to consume 20% of global electricity and emit up to 5.5% of the world's carbon emissions. In addition, a significant part of the energy consumed is transformed into heat which leads to operational problems, including a reduction in system reliability and the life expectancy of devices, and escalation in cooling requirements. Therefore, for future generations of Cloud computing to address the environmental and operational consequences of such significant energy usage, they must become energy-efficient and environmentally sustainable while continuing to deliver high-quality services. In this paper, we propose a vision for a learning-centric approach for the integrated management of new generation Cloud computing environments to reduce their energy consumption and carbon footprint while delivering service quality guarantees. We identify the dimensions and key issues of integrated resource management and our envisioned approaches to address them. We present a conceptual architecture for energy-efficient new generation Clouds and early results on the integrated management of resources and workloads that evidence its potential benefits towards energy efficiency and sustainability.

Submitted 20 July, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

Comments: 15 pages, 6 figures

ACM Class: C.1.4

arXiv:2303.06896 [pdf]

Quality of Service (QoS)-driven Edge Computing and Smart Hospitals: A Vision, Architectural Elements, and Future Directions

Authors: Rajkumar Buyya, Satish N. Srirama, Redowan Mahmud, Mohammad Goudarzi, Leila Ismail, Vassilis Kostakos

Abstract: The Internet of Things (IoT) paradigm is drastically changing our world by making everyday objects an integral part of the Internet. This transformation is increasingly being adopted in the healthcare sector, where Smart Hospitals are now relying on IoT technologies to track staff, patients, devices, and equipment, both within a hospital and beyond. This paradigm opens the door to new innovations for creating novel types of interactions among objects, services, and people in smarter ways to enhance the quality of patient services and the efficient utilisation of resources. However, the realisation of real-time IoT applications in healthcare and, ultimately, the development of Smart Hospitals are constrained by their current Cloud-based computing environment. Edge computing emerged as a new computing model that harnesses edge-based resources alongside Clouds for real-time IoT applications. It helps to capitalise on the potential economic impact of the IoT paradigm of $11 trillion per year, with a trillion IoT devices deployed by 2025 to sense, manage and monitor the hospital systems in real-time. This vision paper proposes new algorithms and software systems to tackle important challenges in Edge computing-enabled Smart Hospitals, including how to manage and execute diverse real-time IoT applications and how to meet their diverse and strict Quality of Service (QoS) requirements in hospital settings. The vision we outline can help tackle timely challenges that hospitals increasingly face.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 14 pages, 6 figures

ACM Class: C.1.4

arXiv:2302.06971 [pdf, other]

MicroFog: A Framework for Scalable Placement of Microservices-based IoT Applications in Federated Fog Environments

Authors: Samodha Pallewatta, Vassilis Kostakos, Rajkumar Buyya

Abstract: MicroService Architecture (MSA) is gaining rapid popularity for developing large-scale IoT applications for deployment within distributed and resource-constrained Fog computing environments. As a cloud-native application architecture, the true power of microservices comes from their loosely coupled, independently deployable and scalable nature, enabling distributed placement and dynamic composition across federated Fog and Cloud clusters. Thus, it is necessary to develop novel microservice placement algorithms that utilise these microservice characteristics to improve the performance of the applications. However, existing Fog computing frameworks lack support for integrating such placement policies due to their shortcomings in multiple areas, including MSA application placement and deployment across multi-fog multi-cloud environments, dynamic microservice composition across multiple distributed clusters, scalability of the framework, support for deploying heterogeneous microservice applications, etc. To this end, we design and implement MicroFog, a Fog computing framework providing a scalable, easy-to-configure control engine that executes placement algorithms and deploys applications across federated Fog environments. Furthermore, MicroFog provides a sufficient abstraction over container orchestration and dynamic microservice composition. The framework is evaluated using multiple use cases. The results demonstrate that MicroFog is a scalable, extensible and easy-to-configure framework that can integrate and evaluate novel placement policies for deploying microservice-based applications within multi-fog multi-cloud environments. We integrate multiple microservice placement policies to demonstrate MicroFog's ability to support horizontally scaled placement, thus reducing the application service response time by up to 54%.

Submitted 14 February, 2023; originally announced February 2023.

arXiv:2302.03885 [pdf]

doi 10.1201/9781003145189

Classification of Methods to Reduce Clinical Alarm Signals for Remote Patient Monitoring: A Critical Review

Authors: Teena Arora, Venki Balasubramanian, Andrew Stranieri, Shenhan Mai, Rajkumar Buyya, Sardar Islam

Abstract: Remote Patient Monitoring (RPM) is an emerging technology paradigm that helps reduce clinician workload by automated monitoring and raising intelligent alarm signals. High sensitivity and intelligent data-processing algorithms used in RPM devices result in frequent false-positive alarms, resulting in alarm fatigue. This study aims to critically review the existing literature to identify the causes of these false-positive alarms and categorize the various interventions used in the literature to eliminate these causes. This categorization acts as a catalog and helps in the design of false alarm reduction algorithms. A step-by-step approach to building an effective alarm signal generator for clinical use has been proposed in this work. Second, the possible causes of false-positive alarms amongst RPM applications were analyzed from the literature. Third, a critical review has been done of the various interventions used in the literature depending on causes and classification based on four major approaches: clinical knowledge, physiological data, medical sensor devices, and clinical environments. A practical clinical alarm strategy could be developed by following our pentagon approach. The first phase of this approach emphasizes identifying the various causes for the high number of false-positive alarms. Future research will focus on developing a false alarm reduction method using data mining.

Submitted 8 February, 2023; originally announced February 2023.

Comments: 25 pages, 6 figures

ACM Class: A.1

arXiv:2211.11523 [pdf, other]

doi 10.3389/friot.2022.1073780

Revisiting the Internet of Things: New Trends, Opportunities and Grand Challenges

Authors: Khalid Elgazzar, Haytham Khalil, Taghreed Alghamdi, Ahmed Badr, Ghadeer Abdelkader, Abdelrahman Elewah, Rajkumar Buyya

Abstract: The Internet of Things (IoT) has brought the dream of ubiquitous data access from physical environments into reality. IoT embeds sensors and actuators in physical objects so that they can communicate and exchange data between themselves to improve efficiency along with enabling real-time intelligent services and offering better quality of life to people. The number of deployed IoT devices has rapidly grown in the past five years in a way that makes IoT the most disruptive technology in recent history. In this paper, we reevaluate the position of IoT in our life and provide deep insights on its enabling technologies, applications, rising trends and grand challenges. The paper also highlights the role of artificial intelligence in making IoT the top transformative technology that has ever been developed in human history.

Submitted 14 November, 2022; originally announced November 2022.

Journal ref: Front. Internet. Things 1:1073780 (2022)

arXiv:2208.01218 [pdf]

SLA Management in Intent-Driven Service Management Systems: A Taxonomy and Future Directions

Authors: Yogesh Sharma, Deval Bhamare, Nishanth Sastry, Bahman Javadi, Rajkumar Buyya

Abstract: Traditionally, network and system administrators are responsible for designing, configuring, and resolving the Internet service requests. Human-driven system configuration and management are proving unsatisfactory due to the recent interest in time-sensitive applications with stringent quality of service (QoS). Aiming to transition from the traditional human-driven to zero-touch service management in the field of networks and computing, intent-driven service management (IDSM) has been proposed as a response to stringent quality of service requirements. In IDSM, users express their service requirements in a declarative manner as intents. IDSM, with the help of closed control-loop operations, performs configurations and deployments autonomously to meet service request requirements. The result is a faster deployment of Internet services and a reduction in configuration errors caused by manual operations, which in turn reduces the service-level agreement (SLA) violations. In the early stages of development, IDSM systems require attention from industry as well as academia. In an attempt to fill the gaps in current research, we conducted a systematic literature review of SLA management in IDSM systems. As an outcome, we have identified four IDSM intent management activities and proposed a taxonomy for each activity. Analysis of all studies and future research directions are presented in the conclusions.

Submitted 26 May, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: Accepted for ACM Computing Surveys (CSUR) in March 2023

arXiv:2208.00761 [pdf, other]

AI Augmented Edge and Fog Computing: Trends and Challenges

Authors: Shreshth Tuli, Fatemeh Mirhakimi, Samodha Pallewatta, Syed Zawad, Giuliano Casale, Bahman Javadi, Feng Yan, Rajkumar Buyya, Nicholas R. Jennings

Abstract: In recent years, the landscape of computing paradigms has witnessed a gradual yet remarkable shift from monolithic computing to distributed and decentralized paradigms such as Internet of Things (IoT), Edge, Fog, Cloud, and Serverless. The frontiers of these computing technologies have been boosted by a shift from manually encoded algorithms to Artificial Intelligence (AI)-driven autonomous systems for optimum and reliable management of distributed computing resources. Prior work focuses on improving existing systems using AI across a wide range of domains, such as efficient resource provisioning, application deployment, task placement, and service management. This survey reviews the evolution of data-driven AI-augmented technologies and their impact on computing systems. We demystify new techniques and draw key insights in Edge, Fog and Cloud resource management-related uses of AI methods and also look at how AI can innovate traditional applications for enhanced Quality of Service (QoS) in the presence of a continuum of resources. We present the latest trends and impact areas such as optimizing AI models that are deployed on or for computing systems. We lay out a roadmap for future research directions in areas such as resource management for QoS optimization and service reliability. Finally, we discuss blue-sky ideas and envision this work as an anchor point for future research on AI-driven computing systems.

Submitted 14 April, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: Accepted in Elsevier Journal of Network and Computer Applications

arXiv:2207.05399 [pdf, other]

Placement of Microservices-based IoT Applications in Fog Computing: A Taxonomy and Future Directions

Authors: Samodha Pallewatta, Vassilis Kostakos, Rajkumar Buyya

Abstract: The Fog computing paradigm utilises distributed, heterogeneous and resource-constrained devices at the edge of the network for efficient deployment of latency-critical and bandwidth-hungry IoT application services. Moreover, MicroService Architecture (MSA) is increasingly adopted to keep up with the rapid development and deployment needs of fast-evolving IoT applications. Due to the fine-grained modularity of the microservices and their independently deployable and scalable nature, MSA exhibits great potential in harnessing Fog and Cloud resources, thus giving rise to novel paradigms like Osmotic computing. The loosely coupled nature of the microservices, aided by the container orchestrators and service mesh technologies, enables the dynamic composition of distributed and scalable microservices to achieve diverse performance requirements of the IoT applications using distributed Fog resources. To this end, efficient placement of microservices plays a vital role, and scalable placement algorithms are required to utilise the said characteristics of the MSA while overcoming novel challenges introduced by the architecture. Thus, we present a comprehensive taxonomy of recent literature on microservices-based IoT applications placement within Fog computing environments. Furthermore, we organise multiple taxonomies to capture the main aspects of the placement problem, analyse and classify related works, identify research gaps within each category, and discuss future research directions.

Submitted 14 February, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

Comments: 41 pages, 11 figures, submitted to ACM Computing Surveys

arXiv:2206.10473 [pdf]

doi 10.1002/spy2.200

Securing the Future Internet of Things with Post-Quantum Cryptography

Authors: Adarsh Kumar, Carlo Ottaviani, Sukhpal Singh Gill, Rajkumar Buyya

Abstract: Traditional and lightweight cryptography primitives and protocols are insecure against quantum attacks. Thus, a real-time application using traditional or lightweight cryptography primitives and protocols does not ensure foolproof security. Post-quantum cryptography is important for the Internet of Things (IoT) due to its security against quantum attacks. This paper offers a broad literature analysis of post-quantum cryptography for IoT networks, including the challenges and research directions to adopt in real-time applications. The work draws focus towards post-quantum cryptosystems that are useful for resource-constrained devices. Further, the quantum attacks that may occur over traditional and lightweight cryptographic primitives are surveyed.

Submitted 21 June, 2022; originally announced June 2022.

Comments: Accepted version

Journal ref: Security and Privacy 5 (2), e20, 2022

arXiv:2205.14845 [pdf, other]

doi 10.1016/j.future.2024.01.018

QFaaS: A Serverless Function-as-a-Service Framework for Quantum Computing

Authors: Hoa T. Nguyen, Muhammad Usman, Rajkumar Buyya

Abstract: Recent breakthroughs in quantum hardware are creating opportunities for its use in many applications. However, quantum software engineering is still in its infancy with many challenges, especially dealing with the diversity of quantum programming languages and hardware platforms. To alleviate these challenges, we propose QFaaS, a novel Quantum Function-as-a-Service framework, which leverages the advantages of the serverless model and the state-of-the-art software engineering approaches to advance practical quantum computing. Our framework provides essential components of a quantum serverless platform to simplify the software development and adapt to the quantum cloud computing paradigm, such as combining hybrid quantum-classical computation, containerizing functions, and integrating DevOps features. We design QFaaS as a unified quantum computing framework by supporting well-known quantum languages and software development kits (Qiskit, Q#, Cirq, and Braket), executing the quantum tasks on multiple simulators and quantum cloud providers (IBM Quantum and Amazon Braket). This paper proposes architectural design, principal components, the life cycle of hybrid quantum-classical function, operation workflow, and implementation of QFaaS. We present two practical use cases and perform the evaluations on quantum computers and simulators to demonstrate our framework's ability to ease the burden on traditional engineers to expedite the ongoing quantum software transition.

Submitted 30 May, 2022; originally announced May 2022.

Comments: 35 pages, 15 figures

Journal ref: Future Generation Computer Systems (FGCS) 154 (2024) 281-300

arXiv:2204.12580 [pdf, other]

Scheduling IoT Applications in Edge and Fog Computing Environments: A Taxonomy and Future Directions

Authors: Mohammad Goudarzi, Marimuthu Palaniswami, Rajkumar Buyya

Abstract: Fog computing, as a distributed paradigm, offers cloud-like services at the edge of the network with low latency and high-access bandwidth to support a diverse range of IoT application scenarios. To fully utilize the potential of this computing paradigm, scalable, adaptive, and accurate scheduling mechanisms and algorithms are required to efficiently capture the dynamics and requirements of users, IoT applications, environmental properties, and optimization targets. This paper presents a taxonomy of recent literature on scheduling IoT applications in Fog computing. Based on our new classification schemes, current works in the literature are analyzed, research gaps of each category are identified, and respective future directions are described.

Submitted 26 April, 2022; originally announced April 2022.

Comments: ACM Computing Surveys (CSUR): Revised

arXiv:2203.05161 [pdf, other]

Container Orchestration in Edge and Fog Computing Environments for Real-Time IoT Applications

Authors: Zhiyu Wang, Mohammad Goudarzi, Jagannath Aryal, Rajkumar Buyya

Abstract: Resource management is the principal factor to fully utilize the potential of Edge/Fog computing to execute real-time and critical IoT applications. Although some resource management frameworks exist, the majority are not designed based on distributed containerized components. Hence, they are not suitable for highly distributed and heterogeneous computing environments. Containerized resource management frameworks such as FogBus2 enable efficient distribution of the framework's components alongside IoT applications' components. However, the management, deployment, health-check, and scalability of a large number of containers are challenging issues. To orchestrate a multitude of containers, several orchestration tools have been developed. However, many of these orchestration tools are heavy-weight and have a high overhead, especially for resource-limited Edge/Fog nodes. Thus, for hybrid computing environments, consisting of heterogeneous Edge/Fog and/or Cloud nodes, lightweight container orchestration tools are required to support both resource-limited resources at the Edge/Fog and resource-rich resources at the Cloud. Thus, in this paper, we propose a feasible approach to build a hybrid and lightweight cluster based on K3s for the FogBus2 framework, which offers a containerized resource management framework. This work addresses the challenge of creating lightweight computing clusters in hybrid computing environments. It also proposes three design patterns for the deployment of the FogBus2 framework in hybrid environments, including 1) Host Network, 2) Proxy Server, and 3) Environment Variable. The performance evaluation shows that the proposed approach improves the response time of real-time IoT applications up to 29% with acceptable and low overhead.

Submitted 10 March, 2022; originally announced March 2022.

Comments: 20 pages, 10 figures

arXiv:2112.02593 [pdf, other]

A Taxonomy of Live Migration Management in Cloud Computing

Authors: TianZhang He, Rajkumar Buyya

Abstract: Cloud Data Centers have become the backbone infrastructure to provide services. With the emerging edge computing paradigm, computation and networking capabilities have been pushed from clouds to the edge to provide computation, intelligence, and networking management with low end-to-end latency. Service migration across different computing nodes in edge and cloud computing becomes essential to guarantee the quality of service in the dynamic environment. Many studies have been conducted on dynamic resource management involving migrating Virtual Machines to achieve various objectives, such as load balancing, consolidation, performance, energy-saving, and disaster recovery. Some have investigated how to improve and predict the performance of a single live migration of a VM or container. Recently, several studies have investigated service migration in the edge-centric computing paradigms. However, there is a lack of surveys focusing on live migration management in edge and cloud computing environments. We examine the characteristics of each field and conduct a migration management-centric taxonomy and survey. We also identify the gaps and research opportunities to guarantee the performance of resource management with live migrations.

Submitted 5 December, 2021; originally announced December 2021.

Comments: 35 pages, 10 figures, 4 tables

arXiv:2112.02267 [pdf]

Integration of FogBus2 Framework with Container Orchestration Tools in Cloud and Edge Computing Environments

Authors: Zhiyu Wang, Rajkumar Buyya

Abstract: Currently, due to the advantages of light weight, simple deployment, multi-environment support, short startup time, scalability, and easy migration, container technology has been widely used in both cloud and edge/fog computing, and addresses the problem of device heterogeneity in different computing environments. On this basis, as one of the most popular container orchestration and management systems, Kubernetes almost dominates the cloud environment. However, since it is primarily designed for centralized resource management scenarios where computing resources are sufficient, the system is unstable in edge environments due to hardware limitations. Therefore, in order to realize container orchestration in the cloud and edge/fog hybrid computing environment, we propose a feasible approach to build a hybrid clustering based on K3s, which solves the problem that virtual instances in different environments cannot be connected due to IP addresses. We also propose three design patterns for deploying the FogBus2 framework into hybrid environments, including 1) Host Network Mode, 2) Proxy Server, and 3) Environment Variable.

Submitted 11 December, 2021; v1 submitted 4 December, 2021; originally announced December 2021.

arXiv:2111.10241 [pdf, other]

START: Straggler Prediction and Mitigation for Cloud Computing Environments using Encoder LSTM Networks

Authors: Shreshth Tuli, Sukhpal Singh Gill, Peter Garraghan, Rajkumar Buyya, Giuliano Casale, Nicholas R. Jennings

Abstract: Modern large-scale computing systems distribute jobs into multiple smaller tasks which execute in parallel to accelerate job completion rates and reduce energy consumption. However, a common performance problem in such systems is dealing with straggler tasks that are slow running instances that increase the overall response time. Such tasks can significantly impact the system's Quality of Service (QoS) and the Service Level Agreements (SLA). To combat this issue, there is a need for automatic straggler detection and mitigation mechanisms that execute jobs without violating the SLA. Prior work typically builds reactive models that focus first on detection and then mitigation of straggler tasks, which leads to delays. Other works use prediction based proactive mechanisms, but ignore heterogeneous host or volatile task characteristics. In this paper, we propose a Straggler Prediction and Mitigation Technique (START) that is able to predict which tasks might be stragglers and dynamically adapt scheduling to achieve lower response times. Our technique analyzes all tasks and hosts based on compute and network resource consumption using an Encoder Long-Short-Term-Memory (LSTM) network. The output of this network is then used to predict and mitigate expected straggler tasks. This reduces the SLA violation rate and execution time without compromising QoS. Specifically, we use the CloudSim toolkit to simulate START in a cloud environment and compare it with state-of-the-art techniques (IGRU-SD, SGC, Dolly, GRASS, NearestFit and Wrangler) in terms of QoS parameters such as energy consumption, execution time, resource contention, CPU utilization and SLA violation rate. Experiments show that START reduces execution time, resource contention, energy and SLA violations by 13%, 11%, 16% and 19%, respectively, compared to the state-of-the-art approaches.

Submitted 19 November, 2021; originally announced November 2021.

Comments: Accepted in IEEE Transactions on Services Computing, 2021

arXiv:2111.08942 [pdf, other]

doi 10.1109/TPDS.2021.3139014

CAMIG: Concurrency-Aware Live Migration Management of Multiple Virtual Machines in SDN-enabled Clouds

Authors: TianZhang He, Adel N Toosi, Rajkumar Buyya

Abstract: By integrating Software-Defined Networking and cloud computing, virtualized networking and computing resources can be dynamically reallocated through live migration of Virtual Machines (VMs). Dynamic resource management such as load balancing and energy-saving policies can request multiple migrations when the algorithms are triggered periodically. There exist notable research efforts in dynamic resource management that alleviate single migration overheads, such as single migration time and co-location interference while selecting the potential VMs and migration destinations. However, by neglecting the resource dependency among potential migration requests, the existing solutions of dynamic resource management can result in Quality of Service (QoS) degradation and Service Level Agreement (SLA) violations during the migration schedule. Therefore, it is essential to integrate both single and multiple migration overheads into VM reallocation planning. In this paper, we propose a concurrency-aware multiple migration selector that operates based on the maximal cliques and independent sets of the resource dependency graph of multiple migration requests. Our proposed method can be integrated with existing dynamic resource management policies. The experimental results demonstrate that our solution efficiently minimizes migration interference and shortens the convergence time of reallocation by maximizing the multiple migration performance while achieving the objective of dynamic resource management.

Submitted 17 November, 2021; originally announced November 2021.

arXiv:2111.08936 [pdf, other]

Efficient Large-Scale Multiple Migration Planning and Scheduling in SDN-enabled Edge Computing

Authors: TianZhang He, Adel N Toosi, Rajkumar Buyya

Abstract: The containerized services allocated in the mobile edge clouds bring up the opportunity for large-scale and real-time applications to have low latency responses. Meanwhile, live container migration is introduced to support dynamic resource management and users' mobility. However, with the expansion of network topology scale and increasing migration requests, the current multiple migration planning and scheduling algorithms of cloud data centers cannot suit large-scale scenarios in edge computing. The user mobility-induced live migrations in edge computing require near real-time level scheduling. Therefore, in this paper, through the Software-Defined Networking (SDN) controller, the resource competitions among live migrations are modeled as a dynamic resource dependency graph. We propose an iterative Maximal Independent Set (MIS)-based multiple migration planning and scheduling algorithm. Using real-world mobility traces of taxis and telecom base station coordinates, the evaluation results indicate that our solution can efficiently schedule multiple live container migrations in large-scale edge computing environments. It improves the processing time by 3000 times compared with the state-of-the-art migration planning algorithm in clouds while providing guaranteed migration performance for time-critical migrations.

Submitted 17 November, 2021; originally announced November 2021.
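
The iterative MIS idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: the greedy low-degree-first heuristic, the function names `greedy_mis` and `schedule_in_rounds`, and the toy conflict graph are all assumptions made for illustration. Each round picks a maximal set of migrations that share no resources, schedules them together, and repeats on what is left.

```python
from typing import Dict, List, Set


def greedy_mis(conflicts: Dict[str, Set[str]], candidates: Set[str]) -> Set[str]:
    """Return a maximal independent set of `candidates` (greedy heuristic).

    `conflicts` is assumed to be a symmetric adjacency map: an edge means
    two migrations compete for the same resource (hypothetical model).
    """
    chosen: Set[str] = set()
    blocked: Set[str] = set()
    # Visit low-degree migrations first; break ties by name for determinism.
    order = sorted(candidates, key=lambda n: (len(conflicts[n] & candidates), n))
    for node in order:
        if node not in blocked:
            chosen.add(node)
            blocked |= conflicts[node]  # neighbours can no longer be chosen
    return chosen


def schedule_in_rounds(conflicts: Dict[str, Set[str]]) -> List[List[str]]:
    """Repeatedly extract an independent set until every migration is scheduled."""
    remaining = set(conflicts)
    rounds: List[List[str]] = []
    while remaining:
        batch = greedy_mis(conflicts, remaining)
        rounds.append(sorted(batch))
        remaining -= batch
    return rounds


if __name__ == "__main__":
    # Toy graph: m2 conflicts with m1 and m3; m4 conflicts with nothing.
    toy = {"m1": {"m2"}, "m2": {"m1", "m3"}, "m3": {"m2"}, "m4": set()}
    print(schedule_in_rounds(toy))
```

Migrations within one round can run concurrently because no two of them share an edge; a real planner would also need to weigh deadlines and bandwidth, which this sketch ignores.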

arXiv:2110.12415 [pdf, other]

A Distributed Deep Reinforcement Learning Technique for Application Placement in Edge and Fog Computing Environments

Authors: Mohammad Goudarzi, Marimuthu Palaniswami, Rajkumar Buyya

Abstract: Fog/Edge computing is a novel computing paradigm supporting resource-constrained Internet of Things (IoT) devices by the placement of their tasks on the edge and/or cloud servers. Recently, several Deep Reinforcement Learning (DRL)-based placement techniques have been proposed in fog/edge computing environments, which are only suitable for centralized setups. The training of well-performed DRL agents requires manifold training data while obtaining training data is costly. Hence, these centralized DRL-based techniques lack generalizability and quick adaptability, thus failing to efficiently tackle application placement problems. Moreover, many IoT applications are modeled as Directed Acyclic Graphs (DAGs) with diverse topologies. Satisfying dependencies of DAG-based IoT applications incurs additional constraints and increases the complexity of placement problems. To overcome these challenges, we propose an actor-critic-based distributed application placement technique, working based on the IMPortance weighted Actor-Learner Architectures (IMPALA). IMPALA is known for efficient distributed experience trajectory generation that significantly reduces the exploration costs of agents. Besides, it uses an adaptive off-policy correction method for faster convergence to optimal solutions. Our technique uses recurrent layers to capture temporal behaviors of input data and a replay buffer to improve the sample efficiency. The performance results, obtained from simulation and testbed experiments, demonstrate that our technique significantly improves the execution cost of IoT applications up to 30% compared to its counterparts.

Submitted 24 October, 2021; originally announced October 2021.

Comments: This Paper is accepted in IEEE Transactions on Mobile Computing (TMC), on 23 October 2021

arXiv:2110.09913 [pdf, other]

Prepartition: Load Balancing Approach for Virtual Machine Reservations in a Cloud Data Center

Authors: Wenhong Tian, Minxian Xu, Guangyao Zhou, Kui Wu, Chengzhong Xu, Rajkumar Buyya

Abstract: Load balancing is vital for the efficient and long-term operation of cloud data centers. With virtualization, post (reactive) migration of virtual machines after allocation is the traditional way for load balancing and consolidation. However, it is not easy for reactive migration to achieve predefined load balancing objectives, and it may interrupt services and bring instability. Therefore, we provide a new approach, called Prepartition, for load balancing. It partitions a VM request into a few sub-requests sequentially with start time, end time and capacity demands, and treats each sub-request as a regular VM request. In this way, it can proactively set a bound for each VM request on each physical machine and lets the scheduler get ready before VM migration to obtain the predefined load balancing goal, which supports the resource allocation in a fine-grained manner. Simulations with real-world trace and synthetic data show that Prepartition for offline (PrepartitionOff) scheduling has 10%-20% better performance than the existing load balancing algorithms under several metrics, including average utilization, imbalance degree, makespan and Capacity_makespan. We also extend Prepartition to online load balancing. Evaluation results show that our proposed approach also outperforms existing online algorithms.

Submitted 19 October, 2021; originally announced October 2021.

Comments: 10 figures, 5 tables, 21 pages, accepted with minor in Journal of Computer Science and Technology
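
The partitioning step described in the abstract above (splitting one VM request into sequential sub-requests, each with a start time, end time and capacity demand) can be sketched as follows. The abstract does not give the paper's partitioning rule, so the equal-length split, the `VMRequest` type and the `prepartition` function are hypothetical and serve only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VMRequest:
    start: int     # first time slot (inclusive)
    end: int       # last time slot (exclusive)
    capacity: int  # capacity demand held for the whole interval


def prepartition(req: VMRequest, parts: int) -> List[VMRequest]:
    """Split `req` into up to `parts` back-to-back sub-requests.

    Assumption: sub-requests are as equal in length as integer slots allow
    and each keeps the original capacity demand. The paper's actual rule
    may differ.
    """
    duration = req.end - req.start
    if duration <= 0 or parts < 1:
        raise ValueError("need a positive duration and at least one part")
    parts = min(parts, duration)  # never create empty sub-requests
    bounds = [req.start + (duration * i) // parts for i in range(parts + 1)]
    return [VMRequest(bounds[i], bounds[i + 1], req.capacity) for i in range(parts)]


if __name__ == "__main__":
    for sub in prepartition(VMRequest(start=0, end=10, capacity=4), parts=3):
        print(sub)
```

Each sub-request can then be handed to an ordinary VM scheduler as if it were an independent request, which is the property the abstract relies on.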

arXiv:2110.05529 [pdf, other]

doi 10.1016/j.jss.2021.111124

HUNTER: AI based Holistic Resource Management for Sustainable Cloud Computing

Authors: Shreshth Tuli, Sukhpal Singh Gill, Minxian Xu, Peter Garraghan, Rami Bahsoon, Schahram Dustdar, Rizos Sakellariou, Omer Rana, Rajkumar Buyya, Giuliano Casale, Nicholas R. Jennings

Abstract: The worldwide adoption of cloud data centers (CDCs) has given rise to the ubiquitous demand for hosting application services on the cloud. Further, contemporary data-intensive industries have seen a sharp upsurge in the resource requirements of modern applications. This has led to the provisioning of an increased number of cloud servers, giving rise to higher energy consumption and, consequently, sustainability concerns. Traditional heuristics and reinforcement learning based algorithms for energy-efficient cloud resource management address the scalability and adaptability related challenges to a limited extent. Existing work often fails to capture dependencies across thermal characteristics of hosts, resource consumption of tasks and the corresponding scheduling decisions. This leads to poor scalability and an increase in the compute resource requirements, particularly in environments with non-stationary resource demands. To address these limitations, we propose an artificial intelligence (AI) based holistic resource management technique for sustainable cloud computing called HUNTER. The proposed model formulates the goal of optimizing energy efficiency in data centers as a multi-objective scheduling problem, considering three important models: energy, thermal and cooling. HUNTER utilizes a Gated Graph Convolution Network as a surrogate model for approximating the Quality of Service (QoS) for a system state and generating optimal scheduling decisions. Experiments on simulated and physical cloud environments using the CloudSim toolkit and the COSCO framework show that HUNTER outperforms state-of-the-art baselines in terms of energy consumption, SLA violation, scheduling time, cost and temperature by up to 12, 35, 43, 54 and 3 percent respectively.

Submitted 28 October, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: Accepted in Elsevier Journal of Systems and Software, 2021

arXiv:2109.15139 [pdf, other]

doi 10.1016/j.jss.2021.111208

High-Availability Clusters: A Taxonomy, Survey, and Future Directions

Authors: Premathas Somasekaram, Radu Calinescu, Rajkumar Buyya

Abstract: The delivery of key services in domains ranging from finance and manufacturing to healthcare and transportation is underpinned by a rapidly growing number of mission-critical enterprise applications. Ensuring the continuity of these complex applications requires the use of software-managed infrastructures called high-availability clusters (HACs). HACs employ sophisticated techniques to monitor the health of key enterprise application layers and of the resources they use, and to seamlessly restart or relocate application components after failures. In this paper, we first describe the manifold uses of HACs to protect essential layers of a critical application and present the architecture of high availability clusters. We then propose a taxonomy that covers all key aspects of HACs -- deployment patterns, application areas, types of cluster, topology, cluster management, failure detection and recovery, consistency and integrity, and data synchronisation; and we use this taxonomy to provide a comprehensive survey of the end-to-end software solutions available for the HAC deployment of enterprise applications. Finally, we discuss the limitations and challenges of existing HAC solutions, and we identify opportunities for future research in the area.

Submitted 21 September, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

Comments: Published in Journal of Systems and Software

Journal ref: Journal of Systems and Software, Volume 187, 2022, 111208

arXiv:2109.05636 [pdf, other]

iFogSim2: An Extended iFogSim Simulator for Mobility, Clustering, and Microservice Management in Edge and Fog Computing Environments

Authors: Redowan Mahmud, Samodha Pallewatta, Mohammad Goudarzi, Rajkumar Buyya

Abstract: The Internet of Things (IoT) has already proven to be the building block for next-generation Cyber-Physical Systems (CPSs). The considerable amount of data generated by the IoT devices needs latency-sensitive processing, which is not feasible by deploying the respective applications in remote Cloud datacentres. Edge/Fog computing, a promising extension of Cloud at the IoT-proximate network, can meet such requirements for smart CPSs. However, the structural and operational differences of Edge/Fog infrastructure resist employing Cloud-based service regulations directly to these environments. As a result, many research works have been recently conducted, focusing on efficient application and resource management in Edge/Fog computing environments. Scalable Edge/Fog infrastructure is a must to validate these policies, which is also challenging to accommodate in the real world due to high cost and implementation time. Considering simulation as a key to this constraint, various software has been developed that can imitate the physical behaviour of Edge/Fog computing environments. Nevertheless, the existing simulators often fail to support advanced service management features because of their monolithic architecture, lack of actual datasets, and limited scope for a periodic update. To overcome these issues, we have developed multiple simulation models for service migration, dynamic distributed cluster formation, and microservice orchestration for Edge/Fog computing in this work and integrated them with the existing iFogSim simulation toolkit for launching it as iFogSim2. The performance of iFogSim2 and its built-in policies is evaluated using three use case scenarios and compared with the contemporary simulators and benchmark policies under different settings. Results indicate that the proposed solution outperforms others in service management time, network usage, RAM consumption, and simulation time.

Submitted 15 September, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

Comments: The source code of the iFogSim2 simulator is accessible from: https://github.com/Cloudslab/iFogSim

arXiv:2108.03562 [pdf, other]

Master Graduation Thesis: A Lightweight and Distributed Container-based Framework

Authors: Qifan Deng, Rajkumar Buyya

Abstract: Edge/Fog computing is a novel computing paradigm that provides resource-limited Internet of Things (IoT) devices with scalable computing and storage resources. Compared to cloud computing, edge/fog servers have fewer resources, but they can be accessed with higher bandwidth and less communication latency. Thus, integrating edge/fog and cloud infrastructures can support the execution of diverse latency-sensitive and computation-intensive IoT applications. Although some frameworks attempt to provide such integration, there are still several challenges to be addressed, such as dynamic scheduling of different IoT applications, scalability mechanisms, multi-platform support, and supporting different interaction models. To overcome these challenges, we propose a lightweight and distributed container-based framework, called FogBus2. It provides a mechanism for scheduling heterogeneous IoT applications and implements several scheduling policies. Also, it proposes an optimized genetic algorithm to obtain fast convergence to well-suited solutions. Besides, it offers a scalability mechanism to ensure efficient responsiveness when either the number of IoT devices increases or the resources become overburdened. Also, the dynamic resource discovery mechanism of FogBus2 helps new entities quickly join the system. We have also developed two IoT applications, called Conway's Game of Life and Video Optical Character Recognition, to demonstrate the effectiveness of FogBus2 for handling real-time and non-real-time IoT applications. Experimental results show FogBus2's scheduling policy improves the response time of IoT applications by 53% compared to other policies. Also, the scalability mechanism can reduce the queuing waiting time by up to 48% compared to frameworks that do not support scalability.

Submitted 7 August, 2021; originally announced August 2021.

Comments: https://github.com/cloudslab/fogbus2

arXiv:2108.02328 [pdf, other]

A Distributed Application Placement and Migration Management Techniques for Edge and Fog Computing Environments

Authors: Mohammad Goudarzi, Marimuthu Palaniswami, Rajkumar Buyya

Abstract: The Fog/Edge computing model allows harnessing of resources in the proximity of the Internet of Things (IoT) devices to support various types of real-time IoT applications. However, due to the mobility of users and a wide range of IoT applications with different requirements, it is a challenging issue to satisfy these applications' requirements. The execution of IoT applications exclusively on one fog/edge server may not always be feasible due to limited resources, while execution of IoT applications on different servers needs further collaboration among servers. Also, considering user mobility, some modules of each IoT application may require migration to other servers for execution, leading to service interruption and extra execution costs. In this article, we propose a new weighted cost model for hierarchical fog computing environments, in terms of the response time of IoT applications and energy consumption of IoT devices, to minimize the cost of running IoT applications and potential migrations. Besides, a distributed clustering technique is proposed to enable the collaborative execution of tasks, emitted from application modules, among servers. Also, we propose an application placement technique to minimize the overall cost of executing IoT applications on multiple servers in a distributed manner. Furthermore, a distributed migration management technique is proposed for the potential migration of applications' modules to other remote servers as the users move along their path. Besides, failure recovery methods are embedded in the clustering, application placement, and migration management techniques to recover from unpredicted failures. The performance results show that our technique significantly outperforms its counterparts in terms of placement deployment time, average execution cost of tasks, total number of migrations, total number of interrupted tasks, and cumulative migration cost.

Submitted 4 August, 2021; originally announced August 2021.

Comments: Accepted as keynote paper in: 16th CONFERENCE ON COMPUTER SCIENCE AND INTELLIGENCE SYSTEMS FedCSIS 2021

Showing 1–50 of 198 results for author: Buyya, R