More so than ever, businesses need to ensure that their databases are resilient, secure, and always available to support their operations. Database-as-a-Service (DBaaS) solutions have become a popular way for organizations to manage their databases efficiently, leveraging cloud infrastructure and advanced set-and-forget automation.
However, consuming DBaaS from providers comes with many compromises. In this guide, we’ll show you how to build your own flexible DBaaS, your way. We’ll demonstrate how to get the full spectrum of DBaaS capabilities along with workload access and portability, without surrendering control to a third party.
From architectural and design considerations to operational requirements, we’ll take you through the process step by step, providing the information and guidance you need to build a DBaaS solution tailored to your use case. So get ready to dive in and learn how to build your own custom DBaaS solution from scratch!
We created this guide to help developers understand:
- Traditional vs. Sovereign DBaaS implementation models
- The DBaaS environment, elements and design principles
- Using a Day 2 operations framework to develop your blueprint
- The 8 key operations that form the foundation of a complete DBaaS
- Bringing the Day 2 ops framework to life with a provisional architecture
- How you can abstract the orchestration layer with Severalnines solutions
Similar to DIY DBaaS: A guide to building your own full-featured DBaaS
The cloud is all the rage. Does it live up to its hype? What are the benefits of the cloud? Join me as I discuss the reasons so many companies are moving to the cloud and demo how to get up and running with a VM (IaaS) and a database (PaaS) in Azure. See why the ability to scale easily, the speed with which you can create a VM, and the built-in redundancy are just some of the reasons that moving to the cloud is a “no brainer”. And if you have an on-prem datacenter, learn how to get out of the air-conditioning business!
The document discusses big data and NoSQL technologies. It defines big data and discusses its key characteristics: volume, velocity, and variety. It then discusses NoSQL databases as an alternative to traditional SQL databases for handling big data workloads. Specific NoSQL technologies and how they provide more scalability and flexibility for big data are covered. The document also addresses whether NoSQL is replacing SQL databases and argues it depends on the specific use case.
This document provides an overview of a course on implementing a modern data platform architecture using Azure services. The course objectives are to understand cloud and big data concepts, the role of Azure data services in a modern data platform, and how to implement a reference architecture using Azure data services. The course will provide an ARM template for a data platform solution that can address most data challenges.
This document discusses Data as a Service (DaaS) in cloud computing. It defines DaaS and explains that it allows users to access data stored in the cloud from any location. The document outlines the components, architecture, pricing models, benefits and drawbacks of DaaS. It provides examples of companies that offer DaaS like Google, Windows Azure, and Amazon.
Demystifying Data Warehouse as a Service (DWaaS) - Kent Graziano
This is from the talk I gave at the 30th Anniversary NoCOUG meeting in San Jose, CA.
We all know that data warehouses and best practices for them are changing dramatically today. As organizations build new data warehouses and modernize established ones, they are turning to Data Warehousing as a Service (DWaaS) in hopes of taking advantage of the performance, concurrency, simplicity, and lower cost of a SaaS solution or simply to reduce their data center footprint (and the maintenance that goes with that).
But what is a DWaaS really? How is it different from traditional on-premises data warehousing?
In this talk I will:
• Demystify DWaaS by defining it and its goals
• Discuss the real-world benefits of DWaaS
• Discuss some of the coolest features in a DWaaS solution as exemplified by the Snowflake Elastic Data Warehouse.
Cisco Big Data Warehouse Expansion Featuring MapR Distribution - Appfluent Technology
The document discusses Cisco's Big Data Warehouse Expansion solution featuring MapR Distribution including Apache Hadoop. The solution reduces data warehouse management costs by enabling organizations to store and analyze more data at lower costs. It does this by offloading infrequently used data from the existing data warehouse to low-cost big data stores running on Cisco UCS hardware optimized for MapR Distribution. This provides benefits like enhanced analytics, improved performance, reduced costs and risks, and competitive advantages from being able to utilize more company data assets.
This document discusses designing a modern data warehouse in Azure. It provides an overview of traditional vs. self-service data warehouses and their limitations. It also outlines challenges with current data warehouses around timeliness, flexibility, quality and findability. The document then discusses why organizations need a modern data warehouse based on criteria like customer experience, quality assurance and operational efficiency. It covers various approaches to ingesting, storing, preparing, modeling and serving data on Azure. Finally, it discusses architectures like the lambda architecture and common data models.
This document discusses different options for deploying a Hadoop cluster, including using an appliance like Oracle's Big Data Appliance, deploying on cloud infrastructure through Amazon EMR, or building your own "do-it-yourself" cluster. It provides details on the hardware, software, and costs associated with each option. The conclusion compares the pros and cons of each approach, noting that appliances provide high performance and integration but may be less flexible, while cloud deployments offer scalability and pay-per-use but require consideration of data privacy. Building your own cluster gives more control but requires more work to set up and manage.
NuoDB + MayaData: How to Run Containerized Enterprise SQL Applications in the... - NuoDB
Deploying an enterprise SQL database across geographically distributed OpenShift or Kubernetes clusters can be challenging. These deployments often require zero downtime, ANSI-standard SQL, ACID-compliant transactions, seamless day-2 operations, and highly performant and durable persistent storage systems. How can your organization easily deploy container-native storage with a distributed SQL database to deliver containerized apps in the cloud?
In this webinar, NuoDB and MayaData guide you as you build containerized apps that check these critical boxes:
[✓] Always on
[✓] At scale
[✓] High performance persistent storage
---
Resources:
NuoDB & OpenEBS Solution Guide
https://mayadata.io/assets/pdf/nuodb-openebs-solution-docs.pdf
OpenEBS Documentation:
https://docs.openebs.io/docs/next/nuodb.html
OpenEBS Getting Started Workshop
https://www.katacoda.com/openebs/scenarios/openebs-intro
https://github.com/openebs/community/tree/master/workshop
OpenEBS & Litmus Repositories
https://github.com/openebs/openebs
https://github.com/openebs/litmus
NuoDB Documentation:
http://doc.nuodb.com/Latest/Default.htm
NuoDB CE Download:
https://www.nuodb.com/download
Cloud-Native Data: What data questions to ask when building cloud-native apps - VMware Tanzu
While a number of patterns and architectural guidelines exist for cloud-native applications, a discussion about data often leads to more questions than answers. For example, what are some of the typical data problems encountered, why are they different, and how can they be overcome?
Join Prasad Radhakrishnan from Pivotal and Dave Nielsen from Redis Labs as they discuss:
- Expectations and requirements of cloud-native data
- Common faux pas and strategies on how you can avoid them
Presenters:
Prasad Radhakrishnan, Platform Architecture for Data at Pivotal
Dave Nielsen, Head of Ecosystem Programs at Redis Labs
How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ... - MayaData Inc
Deploying an enterprise SQL database across geographically distributed OpenShift or Kubernetes clusters can be challenging. These deployments often require zero downtime, ANSI-standard SQL, ACID-compliant transactions, seamless day-2 operations, and highly performant and durable persistent storage systems. How can your organization easily deploy container-native storage with a distributed SQL database to deliver containerized apps in the cloud?
In this webinar, NuoDB and OpenEBS (MayaData) guide you as you build containerized apps that check these critical boxes:
[✓] Always on
[✓] At scale
[✓] High-performance persistent storage
Data Virtualization: Challenges, Uses & Benefits - Denodo
Watch full webinar here: https://bit.ly/3oah4ng
Gartner recently described data virtualization as a cornerstone of data integration architectures.
Discover:
- The benefits of a data virtualization platform
- The growing range of uses: Lakehouse, Data Science, Big Data, Data Service & IoT
- Creating a unified view of your data assets without compromising on performance
- Building an agile data integration architecture: on-premises, in the cloud, or hybrid
DBaaS in the Real World: Risks, Rewards & Tradeoffs - ScyllaDB
What do you give up – and gain – when moving to a fully-managed cloud database?
Now that Database-as-a-Service (DBaaS) offerings have been “battle tested” in production, how is the reality matching up to the expectation? What can teams thinking of adopting a fully-managed DBaaS learn from teams who have years of experience working with this deployment model?
Join this webinar to dive into the reality of working with various high-performance DBaaS offerings. We’ll cover the following topics, all supported with real-world examples:
- Developer flexibility
- Cost variability
- Security & privacy
- Performance impact
- Transparency & troubleshooting
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures - Kangaroot
Postgres is the leading open source database management system and has been developed by a very active community for more than 15 years.
Gaby Schilders, Sales Engineer at EnterpriseDB, supplier of the EDB Postgres data platform, explains why companies take open source as the centerpiece for modernising their IT infrastructure, thus increasing their scalability and taking full advantage of what today's technologies offer them.
Webinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life Easier - DataStax
Want help building applications with real-time value at epic scale? How about solving your database performance and availability issues? Then, you want to hear more about DataStax Enterprise 5.0. Join this webinar to learn what’s new in DSE 5.0 ‒ the largest software release to date at DataStax. DSE 5.0 introduces multi-model support including Graph and JSON data models along with a ton of new and enhanced enterprise database capabilities.
View webinar recording here: https://youtu.be/3pfm4ntASJ0
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey - DataStax
Data management may be the hardest part of making the transition to the cloud, but enterprises including Intuit and Macy’s have figured out how to do it right. So what do they know that you might not? Join Robin Schumacher, Chief Product Officer at DataStax, as he explores best practices for defining and implementing data management strategies for the cloud. He outlines a four-step journey that will take you from your first deployment in the cloud through to a true intercloud implementation, and walks through a real-world use case in which a major retailer has evolved through the four phases over a period of four years and is now benefiting from a highly resilient multi-cloud deployment.
View webinar: https://youtu.be/RrTxQ2BAxjg
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
LIVE DEMO: CCX for CSPs, a drop-in DBaaS solution - Severalnines
This webinar aims to equip Cloud Service Providers (CSPs) with the knowledge and tools to differentiate themselves from hyperscalers by offering a Database-as-a-Service (DBaaS) solution. The session will introduce and demonstrate CCX, a drop-in, premium DBaaS designed for rapid adoption.
Learn more about CCX for CSPs here: https://bit.ly/3VabiDr
Cloud's future runs through Sovereign DBaaS - Severalnines
Sovereign DBaaS is a new way to do DBaaS that allows you to reliably scale your open-source database ops without being limited to a specific environment or ceding control of your infrastructure to third-party service providers.
With Sovereign DBaaS, users can leverage the benefits of modern deployment strategies, e.g. public cloud, hybrid, etc., with additional security, compliance, and risk mitigation. So what exactly is Sovereign DBaaS and why should you choose one?
Presented by Sanjeev Mohan, Principal Analyst at SanjMo and former Gartner Research VP, and Vinay Joosery, CEO of Severalnines, this webinar dives into the future of the cloud and database management and introduces a new solution, Sovereign DBaaS.
Agenda:
- The state of the cloud and its current challenges
- What is Sovereign DBaaS?
- Key features of Sovereign DBaaS
- Why you should choose a Sovereign DBaaS
- How you can implement Sovereign DBaaS with Severalnines
- Q&A
Tips to drive maria db cluster performance for nextcloud - Severalnines
200; SSD: 2000; NVMe: 4000. Tune for your hardware. Higher is better, but avoid over-committing IOPS.
innodb_flush_log_at_trx_commit = 1: Flush logs at each transaction commit for ACID compliance.
innodb_log_buffer_size = 16M-64M: Default is 8M. Increase for more transactions per second.
innodb_log_file_size = 1G: Default is 48M. Increase for more transactions per second.
innodb_flush_method = O_DIRECT: Bypass OS cache for better durability.
innodb_thread_concurrency = 0: Allow InnoDB to manage thread concurrency level.
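As a rough illustration, the InnoDB settings listed above could be written out as a configuration-file fragment. The sketch below is not taken from the slides: the helper name `render_mycnf`, the `[mysqld]` section name, and the choice of 32M for `innodb_log_buffer_size` (picked arbitrarily from the quoted 16M-64M range) are all assumptions made for the example.

```python
"""Render the InnoDB settings listed above as a my.cnf fragment (illustrative sketch)."""

# Values are copied from the list above. innodb_log_buffer_size is quoted there
# as a range (16M-64M); 32M is an arbitrary pick from that range, not a recommendation.
INNODB_SETTINGS = {
    "innodb_flush_log_at_trx_commit": "1",
    "innodb_log_buffer_size": "32M",
    "innodb_log_file_size": "1G",
    "innodb_flush_method": "O_DIRECT",
    "innodb_thread_concurrency": "0",
}


def render_mycnf(settings: dict[str, str], section: str = "mysqld") -> str:
    """Return the settings as an INI-style section, one `key = value` per line.

    `render_mycnf` is a hypothetical helper for this example, not part of any tool.
    """
    lines = [f"[{section}]"]
    lines.extend(f"{name} = {value}" for name, value in settings.items())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before being added to a config file.
    print(render_mycnf(INNODB_SETTINGS), end="")
```

Keeping the values in one dictionary makes it easier to review them against the hardware in use before anything is written to a server's configuration.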
Working with the Moodle Database: The Basics - Severalnines
Managing the database behind Moodle is key to improving performance and achieving uptime for your users. In this training video we will talk about the Moodle database including topics like configuration, monitoring, and schema management as well as show you how ClusterControl can help with the management of your eLearning LMS systems.
SysAdmin Working from Home? Tips to Automate MySQL, MariaDB, Postgres & MongoDB - Severalnines
Are you a SysAdmin who is now responsible for your company's database operations? Then this is the webinar for you. Learn from a Senior DBA the basics you need to know to keep things up and running, and how automation can help.
(slides) Polyglot persistence: utilizing open source databases as a Swiss poc... - Severalnines
This document discusses polyglot persistence, which is using multiple specialized databases rather than a single general-purpose database. It provides examples of VidaXL's use of polyglot persistence, including MySQL, MariaDB, PostgreSQL, SOLR, Elasticsearch, MongoDB, Couchbase, and Prometheus. The benefits discussed are using the right database for each job and gaining flexibility as the company transitioned to microservices. Challenges included increased complexity, and solutions involved automation, tooling, and hiring database experts.
Webinar slides: How to Migrate from Oracle DB to MariaDB - Severalnines
This document provides an overview and agenda for a webinar on migrating from Oracle DB to MariaDB. The webinar will cover why organizations are moving to open source databases, the benefits of migrating to MariaDB from Oracle, how to plan and execute the migration process, and post-migration management topics like monitoring, backups, high availability, and scaling in MariaDB. The presentation will include discussions of data type mapping, enabling PL/SQL syntax in MariaDB, available migration tools, and testing approaches.
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl - Severalnines
Running PostgreSQL in production comes with the responsibility for a business critical environment; this includes high availability, disaster recovery, and performance. Ops staff worry whether databases are up and running, if backups are taken and tested for integrity, whether there are performance problems that might affect end user experience, if failover will work properly in case of server failure without breaking applications, and the list goes on.
ClusterControl can be used to operationalize your PostgreSQL footprint across your enterprise. It offers a standard way of deploying high-availability replication setups with auto-failover, integrated with load balancers offering a single endpoint to applications. It provides constant health and performance monitoring through rich dashboards, as well as backup management and point-in-time recovery.
See how much time and effort can be saved, as well as risks mitigated, with the help of a unified management platform over the more traditional, manual methods.
We’ve seen a 152% increase in ClusterControl installations by PostgreSQL users last year, so make sure you don’t miss out on the trend!
AGENDA
- Managing PostgreSQL “the old way”:
  - Common challenges
  - Important tasks to perform
  - Tools that are available to help
- PostgreSQL automation and management with ClusterControl:
  - Deployment
  - Backup and recovery
  - HA setups
  - Failover
  - Monitoring
- Live Demo
SPEAKER
Sebastian Insausti, Support Engineer at Severalnines, has loved technology since his childhood, when he took his first computer course (Windows 3.11). From that moment, he knew what his profession would be. He has since built up experience with MySQL, PostgreSQL, HAProxy, WAF (ModSecurity), Linux (RedHat, CentOS, OL, Ubuntu server), Monitoring (Nagios), Networking and Virtualization (VMWare, Proxmox, Hyper-V, RHEV).
Prior to joining Severalnines, Sebastian worked as a consultant to state companies in security, database replication and high availability scenarios. He’s also a speaker and has given a few talks locally on InnoDB Cluster and MySQL Enterprise together with an Oracle team. Before that, he worked for a Mexican company as head of the sysadmin department, as well as for a local ISP (Internet Service Provider), where he managed customers' servers and connectivity.
Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria... - Severalnines
Failover is the process of moving to a healthy standby component, during a failure or maintenance event, in order to preserve uptime. The quicker it can be done, the faster you can be back online. However, failover can be tricky for transactional database systems as we strive to preserve data integrity - especially in asynchronous or semi-synchronous topologies. There are risks associated, from diverging datasets to loss of data. Failing over due to incorrect reasoning, e.g., failed heartbeats in the case of network partitioning, can also cause significant harm.
This webinar replay gives a detailed overview of what failover processes may look like in MySQL, MariaDB and PostgreSQL replication setups. We cover the dangers related to the failover process and discuss the tradeoffs between failover speed and data integrity. We look at how to shield applications from database failures with the help of proxies. Finally, we look at how ClusterControl manages the failover process, and how it can be configured for both assisted and automated failover.
So if you’re looking to minimize downtime and meet your SLAs through an automated or semi-automated approach, then this webinar replay is for you!
AGENDA
- An introduction to failover - what, when, how
- in MySQL / MariaDB
- in PostgreSQL
- To automate or not to automate
- Understanding the failover process
- Orchestrating failover across the whole HA stack
- Difficult problems
- Network partitioning
- Missed heartbeats
- Split brain
- From assisted to fully automated failover with ClusterControl
- Demo
SPEAKER
Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.
What if …
- Traditional, labour-intensive backup and archive practices for your MySQL, MariaDB, MongoDB and PostgreSQL databases were a thing of the past?
- You could have one backup management solution for all your business data?
- You could ensure integrity of all your backups?
- You could leverage the competitive pricing and almost limitless capacity of cloud-based backup while meeting cost, manageability, and compliance requirements from the business?
Welcome to our webinar on Backup Management with ClusterControl.
ClusterControl’s centralized backup management for open source databases provides you with hot backups of large datasets, point in time recovery in a couple of clicks, at-rest and in-transit data encryption, data integrity via automatic restore verification, cloud backups (AWS, Google and Azure) for Disaster Recovery, retention policies to ensure compliance, and automated alerts and reporting.
Whether you are looking at rebuilding your existing backup infrastructure, or updating it, this webinar is for you!
AGENDA
- Backup and recovery management of local or remote databases
- Logical or physical backups
- Full or Incremental backups
- Position or time-based Point in Time Recovery (for MySQL and PostgreSQL)
- Upload to the cloud (Amazon S3, Google Cloud Storage, Azure Storage)
- Encryption of backup data
- Compression of backup data
- One centralized backup system for your open source databases (Demo)
- Schedule, manage and operate backups
- Define backup policies, retention, history
- Validation - Automatic restore verification
- Backup reporting
SPEAKER
Bartlomiej Oles, Senior Support Engineer at Severalnines, is a MySQL and Oracle DBA with over 15 years of experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.
Disaster Recovery Planning for MySQL & MariaDB - Severalnines
Bart Oles - Severalnines AB
Organizations need an appropriate disaster recovery plan to mitigate the impact of downtime. But how much should a business invest? Designing a highly available system comes at a cost, and not all businesses and indeed not all applications need five 9's availability.
We will explain fundamental disaster recovery concepts and walk you through the relevant options from the MySQL & MariaDB ecosystem to meet different tiers of disaster recovery requirements, and demonstrate how to automate an appropriate disaster recovery plan.
Krzysztof Ksiazek - Severalnines AB
So, you are a developer or sysadmin and have shown some ability in dealing with database issues. And now, you have been elected to the role of DBA. And as you start managing the databases, you wonder…
* How do I tune them to make best use of the hardware?
* How do I optimize the Operating System?
* How do I best configure MySQL or MariaDB for a specific database workload?
If you're asking yourself these questions when it comes to optimally running your MySQL or MariaDB databases, then this talk is for you!
We will discuss some of the settings that are most often tweaked and which can bring you significant improvement in the performance of your MySQL or MariaDB database. We will also cover some of the variables which are frequently modified even though they should not.
Performance tuning is not easy, especially if you're not an experienced DBA, but you can go a surprisingly long way with a few basic guidelines.
Performance Tuning Cheat Sheet for MongoDB - Severalnines
Bart Oles - Severalnines AB
Database performance affects organizational performance, and we tend to look for quick fixes when under stress. But how can we better understand our database workload and factors that may cause harm to it? What are the limitations in MongoDB that could potentially impact cluster performance?
In this talk, we will show you how to identify the factors that limit database performance. We will start with the free MongoDB Cloud monitoring tools. Then we will move on to log files and queries. To be able to achieve optimal use of hardware resources, we will take a look into kernel optimization and other crucial OS settings. Finally, we will look into how to examine performance of MongoDB replication.
Advanced MySql Data-at-Rest Encryption in Percona Server - Severalnines
Iwo Panowicz - Percona & Bart Oles - Severalnines AB
The purpose of the talk is to present data-at-rest encryption implementation in Percona Server for MySQL.
Differences between Oracle's MySQL and MariaDB implementation.
- How is it implemented?
- What is encrypted:
- Tablespaces?
- General tablespace?
- Double write buffer/parallel double write buffer?
- Temporary tablespaces? (KEY BLOCKS)
- Binlogs?
- Slow/general/error logs?
- MyISAM? MyRocks? X?
- Performance overhead.
- Backups?
- Transportable tablespaces. Transfer key.
- Plugins
- Keyrings in general
- Key rotation?
- General-Purpose Keyring Key-Management Functions
- Keyring_file
- Is it useful? How to make it profitable?
- Keyring Vault
- How does it work?
- How to make a transition from keyring_file
Polyglot Persistence Utilizing Open Source Databases as a Swiss Pocket Knife - Severalnines
Art Van Scheppingen - vidaXL & Bart Oles - Severalnines AB
Over the past few years, VidaXL has become a European market leader in the online retail of slow-moving consumer goods. When a company has achieved over 50% year-over-year growth for the past 9 years, there is hardly enough time to overhaul existing systems. This means existing systems will be stretched to the maximum of their capabilities, and often additional performance will be gained by utilizing a large variety of datastores.
Polyglot persistence reigns in rapidly growing environments and the traditional one-size-fits-all strategy of monoglots is over.
VidaXL has a broad landscape of datastores, ranging from traditional SQL data stores, like MySQL or PostgreSQL alongside more recent load balancing technologies such as ProxySQL, to document stores like MongoDB and search engines such as SOLR and Elasticsearch.
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ... - Severalnines
Traditional server monitoring tools are not built for modern distributed database architectures. Let’s face it, most production databases today run in some kind of high availability setup - from simpler master-slave replication to multi-master clusters fronted by redundant load balancers. Operations teams deal with dozens, often hundreds of services that make up the database environment.
This is why we built ClusterControl - to address modern, highly distributed database setups based on replication or clustering. We wanted something that could provide a systems view of all the components of a distributed cluster, including load balancers.
Watch this replay of a webinar on free database monitoring using ClusterControl Community Edition. We show you how to monitor all your MySQL, MariaDB, PostgreSQL and MongoDB systems from a single point of control - whether they are deployed as Galera Clusters, sharded clusters or replication setups across on-prem and cloud data centers. We also see how to use Advisors in order to improve performance.
AGENDA
- Requirements for monitoring distributed database systems
- Cloud-based vs On-prem monitoring solutions
- Agent-based vs Agentless monitoring
- Deep dive into ClusterControl Community Edition
- Architecture
- Metrics Collection
- Trending
- Dashboards
- Queries
- Performance Advisors
- Other features available to Community users
SPEAKER
Bartlomiej Oles is a MySQL and Oracle DBA with over 15 years of experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL - Severalnines
To operate PostgreSQL efficiently, you need to have insight into database performance and make sure it is at optimal levels.
With that in mind, we dive into monitoring PostgreSQL for performance in this webinar replay.
PostgreSQL offers many metrics through various status overviews and commands, but which ones really matter to you? How do you trend and alert on them? What is the meaning behind the metrics? And what are some of the most common causes for performance problems in production?
We discuss this and more in ordinary, plain DBA language. We also have a look at some of the tools available for PostgreSQL monitoring and trending; and we’ll show you how to leverage ClusterControl’s PostgreSQL metrics, dashboards, custom alerting and other features to track and optimize the performance of your system.
AGENDA
- PostgreSQL architecture overview
- Performance problems in production
- Common causes
- Key PostgreSQL metrics and their meaning
- Tuning for performance
- Performance monitoring tools
- Impact of monitoring on performance
- How to use ClusterControl to identify performance issues
- Demo
SPEAKER
Sebastian Insausti, Support Engineer at Severalnines, has loved technology since his childhood, when he took his first computer course (Windows 3.11). From that moment, he knew what his profession would be. He has since built up experience with MySQL, PostgreSQL, HAProxy, WAF (ModSecurity), Linux (RedHat, CentOS, OL, Ubuntu server), Monitoring (Nagios), Networking and Virtualization (VMWare, Proxmox, Hyper-V, RHEV).
Prior to joining Severalnines, Sebastian worked as a consultant to state companies in security, database replication and high availability scenarios. He’s also a speaker and has given a few talks locally on InnoDB Cluster and MySQL Enterprise together with an Oracle team. Before that, he worked for a Mexican company as head of the sysadmin department, as well as for a local ISP (Internet Service Provider), where he managed customers' servers and connectivity.
This webinar builds upon a related blog post by Sebastian: https://severalnines.com/blog/performance-cheat-sheet-postgresql.
Webinar slides: Our Guide to MySQL & MariaDB Performance Tuning - Severalnines
If you’re asking yourself the following questions when it comes to optimally running your MySQL or MariaDB databases:
- How do I tune them to make best use of the hardware?
- How do I optimize the Operating System?
- How do I best configure MySQL or MariaDB for a specific database workload?
Then this replay is for you!
We discuss some of the settings that are most often tweaked and which can bring you significant improvement in the performance of your MySQL or MariaDB database. We also cover some of the variables which are frequently modified even though they should not.
Performance tuning is not easy, especially if you’re not an experienced DBA, but you can go a surprisingly long way with a few basic guidelines.
This webinar builds upon blog posts by Krzysztof from the ‘Become a MySQL DBA’ series.
AGENDA
- What to tune and why?
- Tuning process
- Operating system tuning
- Memory
- I/O performance
- MySQL configuration tuning
- Memory
- I/O performance
- Useful tools
- Do’s and do not’s of MySQL tuning
- Changes in MySQL 8.0
SPEAKER
Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.
DIY DBaaS: A guide to building your own full-featured DBaaS
WHITEPAPER
More so than ever, businesses need to ensure that their databases are resilient,
secure, and always available to support their operations. Database-as-a-Service
(DBaaS) solutions have become a popular way for organizations to manage their
databases efficiently, leveraging cloud infrastructure and advanced
set-and-forget automation.
However, consuming DBaaS from providers comes with many compromises. In this
guide, we’ll show you how you can build your own flexible DBaaS, your way. We’ll
demonstrate how it is possible to get the full spectrum of DBaaS capabilities
along with workload access and portability, and avoid surrendering control to a
third-party.
From architectural and design considerations to operational requirements, we’ll
take you through the process step-by-step, providing all the necessary
information and guidance to help you build a DBaaS solution that is tailor-made
to your unique use case. So get ready to dive in and learn how to build your own
custom DBaaS solution from scratch!
primary {
  "id": "1",
  "name": "db-node-1",
  "hostname": "mysql01.example.com",
  "ip_address": "192.168.1.101",
  "port": 3306,
  "database_name": "billing",
  "status": "Online",
  "uptime": "14 days, 6 hours",
  "version": "MySQL 8.0.26",
  "replication": {
    "role": "Primary",
    "replica_count": 2,
    "replica_status": "Synced"
  },
  "connections": {
    "current_connections": 25,
    "max_connections": 100
  },
  "performance_metrics": {
Section I: DBaaS as an implementation model
  Traditional DBaaS implementation model
  Sovereign DBaaS implementation model
    • Markers of Sovereign DBaaS
    • Principles of Sovereign DBaaS
      First principle: end-user independence
      Second principle: environment / ecosystem agnosticism
      Third principle: embracing open-source software (OSS)
      Option 1: independent
      Option 2: interdependent
Section II: DIY DBaaS in practice
    • Foundation points: DBaaS environment, elements and design principles
      Environment
      Elements
      Platform
      Compute
      Storage
      Networking
      Design principles
  DBaaS routines and blueprint: the Day 2 framework
    • Day 2 ops routines
      Scaling and high availability
      Monitoring and alerting
      Backups for onsite and offsite storage
      Point-in-time recovery
      Upgrading and patching
      Access control / user access
      Data migration (on-premises to cloud)
    • Day 2 ops blueprint
      Platform architecture
      Database provisioning
      Monitoring and alerting
      Backup and recovery
      Scaling and high availability
      Upgrade and patch management
      Security
      API integration
      Self-service user portal
      Solution spotlight — abstracting the event-driven architecture with Dapr
  The Day 2 ops framework: operational guidelines
    • Op 1 — Database provisioning and deployment
    • Op 2 — Lifecycle management and high availability using an autopilot pattern
      Health checks
      Automated failover
      Primary and replica node and cluster state examples
    • Op 3 — Observability
      Logs (syslog)
      Metrics and events (Telegraf, other exporters)
      Observability spotlight: database query performance
    • Op 4 — Backup and recovery
      Data structures examples
      Backup service architecture
      Backup agent initialization and registration
      The backup process
      Restoring backups
      Verifying backups
    • Op 5 — Scaling
    • Op 6 — Upgrades and patching
    • Op 7 — Access control and multi-tenancy
      Access control
      Multi-tenancy
    • Op 8 — Data migration
  Bringing Day 2 ops to life: a provisional architecture
    • Core services
Section III: abstracting the orchestration layer with Severalnines solutions
  ClusterControl: DB ops automated, just add VMs
    • ClusterControl operational features
    • ClusterControl architecture
      Overview
      Components
  CCX Sovereign: your DBaaS, in your cloud(s)
    • CCX features
      Supports hyperscalers, local clouds and private environments
      Set and forget database deployments
      Granular observability
      Automated backups
      Scaling and storage management
      Granular user management
      Plug-and-play integrations
      Security
  CCX Cloud: from Severalnines, run by Severalnines
  Choosing the correct solution for your use case
  Wrapping up
Section I: DBaaS as an implementation model
Although DBaaS is traditionally thought of as a business model whereby
end-users consume databases from third-party providers who manage their
operations, DBaaS is an implementation concept at its core. Concepts,
platforms, and tooling have continued to evolve, giving organizations more
choices over how to implement their DB ops.
Traditional DBaaS implementation model
In a Traditional DBaaS model, the provider is responsible for the entire lifecycle
of the data stack, including provisioning, configuration, monitoring, backup
and recovery, and patching. It is useful for teams that are responsible for
underlying products or projects, such as software applications, websites, or
online services, and whose primary goal is to ensure that their business-critical
services are managed and always fully operational.
The core characteristic of this model is its shift from capital expenditure
(CAPEX) to operating expenditure (OPEX): customers avoid the upfront capital
expenses associated with buying and maintaining their data stack and instead
pay for metered services, scaling up or down as needed. There are three general
categories of provider: DB vendors, cloud vendors, and independent service
vendors (ISVs).
• DB vendor DBaaS refers to services provided by the creators and
maintainers of the database software, such as MongoDB and Elastic. These
providers offer fully managed services that are specifically designed to
work with their own database software, making them a good choice for
organizations that want to use those specific databases without the added
complexity of managing the underlying infrastructure.
• Cloud vendor DBaaS, on the other hand, refers to services provided by cloud
platform providers like Amazon Web Services (AWS), Google Cloud
Platform, and Microsoft Azure. These are fully managed database services
that run on the vendor's own cloud platform. Cloud vendor DBaaS offers a
high degree of scalability, flexibility, and reliability, as well as easy
integration with other cloud services.
• Independent service vendor (ISV) DBaaS refers to services provided by
third-party vendors such as Severalnines, Instaclustr, Aiven, and others.
These vendors offer fully managed services that support a variety of
database engines, usually across multiple clouds (typically the big 3),
providing organizations with more flexibility in their choice of database
software and infrastructure provider.
replica {
  id: 2,
  node_id: db-node-2,
  hostname: mysqlreplica01.example.com,
  ip_address: 192.168.1.102,
  port: 3306,
  database_name: my_database,
  status: Online,
  uptime: 7 days, 12 hours,
  version: MySQL 8.0.26,
  master: {
    master_node_id: db-node-1,
    master_hostname: mysql01.example.com,
    master_ip_address: 192.168.1.101,
    master_port: 3306,
    replication_status: Connected,
    seconds_behind_master: 10
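A record like the replica fragment above is the kind of state an automation layer would inspect before routing reads to a node or promoting it. The following is a minimal sketch, not the guide's implementation: the field names are taken from the fragment, while the function name `replica_is_healthy` and the 30-second lag threshold are assumptions made for illustration.

```python
# Replica state mirroring the fragment above (only the fields the check needs).
replica = {
    "node_id": "db-node-2",
    "status": "Online",
    "master": {
        "master_node_id": "db-node-1",
        "replication_status": "Connected",
        "seconds_behind_master": 10,
    },
}

# Hypothetical threshold; a real platform would tune this per workload.
MAX_LAG_SECONDS = 30


def replica_is_healthy(record, max_lag_seconds=MAX_LAG_SECONDS):
    """Return True if the replica is online, connected, and within the lag limit."""
    master = record.get("master", {})
    lag = master.get("seconds_behind_master")
    return (
        record.get("status") == "Online"
        and master.get("replication_status") == "Connected"
        and lag is not None
        and lag <= max_lag_seconds
    )


print(replica_is_healthy(replica))  # True: online, connected, 10 s of lag
```

A missing lag value is treated as unhealthy here, on the assumption that a node whose replication state cannot be read should not be trusted with traffic.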
Sovereign DBaaS implementation model
A Sovereign DBaaS model differs from its counterpart in that it offers
organizations complete control over their database layer operations, enabling
internal DevOps or Infrastructure teams to automate their database layer
operations using their own code, open-source tooling, and / or off-the-shelf
solutions in a vendor-neutral environment.
The DBaaS platform still provides a self-service model for developers,
enabling them to create, configure, and manage their own databases
independently. At the same time, the platform enforces security policies,
backup and recovery procedures, and other governance and compliance
requirements, so that developers adhere to best practices. Developers can
deploy and consume databases efficiently, while the infrastructure team
retains the ability to enforce policies and ensure compliance.
The infrastructure can be hosted on-premises, in a colocation facility, or in a
hyperscale cloud provider facility as infrastructure-as-a-service (IaaS), giving
organizations the flexibility to choose where their data is stored and to change
their choices at any time for any reason. In this model, the primary goal of
DBaaS is to give developers autonomy, enforce processes, and allow them to
deploy persistent resources with ease.
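The split described above, where developers self-serve while the infrastructure team sets the guardrails, can be sketched as a policy check applied to every provisioning request. This is an illustrative sketch only: the class names `PlatformPolicy` and `DatabaseRequest`, the function `policy_violations`, and all policy values are hypothetical and do not come from the guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformPolicy:
    """Guardrails owned by the infrastructure team (all values hypothetical)."""
    allowed_engines: frozenset = frozenset({"mysql", "postgresql"})
    require_tls: bool = True
    min_backup_retention_days: int = 7


@dataclass
class DatabaseRequest:
    """What a developer submits through the self-service interface."""
    name: str
    engine: str
    tls_enabled: bool
    backup_retention_days: int


def policy_violations(request, policy):
    """Return a list of human-readable violations; an empty list means approved."""
    violations = []
    if request.engine not in policy.allowed_engines:
        violations.append(f"engine '{request.engine}' is not allowed")
    if policy.require_tls and not request.tls_enabled:
        violations.append("TLS must be enabled")
    if request.backup_retention_days < policy.min_backup_retention_days:
        violations.append(
            f"backup retention must be at least "
            f"{policy.min_backup_retention_days} days"
        )
    return violations


policy = PlatformPolicy()
approved = DatabaseRequest("orders", "postgresql", True, 14)
rejected = DatabaseRequest("scratch", "oracle", False, 1)

print(policy_violations(approved, policy))       # []
print(len(policy_violations(rejected, policy)))  # 3
```

Returning the full list of violations, rather than failing on the first one, lets a developer fix a request in a single pass without involving the infrastructure team.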
A Sovereign DBaaS implementation offers ultimate control over all business
risks related to data. It mitigates vendor, environment, and ecosystem lock-in,
managed license instability, key person dependencies, data regulation changes,
and cost unpredictability. By rendering organizations less reliant on external
providers, it reduces the business risks associated with traditional DBaaS,
such as regulatory non-compliance.
Below we will briefly describe the markers and principles of a Sovereign DBaaS
implementation. For more detail on these concepts, read our Sovereign DBaaS
Guide.
Markers of Sovereign DBaaS
• Control
You are able to own and assert control over the data pipeline according to
your needs through your DBaaS implementation — from the underlying
infrastructure, databases and their operations, to workload location.
• Access
You have the level of access your use case requires to your data and the
technologies that handle that data. You can access the data plane, the
underlying infrastructure, and the data management system. You get
root access, allowing you to install, configure, and manage your stack
components.
• Portability
The traditional approach to DBaaS inevitably leads to organizations
becoming wholly dependent on a service, effectively trapping them in a
particular ecosystem. Conversely, being data sovereign means you aren’t
married to a specific vendor or environment. You can efficiently and cost-
effectively move databases from one cloud environment to another, or from
an on-prem to a cloud environment and vice versa with minimal difficulty.
• Licensing stability
A fundamental principle of sovereign data is the ability to roll your own
optimized DBaaS solutions without being subject to vendors’ licensing
restrictions. You can include source-available options, such as MongoDB
and Elasticsearch, that third-party service providers cannot offer.
• Budget efficiency
Third-party costs are difficult and sometimes impossible to model, not to
mention expensive at scale. With Sovereign DBaaS, you can form a clear
understanding of costs because you have greater visibility into and control
over inputs, e.g. infrastructure, databases, and tools. You can better manage
and track them because you can consolidate your database layer into a true
single pane of glass. And you can implement FinOps practices and tools
into your stack more precisely to help you better model and predict your
spend.
Principles of Sovereign DBaaS
First principle: end-user independence
The first condition of end-user independence is full visibility into the database
layer, including end-to-end visibility into the tech and software the DBaaS
uses. Sovereign DBaaS can offer complete data transparency with no
intermediaries (e.g., vendors) withholding information about the components
and processes being used to implement the stack. Traditional DBaaS is a
veritable black box: you see only the output, not the data management
software, security configurations, or privacy protocols behind it.
From visibility comes the second condition, control, which requires the following:
• DB and infrastructure access
You can modify the database / infra configuration and everything that the
configuration entails. This is made possible by the direct use of open-
source software, unmediated by a vendor’s implementation, enabling you
to better tune your databases to support your workloads.
• Location choice
You decide where and how data is processed and stored. For instance, you
can place workloads with stringent requirements in one environment, such
as on-premises, and those with fewer in another, such as public cloud.
These requirements don’t just have to revolve around compliance and
security, but performance, cost, and other variables that influence your
workloads as well.
Second principle: environment / ecosystem agnosticism
Sovereign DBaaS enforces the idea of environment agnosticism and extends
it to the ecosystem. End users have the freedom to choose different
infrastructure environments and the ability to combine multiple underlying
environments under a unified control plane, which is what enables location
control. You can choose one environment or select from a mix of environments
such as private cloud (e.g., VMware, Nutanix, OpenStack), public cloud
(e.g., AWS, GCP, Azure), on-premises, co-location, and hybrid.
Sovereign DBaaS means having the freedom to go beyond any one ecosystem.
For example, AWS Outposts lets you run AWS services on-premises. However, this setup
is not truly sovereign because, aside from the managed service aspect, you’re
locked into the AWS ecosystem.
Third principle: embracing open-source software (OSS)
A crucial principle of Sovereign DBaaS is the unrestricted use of open-source
software. OSS allows you to avoid many of the issues you see with proprietary
cloud vendor solutions, such as vendor lock-in. You can freely use the best
OSS databases available without worrying about managed providers’ APIs,
nomenclature, and semantics (interacting with managed PostgreSQL from one
provider is a different experience than from another), or about license
changes that render a database unavailable for third-party offerings, as
happened with Elasticsearch.
Additionally, when you buy a packaged solution from a vendor, the database is
more open-source adjacent than open source: it is tied to infrastructure that
the vendor determines, it is often available in only one kind of environment
(typically a handful of clouds), and you aren’t given full access to the
database because of the vendor’s SLA requirements.
Open-source software also potentially unlocks cost efficiency because a)
it’s free, b) it decouples the database from the infrastructure, enabling you
to place your databases where you want, and c) you have full access to tune
and optimize their configuration.
DIY DBaaS options:
independent or interdependent
There are several ways to approach creating a DBaaS, each with its own trade-
offs. Below, we delve into each option with recommendations and technical details
to help you make the right choice for your organization.
Option 1: independent
This option involves procuring your own infrastructure, building custom software to
handle each job within the DBaaS framework, and building a custom management
layer to act as a control plane. You have complete control over your infrastructure
and software, but this approach requires expertise and a significant investment of time and resources.
Pros:
• Full control: You can tailor your solution to meet specific requirements.
• Sovereignty: You own your data, end-to-end.
• Intellectual property: Any custom software developed in-house remains your
intellectual property.
Cons:
• Complexity: This approach requires widely varied expertise in hardware,
networking, software development, and database management.
• Difficulty: Creating your own control layer software requires substantial effort.
• Maintenance overhead: You’re completely responsible for managing, securing,
and updating your infrastructure.
• Cost: The initial investment and ongoing maintenance costs can be substantial.
Option 2: interdependent
This option offers a middle ground between buying a solution and building
everything from scratch. In this approach, you would combine your choice of infra,
code, and tooling with off-the-shelf software to act as the control plane.
Pros:
• Flexibility: Choose between cloud, on-premises, or hybrid environments.
• Simplified control plane management: Rather than building your own control
plane, you can leverage existing software to provide a centralized interface for
managing your database resources.
• Vendor-agnostic: The software can manage various database technologies,
allowing you to mix and match as needed.
• Sovereignty: You own your data, end-to-end.
Cons:
• Partial lock-in: Introducing off-the-shelf components always carries some risk of
lock-in, however partial.
• Learning curve: You’ll need to become familiar with the features and
capabilities of your stack and its components.
• Shared responsibility: While off-the-shelf components greatly simplify DBaaS
management, the shared responsibility model continues.
Section II:
DIY DBaaS in practice
Creating a Do-It-Yourself Database as a Service (DIY DBaaS) platform is
a significant endeavor that can provide a flexible and scalable solution for
managing databases. There are more choices to be made here than in a
choose-your-own-adventure book.
In our Developers Guide to Sovereign DBaaS, we cover each point with
recommendations and technical detail. Here, we will discuss the actual
building of your own Sovereign DBaaS from the ground up, from the
fundamental points you need to consider when building a DBaaS to system
design considerations (using Dapr to illustrate) and what a provisional
architecture will actually look like when developed.
To determine our architectural choices, we will consider this prospective
DBaaS through the lens of Day 2 operations so that we are left with a
reliable and scalable DBaaS.
The independent route entails procuring your own infrastructure, developing
the software, and managing the entire solution yourself, which gives you full control
along with everything that control entails. The interdependent route gives you greater
flexibility and faster time-to-value by allowing you to incorporate off-the-shelf infra
and components, but it still presents some lock-in and can require additional knowledge
to handle the components themselves.
Now that we understand that DBaaS is an implementation model, the differences
between the traditional and Sovereign models, and the pros and cons of going the
independent or interdependent route when choosing the latter, let’s get to the actual
building of your platform, starting with environment, elements and design principles.
Foundation points: DBaaS environment, elements
and design principles
Environment
Where your DBaaS will live breaks out into three categories that can be
selected for use as mono-environments or as hybrid ones:
• Physical (owned)
Physical locations offer more control over the infrastructure but are
often implemented regionally due to cost and may require additional
maintenance and security measures.
• Co-location (leased)
Leasing space in one (or more) data centers allows you to own, provide and
configure your own hardware as well as benefit from the management of
the hardware by experienced staff.
• Public cloud (PAYG)
Cloud-based solutions provide scalability and lower upfront costs, but
you’ll need to trust a third-party provider with your data, and you are
more likely to end up using proprietary technologies.
Choosing your environment is no easy decision, as each has its own upsides
and downsides. For instance, your own data centers give you maximum control,
but the capital and operational expenses can be prohibitive, especially if
you have a geographically spread customer base. Going with a public cloud
environment provides maximum flexibility and shifts CAPEX to OPEX, but the
providers’ shared responsibility model may represent an intolerable risk
profile, not to mention potential regulatory issues that are constantly
shifting, especially with regard to data sovereignty.
And then there is the co-location facility, which could represent the ideal
middle ground because you’re mitigating CAPEX while enjoying some of the
control features of the on-prem environment along with the management and
elasticity benefits of the public cloud. Either way, it is likely that you
will be best served by implementing a hybrid model.
Elements
The environment you ground your DBaaS in and the components you use to
actually animate it influence one another, so it’s important to consider the
latter while you are determining where you want to host it. Ultimately, you
want to weave in sovereign principles so that you are environment agnostic,
i.e. you want to ensure that whatever elements you choose and however you
implement them are not absolutely dependent on the environment/s. Let’s
make a quick pass over the fundamental elements you’ll use to actually create
and operationalize the service itself:
Platform
The platform, e.g. Kubernetes or OpenStack, will not only dictate how
you design your DBaaS but also influence how you manage and orchestrate its
underlying components.
Kubernetes adoption continues to grow. It is available on almost every
public cloud provider, and the managed offerings expose the same core APIs
as the open-source project. K8s can also be installed on-prem or even on
developer machines for reproducible environments.
The growth of Kubernetes in the past decade has made it a fairly common skill
among developers. Organizations can therefore grow their engineering teams
around a transferable skill, instead of asking for experience with a
particular cloud or a subset of features that are inconsistently named and
not equally implemented between providers.
Compute
• Bare metal
These are physical servers dedicated entirely to your DBaaS, offering
maximum performance and control. However, they can be more expensive
and harder to scale.
• Virtual machines (VM)
VMs run on shared hardware, offering a balance between performance
and cost. They are popular because they are a standard compute resource
in public clouds and leased data centers, which helps in avoiding vendor
lock-in. Additionally, VMs can be easily scaled but their performance may
be affected by other VMs running on the same host.
• Containers
Containers are lightweight and fast, making them ideal for quickly
deploying and scaling instances. They can be easily managed using
platforms like Kubernetes but may have limitations in terms of isolation
compared to VMs.
Storage
Storage types
• Attached: This refers to storage directly connected to the server or
VM, offering high performance but limited scalability.
• Network: Network storage is accessed over a network, providing
greater scalability but potentially lower performance due to latency.
• Hot/Cold/Warm: These terms refer to the speed and accessibility of
data. Hot storage is readily accessible and offers high performance,
while cold storage is slower and more cost-effective for long-term
data storage. Warm storage is a middle ground between the two.
Storage configuration options
• Clustered access filesystems: Clustered filesystems allow multiple
servers to access the same storage simultaneously, improving
redundancy and fault tolerance.
• Single-access filesystems: These are the most common option and are
designed to be accessed by one server at a time.
Networking
• Public/Private: Public networks are accessible to anyone, while private
networks are restricted to specific users or devices. Your choice depends on
the level of security and access control you require.
• VPN/VPC: These are two ways of keeping traffic private. A VPN (Virtual
Private Network) creates an encrypted connection between networks or
devices over an untrusted network, while a VPC (Virtual Private Cloud) is
a logically isolated private network inside a public cloud. WireGuard is
a modern VPN protocol that offers improved performance and security.
Design principles
To build a system that aligns with Day 0 requirements and user objectives,
we need a high-level system architecture that encapsulates a set of crucial
architectural decisions, which will serve as the cornerstone of our design,
fostering a platform that is agile, responsive, and efficient. While we will not
delve deeply into every aspect, these principles will guide the architectural
choices we make to build a modern system:
• Cloud native
Embrace cloud-native principles, leveraging the inherent advantages
offered by cloud computing. Prioritize scalability, resilience, agility, and the
concept of immutable infrastructure. By harnessing cloud services, we can
optimize performance and cost-efficiency.
• Event-driven
Adopt an event-driven architecture to ensure loose coupling, scalability,
and real-time responsiveness. This approach empowers us with the
flexibility to construct and maintain distinct services, enhancing modularity
and facilitating seamless communication through events.
• Independently deployable services
Clearly define the responsibilities and boundaries of each service to foster
agility, isolation, and straightforward development and deployment of new
features.
• Service discovery
Implement service discovery mechanisms to enable services to dynamically
locate and communicate with one another. Eliminate the need for
hardcoding network addresses or specific locations, promoting adaptability
and flexibility in the system.
• Agent-based
Embrace an agent-based approach to infuse the system with autonomous
edge intelligence and decentralized decision-making. This may involve
the integration of AI and other intelligent agents, which can operate
independently to enhance system performance and adaptability.
• Monitoring and observability
Prioritize comprehensive monitoring and observability by implementing
continuous and systematic data collection and metrics tracking. This
data-driven approach is essential for gaining insights into the behavior
and performance of the platform, facilitating issue identification, resource
optimization, and reliability assurance.
• DevOps and CI/CD
Seamlessly integrate DevOps practices and continuous integration
and continuous deployment (CI/CD) pipelines into the development
and deployment workflows. This streamlined approach ensures rapid
development cycles, rigorous testing, and efficient delivery of new features
and updates.
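To make the event-driven and independently deployable principles more concrete, here is a minimal in-process sketch in Python. The `EventBus` class, the `database.provisioned` topic, and the two subscriber functions are hypothetical names chosen for illustration; a real platform would sit on a message broker rather than an in-memory dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Tiny in-memory publish-subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber; return how many were notified."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


# Two independent "services" that know only the topic, not each other.
monitored: List[str] = []
backup_schedules: List[str] = []


def monitoring_service(event: dict) -> None:
    monitored.append(event["node_id"])


def backup_service(event: dict) -> None:
    backup_schedules.append(f"daily:{event['node_id']}")


bus = EventBus()
bus.subscribe("database.provisioned", monitoring_service)
bus.subscribe("database.provisioned", backup_service)
delivered = bus.publish("database.provisioned", {"node_id": "db-node-1"})
```

The publisher never references the monitoring or backup code directly, which is the loose coupling the principle is after: either subscriber can be replaced or redeployed without touching the other.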
Now that we understand the environment, elements and underlying principles
that will inform your architectural decisions, you need a rubric for
making the practical decisions while you build. We will start with the end
state: what does Day 2 look like?
To determine that, we need to know the purpose of what we’re trying to build.
Implementation details will vary, but there is usually a fundamental ground
truth that every implementation builds off of. For a DBaaS, we’re ultimately
trying to achieve efficient, reliable database operations at scale through the
use of automation.
Utilizing a Day 2 approach is practical as it allows you to focus on automating
operational tasks and gradually build a comprehensive, robust, extensible
platform.
DBaaS routines and blueprint:
the Day 2 framework
What are Day 2 operations? They are
the ongoing and challenging aspects of
maintaining the reliability, performance,
and security of your databases in a
production environment. Here’s a closer
look at some of the essential ‘Day 2’
routines:
Day 2 ops routines
Scaling and high availability
As your data and workload grow, scaling is necessary to ensure performance.
Implement mechanisms for horizontal scaling (adding more nodes or instances)
and vertical scaling (increasing resources on existing nodes). Ensure database
high availability by leveraging monitoring and alerting tools alongside
automated failover and recovery mechanisms.
Monitoring and alerting
Continuous monitoring of your databases is crucial to identify performance
issues, bottlenecks, and potential security threats. Implement monitoring
agents that collect data on various aspects of database health and
performance.
Set up alerts and notifications to proactively detect and respond to potential
issues. Alerts should be configured for specific thresholds and critical events.
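Threshold-based alerting can be sketched in a few lines of Python. The `AlertRule` type, the metric names, and the threshold values below are assumptions made for illustration, not recommended settings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertRule:
    metric: str        # name of the metric reported by the monitoring agent
    threshold: float   # the alert fires when the value exceeds this
    severity: str      # e.g. "warning" or "critical"


def evaluate(rules: List[AlertRule], sample: Dict[str, float]) -> List[str]:
    """Return one alert message per rule whose threshold is exceeded."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            alerts.append(f"{rule.severity}: {rule.metric}={value} exceeds {rule.threshold}")
    return alerts


rules = [
    AlertRule("replication_lag_seconds", 30, "warning"),
    AlertRule("disk_used_percent", 90, "critical"),
]
# One metrics sample, as an agent might report it for a single node.
alerts = evaluate(rules, {"replication_lag_seconds": 10, "disk_used_percent": 95})
```

In this sample only the disk rule fires; metrics missing from a sample are skipped rather than treated as failures, which is a design choice you may want to reverse for critical signals.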
Backups for onsite and offsite storage
Regular backups are essential to protect your data. Implement automated
backup processes with options for both onsite and offsite storage to ensure
data recovery in case of data loss or disasters.
Point-in-time recovery
Point-in-Time Recovery allows you to restore a database to a specific moment
in time. Develop mechanisms to support this, especially for databases with
stringent recovery point objectives (RPOs).
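As a rough sketch of the planning step, assume recovery works from periodic full backups plus a continuous transaction log (for example MySQL binary logs or PostgreSQL WAL). The function name `plan_recovery` and the sample timestamps are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def plan_recovery(backups: List[datetime], target: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Pick the newest full backup taken at or before `target`.

    Returns (backup_time, target): restore that backup, then replay the
    transaction log from backup_time up to target. Returns None when no
    backup is old enough to serve as a base.
    """
    candidates = [b for b in backups if b <= target]
    if not candidates:
        return None
    return max(candidates), target


backups = [
    datetime(2024, 1, 1, 2, 0),
    datetime(2024, 1, 2, 2, 0),
    datetime(2024, 1, 3, 2, 0),
]
plan = plan_recovery(backups, datetime(2024, 1, 2, 15, 30))
```

The gap between the chosen backup and the target is the amount of log that must be replayed, so backup frequency directly affects how long a restore takes.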
Upgrading and patching
Stay up-to-date with the latest patches and upgrades for your database
software. Develop a process for testing and rolling out updates, ideally with
minimal downtime.
Access control / user access
Control and manage user access to databases by implementing robust access
control measures. This includes user authentication, authorization, and role-
based access controls.
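A role-based check can be as small as a lookup table. The roles and action names in this Python sketch are hypothetical; in practice they would map onto database grants or your identity provider’s groups.

```python
from typing import Dict, Set

# Hypothetical roles and the actions each one grants.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"read"},
    "developer": {"read", "write"},
    "dba": {"read", "write", "backup", "restore", "grant"},
}


def is_allowed(user_roles: Set[str], action: str) -> bool:
    """Authorize an action if any of the user's roles grants it."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)
```

Unknown roles grant nothing, so the check fails closed: `is_allowed({"viewer"}, "write")` is rejected, while `is_allowed({"dba"}, "restore")` is permitted.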
Data migration (on-premises to cloud)
If your databases need to migrate from on-premises to the cloud or between
cloud providers, you need a strategy and tools for efficient data migration
that minimize downtime and data loss.
‘Day 2’ operations require ongoing attention, and it’s advisable to use
automation wherever possible to streamline them. Additionally, documenting
processes and creating runbooks will help ensure clear procedures to follow in
various scenarios.
Day 2 ops blueprint
Here’s a high-level blueprint for developing a DBaaS from a Day 2 operational
aspect:
Platform architecture
The overall vision for the type of system we want to build starts with
Day 0 requirements and user objectives, which are served by the
following components:
• Control plane
The central management and orchestration layer.
• Data plane
The layer responsible for hosting and managing the actual databases.
• Agents
Agents installed on database nodes for monitoring, patching, and
management.
• Authentication and authorization
Implement user access controls and security measures.
Database provisioning
Develop a provisioning system that allows users to create new database
instances and choose among various vendors.
Monitoring and alerting
Implement monitoring agents that collect data on database performance,
resource utilization, and security. Set up alerts to notify administrators or users
of potential issues.
Backup and recovery
Create a backup and recovery system that automates regular backups,
retention policies, and restoration processes.
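A retention policy can be expressed as a pure function that decides which backups to keep. This Python sketch assumes a simple age-based rule with a safety floor; the function name and parameters are illustrative.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def apply_retention(
    backups: List[datetime], now: datetime, keep_days: int, keep_min: int = 1
) -> Tuple[List[datetime], List[datetime]]:
    """Split backups into (kept, expired).

    A backup is kept if it is no older than `keep_days`; the newest
    `keep_min` backups are always kept so a restore point remains.
    """
    ordered = sorted(backups, reverse=True)  # newest first
    cutoff = now - timedelta(days=keep_days)
    kept, expired = [], []
    for index, taken_at in enumerate(ordered):
        if index < keep_min or taken_at >= cutoff:
            kept.append(taken_at)
        else:
            expired.append(taken_at)
    return kept, expired


now = datetime(2024, 1, 10)
kept, expired = apply_retention(
    [datetime(2024, 1, 1), datetime(2024, 1, 5), datetime(2024, 1, 9)], now, keep_days=7
)
```

Keeping the decision separate from the deletion makes the policy easy to test and to dry-run before any backup is actually removed.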
Scaling and high availability
Design mechanisms for horizontal scaling and high availability to ensure
database performance and uptime.
Upgrade and patch management
Develop a system for managing database software updates and patches,
including rolling upgrades.
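One common way to order a rolling upgrade, assumed here rather than prescribed by this guide, is to upgrade replicas one at a time and the primary last, stopping at the first failure. The `upgrade` callable in this Python sketch is a hypothetical stand-in for whatever actually patches a node.

```python
from typing import Callable, List


def rolling_upgrade(primary: str, replicas: List[str], upgrade: Callable[[str], bool]) -> List[str]:
    """Upgrade replicas one at a time, then the primary; stop on first failure.

    Returns the nodes that were upgraded successfully, in order.
    """
    done = []
    for node in [*replicas, primary]:
        if not upgrade(node):
            break  # leave the remaining nodes on the old version
        done.append(node)
    return done


# Example: every node upgrades cleanly.
upgraded = rolling_upgrade("db-node-1", ["db-node-2", "db-node-3"], lambda node: True)
```

Stopping on the first failure keeps the primary untouched if a replica upgrade goes wrong, which limits the blast radius of a bad package.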
Security
Implement security measures, such as access controls, encryption, and
vulnerability assessments, to protect data and ensure compliance.
API integration
Consider integrating your platform with other tools and services, such
as container orchestration platforms, identity management systems, and
monitoring solutions.
Self-service user portal
Create a user-friendly web portal or API that allows users to provision and
manage databases, set configurations, and access performance metrics.
Lastly, because we are proposing a loosely coupled, event-driven services
architecture, we will leverage the Dapr runtime (profiled below), which provides
building blocks that are designed to simplify common challenges in application
development and services architecture.
Solution spotlight — abstracting the event-driven
architecture with Dapr
Dapr, which stands for Distributed Application Runtime, is a versatile and
event-driven runtime designed to simplify the development of applications.
Originally incubated by Microsoft, it has since become a part of the Cloud
Native Computing Foundation (CNCF), underscoring its relevance and adoption
in the cloud-native ecosystem.
Dapr offers a collection of building blocks that empower developers to
create resilient, stateless, and stateful applications more easily. The blocks
are fundamental components that streamline various aspects of application
development and include:
• Service invocation
Simplifies the process of invoking services, whether they are running
locally or remotely, without having to deal with complex service discovery
or network communication logic.
• State management
Offers a straightforward and consistent way to manage application state,
regardless of where it’s stored (e.g., databases, caches, or file systems).
This makes building stateful applications more intuitive.
• Publish-subscribe messaging
Enables seamless communication between application components using
publish-subscribe patterns, enhancing event-driven architecture and loose
coupling.
• Resource bindings
Abstracts the integration with external resources such as databases,
message queues, and storage systems. This allows developers to access
these resources without worrying about the underlying specifics.
• Secrets management
Provides a secure and unified approach to manage application secrets,
ensuring that sensitive information like API keys and passwords remain
protected.
• Actors
Implements the actor model to simplify the development of stateful
applications by offering a higher-level, object-oriented abstraction for
managing state and processing.
• Virtual actors
Extends the actor model by introducing the concept of virtual actors, which
can be used to build stateful, distributed, and scalable applications with
automatic sharding and activation.
• Observability
Enhances application monitoring and debugging by offering built-in
instrumentation and observability features that facilitate the collection of
metrics, traces, and logs.
• Bindings for external systems
Provides a variety of pre-built bindings for popular external systems,
enabling easy integration with services like Azure Functions, AWS Lambda,
and more.
• Middleware
Offers middleware components that can be used to enhance request
and response processing in the application, supporting features like
authentication, retries, and tracing.
By using Dapr components, developers can focus on building application
logic rather than dealing with the intricacies of distributed systems, making it
easier to create robust, cloud-native applications that can scale and adapt to
changing requirements.
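As a small illustration of how an application talks to these building blocks, the Python sketch below builds requests for the Dapr sidecar’s HTTP API (the `/v1.0/publish/...` and `/v1.0/state/...` endpoints) without sending them. The component names `dbaas-pubsub` and `dbaas-state`, the topic `cluster-state`, and the payloads are hypothetical, and the port assumes the sidecar’s default of 3500.

```python
import json
import urllib.request

DAPR_HTTP_PORT = 3500  # default sidecar HTTP port; adjust to your setup


def _post(url: str, body) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_publish_request(pubsub_name: str, topic: str, event: dict) -> urllib.request.Request:
    """Request for the publish-subscribe building block."""
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/publish/{pubsub_name}/{topic}"
    return _post(url, event)


def build_save_state_request(store_name: str, key: str, value: dict) -> urllib.request.Request:
    """Request for the state management building block."""
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/state/{store_name}"
    return _post(url, [{"key": key, "value": value}])  # the API takes a list of items


publish_req = build_publish_request(
    "dbaas-pubsub", "cluster-state", {"cluster": "orders", "primary": "db-node-1"}
)
state_req = build_save_state_request("dbaas-state", "orders", {"primary": "db-node-1"})
# With a Dapr sidecar running, send with: urllib.request.urlopen(publish_req)
```

The application code never names the broker or the database behind these components; swapping them is a change to the Dapr component configuration, not to the service.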
Dapr is platform-agnostic, allowing you to run your applications in various
environments, including local development machines, Kubernetes clusters, and
other hosting platforms where Dapr is installed — this versatility gives you
the flexibility to create adaptable services that can operate seamlessly in both
cloud and edge computing scenarios.
Naturally, you don’t have to incorporate any particular solution into your stack,
and you could build everything from scratch, but that is not feasible for most
organizations, nor preferable for any. The goal is not to remove all dependencies,
which is impossible, but to weave sovereignty into your stack so you can configure
and move your workloads at will.
Therefore, pick and choose off-the-shelf solutions where and when they make
sense. Now that you understand the Day 2 Ops framework and what the high-
level blueprint looks like when building from it, we can look at implementing
the specific ops in detail.
The Day 2 ops framework: operational guidelines
The operational routine that kicks off
the DBaaS ops milieu is provisioning
and deployment. At its most
essential, it involves provisioning the
infrastructure resources that your
database will live on and deploying
your database atop them.
Op 1 — Database provisioning and deployment
Provisioning can be performed in on-prem, cloud and hybrid environments, and
should include:
• Resource allocation
Determining and assigning the necessary hardware resources (such as
servers, storage, and networking equipment) and software resources (such as
operating systems and databases) to support a specific application or service.
• Configuration
Defining the configuration settings, security policies, and performance
parameters required for the infrastructure components. This may involve
setting up virtual machines, configuring network devices, and tuning
hardware to meet specific requirements.
• Software installation
Installing and configuring the necessary software components, including
application software, middleware, and system software. This step ensures
that all required software dependencies are in place.
• Network configuration
Configuring network connectivity, including IP addresses, subnets, firewall
rules, load balancers, and other network-related settings to ensure that
applications and services can communicate effectively.
• Security setup
Implementing security measures such as access control, encryption,
authentication, and auditing to protect the infrastructure and data from
unauthorized access and potential threats.
• Monitoring and management
Integrating tools and systems for monitoring and managing the
infrastructure. This includes setting up monitoring agents, alerts, and
performance tracking to ensure the infrastructure operates efficiently.
• Scaling and elasticity
Depending on the requirements, provisioning may include configuring the
infrastructure for scalability and elasticity, enabling it to handle changing
workloads and resource demands effectively.
• Automation
In modern IT operations, automation plays a significant role in
infrastructure provisioning. Tools like configuration management systems
and infrastructure as code (IaC) scripts enable automated, repeatable
provisioning processes.
• User interfaces
Interfaces fall under three types: CLIs, APIs, and GUIs. Including all three
is standard for retail DBaaS. For an internal DBaaS, your customers will
be your own engineering teams, so providing an API-first approach when
developing the platform will be key.
The ‘Infrastructure Service’ is primarily responsible for provisioning virtual
machines or system containers that form the basis for the database nodes.
These virtual machines are created from preconfigured image templates
preinstalled with the exact software versions of database vendor packages and
agents that provide features such as backups and restore, automatic failover,
upgrades, monitoring, and more. Additionally, the Infrastructure Service
handles the provisioning and management of other resources in private or
public cloud infrastructure, which includes virtual private networks, storage
volumes, and their continued maintenance.
The ‘Service Catalog Service’ provides a range of preconfigured and user-
generated image templates used for launching a Database Service. Its
primary aim is to maintain consistency in the deployment and management of
database services.
A prospective, developer-friendly provisioning workflow could look like this:
Infrastructure teams can integrate their DBaaS with existing Git workflows,
which means no additional users to manage, no additional interfaces to
develop or services to deploy.
Developers would request a new resource by creating, or modifying, a
Terraform plan that is reviewed by a member of the Infrastructure team and
deployed, once approved. Monitoring is automatically set up and automated
rules are put in place for teams and projects to ensure the correct hardware,
regions and security rules are used.
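The automated rules mentioned above could run as a check on each request before a reviewer ever sees it. The Python sketch below is one hypothetical shape for such a check; the team name, region identifiers, instance sizes, and request fields are all illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical per-team policy maintained by the infrastructure team.
TEAM_POLICIES: Dict[str, dict] = {
    "payments": {
        "regions": {"eu-west-1", "eu-central-1"},
        "instance_sizes": {"medium", "large"},
        "require_encryption": True,
    },
}


def validate_request(team: str, request: dict) -> List[str]:
    """Return a list of policy violations; an empty list means approved."""
    policy = TEAM_POLICIES.get(team)
    if policy is None:
        return [f"no policy defined for team '{team}'"]
    violations = []
    region = request.get("region")
    size = request.get("instance_size")
    if region not in policy["regions"]:
        violations.append(f"region '{region}' is not allowed")
    if size not in policy["instance_sizes"]:
        violations.append(f"instance size '{size}' is not allowed")
    if policy["require_encryption"] and not request.get("encrypted", False):
        violations.append("storage encryption must be enabled")
    return violations


ok = validate_request(
    "payments", {"region": "eu-west-1", "instance_size": "medium", "encrypted": True}
)
rejected = validate_request("payments", {"region": "us-east-1", "instance_size": "medium"})
```

Returning every violation at once, rather than failing on the first, gives the developer a complete list to fix in a single revision of their plan.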
Op 2 — Lifecycle management and high
availability using an autopilot pattern
Utilizing a default autopilot pattern with a set of generic handlers is a
structured and flexible approach to managing the lifecycle and health of
database servers, for example with MySQL primary-replica deployments.
Below is a breakdown of the key handlers and their functions:
• preStart
This handler is invoked before starting the targeted service or application.
It serves as a preparatory phase, allowing for any necessary actions or
configurations to be applied in advance of service initiation.
• health
The health handler performs periodic health checks on the service or
application. It assesses the system’s well-being, ensuring that it is in a
good and operational state. Health checks can include checks for database
connectivity, resource availability, or other crucial factors.
• onChange
The onChange handler is called when changes occur in a subscribed state.
This handler is instrumental in maintaining real-time responsiveness and
adaptability. It can trigger actions in response to dynamic changes in the
environment, such as failover events in a primary-replica cluster.
• preStop
Before stopping the service or application, the preStop handler is executed.
It provides an opportunity to perform any cleanup or finalization tasks to
ensure a graceful shutdown.
• postStop
After the service or application has been successfully stopped, the
postStop handler is invoked. This phase can be used for additional cleanup
or post-shutdown activities.
The beauty of this approach is its flexibility. Each handler can be configured
to run any external application or script, and this configuration is simplified
through the use of YAML. This means that your system can adapt and evolve
by defining custom actions or processes for each handler, tailoring them to
your specific needs.
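To make the handler idea concrete, here is a minimal sketch of a dispatcher that maps each handler name to an external command. The configuration is shown as a Python dictionary for brevity; in practice it would be loaded from the YAML file described above. The commands themselves are placeholders, and the `run_handler` function is an assumption of this sketch, not the API of any particular autopilot implementation.

```python
import subprocess
import sys

# Hypothetical handler configuration. Each handler runs an external command;
# the commands here are placeholders that only print a message.
HANDLERS = {
    "preStart": [sys.executable, "-c", "print('preparing data directory')"],
    "health": [sys.executable, "-c", "print('healthy')"],
    "onChange": [sys.executable, "-c", "print('cluster state changed')"],
    "preStop": [sys.executable, "-c", "print('flushing before shutdown')"],
    "postStop": [sys.executable, "-c", "print('cleanup done')"],
}


def run_handler(name, handlers=HANDLERS, timeout=30):
    """Run the external command configured for a handler.

    Returns (exit_code, stdout). Raises KeyError for an unknown handler.
    """
    command = handlers[name]
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout.strip()
```

An agent built this way would call `run_handler("preStart")` before launching the database, call `run_handler("health")` on a timer, and treat a non-zero exit code as a failed check.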
Grouping classic primary-replica deployments into clusters with unique global
names ensures that the approach is well suited to managing database
service replication, high availability, and dynamic changes.
Bootstrapping a database node
The subsequent steps provide an overview of what the agents undertake to
determine their roles in a cluster setup.
Upon startup, the agent performs the following actions:
1. Subscribe to state changes for the cluster
2. Get the latest stored cluster state and check if there is a primary node
• Start the database node as a primary if there is no cluster state or if
there is no active primary
• Acquire a lock to update the cluster state so no other nodes can
update it until this node has become the primary
• Check if there is a backup that should be used to restore/rebuild the
node; otherwise, just initialize as a new primary
3. Update the cluster state again with new updated state, i.e., the primary
node and replication info
4. Unlock the cluster state so that other nodes can write to it
5. Write a ‘lock file’ on the host which indicates that it has been initialized /
bootstrapped
6. Mark the node as primary and post a cluster state change event for the
cluster
7. Primary node is now active and running
Replicas will bootstrap with a similar process as the primary:
1. Subscribe to state changes for the cluster
2. Get the latest stored cluster state and check if there is a primary node
3. Wait until the cluster state lock is unlocked. Wait for a new cluster state
change event.
4. Get the primary node and replication info from the cluster state
5. Check if there is a backup that should be used to restore/rebuild the node;
otherwise, just initialize as a new replica
6. Start the database node as a replica and set up replication with the
primary node
7. Mark the node as replica, lock and update the cluster state with the
replica node info
8. Write a ‘lock file’ on the host which indicates that it has been initialized /
bootstrapped
9. Unlock the cluster state so that it can be written to
10. Replica node is now active and running
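The role decision at the heart of both bootstrap sequences can be sketched as follows. This is a simplified illustration only: the `ClusterStateStore` class is an in-memory stand-in for the shared cluster state and its lock, and the sketch leaves out the event subscription, the ‘lock file’ and the actual database start-up.

```python
import threading


class ClusterStateStore:
    """In-memory stand-in for the shared cluster state and its lock (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.state = {"primary": None, "replicas": []}


def bootstrap(node_name, store, has_backup=False):
    """Decide the node's role and record it in the cluster state.

    The first node to find no primary becomes the primary; later nodes
    become replicas of whatever primary is recorded.
    """
    with store.lock:  # hold the lock while reading and updating the state
        init = "restore from backup" if has_backup else "initialize empty"
        if store.state["primary"] is None:
            store.state["primary"] = node_name
            return {"node": node_name, "role": "primary", "init": init}
        store.state["replicas"].append(node_name)
        return {
            "node": node_name,
            "role": "replica",
            "init": init,
            "replicate_from": store.state["primary"],
        }
```

In a real deployment the lock and state would live in a distributed store rather than in process memory, so that agents on different hosts see the same primary.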
Health checks
The health handler plays a pivotal role in determining whether the node should
undergo the bootstrapping process or proceed with standard health checks.
• Check if this node has been bootstrapped/initialized by searching for
the ‘lock file’ on the host:
If not found, initiate the node bootstrap as previously demonstrated.
• Perform regular health checks at specified intervals:
Monitor the node’s health by assessing its process status, connection
status, and replication status.
• Update the cluster and node’s state with a Time-to-Live (TTL) of, for
example, 10 seconds:
POST requests to update the state, including cluster state and individual
node state.
• If I am the primary node, update the primary state before the TTL
expires:
POST request to update the primary state of the node.
• If I am the primary node, publish any state changes that may affect the
replicas:
POST request to broadcast state changes that could impact replica nodes.
cluster {
  id: 1,
  namespace: production,
  project: bluebird,
  cluster_name: mybillingapp,
  last_updated: 1696494655,
  ttlseconds: 10,
  nodes: [
    {
      id: 1,
      name: db-node-1,
      status: Online,
      ip_address: 192.168.1.101,
      role: Primary
    },
    {
      id: 2,
      name: db-node-2,
      status: Online,
      ip_address: 192.168.1.102,
      role: Replica
    },
    {
      id: 3,
      name: db-node-3,
      status: Offline,
      ip_address: 192.168.1.103,
• If I am a replica node, regularly check the primary node state at TTL
intervals for any signs of failure:
If there is no available primary state to retrieve, initiate a failover procedure.
• If I am a replica node, monitor primary state event changes with a
locally cached version:
If changes are detected, trigger a failover procedure to address the evolving
state of the primary node.
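The replica-side checks can be illustrated with a small TTL comparison. This is a sketch under assumptions: the `PrimaryState` class and the `replica_should_fail_over` function are hypothetical names, and the 10-second TTL is the example value used above.

```python
import time

TTL_SECONDS = 10  # example value taken from the text


class PrimaryState:
    """Primary heartbeat record with a time-to-live (hypothetical structure)."""

    def __init__(self, node_name, ip_address, updated_at):
        self.node_name = node_name
        self.ip_address = ip_address
        self.updated_at = updated_at

    def is_expired(self, now, ttl=TTL_SECONDS):
        return now - self.updated_at > ttl


def replica_should_fail_over(cached, latest, now=None, ttl=TTL_SECONDS):
    """Return True if a replica should start the failover procedure.

    cached: the replica's locally cached PrimaryState
    latest: the PrimaryState just retrieved, or None if none is available
    """
    now = time.time() if now is None else now
    if latest is None or latest.is_expired(now, ttl):
        return True  # no live primary state to retrieve
    # A change relative to the cached copy (e.g. a new IP) also triggers failover.
    return (latest.node_name, latest.ip_address) != (cached.node_name, cached.ip_address)
```

Passing `now` explicitly keeps the function easy to test; an agent would normally omit it and rely on the system clock.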
Automated failover
The agents running on the replicas continuously monitor the primary node for
any changes by subscribing to state changes. In the event of a change on the
primary node, such as an IP address modification, the ‘onChange handler’ is
triggered to execute a failover procedure.
Given that all replica nodes will be notified of the state change, it becomes
crucial to establish a mechanism for coordination to ensure that only one
node initiates the failover. A straightforward solution is to employ a global
or distributed lock for synchronization purposes. This lock ensures that only
a single node is authorized to execute the failover, preventing conflicts and
ensuring a smooth transition in the event of primary node changes.
The first replica node that is able to acquire the lock becomes the primary:
1. The agent on the node holding the lock marks it as primary and updates the
cluster state
2. The other replica nodes, having tried and failed to acquire the lock, wait
until a new primary state event is received
3. Their agents then change the replication source to the new primary node
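The lock-based coordination can be sketched with threads standing in for replica agents. This is an illustration only: a `threading.Lock` replaces the distributed lock, a `threading.Event` replaces the primary state event, and the `run_failover` function is a hypothetical name.

```python
import threading


def run_failover(replica_names):
    """Let every replica race for a single lock; exactly one is promoted."""
    failover_lock = threading.Lock()  # stand-in for a distributed lock
    promoted = threading.Event()      # stand-in for the new primary state event
    state = {"primary": None}
    sources = {}

    def agent(name):
        if failover_lock.acquire(blocking=False):
            state["primary"] = name   # the winner marks itself as primary
            promoted.set()            # and publishes the new primary state
            # The lock is deliberately never released here, so only one node can win.
        else:
            promoted.wait(timeout=5)  # losers wait for the new primary event
            sources[name] = state["primary"]  # then repoint replication

    threads = [threading.Thread(target=agent, args=(n,)) for n in replica_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["primary"], sources
```

Which replica wins is not deterministic. A production design would typically also consider replication lag before promoting a node, which this sketch does not attempt.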
Op 3 — Observability
Observability (O11y) is a crucial aspect of building a DBaaS solution because
it enables organizations to effectively monitor, understand, and optimize their
database infrastructure. Observability goes beyond basic monitoring and
alerting and focuses on understanding the behavior and performance of your
systems, services, and applications in real-time and through historical analysis.
It can be broadly classified into two main areas: Compute and Software.
1. Compute o11y concerns the performance of the underlying hardware
infrastructure, such as CPU, RAM and disk usage.
2. Software o11y concerns the performance and behavior of the services
and applications running on your hardware. Metrics of interest here might
include memory consumption by various processes and the number of open
network connections.
By implementing a robust observability framework, businesses gain valuable
insights into their database’s performance, identify and troubleshoot issues
quickly, and make data-driven decisions to enhance the overall efficiency,
reliability, and security of their DBaaS. Embracing observability principles
ensures that organizations can maintain a high-quality database service,
ultimately contributing to improved application performance and end-user
experience. O11y practices span from basic best practices (logging, metrics,
alerting) to more advanced options specific to each type of database.
Logs (syslog)
Logging is a fundamental aspect of observability. Syslog is a widely used
standard for message logging in DBaaS solutions, providing a consistent
format for log messages and enabling the efficient management and analysis
of log data.
Metrics and events (Telegraf, other exporters)
Metrics are essential for monitoring the performance and health of a DBaaS
solution. Collecting and storing various metrics at regular intervals, such as
resource utilization, throughput, etc., provides ongoing insights into the entire
system’s overall performance.
Metrics
Datadog, for example, classifies metrics into two types: work and
resource metrics. The former help teams assess and intervene on the
performance and reliability of the system. They are broken out into four
subtypes:
• Throughput
A measure of capacity, this measures how much work a system can execute
within a specified amount of time.
• Success metrics
A measure of reliability, these measure the proportion of work that was
executed successfully without errors or issues.
• Error metrics
Another measure of reliability, these are measured separately from success
metrics to help isolate, diagnose and intervene on problems.
• Performance metrics
A measure of system responsiveness and efficiency, these are various metrics,
such as latency, which can be presented as an average or percentile.
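As a rough illustration of the four work-metric subtypes, the sketch below derives them from a list of request samples. The `work_metrics` function and its input format, pairs of HTTP status code and latency in seconds, are assumptions made for this example; the percentile uses the nearest-rank method.

```python
def work_metrics(requests, window_seconds):
    """Summarize (status_code, latency_seconds) pairs into the four work-metric subtypes."""
    if not requests:
        raise ValueError("no requests in the window")
    total = len(requests)
    successes = sum(1 for status, _ in requests if 200 <= status < 300)
    errors = sum(1 for status, _ in requests if 500 <= status < 600)
    latencies = sorted(latency for _, latency in requests)
    # Nearest-rank 90th percentile: rank = ceil(0.9 * total), using integer math.
    rank = (9 * total + 9) // 10
    return {
        "throughput_per_second": total / window_seconds,
        "success_percent": 100 * successes / total,
        "error_percent": 100 * errors / total,
        "p90_latency_seconds": latencies[rank - 1],
    }
```

Success and error percentages are computed separately, so responses that are neither (such as 4xx codes) do not inflate either figure.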
On the other hand, resource metrics focus on the underlying infrastructure’s
health and efficiency. Here are the key areas to consider when collecting
resource metrics:
• Utilization
A time- or capacity-based reliability measurement, these metrics can
indicate whether or not a resource is operating near or at its limits.
• Saturation
Measuring back-pressure, or the number of requests that haven’t been
serviced yet, these can indicate constraints and scalability issues.
Example work metrics: web server (at time 2016-05-24 08:13:01 UTC)
Subtype     | Description                                                  | Value
Throughput  | Requests per second                                          | 312
Success     | Percentage of responses that are 2xx since last measurement  | 99.1
Error       | Percentage of responses that are 5xx since last measurement  | 0.1
Performance | 90th percentile response time in seconds                     | 0.4

Example work metrics: data store (at time 2016-05-24 08:13:01 UTC)
Subtype     | Description                                                        | Value
Throughput  | Queries per second                                                 | 949
Success     | Percentage of queries successfully executed since last measurement | 100
Error       | Percentage of queries yielding exceptions since last measurement   | 0
Error       | Percentage of queries returning stale data since last measurement  | 4.2
Performance | 90th percentile response time in seconds                           | 0.02
Source: Datadog
• Errors
These measure internal errors that may not be immediately observable in
the resource’s output, allowing for proactive intervention.
• Availability
An accessibility measurement, these show the percentage of time that a
resource is responsive and able to fulfill requests.
Events
Unlike continuous metrics, events capture notable points in time, such as
changes and anomalies, that can provide essential context for diagnosis and
response. They are especially valuable because they pinpoint what happened
at a specific point in time and can be interpreted on their own. Here are some
examples of noteworthy events:
• Changes
Events related to code releases and builds provide insights into the
evolution of your software and can help track the impact of changes on
system behavior.
• Alerts
Alerts notify relevant parties when something requires immediate
attention.
• Scaling events
These help track resource provisioning and scaling activities.
Resource     | Utilization                                             | Saturation         | Errors                                     | Availability
Disk IO      | % time that device was busy                             | Wait queue length  | # device errors                            | % time writable
Memory       | % of total memory capacity in use                       | Swap usage         | N/A (not usually observable)               | N/A
Microservice | Average % time each request-servicing thread was busy   | # enqueued requests | # internal errors such as caught exceptions | % time service is reachable
Database     | Average % time each connection was busy                 | # enqueued queries | # internal errors, e.g. replication errors | % time database is reachable
Source: Datadog
What happened                         | Time                    | Additional info
Hotfix f464bfe released to production | 2016-04-15 04:13:25 UTC | Time elapsed: 1.2 seconds
Pull request 1630 merged              | 2016-04-19 14:22:20 UTC | Commits: ea72d6
Nightly data rollup failed            | 2016-04-27 00:03:18 UTC | Link to logs of failed job
Source: Datadog
Alerting
Implementing automated alerting helps monitor the DBaaS solution
continuously, detecting and notifying the relevant personnel of any anomalies
or issues that may require immediate attention. The key principles for effective
alerting are as follows:
• Page on symptoms, rather than causes
Alerts are meant for intervention, not diagnosis. An example of a useful
alert is “Two MySQL nodes are down.”
• Alert liberally; page judiciously
Not all alerts should result in immediate intervention; create a
tiered system based on their severity.
Following these principles will ultimately prevent alert fatigue and increase
the usefulness of your alerts.
Building your own monitoring solution is unnecessary. Instead, we will opt for a
specialized performance monitoring vendor, which could be an open-source or
commercial provider offering an agent-based solution. This approach allows us
to include a monitoring agent with each node on our platform.
For instance, Datadog is a suitable example as it supports the OpenTelemetry
framework. It can be seamlessly integrated with Dapr to transmit telemetry
data to a Datadog backend while also monitoring key metrics for hosts and
databases.
Data                          | Alert        | Trigger
Work metric: Throughput       | Page         | Value is much higher or lower than usual, or there is an anomaly
Work metric: Success          | Page         | Percentage of work that is successful drops below threshold
Work metric: Errors           | Page         | The error rate exceeds a threshold
Work metric: Performance      | Page         | Work takes too long to complete (performance violates SLA)
Resource metric: Utilization  | Notification | Approaching critical resource limit
Resource metric: Saturation   | Record       | Number of waiting processes exceeds a threshold
Resource metric: Errors       | Record       | Number of internal errors exceeds a threshold
Resource metric: Availability | Record       | Resource is unavailable longer than threshold
Event: Work-related           | Page         | Critical work that should have been completed is reported as failed or incomplete
Source: Datadog
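The tiers in the alert table above can be expressed as a simple routing map. This is a sketch: the `Urgency` enum, the key labels and the `route_alert` function are names chosen for this example, and defaulting unmapped data types to the lowest tier is an assumption rather than something the table prescribes.

```python
from enum import Enum


class Urgency(Enum):
    RECORD = "record"              # stored for later analysis, nobody is notified
    NOTIFICATION = "notification"  # sent via a non-interrupting channel
    PAGE = "page"                  # interrupts the on-call engineer


# Mapping taken from the alert table; keys are (category, subtype) labels
# chosen for this sketch.
ALERT_ROUTING = {
    ("work", "throughput"): Urgency.PAGE,
    ("work", "success"): Urgency.PAGE,
    ("work", "errors"): Urgency.PAGE,
    ("work", "performance"): Urgency.PAGE,
    ("resource", "utilization"): Urgency.NOTIFICATION,
    ("resource", "saturation"): Urgency.RECORD,
    ("resource", "errors"): Urgency.RECORD,
    ("resource", "availability"): Urgency.RECORD,
    ("event", "work-related"): Urgency.PAGE,
}


def route_alert(category, subtype):
    """Return the urgency for a data type, defaulting to RECORD when unmapped (assumption)."""
    return ALERT_ROUTING.get((category, subtype), Urgency.RECORD)
```

Keeping the mapping in one place makes it easy to review which conditions are allowed to page someone, which supports the "page judiciously" principle.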
Observability spotlight: database query performance
Database queries are a key influencer of database and resource performance;
therefore you should not forget to include their tracking in your observability
plan. To get you started, we’ve included a selection of common databases and
their tooling here:
• MySQL
MySQL’s query performance can be monitored using the Performance
Schema that provides detailed statistics on performance and resource
usage. It helps in identifying and troubleshooting performance bottlenecks.
Another useful tool is MySQL Enterprise Monitor, which offers real-time
monitoring, performance analysis, and security features specific to MySQL.
Additionally, the open-source Percona Monitoring and Management (PMM)
tool can be leveraged to gain insights into MySQL’s performance and
resource utilization.
• MariaDB
MariaDB’s query performance can be monitored using tools like the
Performance Schema and the Slow Query Log. These tools help identify
slow queries, track query execution times, and gather other performance-
related metrics. MariaDB also offers advanced observability features and
tools to ensure optimal database performance.
• PostgreSQL
PostgreSQL provides tools like the pg_stat_statements extension and the
built-in pg_stat_activity view for monitoring query performance. These
tools track query execution times, slow queries, and other performance
metrics. PostgreSQL also provides advanced observability tools and
practices tailored to its specific architecture. For example, pgBadger
analyzes PostgreSQL log files and generates detailed reports on database
performance.
• MongoDB
MongoDB offers a variety of tools to monitor query performance, like the
built-in MongoDB Database Profiler, which provides detailed information
about the execution of database operations. MongoDB Cloud Manager (formerly
the MongoDB Management Service, MMS) is also available and provides a web
interface for monitoring
performance metrics in real-time. It allows users to visualize slow queries
and aids in identifying potential bottlenecks in the system.
• Redis
Redis offers the MONITOR command and the INFO command, whose sections
such as commandstats and latencystats (the latter in Redis 7.0 and later)
help with monitoring query performance. INFO reports per-command call
counts and latency statistics, while MONITOR streams every command
processed by the server in real time, enabling users to detect
performance issues and bottlenecks.
Op 4 — Backup and recovery
Implementing a robust backup and recovery solution is of paramount
importance for any database infrastructure. An agent-based backup solution,
designed to be self-sustaining and independent, relies on a few key design
decisions to ensure its resilience. Here’s a breakdown of those decisions:
• Local persistent storage
Storing backup schedules and backup job configurations locally ensures
that your backup agent can function autonomously, even if the central
control plane becomes unavailable. This local storage provides resilience
and allows scheduled backups to continue without interruption.
• Encrypted credentials
Encrypting and storing credentials locally on the host is a security measure
that minimizes external dependencies. This approach mitigates the risk
associated with a remote secrets management solution and enhances data
security. In the event of a security breach on the control plane, only the
database credentials stored locally are potentially exposed, limiting the
impact of such an incident.
• Dedicated backup database user
The use of a dedicated backup database user with appropriate permissions
is crucial for the agent to execute backup and restore operations. This user
should have the necessary access to perform these tasks while minimizing
potential security risks.
• Flexibility in backup methods
The backup agent is designed to be flexible and versatile, capable of
supporting a range of different backup methods and parameters. This
adaptability allows it to cater to the diverse backup requirements of various
database technologies and open-source alternatives.
• Domain knowledge
In some cases, the agent might need to possess domain knowledge of the
specific database technology being backed up or restored. This expertise
ensures that the backup process is tailored to the intricacies of the
database system, optimizing the integrity and efficiency of the backups.
• Local embedded database (e.g., SQLite)
The use of a local embedded database, such as SQLite, for storing
schedules, job configurations, logs, and backup records, further enhances
the autonomy and resilience of the agent. This database provides a
reliable repository for critical information, even when the control plane is
unavailable.
• Data synchronization
To ensure data integrity and facilitate collaboration with other clients and
services in the platform, the agent periodically sends logs and records
back to the control plane. This synchronization process enables other
components of the system to access and utilize the collected data for
various purposes.
The backup agent achieves a level of self-sufficiency and independence that
is crucial for robust backup and recovery processes. It ensures that backup
operations continue seamlessly, even in the face of potential control plane
disruptions.
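The local embedded database and the periodic synchronization described above could look roughly like this. It is a minimal sketch using SQLite through Python's standard library; the table layout, the column names and the `send` callback that represents the control plane are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical local schema for jobs and backup records.
SCHEMA = """
CREATE TABLE IF NOT EXISTS backup_jobs (
    name TEXT PRIMARY KEY,
    frequency TEXT NOT NULL,
    method TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS backup_records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_name TEXT NOT NULL REFERENCES backup_jobs(name),
    status TEXT NOT NULL,
    finished_at TEXT NOT NULL,
    synced INTEGER NOT NULL DEFAULT 0
);
"""


def open_store(path=":memory:"):
    """Open (and if needed create) the agent's local store."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_backup(conn, job_name, status, finished_at):
    """Persist the outcome of a backup run locally."""
    with conn:
        conn.execute(
            "INSERT INTO backup_records (job_name, status, finished_at) VALUES (?, ?, ?)",
            (job_name, status, finished_at),
        )


def sync_pending(conn, send):
    """Push unsynced records via send(); mark each as synced only if send succeeds."""
    rows = conn.execute(
        "SELECT id, job_name, status, finished_at FROM backup_records "
        "WHERE synced = 0 ORDER BY id"
    ).fetchall()
    synced = 0
    for row in rows:
        try:
            send(row)
        except OSError:
            break  # control plane unreachable; keep records for the next attempt
        with conn:
            conn.execute("UPDATE backup_records SET synced = 1 WHERE id = ?", (row[0],))
        synced += 1
    return synced
```

Because records are only flagged as synced after a successful send, backups taken while the control plane is down are reported once it becomes reachable again.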
Data structure examples
A backup job for the agent could have the following structure:
Backup job schema:
Job name: # A unique name for the backup job.
Description: # An optional description of the job.
Schedule:
  Frequency: # How often the backup job runs (e.g., daily, weekly, monthly).
  Timing: # Specific time or timing window for the job (e.g., 2:00 AM UTC).
Retention policy: # How long backups are retained (e.g., 7 days, 30 days, indefinitely).
Source:
  Data source type: # Type of data or resource being backed up (e.g., file system, database, virtual machine).
  Source location: # Path or location of the data/resource to be backed up.