This document discusses high availability strategies for MariaDB databases. It defines high availability as a system that is continuously operational for a desirably long period of time. It then examines different levels of availability based on uptime percentages and corresponding downtime. Various high availability components are described, including data redundancy, failover/switchover solutions, and monitoring. Asynchronous and synchronous replication techniques are summarized, along with how MaxScale can implement read/write splitting and failover automation. The benefits and limitations of Galera cluster synchronous replication are also provided.
M|18 Choosing the Right High Availability Strategy for You (MariaDB plc)
Learn strategies to maintain your database's high availability even during peak use periods. MariaDB's Field CTO Max Mether offers best practices for high availability, disaster recovery and more.
This document discusses MariaDB high availability strategies including replication, failover, and clustering. It defines key HA terminology and describes different replication topologies like asynchronous, semi-synchronous, and synchronous replication using Galera cluster. Use cases provided show how geographically distributed and production control systems benefit from MariaDB HA features.
3. High Availability: HOW MANY 9s?
1 year has 525,949 minutes.
N x 9s of availability means this much downtime per year:
• 99% → 5,259.49 minutes (~88 hours)
• 99.9% → 525.95 minutes (~9 hours)
• 99.99% → 52.6 minutes
• 99.999% → 5.3 minutes
Weekly 15-minute maintenance windows alone add up to 780 minutes, or 13 hours, per year.
4. Uptime, Downtime, 9s
Availability and HIGH Availability:
• 90% -> 36.5 days/year or 72 hours/month
• 99% -> 3.65 days/year or 7.2 hours/month
• 99.9% -> 8.76 hours/year or 43.8 minutes/month
• 99.99% -> 52.56 minutes/year or 4.38 minutes/month
• 99.999% -> 5.26 minutes/year or 25.9 seconds/month
• 99.9999% -> 31.5 seconds/year or 2.59 seconds/month
Availability = uptime / (uptime + downtime)
Source: http://en.wikipedia.org/wiki/High_availability
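The "9s" tables above follow directly from the availability formula. A minimal sketch (function and constant names are mine, not from the slides), using the 525,949 minutes-per-year figure from the earlier slide:

```python
# Derive the yearly downtime budget for a given availability level.
# MINUTES_PER_YEAR matches the slide's figure (365.24 days).
MINUTES_PER_YEAR = 525_949

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowed downtime per year for a given availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR

for availability in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{availability} -> {downtime_minutes_per_year(availability):,.2f} min/year")
# 0.99    -> 5,259.49 min/year (~88 hours)
# 0.9999  ->    52.59 min/year
```

Note that each extra 9 cuts the downtime budget by a factor of ten, which is why every step up usually requires a qualitatively different architecture.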
5. Approach to HA: Strategies for High Availability
1. Backup / Restore: < 99.9%
2. Simple replication / manual failover: ~ 99.9%
3. Replication / automatic failover: ~ 99.99%
4. 3-node Galera Cluster: ~ 99.999%
5. Other
6. An average of 80 percent of mission-critical application service downtime is directly caused by people or process failures. The other 20 percent is caused by technology failure, environmental failure or a disaster.
Source: Gartner Research
7. High Availability Background
• High Availability isn't always equal to long uptime
– A system can be "up" but not accessible
– A system that is "down" just once, but for a long time, is NOT highly available
• High Availability rather means
– Long Mean Time Between Failures (MTBF)
– Short Mean Time To Recover (MTTR)
• High availability is a system design protocol and associated implementation that ensures a certain degree of operational continuity during a given measurement period.
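The MTBF/MTTR framing above maps onto the standard steady-state availability model, availability = MTBF / (MTBF + MTTR). A small sketch (the model and numbers are illustrative, not from the slides):

```python
# Steady-state availability from mean time between failures (MTBF)
# and mean time to recover (MTTR), both in the same unit (hours here).

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up, in the long run."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A system failing roughly monthly (MTBF ~730 h) but recovering in 5 minutes
# beats a system failing yearly but staying down for two days:
frequent_fast = availability(730, 5 / 60)   # ~0.99989
rare_slow = availability(8760, 48)          # ~0.99455
```

This is why the slide stresses short MTTR: automatic failover mostly buys availability by shrinking MTTR, not by preventing failures.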
8. High Availability Components
• Monitoring and Management
– Availability of the services needs to be monitored, to be able to take action when there is a failure or even to prevent one
– A failover can be manual or automatic, but it has to be managed
• Failover or Switchover Solution
– Some mechanism to redirect traffic from the failed server or datacenter to a working one
• Data Redundancy
– For resilient services, we need to make sure that data is redundant
– Note: availability solutions do not replace backups
9. High Availability Components
High availability is a system design protocol and associated implementation that ensures a certain degree of operational continuity during a measurement period.
• Data Redundancy: for stateful services, we need to make sure that data is made redundant. It is not a replacement for backups!
• Failover or Switchover Solution: some mechanism to redirect traffic from the failed server or datacenter to a working one.
• Monitoring and Management: availability of the services needs to be monitored, to take action when there is a failure or even to prevent one.
11. General Terms
• Single Point of Failure (SPOF)
– An element is a SPOF when its failure results in a full stop of the service because no other element can take over (storage, WAN connection, replication channel)
– It is important to evaluate the cost of eliminating the SPOF, the likelihood that it fails, and the time required to bring it back into service
• Downtime
– The period of time a service is down, whether planned or unplanned. Planned downtime is part of the overall availability
• Shared vs. Local Storage
– Shared storage systems like SANs can provide built-in high availability, though this comes with equally high costs
– Not really suitable for disaster recovery scenarios across multiple data centers
– Local storage comes at low cost, but we need to implement replication/mirroring ourselves
12. General Terms
• Switchover
– When a manual process is used to switch from one system to a redundant or standby system in case of a failure
• Failover
– Automatic switchover, without human intervention
• Failback
– An (often underestimated) task: handling the recovery of a failed system and how to fail back to this system after recovery
14. HA Begins with Data Replication
• Replication enables data from one MariaDB server (the master) to be replicated to one or more MariaDB servers (the slaves).
• MariaDB Replication is:
– very easy to set up
– used to scale out read workloads
– a first level of high availability and geographic redundancy
– a way to offload backups and analytic jobs
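A minimal sketch of what "very easy to set up" looks like in practice (host names, the `repl` user, and the binlog coordinates are placeholders, not values from the slides; statement syntax per the MariaDB replication documentation):

```sql
-- On the master: create a replication account.
CREATE USER 'repl'@'%' IDENTIFIED BY '...';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- On the slave: point it at the master's binary log and start replicating.
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;

-- Verify: Slave_IO_Running and Slave_SQL_Running should both be "Yes".
SHOW SLAVE STATUS\G
```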
15. Replication Scheme
• Asynchronous Replication
– The master does not wait for the slaves: the master writes events to its binary log and slaves request them when they are ready
• Semi-Synchronous Replication
– The master does not confirm transactions to the client application until at least one slave has copied the change to its relay log and flushed it to disk
• Synchronous Replication
– All nodes are masters and applications can read and write from any node
16. Asynchronous Replication
• MariaDB Replication is asynchronous by default.
• The slave determines how much to read and from which point in the binary log.
• The slave can be behind the master in reading and applying changes.
• If the master crashes, transactions might not have been transmitted to any slave.
• Asynchronous replication is great for read scaling, as adding more replicas does not impact replication latency.
17. Asynchronous Replication: Switchover
1. The master server is taken down, or our monitoring detects a fault
2. The slave server is updated to the last position in the relay log
3. The clients are pointed at the designated slave server
4. The designated slave server becomes the master server
All steps are manual.
[Diagrams: master with read-only slaves, before and after the switchover]
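Step 2 above implies picking the most up-to-date slave. A sketch of that selection (the data shapes are hypothetical, mirroring the `Master_Log_File` and `Read_Master_Log_Pos` fields of `SHOW SLAVE STATUS`; this is not a MariaDB API):

```python
# Pick the slave that has read furthest in the master's binary log.
# Tuples compare lexicographically, so (binlog file, position) ordering
# works because binlog file names are zero-padded.

def best_candidate(slaves: dict[str, tuple[str, int]]) -> str:
    """Return the name of the slave whose (binlog file, position) is furthest ahead."""
    return max(slaves, key=lambda name: slaves[name])

slaves = {
    "slave-1": ("mysql-bin.000042", 1024),
    "slave-2": ("mysql-bin.000042", 4096),  # same file, furthest position
    "slave-3": ("mysql-bin.000041", 9999),  # older binlog file
}
print(best_candidate(slaves))  # slave-2
```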
19. Semi-synchronous Replication
• MariaDB supports semi-synchronous replication:
– The master does not confirm transactions to the client application until at least one slave has copied the change to its relay log and flushed it to disk.
– Only after the events have been written to the relay log and flushed does the slave acknowledge receipt of a transaction's events.
– Semi-synchronous replication is a practical solution for many cases where high availability and no data loss are important.
– When a commit returns successfully, it is known that the data exists in at least two places (on the master and on at least one slave).
– Semi-synchronous replication has a performance impact due to the additional round trip.
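As a sketch, enabling this in MariaDB comes down to a few configuration settings (variable names per the MariaDB semi-sync documentation; the timeout value is illustrative, not from the slides):

```ini
# On the master (my.cnf fragment):
[mariadb]
rpl_semi_sync_master_enabled = ON
rpl_semi_sync_master_timeout = 10000   # ms to wait for a slave ack before falling back to async

# On each semi-sync slave:
# [mariadb]
# rpl_semi_sync_slave_enabled = ON
```

The timeout is the safety valve behind the performance trade-off mentioned above: if no slave acknowledges in time, the master silently reverts to asynchronous replication rather than blocking commits indefinitely.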
20. MariaDB Enhanced Semi-synchronous Replication
• One or more slaves can be defined as working semi-synchronously.
• For these slaves, the master waits until the I/O thread on one or more of the semi-sync slaves has flushed the transaction to disk.
• This ensures that all committed transactions are at least stored in the relay log of a slave.
• Standard semi-synchronous replication would commit the transaction before it gets the acknowledgement of the binlog event from a slave.
21. Semi-synchronous Replication: Switchover
• The steps for a failover are the same as with standard replication,
• but in step 2, the slave should be chosen among those (if there are many) that are semi-synced with the master.
[Diagrams: master with a semi-sync slave and async slaves, before and after the switchover]
22. Semi-Sync Replication Topologies
• Semi-synchronous replication is used between the master and a backup master.
• Semi-sync replication has a performance impact, but the risk of data loss is minimized.
• This topology works well when performing master failover:
– The backup master acts as a warm-standby server
– It has the highest probability of having up-to-date data compared to the other slaves
[Diagram: master replicating semi-sync to a read-only backup master and asynchronously to read-only slaves]
23. MariaDB Multi-Source Replication
• It enables a slave to receive transactions from multiple sources simultaneously.
• It can be used to back up multiple servers to a single server, to merge table shards, and to consolidate data from multiple servers on a single server.
[Diagram: Master 1, Master 2 and Master 3 all replicating to one Slave]
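A sketch of the setup on the slave (host names and credentials are placeholders; MariaDB multi-source replication uses a named connection per master, per the MariaDB documentation):

```sql
-- One named replication connection per master:
CHANGE MASTER 'master1' TO MASTER_HOST='m1.example.com', MASTER_USER='repl', MASTER_PASSWORD='...', MASTER_USE_GTID=slave_pos;
CHANGE MASTER 'master2' TO MASTER_HOST='m2.example.com', MASTER_USER='repl', MASTER_PASSWORD='...', MASTER_USE_GTID=slave_pos;

START ALL SLAVES;
SHOW ALL SLAVES STATUS\G
```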
24. Synchronous Replication (Galera)
• Galera Replication is a synchronous multi-master replication plug-in that enables a true master-master setup for InnoDB.
• Every component of the cluster (node) is a shared-nothing server.
• All nodes are masters and applications can read and write from any node.
• A minimal Galera cluster consists of 3 nodes:
– A proper cluster needs to reach a quorum (i.e. the majority of the nodes of the cluster)
• Transactions are synchronously committed on all nodes.
[Diagram: 3-node MariaDB Galera cluster]
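For reference, a minimal sketch of a Galera node's configuration (addresses, cluster name, and the provider path are placeholders; option names per the MariaDB/Galera documentation):

```ini
[mariadb]
# Galera requires row-based binlog events and InnoDB:
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2

wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name = my_cluster
wsrep_cluster_address = gcomm://10.0.0.1,10.0.0.2,10.0.0.3
wsrep_node_address = 10.0.0.1
```

The first node is bootstrapped (e.g. with `galera_new_cluster`) and the others join via `wsrep_cluster_address`; with 3 nodes, any single failure still leaves a quorum of 2.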
25. Synchronous Replication (Galera)
• PROS
– A high availability solution with synchronous replication, failover and resynchronization
– No loss of data
– All servers have up-to-date data (no slave lag)
– Read scalability
– 'Pretty good' write scalability
– High availability across data centers
26. Synchronous Replication (Galera)
• CONS
– It only supports InnoDB
– The transaction rollback rate, and hence the transaction latency, can increase with the number of cluster nodes
– The cluster performs as its least performant node: an overloaded master affects the performance of the whole Galera cluster
28. MDBE Cluster Failover
• Clustered nodes cooperate to remain in sync
• With multiple master nodes, reads and updates both scale*
• Synchronous replication with optimistic locking delivers high availability with little overhead
• Fast failover because all nodes remain synchronized
[Diagram: application / app server connecting through load balancing and failover to a 3-node MariaDB cluster]
29. MaxScale Use Case: Galera Cluster + R/W Split Routing
• MDBE Cluster with synchronous replication
• Each application server uses only 1 connection
• MaxScale selects one node as "master" and the other nodes as "slaves"
• If the "master" node fails, a new one can be elected immediately
[Diagram: application servers connect through MaxScale to the Galera cluster]
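A sketch of a MaxScale configuration for this use case (server addresses, ports, and credentials are placeholders; the `galeramon` and `readwritesplit` module names are per the MaxScale documentation, and details vary by MaxScale version):

```ini
[node1]
type=server
address=10.0.0.1
port=3306
protocol=MySQLBackend

[Galera-Monitor]
type=monitor
module=galeramon
servers=node1
user=maxscale
password=...

[RW-Split-Service]
type=service
router=readwritesplit
servers=node1
user=maxscale
password=...

[RW-Split-Listener]
type=listener
service=RW-Split-Service
protocol=MySQLClient
port=4006
```

Applications connect to port 4006 only; `galeramon` labels one synced node as "master" for writes and routes reads to the others.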
30. MaxScale Use Case: Master/Slaves + R/W Split Routing
• Master/slaves asynchronous replication
• MaxScale monitors a MariaDB master/slaves topology
[Diagram: application servers connect through MaxScale to the master and slaves]
32. MaxScale Use Case: Master/Slaves + R/W Split Routing
Master/slaves asynchronous replication:
1. Master failure
2. The MaxScale Monitor detects the master_down event
[Diagram: the monitor raises a master_down event that can trigger a failover script]
33. MaxScale Use Case: Master/Slaves + R/W Split Routing
Master/slaves asynchronous replication:
1. Master failure
2. The MaxScale Monitor detects the master_down event
3. If configured, MaxScale launches a failover script that promotes a slave as the new master
[Diagram: the master_down event triggers the failover script, which promotes a slave to master]
35. MaxScale Use Case: Master/Slaves + R/W Split Routing
Master/slaves asynchronous replication:
1. Master failure
2. The MaxScale Monitor detects the master_down event
3. If configured, MaxScale launches a failover script that promotes a slave as the new master
4. The MaxScale monitor automatically detects the new replication topology after the switch
[Diagram: application servers connect through MaxScale to the new master and remaining slaves]
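The monitor-plus-script flow above can be sketched in MaxScale configuration terms (the script path, server list, and credentials are placeholders; the `script` and `events` monitor parameters are per the MaxScale documentation):

```ini
[MySQL-Monitor]
type=monitor
module=mysqlmon
servers=master1,slave1,slave2
user=maxscale
password=...
# Launch an external failover script when the master goes down:
script=/usr/local/bin/failover.sh
events=master_down
```

The script itself performs the promotion (e.g. via `CHANGE MASTER TO` on the surviving slaves); the monitor then re-detects the topology on its next pass.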
36. MariaDB HA: MaxScale
• Re-routes traffic between master and slave(s)
• Does not manage servers
• Failover / slave promotion is an external process
Binary Log Server:
• Implemented for Booking.com
• Part of a future MaxScale release
• All slaves are in sync, so it is easy to promote any slave
[Diagram labels: Read/Write Splitter, Detects Active Master, Binary Log Server]
37. HA / Scalability with MaxScale 2.1 (Sneak Peek)
New in MaxScale 2.1:
• Aurora Cluster Monitor
• Multi-master and failover mode for the MySQL Monitor
• Read-write splitting with master pinning
Existing in MaxScale 2.0:
• Transaction scaling to support user growth and simplify applications: MariaDB Master/Slave and MariaDB Galera Cluster
– Load balancing
– Database-aware dynamic query routing
– Traffic-profile-based routing
• Replication scaling to support web-scale applications' user base: Binlog Server for horizontal scaling of slaves in a Master/Slave architecture
• Multi-tenant database scaling to transparently grow tenants and data volume: schema sharding
• Connection rate limitation