Galera is a MySQL replication technology that can simplify the design of a high availability application stack. With a true multi-master MySQL setup, an application can now read and write from any database instance without worrying about master/slave roles, data integrity, slave lag or other drawbacks of asynchronous replication. And that all sounds great until it’s time to go into production. Throw in a live migration from an existing database setup and devops life just got a bit more interesting ... So if you are in devops, then this webinar is for you! Operations is not so much about specific technologies, but about the techniques and tools you use to deploy and manage them. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider – preferably before going live. Let us guide you through 9 key tips to consider before taking Galera Cluster into production.
Data lakes have been built with a desire to democratize data - to allow more and more people, tools, and applications to make use of data. A key capability needed to achieve this is hiding the complexity of underlying data structures and physical data storage from users. The de facto standard, the Hive table format, addresses some of these problems but falls short at data, user, and application scale. So what is the answer? Apache Iceberg. The Apache Iceberg table format is now in use at, and contributed to by, many leading tech companies like Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS. Watch Alex Merced, Developer Advocate at Dremio, as he describes the open architecture and performance-oriented capabilities of Apache Iceberg. You will learn: • The issues that arise when using the Hive table format at scale, and why we need a new table format • How a straightforward, elegant change in table format structure has enormous positive effects • The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table’s underlying structure changes as CRUD operations are done on it • The resulting benefits of this architectural design
MaxScale uses an asynchronous and multi-threaded architecture to route client queries to backend database servers. Each thread creates its own epoll instance to monitor file descriptors for I/O events, avoiding locking between threads. Listening sockets are added to a global epoll file descriptor that notifies threads when clients connect, allowing connections to be distributed evenly across threads. This architecture improves performance over the previous single epoll instance approach.
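The per-thread event loop described above can be illustrated with a minimal Python sketch. It is not MaxScale code: the names `worker` and `run_workers` are hypothetical, `selectors.DefaultSelector` stands in for a raw epoll instance (it is epoll-backed on Linux), and the shared epoll descriptor for listening sockets is left out. It only shows the core idea that each thread monitors its own descriptors through a private selector, so no lock is needed around I/O monitoring.

```python
import selectors
import socket
import threading


def worker(sock, results, index):
    # Each worker owns a private selector (epoll-backed on Linux),
    # so no other thread ever touches its set of monitored descriptors.
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    chunks = []
    peer_open = True
    while peer_open:
        for key, _events in sel.select():
            data = key.fileobj.recv(4096)
            if data:
                chunks.append(data)
            else:
                # An empty read means the peer closed the connection.
                peer_open = False
    sel.unregister(sock)
    sock.close()
    sel.close()
    # Every worker writes only to its own slot, so no lock is required.
    results[index] = b"".join(chunks)


def run_workers(messages):
    # Hypothetical driver: one socket pair and one worker thread per message.
    results = [None] * len(messages)
    threads = []
    for i, msg in enumerate(messages):
        ours, theirs = socket.socketpair()
        t = threading.Thread(target=worker, args=(theirs, results, i))
        t.start()
        ours.sendall(msg)
        ours.close()
        threads.append(t)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_workers([b"SELECT 1", b"SELECT 2"]))
```

In a real router the workers would keep running and handle many connections each; here every worker exits once its single peer closes, which keeps the sketch short.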
The document discusses Apache Kudu, an open source storage layer for Apache Hadoop that enables fast analytics on fast data. Kudu is designed to fill the gap between HDFS and HBase by providing fast analytics capabilities on fast-changing or frequently updated data. It achieves this through its scalable and fast tabular storage design that allows for both high insert/update throughput and fast scans/queries. The document provides an overview of Kudu's architecture and capabilities, examples of how to use its NoSQL and SQL APIs, and real-world use cases like enabling low-latency analytics pipelines for companies like Xiaomi.
In this tutorial, we cover the different deployment possibilities of the MySQL architecture depending on the business requirements for the data. We also deploy some of these architectures and see how to evolve from one to the next. The tutorial covers the new MySQL solutions such as InnoDB ReplicaSet, InnoDB Cluster, and InnoDB ClusterSet.
In the first part of Galera Cluster best practices series, we will discuss the following topics: * ongoing monitoring of the cluster and detection of bottlenecks; * fine-tuning the configuration based on the actual database workload; * selecting the optimal State Snapshot Transfer (SST) method; * backup strategies (video:http://galeracluster.com/videos/2159/)
The Delta Architecture pattern has made the lives of data engineers much simpler, but what about improving query performance for data analysts? What are some common places to look when tuning query performance? In this session we will cover some common techniques to apply to our Delta tables to make them perform better for data analysts' queries. We will look at a few examples of how you can analyze a query and determine what to focus on to deliver better performance.
The document discusses running MariaDB across multiple data centers. It begins by outlining the need for multi-datacenter database architectures to provide high availability, disaster recovery, and continuous operation. It then describes topology choices for different use cases, including traditional disaster recovery, geo-synchronous distributed architectures, and how technologies like MariaDB Master/Slave and Galera Cluster work. The rest of the document discusses answering key questions when designing a multi-datacenter topology, trade-offs to consider, architecture technologies, and pros and cons of different approaches.
An introduction to SeaweedFS for beginners. SeaweedFS implements an object storage layer modeled after Facebook's Haystack paper and a filer layer, supports S3 APIs, and is compatible with the Hadoop file system.
Automatic Storage Management (ASM) provides a simple way to manage Oracle database files across disk storage. ASM uses disk groups and metadata to distribute data extents across disks for redundancy. Rebalancing operations redistribute extents to maintain even distribution as disks are added or removed, and the estimated time for rebalancing can be found in V$ASM_OPERATION. ASM supports different redundancy levels including external, normal, and high redundancy.
We will show how Galera Cluster executes DDLs in a safe, consistent manner across all the nodes in the cluster, and the differences with stand-alone MySQL. We will discuss how to prepare for and successfully carry out a schema upgrade and the considerations that need to be taken into account during the process.
Presentation on Apache Iceberg for the February 2021 St. Louis Big Data IDEA. Apache Iceberg is an open table format that works with engines such as Hive and Spark.
The document discusses performance aspects of etcd and the Raft consensus algorithm. It begins with an introduction to state machine replication techniques and why they are important for building highly available and consistent distributed systems. It then provides some tips for managing etcd clusters and developing applications using the Raft consensus package, including how compaction can impact performance and how to reduce the execution time of state machine operations. It also discusses an idea for optimization based on group commit.
Automated, Non-Stop MySQL Operations and Failover discusses automating master failover in MySQL to minimize downtime. The goal is to have no single point of failure by automatically promoting a slave as the new master when the master goes down. This is challenging due to asynchronous replication and the possibility that not all slaves have received the same binary log events from the crashed master. Differential relay log events must be identified and applied to bring all slaves to an eventually consistent state.
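The step of identifying differential relay log events can be sketched with a toy model. The assumptions are loud ones: each slave's relay log is represented as a plain Python list of event identifiers, every list is a prefix of the crashed master's event stream, and the function name `differential_events` is hypothetical. This is not how any particular failover tool parses relay logs; it only shows the logic of finding the most advanced slave and working out what each of the others is missing.

```python
def differential_events(relay_logs):
    """Return (latest_slave, missing) for a toy relay-log model.

    relay_logs maps a slave name to the ordered list of events it received.
    missing maps each slave name to the events it still has to apply in
    order to catch up with the most advanced slave.
    """
    # The slave that received the most events serves as the reference copy.
    latest = max(relay_logs, key=lambda name: len(relay_logs[name]))
    reference = relay_logs[latest]

    missing = {}
    for name, events in relay_logs.items():
        # Sanity check: a slave's log must be a prefix of the reference log.
        if reference[:len(events)] != events:
            raise ValueError(f"relay log of {name} diverges from {latest}")
        missing[name] = reference[len(events):]
    return latest, missing


if __name__ == "__main__":
    logs = {
        "slave1": ["e1", "e2", "e3"],
        "slave2": ["e1"],
        "slave3": ["e1", "e2"],
    }
    latest, missing = differential_events(logs)
    print(latest)   # slave1
    print(missing)  # {'slave1': [], 'slave2': ['e2', 'e3'], 'slave3': ['e3']}
```

Once every slave has applied its missing events, all of them hold the same data and any one can be promoted to master.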
MySQL performance can be improved by tuning queries, server options, and hardware. Traditionally this was the responsibility of three different roles: developers, DBAs, and system administrators. Now DevOps handles all of them. But there is a gap. Knowledge gained by MySQL DBAs after years of focusing on a single product is hard to gain when you focus on more than one. This is why I am doing this session. I will show a minimal but highly effective set of options to improve MySQL performance. For illustration, I will use real user stories from my Support experience and the Percona Kubernetes operators for PXC and MySQL.
Galera replication works by synchronizing data across multiple database servers so that any server can accept writes and all servers instantly reflect the new data. It uses global transaction IDs and group communication to replicate write sets in parallel to all nodes, ensuring consistency. Any node can join the cluster as long as it knows the cluster name and can find an active member to bootstrap from.
The document provides an overview of high availability and configuration management options for ProxySQL. It discusses deploying ProxySQL locally on application servers, in a dedicated layer, or using both approaches. When deploying in a dedicated layer, options for high availability include keepalived, load balancers, Consul, and Kubernetes. Configuration can be managed through tools like Ansible, Puppet, or by loading SQL files. ProxySQL Cluster enables syncing configuration across nodes.
This presentation aims to give an initial understanding of how MySQL/Galera works, along with some advice.
MySQL replication is a widely known and proven solution to build scalable clusters of databases. It is very easy to deploy, even easier with GTID. Easy deployment doesn't mean you don't need knowledge and skills to operate it correctly. If you'd like to learn what is needed to build a stable environment using MySQL replication, this webinar is for you. AGENDA 1. Sanity checks before migrating into MySQL replication setup 2. Operating system configuration 3. Replication 4. Backup 5. Provisioning 6. Performance 7. Schema changes 8. Reporting 9. Disaster recovery SPEAKER Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.
This document provides an overview and agenda for a webinar on managing and monitoring MySQL clusters using ClusterControl by Severalnines. The webinar host is introduced and instructions are provided for asking questions. The webinar will cover topics such as operating system configuration, backup strategies, replication, query performance, schema changes, security, reporting, and disaster recovery. Case studies and customers are also briefly mentioned.
RDS for MySQL provides a fully managed MySQL database in the cloud. It handles backups, provisioning, patching, and failover automatically. While convenient, RDS has some limitations like inability to choose database versions, limited control over maintenance windows, and downtime required for migrations or upgrades. Careful planning is needed for workloads with high availability or latency requirements. Overall RDS reduces DBA overhead but still requires expertise for design, tuning, and automation.
We continuously see great interest in MySQL load balancing and HAProxy, so we thought it was about time we organised a live webinar on the topic! Here is the replay of that webinar! As most of you will know, database clusters and load balancing go hand in hand. Once your data is distributed and replicated across multiple database nodes, a load balancing mechanism helps distribute database requests, and gives applications a single database endpoint to connect to. Instance failures or maintenance operations like node additions/removals, reconfigurations or version upgrades can be masked behind a load balancer. This provides an efficient way of isolating changes in the database layer from the rest of the infrastructure. In this webinar, we cover the concepts around the popular open-source HAProxy load balancer, and show you how to use it with your SQL-based database clusters. We also discuss HA strategies for HAProxy with Keepalived and Virtual IP. Agenda: * What is HAProxy? * SQL Load balancing for MySQL * Failure detection using MySQL health checks * High Availability with Keepalived and Virtual IP * Use cases: MySQL Cluster, Galera Cluster and MySQL Replication * Alternative methods: Database drivers with inbuilt cluster support, MySQL proxy, MaxScale, ProxySQL
This document discusses backup solutions for MySQL databases. It begins by defining logical and physical backups. For logical backups, it recommends mysqldump and mydumper/myloader. For physical backups, it recommends xtrabackup and snapshots. It provides details on using these tools and best practices like regular testing of backups. It gives examples of setups for on-premises, Amazon Web Services, and using ClusterControl for managing backups.
You’re running MySQL as a backend database; how do you tune it to make the best use of the hardware? How do you optimize the operating system? How do you best configure MySQL for a specific database workload? Do these questions sound familiar to you? Maybe you’re having to deal with that type of situation yourself? In this webinar, we discussed some of the settings that are most often tweaked and which can bring you significant improvement in the performance of your MySQL database. We also covered some of the variables which are frequently modified even though they should not be. Performance tuning is not easy, but you can go a surprisingly long way with a few basic guidelines. AGENDA Database tuning - the what and why; Principles of the tuning process; Tuning the Operating System configuration; Tuning the MySQL configuration; Useful tools (pt-summary, pt-mysql-summary); What to avoid when tuning OS and MySQL configuration. SPEAKER Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. This webinar is based on our popular blog series ‘Become a MySQL DBA’.
Verisure migrated their data warehouse from using Tungsten Replicator to native multi-source replication in MySQL 5.7 to simplify operations. They loaded data from production shards into the new data warehouse setup using XtraBackup backups and improved replication capacity with MySQL's parallel replication features. Some issues were encountered with replication lag reporting and crashes during the upgrade but most were resolved. Monitoring and management tools also required updates to support the new multi-source replication configuration.
In this webinar we cover one of the most basic, but essential tasks of the DBA: minor and major database upgrades in production environments. AGENDA What types of upgrades are there? How do I best prepare for the upgrades? Best practices for: minor version upgrades - MySQL & Galera; major version upgrades - MySQL & Galera. SPEAKER Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. This webinar builds upon recent blog posts and related webinar series by Krzysztof on how to become a MySQL DBA. To view all the blogs of the ‘Become a MySQL DBA’ series visit: http://www.severalnines.com/blog-categories/db-ops
There are many approaches to MySQL high availability - from traditional, loosely-coupled database setups based on asynchronous replication to more modern, tightly-coupled architectures based on synchronous replication. These offer varying degrees of protection, and DBAs almost always have to choose a trade-off between high availability and cost. In this webinar, we looked at some of the most widely used HA alternatives in the MySQL world and discussed their pros and cons. AGENDA - HA - what is it? - Caching layer - HA solutions • MySQL Replication • MySQL Cluster • Galera Cluster • Hybrid Replication - Proxy layer • HAProxy • MaxScale • Elastic Load Balancer (AWS) - Common issues • Split brain scenarios • GTID-based failover and Errant Transactions
This document discusses redundancy models for MySQL, MariaDB, MongoDB and TokuMX databases. It covers asynchronous replication used in MySQL replication and MongoDB/TokuMX compared to synchronous replication in Galera and NDB Cluster. The document then zooms into recovery procedures for Galera clusters and discusses how to prevent split-brain situations in multi-datacenter setups through the use of additional nodes and assigning node weights.
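To make the role of additional nodes and node weights concrete, here is a simplified sketch of a weighted majority check. It follows the general idea behind Galera's weighted quorum (a partition stays primary only if it holds more than half of the total weight of the previous membership, not counting nodes that left gracefully), but the function name `has_quorum`, the node names, and the data layout are hypothetical and the sketch ignores everything else the real group communication layer does.

```python
def has_quorum(weights, current_members, left_gracefully=()):
    """Return True if current_members hold a strict weighted majority.

    weights          maps every node of the last known membership to its weight
    current_members  names of the nodes still able to see each other
    left_gracefully  names of nodes that announced their departure and are
                     therefore not counted in the total
    """
    total = sum(w for name, w in weights.items() if name not in left_gracefully)
    present = sum(weights[name] for name in current_members)
    # Strict majority: exactly half of the weight is not enough.
    return 2 * present > total


if __name__ == "__main__":
    # Two data centers with two equally weighted nodes each.
    even = {"dc1-a": 1, "dc1-b": 1, "dc2-a": 1, "dc2-b": 1}
    # If the link between the sites fails, neither side has a majority.
    print(has_quorum(even, ["dc1-a", "dc1-b"]))  # False
    print(has_quorum(even, ["dc2-a", "dc2-b"]))  # False

    # An extra node at a third site breaks the tie.
    with_arbitrator = dict(even, arbitrator=1)
    print(has_quorum(with_arbitrator, ["dc1-a", "dc1-b", "arbitrator"]))  # True
    print(has_quorum(with_arbitrator, ["dc2-a", "dc2-b"]))                # False

    # Alternatively, give one node in the preferred site a higher weight.
    weighted = dict(even, **{"dc1-a": 2})
    print(has_quorum(weighted, ["dc1-a", "dc1-b"]))  # True
    print(has_quorum(weighted, ["dc2-a", "dc2-b"]))  # False
```

With an even split of weight across two sites, a network partition leaves both halves without a majority; either the extra node or the uneven weights guarantee that at most one side can continue accepting writes.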