Nine DevOps Tips For Going Into 
Production With Galera Cluster 
November 11, 2014 
Johan Andersson & Jean-Jérôme Schmidt 
Severalnines
! I'm Jean-Jérôme and I’ll be your host for today's webinar 
! Feel free to ask any questions in the Questions 
section of this application or via the Chat box 
! You can also contact me directly via the chat 
box or via email during or 
after the webinar 
Copyright Severalnines AB
In a nutshell 
Copyright 2012 Severalnines AB 
Manage Scale 
Deploy Monitor
Supported Databases 
! MariaDB Cluster 
! MySQL Galera Cluster 
! Percona XtraDB Cluster 
! MySQL Cluster (NDB) 
! MySQL Replication 5.6 
! Standalone MySQL/MariaDB 
! MongoDB Sharded Cluster 
! MongoDB Replica Set 
! TokuMX Cluster 
! 101 Sanity Check 
! Operating System 
! Backup Strategies 
! Galera Recovery 
! Query Performance 
! Schema Changes 
! Security / Encryption 
! Reporting 
! Protecting from Disasters 
Copyright Severalnines AB
#1 – 101 Sanity Check (1/4) 
! Ensure ALL tables are InnoDB or XtraDB 
! InnoDB supports FULLTEXT indexes as of MySQL 5.6 
! Ensure ALL tables have a PRIMARY KEY 
! If no PRIMARY KEY is defined you can do: 
! Ensure you have NO unbounded queries 
! E.g. UPDATE table SET x=x+1 (when the table has many rows) 
! Update/delete in smaller batches (e.g. 1000 records). 
! Better support for huge (unbounded) queries is in the pipeline 
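The batching advice above can be sketched as follows. This is a minimal illustration, not something shown in the slides: it assumes an integer PRIMARY KEY column called `id` on a hypothetical table `t1`, and the helper names `pk_batches` and `batched_updates` are made up for the example. It only generates key ranges and statements; running them is left to your MySQL client.

```python
from typing import Iterator, Tuple


def pk_batches(min_id: int, max_id: int, batch_size: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) primary-key ranges covering min_id..max_id."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    low = min_id
    while low <= max_id:
        high = min(low + batch_size - 1, max_id)
        yield low, high
        low = high + 1


def batched_updates(min_id: int, max_id: int, batch_size: int = 1000) -> Iterator[str]:
    """Split one unbounded UPDATE into many small ones.

    The table t1 and the columns id and x are hypothetical placeholders.
    """
    for low, high in pk_batches(min_id, max_id, batch_size):
        yield f"UPDATE t1 SET x = x + 1 WHERE id BETWEEN {low} AND {high}"
```

Each generated statement is meant to be committed as its own transaction, so that every write set replicated by Galera stays small.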
#1 – 101 Sanity Check (2/4) 
! Ensure that the application can tolerate non-sequential 
auto increments. 
! Redirect deadlock-prone update queries on hot tables 
and rows to one of the Galera nodes: 
! E.g. UPDATE counter_tbl SET counter = counter +1; 
! Use wsrep_sst_method=xtrabackup-v2 
#1 – 101 Sanity Check (3/4) 
! WAN environment? 
! [Note] WSREP: (0b066c90-d4fb-11e1-0800-96b2cf43aaf6, 'tcp://') turning message relay requesting on, nonlive 
peers: tcp:// 
[Note] WSREP: (0b066c90-d4fb-11e1-0800-96b2cf43aaf6, 'tcp://') reconnecting to df4e387f-d4e2- 
11e1-0800-2e6080299165 (tcp://, attempt 0 
! Increase timeouts 
! wsrep_provider_options=‘evs.keepalive_period=PT3S; 
! This relaxes how quickly a node is evicted from the cluster. 
! The default values are usually fine for networks with a ping time of 
<10-15 ms. 
#1 – 101 Sanity Check (4/4) 
! There is no reason to use any 5.5-based MySQL variants 
! Use MySQL 5.6 / MariaDB 10.x 
! Lots of bug fixes 
! Performance optimizations 
! Write-intensive workloads 
! Query optimizer enhancements 
! A good foundation for later upgrading to 5.7-based variants 
! Use Galera Version 3.x series 
#2 – Operating System (1/2) 
! Swapping 
! echo "1" > /proc/sys/vm/swappiness 
! NUMA on Multi-socket 
! Can lead to contention and strange lock ups 
! Is it enabled: 
dmesg | grep -i numa 
! Grub boot option "numa=off" 
! … and other possibilities 
! Filesystem 
! Reduce writes by mounting with 
! noatime 
#2 – Operating System (2/2) 
! In virtualized environments it is easy to over-commit 
resources on a single host. 
! Keep track of the host running the VMs 
! Is it heavily loaded? 
! CPU Steal (check on the VMs)? 
! Is it swapping? 
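CPU steal can be estimated inside a VM from two samples of the aggregate `cpu` line in `/proc/stat`. The sketch below is an illustration added here, not part of the slides; the function names are made up, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal, ...).

```python
def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first (aggregate) line of /proc/stat."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line: str) -> list:
    """Parse the aggregate 'cpu' line into a list of integer jiffy counters."""
    fields = line.split()
    if len(fields) < 9 or fields[0] != "cpu":
        raise ValueError("expected an aggregate 'cpu' line with a steal field")
    return [int(v) for v in fields[1:]]


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    # Guest time is already included in user time, so only the first
    # eight counters (user .. steal) are summed for the total.
    total = sum(b[:8]) - sum(a[:8])
    if total <= 0:
        return 0.0
    return 100.0 * (b[7] - a[7]) / total
```

In practice you would call `read_cpu_line()` twice, a few seconds apart, and pass both lines to `steal_percent`. A persistently high value suggests the host is over-committed.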
#3 – Backup 
! Percona XtraBackup 
! Online consistent backup 
! Full and Incremental backups 
! Possible to back up individual databases and tables when 
innodb_file_per_table is used. 
! Parallelism & compression & encryption 
! mysqldump 
! Use with --single-transaction, consistent / online for InnoDB tables 
! May require tweaking of innodb_old_blocks_time and 
innodb_old_blocks_pct (default values in 5.6 are quite good). 
! S3 / Glacier or Swift can be used for offline/offsite storage 
#4 – Galera Recovery (IST) (1/3) 
! IST (Incremental State Transfer) is faster than SST (Snapshot 
State Transfer). 
! Each Galera node has a cache, gcache. 
! Stores committed write sets 
! Circular buffer 
! If a node is down (crash, maintenance window) and then 
becomes a JOINER: 
! It sends the ID of its last applied write set to the DONOR 
! DONOR checks if it can send the next events from the gcache 
! Yes == IST (fast) 
! No == SST (slow). E.g. 3TB of data is no fun to SST.
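The donor's decision above can be modelled in a few lines. This is a deliberately simplified sketch, not Galera's actual implementation: the function name is made up, and it assumes the oldest write set still held in the donor's gcache is known (Galera reports this as the `wsrep_local_cached_downto` status variable).

```python
def state_transfer_method(joiner_last_applied: int, donor_cached_downto: int) -> str:
    """Return 'IST' if the donor's gcache still holds what the joiner needs, else 'SST'.

    joiner_last_applied: seqno of the last write set the joiner applied
                         (a negative value means the position is unknown).
    donor_cached_downto: lowest seqno still present in the donor's gcache.
    """
    if joiner_last_applied < 0:
        # Unknown position: nothing to continue from, so a full snapshot is needed.
        return "SST"
    next_needed = joiner_last_applied + 1
    # The gcache is a circular buffer: anything older than donor_cached_downto
    # has already been overwritten and cannot be replayed.
    return "IST" if donor_cached_downto <= next_needed else "SST"
```

For example, a joiner that stopped at seqno 1000 can receive an IST from a donor whose gcache reaches back to seqno 900, but not from one whose gcache starts at 5000.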
#4 – Galera Recovery (IST) (2/3) 
! Dimension the gcache. Example to handle a 
maintenance window of 6 hours: 
! Writes to cluster per second: 1 MB/s 
! Maintenance window (seconds) = 6 hours x 60 x 60 = 21600 s 
! gcache size = 1 MB/s x 21600 s = 21600 MB = ~21 GB 
! Multiply by 1.5x or 2x to have margins (2x here): 
! gcache.size=42G 
! wsrep_provider_options='gcache.size=42G;…' 
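The sizing arithmetic on this slide can be written as a small helper. It is a sketch for illustration; the function names are made up, and the result is rendered in megabytes so that no rounding to whole gigabytes is needed.

```python
import math


def gcache_size_bytes(write_rate_bytes_per_s: float, window_hours: float, margin: float = 2.0) -> int:
    """Bytes of gcache needed to cover a maintenance window, including a safety margin."""
    window_seconds = window_hours * 60 * 60
    return int(write_rate_bytes_per_s * window_seconds * margin)


def gcache_option(size_bytes: int) -> str:
    """Render the size as a gcache.size setting in whole megabytes (rounded up)."""
    return f"gcache.size={math.ceil(size_bytes / 2**20)}M"
```

With the slide's numbers (1 MB/s of writes, a 6 hour window, 2x margin), `gcache_option(gcache_size_bytes(2**20, 6))` gives `gcache.size=43200M`, which is roughly the 42G used above.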
#4 – Galera Recovery (IST) (3/3) 
! How much do you write to the 
Galera Cluster? 
! Look at the sum of 
and get the rate between 
two points in time. 
! Here we can see that a node writes 
109575 + 33758 bytes/second 
= ~140 KB/s 
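The rate calculation can be sketched as below. The counter names did not survive in the slide text, so this example assumes the two Galera status counters `wsrep_replicated_bytes` and `wsrep_received_bytes`, sampled twice with SHOW GLOBAL STATUS; the function name and the dictionary layout are made up for illustration.

```python
# Assumed counters; the slide's original names are missing from the text.
COUNTERS = ("wsrep_replicated_bytes", "wsrep_received_bytes")


def write_rate_bytes_per_s(first: dict, second: dict, seconds_between: float) -> float:
    """Average bytes/second written through the node between two status samples."""
    if seconds_between <= 0:
        raise ValueError("samples must be taken at two different points in time")
    delta = sum(int(second[name]) - int(first[name]) for name in COUNTERS)
    return delta / seconds_between
```

Two samples taken 60 seconds apart whose counters grew by 6574500 and 2025480 bytes give 143333 bytes/second, which matches the ~140 KB/s on the slide.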

#4 – Galera Recovery (SST) 
! Two pitfalls to be avoided with SSTs: 
! wsrep_sst_method=rsync 
! you may have to change in /usr/bin/wsrep_sst_rsync: 
! timeout = 300 -> 3600 (or bigger) 
! Otherwise the rsync daemons may time out when initializing the SST. 
! wsrep_sst_method=xtrabackup[-v2] 
! Uses mysql tmpdir by default. 
! If tmpdir is too small SST may fail on the donor. The transaction 
log simply does not fit. 
! You can set in my.cnf: 
! How big do I need tmpdir to be? [KB/s written to node] x [backup 
time]. Similar to the gcache sizing.
#5 – Query Performance (1/5) 
! A number of things to watch out for: 
! Badly written queries or missing indexes 
! DML locking many records (BEGIN; SELECT * FROM t1 FOR UPDATE; 
… ) 
! DML updating/deleting many records in one chunk 
! wsrep_max_ws_rows/wsrep_max_ws_size set upper limits 
! Update/delete “small” batches of 1000-10000 records. Do 
not update 100000 records in one go. 
! Deadlocks and deadlock-prone code 
! E.g. running two mysqldumps at the same time 
! Updating the very same record in a very hot table from 
multiple threads on multiple hosts 
! Use your favorite tool to detect the problems 
#5 – Query Performance (2/5) 
! When Performance grinds to a halt you want to know! 
#5 – Query Performance (3/5) 
! You may want to have an Alarm notification (in 1.2.9) 
#5 – Query Performance (4/5) 
! If a deadlock happens, you want to tell your developers 
(in 1.2.9) 
#5 – Query Performance (5/5) 
! And see if Galera is clogged up 
#6 – Schema Changes (1/4) 
! Consider an upgrade from schema version V1 to version V2 
! There are two principal types of schema changes that can 
be performed: 
! Compatible 
! Application(s) written for V1 continue to work against V2 
! Upgrade schema first, then applications 
! Incompatible 
! Application(s) written for V1 cannot use V2 
! Must upgrade applications first to support V2, then upgrade 
schema to V2 
#6 – Schema Changes (2/4) 
! Galera supports multiple ways for upgrading schema 
from V1 to V2 
! Total Order Isolation (TOI) 
! wsrep_osu_method=TOI 
! Rolling Schema Upgrade (RSU) 
! wsrep_osu_method=RSU 
! Desynching nodes (not covered here), but check out 
#6 – Schema Changes (3/4) 
! Total Order Isolation (TOI) 
! Default method 
! Executed in the same order with respect to other transactions on all 
Galera nodes. 
! Cluster behaves like a single MySQL server 
! OK for non-copying ALTER TABLE, for tiny, seldom-used tables 
(100s of records), or if application traffic is disabled. 
! Not OK for copying ALTERs, since the table is LOCKED. 
! May wreak havoc 
#6 – Schema Changes (4/4) 
! Rolling Schema Upgrade (RSU) 
! DDL is not replicated 
! Executed on one node at a time: 
! node1> SET GLOBAL wsrep_OSU_method=RSU; 
node1> ALTER TABLE …. 
// check that node1 is SYNCED 
node2> SET GLOBAL wsrep_OSU_method=RSU; 
node2> ALTER TABLE …. 
! The change MUST be a compatible schema change. 
! E.g. ALTER TABLE .. DROP COLUMN will wreak havoc. 
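The node-by-node order of an RSU can be made explicit with a small plan generator. This is only a sketch: the function name is made up, the DDL is whatever compatible change you pass in, and two steps are additions not shown on the slide (switching back to TOI afterwards, and querying the `wsrep_local_state_comment` status variable as the way to check that the node is synced).

```python
from typing import List, Tuple


def rsu_plan(nodes: List[str], ddl: str) -> List[Tuple[str, str]]:
    """Return (node, statement) steps that apply ddl to one node at a time."""
    steps = []
    for node in nodes:
        steps.append((node, "SET GLOBAL wsrep_OSU_method=RSU"))
        steps.append((node, ddl))
        # Assumption: restore the default method once the DDL has finished.
        steps.append((node, "SET GLOBAL wsrep_OSU_method=TOI"))
        # Assumption: the operator waits for 'Synced' before moving on.
        steps.append((node, "SHOW STATUS LIKE 'wsrep_local_state_comment'"))
    return steps
```

The plan is only a checklist; the operator (or a script) still has to wait for each node to report that it is synced before starting on the next one.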
#7 – Security / Encryption (1/2) 
! Encrypt replication links between Galera nodes. 
Especially in public cloud environments / WAN setups. 
! Create a certificate and a key 
! wsrep_provider_options='socket.ssl_cert=galera_rep.crt; 
socket.ssl_key=galera_rep.key;<rest of wsrep provider 
! Requires a complete stop of all nodes. 
! Don’t forget to set this option for the garbd (arbitrator)! 
#7 – Security / Encryption (2/2) 
! Nothing to do with Galera but… 
! Encrypt application links between Galera nodes and 
applications, especially in public/untrusted environments: 
! Dense howto: 
#8 – Reporting 
! Try to separate OLTP and OLAP if possible 
! Run reports off an async slave or dedicated node 
! Remember: huge queries eat CPU, RAM and DISK. 
! Galera is not faster than its slowest node. 
! Watch out for reports with side effects 
! Large updates writing back? 
! Consider using Amazon Redshift (Data Warehouse) 
! Upload CSV files for processing. 
#9 – Protecting from Disasters (1/5) 
! Eventually a disaster will happen 
! Software bugs 
! Network / router upgrades 
! AZ down 
! Schema / software / hardware upgrade going wrong 
! Too many connections 
#9 – Protecting from Disasters (1/5) 
! One way of protecting from Cluster failures is to use an 
asynchronous slave replicating from the Galera Cluster. 
! If the Cluster would fail, the asynchronous slave could 
take over and handle the application work load until the 
cluster error has been resolved. 
[Diagram: Galera Cluster in DC1, async slave in DC2] 
#9 – Protecting from Disasters (2/5) 
! Asynchronously replicated slave, pros and cons: 
! + Decoupled from the Cluster 
! - Data loss is possible (minimize with semi-sync replication, but write 
performance will suffer) 
! Set up one or more Galera nodes to be Replication Masters (RM) 
and connect the Replication Slave (RS) 
[Diagram: Galera Cluster in DC1, async slave in DC2; semi-sync is optional] 
#9 – Protecting from Disasters (3/5) 
! Using GTIDs (available in MySQL 5.6 and MariaDB 10.0) 
allows for easy fail-over from RM2 to RM1: 
[Diagram: Galera Cluster in DC1, async slave in DC2; semi-sync is optional] 
#9 – Protecting from Disasters (4/5) 
! Prepare each Galera Replication Master (MySQL 5.6 style): 
server_id=X # must be a unique number 
! Prepare each Slave (MySQL 5.6 style): 
server_id=Y ## must be a unique number 
#9 – Protecting from Disasters (5/5) 
! Enabling the binlog (log_slave_updates and log_bin) on 
the Galera nodes allows you to do Point In Time Recovery 
! Restore last backup (must be newer than 
expire_logs_days) 
! Use the binary logs to roll forward until the last transaction.
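The roll-forward step is typically done by replaying binary logs with `mysqlbinlog` from the position recorded at backup time. The sketch below only builds the command line; the function name and the file names in the example are hypothetical, while `--start-position` and `--stop-datetime` are standard `mysqlbinlog` options.

```python
from typing import List, Optional


def roll_forward_command(binlogs: List[str], start_position: int,
                         stop_datetime: Optional[str] = None) -> List[str]:
    """Build the mysqlbinlog argument list for replaying binlogs after a restore.

    binlogs:        binary log files in order, starting with the one the backup ended in.
    start_position: position inside the first file where the backup ended.
    stop_datetime:  optional 'YYYY-MM-DD hh:mm:ss' cut-off; omit to replay everything.
    """
    if not binlogs:
        raise ValueError("at least one binary log file is required")
    argv = ["mysqlbinlog", f"--start-position={start_position}"]
    if stop_datetime is not None:
        argv.append(f"--stop-datetime={stop_datetime}")
    argv.extend(binlogs)
    return argv
```

The output of the resulting command is normally piped into the `mysql` client on the restored server. Leaving out `stop_datetime` rolls forward to the last transaction in the logs.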
#9 – Protecting from Disasters (1/5) 
! A common problem is overload situations, which can 
originate from: 
! Website is loading slowly, users reload, creating more and more 
connections 
! Eventually the MySQL server runs out of connections 
! 5.6 only scales to 40 cores anyway 
! A good way of alleviating this is to use a proxy, e.g. HAProxy 
#9 – Protecting from Disasters (3/5) 
[Diagram labels: Galera Cluster; "Limit the # of"; "HAProxy queues"] 
Thank You! 
! Cluster Configurator 
! ClusterControl 
! Severalnines Blog 
! Contact: 

