9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides

Nine DevOps Tips For Going Into
Production With Galera Cluster
Confidential
November 11, 2014
Johan Andersson & Jean-Jérôme Schmidt
Severalnines
johan@severalnines.com & jj@severalnines.com

Logistics
! I'm Jean-Jérôme and I’ll be your host for today's
webinar
! Feel free to ask any questions in the Questions
section of this application or via the Chat box
! You can also contact me directly via the chat
box or via email: jj@severalnines.com during or
after the webinar
Copyright Severalnines AB

ClusterControl
In a nutshell
Copyright 2012 Severalnines AB
Deploy, Monitor, Manage, Scale

Supported Databases
SQL
! MariaDB Cluster
! MySQL Galera Cluster (Codership)
! Percona XtraDB Cluster
! MySQL Cluster (NDB)
! MySQL Replication 5.6
! Standalone MySQL/MariaDB
NoSQL
! MongoDB Sharded Cluster
! MongoDB Replica Set
! TokuMX Cluster

Agenda
! 101 Sanity Check
! Operating System
! Backup Strategies
! Galera Recovery
! Query Performance
! Schema Changes
! Security / Encryption
! Reporting
! Protecting from Disasters

#1 – 101 Sanity Check (1/4)
! Ensure ALL tables are InnoDB or XtraDB
! InnoDB supports FULLTEXT indexes as of MySQL 5.6
! Ensure ALL tables have a PRIMARY KEY
! If no PRIMARY KEY is defined you can do:
ALTER TABLE table_name ADD COLUMN pkid BIGINT
AUTO_INCREMENT PRIMARY KEY;
! Ensure you have NO unbounded queries
! E.g. UPDATE table_name SET x=x+1 (no WHERE clause, and the table has many rows)
! Update/delete in smaller batches (e.g. 1000 records).
! Better support for huge (unbounded) queries is in the pipeline
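The batching advice above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes an integer primary key named pkid (as in the ALTER TABLE example), the table name my_table and the helper batched_statements are hypothetical, and the code only generates the SQL statements without executing them.

```python
def batched_statements(template, min_id, max_id, batch_size=1000):
    """Yield one statement per primary-key range of at most batch_size ids.

    `template` must contain {lo} and {hi} placeholders. The ranges are
    inclusive and together cover min_id..max_id exactly once.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield template.format(lo=lo, hi=hi)
        lo = hi + 1


# Hypothetical table and column names, for illustration only.
TEMPLATE = "UPDATE my_table SET x = x + 1 WHERE pkid BETWEEN {lo} AND {hi};"

if __name__ == "__main__":
    for stmt in batched_statements(TEMPLATE, 1, 2500):
        print(stmt)
```

Each generated statement is meant to be committed on its own, so that every replicated write set stays small.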

! Ensure that the application can tolerate non-sequential auto-increment values.
! Redirect deadlock-prone update queries on hot tables and rows to a single Galera node:
! E.g. UPDATE counter_tbl SET counter = counter + 1;
! http://www.severalnines.com/blog/avoiding-deadlocks-galera-set-haproxy-single-node-writes-and-multi-node-reads
! Use wsrep_sst_method=xtrabackup-v2

! WAN environment?
! [Note] WSREP: (0b066c90-d4fb-11e1-0800-96b2cf43aaf6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.1.2:4567
[Note] WSREP: (0b066c90-d4fb-11e1-0800-96b2cf43aaf6, 'tcp://0.0.0.0:4567') reconnecting to df4e387f-d4e2-11e1-0800-2e6080299165 (tcp://10.0.1.2:4567), attempt 0
! Increase timeouts
! wsrep_provider_options='evs.keepalive_period=PT3S;
evs.inactive_check_period=PT10S;
evs.suspect_timeout=PT30S;
evs.inactive_timeout=PT1M;
evs.install_timeout=PT1M;
evs.send_window=1024;
evs.user_send_window=512';
! This relaxes how quickly a node is evicted from the cluster.
! The default values are usually fine on networks with a ping time of <10-15 ms.
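The timeout values above are ISO 8601 durations (PT3S is 3 seconds, PT1M is 1 minute). As a hedged sketch, the snippet below parses such an option string and converts the durations to seconds, which can help when comparing settings across nodes. The helper names duration_seconds and parse_provider_options are hypothetical and not part of Galera, and the parser only handles the simple PT..H..M..S forms used here.

```python
import re

# Matches simple ISO 8601 time durations such as PT3S, PT1M or PT1M30S.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def duration_seconds(value):
    """Convert a PT..H..M..S duration string to seconds, or return None."""
    match = _DURATION.match(value)
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def parse_provider_options(options):
    """Split 'key=value; key=value' into a dict; durations become seconds."""
    parsed = {}
    for item in options.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        value = value.strip()
        seconds = duration_seconds(value)
        parsed[key.strip()] = seconds if seconds is not None else value
    return parsed


# The WAN settings from the slide, as a single option string.
WAN_OPTIONS = (
    "evs.keepalive_period=PT3S; evs.inactive_check_period=PT10S; "
    "evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; "
    "evs.install_timeout=PT1M; evs.send_window=1024; "
    "evs.user_send_window=512"
)

if __name__ == "__main__":
    for key, value in parse_provider_options(WAN_OPTIONS).items():
        print(f"{key}: {value}")
```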

! There is no reason to use any 5.5-based MySQL variant
! Use MySQL 5.6 / MariaDB 10.x
! Lots of bug fixes
! Performance optimizations
! Write-intensive workloads
! Query optimizer enhancements
! A good foundation for later upgrading to 5.7-based MySQL
! Use the Galera 3.x series

#2 – Operating System (1/2)
! Swapping
! echo 1 > /proc/sys/vm/swappiness
! NUMA on multi-socket servers
! Can lead to contention and strange lock-ups
! Is it enabled?
dmesg | grep -i numa
! GRUB boot option "numa=off"
! … and other possibilities
! Filesystem
! Reduce writes by mounting with noatime

#2 – Operating System (2/2)
! In virtualized environments it is easy to over-commit resources on a single host.
! Keep track of the host running the VMs
! Is it heavily loaded?
! CPU steal (check on the VMs)?
! Is it swapping?

#3 – Backup
! Percona XtraBackup
! Online consistent backup
! Full and incremental backups
! Possible to back up individual databases and tables when innodb_file_per_table is used.
! Parallelism & compression & encryption
! mysqldump
! Use with --single-transaction for a consistent, online backup of InnoDB tables
! May require tweaking of innodb_old_blocks_time and innodb_old_blocks_pct (default values in 5.6 are quite good).
! S3 / Glacier or Swift can be used for offline/offsite storage

#4 – Galera Recovery (IST) (1/3)
! IST (Incremental State Transfer) is faster than SST (Snapshot State Transfer).
! Each Galera node has a cache, the gcache.
! Stores committed write sets
! Circular buffer
! If a node is down (crash, maintenance window) and then becomes a JOINER:
! The JOINER sends the ID of its last applied write set to the DONOR
! The DONOR checks if it can send the subsequent write sets from the gcache.
! Yes == IST (fast)
! No == SST (slow). E.g. 3 TB of data is no fun to SST.

! Dimension the gcache, e.g. to handle a maintenance window of 6 hours:
! Writes to cluster per second: 1 MB/s
! Maintenance window (seconds) = 6 hours x 60 x 60 = 21600 s
! gcache size = 1 MB/s x 21600 s = 21600 MB = ~21 GB
! Multiply by 1.5x or 2x to have some margin:
! gcache.size=42G
! wsrep_provider_options='gcache.size=42G;…'
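The sizing arithmetic above can be written as a small helper. This is only a sketch of the rule of thumb on the slide: the function name gcache_size_bytes is hypothetical, and it assumes a constant write rate over the whole maintenance window.

```python
def gcache_size_bytes(write_rate_bytes_per_s, window_hours, margin=2.0):
    """Bytes of gcache needed to cover `window_hours` of writes, with margin."""
    window_seconds = window_hours * 60 * 60
    return int(write_rate_bytes_per_s * window_seconds * margin)


MB = 1024 ** 2
GB = 1024 ** 3

if __name__ == "__main__":
    # 1 MB/s of writes, a 6 hour maintenance window, 2x margin.
    size = gcache_size_bytes(1 * MB, window_hours=6, margin=2.0)
    print(f"gcache.size={size / GB:.1f}G")
```

With the numbers from the slide this gives roughly 42 GB, matching gcache.size=42G.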

! How much do you write to the Galera Cluster?
! Look at the sum of wsrep_replicated_bytes and wsrep_received_bytes and get the rate between two points in time.
! Here we can see that a node handles:
(109575 + 33758) bytes / second = ~140 KB/s
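The rate calculation can be sketched as below. The function name write_rate and the sample values are made up for illustration; the counter values were chosen so that the result reproduces the rates quoted on the slide. In practice the two counters would be read from SHOW GLOBAL STATUS at two points in time.

```python
def write_rate(first, second):
    """Bytes/second written between two status samples.

    Each sample is a tuple:
    (unix_time, wsrep_replicated_bytes, wsrep_received_bytes).
    """
    t1, repl1, recv1 = first
    t2, repl2, recv2 = second
    elapsed = t2 - t1
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    if repl2 < repl1 or recv2 < recv1:
        raise ValueError("counters went backwards (node restarted?)")
    return ((repl2 - repl1) + (recv2 - recv1)) / elapsed


if __name__ == "__main__":
    # Made-up samples taken 60 seconds apart.
    earlier = (1000, 5_000_000, 2_000_000)
    later = (1060, 5_000_000 + 109575 * 60, 2_000_000 + 33758 * 60)
    rate = write_rate(earlier, later)
    print(f"{rate:.0f} bytes/s (~{rate / 1024:.0f} KB/s)")
```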

#4 – Galera Recovery (SST)
! Two pitfalls to be avoided with SSTs:
! wsrep_sst_method=rsync
! You may have to change in /usr/bin/wsrep_sst_rsync:
! change timeout = 300 to timeout = 3600 (or bigger)
! Otherwise the rsync daemon may time out when initializing the SST.
! wsrep_sst_method=xtrabackup[-v2]
! Uses the MySQL tmpdir by default.
! If tmpdir is too small, the SST may fail on the donor: the transaction log simply does not fit.
! You can set in my.cnf:
[xtrabackup]
tmpdir=/a/bigger/partition
! How big does tmpdir need to be? [KB/s written to the node] x [backup time in seconds]. Similar to the gcache.

#5 – Query Performance (1/5)
! A number of things to watch out for:
! Badly written queries or missing indexes
! DML locking many records (BEGIN; SELECT * FROM t1 FOR UPDATE; … )
! DML updating/deleting many records in one chunk
! wsrep_max_ws_rows/wsrep_max_ws_size set upper limits
! Update/delete in "small" batches of 1000-10000 records. Do not update 100000 records in one go.
! Deadlocks and deadlock-prone code
! E.g. running two mysqldumps at the same time
! Updating the very same record in a very hot table from multiple threads on multiple hosts
! Use your favorite tool to detect the problems

! When performance grinds to a halt, you want to know!

! You may want to have an alarm notification (in 1.2.9)

! If a deadlock happens, you want to tell your developers (in 1.2.9)

! And see if Galera is clogged up

#6 – Schema Changes (1/4)
! Consider an upgrade from schema version V1 to version V2
! There are two principal types of schema changes that can be performed:
! Compatible
! E.g. ALTER TABLE .. ADD COLUMN … , CREATE INDEX ..
! Application(s) written for V1 continue to work
! Upgrade schema first, then applications
! Incompatible
! E.g. ALTER TABLE .. DROP COLUMN …
! Application(s) written for V1 cannot use V2
! Must upgrade applications first to support V2, then upgrade the schema to V2

! Galera supports multiple ways of upgrading the schema from V1 to V2
! Total Order Isolation (TOI)
! wsrep_osu_method=TOI
! Rolling Schema Upgrade (RSU)
! wsrep_osu_method=RSU
! Desynching nodes (not covered here), but check out
http://www.severalnines.com/blog/webinar-replay-slides-galera-cluster-best-practices-zero-downtime-schema-changes

! Total Order Isolation (TOI)
! Default method
! DDL is executed in the same order with respect to other transactions on all Galera nodes.
! Cluster behaves like a single MySQL server
! OK for non-copying ALTER TABLE, for tiny, seldom-used tables (100s of records), or if application traffic is disabled.
! ALTER TABLE … ADD INDEX.. / CREATE INDEX..
! Not OK for copying ALTERs, since the table is LOCKED.
! May wreak havoc

! Rolling Schema Upgrade (RSU)
! DDL is not replicated
! Executed on one node at a time:
! node1> SET GLOBAL wsrep_OSU_method='RSU';
node1> ALTER TABLE ….
// check that node1 is SYNCED
node2> SET GLOBAL wsrep_OSU_method='RSU';
node2> ALTER TABLE ….
! The change MUST be a compatible schema change.
! E.g. ALTER TABLE .. DROP COLUMN will wreak havoc.
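The node-by-node order of an RSU can be sketched as a plan of statements. This is an illustration under stated assumptions rather than a finished tool: the function rsu_plan, the node names and the example DDL are hypothetical, the code only builds the list of steps and does not connect to any server, and switching back to TOI after the ALTER is an addition that is not on the slide. The status variable wsrep_local_state_comment reports Synced once a node has caught up.

```python
def rsu_plan(nodes, ddl):
    """Return the ordered (node, statement) steps for a rolling schema upgrade.

    All steps for one node come before any step for the next node, so the
    DDL runs on one node at a time.
    """
    steps = []
    for node in nodes:
        steps.append((node, "SET GLOBAL wsrep_OSU_method='RSU';"))
        steps.append((node, ddl))
        # Assumption (not on the slide): restore the default method afterwards.
        steps.append((node, "SET GLOBAL wsrep_OSU_method='TOI';"))
        # Wait until this reports 'Synced' before moving to the next node.
        steps.append((node, "SHOW STATUS LIKE 'wsrep_local_state_comment';"))
    return steps


if __name__ == "__main__":
    # Hypothetical node names and a compatible schema change.
    for node, statement in rsu_plan(["node1", "node2"],
                                    "ALTER TABLE t1 ADD COLUMN c INT;"):
        print(f"{node}> {statement}")
```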

#7 – Security / Encryption (1/2)
! Encrypt replication links between Galera nodes, especially in public cloud environments / WAN setups.
! Create a certificate and a key
http://www.severalnines.com/blog/performance-impact-encrypted-replication-galera-cluster-mysql
! wsrep_provider_options='socket.ssl_cert=galera_rep.crt;
socket.ssl_key=galera_rep.key;<rest of wsrep provider
options>'
! Requires a complete stop of all nodes.
! Don't forget to set this option for garbd (the arbitrator)!

#7 – Security / Encryption (2/2)
! Nothing to do with Galera but…
! Encrypt the links between the applications and the Galera nodes, especially in public/untrusted environments:
! Dense howto:
http://www.vmadmin.co.uk/linux/44-redhat/145-linuxmysqlencryption

#8 – Reporting
! Try to separate OLTP and OLAP if possible
! Run reports off an async slave or a dedicated node
! Remember: huge queries eat CPU, RAM and disk.
! A Galera cluster is no faster than its slowest node.
! Watch out for reports with side effects
! Large updates writing back?
! Consider using Amazon Redshift (Data Warehouse)
! Upload CSV files for processing.
! http://www.severalnines.com/blog/data-warehouse-cloud-how-upload-mysql-data-amazon-redshift-reporting-and-analytics

#9 – Protecting from Disasters (1/5)
! Eventually a disaster will happen
! Software bugs
! Network / router upgrades
! AZ down
! Schema / software / hardware upgrade going wrong
! Too many connections

! One way of protecting against cluster failures is to use an asynchronous slave replicating from the Galera Cluster.
! If the cluster fails, the asynchronous slave can take over and handle the application workload until the cluster error has been resolved.
[Diagram: Galera Cluster in DC1, async slave (READ ONLY!) in DC2]

! Asynchronously replicated slave, pros and cons:
! + Decoupled from the cluster
! - Data loss is possible (minimize with semi-sync replication, but write performance will suffer)
! Set up one or more Galera nodes as Replication Masters (RM) and connect the Replication Slave (RS)
[Diagram: Galera Cluster in DC1 with RM1 and RM2; asynchronous replication (semi-sync is optional) to RS1 (READ ONLY!) in DC2]

! Using GTIDs (available in MySQL 5.6 and MariaDB 10.0) allows for easy fail-over from RM2 to RM1:
! slave> STOP SLAVE;
CHANGE MASTER TO MASTER_HOST='RM1',
MASTER_AUTO_POSITION=1;
START SLAVE;
[Diagram: Galera Cluster in DC1 with RM1 and RM2; asynchronous replication (semi-sync is optional) to RS1 in DC2]

! Prepare each Galera Replication Master (MySQL 5.6 style):
gtid_mode=ON
enforce_gtid_consistency=1
log_bin=binlog
log_slave_updates=1
expire_logs_days=7
#loose_rpl_semi_sync_master_enabled=1
server_id=X # must be a unique number
! Prepare each Slave (MySQL 5.6 style):
gtid_mode=ON
enforce_gtid_consistency=1
log_bin=binlog
relay_log=relay-bin
log_slave_updates=1
expire_logs_days=7
#loose_rpl_semi_sync_slave_enabled=1
server_id=Y # must be a unique number

! Enabling the binlog (log_slave_updates and log_bin) on the Galera nodes allows you to do Point In Time Recovery
! Restore the last backup (it must be newer than expire_logs_days)
! Use the binary logs to roll forward to the last transaction.

! A common problem is overload situations, which can originate from:
! DDoS
! Website is loading slowly, users reload, creating more and more connections
! Eventually the MySQL server runs out of connections (max_connections)
! 5.6 only scales to 40 cores anyway
! A good way of alleviating this is to use a proxy, e.g. HAProxy.

[Diagram: HAProxy in front of the Galera Cluster. HAProxy queues incoming connections and limits the # of backend connections.]

Thank You!
! Cluster Configurator
! www.severalnines.com/config
! ClusterControl
! www.severalnines.com/clustercontrol
! Severalnines Blog
! www.severalnines.com/blog
! Contact: jj@severalnines.com
