MySQL HA Presentation

MySQL HA Using different solutions. Robert Krzykawski, DB Team Coordinator, bwin games. Anders Karlsson, Principal Sales Engineer, MySQL.

Agenda: Who are we? HA Basics – Anders. How we did it; Success or failure – Robert. Summary. Questions?

Anders Karlsson: Sales Engineer with Sun / MySQL for 5+ years. I have been in the RDBMS business for 20+ years. I have worked for many of the major vendors and with most of the vendor products. I’ve been in roles as Sales Engineer, Consultant, Porting engineer, Support engineer, etc. Outside MySQL I build websites (www.papablues.com), develop Open Source software (MyQuery, ndbtop etc.), am a keen photographer and drive sub-standard cars, among other things. Also: www.makezfsgpl.com! Right now!

Robert Krzykawski: DB Team Coordinator @ bwin Games AB. I have worked with MySQL in every role from system admin to DBA and DBD, and am now taking a more system architectural role. Been involved in building both small and big web based solutions since 1998 using MySQL. My roles throughout my professional life have varied: System administrator, Technical Sales support, DBA, DBD, Programmer, Application architect and System architect. Off work I am trying to automate things with scripts and programs to offload myself when “on work”. I am also trying to find time to snowboard and play some paintball, and a recently introduced hobby is our Maine Coon kittens.

Why do you need HA? Something can break. It usually will, eventually. You will need to maintain your database eventually, without shutting the whole system down. Adding HA to an existing running system is difficult, much more so than providing HA from the start. You want a good night’s sleep! You want failover to be automatic!

HA Concepts. Fault tolerant architectures: These are hardware architectures with supporting software that protect against even individual component failures. Single Point of Failure (SPOF): In any fault tolerant setup, you want to avoid a SPOF, as a chain is no stronger than its weakest link. Fail over and Fail back: Fail over is the process of switching from a failed component to another component, dormant or also active. Fail back is the process of switching back from the backup component to the original one.
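The fail over and fail back definitions above can be illustrated with a minimal sketch. The class name `FailoverPair` and the node names `db1` and `db2` are hypothetical, chosen only for this example; a real HA framework also handles detection, fencing and resource startup.

```python
from dataclasses import dataclass


@dataclass
class FailoverPair:
    """Tracks which of two components currently provides the service."""
    original: str
    backup: str
    active: str = ""

    def __post_init__(self) -> None:
        # The original component serves traffic until it fails.
        if not self.active:
            self.active = self.original

    def fail_over(self) -> str:
        """Switch from the failed original component to the backup."""
        self.active = self.backup
        return self.active

    def fail_back(self) -> str:
        """Return service to the original component once it is repaired."""
        self.active = self.original
        return self.active


pair = FailoverPair(original="db1", backup="db2")
after_fail_over = pair.fail_over()   # db1 has failed; db2 takes over
after_fail_back = pair.fail_back()   # db1 is repaired; service returns to it
```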

Some HA Components. Heartbeat: Heartbeat is an HA component that checks that the services being failed over are alive. Heartbeat can check individual servers, software services, networking etc. HA Monitor: The HA Monitor has different names in different frameworks. This is the component that allows configuration of the services, ensures proper shutdown and startup, and allows manual control. Replication: Replication is a common component that ensures that the data content of managed data-rich components is in sync.
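As a rough illustration of the heartbeat idea, the sketch below declares a peer dead only after several consecutive missed beats, so that a single lost packet does not trigger a failover. The class `HeartbeatMonitor`, the `max_missed` threshold and the injected `probe` function are assumptions made for this example and are not the API of any particular HA framework.

```python
from typing import Callable


class HeartbeatMonitor:
    """Declares a peer dead after `max_missed` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], max_missed: int = 3) -> None:
        self.probe = probe
        self.max_missed = max_missed
        self.missed = 0

    def beat(self) -> bool:
        """Run one probe; return True while the peer is considered alive."""
        if self.probe():
            self.missed = 0          # any successful beat resets the counter
        else:
            self.missed += 1
        return self.missed < self.max_missed


# Simulated probe results: two misses, a recovery, then three misses in a row.
results = iter([False, False, True, False, False, False])
monitor = HeartbeatMonitor(probe=lambda: next(results), max_missed=3)
alive_history = [monitor.beat() for _ in range(6)]
```

In practice the probe could be a network ping, a TCP connect to the database port, or a test query, depending on what is being monitored.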

What should I require? Don’t aim too high, aim for what is reasonable for your needs. Aim to ensure that no important data is lost. What is “important data”? You decide! Different data means different “needs”! Aim to ensure that the solution can be automated. You will want this eventually anyway. Aim to ensure a solution that can easily be tested and administered. Aim to ensure that the solution is performant and scalable.

HA with MySQL – In short. MySQL Replication: Easy to use and set up. Low performance impact. Asynchronous only. Failback can be difficult. Need additional components. MySQL with DRBD / ZFS / AVS: Easy to use. Low cost software only. Synchronous. Good HA software integration. Certain performance impact. Limited data size and transaction rates.

HA with MySQL – In short. MySQL with Shared storage: Good performance. Eases hardware management. Good integration with HA software. Costly. SAN itself is a SPOF. MySQL Cluster: Very good performance. Self contained. Very short fail-over times. Software only solution. Needs several physical servers. Not optimized for all MySQL applications.

Our goal at bwin. We were faced with a requirement: establish a highly available database platform. We had some rules to follow from management. Interruptions due to hardware failure should not require hands-on work. Downtime should be minimized during interruptions. Performance of the DB platform should not decrease when operating as usual. Performance can decrease if a failure has occurred but should not render the service unusable. Implementation should be done by the operations department. Developers should not be involved.

What solutions did we consider? Master/Master. Linux HA. HP Service Guard. Sun Cluster. Combination of the above. MySQL Cluster. Will walk through all of the above.

Master/Master. Master/Master with two active nodes would give us a seamless switch if we have a good load balancer. Will give us the ability to do schema changes “on line”. Not only higher availability when both nodes are up, but better performance. Can eliminate the use of production slaves. One entry point for application when using “LB”.
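The "good load balancer" in front of two active masters can be pictured with the sketch below: connections are handed out round-robin, and a node marked as down is skipped. The class `MasterBalancer` and the node names are hypothetical; a production load balancer would also run its own health checks and deal with in-flight connections.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class MasterBalancer:
    """Round-robins connections across the masters currently marked healthy."""

    def __init__(self, nodes: List[str]) -> None:
        self.healthy: Dict[str, bool] = {node: True for node in nodes}
        self._ring: Iterator[str] = cycle(nodes)
        self._size = len(nodes)

    def mark(self, node: str, healthy: bool) -> None:
        """Record the result of a health check for one node."""
        self.healthy[node] = healthy

    def pick(self) -> str:
        """Return the next healthy node, skipping any that are marked down."""
        for _ in range(self._size):
            node = next(self._ring)
            if self.healthy[node]:
                return node
        raise RuntimeError("no healthy master available")


lb = MasterBalancer(["master-a", "master-b"])
both_up = [lb.pick(), lb.pick()]       # traffic alternates between the masters
lb.mark("master-a", False)             # master-a fails its health check
one_down = [lb.pick(), lb.pick()]      # all traffic now goes to master-b
```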

Linux HA/ServiceGuard/SunCluster. Service IP switch will cause a glitch in service. Since we are running 4.0 we can’t really do a master/master setup with service IP switching. Slave integrity is important and we are running 4.0; One master data. Can’t switch to slave and hope that everything was replicated. We are using SAN – Shared storage possible. One instance, two machines – One active, one standby. InnoDB log size will be a problem. Timeout during recovery can cause problems during switch.
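To make the "can’t switch to slave and hope" point concrete, the sketch below compares binary log coordinates from the master with the position the slave has executed. It assumes the master is still reachable and has stopped taking writes (a planned switchover); after a crash the master position cannot be read, which is exactly the problem with asynchronous replication. The dictionary keys follow the column names of `SHOW MASTER STATUS` and `SHOW SLAVE STATUS` in later MySQL versions and may differ on 4.0; the function names are hypothetical.

```python
from typing import Any, Mapping, Tuple


def binlog_coords(log_file: str, position: int) -> Tuple[int, int]:
    """Turn ('mysql-bin.000042', 1337) into a sortable (42, 1337) tuple."""
    sequence = int(log_file.rsplit(".", 1)[1])
    return (sequence, position)


def slave_caught_up(master: Mapping[str, Any], slave: Mapping[str, Any]) -> bool:
    """True if the slave has executed everything the master has logged."""
    master_at = binlog_coords(master["File"], int(master["Position"]))
    slave_at = binlog_coords(
        slave["Relay_Master_Log_File"], int(slave["Exec_Master_Log_Pos"])
    )
    return slave_at >= master_at


# Hand-written sample rows, not real server output.
master_row = {"File": "mysql-bin.000042", "Position": 1337}
lagging = {"Relay_Master_Log_File": "mysql-bin.000041", "Exec_Master_Log_Pos": 9000}
current = {"Relay_Master_Log_File": "mysql-bin.000042", "Exec_Master_Log_Pos": 1337}

safe_to_switch_lagging = slave_caught_up(master_row, lagging)
safe_to_switch_current = slave_caught_up(master_row, current)
```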

MySQL Cluster. High availability built in if implemented correctly. Requires more hardware. More complex solution. Requires application to support NDB. Not full feature set.

Obstacles. We are using MySQL 4.0 in our biggest database. Master/Master scenario on 4.0 requires higher level of application awareness. LinuxHA/ServiceGuard/Sun Cluster will cause small glitch when we move resources. MySQL Cluster will require even more application changes in our case.

Our Choice: LinuxHA, because it is GPL/LGPL. Free and not owned by an organization. Fastest way to implement; did not require any support from the dev department. All other ways required changes in application.

We do.. Use Linux HA 2.0. Needed for setup of “cluster”. Use SAN. Shared storage is easier and faster, but expensive. DRBD can be used but saves the same data twice. Also comes with a performance decrease. Heartbeat on two bonds: primary on database interconnect network, secondary on database service network. We have LUNs presented to multiple hosts. Services have rules to be run on specific hosts only. We fence using RiLOE. Have plans to fence on port level in FC switches.
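With LUNs presented to multiple hosts, the ordering of a takeover matters: the standby should start its resources only after the peer is confirmed fenced, otherwise two MySQL instances could end up writing to the same data files. The sketch below shows only that ordering. The function `take_over`, the `fence_peer` callable and the resource steps are hypothetical stand-ins, not Linux HA or RiLOE interfaces.

```python
from typing import Callable, List


def take_over(fence_peer: Callable[[], bool],
              start_resources: List[Callable[[], None]]) -> bool:
    """Start local resources only after the peer is confirmed fenced."""
    if not fence_peer():
        # Peer state unknown: starting MySQL on the shared LUN now could
        # lead to two instances writing to the same data files.
        return False
    for start in start_resources:
        start()
    return True


log: List[str] = []
steps = [
    lambda: log.append("mount shared storage"),
    lambda: log.append("bring up service IP"),
    lambda: log.append("start mysqld"),
]

fenced_ok = take_over(lambda: True, steps)      # fencing confirmed
log_after_success = list(log)

log.clear()
fenced_failed = take_over(lambda: False, steps)  # fencing could not be confirmed
log_after_failure = list(log)
```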

What’s good and what’s bad.. Easy and fast implementation. Our config does not increase/decrease performance. InnoDB log size causes long recovery times. Testing to decrease it has caused performance penalties. Our solution is not foolproof because of long recovery times. It causes interruption of service. We can say it’s HA, but a true HA solution would give us 100% uptime. 2nd Setup is complicated. We should aim for having simple setups. More common
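The link between recovery time and uptime is simple arithmetic, sketched below. The figures in the example (four failovers a year, 15 minutes of recovery each) are invented for illustration and are not measurements from bwin.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of time the service was usable."""
    return uptime_hours / (uptime_hours + downtime_hours)


def downtime_budget_hours(target: float) -> float:
    """Yearly downtime allowed by an availability target such as 0.999."""
    return (1.0 - target) * HOURS_PER_YEAR


# Hypothetical year: four failovers, each with 15 minutes of InnoDB recovery.
downtime = 4 * 0.25
achieved = availability(HOURS_PER_YEAR - downtime, downtime)

# A "three nines" target allows roughly 8.76 hours of downtime per year.
three_nines_budget = downtime_budget_hours(0.999)
```

Under these assumed numbers the service is up for 8759 of 8760 hours, which is well short of 100% but shows how directly each minute of recovery time counts against the target.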

What can we do better? Fine tune config for faster recovery/startup. Add better fencing. Monitor failover in case recovery takes long. Master/Master or Multi master: If application can reconnect or if we have a smart load balancer we have no outages. Upgrades or schema changes can be made “online”. No separation between writes and reads. Less complicated for developers. One entry point.
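"If application can reconnect" can be as small as a retry loop around the connect call, sketched below. The helper `connect_with_retry` and the fake connect function are hypothetical; a real MySQL driver raises its own exception types, which would replace `ConnectionError` here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T],
                       attempts: int = 5,
                       delay_s: float = 0.5) -> T:
    """Call `connect` until it succeeds or the attempts are used up."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_s)   # give the failover time to complete
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


# Fake connect function: fails twice (failover in progress), then succeeds.
calls = {"count": 0}


def flaky_connect() -> str:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionError("server unavailable")
    return "connected"


result = connect_with_retry(flaky_connect, attempts=5, delay_s=0.0)
```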

Summary: Concepts. Components. Requirements. Technologies. Your goal. Considerations. Obstacles. How we did it @ bwin games AB. HA recommendations.

Questions The question is not, ‘What is the answer?’ The question is, ‘What is the question?’ Henri Poincaré

Thank you for your time! And thank you for listening so kindly. We can be found on: Robert Krzykawski – http://krzykawski.com Anders Karlsson – http://papablues.com http://karlssonondatabases.blogspot.com/
