High Availability and Scalability: Too Expensive! Architectures for Future Enterprise Systems

Eberhard Wolff
Freelance Consultant / Trainer 
Head of Technology Advisory Board, adesso AG

Eberhard Wolff - @ewolff

The Dream

Photo: http://www.vaxman.de/


Where Are We?


Non-functional
Requirements

Availability
Performance

Availability: Traditional Approach

•  Buy highly reliable hardware
•  Build a small cluster
•  2 machines
•  Maybe add a stand-by data center

•  Eventually system will fail
•  …and you are in real trouble


True Story
•  “Machine rebooted over night.”
•  “Several times.”
•  “No idea how often.”
•  “No idea why…”


Let’s look at an example


•  Server fails
•  Application fails
•  No service to the customer
•  Can we do better?


What You Have Just Seen


•  Failing systems do not impact user
•  Failing systems are just restarted
•  Restarts happen automatically
•  Systems run in different data centers
•  e.g. eu-west-1a / b / c

[Diagram: Elastic Load Balancer in front of System instances in EU West 1a, EU West 1b, and EU West 1c]

What It Takes…
•  Virtualization
•  +API to start new servers
•  Watchdog to detect failed servers
•  Redundant data centers if needed
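
The watchdog idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `is_healthy` and `start_server` are hypothetical callbacks standing in for a real health probe and for whatever "start new server" API the virtualization layer offers; neither is taken from the slides or from any particular cloud SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Watchdog:
    """Replaces servers that fail a health check (illustrative only)."""
    is_healthy: Callable[[str], bool]   # hypothetical health probe
    start_server: Callable[[], str]     # hypothetical "start new server" API call
    servers: Set[str] = field(default_factory=set)

    def check_once(self) -> List[Tuple[str, str]]:
        """Run one pass and return (failed, replacement) pairs."""
        replaced = []
        # sorted() returns a copy, so the set can be modified while looping
        for server in sorted(self.servers):
            if not self.is_healthy(server):
                self.servers.discard(server)
                replacement = self.start_server()
                self.servers.add(replacement)
                replaced.append((server, replacement))
        return replaced
```

In a real setup `check_once` would run periodically, and the failed server would simply be thrown away rather than repaired.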


Can be implemented in your datacenter!
I have none.

So I used the Amazon Cloud

Alternatives


Hardware
•  As cheap as it gets
•  Not highly available
•  Availability in Software


Traditional Servers


Highly customized

Hard to reproduce

•  Depends on details
•  True story:
•  Order of patch installations matters

Stateful

Redundancy in Hardware

Phoenix Servers


Easy to create a new server

Reliably reproducible

Stateless

Stateless
•  No data is lost
•  New server can take load immediately


Redundancy in Software

Implementations
•  Might use a VM image
•  …or a PaaS
•  …or provisioning tools


Provisioning Tools


•  Easy to create test environments
•  …with other software versions


Chaos Monkey
•  Tool by Netflix
•  Video streaming
•  #1 in Internet usage in the US


Chaos Monkey
•  Kill random machines
•  To ensure system survives hardware failures
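
A minimal sketch of the idea, not Netflix's actual tool: pick one server at random, terminate it, and check that the remaining servers still serve requests. The `kill` and `serves_requests` callbacks are hypothetical placeholders for a real termination API and a real end-to-end check.

```python
import random


def chaos_monkey(servers, kill, rng=None):
    """Terminate one randomly chosen server and return its name."""
    if rng is None:
        rng = random.Random()
    victim = rng.choice(sorted(servers))  # sorted() gives a stable list to choose from
    kill(victim)                          # hypothetical termination call
    return victim


def survives_random_failure(servers, serves_requests, rng=None):
    """Kill one random server in a copy of the pool; report whether the rest still serve."""
    alive = set(servers)
    chaos_monkey(alive, alive.discard, rng)
    return serves_requests(alive)
```

Running such a check regularly against the real system is what turns "should survive a failure" into something that is actually tested.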


Would you rather rely on…
…highly available hardware
…or a Chaos Monkey tested system?

Resilience

Performance: Traditional Approach

•  Estimate
•  #Users
•  Use Cases
•  Data volume
•  Etc.
•  Add a little bit
•  Order servers

Performance: Problems


Problem: Estimate & Scaling
•  Performance hard to estimate
•  Coarse grained scaling
•  Backfires


True Story
•  Initial estimate wrong
•  Just need a little more
•  Cluster: two servers
•  Add one
•  About 50% higher costs
•  Ordering / installing a server takes time
•  Bad performance until the server is delivered

Problem: Load Peak
•  Business has load peaks
•  e.g. events that people register for
•  Need to have enough hardware for load peaks
•  Costly

Problem: Testing
•  Testing
•  Need production-like infrastructure
•  Prohibitive costs
•  Only needed during tests


[Diagram: Elastic Load Balancer in front of one System instance in EU West 1b and three System instances in EU West 1c]

What You Have Just Seen
•  System tunes itself depending on load
•  Same approach as for availability
•  +Watchdog for load
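
The "watchdog for load" can be illustrated with a small scaling rule. This is a sketch under assumptions: the capacity per server and the minimum and maximum pool size are made-up parameters, and a real setup would feed the result into the API that starts or stops servers.

```python
import math


def desired_servers(requests_per_second, capacity_per_server,
                    min_servers=2, max_servers=10):
    """Return how many servers the current load calls for.

    All parameters are illustrative; capacity_per_server is the number of
    requests per second one server is assumed to handle.
    """
    needed = math.ceil(requests_per_second / capacity_per_server)
    # Never drop below the redundancy minimum, never exceed the budget cap
    return max(min_servers, min(max_servers, needed))
```

Because servers are added and removed one at a time, scaling is fine grained: the pool follows the load instead of being sized for the peak.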


✔ Easy to create a new server
✔ Redundancy in Software
✔ Reliably reproducible

Stateless ?

Stateless
•  Stateless web servers: best practice
•  Some Java frameworks don’t follow the approach
•  Can store HTTP session externally
•  e.g. RDBMS, NoSQL, Cache
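
To illustrate storing the session externally, here is a hedged sketch in which two web servers share one session store. `SessionStore` is an in-memory stand-in for an RDBMS, NoSQL database, or cache; the class names and the shopping-cart example are assumptions made for illustration.

```python
import copy
import uuid


class SessionStore:
    """Stand-in for an external store; a real one would live outside the web servers."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return copy.deepcopy(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = copy.deepcopy(session)


class WebServer:
    """Keeps no session state of its own, so any instance can serve any request."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, item):
        session_id = session_id or uuid.uuid4().hex
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return session_id, session["cart"]
```

If one server is killed between two requests, the next request can go to any other server and still sees the same cart.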

What about Databases?

Databases

•  Often assumed to be just “fast and scalable”
•  Large scale doable e.g. Data Warehouse
•  Often use traditional approach
•  Cluster with two nodes
•  Highly available hardware


Database: Problems
•  Availability
•  Highly available hardware
•  Performance
•  Limited scaling
•  Costly

Databases
•  New approaches
•  Used by NoSQL databases
•  But also e.g. MySQL
•  …or in system architecture

Databases
•  Replication
•  Read performance
•  Availability
•  Sharding
•  Spread data across servers
•  Write performance
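
How replication and sharding combine can be sketched as follows. This is an illustration, not the design of any specific database: keys are assigned to shards by a hash, every write goes to all replicas of that shard, and a read can be served by any replica. Replication is synchronous here purely for simplicity.

```python
import hashlib
import random


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Each shard is a list of replicas; each replica is a plain dict."""

    def __init__(self, shard_count, replicas_per_shard):
        self.shards = [[{} for _ in range(replicas_per_shard)]
                       for _ in range(shard_count)]

    def write(self, key, value):
        # Sharding: only one shard takes this write, which spreads write load
        for replica in self.shards[shard_for(key, len(self.shards))]:
            replica[key] = value

    def read(self, key, rng=random):
        # Replication: any replica can answer, which spreads read load
        replica = rng.choice(self.shards[shard_for(key, len(self.shards))])
        return replica.get(key)
```

Losing one replica of a shard leaves the data readable from the others, which is where the availability gain comes from.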

Scaling MongoDB
[Diagram: Shard 1 and Shard 2, each with Replica 1, Replica 2, and Replica 3]

Availability
[Diagram: Shard 1 and Shard 2, each with Replica 1, Replica 2, and Replica 3]

Scaling MongoDB
[Diagram: Shard 1, Shard 2, and Shard 3, each with Replica 1, Replica 2, and Replica 3]

Scaling MongoDB
[Diagram: Shard 1 and Shard 2 with Replica 1, Replica 2, and Replica 3; one position is marked “?”]

Replicas & Shards
•  Easy to understand
•  But: Coarse grained scaling
•  Adding another shard means
•  Moving lots of data
•  Adding quite a few servers
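
Why adding a shard can mean moving lots of data is easy to show under one simplifying assumption: keys are placed by `hash(key) % shard_count`. This placement rule is chosen for illustration and is not necessarily how MongoDB distributes data. The sketch counts how many keys change shards when the shard count grows.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash and modulo placement."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def fraction_moved(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys
                if shard_for(k, old_count) != shard_for(k, new_count))
    return moved / len(keys)
```

With a uniform hash, going from 2 to 3 shards relocates about two thirds of the keys under this rule, and the new shard also needs a full set of replica servers.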

Amazon Dynamo Model
[Diagram: Servers A, B, C, and D; Shard1 to Shard4 each appear three times, spread across the servers]

Amazon Dynamo Model
[Diagram: Servers A, B, C, and D plus a New Server; Shard1 to Shard4 each appear three times, spread across the servers]

Amazon Dynamo Model
•  Published in the Dynamo paper
•  Implementations: Riak, Cassandra, etc.
•  Fine grained scaling
•  Can immediately write to new node
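
The fine grained scaling can be sketched with a consistent-hashing ring, the partitioning technique described in the Dynamo paper. This is a simplified illustration: it covers partitioning only, not replication, and the class name, the number of virtual nodes, and the use of MD5 are assumptions.

```python
import bisect
import hashlib


def _hash(value):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring with virtual nodes (partitioning only)."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server owns many small slices of the ring
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def owner(self, key):
        # The first ring entry at or after the key's position owns the key
        idx = bisect.bisect(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]
```

When a server is added, only the keys that now fall into its slices change owner, and they all move to the new server; everything else stays where it is.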

Hardware
•  Not highly reliable
•  Scales by distributing load across servers
•  No NAS, SAN, RAID…
•  As cheap as it gets

Sum Up
•  Virtualization
•  + Phoenix server
•  = Better availability
•  = Better performance
•  = Lower costs
•  Stateless servers
•  NoSQL

Thank You!
