High Availability and Scalability: Too Expensive!
Architectures for Future Enterprise Systems

Eberhard Wolff
Freelance Consultant / Trainer

Head of Technology Advisory Board, adesso AG

Eberhard Wolff - @ewolff
The Dream

Photo: http://www.vaxman.de/

Where Are We?

Non-functional
Requirements
Availability
Performance
Availability:

Traditional
Approach

•  Buy highly reliable
hardware
•  Build a small cluster
•  2 machines
•  Maybe add a stand-by
data center
•  Eventually system will fail
•  …and you are in real trouble

True Story
•  “Machine rebooted over night.”
•  “Several times.”
•  “No idea how often.”
•  “No idea why…”

Let’s look at an
example

•  Server fails
•  Application fails
•  No service to the customer
•  Can we do better?

What You Have
Just Seen

•  Failing systems do not impact user
•  Failing systems are just restarted
•  Restarts happen automatically
•  System runs in different data centers
•  e.g. eu-west-1a / b / c
[Diagram: Elastic Load Balancer in front of three System instances, one each in EU West 1a, EU West 1b, and EU West 1c]
What It Takes…
•  Virtualization
•  +API to start new servers
•  Watchdog to detect failed servers
•  Redundant data centers if needed
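
The ingredients above (an API to start servers plus a watchdog) could be combined roughly as follows. This is a minimal sketch, not the setup used in the demo: `FakeCloud`, `start_server`, `terminate`, `is_healthy`, and `watchdog_pass` are hypothetical names, and the "cloud" is an in-memory stand-in for a real virtualization API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeCloud:
    """Hypothetical stand-in for a virtualization API; a real one would start VMs."""
    servers: dict = field(default_factory=dict)            # server id -> healthy?
    _ids: count = field(default_factory=lambda: count(1))  # id generator

    def start_server(self) -> str:
        server_id = f"server-{next(self._ids)}"
        self.servers[server_id] = True
        return server_id

    def terminate(self, server_id: str) -> None:
        del self.servers[server_id]

    def is_healthy(self, server_id: str) -> bool:
        return self.servers[server_id]


def watchdog_pass(cloud: FakeCloud, desired: int) -> list:
    """Remove failed servers, then start new ones until `desired` are running."""
    failed = [s for s in cloud.servers if not cloud.is_healthy(s)]
    for server_id in failed:
        cloud.terminate(server_id)
    started = []
    while len(cloud.servers) < desired:
        started.append(cloud.start_server())
    return started


cloud = FakeCloud()
watchdog_pass(cloud, desired=3)        # initial fleet: server-1, server-2, server-3
cloud.servers["server-2"] = False      # simulate a crash
replacements = watchdog_pass(cloud, desired=3)
print(replacements)                    # ['server-4']
```

In a real deployment the watchdog would run periodically, and the health check would probe the application rather than read a flag.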

Can be implemented
in your datacenter!
I have none.

So I used the Amazon Cloud
Alternatives

Hardware
•  As cheap as it gets
•  Not highly available
•  Availability in Software

Traditional Servers

Highly
customized
Hard to
reproduce
•  Depends on details
•  True story:
•  Order of patch
installations matters
Stateful
Redundancy in
Hardware
Traditional Servers

Phoenix Servers

Easy to create a
new server
Reliably
reproducible
Stateless
•  No data is lost
•  New server can take load
immediately

Redundancy in
Software
Implementations
•  Might use a VM image
•  …or a PaaS
•  …or provisioning tools

Provisioning Tools

•  Easy to create test environments
•  …with other software versions

Chaos Monkey
•  Tool by Netflix
•  Video streaming
•  #1 in Internet usage in the US

Chaos Monkey
•  Kill random machines
•  To ensure system survives
hardware failures
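
The idea of killing random machines can be illustrated with a toy simulation. This is a sketch under simple assumptions and not Netflix's tool: `serve` and `chaos_round` are hypothetical helpers, and the fleet is just a list of names.

```python
import random


def serve(request_id: int, healthy_servers: list) -> str:
    """Toy load balancer: route the request to one of the healthy servers."""
    if not healthy_servers:
        raise RuntimeError("no healthy server left")
    return healthy_servers[request_id % len(healthy_servers)]


def chaos_round(servers: list, rng: random.Random) -> str:
    """Kill one randomly chosen server and return its name."""
    victim = rng.choice(servers)
    servers.remove(victim)
    return victim


rng = random.Random(42)      # seeded only so that a run is repeatable
fleet = ["a", "b", "c"]
killed = chaos_round(fleet, rng)
answers = [serve(i, fleet) for i in range(4)]
print(f"killed {killed}, requests still answered by {sorted(set(answers))}")
```

As long as at least one server survives, every request is still answered, which is the property such a test is meant to confirm.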

Would you rather rely on…
…highly available hardware
…or a Chaos Monkey tested
system?
Resilience
Availability
Performance
Performance:
Traditional
Approach

•  Estimate
•  #Users
•  Use Cases
•  Data volume
•  Etc.
•  Add a little bit
•  Order servers

Performance:
Problems

Problem: Estimate & Scaling
•  Performance hard to estimate
•  Coarse grained scaling
•  Backfires

True Story
•  Initial estimate wrong
•  Just need a little more
•  Cluster: two servers
•  Add one
•  About 50% higher costs
•  Order / install server takes time
•  Bad performance until server
delivered
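
The "about 50%" figure follows from simple arithmetic: one extra server on top of two. A small sketch, assuming all servers cost the same (the helper name `cost_increase_percent` is made up for illustration), shows how the step size shrinks as the cluster grows:

```python
def cost_increase_percent(current_servers: int, added_servers: int = 1) -> float:
    """Relative cost increase when adding equally priced servers to a cluster."""
    return 100.0 * added_servers / current_servers


for n in (2, 10, 50):
    print(f"{n} servers + 1 -> {cost_increase_percent(n):.0f}% higher costs")
# 2 servers + 1 -> 50% higher costs
# 10 servers + 1 -> 10% higher costs
# 50 servers + 1 -> 2% higher costs
```

With only two large machines, every scaling step is coarse; many small servers allow much finer steps.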
Problem: Load Peak
•  Business has load peaks
•  e.g. events that people register for
•  Need to have enough hardware for
load peaks
•  Costly
Problem: Testing
•  Testing
•  Need production-like infrastructure
•  Prohibitive costs
•  Only needed during tests

[Diagram: Elastic Load Balancer in front of four System instances, one in EU West 1b and three in EU West 1c]
What You Have Just Seen
•  System tunes itself depending on
load
•  Same approach as for availability
•  +Watchdog for load
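
A watchdog for load essentially turns a load measurement into a target server count. The following is a minimal sketch of such a rule; the function name, the requests-per-second metric, and the limits are assumptions for illustration, not the policy used in the demo.

```python
import math


def desired_servers(load_rps: float, capacity_rps: float,
                    minimum: int = 2, maximum: int = 20) -> int:
    """Servers needed for the current load, kept within fixed bounds.

    load_rps:     measured requests per second across the whole system
    capacity_rps: requests per second one server can handle (assumed known)
    """
    needed = math.ceil(load_rps / capacity_rps)
    return max(minimum, min(maximum, needed))


print(desired_servers(150, 100))    # 2
print(desired_servers(950, 100))    # 10
print(desired_servers(5000, 100))   # 20 (capped at the maximum)
print(desired_servers(0, 100))      # 2 (never below the minimum)
```

The minimum keeps redundancy for availability even without load; the maximum caps costs.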

Easy to create a new server ✔
Redundancy in Software ✔
Reliably reproducible ✔
Stateless ?
Stateless
•  Stateless web servers: best practice
•  Some Java frameworks don’t follow
the approach
•  Can store HTTP session externally
•  e.g. RDBMS, NoSQL, Cache
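
Storing the session externally means any server can handle any request. Below is a minimal sketch: `SessionStore` is a hypothetical in-memory stand-in for the external store (RDBMS, NoSQL, or cache), and `WebServer` is a toy handler, not a real framework class.

```python
import json
import uuid


class SessionStore:
    """Stand-in for an external store; sessions are kept as serialized JSON."""

    def __init__(self) -> None:
        self._rows = {}

    def save(self, session_id: str, session: dict) -> None:
        self._rows[session_id] = json.dumps(session)

    def load(self, session_id: str) -> dict:
        return json.loads(self._rows.get(session_id, "{}"))


class WebServer:
    """Stateless server: it keeps no session data between requests."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> list:
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return session["cart"]


store = SessionStore()
session_id = str(uuid.uuid4())
WebServer("web-1", store).add_to_cart(session_id, "book")
# web-1 may now fail; a freshly started web-2 continues the same session.
cart = WebServer("web-2", store).add_to_cart(session_id, "pen")
print(cart)   # ['book', 'pen']
```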
What about
Databases?
Databases

•  Often assumed to be
just “fast and scalable”
•  Large scale doable, e.g.
Data Warehouse
•  Often use traditional
approach
•  Cluster with two nodes
•  Highly available
hardware

Database: Problems
•  Availability
•  Highly available hardware
•  Performance
•  Limited scaling
•  Costly
Databases
•  New approaches
•  Used by NoSQL databases
•  But also e.g. MySQL
•  …or in system architecture
Databases
•  Replication
•  Read performance
•  Availability
•  Sharding
•  Spread data across servers
•  Write performance
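
How the two techniques combine can be sketched in a few lines. This toy store is an illustration only and does not mirror any particular database: the key's hash picks the shard, each write is copied synchronously to every replica of that shard (real systems usually replicate asynchronously), and a read may go to any replica.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    def __init__(self, shard_count: int, replicas_per_shard: int) -> None:
        # shards[i][j] is replica j of shard i, modelled as a plain dict
        self.shards = [[{} for _ in range(replicas_per_shard)]
                       for _ in range(shard_count)]

    def write(self, key: str, value: str) -> None:
        # Sharding: only one shard receives the write (spreads write load).
        for replica in self.shards[shard_for(key, len(self.shards))]:
            replica[key] = value

    def read(self, key: str, replica_index: int = 0):
        # Replication: any replica of the shard can answer (spreads read load).
        replicas = self.shards[shard_for(key, len(self.shards))]
        return replicas[replica_index % len(replicas)].get(key)


store = ShardedStore(shard_count=2, replicas_per_shard=3)
store.write("user:1", "Alice")
print([store.read("user:1", r) for r in range(3)])   # ['Alice', 'Alice', 'Alice']
```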
Scaling MongoDB
[Diagram: two shards (Shard 1, Shard 2), each with three replicas (Replica 1 to Replica 3)]
Availability
[Diagram: two shards (Shard 1, Shard 2), each with three replicas (Replica 1 to Replica 3)]
Scaling MongoDB
[Diagram: three shards (Shard 1 to Shard 3), each with three replicas (Replica 1 to Replica 3)]
Scaling MongoDB
[Diagram: two shards (Shard 1, Shard 2) with Replica 1 to Replica 3; one position is marked "?"]
Replicas & Shards
•  Easy to understand
•  But: Coarse grained scaling
•  Adding another shard means
•  Moving lots of data
•  Add quite some servers
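
A rough estimate makes the coarse steps concrete. This sketch assumes that data is spread evenly across shards after rebalancing and that every replica runs on its own server; the function name and the 900 GB figure are made up for illustration.

```python
def add_shard_impact(shards: int, replicas_per_shard: int, data_gb: float) -> dict:
    """Estimate the cost of adding one shard in a replica-per-shard layout."""
    share_gb = data_gb / (shards + 1)   # data the new shard holds once balanced
    return {
        "new_servers": replicas_per_shard,             # one server per replica
        "moved_gb": share_gb,                          # leaves the existing shards
        "written_gb": share_gb * replicas_per_shard,   # every replica stores a copy
    }


impact = add_shard_impact(shards=2, replicas_per_shard=3, data_gb=900)
print(impact)   # {'new_servers': 3, 'moved_gb': 300.0, 'written_gb': 900.0}
```

Going from two to three shards in this example means three new servers and a third of the data on the move.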
Amazon Dynamo Model
[Diagram: four servers (Server A to Server D), each holding replicas of several shards; each of Shard1 to Shard4 appears three times across the servers]
Amazon Dynamo Model
[Diagram: Server A to Server D holding replicas of Shard1 to Shard4, plus a newly added "New Server"]
Amazon Dynamo Model
•  Published in the Dynamo paper
•  Implementations:
Riak, Cassandra etc.
•  Fine grained scaling
•  Can immediately write to new node
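
The Dynamo paper builds on consistent hashing with virtual nodes. The sketch below illustrates that idea in a simplified form; `HashRing`, its parameters, and the key names are assumptions for illustration and not the code of Riak or Cassandra. Adding a server takes over only some of the keys, and every key that changes its first owner moves to the new server.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Stable position on the ring, derived from an MD5 hash."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._ring = []                  # sorted list of (position, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        # Each server owns many small ring segments (virtual nodes).
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{server}#{i}"), server))

    def owners(self, key: str, copies: int = 3) -> list:
        """Walk clockwise from the key and collect `copies` distinct servers."""
        start = bisect.bisect(self._ring, (ring_position(key), ""))
        found = []
        for offset in range(len(self._ring)):
            server = self._ring[(start + offset) % len(self._ring)][1]
            if server not in found:
                found.append(server)
                if len(found) == copies:
                    break
        return found


ring = HashRing(["A", "B", "C", "D"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.owners(k)[0] for k in keys}
ring.add_server("E")                     # the new server joins the ring
after = {k: ring.owners(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed their first owner")
```

Because the new server only claims segments of its own, the other servers never exchange data among themselves, which is what allows scaling in small steps.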
Hardware
•  Not highly reliable
•  Scales by distributing load across
servers
•  No NAS, SAN, RAID…
•  As cheap as it gets
Sum Up
•  Virtualization
•  + Phoenix server
•  = Better availability
•  = Better performance
•  = Lower costs
•  Stateless servers
•  NoSQL
Thank You!