High Availability and Scalability: Too Expensive! Architectures for Future Enterprise Systems
- 1. High Availability and Scalability: Too Expensive! Architectures for Future Enterprise Systems
Eberhard Wolff
Freelance Consultant / Trainer
Head of Technology Advisory Board, adesso AG
Eberhard Wolff - @ewolff
- 11. • Buy highly reliable hardware
• Build a small cluster
• 2 machines
• Maybe add a stand-by data center
- 16. • Server fails
• Application fails
• No service to the customer
• Can we do better?
- 19. • Failing systems do not impact the user
• Failing systems are just restarted
• Restarts happen automatically
• Systems run in different data centers
• e.g. eu-west-1a / b / c
- 21. What It Takes…
• Virtualization
• +API to start new servers
• Watchdog to detect failed servers
• Redundant data centers if needed
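The ingredients above can be sketched as a single watchdog pass. This is only an illustration: `FakeCloud` and its methods are hypothetical in-memory stand-ins for a virtualization API, not a real SDK, and the health check is reduced to a boolean flag.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for an API that starts and stops servers."""
    servers: dict = field(default_factory=dict)  # server id -> healthy flag
    _ids: count = field(default_factory=count)   # generator of fresh server ids

    def start_server(self) -> int:
        server_id = next(self._ids)
        self.servers[server_id] = True
        return server_id

    def terminate(self, server_id: int) -> None:
        del self.servers[server_id]

    def is_healthy(self, server_id: int) -> bool:
        return self.servers[server_id]


def watchdog_pass(cloud: FakeCloud, desired: int) -> list:
    """Remove failed servers, then start new ones until `desired` are running."""
    failed = [s for s in cloud.servers if not cloud.is_healthy(s)]
    for server_id in failed:
        cloud.terminate(server_id)
    started = []
    while len(cloud.servers) < desired:
        started.append(cloud.start_server())
    return started


cloud = FakeCloud()
watchdog_pass(cloud, desired=3)                  # initial start-up: ids 0, 1, 2
cloud.servers[1] = False                         # simulate a failed server
replacements = watchdog_pass(cloud, desired=3)   # server 1 is replaced
```

A real watchdog would run such a pass periodically and probe servers over the network; the point here is only that detection plus an API to start servers is enough to restore capacity without manual work.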
- 22. Can be implemented in your datacenter!
I have none.
So I used the Amazon Cloud
- 24. Hardware
• As cheap as it gets
• Not highly available
• Availability in Software
- 28. • Depends on details
• True story:
• The order of patch installations matters
- 36. Stateless
• No data is lost
• New server can take load immediately
- 40. • Easy to create test environments
• …with a different software version
- 41. Chaos Monkey
• Tool by Netflix
• Video streaming
• #1 in Internet usage in the US
- 42. Chaos Monkey
• Kills random machines
• To ensure the system survives hardware failures
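The idea can be illustrated with a toy version. This is not Netflix's tool: the server names, the `chaos_monkey` function, and the availability rule (at least one server left) are all assumptions made for the sketch.

```python
import random


def chaos_monkey(servers: set, rng: random.Random) -> str:
    """Kill one randomly chosen server and return its name."""
    victim = rng.choice(sorted(servers))  # sort so the choice is reproducible
    servers.remove(victim)
    return victim


def service_available(servers: set) -> bool:
    """Assumed rule: the service is up while at least one server survives."""
    return len(servers) > 0


fleet = {"web-1", "web-2", "web-3"}       # hypothetical redundant fleet
rng = random.Random(42)                   # seeded for a repeatable experiment
killed = chaos_monkey(fleet, rng)
survived = service_available(fleet)
```

Running such kills regularly turns "we believe the system survives a failure" into something that is checked continuously.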
- 43. Would you rather rely on…
…highly available hardware
…or a Chaos Monkey tested system?
- 50. Problem: Estimate & Scaling
• Performance hard to estimate
• Coarse grained scaling
• Backfires
- 51. True Story
• Initial estimate wrong
• Just need a little more
• Cluster: two servers
• Add one
• About 50% higher costs
• Order / install server takes time
• Bad performance until server delivered
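The 50% figure follows directly from the cluster size. A small calculation, assuming all servers cost the same (the 20-server fleet is an illustrative assumption, not from the story):

```python
def relative_cost_increase(current_servers: int, added_servers: int) -> float:
    """Cost increase as a fraction of current cost, assuming equal server prices."""
    return added_servers / current_servers


step_small_cluster = relative_cost_increase(2, 1)   # two servers, add one
step_large_fleet = relative_cost_increase(20, 1)    # many small servers, add one
```

With two large servers the smallest possible step is 50%; with many small servers the same "little more" costs only a few percent.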
- 52. Problem: Load Peak
• Business has load peaks
• e.g. events that people register for
• Need to have enough hardware for load peaks
• Costly
- 53. Problem: Testing
• Testing
• Need production-like infrastructure
• Prohibitive costs
• Only needed during tests
- 56. What You Have Just Seen
• System tunes itself depending on load
• Same approach as for availability
• +Watchdog for load
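A watchdog for load boils down to a rule that maps measured load to a server count. A minimal sketch, assuming a known per-server capacity and a minimum of two servers for availability; the function name and all numbers are hypothetical:

```python
import math


def desired_server_count(requests_per_second: float,
                         capacity_per_server: float,
                         minimum: int = 2) -> int:
    """Servers needed for the current load, never fewer than `minimum`."""
    needed = math.ceil(requests_per_second / capacity_per_server)
    return max(minimum, needed)


quiet = desired_server_count(150, capacity_per_server=100)   # normal day
peak = desired_server_count(950, capacity_per_server=100)    # load peak
```

The result can be fed into the same mechanism that replaces failed servers: it starts or stops servers until the running count matches the desired one.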
- 57. Easy to create a new server ✔
Redundancy in Software ✔
Reliably reproducible ✔
Stateless ?
- 58. Stateless
• Stateless web servers: best practice
• Some Java frameworks don’t follow this approach
• Can store HTTP session externally
• e.g. RDBMS, NoSQL, Cache
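Storing the session externally can be sketched as follows. `ExternalSessionStore` is a hypothetical in-memory stand-in for an RDBMS, NoSQL store, or cache, and `WebServer` is a toy handler rather than any framework's API.

```python
import copy


class ExternalSessionStore:
    """Hypothetical stand-in for a database or cache shared by all web servers."""

    def __init__(self):
        self._data = {}

    def load(self, session_id: str) -> dict:
        # Return a copy so servers never share in-process state.
        return copy.deepcopy(self._data.get(session_id, {}))

    def save(self, session_id: str, session: dict) -> None:
        self._data[session_id] = copy.deepcopy(session)


class WebServer:
    """Stateless server: every request loads and saves the session externally."""

    def __init__(self, name: str, store: ExternalSessionStore):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> list:
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return session["cart"]


store = ExternalSessionStore()
server_a = WebServer("a", store)
server_b = WebServer("b", store)

server_a.add_to_cart("session-1", "book")
del server_a                                       # server A fails
cart = server_b.add_to_cart("session-1", "pen")    # server B takes over
```

Because no server keeps the session in memory, any server can handle the next request and a failed one can simply be replaced.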
- 60. Databases
• Often assumed to be just “fast and scalable”
• Large scale is doable, e.g. Data Warehouse
• Often use traditional approach
• Cluster with two nodes
• Highly available hardware
- 68. Replicas & Shards
• Easy to understand
• But: Coarse grained scaling
• Adding another shard means
• Moving lots of data
• Adding quite a few servers
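How much data moves when a shard is added can be estimated with a small experiment. The sketch assumes the simplest placement rule, key modulo shard count, with integer keys; real systems may place data differently.

```python
def shard_for(key: int, shard_count: int) -> int:
    """Assumed placement rule: key modulo the number of shards."""
    return key % shard_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys
                if shard_for(k, old_count) != shard_for(k, new_count))
    return moved / len(keys)


keys = range(1200)
fraction = moved_fraction(keys, old_count=4, new_count=5)   # 0.8
```

Under this rule, going from four to five shards relocates 80% of the keys, which is the "moving lots of data" cost of each scaling step.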
- 69. Amazon Dynamo Model
[Diagram: Server A to Server D host Shard1 to Shard4, with three replicas of each shard spread across the servers]
- 70. Amazon Dynamo Model
[Diagram: Server A to Server D host Shard1 to Shard4, with three replicas of each shard spread across the servers]
- 71. Amazon Dynamo Model
[Diagram: a new server is added next to Server A to Server D, which host three replicas each of Shard1 to Shard4]
- 72. Amazon Dynamo Model
• Published in the Dynamo paper
• Implementations: Riak, Cassandra, etc.
• Fine grained scaling
• Can immediately write to new node
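The Dynamo paper builds on consistent hashing, which can be sketched in a few lines. This is a simplified illustration, not the implementation used by Riak or Cassandra: the class name, the use of MD5, and 64 virtual nodes per server are assumptions, and replication is left out.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (MD5 chosen for illustration)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Consistent hashing ring with virtual nodes, without replication."""

    def __init__(self, servers, vnodes: int = 64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, server) pairs
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def server_for(self, key: str) -> str:
        # First ring position after the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


keys = [f"key-{i}" for i in range(2000)]
ring = HashRing(["A", "B", "C", "D"])
before = {k: ring.server_for(k) for k in keys}

ring.add_server("E")                      # fine grained step: one new server
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only keys that now belong to the new server change owner; everything else stays where it is, so a single server can be added without reshuffling the whole data set.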
- 73. Hardware
• Not highly reliable
• Scales by distributing load across servers
• No NAS, SAN, RAID…
• As cheap as it gets