Deploying at will - SEI
- 1. 25th November | UPTEC
Opening Event
DevOps and Engineering Best Practices
- 6. Why Does Placing a System into
Production Take so Long?
• One reason is that changes are packaged into
releases.
• The CEO’s project has to be combined with
other efforts to make up a release
• Releases are scheduled
- 7. Managing releases
• Releases are stressful.
• Releases take careful management
– Errors in deployed code are a major source of
outages.
– So much so that organizations have formal release
plans.
– There is a position called a “Release Engineer”
that has responsibility for managing releases.
- 8. Suppose releases did not have to be
scheduled?
• Release when a code segment is complete
• Small releases
• Less stress
• Less waiting for a release to be complete
• Shorter time to market for new features, fixes
• Happier CEO!
- 9. Ad hoc releases exist
• Many companies now release to production
multiple times per day.
– Etsy releases 90 times a day
– Facebook releases 2 times a day
– Amazon had a new release to production every
11.6 seconds in May of 2011
- 10. Replace management discipline over
releases with engineering discipline
• The management discipline that went into
release planning and execution is replaced by
– Engineering process discipline
– Architecture techniques
– Tool support
- 11. Deployment
• Much of the current software engineering focus is on
completing code
• But … Code Complete ≠ Code in Production
• Between the completion of the code and the placing of
the code into production is a step called: Deployment
• Deploying completed code can be very time consuming
• One purpose of release planning is to deploy code
without errors
- 12. Modern Deployment Processes
• Continuous Deployment
– Architecture techniques: microservice
architecture, backward/forward compatibility,
feature toggles
– Tools: management tools, deployment pipeline
tools, configuration management tools
• Post-deployment testing
– Architecture techniques: reliability tactics,
pedigreed testing, initialization testing,
log generation
– Tools: fault injection tools, locality tools,
performance monitors, janitorial tools
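Feature toggles, listed above as an architecture technique for continuous deployment, let new code ship dark and be switched on independently of deployment. A minimal sketch, assuming a hypothetical in-process flag store (`FLAGS`) rather than any particular toggle library:

```python
# Minimal feature-toggle sketch: FLAGS is a hypothetical stand-in for a
# central flag store. The new code path ships disabled and is enabled
# at runtime, decoupling deployment from release.

FLAGS = {"new_checkout": False}

def checkout(cart_total: float) -> str:
    # The toggle decides at runtime which implementation serves the request.
    if FLAGS["new_checkout"]:
        return f"v2 checkout: {cart_total:.2f}"
    return f"v1 checkout: {cart_total:.2f}"
```

Flipping the flag switches behavior for all subsequent requests without redeploying the service.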
- 14. Code is automatically placed into
production
Developer pushes a button and, as long as all of the automated
tests are passed, the code is placed into production
automatically through a tool chain.
• No coordination with other teams during the execution of
the tool chain
• No dependence on other teams’ activities
The ability to have a continuous deployment pipeline depends
on the architecture of the system being deployed.
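The push-button tool chain can be sketched as a gated sequence of stages: each stage must pass before the next runs, and any failure stops the push to production. The stage names here are illustrative, not any specific tool's API:

```python
# Sketch of a continuous-deployment pipeline as a gated stage sequence.
# Each stage is a callable returning True (pass) or False (fail); the
# first failure halts the pipeline before code reaches production.

def run_pipeline(stages):
    """Run stages in order; stop at the first failure."""
    for name, stage in stages:
        if not stage():
            return f"stopped at {name}"
    return "deployed to production"

stages = [
    ("unit tests", lambda: True),
    ("integration tests", lambda: True),
    ("deploy", lambda: True),
]
```

Because every gate is automated, no coordination with other teams is needed while the chain executes.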
- 15. ~2002 Amazon instituted the following
design rules - 1
• All teams will henceforth expose their data and
functionality through service interfaces.
• Teams must communicate with each other through
these interfaces.
• There will be no other form of inter-process
communication allowed: no direct linking, no direct
reads of another team’s data store, no shared-
memory model, no back-doors whatsoever. The only
communication allowed is via service interface calls
over the network.
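The rule can be illustrated with a small sketch (hypothetical services, not Amazon code): one team reaches another team's data only through that team's service interface, never by reading its store directly:

```python
# Illustrative contrast for the service-interface-only rule.
# OrdersService._store stands in for a team's private data store.

class OrdersService:
    _store = {"o1": "shipped"}   # private to the orders team

    def get_status(self, order_id: str) -> str:
        # The service interface is the only sanctioned entry point.
        return self._store.get(order_id, "unknown")

def billing_lookup(orders: OrdersService, order_id: str) -> str:
    # Allowed: go through the service interface.
    return orders.get_status(order_id)
    # Forbidden: orders._store[order_id]
    # (a direct read of another team's data store -- a back-door)
```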
- 16. Amazon design rules - 2
• It doesn’t matter what technology they [the
services] use.
• All service interfaces, without exception, must be
designed from the ground up to be externalizable.
• Amazon is providing the specifications for what has
come to be called “Microservice Architecture”.
• (It’s really an architectural style.)
- 17. In Addition
• Amazon has a “two pizza” rule.
• No team should be larger than can be fed with two
pizzas (~7 members).
• Each (micro) service is the responsibility
of one team
• This means that microservices are
small and intra team bandwidth
is high
• Large systems are made up of many microservices.
• There may be as many as 140 in a typical Amazon page.
- 19. Microservice architecture
• Each user request is satisfied by
some sequence of services.
• Most services are not externally
available.
• Each service communicates with
other services through service
interfaces.
• Service depth may be
– Shallow (large fan out)
– Deep (small fan out, more
dependent services)
- 20. Services can have multiple instances
• The elasticity of the cloud will adjust the
number of instances of each service to reflect
the workload.
• Requests are routed through a load balancer
for each service
• This leads to
– Lots of load balancers
– Overhead for each request.
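The per-service load balancer can be sketched as a simple round-robin dispatcher; the instance list stands in for what cloud elasticity would grow or shrink under load:

```python
# Minimal round-robin load balancer sketch. Each service sits behind
# its own balancer, which spreads requests across current instances.

import itertools

class LoadBalancer:
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def route(self, request):
        instance = next(self._cycle)   # pick the next instance in turn
        return f"{instance} handled {request}"
```

Every request pays this extra hop, which is the overhead noted above.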
- 21. Digression into Service Oriented
Architecture (SOA)
– The definition of microservice architecture sounds
a lot like SOA.
– What is the difference?
– Amazon did not use the term “microservice
architecture” when they introduced their rules.
They said “this is SOA done right”
- 22. What SOA typically has but microservice
architecture does not
• Enterprise service bus
• Elaborate protocols for sending messages to
services (e.g., WSDL)
• Each service may be under the control of a
different organization
• Brokers
• etc.
- 23. Back to microservices
• Each service is the responsibility of a single
development team
• Individual developers can deploy new version
without coordination with other developers.
• It is possible that a single development team is
responsible for multiple services
- 24. Coordination model of microservice
architecture
• Elements of service interaction
– Services communicate asynchronously through
message passing
– Each service could (in principle) be deployed
anywhere on the net.
• Latency requirements will probably force particular
deployment location choices.
• Services must discover location of dependent services.
– State must be managed
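These coordination elements can be sketched together: a service looks up a dependency's location in a registry, then enqueues a message instead of calling synchronously. The registry and queue here are hypothetical in-process stand-ins for real discovery and messaging infrastructure:

```python
# Sketch of the coordination model: runtime service discovery plus
# asynchronous message passing. REGISTRY and QUEUES are illustrative
# stand-ins for a discovery service and a message broker.

from collections import deque

REGISTRY = {"billing": "10.0.0.7:8080"}   # service name -> location
QUEUES = {"billing": deque()}             # per-service inbound queues

def discover(name: str) -> str:
    # Locations are looked up at runtime, not hard-coded at build time.
    return REGISTRY[name]

def send_async(name: str, message: dict) -> None:
    # The sender enqueues and moves on; it does not wait for a reply.
    QUEUES[name].append(message)
```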
- 25. Questions about Microservice
architecture
• /Q/ Isn’t it possible that different teams will implement
the same functionality, likely differently?
• /A/ Yes, but so what? Major duplications are avoided
through assignment of responsibilities to services.
Minor duplications are the price to be paid to avoid
necessity for synchronous coordination.
• /Q/ What about transactions?
• /A/ Microservice architectures privilege flexibility
above reliability and performance. Transactions are
recoverable through logging of service interactions.
This may introduce some delays if failures occur.
- 27. Deploying a new version
of an application
Multiple instances of
a service are
executing
• Red is service being replaced
with new version
• Blue are clients
• Green are dependent services
[Diagram: instances of version A being replaced by
version B; version B has passed UAT / staging /
performance tests]
- 28. Deployment goal and constraints
• Goal of a deployment is to move from current state (N instances
of version A of a service) to a new state (N instances of version B
of a service)
• Constraints:
– Any development team can deploy their service at any time.
I.e. New version of a service can be deployed either before or
after a new version of a client. (no synchronization among
development teams)
– It takes time to replace one instance of version A with an
instance of version B (order of minutes)
– Service to clients must be maintained while the new version is
being deployed.
- 29. Deployment strategies
• Two basic all-or-nothing strategies
– Red/Black – leave N instances with version A as they
are, allocate and provision N instances with version B
and then switch to version B and release instances
with version A.
– Rolling Upgrade – allocate one instance, provision it
with version B, release one version A instance. Repeat
N times.
• Partial strategies are canary testing and A/B
testing.
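The two all-or-nothing strategies can be sketched over a list of instance versions: Red/Black provisions a full new fleet and switches atomically, while a rolling upgrade replaces one instance at a time, so both versions briefly coexist:

```python
# Sketch of Red/Black vs. Rolling Upgrade over a fleet represented as
# a list of version labels (e.g., ["A", "A", "A"]).

def red_black(fleet, new_version):
    # Provision N new instances with the new version, then switch.
    return [new_version] * len(fleet)

def rolling_upgrade(fleet, new_version):
    # Replace one instance at a time, recording each intermediate
    # fleet state; mixed versions exist mid-rollout.
    states = []
    for i in range(len(fleet)):
        fleet = fleet[:i] + [new_version] + fleet[i + 1:]
        states.append(list(fleet))
    return states
```

The intermediate states returned by `rolling_upgrade` are exactly why the consistency concerns on the next slides arise.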
- 30. Trade offs – Red/Black and
Rolling Upgrade
• Red/Black
– Only one version available
to the client at any
particular time.
– Requires 2N instances
(additional costs)
• Rolling Upgrade
– Multiple versions are
available for service at the
same time
– Requires N+1 instances.
• Rolling upgrade is widely
used.
[Flow: Rolling Upgrade in EC2]
Update Auto Scaling Group → Sort Instances →
Remove & Deregister Old Instance from ELB →
Confirm Upgrade Spec → Terminate Old Instance →
Wait for ASG to Start New Instance →
Register New Instance with ELB
- 31. What are the problems with
Rolling Upgrade?
• Any development team can deploy their service
at any time.
• Three concerns
– Maintaining consistency between different versions of
the same service when performing a rolling upgrade
– Maintaining consistency among different services
– Maintaining consistency between a service and persistent data
• Solutions exist for these concerns.
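One common solution to version consistency is a "tolerant reader" combined with backward/forward-compatible messages: a service handles messages from both old and new senders by defaulting fields the old version does not send. A minimal sketch with illustrative field names:

```python
# Sketch of version tolerance during a rolling upgrade. Old (v1)
# messages lack the "currency" field; defaulting it lets v1 and v2
# senders coexist mid-rollout (backward/forward compatibility).

def read_order(msg: dict) -> dict:
    return {
        "id": msg["id"],
        "amount": msg["amount"],
        "currency": msg.get("currency", "USD"),  # default for v1 senders
    }
```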
- 32. Canary testing
• Canaries are a small number of instances of a new version
placed in production in order to perform live testing in a
production environment.
• Canaries are observed closely to determine whether the
new version introduces any logical or performance
problems. If not, roll out new version globally. If so, roll
back canaries.
• Named after canaries
in coal mines.
• Similar in concept to
beta testing for shrink
wrapped software
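Canary routing can be sketched as sending a small fixed fraction of requests to the canary instances while the rest stay on the stable version; the 5% threshold here is an illustrative choice, not a standard:

```python
# Sketch of canary routing: ~5% of traffic goes to canary instances
# for live observation; the rest stays on the stable version. The
# rng parameter is injectable so the split is testable.

import random

def route_request(rng=random.random) -> str:
    return "canary" if rng() < 0.05 else "stable"
```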
- 33. A/B testing
• Suppose you wish to test user response to a
system variant. E.g. UI difference or marketing
effort. A is one variant and B is the other.
• You simultaneously make available both
variants to different audiences and compare
the responses.
• Implementation is the same as canary testing.
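The usual twist for A/B testing is making assignment sticky: hashing the user id so each user consistently sees the same variant across requests. A sketch with an illustrative 50/50 split:

```python
# Sketch of deterministic A/B assignment: hash the user id so the
# same user always lands in the same variant across requests.

import hashlib

def variant(user_id: str) -> str:
    digest = hashlib.sha256(user_id.encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"
```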
- 34. Rollback
• New versions of a service may be unacceptable
either for logical or performance reasons.
• Two options in this case
– Roll back (undo deployment)
– Roll forward (discontinue current deployment and
create a new release without the problem).
• Some organizations have automated rollback processes
• If the newly deployed system does not meet particular
performance metrics roll it back.
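An automated rollback gate can be sketched as a simple post-deployment check: compare an observed metric against a budget and undo the deployment if it fails. The metric name and threshold are illustrative:

```python
# Sketch of an automated rollback decision: roll back when the newly
# deployed version's error rate exceeds an agreed budget. The 1%
# threshold is an illustrative assumption.

def check_and_rollback(error_rate: float, threshold: float = 0.01) -> str:
    if error_rate > threshold:
        return "rollback"   # undo the deployment automatically
    return "keep"
```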
- 35. Summary
• Deploying small increments removes release
bottleneck
• Continuous deployment is a technique to
speed up deployment time. It relies on
– Disciplined engineering process
– Architectural techniques
– Tools