WHY KUBERNETES?
ENTERPRISE CLOUD NATIVE SUMMIT
2019-10-08
HENNING JACOBS
@try_except_
2
ROLLING OUT KUBERNETES?
"We are rolling out Kubernetes to production next month
and I'm interested to hear from people who made that
step already."
3
DON'T USE IT !!!!!
5
6
KUBERNETES FAILURE STORIES
7
~ 5.4 billion EUR revenue 2018
> 300 million visits per month
~ 14,000 employees in Europe
> 80% of visits via mobile devices
> 28 million active customers
> 400,000 product choices
> 2,000 brands
17 countries
as of June 2019
ZALANDO AT A GLANCE
8
A BRIEF HISTORY OF ZALANDO TECH
9
2010
"Sysop-Test"
"QA-Test"
10
2013: SELF SERVICE
11
2015: RADICAL AGILITY
AWS, STUPS
DOCKER DEPLOY, SSH ACCESS, AUDIT REPORTS, FULL AWS ACCESS
Teams have admin access & full responsibility
12
2015: ISOLATED AWS ACCOUNTS
Internet
*.abc.example.org → Team ABC (ELB, EC2)
*.xyz.example.org → Team XYZ (ELB, EC2)
13
2019: SCALE
140 Clusters
396 Accounts
14
2019: DEVELOPERS USING KUBERNETES
15
Platform
> 1100 developers
> 200 development teams
16
YOU BUILD IT, YOU RUN IT
The traditional model is that you take your software to the
wall that separates development and operations, and
throw it over and then forget about it. Not at Amazon.
You build it, you run it. This brings developers into
contact with the day-to-day operation of their software. It
also brings them into day-to-day contact with the
customer.
- A Conversation with Werner Vogels, ACM Queue, 2006
17
ON-CALL: YOU OWN IT, YOU RUN IT
When things are broken,
we want people with the best
context trying to fix things.
- Blake Scrivener, Netflix SRE Manager
18
DEVELOPER JOURNEY
Consistent story that models all aspects of software development
19
Developer Journey
20
Developer Journey
Correctness
Compliance
GDPR
Security
Cost Efficiency
24x7 On Call
Governance
Resilience
Capacity
...
21
DEVELOPER PRODUCTIVITY
Setup → Code → Build → Test → Deploy → Operate
Cloud Native Application Runtime
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise Cloud Native Summit
23
PLAN & SETUP
24
Plan
Stories
Rules of Play
Tech Radar
26
Setup
Application Bootstrapping
29
BUILD & TEST
30
code push → Git → CDP
CONTINUOUS DELIVERY PLATFORM: BUILD
32
DEPLOY
33
Deploy
Kubernetes
34
DEPLOYMENT CONFIGURATION
├── deploy/apply
│ ├── deployment.yaml
│ ├── credentials.yaml # Zalando IAM
│ ├── ingress.yaml
│ └── service.yaml
└── delivery.yaml # Zalando CI/CD
35
INGRESS.YAML
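apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "..."
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "myapp.foo.example.org"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              # networking.k8s.io/v1 form of serviceName/servicePort
              service:
                name: "myapp"
                port:
                  number: 80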
36
TEMPLATING: MUSTACHE
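apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "..."
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "{{{APPLICATION}}}.example.org"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              # networking.k8s.io/v1 form of serviceName/servicePort
              service:
                name: "{{{APPLICATION}}}"
                port:
                  number: 80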
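The triple-brace `{{{APPLICATION}}}` placeholders are Mustache's unescaped variables. A minimal sketch of the substitution step, assuming a flat dictionary of variables and supporting only `{{{NAME}}}` (no sections, partials, or HTML escaping); `render` is a hypothetical helper, and a real pipeline would use a complete Mustache implementation.

```python
import re

# Matches Mustache triple-brace (unescaped) variables such as {{{APPLICATION}}}.
TRIPLE_MUSTACHE = re.compile(r"\{\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace every {{{NAME}}} with variables[NAME]; fail loudly on unknown names."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"template variable {name!r} is not defined")
        return variables[name]

    return TRIPLE_MUSTACHE.sub(substitute, template)


template = 'host: "{{{APPLICATION}}}.example.org"\nname: "{{{APPLICATION}}}"'
print(render(template, {"APPLICATION": "myapp"}))
```

Unlike standard Mustache, which renders an unknown variable as an empty string, this sketch raises `KeyError`, so a typo in a manifest cannot silently produce a host name like `.example.org`.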
37
CONTINUOUS DELIVERY PLATFORM
38
CDP: DEPLOY
"glorified kubectl apply"
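Taken literally, "glorified kubectl apply" means the deploy step applies every manifest in `deploy/apply` to the cluster. A rough sketch under that assumption: `plan_apply` and `apply_all` are hypothetical helpers (not CDP code), template rendering is left out, and `dry_run=True` only returns the planned commands without touching a cluster. Note that `kubectl apply -f` also accepts a whole directory; applying file by file simply makes the order explicit.

```python
import subprocess
from pathlib import Path


def plan_apply(apply_dir: str) -> list[list[str]]:
    """Return one `kubectl apply -f <file>` command per YAML manifest, in a stable order."""
    manifests = sorted(Path(apply_dir).glob("*.yaml"))
    return [["kubectl", "apply", "-f", str(path)] for path in manifests]


def apply_all(apply_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Run (or, with dry_run=True, only plan) kubectl apply for every manifest."""
    commands = plan_apply(apply_dir)
    if not dry_run:
        for command in commands:
            # check=True aborts the deployment on the first failing manifest
            subprocess.run(command, check=True)
    return commands
```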
39
CDP: OPTIONAL APPROVAL
40
STACKSET: TRAFFIC SWITCHING
github.com/zalando-incubator/stackset-controller
41
TRAFFIC SWITCHING STEPS IN CDP
github.com/zalando-incubator/stackset-controller
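Traffic switching moves the share of requests from the old stack to the new one in steps rather than all at once. A hypothetical sketch of such a schedule (the stack names and step list are illustrative, not the stackset-controller's API): each step yields weights that always sum to 100%.

```python
def traffic_steps(old_stack: str, new_stack: str, steps: list[int]) -> list[dict[str, int]]:
    """Return the traffic weights (in percent) for each switching step.

    `steps` lists the new stack's share after each step, e.g. [10, 50, 100].
    """
    schedule = []
    previous = 0
    for share in steps:
        if not previous <= share <= 100:
            raise ValueError("steps must be non-decreasing percentages up to 100")
        schedule.append({old_stack: 100 - share, new_stack: share})
        previous = share
    return schedule


for weights in traffic_steps("myapp-v1", "myapp-v2", [10, 50, 100]):
    print(weights)
```

Pausing between steps (for example to watch error rates) is what makes such a rollout reversible: switching back only means applying an earlier set of weights.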
42
Deploy
You build it, you run it!
43
EMERGENCY ACCESS SERVICE
Emergency access by referencing Incident
zkubectl cluster-access request --emergency -i INC REASON
Privileged production access via 4-eyes
zkubectl cluster-access request REASON
zkubectl cluster-access approve USERNAME
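The two request types encode two rules: an emergency request must reference an incident, and a regular privileged request needs approval from a second person. A hypothetical model of those rules (the class and field names are illustrative; this is not how zkubectl or the emergency access service is implemented, and treating an incident reference as immediately sufficient is an assumption read from the slide):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessRequest:
    requester: str
    reason: str
    incident: Optional[str] = None  # incident reference for emergency access
    approver: Optional[str] = None

    @property
    def emergency(self) -> bool:
        return self.incident is not None

    def approve(self, approver: str) -> None:
        # 4-eyes principle: nobody may approve their own request
        if approver == self.requester:
            raise PermissionError("requester cannot approve their own request")
        self.approver = approver

    def granted(self) -> bool:
        # Emergency access is tied to an incident reference; regular
        # privileged access waits for a second person's approval.
        return self.emergency or self.approver is not None
```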
44
KUBERNETES WEB VIEW
kubectl get pods,stacks,deploys,..
45
SEARCHING ACROSS 140+ CLUSTERS
codeberg.org/hjacobs/kube-web-view
47
INTEGRATIONS
48
CLOUD FORMATION VIA CI/CD
├── deploy/apply
│ ├── deployment.yaml # Kubernetes
│ ├── cf-iam-role.yaml # AWS IAM Role
│ ├── cf-rds.yaml # AWS RDS Database
│ ├── kube-ingress.yaml
│ ├── kube-secret.yaml
│ └── kube-service.yaml
└── delivery.yaml # CI/CD config
"Infrastructure as Code"
49
POSTGRES OPERATOR
Application to manage PostgreSQL clusters on Kubernetes
> 500 clusters running on Kubernetes
github.com/zalando/postgres-operator
Elasticsearch in Kubernetes
Elasticsearch: 2,500 vCPUs, 1 TB RAM
github.com/zalando-incubator/es-operator/
51
SUMMARY
• Application Bootstrapping
• Git as source of truth and UI
• 4-eyes principle for master/production
• Extensible Kubernetes API as primary interface
• OAuth/IAM credentials
• PostgreSQL, Elasticsearch
• CloudFormation for proprietary AWS services
52
MONITORING & COST EFFICIENCY
53
OPENTRACING
54
KUBERNETES RESOURCE REPORT
github.com/hjacobs/kube-resource-report
55
RESOURCE REPORT: TEAMS
Sorting teams by slack costs
github.com/hjacobs/kube-resource-report
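Slack here is the gap between the resources a team requests and what its pods actually use, priced at what that capacity costs. A simplified sketch of sorting teams by slack cost, assuming hypothetical per-pod CPU/memory figures and hypothetical monthly unit prices (kube-resource-report derives the real numbers from cluster metrics and cloud pricing):

```python
from collections import defaultdict

# Hypothetical monthly prices per requested unit; real prices depend on instance types.
CPU_PRICE = 20.0     # per vCPU per month
MEMORY_PRICE = 3.0   # per GiB per month


def slack_cost_by_team(pods: list[dict]) -> list[tuple[str, float]]:
    """Sum priced (requested - used) resources per team, most expensive slack first."""
    costs: dict[str, float] = defaultdict(float)
    for pod in pods:
        cpu_slack = max(pod["cpu_requested"] - pod["cpu_used"], 0.0)
        mem_slack = max(pod["mem_requested_gib"] - pod["mem_used_gib"], 0.0)
        costs[pod["team"]] += cpu_slack * CPU_PRICE + mem_slack * MEMORY_PRICE
    return sorted(costs.items(), key=lambda item: item[1], reverse=True)


pods = [
    {"team": "abc", "cpu_requested": 2.0, "cpu_used": 0.5, "mem_requested_gib": 4, "mem_used_gib": 1},
    {"team": "xyz", "cpu_requested": 1.0, "cpu_used": 0.9, "mem_requested_gib": 2, "mem_used_gib": 2},
]
print(slack_cost_by_team(pods))
```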
56
KUBERNETES APPLICATION DASHBOARD
github.com/hjacobs/kube-ops-view
58
VERTICAL POD AUTOSCALER
limits/requests adapted by VPA
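The VPA sets resource requests from observed usage instead of developers' guesses. A simplified sketch of the idea, assuming a plain list of CPU usage samples: take a high percentile of usage and add a safety margin. The real recommender works on decaying histograms; the 90th percentile and 15% margin used here are assumed defaults, and `recommend_request` is a hypothetical helper.

```python
import math


def percentile(samples: list[float], fraction: float) -> float:
    """Nearest-rank percentile: smallest sample covering `fraction` of all samples."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = max(math.ceil(fraction * len(ordered)), 1)
    return ordered[rank - 1]


def recommend_request(samples: list[float], target: float = 0.9, margin: float = 0.15) -> float:
    """Recommended request = usage percentile plus a relative safety margin."""
    return percentile(samples, target) * (1 + margin)


cpu_millicores = [120, 150, 180, 200, 210, 250, 260, 300, 320, 900]
print(recommend_request(cpu_millicores))
```

Using a percentile instead of the maximum keeps a single spike (the 900m sample) from inflating the request, which is where the efficiency gain comes from.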
59
DOWNSCALING DURING OFF-HOURS
github.com/hjacobs/kube-downscaler
Weekend
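kube-downscaler scales workloads down outside a configured uptime window, for example over the weekend. A minimal sketch of the time check, assuming a simplified window format such as `Mon-Fri 07:30-20:30` evaluated in local time (the real tool also handles time zones); `in_uptime` and `desired_replicas` are hypothetical helpers, and wrap-around day ranges like `Fri-Mon` are not supported.

```python
from datetime import datetime, time

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def in_uptime(spec: str, now: datetime) -> bool:
    """True if `now` falls inside a window such as 'Mon-Fri 07:30-20:30'."""
    days, hours = spec.split()
    first_day, last_day = (WEEKDAYS.index(d) for d in days.split("-"))
    start, end = (time.fromisoformat(t) for t in hours.split("-"))
    day_ok = first_day <= now.weekday() <= last_day
    return day_ok and start <= now.time() < end


def desired_replicas(spec: str, now: datetime, normal: int) -> int:
    # Outside uptime the workload is scaled to zero; inside it keeps its normal size.
    return normal if in_uptime(spec, now) else 0
```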
60
KUBERNETES JANITOR
● TTL and expiry date annotations, e.g.
○ set time-to-live for your test deployment
● Custom rules, e.g.
○ delete everything without "app" label after 7 days
github.com/hjacobs/kube-janitor
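Both rule types boil down to comparing a resource's age with a limit. A minimal sketch, assuming TTL values such as `30m`, `24h` or `7d` (the unit set is an assumption; the kube-janitor README documents the exact annotation names and formats), plus the slide's example rule about the missing `app` label; all function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed TTL units: seconds, minutes, hours, days, weeks.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
TTL_PATTERN = re.compile(r"^(\d+)([smhdw])$")


def parse_ttl(value: str) -> timedelta:
    match = TTL_PATTERN.match(value)
    if not match:
        raise ValueError(f"invalid TTL: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)})


def is_expired(created: datetime, ttl: str, now: datetime) -> bool:
    """A resource expires once its age exceeds its TTL."""
    return now - created > parse_ttl(ttl)


def matches_label_rule(labels: dict[str, str], created: datetime, now: datetime) -> bool:
    # Custom rule from the slide: no "app" label and older than 7 days.
    return "app" not in labels and now - created > timedelta(days=7)


created = datetime(2019, 10, 1, 9, 0, tzinfo=timezone.utc)
print(is_expired(created, "7d", datetime(2019, 10, 8, 10, 0, tzinfo=timezone.utc)))
```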
61
EC2 SPOT NODES
72% savings
62
STABILITY ↔ EFFICIENCY
Slack, Autoscaling Buffer, Disable Overcommit, Cluster Overhead, Resource Report, HPA, VPA, Downscaler, Janitor, EC2 Spot
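HPA is one of the levers on this spectrum: it sizes the replica count to demand. Its documented scaling rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change while the ratio stays within a tolerance (0.1 by default). A sketch of just that rule, leaving out stabilization windows, min/max bounds, and pod readiness handling:

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, tolerance: float = 0.1) -> int:
    """Core HPA formula: ceil(current * currentMetric / targetMetric)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current size to avoid flapping.
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(hpa_desired_replicas(4, current_metric=90, target_metric=60))
```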
63
DELIVERY PERFORMANCE METRICS
• Lead Time
• Release Frequency
• Time to Restore Service
• Change Fail Rate
srcco.de/posts/accelerate-software-delivery-performance.html
64
CONTAINERS
From "Accelerate: The Science of Lean Software and DevOps"
65
DELIVERY PERFORMANCE METRICS
• Lead Time ≙ Commit to Prod
• Release Frequency ≙ Deploys/week/dev
• Time to Restore Service ≙ MTRS from incidents
• Change Fail Rate ≙ n/a
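The mapping turns three of the four metrics into numbers computable from deployment and incident records. A sketch over hypothetical records: the record shapes and the use of a median for lead time are assumptions, not how the figures in this talk were produced.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(deploys: list[tuple[datetime, datetime]]) -> timedelta:
    """Median commit-to-production time over (commit_time, deploy_time) pairs."""
    return median(deployed - committed for committed, deployed in deploys)


def deploys_per_week_per_dev(deploy_count: int, weeks: float, developers: int) -> float:
    """Release frequency normalized by team size."""
    return deploy_count / weeks / developers


def mean_time_to_restore(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean duration from incident start to service restoration."""
    durations = [restored - started for started, restored in incidents]
    return sum(durations, timedelta()) / len(durations)
```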
“.. means establishing empathy with internal
consumers (read: developers) and collaborating
with them on the design. Platform product managers
establish roadmaps and ensure the platform delivers
value to the business and enhances the developer
experience.”
- ThoughtWorks Technology Radar
68
DEVELOPER SATISFACTION
69
DOCUMENTATION
"Documentation is hard to find"
"Documentation is not comprehensive enough"
"Remove unnecessary complexity and obstacles."
"Get the documentation up to date and prepare use cases"
"More and more clear documentation"
"More detailed docs, example repos with more complicated deployments."
71
TESTIMONIALS
“So, thank you, Team Automata, for listening to our
community, taking our upvotes in consideration when
developing new solutions and building every day
'the first CI that doesn't suck'.”
- a user, October 2018
72
WHY KUBERNETES?
73
WHY KUBERNETES?
• provides enough abstractions (StatefulSet, CronJob, ..)
• provides consistency (API spec/status)
• is extensible (annotations, CRDs, API aggregation)
• offers compatibility guarantees (API versioning)
• is widely adopted (all major cloud providers)
• works across environments and implementations
srcco.de/posts/why-kubernetes.html
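The consistency and extensibility points rest on the same pattern: every object declares a desired `spec`, controllers observe the actual `status`, and a reconcile loop closes the gap. A toy sketch of one reconcile pass for replica counts (the dict shapes and action strings are hypothetical, not Kubernetes API types):

```python
def reconcile(spec: dict, status: dict) -> list[str]:
    """Return the actions needed to move the observed status toward the spec."""
    desired = spec.get("replicas", 1)
    observed = status.get("replicas", 0)
    if observed < desired:
        return [f"create pod {i}" for i in range(observed, desired)]
    if observed > desired:
        return [f"delete pod {i}" for i in range(desired, observed)]
    return []  # already converged; reconciling again is a no-op


print(reconcile({"replicas": 3}, {"replicas": 1}))
```

Because the function only compares desired and observed state, it can run repeatedly and after any failure; the same loop works for built-in resources and for custom resources such as StackSets or PostgreSQL clusters.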
74
WHY KUBERNETES?
• Efficiency
• Common Operational Model
• Developer Experience
• Cloud Provider Independent
• Compliance and Security
• Talent
(for Zalando)
80
KUBERNETES FAILURE STORIES
• Learning about production pitfalls!
• Availability bias?
https://k8s.af
81
FACTFULNESS
Things can be both bad and better!
What would failure stories for
your non-K8s infra look like?
https://k8s.af
82
COMPLEXITY FOR GOOGLE-SCALE INFRA?
• Managed DigitalOcean cluster: 4 minutes
• K3s single node: 2 minutes
demo.j-serv.de
83
84
MAYBE THAT'S GOOD?
85
OPEN SOURCE & MORE
Kubernetes Web View
codeberg.org/hjacobs/kube-web-view
Skipper HTTP Router & Ingress controller
github.com/zalando/skipper
Kubernetes Janitor
github.com/hjacobs/kube-janitor
Postgres Operator
github.com/zalando/postgres-operator
More Zalando Tech Talks
github.com/zalando/public-presentations
QUESTIONS?
HENNING JACOBS
SENIOR PRINCIPAL
henning@zalando.de
@try_except_
Illustrations by @01k
