DevoxxUK: Optimizing Application Performance on Kubernetes
- 2. About Me
● Architect, Runtime Cloud Optimization
● Former Maintainer, AdoptOpenJDK Community Docker Images
● Interested in every aspect of running Java Apps in K8s, including Cloud Native as well as Legacy migration to Cloud
● Ex Linux Kernel and glibc hacker
Dinakar Guniguntala (@dinogun)
Runtimes Cloud Architect, Red Hat
- 3. Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration … blah blah blah
Kitna Deti Hai ?*
Any questions ?
* What's the mileage ?
- 7. What is the granularity of observation ?
● Trade-off between accurate info and overhead
Additional Operational Info
● Quarkus Micrometer
● Spring Actuator
● Liberty MicroProfile
● Node.js prom-client
Observability
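All four of the libraries above expose a Prometheus-format metrics endpoint. One common way to have it scraped — an assumption, since it depends on how your Prometheus is configured — is via pod annotations:

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/q/metrics"   # Quarkus Micrometer default; adjust per framework
    prometheus.io/port: "8080"
```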
- 8. BIOS
● CPU Power and Performance Policy: <Performance>
OS / Hypervisor
● CPU Scaling governor: <performance>
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
performance powersave
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
Hyperthreading
● Do not count hyperthreaded cores during capacity planning
Don’t Forget The Hardware
- 10. Node Affinity
● Helps match workloads to the right resources
Pod Affinity
● Helps schedule related pods together
Node and Pod Affinities
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone
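The example above shows pod (anti-)affinity; node affinity is expressed under the same `affinity:` key. A minimal sketch — the `disktype` label is an assumption for illustration, not from the talk:

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype        # assumed node label for illustration
            operator: In
            values:
            - ssd
```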
- 11. SRE: Lower My Response Time!
CPU Request / Limit, Memory Request / Limit, Node Affinity, Pod Affinity
- 12. K8s QoS classes: Guaranteed, Burstable, BestEffort
Right Size
apiVersion: apps/v1
kind: Deployment
metadata:
  name: acmeair
  labels:
    app: acmeair-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: acmeair-deployment
  template:
    metadata:
      labels:
        name: acmeair-deployment
        app: acmeair-deployment
        app.kubernetes.io/name: "acmeair-mono"
        version: v1
    spec:
      volumes:
      - name: test-volume
        hostPath:
          path: "/root/icp/jLogs"
          type: ""
      containers:
      - name: acmeair-libertyapp
        image: dinogun/acmeair-monolithic
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: 500M
            cpu: 2
          limits:
            memory: 1024M
            cpu: 3
        volumeMounts:
        - name: "test-volume"
          mountPath: "/opt/jLogs"
Ensure LimitRange does not get in the way of your deployment!
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-range
spec:
  limits:
  - default:
      cpu: 1
      memory: 512Mi
    defaultRequest:
      cpu: 0.5
      memory: 256Mi
    type: Container
Requests → Should cover the observed peaks
Limits → Handle any spikes !
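For latency-sensitive pods, the Guaranteed QoS class (requests equal to limits for every container) avoids throttling surprises. A minimal sketch of the resources stanza — the values are illustrative assumptions:

```yaml
resources:
  requests:
    memory: 1024M
    cpu: 2
  limits:
    memory: 1024M   # requests == limits → Guaranteed QoS
    cpu: 2
```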
- 13. SRE: Lower My Response Time!
CPU Request / Limit, Memory Request / Limit, Node Affinity, Pod Affinity, Java Heap Size / Ratio
- 14. Container Aware JVM
Don’t Hardcode the Java Heap!
Use -XX:MaxRAMPercentage and -XX:InitialRAMPercentage instead of -Xmx and -Xms.
Comparing a fixed heap size with a “MaxRAMPercentage” setting, here “-XX:MaxRAMPercentage=80”:
With -Xmx = 2G, Heap = 2G whether Container Mem = 2G, 3G or 4G.
With -XX:MaxRAMPercentage=80, Heap = 1.6G / 2.4G / 3.2G for Container Mem = 2G / 3G / 4G.
Beware of Default Hotspot Settings: if container “mem < 1G”, the JVM assumes a “client-class” machine and the default is “serial GC”!
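In a Deployment, the percentage-based flags can be passed via an environment variable; a minimal sketch reusing the container from the earlier example (the exact flag values are illustrative assumptions):

```yaml
containers:
- name: acmeair-libertyapp
  image: dinogun/acmeair-monolithic
  env:
  - name: JAVA_TOOL_OPTIONS   # picked up automatically by HotSpot JVMs
    value: "-XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=80"
```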
- 15. SRE: Lower My Response Time!
CPU Request / Limit, Memory Request / Limit, Node Affinity, Pod Affinity, Java Heap Size / Ratio, VPA / HPA / CA
- 16. It’s All About the Scaling
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1beta1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
Set HPA with app specific metrics
- type: External
  external:
    metric:
      name: concurrent_connections
      selector:
        matchLabels:
          connection: current
    target:
      type: Value
      value: 1200
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: zookeeper
Use PodDisruptionBudget with CA
to ensure no service disruption
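Slide 15 also lists the VPA; a minimal VerticalPodAutoscaler sketch, assuming the VPA operator is installed in the cluster (the target reuses the HPA example's Deployment name):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Auto"   # VPA applies recommendations by evicting and recreating pods
```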
- 20. So what do we need here ?
● Multiple stakeholders to express requirements as an “Objective Function”
● Autonomously detect all the right options that try to match the “Objective Function”
● Try options intelligently and provide a recommendation
- 22. Autotune Architecture
Example Autotune yaml
apiVersion: "recommender.com/v1"
kind: "Autotune"
metadata:
  name: "quarkusapp-autotune"
  namespace: "quarkusapp-autotune-ns"
spec:
  slo:
    objective_function: "performedChecks_total"
    direction: "maximize"
    slo_class: "throughput"
    hpo_algo_impl: optuna_tpe
    function_variables:
    - name: "performedChecks_total"
      query: "metrics_QuarkusApp_performedChecks_total"
      datasource: "prometheus"
      value_type: "double"
  mode: "show"
  selector:
    matchLabel: "app.kubernetes.io/name"
    matchLabelValue: "quarkusApp-deployment"
  datasource:
    name: "prometheus"
    value: "prometheus_URL"
[Architecture diagram] The Dependency Analyzer builds a search space — objective function + tunables (Container + Runtime + App Server + App) + ranges — from layer info such as Micrometer metrics and tuning sets. The Experiment Manager deploys App Pods (Training) with experimental configs alongside the App Pods (Production) that receive the incoming app load, gathers app metrics from metric providers, and feeds experiment results to hyper-parameter optimization (optuna_tpe, tpe_multivariate, optuna_scikit). The Recommendation Manager turns the results summary into a config recommendation, which the Autotune Operator hands to the App Operator(s).
- 24. Objective Fn: Reduce Response Time
[Layer] [Tunable] [Default, Range]
[Quarkus] quarkus.thread-pool.core-threads [1, 3-256]
[Quarkus] quarkus.thread-pool.queue-size [unbounded, 0-10000]
[Quarkus] quarkus.datasource.jdbc.min-size [0, 2-31]
[Quarkus] quarkus.datasource.jdbc.max-size [20, 32-100]
[Hotspot] FreqInlineSize [325, 325-1000]
[Hotspot] MaxInlineLevel [9, 9-50]
[Hotspot] MinInliningThreshold [250, 0-500]
[Hotspot] CompileThreshold [1500, 1000-20000]
[Hotspot] CompileThresholdScaling [1, 1-20]
[Hotspot] ConcGCThreads [0, 0-32]
[Hotspot] InlineSmallCode [1000, 500-5000]
[Hotspot] LoopUnrollLimit [50, 20-250]
[Hotspot] LoopUnrollMin [4, 0-20]
[Hotspot] MinSurvivorRatio [3, 3-48]
[Hotspot] NewRatio [2, 1-20]
[Hotspot] TieredStopAtLevel [4, 0-4]
[Hotspot] TieredCompilation [false, ]
[Hotspot] AllowParallelDefineClass [false, ]
[Hotspot] AllowVectorizeOnDemand [true, ]
[Hotspot] AlwaysCompileLoopMethods [false, ]
[Hotspot] AlwaysPreTouch [false, ]
[Hotspot] AlwaysTenure [false, ]
[Hotspot] BackgroundCompilation [true, ]
[Hotspot] DoEscapeAnalysis [true, ]
[Hotspot] UseInlineCaches [true, ]
[Hotspot] UseLoopPredicate [true, ]
[Hotspot] UseStringDeduplication [false, ]
[Hotspot] UseSuperWord [true, ]
[Hotspot] UseTypeSpeculation [true, ]
[Container] cpuRequest [None, 1-32]
[Container] memoryRequest [None, 270M-8192M]
OpenShift version 4.8.13
3 Master, 6 Worker, 32C – 32GB each, RHEL 8.3
4C – 8GB
Benchmark → TechEmpower Framework – Quarkus RestEasy
K8s resource requests = limits
Incoming load is constant = 512 users
- 27. Summary: Better perf at the cost of a higher hardware config
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-4
Autotune vs Default Config – Take 1
[ Obj Fn = Minimal Response Time ]
60% better response time 19% better throughput
- 29. Summary: Better perf but slightly higher tail latencies
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-6
Autotune vs Default Config – Take 2
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) ]
64% better response time 6% better throughput
- 31. Best perf taking into account all requirements !
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-7
Autotune vs Default Config – Take 3
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) + Low Tail Latency ]
62% better response time 7% better throughput
- 32. Cost for handling 1 million transactions / sec
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-7
Autotune vs Default Config – Take 3 - COST
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) + Low Tail Latency ]
8% cost reduction
- 33. Objective Fn: Reduce Response Time
[Layer] [Tunable] [Default, Range] Best Config (1.91 ms)
[Quarkus] quarkus.thread-pool.core-threads [1, 0-32] = 19
[Quarkus] quarkus.thread-pool.queue-size [unbounded, 0-10000] = 3700
[Quarkus] quarkus.datasource.jdbc.min-size [0, 1-12] = 10
[Quarkus] quarkus.datasource.jdbc.max-size [12, 12-90] = 86
[Hotspot] FreqInlineSize [325, 325-500] = 340
[Hotspot] MaxInlineLevel [9, 9-50] = 50
[Hotspot] MinInliningThreshold [250, 0-200] = 55
[Hotspot] CompileThreshold [1500, 1000-10000] = 6930
[Hotspot] CompileThresholdScaling [1, 1-15] = 8.3
[Hotspot] ConcGCThreads [0, 0-8] = 6
[Hotspot] InlineSmallCode [1000, 500-5000] = 1416
[Hotspot] LoopUnrollLimit [50, 20-250] = 128
[Hotspot] LoopUnrollMin [4, 0-20] = 13
[Hotspot] MinSurvivorRatio [3, 3-48] = 12
[Hotspot] NewRatio [2, 1-10] = 9
[Hotspot] TieredStopAtLevel [4, 0-4] = 4
[Hotspot] TieredCompilation [false, ] = true
[Hotspot] AllowParallelDefineClass [false, ] = false
[Hotspot] AllowVectorizeOnDemand [true, ] = true
[Hotspot] AlwaysCompileLoopMethods [false, ] = false
[Hotspot] AlwaysPreTouch [false, ] = false
[Hotspot] AlwaysTenure [false, ] = true
[Hotspot] BackgroundCompilation [true, ] = true
[Hotspot] DoEscapeAnalysis [true, ] = true
[Hotspot] UseInlineCaches [true, ] = false
[Hotspot] UseLoopPredicate [true, ] = false
[Hotspot] UseStringDeduplication [false, ] = false
[Hotspot] UseSuperWord [true, ] = true
[Hotspot] UseTypeSpeculation [true, ] = true
[Container] cpuRequest [None, 1-4] = 4
[Container] memoryRequest [None, 270M-4096M] = 3319M
OpenShift version 4.8.13
3 Master, 6 Worker, 32C – 32GB each, RHEL 8.3
4C – 8GB
Benchmark → TechEmpower Framework – Quarkus RestEasy
K8s resource requests = limits
Incoming load is constant = 512 users
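One way to apply a recommended config like the one above — an illustrative sketch, not how the talk deploys it — is container resources plus a JVM options env var; container name and image below are hypothetical:

```yaml
containers:
- name: quarkusapp                  # hypothetical container name
  image: quarkus-resteasy:latest    # hypothetical image
  env:
  - name: JAVA_OPTIONS              # a subset of the tuned HotSpot flags above
    value: >-
      -XX:MaxInlineLevel=50 -XX:CompileThreshold=6930
      -XX:+TieredCompilation -XX:TieredStopAtLevel=4
  resources:
    requests:
      cpu: 4
      memory: 3319M
    limits:                         # requests == limits, per the experiment setup
      cpu: 4
      memory: 3319M
```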
- 34. Autotune Roadmap
● Autotune MVP expected 1H 2022
● Currently single service only
● For Dev / QA environments
● Different load conditions = multiple recommended configs
● HPA recommendation
- 35. Summary
● Observability is Key
● Do not forget to tune the hardware
● Set Node and Pod Affinities
● Ensure requests and limits are set for all app pods and right sized
● Do not hardcode the Java heap
● Use app specific scaling metrics
● Ensure no disruption with PDB
● Check out Autotune for autonomous tuning and stay tuned(!) for updates.
- 36. Repo’s and Contributing
● Kruize Project - https://github.com/kruize
● Autotune - https://github.com/kruize/autotune
● Autotune Demo - https://github.com/kruize/autotune-demo
● Benchmarks - https://github.com/kruize/benchmarks
● Autotune Results - https://github.com/kruize/autotune-results
Call for collaboration !
Kruize Slack
@dinogun