Optimizing Application Performance on Kubernetes
Dinakar Guniguntala @dinogun
About Me
● Architect, Runtime Cloud Optimization
● Former Maintainer, AdoptOpenJDK Community Docker Images
● Interested in every aspect of running Java apps in K8s, including Cloud Native as well as legacy migration to Cloud
● Ex Linux kernel and glibc hacker
Dinakar Guniguntala (@dinogun)
Runtimes Cloud Architect, Red Hat
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration … blah blah blah
Kitna Deti Hai ?*
Any questions ?
* What's the mileage ?
● Throughput
● Response Time
● Utilization
Lower My Response Time!
Observability
What is the granularity of observation ?
● Trade-off between accurate info and overhead
Additional Operational Info (example below)
● Quarkus Micrometer
● Spring Actuator
● Liberty MicroProfile
● Node.js prom-client
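A common way to surface these metrics to Prometheus is annotation-based scraping. A minimal sketch, assuming a Prometheus scrape config that honors the conventional (not built-in) prometheus.io/* annotations, and a Quarkus app serving Micrometer metrics on /q/metrics:

apiVersion: v1
kind: Pod
metadata:
  name: quarkus-app                    # hypothetical pod name
  annotations:
    prometheus.io/scrape: "true"       # opt this pod in to scraping
    prometheus.io/path: "/q/metrics"   # Quarkus Micrometer default endpoint
    prometheus.io/port: "8080"
spec:
  containers:
  - name: app
    image: example/quarkus-app:latest  # illustrative image
    ports:
    - containerPort: 8080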
Don't Forget The Hardware
BIOS
● CPU Power and Performance Policy: <Performance>
OS / Hypervisor
● CPU Scaling governor: <Performance>
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
performance powersave
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
Hyperthreading
● Do not count hyperthreaded cores as full cores during capacity planning
SRE: Lower My Response Time!
→ Node Affinity, Pod Affinity
Node and Pod Affinities
Node Affinity
● Helps match workloads to the right resources
Pod Affinity
● Helps schedule related pods together
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone
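Node affinity follows the same pattern. A minimal sketch, assuming worker nodes carry an illustrative disktype=ssd label:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype        # hypothetical node label
            operator: In
            values:
            - ssd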
SRE: Lower My Response Time!
→ Node Affinity, Pod Affinity, CPU Request / Limit, Memory Request / Limit
Right Size
K8s QoS classes:
● Guaranteed (requests == limits for every container)
● Burstable (requests set, but not Guaranteed)
● BestEffort (no requests or limits)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: acmeair
  labels:
    app: acmeair-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: acmeair-deployment
  template:
    metadata:
      labels:
        name: acmeair-deployment
        app: acmeair-deployment
        app.kubernetes.io/name: "acmeair-mono"
        version: v1
    spec:
      volumes:
      - name: test-volume
        hostPath:
          path: "/root/icp/jLogs"
          type: ""
      containers:
      - name: acmeair-libertyapp
        image: dinogun/acmeair-monolithic
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: 500M
            cpu: 2
          limits:
            memory: 1024M
            cpu: 3
        volumeMounts:
        - name: "test-volume"
          mountPath: "/opt/jLogs"
Ensure LimitRange does not get in the way of your deployment!
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-range
spec:
  limits:
  - default:
      cpu: 1
      memory: 512Mi
    defaultRequest:
      cpu: 0.5
      memory: 256Mi
    type: Container
Requests → should cover the observed peaks
Limits → handle any spikes!
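Note that the QoS class falls out of these values: to get Guaranteed, set requests equal to limits for every container. A minimal resources fragment (values illustrative):

resources:
  requests:
    cpu: 1
    memory: 1Gi
  limits:
    cpu: 1
    memory: 1Gi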
SRE: Lower My Response Time!
→ Node Affinity, Pod Affinity, CPU Request / Limit, Memory Request / Limit, Java Heap Size / Ratio
Container Aware JVM
Use -XX:MaxRAMPercentage and -XX:InitialRAMPercentage instead of -Xmx and -Xms.
Don't Hardcode the Java Heap!
Comparing a fixed heap size (-Xmx=2G) with -XX:MaxRAMPercentage=80:

Container Mem   Heap with -Xmx=2G   Heap with MaxRAMPercentage=80
2G              2G                  1.6G
3G              2G                  2.4G
4G              2G                  3.2G
Beware of Default Hotspot Settings
If container memory is < 1G, the JVM assumes a "client-class" machine and defaults to the serial GC!
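A minimal sketch of wiring this into a container spec via JAVA_TOOL_OPTIONS, which the JVM picks up automatically (image and values are illustrative):

spec:
  containers:
  - name: myapp                      # hypothetical container
    image: example/java-app:latest   # illustrative image
    env:
    - name: JAVA_TOOL_OPTIONS
      # Size the heap relative to container memory, and pin the GC
      # explicitly instead of trusting the ergonomics default.
      value: "-XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=80 -XX:+UseG1GC"
    resources:
      requests:
        memory: 2Gi
      limits:
        memory: 2Gi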
SRE: Lower My Response Time!
→ Node Affinity, Pod Affinity, CPU Request / Limit, Memory Request / Limit, Java Heap Size / Ratio, VPA / HPA / CA
It’s All About the Scaling
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1beta1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
Set HPA with app specific metrics
- type: External
  external:
    metric:
      name: concurrent_connections
      selector:
        matchLabels:
          connection: current
    target:
      type: Value
      value: 1200
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: zookeeper
Use PodDisruptionBudget with CA to ensure no service disruption
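For VPA, a minimal recommendation-only sketch (assumes the VPA operator is installed; names are illustrative):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: acmeair-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: acmeair
  updatePolicy:
    updateMode: "Off"        # recommend only; never evict/restart pods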
SRE: Optimize the App Stack?
Let's Take a Step Back
Life of an SRE?!
Finance · Developer · User
So what do we need here ?
● Multiple stakeholders to express requirements as an "Objective Function"
● Autonomously detect all the right options that try to match the "Objective Function"
● Try options intelligently and provide a recommendation
Introducing Kruize Autotune
https://github.com/kruize/autotune
Autotune Architecture
Example Autotune yaml
apiVersion: "recommender.com/v1"
kind: "Autotune"
metadata:
name: "quarkusapp-autotune"
namespace: "quarkusapp-autotune-ns"
spec:
slo:
objective_function: “performedChecks_total”
direction: “maximize”
slo_class: "throughput"
hpo_algo_impl: optuna_tpe
function_variables:
- name: “performedChecks_total”
query: "metrics_QuarkusApp_performedChecks_total"
datasource: "prometheus"
value_type: "double"
mode: "show"
selector:
matchLabel: "app.kubernetes.io/name"
matchLabelValue: "quarkusApp-deployment"
datasource:
name: “prometheus”
value: “prometheus_URL”
[Architecture diagram] Key components: Autotune Operator, Dependency Analyzer, Experiment Manager, Recommendation Manager. Inputs come from App Operator(s), Metric Providers (Micrometer metrics, layer info), and Tuning Sets. The Search Space = objective function + tunables (Container + Runtime + App Server + App) + ranges, which feeds hyper-parameter optimization (optuna_tpe, tpemultivariate, optuna_scikit). The Experiment Manager deploys App Pods (Training) with experimental configs alongside App Pods (Production) under the incoming app load, collects experiment results and app metrics, and emits a config recommendation plus a results summary.
Demo
Objective Fn: Reduce Response Time
[Layer] [Tunable] [Default, Range]
[Quarkus] quarkus.thread-pool.core-threads [1, 3-256]
[Quarkus] quarkus.thread-pool.queue-size [unbounded, 0-10000]
[Quarkus] quarkus.datasource.jdbc.min-size [0, 2-31]
[Quarkus] quarkus.datasource.jdbc.max-size [20, 32-100]
[Hotspot] FreqInlineSize [325, 325-1000]
[Hotspot] MaxInlineLevel [9, 9-50]
[Hotspot] MinInliningThreshold [250, 0-500]
[Hotspot] CompileThreshold [1500, 1000-20000]
[Hotspot] CompileThresholdScaling [1, 1-20]
[Hotspot] ConcGCThreads [0, 0-32]
[Hotspot] InlineSmallCode [1000, 500-5000]
[Hotspot] LoopUnrollLimit [50, 20-250]
[Hotspot] LoopUnrollMin [4, 0-20]
[Hotspot] MinSurvivorRatio [3, 3-48]
[Hotspot] NewRatio [2, 1-20]
[Hotspot] TieredStopAtLevel [4, 0-4]
[Hotspot] TieredCompilation [false, ]
[Hotspot] AllowParallelDefineClass [false, ]
[Hotspot] AllowVectorizeOnDemand [true, ]
[Hotspot] AlwaysCompileLoopMethods [false, ]
[Hotspot] AlwaysPreTouch [false, ]
[Hotspot] AlwaysTenure [false, ]
[Hotspot] BackgroundCompilation [true, ]
[Hotspot] DoEscapeAnalysis [true, ]
[Hotspot] UseInlineCaches [true, ]
[Hotspot] UseLoopPredicate [true, ]
[Hotspot] UseStringDeduplication [false, ]
[Hotspot] UseSuperWord [true, ]
[Hotspot] UseTypeSpeculation [true, ]
[Container] cpuRequest [None, 1-32]
[Container] memoryRequest [None, 270M-8192M]
Setup: OpenShift version 4.8.13; 3 Master + 6 Worker nodes, 32C – 32GB each; RHEL 8.3, 4C – 8GB
Benchmark → TechEmpower Framework – Quarkus RestEasy
K8s resource requests = limits
Incoming load is constant = 512 users
Objective Fn: Reduce Response Time
Be careful what you wish for!
Autotune vs Default Config – Take 1
[ Obj Fn = Minimal Response Time ]
Autotune: 0.28 ms vs Default: 0.83 ms
Autotune vs Default Config – Take 1
[ Obj Fn = Minimal Response Time ]
60% better response time, 19% better throughput
Summary: Better perf at the cost of a higher hardware config
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-4
Autotune vs Default Config – Take 2
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) ]
Autotune: 1.82 ms vs Default: 5.01 ms
Autotune vs Default Config – Take 2
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) ]
64% better response time, 6% better throughput
Summary: Better perf but slightly higher tail latencies
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-6
Autotune vs Default Config – Take 3
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) + Low Tail Latency ]
Autotune: 1.91 ms vs Default: 5.01 ms
Autotune vs Default Config – Take 3
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) + Low Tail Latency ]
62% better response time, 7% better throughput
Best perf taking into account all requirements!
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-7
Autotune vs Default Config – Take 3 – COST
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) + Low Tail Latency ]
Cost for handling 1 million transactions / sec: 8% cost reduction
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-7
Objective Fn: Reduce Response Time
[Layer] [Tunable] [Default, Range] Best Config (1.91 ms)
[Quarkus] quarkus.thread-pool.core-threads [1, 0-32] = 19
[Quarkus] quarkus.thread-pool.queue-size [unbounded, 0-10000] = 3700
[Quarkus] quarkus.datasource.jdbc.min-size [0, 1-12] = 10
[Quarkus] quarkus.datasource.jdbc.max-size [12, 12-90] = 86
[Hotspot] FreqInlineSize [325, 325-500] = 340
[Hotspot] MaxInlineLevel [9, 9-50] = 50
[Hotspot] MinInliningThreshold [250, 0-200] = 55
[Hotspot] CompileThreshold [1500, 1000-10000] = 6930
[Hotspot] CompileThresholdScaling [1, 1-15] = 8.3
[Hotspot] ConcGCThreads [0, 0-8] = 6
[Hotspot] InlineSmallCode [1000, 500-5000] = 1416
[Hotspot] LoopUnrollLimit [50, 20-250] = 128
[Hotspot] LoopUnrollMin [4, 0-20] = 13
[Hotspot] MinSurvivorRatio [3, 3-48] = 12
[Hotspot] NewRatio [2, 1-10] = 9
[Hotspot] TieredStopAtLevel [4, 0-4] = 4
[Hotspot] TieredCompilation [false, ] = true
[Hotspot] AllowParallelDefineClass [false, ] = false
[Hotspot] AllowVectorizeOnDemand [true, ] = true
[Hotspot] AlwaysCompileLoopMethods [false, ] = false
[Hotspot] AlwaysPreTouch [false, ] = false
[Hotspot] AlwaysTenure [false, ] = true
[Hotspot] BackgroundCompilation [true, ] = true
[Hotspot] DoEscapeAnalysis [true, ] = true
[Hotspot] UseInlineCaches [true, ] = false
[Hotspot] UseLoopPredicate [true, ] = false
[Hotspot] UseStringDeduplication [false, ] = false
[Hotspot] UseSuperWord [true, ] = true
[Hotspot] UseTypeSpeculation [true, ] = true
[Container] cpuRequest [None, 1-4] = 4
[Container] memoryRequest [None, 270M-4096M] = 3319M
Autotune Roadmap
● Autotune MVP expected 1H 2022
● Currently single service only
● For Dev / QA environments
● Different load conditions = multiple recommended configs
● HPA recommendation
Summary
● Observability is key
● Do not forget to tune the hardware
● Set Node and Pod Affinities
● Ensure requests and limits are set for all app pods and right-sized
● Do not hardcode the Java heap
● Use app-specific scaling metrics
● Ensure no disruption with PDBs
● Check out Autotune for autonomous tuning and stay tuned(!) for updates
Repo’s and Contributing
●
Kruize Project - https://github.com/kruize
●
Autotune - https://github.com/kruize/autotune
●
Autotune Demo - https://github.com/kruize/autotune-demo
●
Benchmarks - https://github.com/kruize/benchmarks
●
Autotune Results - https://github.com/kruize/autotune-results
Call for collaboration !
Kruize Slack
@dinogun
Questions
