KUBERNETES AND OPENSTACK AT SCALE
Will it blend?
Stephen Gordon (@xsgordon)
Principal Product Manager, Red Hat
May 8th, 2017
ONCE UPON A TIME...
Part 1
● 1000 OpenShift Container Platform 3.3 / Kubernetes 1.3
nodes on OpenStack infrastructure
● Presented methodology and results in Barcelona:
○ https://www.cncf.io/blog/2016/08/23/deploying-1000-nodes-of-openshift-on-the-cncf-cluster-part-1/
● Goals were:
○ Push limits
○ Identify best practices
○ Document best practices
○ Fix issues
FOR OUR NEXT TRICK!
Part 2
● Goals:
○ 2048 OpenShift Container Platform 3.5 / Kubernetes 1.5
nodes on OpenStack infrastructure
○ Network ingress tier saturation test
○ Overlay2 graph driver w/ SELinux test
○ Persistent volume scalability and performance test of
Container Native Storage (glusterfs)
KUBERNETES SCALABILITY SIG
Scalability SIG SLAs:
● API responsiveness
○ 99% of calls return in < 1 s
● Pod startup time
○ 99% of pods start within 5s*
The SIG also defines a number of other primary and derived metrics.
* With pre-pulled images
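As an informal illustration of how the pod-startup SLA above might be checked, here is a minimal, hypothetical Python sketch (not part of the SIG's tooling) that approximates per-pod startup latency from kubectl output by diffing the PodScheduled and Ready condition timestamps:

# Hypothetical sketch: approximate the Scalability SIG pod-startup SLI from
# "kubectl get pods -o json". Startup latency is approximated here as the gap
# between the PodScheduled and Ready condition transitions; the SIG's own
# tooling measures this differently, so treat the numbers as indicative only.
import json
import subprocess
from datetime import datetime

def parse(ts):
    # Kubernetes condition timestamps are RFC3339, e.g. 2017-05-08T10:00:00Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

out = subprocess.check_output(
    ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"])
latencies = []
for pod in json.loads(out.decode())["items"]:
    conds = {c["type"]: parse(c["lastTransitionTime"])
             for c in pod.get("status", {}).get("conditions", [])}
    if "PodScheduled" in conds and "Ready" in conds:
        latencies.append((conds["Ready"] - conds["PodScheduled"]).total_seconds())

latencies.sort()
if latencies:
    p99 = latencies[int(0.99 * (len(latencies) - 1))]
    print("p99 pod startup: %.2fs (SLA: 99%% of pods start within 5s)" % p99)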
A CONTAINER STACK FOR OPENSTACK
A wild solution appears... OPENSTACK + KUBERNETES
● OpenStack: exposition of resources. Provide the necessary environments to developers in minutes, not weeks or months.
● Kubernetes: consumption of resources. Easily access new environments to quickly build new apps and move on.
A CONTAINER STACK FOR OPENSTACK
A wild solution appears... OPENSTACK + OPENSHIFT
● OpenStack: exposition of resources. Provide the necessary environments to developers in minutes, not weeks or months.
● OpenShift: consumption of resources. An integrated platform to run, orchestrate, monitor, and scale containers, built around Kubernetes and Docker.
CONCEPTUAL ARCHITECTURE
Architectural tenets:
● Technical
independence
● Contextual awareness
● Avoiding redundancy
● Simplified management
Reference architecture:
red.ht/2ibNmvX
PREPARATION
WHERE TO TEST?
HOW TO TEST?
System Verification Test suite (SVT)
● Red Hat OpenShift Performance and Scalability team’s
upstream test suites:
○ Application Performance
○ Application Scalability
○ OpenShift Performance
○ OpenShift Scalability (incl. cluster-loader)
○ Networking Performance
○ Reliability/Longevity
● Also includes some additional tools, e.g. the image provisioner
● https://github.com/openshift/svt
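For orientation, a hypothetical invocation of cluster-loader against a config file is sketched below; the exact entry point and flags may differ between svt versions, so treat the repository README as authoritative.

# Hypothetical sketch: running cluster-loader from the svt repository against a
# config file (paths and flags are illustrative; check the repo README).
git clone https://github.com/openshift/svt.git
cd svt/openshift_scalability
python cluster-loader.py -f config/my-test-config.yaml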
ARCHITECTURE
Baremetal Cluster (100 nodes)
OpenShift-on-OpenStack Cluster (2048 nodes)
ARCHITECTURE (cont.)
● Software:
○ Red Hat OpenStack Platform 10, based on “Newton”
○ OpenShift Container Platform 3.5 (built around K8S 1.5)
○ Red Hat Enterprise Linux 7.3 (mostly…)
● Deployment:
○ Deployed OpenStack + Ceph using TripleO
○ Deployed OpenShift Container Platform using openshift-ansible
● Applied lessons learned from Part 1:
○ Storage architecture
○ Image formatting
○ Pre-baked images (see image_provisioner tool)
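For context, a heavily trimmed, hypothetical openshift-ansible inventory for a deployment of this general shape might look like the sketch below. Host names and most variables are placeholders; the real 2048-node inventory was generated from the OpenStack deployment and is far larger.

# Hypothetical, minimal openshift-ansible inventory sketch (illustrative only).
[OSEv3:children]
masters
etcd
nodes

[OSEv3:vars]
ansible_user=cloud-user
openshift_deployment_type=openshift-enterprise
openshift_release=v3.5

[masters]
master-0.example.com

[etcd]
master-0.example.com

[nodes]
master-0.example.com
node-0.example.com openshift_node_labels="{'region': 'primary'}"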
NETWORK INGRESS/ROUTING
NETWORK INGRESS/ROUTING TIER
Testing HAProxy Performance
● Load generator itself runs
in a pod.
● Added SNI and TLS variants
to the test suite.
● Configuration is passed in via ConfigMaps.
● Focused on HTTP with keepalive and TLS terminated at the edge.
projects:
  - num: 1
    basename: centos-stress
    ifexists: delete
    tuning: default
    templates:
      - num: 1
        file: ./content/quickstarts/stress/stress-pod.json
        parameters:
          - RUN: "wrk"                  # which app to execute inside WLG pod
          - RUN_TIME: "120"             # benchmark run-time in seconds
          - PLACEMENT: "test"           # placement of the WLG pods based on node label
          - WRK_DELAY: "100"            # maximum delay between client requests in ms
          - WRK_TARGETS: "^cakephp-"    # extended RE (egrep) to filter target routes
          - WRK_CONNS_PER_THREAD: "1"   # how many connections per worker thread/route
          - WRK_KEEPALIVE: "y"          # use HTTP keepalive [yn]
          - WRK_TLS_SESSION_REUSE: "y"  # use TLS session reuse [yn]
          - URL_PATH: "/"               # target path for HTTP(S) requests
NETWORK INGRESS/ROUTING TIER
Testing HAProxy Performance (cont.)
● 1p-mix-cpu*: nbproc=1, run on any CPU
● 1p-mix-cpu0: nbproc=1, run on core 0
● 1p-mix-cpu1: nbproc=1, run on core 1
● 1p-mix-cpu2: nbproc=1, run on core 2
● 1p-mix-cpu3: nbproc=1, run on core 3
● 1p-mix-mc10x: nbproc=1, run on any core,
sched_migration_cost=5000000
● 2p-mix-cpu*: nbproc=2, run on any core
● 4p-mix-cpu02: nbproc=4, run on core 2
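The nbproc and CPU-pinning variants above map onto standard HAProxy global settings plus a kernel scheduler tunable. A hypothetical fragment illustrating a two-process pinned case and the sched_migration_cost change might look like this (values are illustrative, not the exact test configuration):

# Hypothetical haproxy.cfg fragment (illustrative values only):
global
    nbproc 2        # number of HAProxy worker processes (the "2p" variants)
    cpu-map 1 2     # pin process 1 to CPU core 2
    cpu-map 2 3     # pin process 2 to CPU core 3

# The mc10x variant raised the scheduler migration cost roughly 10x, e.g.:
#   sysctl -w kernel.sched_migration_cost_ns=5000000
# (the tunable is named kernel.sched_migration_cost on older kernels)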
NETWORK
NETWORK PERFORMANCE
Testing OpenShift-sdn (OVS+VXLAN) Performance
● OpenShift includes and uses OpenShift-sdn (Open vSwitch + VXLAN) by default:
○ Provides full multi-tenancy
○ Is fully pluggable (as is ingress/routing tier)
○ Supports all four footprints (physical/virtual/private/public)
● Web-based workloads are mostly transactional
● Focused microbenchmark on a ping-pong test of varying payload sizes
NETWORK PERFORMANCE
Testing OpenShift-sdn (OVS+VXLAN) Performance (cont.)
● Tested mix of payload sizes
and stream counts.
● tcp_rr-XXB-Yi
○ XX = # of bytes
○ Y = # of instances
(streams)
● Slimmed-down version of RFC 2544
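The tcp_rr-XXB-Yi naming above describes request/response (ping-pong) tests at a given payload size and stream count. A comparable standalone microbenchmark can be run with netperf's TCP_RR test, for example (host name and values below are hypothetical):

# Hypothetical example: a single-stream TCP_RR (ping-pong) run with a 64-byte
# request and 64-byte response against a netserver at 10.0.0.2 for 60 seconds.
# Multiple concurrent streams ("-Yi") would be approximated by launching
# several of these in parallel.
netperf -H 10.0.0.2 -t TCP_RR -l 60 -- -r 64,64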
STORAGE
OVERLAY2 w/ SELINUX
Next on storage wars...
● Until recently, RHEL used Device Mapper for docker’s storage graph driver
○ Overlay support added in RHEL 7.2
○ Overlay2 support added in RHEL 7.3
○ Overlay2 support w/ SELinux added upstream and expected in RHEL 7.4
■ https://lkml.org/lkml/2016/7/5/409
○ Device Mapper remains the default in RHEL for now; Overlay2 is the default in Fedora 26
■ https://fedoraproject.org/wiki/Changes/DockerOverlay2
● Let’s try it out!
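As a concrete illustration (not the exact test configuration), switching the docker graph driver on a RHEL 7 host is typically done via the storage options passed to the daemon; a hypothetical /etc/sysconfig/docker-storage along these lines:

# Hypothetical /etc/sysconfig/docker-storage fragment (illustrative only):
# switch the graph driver from devicemapper to overlay2. Enforcing SELinux
# labels on overlay2 additionally requires the kernel support referenced
# above (expected in RHEL 7.4).
DOCKER_STORAGE_OPTIONS="--storage-driver overlay2"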
OVERLAY2 w/ SELINUX
Results
● Single base image for all pods
● 240 pods on the node (rate-limited creation)
● Reasonable memory savings
OVERLAY2 w/ SELINUX
Results (cont.)
CONTAINER NATIVE STORAGE
Approach
● OpenShift Container Platform supports a wide variety of volume providers
via the standard Kubernetes volume interface
● Red Hat Container Native Storage is a Gluster-based persistent volume
provider deployed on OpenShift
● Used the NVMe disks as “bricks” for Gluster, exposed 1G persistent volumes (see the provisioning sketch below)
● Container Native Storage nodes marked unschedulable for other OpenShift
pods
● Ran throughput numbers for create/delete operations, as well as API
parallelism
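To make the provisioning flow concrete, a hypothetical StorageClass and claim for GlusterFS-backed dynamic provisioning might look like the sketch below. Names and the heketi URL are placeholders, not the deployment's actual values; the beta API version and annotation reflect the Kubernetes 1.5 era.

# Hypothetical sketch: GlusterFS dynamic provisioning via the in-tree provisioner.
# The resturl points at a heketi endpoint; all values are placeholders.
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: glusterfs-cns
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.example.com:8080"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: testvol
  annotations:
    volume.beta.kubernetes.io/storage-class: glusterfs-cns
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi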
CONTAINER NATIVE STORAGE
Results
● CNS allocated volumes in constant time
● Consistent with results for other persistent volume providers
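A rough way to observe the constant-time behaviour described above is simply to time how long a claim takes to reach Bound; a hypothetical sketch, reusing the testvol claim from the earlier example:

# Hypothetical sketch: time from PVC creation to Bound (assumes pvc.yaml
# contains the testvol claim from the previous example).
oc create -f pvc.yaml
time bash -c 'until [ "$(oc get pvc testvol -o jsonpath={.status.phase})" = Bound ]; do sleep 1; done'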
NEXT STEPS
NEXT STEPS
To infinity, and beyond!
● Filed 40+ bugs across a variety of projects and components
● Scaling and Performance Guide, new with OpenShift Container Platform 3.5
● Getting Involved
○ “Kubernetes Ops on OpenStack” forum session
■ Wednesday, May 10, 1:50pm-2:30pm
■ Hynes Convention Center MR102
○ K8S SIG Scalability
○ K8S SIG OpenStack
REFERENCES
● Part 1: https://www.cncf.io/blog/2016/08/23/deploying-1000-nodes-of-openshift-on-the-cncf-cluster-part-1/
● Part 2: https://www.cncf.io/blog/2017/03/28/deploying-2048-openshift-nodes-cncf-cluster-part-2/
● Overlay2 and Device Mapper: https://developers.redhat.com/blog/2016/10/25/docker-project-can-you-have-overlay2-speed-and-density-with-devicemapper-yep/
● Red Hat Performance and Scale Trello: https://trello.com/b/M1bpo55E/scalability
THANK YOU
plus.google.com/+RedHat
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHatNews


Editor's Notes

  1. Goals were:
     ● Push the system to its limit, incl. ensuring we can reproduce work done in the community with upstream Kubernetes, incl. SIG Scalability (will come to this in a minute)
     ● Identify config changes and best practices to increase capacity and performance
     ● Document and file issues upstream and send patches where applicable
  2. ● Saturation test for OpenShift’s HAProxy-based network ingress tier
     ● Overlay2 graph driver and SELinux support from kernel v4.9
     ● Persistent volume scalability and performance using Red Hat’s Container-Native Storage (CNS) product (Gluster-based)
     ● Saturation test for OpenShift’s integrated container registry and CI/CD pipeline
  3. Primary metrics include:
     ● Max cores per cluster
     ● Max pods per core
     ● Management overhead per node
     ● Management overhead per cluster
     Derived metrics include:
     ● Max cores per node
     ● Max pods per machine
     ● Max machines per cluster
     ● Max pods per cluster
     ● End-to-end pod startup time
     ● Scheduler throughput
     ● Max cluster saturation time
     Images are pre-pulled because of the high degree of variability introduced between images (network throughput, size of image, etc.) that is unrelated to k8s performance.
  4. Why IaaS and PaaS:
     ● Exposition versus consumption
     ● Current state (VMs) versus future state (bare metal)
     ● Culture/people challenges (developer versus operations; who is driving)
     ● Isolation concerns
     ● Scaling concerns
     OpenStack: an open source cloud computing platform for building massively scalable clouds.
     Kubernetes: an open source system for automating deployment, scaling, and management of containerized applications; provides a framework for building distributed platforms. On Kubernetes container management/orchestration, Red Hat is the biggest contributor outside of Google. How did Red Hat end up on the Kubernetes horse? We bet on a simple idea: that an open source community is the best place to build the future of application orchestration, and that only an open source community could successfully integrate the diverse range of capabilities necessary to succeed.
  5. OpenShift: an integrated infrastructure platform to run, orchestrate, monitor, and scale containers, built around Kubernetes and Docker. OpenShift application platform timeline:
     ● Acquired Makara in Nov 2010
     ● OpenShift Origin launched in Apr 2012
     ● Docker open-sourced Mar 2013
     ● First Kubernetes commit on GitHub Jun 2014
     ● OpenShift v3 re-architected around Docker and Kubernetes Jun 2015, building on operational experience obtained by the OpenShift Online team with v2. LDK!
  6. The sandwich: your applications; OpenShift masters, nodes, registry; infrastructure services (LBaaS, Neutron, Nova, Cinder, etc.).
     Architectural tenets:
     ● Technical independence: ensure that containers are defined such that they remain independent of the underlying infrastructure. Containers must continue to be portable across host environments.
     ● Contextual awareness: allow containers to easily take advantage of OpenStack shared services beyond compute (i.e. networking and storage). To do this, Red Hat Atomic Enterprise (and other Red Hat container offerings) must be context aware.
     ● Avoid redundancy: limit redundancies where possible to minimize performance and other resource hits, including limiting the number of layers between the container and the hardware.
     ● Simplified management: simplify management by delivering a holistic, integrated view across platforms.
     Currently contextual awareness comes via the cloud provider implementation (all or nothing); expect to see increased experimentation with using services piecemeal/a la carte (e.g. Cinder).
     Storage: container hosts consume OpenStack storage (tenant isolation); application storage is managed by Kubernetes (stateful applications, containerized distributed storage services).
     Networking: use OpenShift-SDN for full application isolation, but expect double encapsulation when using Neutron with GRE or VXLAN tunnels; tenant isolation via OpenStack SDN using Kuryr eventually; use Flannel with the host-gw backend to avoid double encapsulation.
     Load balancing: provided by LBaaS v1 by default. Other options: external load balancer (recommended for production); a dedicated load balancer node running HAProxy (good for demo/test but no HA); none (if using a single master node).
     Authenticate OpenShift users using LDAP.
  7. Re-validate Kubernetes SIG Scalability findings on equivalent OpenShift Container Platform release.
  8. The CNCF cluster is made up of 1000 nodes deployed at Switch, Las Vegas by Intel for the use of the CNCF community. We were using ~300 of them; the NVMe storage will come in handy later.
  9. Not product supported:
     ● application_performance: JMeter-based performance testing of applications hosted on OpenShift.
     ● applications_scalability: performance and scalability testing of the OpenShift web UI.
     ● conformance: wrappers to run a subset of e2e/conformance tests in an SVT environment (work in progress).
     ● image_provisioner: Ansible playbooks for building AMI and qcow2 images with OpenShift rpms and Docker images baked in.
     ● networking: performance tests for the OpenShift SDN and kube-proxy.
     ● openshift_performance: performance tests for container build parallelism, projects, and persistent storage (EBS, Ceph, Gluster, and NFS).
     ● openshift_scalability: home of the infamous "cluster-loader"; details in openshift_scalability/README.md.
     ● reliability: run tests over long periods of time (weeks), cycling object quantity up and down.
  10. Why both? For the foreseeable future we envisage there will be baremetal, virtualized, and containerized workloads; the current state is that most people we see are running containers in VMs.
     ● Cultural/people issues: easiest way to get going without rocking the organization-wide IT boat in some cases; concerns about the potential for breakout (contrast with QEMU and the use of similar constructs there).
     ● Scale issues: # of pods per node (currently 250 and rising), workload dependent.
     ● Availability: ability to live migrate VMs; not impossible to live migrate a container, but also not really the way things should work long term.
  11. The overcloud usually consists of nodes in predefined roles such as Controller nodes, Compute nodes, and different storage node types. Each of these default roles contains a set of services defined in the core Heat template collection on the director node. However, the architecture of the core Heat templates provides a method to create custom roles and to add and remove services from each role.
     Storage layout: each storage node includes 2 SSDs and 10 SAS disks. NVMe was passed through to VMs for Container Native Storage (Gluster). Ceph performs significantly better when deployed with write-journals on SSDs; we created two write-journals on the SSDs and allocated 5 of the spinning disks to each SSD. In all, we had 90 Ceph OSDs, equating to 158 TB of available disk space.
     Image upload: converted to RAW for upload to Glance and used the snapshot/boot-from-volume flow, consuming ~700 MB per VM. The VM pool in Ceph this time around was ~1.5 TB for 2048 VMs versus 22 TB last time for 1,000 VMs, reducing I/O and time to boot VMs (< 15 mins for the 2048 VMs). Ceph’s role in this environment is to provide boot-from-volume service for our VMs (via Cinder).
  12. The routing tier consists of nodes running HAProxy for ingress into the cluster. We identified that there are, on average, a large number of low-throughput cluster ingress connections from clients (i.e. web browsers) to HAProxy versus a small number of high-throughput connections. There are already some changes in this space based on previous iterations: the default connection limit of 2000 leaves plenty of room on commonly available CPU cores for additional connections, so we bumped the default connection limit to 20,000 in OpenShift 3.5 out of the box. If you have other needs to customize the configuration for HAProxy, our networking folks have made it significantly easier: as of OpenShift 3.4, the router pod uses a ConfigMap, making tweaks to the config that much simpler.
  13. The load generator is configured by passing in ConfigMaps. It queries the Kubernetes API for the list of routes and builds the list of test targets dynamically. We zoomed in on a particularly representative workload mix: a combination of HTTP with keepalive and TLS terminated at the edge. We chose this because it represents how most OpenShift production deployments are used - serving large numbers of web applications for internal and external use, with a range of security postures.
  14. The graph shows a throughput test with a Y-axis of requests per second; higher is better. nbproc refers to the number of HAProxy processes spawned. sched_migration_cost is a kernel tunable that weights processes when deciding if/how the kernel should load balance them amongst available cores. What we learned:
     ● CPU affinity matters. But why are certain cores nearly 2x faster? Because HAProxy is hitting the CPU cache more often due to NUMA/PCI locality with the network adapter.
     ● Increasing nbproc helps throughput: nbproc=2 is ~2x faster than nbproc=1, BUT we get no more boost from going to 4 processes, and in fact nbproc=4 is slower than nbproc=2. This is because there were 4 cores in this guest, and 4 busy HAProxy processes left no room for the OS to do its thing (like process interrupts).
     ● Performance can be improved over 20% from baseline with no changes other than sched_migration_cost. By increasing it by a factor of 10, we keep HAProxy on the CPU longer and increase our likelihood of CPU cache hits. This is a common technique amongst the low-latency networking crowd, and is in fact recommended tuning in our Low Latency Performance Tuning Guide for RHEL 7.
  15. ● Provides full multi-tenancy.
     ● Encapsulation comes with tradeoffs in CPU cycles to wrap/unwrap packets; this can be mitigated via VXLAN offloading with commonly available NICs, incl. those in the CNCF cluster.
     ● Pluggable, so like OpenStack you can use other SDN solutions where integration has been done; also expect to use Kuryr in future.
     ● Allows it to be used on any public/private footprint, incl. OpenStack.
  16. RFC 2544 (Benchmarking Methodology for Network Interconnect Devices) discusses and defines a number of tests that may be used to describe the performance characteristics of a network interconnecting device, and also describes specific formats for reporting the results. As you would expect, adding more streams for the same payload provides a notable increase. The difference between baremetal/baremetal+pod and vm/vm+pod only becomes pronounced at the largest payload size. Bonus tuning: large clusters with over 1000 routes or nodes require increasing the default kernel ARP cache size; we increased it by a factor of 8x and are including that tuning out of the box in OpenShift 3.5.
  17. Reasons: maturity, supportability, security, POSIX compliance. Overlay/Overlay2: density improvements gained by page cache sharing are very important for certain environments where there is significant overlap in base image content. Overlay2 w/ SELinux landed in Linux kernel 4.9.
  18. Rate-limited pod creation using a "tuningset" w/ cluster-loader. Each of the 6 bumps is a batch of 40 pods. Before it moves to the next batch, cluster-loader makes sure the previous batch is in running state. In this way we avoid crushing the API server with requests, and can examine the system’s profiles at each plateau. The savings in terms of memory are reasonable (again, this is a “perfect world” scenario and your mileage may vary).
  19. The reduction in disk operations is due to subsequent container starts leveraging the kernel’s page cache rather than having to repeatedly fetch base image content from storage. Overall we found overlay2 to be very stable, and it becomes even more interesting with the addition of SELinux support.
  20. Deployed in pods, scheduled like any application. Used Kubernetes dynamic provisioning to expose volumes to applications. Marked unschedulable to control variability.
  21. Roughly 6 seconds from submit to the PVC going into “Bound” state. This number does not vary when CNS is deployed on bare metal or virtualized. Not pictured here are our tests verifying that several other persistent volume providers respond in a very similar timeframe.
  22. Filed bugs across Kubernetes, Docker, OpenStack, Ceph, the kernel, Open vSwitch, Golang, and Gluster.