Let’s Try Every CRI Runtime Available for Kubernetes
Phil Estes, Distinguished Engineer
IBM Cloud
Background: OCI
The OCI specifications span the Linux and Windows kernels and underpin the layers above them:
▧ Container runtimes: Docker, containerd, cri-o, Kata, Firecracker, gVisor, Nabla, Singularity, ...
▧ Container registries: DockerHub, the OSS distribution project, Cloud registries, JFrog, ...
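The runtime spec is the interoperability contract: every OCI runtime consumes the same bundle layout. A minimal sketch, assuming runc is installed (runc can generate the reference config):

# Create an OCI bundle skeleton; any compliant runtime can run it
mkdir -p bundle/rootfs && cd bundle
runc spec                 # writes a default config.json into the bundle
runc run test-container   # starts it, once rootfs/ is populated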
Background: CRI
K8s and CRI Responsibilities
Kubernetes (above the CRI):
▧ K8s API
▧ Storage
▧ Networking (CNI)
▧ Healthchecks
▧ Placement
▧ Custom resources
Container runtime (below the CRI):
▧ Pod container lifecycle
○ Start/stop/delete
▧ Image management
○ Pull/status
▧ Status
▧ Container interactions
○ attach, exec, ports, log
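The runtime-side responsibilities can be exercised directly against any CRI socket with crictl; a sketch, using containerd's typical default endpoint and a placeholder container ID:

export CONTAINER_RUNTIME_ENDPOINT=unix:///run/containerd/containerd.sock
crictl pull nginx:alpine              # image management: pull
crictl images                         # image management: status
crictl ps                             # lifecycle/status of pod containers
crictl exec -i -t <container-id> sh   # container interactions: exec
crictl logs <container-id>            # container interactions: log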
Background: CRI Runtimes
The kubelet can drive any of several runtime stacks over the CRI:
▧ kubelet → dockershim → dockerd
▧ kubelet → cri-containerd → containerd
▧ kubelet → cri-o → runc
▧ kubelet → containerd → Kata / Firecracker
▧ kubelet → singularity-cri → singularity
Selected via: kubelet --container-runtime {string} --container-runtime-endpoint {string}
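Switching stacks amounts to pointing those two flags at a different CRI socket; a sketch with typical default socket paths (kubelet flags of the K8s 1.14 era):

# containerd as the CRI runtime
kubelet --container-runtime=remote \
        --container-runtime-endpoint=unix:///run/containerd/containerd.sock
# cri-o as the CRI runtime
kubelet --container-runtime=remote \
        --container-runtime-endpoint=unix:///var/run/crio/crio.sock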
Caveats
What I don’t have time to cover/demo...
▧ Windows containers & runtimes
▧ rkt (CNCF)
▧ Virtual Kubelet (CRI implementation)
▧ Nabla containers (IBM)
Caveats
EXPECTATION vs. REALITY
Setup
▧ GKE, 2-node cluster: Docker
▧ IKS, 3-node cluster: containerd
▧ IBM Cloud bare metal (BMI), single node: containerd driving Firecracker, Kata, and gVisor
▧ Single-node VM: cri-o
▧ okd 3-node cluster on IBM Cloud VSIs: cri-o
Docker
● Most common, original runtime for Kubernetes clusters
● Simplifies tooling for mixed-use cluster nodes (e.g. applications
relying on `docker …` commands “just work”)
● Docker Enterprise customers get support and
multi-orchestrator support (swarm + K8s in same cluster)
● “More than enough” engine for Kubernetes
● Concerns over mismatch/lack of release sync between Docker
releases and Kubernetes releases (e.g. “certified” engine version)
● Extra memory/CPU use due to extra layer (docker->ctr->runc)
Containerd
● Used in GKE (Google), IKS (IBM), & Alibaba public clouds
● Significant hardening/testing by nature of use in every Docker
installation (tens of millions of engines)
● Lower memory/CPU use; clean API for extensibility/embedding
● No Docker API socket (a gap for existing tooling/vendor support)
● Still growing in maturity/use
● Windows support in flight; soon at parity with Docker engine
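With no Docker socket, node-level inspection moves to containerd's own CLI; a sketch (the CRI plugin keeps its resources in the k8s.io namespace):

ctr --namespace k8s.io containers list   # containers created through the CRI
ctr --namespace k8s.io images list       # images pulled through the CRI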
CRI-O
● Used in RH OpenShift; SuSE CaaS; other customers/uses
● “all the runtime Kubernetes needs and nothing more”
● UNIX perspective on separating concerns (client, registry
interactions, build)
● Not consumable apart from RH tools (design choice)
● Use/installation on non-RH distros can be complicated
● Extensibility limited (e.g. proposal from Kata to add containerd
shim API to cri-o)
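Additional OCI runtimes are registered with cri-o through its config file; a hedged sketch of the layout in recent cri-o releases (the runtime name and path are illustrative):

# /etc/crio/crio.conf (excerpt)
[crio.runtime.runtimes.kata]
  runtime_path = "/usr/bin/kata-runtime"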
Sandboxes + RuntimeClass
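RuntimeClass (beta as of K8s 1.14) maps a named handler to a runtime configured in the CRI implementation, letting pods opt into a sandbox per workload. A sketch, assuming a handler named kata is configured on the node:

kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata
handler: kata   # must match a runtime name known to the CRI implementation
EOF

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-nginx
spec:
  runtimeClassName: kata   # this pod runs inside the Kata sandbox
  containers:
  - name: nginx
    image: nginx
EOF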
Containerd v2 Shim API
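The v2 shim API is how sandboxes such as Kata and gVisor plug into containerd as first-class runtimes; a sketch of the containerd 1.2-era CRI plugin config (the runtime_type values are the shims those projects ship):

# /etc/containerd/config.toml (excerpt)
[plugins.cri.containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"   # gVisor's shim
[plugins.cri.containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"    # Kata's shim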
Kata Containers
● Lightweight virtualization; a merger of its predecessors,
Intel Clear Containers and Hyper.sh runV
● Implemented via KVM/qemu-based VM isolation
● Works with Docker, cri-o, & containerd
● Solid and maturing project with Intel and others leading;
governance under OpenStack Foundation
● Have added ability to drive Firecracker VMM as well
● Supports ARM, x86_64, AMD64, and IBM p and zSeries
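A quick standalone check, assuming kata-runtime is registered with the Docker engine (e.g. in daemon.json): the guest kernel differs from the host's, showing the VM boundary:

docker run --rm --runtime=kata-runtime busybox uname -r   # guest kernel version
uname -r                                                  # compare with the host kernel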
AWS Firecracker
● Lightweight virtualization via Rust-written VMM, originating
from Google’s crosvm project; target serverless/functions area
● Open Sourced by Amazon in November 2018
● Works standalone via API or via containerd
● cgroup + seccomp “jailer” to tighten down kernel access
● Integrated with containerd via shim and external snapshotter
implementation
● Quickly moving, young project; packaging and delivery are still in
flux and require quite a few manual steps today
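Standalone use means driving the VMM over its Unix-socket REST API; an abbreviated sketch (the kernel image path is a placeholder, and the machine/drive configuration calls are elided):

firecracker --api-sock /tmp/firecracker.sock &
curl --unix-socket /tmp/firecracker.sock -X PUT http://localhost/boot-source \
     -d '{"kernel_image_path": "./vmlinux", "boot_args": "console=ttyS0"}'
curl --unix-socket /tmp/firecracker.sock -X PUT http://localhost/actions \
     -d '{"action_type": "InstanceStart"}'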
gVisor
● A kernel-in-userspace concept from Google; written in Golang
● Used in concert with GKE; for example with Google Cloud Run
for increased isolation/security boundary
● Works standalone (OCI runc replacement) or via containerd
shim implementation
● Reduced syscalls used against “real kernel”; applications run
against gVisor syscall implementations
● Limited functionality; some applications may not work if syscall
not implemented in gVisor
● Syscall overhead, network performance impacted (ref: KVM mode)
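As a drop-in OCI runtime (assuming runsc is registered with Docker), the user-space kernel is directly observable; dmesg inside the container prints gVisor's boot messages rather than the host's:

docker run --rm --runtime=runsc alpine dmesg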
Singularity
● An HPC/academic community focused container runtime
● Initially not implementing OCI, now has OCI compliant mode
● To meet HPC use model; not daemon-based, low privilege,
user-oriented runtime (e.g. HPC end user workload scheduling)
● Sylabs, the creator of Singularity, has recently written a CRI
implementation that drives the Singularity runtime
● Uses the OCI-compliant mode; note, however, that images are converted to SIF
● Focused solely on the academic/HPC use case
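The daemonless, user-oriented flow shows up directly in the CLI; a sketch (pull converts the OCI image into a SIF file owned by the invoking user):

singularity pull docker://alpine                        # produces alpine_latest.sif
singularity exec alpine_latest.sif cat /etc/os-release  # runs without a daemon or root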
Nabla
● An open source sandbox runtime created by IBM Research
● Uses highly limited seccomp profile + unikernel implementation
● Similar to gVisor, but instead of user-mode kernel, uses
unikernel+application approach
● Currently requires building images against special set of
unikernel-linked runtimes (Node, Python, Java, etc.)
● IBM Research is pursuing ways to remove this limitation; Nabla is the
only runtime covered here that cannot run arbitrary container images
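Nabla's OCI runtime is runnc; a hedged sketch, assuming runnc is registered with the Docker engine and using a placeholder for one of the Nabla-built images (generic images will not run):

docker run --rm --runtime=runnc <nabla-built-image>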
Summary
● OCI specs (runtime, image, distribution) have
enabled a common underpinning for innovation
that maintains interoperability
● CRI has enabled a “pluggable” model for
container runtimes underneath Kubernetes
● Options are growing; most innovation is around
sandboxes, which the RuntimeClass API in K8s
makes easier to consume