Introduction and Deep Dive Into Containerd
- 1. Kohei Tokunaga & Akihiro Suda, NTT Corporation
- 3. Overview
● CNCF graduated container runtime project
● Resource manager
• Container process
• Image artifacts
• Filesystem snapshots
• Metadata and dependencies management
● Tightly scoped (expanding the scope requires 100% approval) but highly extensible
● Used by Kubernetes, Docker and various container-based projects
https://github.com/containerd/containerd
- 4. Usage in community
● Managed: GKE, AWS Fargate, AKS, IKS
● Development: Docker/moby, BuildKit
● K8s distribution: k3s, kind, minikube, kubespray, microk8s, k0s
● FaaS: faasd
Adoption
● Docker’s use of containerd plus direct use of containerd accounts for 83% of container usage (Sysdig 2021 Container Security and Usage Report)
https://sysdig.com/blog/sysdig-2021-container-security-usage-report/
● Used by several managed services as well as open source projects in the community
- 5. How is containerd used?
● As a CRI runtime: kubelet → CRI → containerd → low-level runtime
● As a component of Docker: dockerd → containerd API → containerd → low-level runtime
● As a general container management tool: arbitrary tools → containerd API → containerd → low-level runtime
- 6. Containerd as a CRI runtime
Flow: kubectl apply → apiserver → kubelet → CRI → containerd → low-level runtime (containerd pulls images from the container registry)
● kubelet (on the node)
• Detects Pod events
• Manages Pods using the CRI runtime
● containerd
• Manages Pods, containers and images
• Pulls images from the registry
• Executes low-level runtimes
● Low-level runtime
• Creates and manipulates isolated execution environments as containers
• e.g. runc, gVisor, Kata Containers
The de facto standard CRI runtime for Kubernetes
● Managed Kubernetes: IKS, GKE, AKS, AWS Fargate, …
● Kubernetes distributions: K3s, kind, minikube, kubespray, microk8s, k0s, ...
- 7. Containerd as a component of Docker
Flow: docker run → Docker API → dockerd → containerd API → containerd → low-level runtime (images are pulled from and pushed to the container registry)
● dockerd (on the node)
• Manages containers, images, networking, volumes, etc.
● containerd
• Manages containers
• Executes low-level runtimes
● Low-level runtime
• Creates and manipulates isolated execution environments as containers
• e.g. runc, gVisor, Kata Containers
- 8. Containerd as a general container management tool
● Several applications are developed based on containerd
● Containerd provides a Go client library (discussed later)
● Applications can extend containerd with plugins, without recompilation (discussed later)
● Applications managing containers: BuildKit, faasd, PouchContainer, nerdctl
● containerd: provides container management functionality to upper tools through the containerd API
● Low-level runtime: creates and manipulates isolated execution environments as containers (e.g. runc, gVisor, Kata Containers)
- 10. Containerd Architecture
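[Diagram: clients (the containerd client library, and kubelet via CRI) call the containerd API; the server hosts the services (container, image, tasks, namespace, leases, version, introspection, events, diff) together with plugins and runtimes, on top of the OS]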
● Client-server architecture
• Go client library (used by Docker, BuildKit, etc.)
● Client calls server via containerd API
• Through /run/containerd/containerd.sock
● Various low-level runtimes are supported
• OCI runtimes (runc, gVisor, Kata Containers, etc.)
• Firecracker (firecracker-containerd)
● Extensibility
• Low-level plugins
• Extending containerd API with custom services
• Client library is easy to customize
- 11. Containerd Client
● “Smart” Client (Go library)
• Containerd API bindings
• Registry client
• Pulling/Pushing images
• Image unpacker
• Creating OCI config for OCI runtimes
● Go applications can integrate with containerd using the client library
[Diagram: the client combines API bindings (container, image, namespace, leases, content, snapshots, events, tasks, etc.) with utilities (registry client, image unpacker, OCI config constructor); the server hosts the services (container, image, tasks, namespace, leases, version, introspection, events, diff), plugins and runtimes, on top of the OS]
- 12. Containerd Client Implementations
[Diagram: ctr, nerdctl, Docker, etc. use the client library to call the containerd API; the server hosts the services (container, image, tasks, namespace, leases, version, introspection, events, diff), plugins and runtimes, on top of the OS]
● ctr: https://github.com/containerd/containerd
• CLI client for containerd
• Mainly for debugging or trying new features
● nerdctl: https://github.com/containerd/nerdctl
• Docker-compatible CLI for containerd
• Easy to use for Docker users
• Supports containerd’s cutting-edge features
(e.g. lazy pulling, image encryption)
● containerd-based tools
• Arbitrary tools can integrate with containerd using the client library
• e.g. Docker, BuildKit, faasd
- 13. Containerd Core & API
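[Diagram: the services (container, image, namespace, leases, content, snapshots, CRI, tasks, etc.) cover container management, image management and container execution; they share a metadata store (a DB shared among services); OCI runtimes are executed through a shim, on top of the OS]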
● Microservices
• The containerd API is the set of APIs exposed by the services
• Services are loosely coupled
● Shared metadata DB
• bbolt-based
• https://github.com/etcd-io/bbolt
• Stores metadata of containers, images,
contents, snapshots, etc.
• Manages reference graph for GC
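The "reference graph for GC" idea can be sketched as a mark-and-sweep over a small in-memory graph. This is only an illustration under stated assumptions: the `refGraph` type, the `sweep` function and the key names are hypothetical, and containerd's actual metadata schema and GC rules are not reproduced here.

```go
package main

import (
	"fmt"
	"sort"
)

// refGraph maps a resource key to the keys it references.
// The key names used below are illustrative, not containerd's real schema.
type refGraph map[string][]string

// sweep marks everything reachable from roots and returns the
// remaining keys, i.e. what a garbage collector could delete.
func sweep(g refGraph, roots []string) []string {
	reachable := map[string]bool{}
	stack := append([]string(nil), roots...)
	for len(stack) > 0 {
		k := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if reachable[k] {
			continue
		}
		reachable[k] = true
		stack = append(stack, g[k]...)
	}
	var garbage []string
	for k := range g {
		if !reachable[k] {
			garbage = append(garbage, k)
		}
	}
	sort.Strings(garbage) // deterministic order for display
	return garbage
}

func main() {
	g := refGraph{
		"image/app":            {"content/manifest"},
		"content/manifest":     {"content/layer-a", "content/layer-b"},
		"content/layer-a":      nil,
		"content/layer-b":      nil,
		"content/orphan-layer": nil,
	}
	// Assumption: images act as GC roots in this toy model.
	fmt.Println(sweep(g, []string{"image/app"})) // prints [content/orphan-layer]
}
```

Keeping the references in one shared DB is what makes this kind of traversal possible across otherwise loosely connected services.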
- 15. Low-level Services
[Diagram: the services (container, image, namespace, leases, content, snapshots, CRI, tasks, etc.) sit on top of three low-level components (Content store, Snapshotter, Runtime), on top of the OS]
● Content Store
• Stores image manifest and layers “as-is”
• Content-addressable (keyed by digest)
● Snapshotter
• Manages “snapshots”
• Extracted and stacked view of rootfs layers
• Passed to OCI runtimes as rootfs
• Snapshotter impl. per backing filesystem
• Overlayfs, btrfs, aufs, FUSE, …
● Runtime
• Executes low-level runtimes via a “shim”
• A shim is a wrapper daemon around an OCI runtime
• Well suited to stateful runtimes (e.g. Kata Containers)
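"Content-addressable (keyed by digest)" can be illustrated with a toy in-memory store. This is a minimal sketch, not containerd's content store: the `store` type and its `Put`/`Get` methods are hypothetical names, and the real store keeps blobs on disk rather than in a map.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// store is a hypothetical in-memory blob store keyed by digest.
type store struct{ blobs map[string][]byte }

func newStore() *store { return &store{blobs: map[string][]byte{}} }

// digestOf returns the "sha256:<hex>" digest of data.
func digestOf(data []byte) string {
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// Put stores data "as-is" and returns the digest that addresses it.
func (s *store) Put(data []byte) string {
	d := digestOf(data)
	s.blobs[d] = append([]byte(nil), data...) // keep a private copy
	return d
}

// Get returns the blob for a digest and re-verifies its integrity.
func (s *store) Get(digest string) ([]byte, error) {
	data, ok := s.blobs[digest]
	if !ok {
		return nil, errors.New("blob not found: " + digest)
	}
	if digestOf(data) != digest {
		return nil, errors.New("blob corrupted: " + digest)
	}
	return data, nil
}

func main() {
	s := newStore()
	d1 := s.Put([]byte("layer bytes"))
	d2 := s.Put([]byte("layer bytes")) // same content, same key
	fmt.Println(d1 == d2, len(s.blobs)) // prints true 1
}
```

Because the key is derived from the bytes, identical layers shared by several images are stored once, and any blob can be verified on read.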
- 16. Image content flow
Flow: Registry → (pull) → Content Store → Diff Service → Snapshotter → Task & Runtime
● Content Store: stores layer blobs “as-is”
● Diff Service: unpacks layers (decompression, decryption, etc.)
● Snapshotter: holds the extracted snapshots
● Task & Runtime: mounts snapshots as the rootfs of a container
● containerd client: utilities (remote, unpacker) and API bindings (container & task service, …)
- 18. Extending containerd with plugins and services
[Diagram: the services (container, image, namespace, leases, content, snapshots, CRI, tasks, etc.) are extended with plugins; OCI runtimes are executed through shims, on top of the OS]
● containerd is tightly scoped but highly extensible
● Custom low-level service; no need to recompile
• external binary plugins
• Plugin via unix socket (proxy snapshotter,
proxy content store)
• Plugin as an executable binary (stream
processor, shim)
• Go plugin
● The API is extensible by implementing your own custom service
• e.g. the “control API” of firecracker-containerd
- 19. Extension example 1: Lazy pulling
● Remote snapshotter plugin
• Allows “lazy pulling” of images from an arbitrary remote store (not limited to the registry)
• Containers can start up without waiting for the entire image contents to be locally available
● Snapshotter can run as an external daemon (proxy snapshotter)
• No re-compilation is required
• Containerd talks with the snapshotter via unix socket
● Stargz Snapshotter enables lazy pulling of OCI-compatible eStargz/Stargz images from standard registries
• https://github.com/containerd/stargz-snapshotter
[Diagram: on the node, the remote snapshotter reads from an arbitrary remote store and provides rootfs snapshots as mount points to the container process]
Remote Snapshotters in community
- Stargz Snapshotter
- CVMFS-snapshotter
- Nydus-snapshotter
- OverlayBD-snapshotter
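The core idea of lazy pulling, fetching only the byte ranges that are actually read, can be sketched with a small `io.ReaderAt`. This is a hedged illustration and not how Stargz Snapshotter or any other remote snapshotter is implemented: `lazyBlob`, `fetchFunc` and the chunk size are hypothetical, and the fetch callback merely stands in for something like a ranged request to a remote store.

```go
package main

import (
	"fmt"
	"io"
)

// fetchFunc returns size bytes starting at off from a remote store.
// Hypothetical stand-in for e.g. a ranged request.
type fetchFunc func(off, size int64) ([]byte, error)

// lazyBlob fetches fixed-size chunks only when they are first read.
type lazyBlob struct {
	size      int64
	chunkSize int64
	fetch     fetchFunc
	cache     map[int64][]byte // chunk index -> chunk bytes
	fetches   int              // number of remote fetches so far
}

// chunk returns chunk i, fetching and caching it on first use.
func (b *lazyBlob) chunk(i int64) ([]byte, error) {
	if c, ok := b.cache[i]; ok {
		return c, nil
	}
	off, n := i*b.chunkSize, b.chunkSize
	if off+n > b.size {
		n = b.size - off // last chunk may be short
	}
	c, err := b.fetch(off, n)
	if err != nil {
		return nil, err
	}
	if int64(len(c)) != n {
		return nil, fmt.Errorf("short fetch: got %d bytes, want %d", len(c), n)
	}
	b.cache[i] = c
	b.fetches++
	return c, nil
}

// ReadAt implements io.ReaderAt on top of the chunk cache.
func (b *lazyBlob) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, fmt.Errorf("negative offset %d", off)
	}
	read := 0
	for read < len(p) && off < b.size {
		c, err := b.chunk(off / b.chunkSize)
		if err != nil {
			return read, err
		}
		n := copy(p[read:], c[off%b.chunkSize:])
		read += n
		off += int64(n)
	}
	if read < len(p) {
		return read, io.EOF
	}
	return read, nil
}

func main() {
	remote := []byte("0123456789abcdefghij")
	blob := &lazyBlob{
		size: int64(len(remote)), chunkSize: 4,
		fetch: func(off, size int64) ([]byte, error) { return remote[off : off+size], nil },
		cache: map[int64][]byte{},
	}
	buf := make([]byte, 3)
	if _, err := blob.ReadAt(buf, 9); err != nil {
		panic(err)
	}
	fmt.Printf("%s fetched=%d\n", buf, blob.fetches) // prints 9ab fetched=1
}
```

Only one of the five chunks is fetched for this read, which is the property that lets a container start before the whole image is local.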
- 20. Extension example 2: Generic image layers
● Containerd can handle arbitrary image layers, not limited to OCI standards
• gzip, zstd, encrypted layers…
● Stream Processor plugin converts arbitrary media type to another (e.g. OCI standard types)
● A separate binary can plug into containerd, without recompilation
Stream Processor in community
- imgcrypt for encrypted images
e.g. layer decryption
[Diagram: image layers pass through the Diff Service (built-in decompressor, plugin binary, …) and become rootfs snapshots]
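The conversion step of a stream processor can be sketched as a function that reads one stream and writes another. The example below converts a gzip stream to plain bytes purely for illustration (containerd already handles gzip itself). It assumes a plugin binary would read the layer from standard input and write the converted stream to standard output; `process` and `roundTrip` are hypothetical names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// process converts a gzip-compressed stream into its uncompressed form.
// Assumption: a real plugin binary would be wired as
// process(os.Stdin, os.Stdout).
func process(in io.Reader, out io.Writer) error {
	zr, err := gzip.NewReader(in)
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(out, zr)
	return err
}

// roundTrip compresses s and feeds it through process, for demonstration.
func roundTrip(s string) (string, error) {
	var compressed bytes.Buffer
	zw := gzip.NewWriter(&compressed)
	if _, err := zw.Write([]byte(s)); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	var plain bytes.Buffer
	if err := process(&compressed, &plain); err != nil {
		return "", err
	}
	return plain.String(), nil
}

func main() {
	out, err := roundTrip("layer contents")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints layer contents
}
```

A decryption plugin would have the same shape, with the gzip reader replaced by a decrypting reader.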
- 21. Extension example 3: Integrating low-level runtimes
● V2 Shim per low-level runtime
● Both OCI (e.g. runc) and non-OCI (e.g. Firecracker) runtimes can integrate with containerd
● Binary naming convention: io.containerd.runc.v2 -> containerd-shim-runc-v2
● Pluggable logging destination
• fifo (Linux), npipe (Windows), external binary (Linux, Windows), file (Linux, Windows)
Low-level runtimes in community and their V2 shims (behind the Runtime service):
• runc: io.containerd.runc.v2
• Kata Containers: io.containerd.kata.v2
• gVisor: io.containerd.runsc.v1
• Firecracker: io.containerd.aws-firecracker
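The binary naming convention above (io.containerd.runc.v2 -> containerd-shim-runc-v2) can be sketched as a small string transformation. This is a minimal sketch assuming that the last two dot-separated parts of the runtime name form the binary suffix; `shimBinaryName` is a hypothetical helper, not a function of the containerd library.

```go
package main

import (
	"fmt"
	"strings"
)

// shimBinaryName derives the shim binary name from a runtime name,
// assuming the last two dot-separated parts are <runtime>.<version>.
func shimBinaryName(runtime string) (string, error) {
	parts := strings.Split(runtime, ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("invalid runtime name %q", runtime)
	}
	name, version := parts[len(parts)-2], parts[len(parts)-1]
	if name == "" || version == "" {
		return "", fmt.Errorf("invalid runtime name %q", runtime)
	}
	return fmt.Sprintf("containerd-shim-%s-%s", name, version), nil
}

func main() {
	for _, rt := range []string{
		"io.containerd.runc.v2",
		"io.containerd.kata.v2",
		"io.containerd.runsc.v1",
	} {
		bin, err := shimBinaryName(rt)
		if err != nil {
			panic(err)
		}
		fmt.Println(rt, "->", bin)
	}
}
```

The resulting binary is expected to be found on the PATH of the containerd daemon, which is why a new runtime can be added without rebuilding containerd.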
- 23. Two APIs are available
The containerd API is recommended for most use cases, but the CRI API might be easier to get started with

            | containerd API                           | CRI API
Consumers   | Docker/Moby, BuildKit, faasd, nerdctl... | Kubernetes
Paradigm    | Task-oriented                            | Pod-oriented
Flexibility | Good                                     | Bad
Simplicity  | Bad                                      | Good
Transport   | gRPC over UNIX socket                    | gRPC over UNIX socket
- 24. Implementing your own containerd client
● Both containerd API and CRI API use gRPC
● In theory you could use any language for your own client
● But the containerd API depends on the “smart client” written in Go, especially for pulling images
● So, currently, Go is the best language for the native containerd API
● Contributions for other languages are welcome
- 27. Implementing your own containerd client
Example: https://containerd.io/docs/getting-started/
You will add WithXXX options here:
- oci.WithProcessArgs
- oci.WithMounts
- oci.WithMemoryLimit
- seccomp.WithProfile
- ...
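The WithXXX options follow Go's functional options pattern. The sketch below shows the shape of that pattern with a deliberately simplified, hypothetical `spec` type and option signature; the real `oci` options take more parameters (context, client, container) and operate on the full OCI runtime spec, so treat this as an illustration of the idea only.

```go
package main

import "fmt"

// spec is a tiny hypothetical stand-in for an OCI runtime spec.
type spec struct {
	Args        []string
	MemoryLimit uint64 // bytes; 0 means unlimited in this toy model
}

// specOpt mutates a spec; simplified compared to the real option type.
type specOpt func(*spec) error

// withProcessArgs sets the command line of the container process.
func withProcessArgs(args ...string) specOpt {
	return func(s *spec) error {
		if len(args) == 0 {
			return fmt.Errorf("process args must not be empty")
		}
		s.Args = args
		return nil
	}
}

// withMemoryLimit sets a memory limit in bytes.
func withMemoryLimit(limit uint64) specOpt {
	return func(s *spec) error {
		s.MemoryLimit = limit
		return nil
	}
}

// newSpec starts from defaults and applies the options in order.
func newSpec(opts ...specOpt) (*spec, error) {
	s := &spec{Args: []string{"sh"}}
	for _, o := range opts {
		if err := o(s); err != nil {
			return nil, err
		}
	}
	return s, nil
}

func main() {
	s, err := newSpec(
		withProcessArgs("nginx", "-g", "daemon off;"),
		withMemoryLimit(64<<20),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Args, s.MemoryLimit)
}
```

Options compose by simple listing, so a client can add mounts, limits or security profiles without changing the constructor's signature.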
- 28. Implementing your own containerd client
In addition to the client, you will also want to implement OCI hooks and a logger binary
● OCI Hooks: custom commands called on creation and deletion of containers
○ e.g., for setting up and tearing down CNI bridge and portmap
○ Optional, but necessary if you want your containers to be restarted
automatically on host reboot
○ Example: https://github.com/containerd/nerdctl/blob/v0.7.2/run.go#L629-L663
● Logger Binary: custom command for handling container logs
○ e.g., store as a local file, transfer to fluentd, …
○ Example: https://github.com/containerd/nerdctl/blob/v0.7.2/run.go#L618-L627
- 29. Implementing your own containerd client
Full example: nerdctl
https://github.com/containerd/nerdctl
Spun out from `ctr` tool with more practical
features:
- Automatic restarting
- Port forwarding
- Logging
- Rootless
- Stargz
- OCIcrypt
- …
You may copy the code as the
“starter pack” to create your own client :)
- 31. containerd 1.5 updates (April)
● Support zstd as an image compression algorithm
○ Faster than gzip
○ https://facebook.github.io/zstd/
● Support NRI: Node Resource Interface
○ Akin to CNI, but for managing resources, e.g., cgroup
○ https://github.com/containerd/nri
● Enable OCIcrypt decryption by default
○ Supported since 1.3, but it was not enabled by default
○ https://github.com/containers/ocicrypt https://github.com/containerd/imgcrypt
● nerdctl (contaiNERD ctl) joined containerd, as a non-core subproject
○ Docker-compatible CLI but with stargz and ocicrypt
○ https://github.com/containerd/nerdctl
- 32. containerd 1.5 updates (April)
● The CRI plugin repo (github.com/containerd/cri) is now merged into the main repo
(github.com/containerd/containerd)
○ No visible change to users, but significantly simplifies the contribution process
● Client library is now available as a Go module
- 33. Future plan
● Filesystem quota (#759)
● CRI support for user namespaces (KEP #2101)
○ Run Kubernetes pods as a user that is different from the daemon user
○ Akin to “Rootless Containers”, but different (and does not conflict, either)
● Chown-less user namespaces (#4734)
○ Requires idmapped mounts, introduced in kernel 5.12
● Pause-less pod sandboxes (#4131)
● More documentation (help wanted! 🙏)
- 34. Third party plugin updates
● Nydus Snapshotter https://github.com/dragonflyoss/image-service
○ Similar to Stargz Snapshotter but with a different image format
● OverlayBD Snapshotter https://github.com/alibaba/accelerated-container-image
○ Boot containers from iSCSI
● runu https://github.com/ukontainer/runu
○ Linux containers on macOS, using LKL (Linux Kernel Library)
● runj https://github.com/samuelkarp/runj
○ FreeBSD containers
- 35. Recap
● The de facto standard runtime for Kubernetes, but not only for Kubernetes
● Extensible with plugins
○ Runtime plugins, e.g., gVisor, Kata
○ Snapshotter plugins, e.g., Stargz Snapshotter
○ Stream processor plugins, e.g., OCIcrypt
○ Logging binary plugins, e.g., json-file
○ ...
● New subproject: nerdctl (https://github.com/containerd/nerdctl)
○ Like `docker` but with full features of containerd
○ Like `ctr` but with full user experience of `docker`
○ nerdctl run -d -p 80:80 --restart=always nginx