Debugging Linux issues with eBPF
One incident from start to finish with dynamic tracing applied
Ivan Babrou
Performance @ Cloudflare
What does Cloudflare do

CDN
Moving content physically closer to visitors with our CDN.
Intelligent caching
Unlimited DDoS mitigation
Unlimited bandwidth at flat pricing with free plans
Edge access control
IPFS gateway
Onion service

Website Optimization
Making the web fast and up to date for everyone.
TLS 1.3 (with 0-RTT)
HTTP/2 + QUIC
Server push
AMP
Origin load-balancing
Smart routing
Serverless / Edge Workers
Post quantum crypto

DNS
Cloudflare is the fastest managed DNS provider in the world.
1.1.1.1
2606:4700:4700::1111
DNS over TLS
Cloudflare’s anycast network:
160+ data centers globally
4.5M+ DNS requests/s across authoritative, recursive and internal
10% of Internet requests every day
10M+ HTTP requests/second
10M+ websites, apps & APIs in 150 countries
20Tbps network capacity

Cloudflare’s anycast network (daily ironic numbers):
350B+ DNS requests/day across authoritative, recursive and internal
800B+ HTTP requests/day
1.73Ebpd network capacity
Link to slides with speaker notes
Slideshare doesn’t allow links on the first 3 slides
Cloudflare is a Debian shop
● All machines were running Debian Jessie on bare metal
● OS boots over PXE into memory, packages and configs are ephemeral
● Kernel can be swapped as easily as the OS
● New Stable (stretch) came out, we wanted to keep up
● Very easy to upgrade:
○ Build all packages for both distributions
○ Upgrade machines in groups, look at metrics, fix issues, repeat
○ Gradually phase out Jessie
○ Pop a bottle of champagne and celebrate
Cloudflare core Kafka platform at the time
● Kafka is a distributed log with multiple producers and consumers
● 3 clusters: 2 small (dns + logs) with 9 nodes, 1 big (http) with 106 nodes
● 2 x 10C Intel Xeon E5-2630 v4 @ 2.2GHz (40 logical CPUs), 128GB RAM
● 12 x 800GB SSD in RAID0
● 2 x 10G bonded NIC
● Mostly network bound at ~100Gbps ingress and ~700Gbps egress
● Check out our blog post on Kafka compression
● We also blogged about our Gen 9 edge machines recently

Small clusters went OK, the big one did not
One node upgraded to Stretch
Perf to the rescue: “perf top -F 99”
RCU stalls in dmesg
[ 4923.462841] INFO: rcu_sched self-detected stall on CPU
[ 4923.462843] 13-...: (2 GPs behind) idle=ea7/140000000000001/0 softirq=1/2 fqs=4198
[ 4923.462845] (t=8403 jiffies g=110722 c=110721 q=6440)
Error logging issues
Aug 15 21:51:35 myhost kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 15 21:51:35 myhost kernel: 26-...: (1881 ticks this GP) idle=76f/140000000000000/0
softirq=8/8 fqs=365
Aug 15 21:51:35 myhost kernel: (detected by 0, t=2102 jiffies, g=1837293, c=1837292, q=262)
Aug 15 21:51:35 myhost kernel: Task dump for CPU 26:
Aug 15 21:51:35 myhost kernel: java R running task 13488 1714 1513 0x00080188
Aug 15 21:51:35 myhost kernel: ffffc9000d1f7898 ffffffff814ee977 ffff88103f410400 000000000000000a
Aug 15 21:51:35 myhost kernel: 0000000000000041 ffffffff82203142 ffffc9000d1f78c0 ffffffff814eea10
Aug 15 21:51:35 myhost kernel: 0000000000000041 ffffffff82203142 ffff88103f410400 ffffc9000d1f7920
Aug 15 21:51:35 myhost kernel: Call Trace:
Aug 15 21:51:35 myhost kernel: [<ffffffff814ee977>] ? scrup+0x147/0x160
Aug 15 21:51:35 myhost kernel: [<ffffffff814eea10>] ? lf+0x80/0x90
Aug 15 21:51:35 myhost kernel: [<ffffffff814eecb5>] ? vt_console_print+0x295/0x3c0

Page allocation failures
Aug 16 01:14:51 myhost systemd-journald[13812]: Missed 17171 kernel messages
Aug 16 01:14:51 myhost kernel: [<ffffffff81171754>] shrink_inactive_list+0x1f4/0x4f0
Aug 16 01:14:51 myhost kernel: [<ffffffff8117234b>] shrink_node_memcg+0x5bb/0x780
Aug 16 01:14:51 myhost kernel: [<ffffffff811725e2>] shrink_node+0xd2/0x2f0
Aug 16 01:14:51 myhost kernel: [<ffffffff811728ef>] do_try_to_free_pages+0xef/0x310
Aug 16 01:14:51 myhost kernel: [<ffffffff81172be5>] try_to_free_pages+0xd5/0x180
Aug 16 01:14:51 myhost kernel: [<ffffffff811632db>] __alloc_pages_slowpath+0x31b/0xb80
...
[78991.546088] systemd-network: page allocation stalls for 287000ms, order:0,
mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
Downgrade and investigate
● System CPU was up, so it must be the kernel upgrade
● Downgrade Stretch to Jessie
● Downgrade Linux 4.9 to 4.4 (known good, but no allocation stall logging)
● Investigate without affecting customers
● Bisection pointed at the OS upgrade; the kernel was not responsible
Make a flamegraph with perf
#!/bin/sh -e
# flamegraph-perf [perf args here] > flamegraph.svg
# Explicitly setting output and input to perf.data is needed to make perf work over ssh without TTY.
perf record -o perf.data "$@"
# Fetch JVM stack maps if possible, this requires -XX:+PreserveFramePointer
export JAVA_HOME=/usr/lib/jvm/oracle-java8-jdk-amd64 AGENT_HOME=/usr/local/perf-map-agent
/usr/local/flamegraph/jmaps 1>&2
IDLE_REGEXPS="^swapper;.*(cpuidle|cpu_idle|cpu_bringup_and_idle|native_safe_halt|xen_hypercall_sched_op|xen_hypercall_vcpu_op)"
perf script -i perf.data | /usr/local/flamegraph/stackcollapse-perf.pl --all | grep -E -v "$IDLE_REGEXPS" | /usr/local/flamegraph/flamegraph.pl --colors=java --hash --title="$(hostname)"
Full system flamegraphs point at sendfile
[Flamegraphs: Jessie vs Stretch, sendfile highlighted]

Enhance
[Zoomed Stretch sendfile flamegraph: spinlocks]
eBPF and BCC tools
Latency of sendfile on Jessie: < 31us
$ sudo /usr/share/bcc/tools/funclatency -uTi 1 do_sendfile
Tracing 1 functions for "do_sendfile"... Hit Ctrl-C to end.
23:27:25
usecs : count distribution
0 -> 1 : 9 | |
2 -> 3 : 47 |**** |
4 -> 7 : 53 |***** |
8 -> 15 : 379 |****************************************|
16 -> 31 : 329 |********************************** |
32 -> 63 : 101 |********** |
64 -> 127 : 23 |** |
128 -> 255 : 50 |***** |
256 -> 511 : 7 | |

Latency of sendfile on Stretch: < 511us
usecs : count distribution
0 -> 1 : 1 | |
2 -> 3 : 20 |*** |
4 -> 7 : 46 |******* |
8 -> 15 : 56 |******** |
16 -> 31 : 65 |********** |
32 -> 63 : 75 |*********** |
64 -> 127 : 75 |*********** |
128 -> 255 : 258 |****************************************|
256 -> 511 : 144 |********************** |
512 -> 1023 : 24 |*** |
1024 -> 2047 : 27 |**** |
2048 -> 4095 : 28 |**** |
4096 -> 8191 : 35 |***** |
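For reference, funclatency itself is a short bcc program: a kprobe records a timestamp on entry, a kretprobe computes the delta on return and feeds a log2 histogram, all in-kernel. A minimal sketch of the same idea for do_sendfile, assuming the bcc Python bindings are installed (the real tool adds pattern matching, PID filtering and interval output):

#!/usr/bin/env python3
# Minimal sketch of what funclatency does for do_sendfile.
from bcc import BPF
from time import sleep

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);
BPF_HISTOGRAM(dist);

int trace_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();   // lower 32 bits: thread id
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;                            /* missed the entry probe */
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    dist.increment(bpf_log2l(delta_us));     // log2 bucket, like funclatency -u
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="do_sendfile", fn_name="trace_entry")
b.attach_kretprobe(event="do_sendfile", fn_name="trace_return")

print("Tracing do_sendfile latency... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")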
Number of mod_timer runs
# Jessie
$ sudo /usr/share/bcc/tools/funccount -T -i 1
mod_timer
Tracing 1 functions for "mod_timer"... Hit Ctrl-C
to end.
00:33:36
FUNC COUNT
mod_timer 60482
00:33:37
FUNC COUNT
mod_timer 58263
00:33:38
FUNC COUNT
mod_timer 54626
# Stretch
$ sudo /usr/share/bcc/tools/funccount -T -i 1
mod_timer
Tracing 1 functions for "mod_timer"... Hit Ctrl-C
to end.
00:33:28
FUNC COUNT
mod_timer 149068
00:33:29
FUNC COUNT
mod_timer 155994
00:33:30
FUNC COUNT
mod_timer 160688
Number of lock_timer_base runs
# Jessie
$ sudo /usr/share/bcc/tools/funccount -T -i 1
lock_timer_base
Tracing 1 functions for "lock_timer_base"... Hit
Ctrl-C to end.
00:32:36
FUNC COUNT
lock_timer_base 15962
00:32:37
FUNC COUNT
lock_timer_base 16261
00:32:38
FUNC COUNT
lock_timer_base 15806
# Stretch
$ sudo /usr/share/bcc/tools/funccount -T -i 1
lock_timer_base
Tracing 1 functions for "lock_timer_base"... Hit
Ctrl-C to end.
00:32:32
FUNC COUNT
lock_timer_base 119189
00:32:33
FUNC COUNT
lock_timer_base 196895
00:32:34
FUNC COUNT
lock_timer_base 140085
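funccount works the same way: it attaches a kprobe to the target function, bumps a counter in a BPF map on every hit, and prints and clears the map each interval. A minimal sketch for a single function, assuming bcc is installed (mod_timer is just the function counted above; the real tool also handles patterns and tracepoints):

#!/usr/bin/env python3
# Minimal sketch of what funccount -T -i 1 does for one kernel function.
from bcc import BPF
from time import sleep, strftime

b = BPF(text="""
BPF_HASH(counts, u32, u64);

int do_count(struct pt_regs *ctx) {
    u32 key = 0;            // single counter; funccount keys by probed function
    counts.increment(key);
    return 0;
}
""")
b.attach_kprobe(event="mod_timer", fn_name="do_count")

print("Counting mod_timer calls, one line per second... Hit Ctrl-C to end.")
while True:
    try:
        sleep(1)
    except KeyboardInterrupt:
        break
    total = sum(v.value for v in b["counts"].values())
    print("%s mod_timer %d" % (strftime("%H:%M:%S"), total))
    b["counts"].clear()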
We can trace timer tracepoints with perf
$ sudo perf list | fgrep timer:
timer:hrtimer_cancel [Tracepoint event]
timer:hrtimer_expire_entry [Tracepoint event]
timer:hrtimer_expire_exit [Tracepoint event]
timer:hrtimer_init [Tracepoint event]
timer:hrtimer_start [Tracepoint event]
timer:itimer_expire [Tracepoint event]
timer:itimer_state [Tracepoint event]
timer:tick_stop [Tracepoint event]
timer:timer_cancel [Tracepoint event]
timer:timer_expire_entry [Tracepoint event]
timer:timer_expire_exit [Tracepoint event]
timer:timer_init [Tracepoint event]
timer:timer_start [Tracepoint event]

Number of timers per function
# Jessie
$ sudo perf record -e timer:timer_start -p 23485 -- sleep 10 && sudo perf script | sed 's/.*function=//g' | awk '{ print $1 }' | sort | uniq -c
[ perf record: Woken up 54 times to write data ]
[ perf record: Captured and wrote 17.778 MB perf.data (173520 samples) ]
2 clocksource_watchdog
5 cursor_timer_handler
2 dev_watchdog
10 garp_join_timer
2 ixgbe_service_timer
4769 tcp_delack_timer
171 tcp_keepalive_timer
168512 tcp_write_timer
# Stretch
$ sudo perf record -e timer:timer_start -p 3416 -- sleep 10 && sudo perf script | sed 's/.*function=//g' | awk '{ print $1 }' | sort | uniq -c
[ perf record: Woken up 671 times to write data ]
[ perf record: Captured and wrote 198.273 MB perf.data (1988650 samples) ]
6 clocksource_watchdog
12 cursor_timer_handler
2 dev_watchdog
18 garp_join_timer
4 ixgbe_service_timer
4622 tcp_delack_timer
1 tcp_keepalive_timer
1983978 tcp_write_timer
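The same breakdown can also be aggregated in-kernel instead of post-processing a ~200MB perf.data with sed/awk/sort/uniq. Below is a sketch using a bcc tracepoint program, assuming bcc is installed and relying on timer:timer_start exposing the callback as args->function (as listed in /sys/kernel/debug/tracing/events/timer/timer_start/format):

#!/usr/bin/env python3
# Sketch: count timer:timer_start events per callback function, in-kernel.
from bcc import BPF
from time import sleep

b = BPF(text="""
BPF_HASH(counts, u64, u64);

TRACEPOINT_PROBE(timer, timer_start) {
    u64 fn = (u64)args->function;   // address of the timer callback
    counts.increment(fn);
    return 0;
}
""")

print("Counting timer_start callbacks for 10 seconds...")
sleep(10)
for fn, cnt in sorted(b["counts"].items(), key=lambda kv: kv[1].value):
    print("%10d %s" % (cnt.value, b.ksym(fn.value).decode()))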
Timer flamegraphs comparison
[Timer flamegraphs: Jessie vs Stretch, tcp_push_one highlighted]
Number of calls for hot functions
# Jessie
$ sudo /usr/share/bcc/tools/funccount -T -i 1
tcp_sendmsg
Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C
to end.
03:33:33
FUNC COUNT
tcp_sendmsg 21166
$ sudo /usr/share/bcc/tools/funccount -T -i 1
tcp_push_one
Tracing 1 functions for "tcp_push_one"... Hit Ctrl-C to end.
03:37:14
FUNC COUNT
tcp_push_one 496
# Stretch
$ sudo /usr/share/bcc/tools/funccount -T -i 1
tcp_sendmsg
Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C
to end.
03:33:30
FUNC COUNT
tcp_sendmsg 53834
$ sudo /usr/share/bcc/tools/funccount -T -i 1
tcp_push_one
Tracing 1 functions for "tcp_push_one"... Hit Ctrl-C to end.
03:37:10
FUNC COUNT
tcp_push_one 64483
Count stacks leading to tcp_push_one
$ sudo stackcount -i 10 tcp_push_one
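stackcount aggregates the full kernel stack leading to the probed function in-kernel, so only unique stacks and their counts cross into userspace. A minimal sketch of that core idea, assuming bcc is installed (the real tool also supports user stacks, PID filtering and nicer output):

#!/usr/bin/env python3
# Sketch of the core of stackcount: aggregate kernel stacks hitting tcp_push_one.
from bcc import BPF
from time import sleep

b = BPF(text="""
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, int, u64);
BPF_STACK_TRACE(stacks, 16384);

int on_call(struct pt_regs *ctx) {
    int stack_id = stacks.get_stackid(ctx, 0);   // kernel stack
    if (stack_id >= 0)
        counts.increment(stack_id);
    return 0;
}
""")
b.attach_kprobe(event="tcp_push_one", fn_name="on_call")

sleep(10)
stacks = b["stacks"]
for stack_id, count in sorted(b["counts"].items(), key=lambda kv: kv[1].value):
    for addr in stacks.walk(stack_id.value):
        print("  %s" % b.ksym(addr).decode())
    print("    %d\n" % count.value)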

Stacks for tcp_push_one (stackcount)
tcp_push_one
inet_sendpage
kernel_sendpage
sock_sendpage
pipe_to_sendpage
__splice_from_pipe
splice_from_pipe
generic_splice_sendpage
direct_splice_actor
splice_direct_to_actor
do_splice_direct
do_sendfile
sys_sendfile64
do_syscall_64
return_from_SYSCALL_64
4950
tcp_push_one
inet_sendmsg
sock_sendmsg
kernel_sendmsg
sock_no_sendpage
tcp_sendpage
inet_sendpage
kernel_sendpage
sock_sendpage
pipe_to_sendpage
__splice_from_pipe
splice_from_pipe
generic_splice_sendpage
...
return_from_SYSCALL_64
735110
Diff of the most popular stack
--- jessie.txt 2017-08-16 21:14:13.000000000 -0700
+++ stretch.txt 2017-08-16 21:14:20.000000000 -0700
@@ -1,4 +1,9 @@
tcp_push_one
+inet_sendmsg
+sock_sendmsg
+kernel_sendmsg
+sock_no_sendpage
+tcp_sendpage
inet_sendpage
kernel_sendpage
sock_sendpage
Let’s look at tcp_sendpage
int tcp_sendpage(struct sock *sk, struct page *page, int offset,
                 size_t size, int flags)
{
        ssize_t res;

        if (!(sk->sk_route_caps & NETIF_F_SG) ||
            !sk_check_csum_caps(sk))
                return sock_no_sendpage(sk->sk_socket, page, offset, size,
                                        flags);

        lock_sock(sk);

        tcp_rate_check_app_limited(sk); /* is sending application-limited? */

        res = do_tcp_sendpages(sk, page, offset, size, flags);
        release_sock(sk);

        return res;
}
sock_no_sendpage is what we see on the stack; the NETIF_F_SG check is the scatter-gather / segmentation offload capability that decides which path is taken.
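A quick way to confirm at runtime that sends really take the sock_no_sendpage fallback is to put a kprobe on that function and watch it fire. A minimal sketch with bcc, assuming bcc is installed (it uses the shared trace pipe, so it is only suitable for a short ad-hoc check):

#!/usr/bin/env python3
# Sketch: print a line every time the sock_no_sendpage fallback is hit.
from bcc import BPF

b = BPF(text="""
int on_fallback(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;   // upper 32 bits: process id
    bpf_trace_printk("sock_no_sendpage hit, pid %d\\n", pid);
    return 0;
}
""")
b.attach_kprobe(event="sock_no_sendpage", fn_name="on_fallback")
b.trace_print()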
Cloudflare network setup
eth2 -->|              |--> vlan10
        |---> bond0 -->|
eth3 -->|              |--> vlan100

Missing offload settings
eth2 -->|              |--> vlan10
        |---> bond0 -->|
eth3 -->|              |--> vlan100
Compare ethtool -k settings on vlan10
-tx-checksumming: off
+tx-checksumming: on
- tx-checksum-ip-generic: off
+ tx-checksum-ip-generic: on
-scatter-gather: off
- tx-scatter-gather: off
+scatter-gather: on
+ tx-scatter-gather: on
-tcp-segmentation-offload: off
- tx-tcp-segmentation: off [requested on]
- tx-tcp-ecn-segmentation: off [requested on]
- tx-tcp-mangleid-segmentation: off [requested on]
- tx-tcp6-segmentation: off [requested on]
-udp-fragmentation-offload: off [requested on]
-generic-segmentation-offload: off [requested on]
+tcp-segmentation-offload: on
+ tx-tcp-segmentation: on
+ tx-tcp-ecn-segmentation: on
+ tx-tcp-mangleid-segmentation: on
+ tx-tcp6-segmentation: on
+udp-fragmentation-offload: on
+generic-segmentation-offload: on
Ha! Easy fix, let’s just enable it:
$ sudo ethtool -K vlan10 sg on
Actual changes:
tx-checksumming: on
tx-checksum-ip-generic: on
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: on
R in SRE stands for Reboot
Kafka restarted

It was a bug in systemd all along
[Graph: logs cluster effect, annotated with Stretch upgrade and offload fix]
[Graph: DNS cluster effect, annotated with Stretch upgrade and offload fix]
Lessons learned
● It’s important to pay close attention to seemingly unrelated metrics
● The Linux kernel can be easily traced with perf and bcc tools
○ Tools work out of the box
○ You don’t have to be a developer
● TCP offload is incredibly important and applies to vlan interfaces
● Switching OS on reboot proved to be useful

But really it was just an excuse
● Internal blog post about this is from Aug 2017
● External blog post in Cloudflare blog is from May 2018
● All to show where ebpf_exporter can be useful
○ Our tool to export hidden kernel metrics with eBPF
○ Can trace any kernel function and hardware counters
○ IO latency histograms, timer counters, TCP retransmits, etc.
○ Exports data in Prometheus (OpenMetrics) format
Can be nicely visualized with new Grafana
Disk upgrade in production
Thank you
● Blog post this talk is based on
● Github for ebpf_exporter: https://github.com/cloudflare/ebpf_exporter
● Slides for ebpf_exporter talk with presenter notes (and a blog post)
○ Disclaimer: contains statistical dinosaur gifs
● Training on ebpf_exporter with Alexander Huynh
○ Look for “Hidden Linux Metrics with Prometheus eBPF Exporter”
○ Wednesday, Oct 31st, 11:45 - 12:30, Cumberland room 3-4
● We’re hiring
Ivan on twitter: @ibobrik

 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
 
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
 

Debugging Linux issues with eBPF

  • 1. Debugging Linux issues with eBPF One incident from start to finish with dynamic tracing applied
  • 3. What does Cloudflare do CDN Moving content physically closer to visitors with our CDN. Intelligent caching Unlimited DDOS mitigation Unlimited bandwidth at flat pricing with free plans Edge access control IPFS gateway Onion service Website Optimization Making web fast and up to date for everyone. TLS 1.3 (with 0-RTT) HTTP/2 + QUIC Server push AMP Origin load-balancing Smart routing Serverless / Edge Workers Post quantum crypto DNS Cloudflare is the fastest managed DNS providers in the world. 1.1.1.1 2606:4700:4700::1111 DNS over TLS
  • 4. 160+ Data centers globally 4.5M+ DNS requests/s across authoritative, recursive and internal 10% Internet requests everyday 10M+ HTTP requests/second Websites, apps & APIs in 150 countries 10M+ Cloudflare’s anycast network Network capacity 20Tbps
  • 5. 350B+ DNS requests/day across authoritative, recursive and internal 800B+ HTTP requests/day Cloudflare’s anycast network (daily ironic numbers) Network capacity 1.73Ebpd
  • 6. Link to slides with speaker notes Slideshare doesn’t allow links on the first 3 slides
  • 7. Cloudflare is a Debian shop ● All machines were running Debian Jessie on bare metal ● OS boots over PXE into memory, packages and configs are ephemeral ● Kernel can be swapped as easy as OS ● New Stable (stretch) came out, we wanted to keep up ● Very easy to upgrade: ○ Build all packages for both distributions ○ Upgrade machines in groups, look at metrics, fix issues, repeat ○ Gradually phase out Jessie ○ Pop a bottle of champagne and celebrate
  • 8. Cloudflare core Kafka platform at the time ● Kafka is a distributed log with multiple producers and consumers ● 3 clusters: 2 small (dns + logs) with 9 nodes, 1 big (http) with 106 nodes ● 2 x 10C Intel Xeon E5-2630 v4 @ 2.2GHz (40 logical CPUs), 128GB RAM ● 12 x 800GB SSD in RAID0 ● 2 x 10G bonded NIC ● Mostly network bound at ~100Gbps ingress and ~700Gbps egress ● Check out our blog post on Kafka compression ● We also blogged about our Gen 9 edge machines recently
  • 9. Small clusters went ok, big one did not One node upgraded to Stretch
  • 10. Perf to the rescue: “perf top -F 99”
  • 11. RCU stalls in dmesg [ 4923.462841] INFO: rcu_sched self-detected stall on CPU [ 4923.462843] 13-...: (2 GPs behind) idle=ea7/140000000000001/0 softirq=1/2 fqs=4198 [ 4923.462845] (t=8403 jiffies g=110722 c=110721 q=6440)
  • 12. Error logging issues Aug 15 21:51:35 myhost kernel: INFO: rcu_sched detected stalls on CPUs/tasks: Aug 15 21:51:35 myhost kernel: 26-...: (1881 ticks this GP) idle=76f/140000000000000/0 softirq=8/8 fqs=365 Aug 15 21:51:35 myhost kernel: (detected by 0, t=2102 jiffies, g=1837293, c=1837292, q=262) Aug 15 21:51:35 myhost kernel: Task dump for CPU 26: Aug 15 21:51:35 myhost kernel: java R running task 13488 1714 1513 0x00080188 Aug 15 21:51:35 myhost kernel: ffffc9000d1f7898 ffffffff814ee977 ffff88103f410400 000000000000000a Aug 15 21:51:35 myhost kernel: 0000000000000041 ffffffff82203142 ffffc9000d1f78c0 ffffffff814eea10 Aug 15 21:51:35 myhost kernel: 0000000000000041 ffffffff82203142 ffff88103f410400 ffffc9000d1f7920 Aug 15 21:51:35 myhost kernel: Call Trace: Aug 15 21:51:35 myhost kernel: [<ffffffff814ee977>] ? scrup+0x147/0x160 Aug 15 21:51:35 myhost kernel: [<ffffffff814eea10>] ? lf+0x80/0x90 Aug 15 21:51:35 myhost kernel: [<ffffffff814eecb5>] ? vt_console_print+0x295/0x3c0
  • 13. Page allocation failures Aug 16 01:14:51 myhost systemd-journald[13812]: Missed 17171 kernel messages Aug 16 01:14:51 myhost kernel: [<ffffffff81171754>] shrink_inactive_list+0x1f4/0x4f0 Aug 16 01:14:51 myhost kernel: [<ffffffff8117234b>] shrink_node_memcg+0x5bb/0x780 Aug 16 01:14:51 myhost kernel: [<ffffffff811725e2>] shrink_node+0xd2/0x2f0 Aug 16 01:14:51 myhost kernel: [<ffffffff811728ef>] do_try_to_free_pages+0xef/0x310 Aug 16 01:14:51 myhost kernel: [<ffffffff81172be5>] try_to_free_pages+0xd5/0x180 Aug 16 01:14:51 myhost kernel: [<ffffffff811632db>] __alloc_pages_slowpath+0x31b/0xb80 ... [78991.546088] systemd-network: page allocation stalls for 287000ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
  • 14. Downgrade and investigate ● System CPU was up, so it must be the kernel upgrade ● Downgrade Stretch to Jessie ● Downgrade Linux 4.9 to 4.4 (known good, but no allocation stall logging) ● Investigate without affecting customers ● Bisection pointed at OS upgrade, kernel was not responsible
  • 15. Make a flamegraph with perf
#!/bin/sh -e
# flamegraph-perf [perf args here] > flamegraph.svg
# Explicitly setting output and input to perf.data is needed to make perf work over ssh without TTY.
perf record -o perf.data "$@"
# Fetch JVM stack maps if possible, this requires -XX:+PreserveFramePointer
export JAVA_HOME=/usr/lib/jvm/oracle-java8-jdk-amd64 AGENT_HOME=/usr/local/perf-map-agent
/usr/local/flamegraph/jmaps 1>&2
IDLE_REGEXPS="^swapper;.*(cpuidle|cpu_idle|cpu_bringup_and_idle|native_safe_halt|xen_hypercall_sched_op|xen_hypercall_vcpu_op)"
perf script -i perf.data | /usr/local/flamegraph/stackcollapse-perf.pl --all | grep -E -v "$IDLE_REGEXPS" | /usr/local/flamegraph/flamegraph.pl --colors=java --hash --title=$(hostname)
  • 16. Full system flamegraphs point at sendfile (flamegraph screenshots: Jessie vs Stretch, sendfile highlighted)
  • 19. eBPF and BCC tools
  • 20. Latency of sendfile on Jessie: < 31us $ sudo /usr/share/bcc/tools/funclatency -uTi 1 do_sendfile Tracing 1 functions for "do_sendfile"... Hit Ctrl-C to end. 23:27:25 usecs : count distribution 0 -> 1 : 9 | | 2 -> 3 : 47 |**** | 4 -> 7 : 53 |***** | 8 -> 15 : 379 |****************************************| 16 -> 31 : 329 |********************************** | 32 -> 63 : 101 |********** | 64 -> 127 : 23 |** | 128 -> 255 : 50 |***** | 256 -> 511 : 7 | |
  • 21. Latency of sendfile on Stretch: < 511us usecs : count distribution 0 -> 1 : 1 | | 2 -> 3 : 20 |*** | 4 -> 7 : 46 |******* | 8 -> 15 : 56 |******** | 16 -> 31 : 65 |********** | 32 -> 63 : 75 |*********** | 64 -> 127 : 75 |*********** | 128 -> 255 : 258 |****************************************| 256 -> 511 : 144 |********************** | 512 -> 1023 : 24 |*** | 1024 -> 2047 : 27 |**** | 2048 -> 4095 : 28 |**** | 4096 -> 8191 : 35 |***** |
  • 22. Number of mod_timer runs # Jessie $ sudo /usr/share/bcc/tools/funccount -T -i 1 mod_timer Tracing 1 functions for "mod_timer"... Hit Ctrl-C to end. 00:33:36 FUNC COUNT mod_timer 60482 00:33:37 FUNC COUNT mod_timer 58263 00:33:38 FUNC COUNT mod_timer 54626 # Stretch $ sudo /usr/share/bcc/tools/funccount -T -i 1 mod_timer Tracing 1 functions for "mod_timer"... Hit Ctrl-C to end. 00:33:28 FUNC COUNT mod_timer 149068 00:33:29 FUNC COUNT mod_timer 155994 00:33:30 FUNC COUNT mod_timer 160688
  • 23. Number of lock_timer_base runs # Jessie $ sudo /usr/share/bcc/tools/funccount -T -i 1 lock_timer_base Tracing 1 functions for "lock_timer_base"... Hit Ctrl-C to end. 00:32:36 FUNC COUNT lock_timer_base 15962 00:32:37 FUNC COUNT lock_timer_base 16261 00:32:38 FUNC COUNT lock_timer_base 15806 # Stretch $ sudo /usr/share/bcc/tools/funccount -T -i 1 lock_timer_base Tracing 1 functions for "lock_timer_base"... Hit Ctrl-C to end. 00:32:32 FUNC COUNT lock_timer_base 119189 00:32:33 FUNC COUNT lock_timer_base 196895 00:32:34 FUNC COUNT lock_timer_base 140085
  • 24. We can trace timer tracepoints with perf $ sudo perf list | fgrep timer: timer:hrtimer_cancel [Tracepoint event] timer:hrtimer_expire_entry [Tracepoint event] timer:hrtimer_expire_exit [Tracepoint event] timer:hrtimer_init [Tracepoint event] timer:hrtimer_start [Tracepoint event] timer:itimer_expire [Tracepoint event] timer:itimer_state [Tracepoint event] timer:tick_stop [Tracepoint event] timer:timer_cancel [Tracepoint event] timer:timer_expire_entry [Tracepoint event] timer:timer_expire_exit [Tracepoint event] timer:timer_init [Tracepoint event] timer:timer_start [Tracepoint event]
  • 25. Number of timers per function # Jessie $ sudo perf record -e timer:timer_start -p 23485 -- sleep 10 && sudo perf script | sed 's/.* function=//g' | awk '{ print $1 }' | sort | uniq -c [ perf record: Woken up 54 times to write data ] [ perf record: Captured and wrote 17.778 MB perf.data (173520 samples) ] 2 clocksource_watchdog 5 cursor_timer_handler 2 dev_watchdog 10 garp_join_timer 2 ixgbe_service_timer 4769 tcp_delack_timer 171 tcp_keepalive_timer 168512 tcp_write_timer # Stretch $ sudo perf record -e timer:timer_start -p 3416 -- sleep 10 && sudo perf script | sed 's/.* function=//g' | awk '{ print $1 }' | sort | uniq -c [ perf record: Woken up 671 times to write data ] [ perf record: Captured and wrote 198.273 MB perf.data (1988650 samples) ] 6 clocksource_watchdog 12 cursor_timer_handler 2 dev_watchdog 18 garp_join_timer 4 ixgbe_service_timer 4622 tcp_delack_timer 1 tcp_keepalive_timer 1983978 tcp_write_timer
  • 27. Number of calls for hot functions # Jessie $ sudo /usr/share/bcc/tools/funccount -T -i 1 tcp_sendmsg Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end. 03:33:33 FUNC COUNT tcp_sendmsg 21166 $ sudo /usr/share/bcc/tools/funccount -T -i 1 tcp_push_one Tracing 1 functions for "tcp_push_one"... Hit Ctrl- C to end. 03:37:14 FUNC COUNT tcp_push_one 496 # Stretch $ sudo /usr/share/bcc/tools/funccount -T -i 1 tcp_sendmsg Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end. 03:33:30 FUNC COUNT tcp_sendmsg 53834 $ sudo /usr/share/bcc/tools/funccount -T -i 1 tcp_push_one Tracing 1 functions for "tcp_push_one"... Hit Ctrl- C to end. 03:37:10 FUNC COUNT tcp_push_one 64483
  • 28. Count stacks leading to tcp_push_one $ sudo stackcount -i 10 tcp_push_one
  • 29. Stacks for tcp_push_one (stackcount) tcp_push_one inet_sendpage kernel_sendpage sock_sendpage pipe_to_sendpage __splice_from_pipe splice_from_pipe generic_splice_sendpage direct_splice_actor splice_direct_to_actor do_splice_direct do_sendfile sys_sendfile64 do_syscall_64 return_from_SYSCALL_64 4950 tcp_push_one inet_sendmsg sock_sendmsg kernel_sendmsg sock_no_sendpage tcp_sendpage inet_sendpage kernel_sendpage sock_sendpage pipe_to_sendpage __splice_from_pipe splice_from_pipe generic_splice_sendpage ... return_from_SYSCALL_64 735110
  • 30. Diff of the most popular stack --- jessie.txt 2017-08-16 21:14:13.000000000 -0700 +++ stretch.txt 2017-08-16 21:14:20.000000000 -0700 @@ -1,4 +1,9 @@ tcp_push_one +inet_sendmsg +sock_sendmsg +kernel_sendmsg +sock_no_sendpage +tcp_sendpage inet_sendpage kernel_sendpage sock_sendpage
  • 31. Let’s look at tcp_sendpage int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags) { ssize_t res; if (!(sk->sk_route_caps & NETIF_F_SG) || !sk_check_csum_caps(sk)) return sock_no_sendpage(sk->sk_socket, page, offset, size, flags); lock_sock(sk); tcp_rate_check_app_limited(sk); /* is sending application-limited? */ res = do_tcp_sendpages(sk, page, offset, size, flags); release_sock(sk); return res; } what we see on the stack segmentation offload
  • 32. Cloudflare network setup eth2 -->| |--> vlan10 |---> bond0 -->| eth3 -->| |--> vlan100
  • 33. Missing offload settings eth2 -->| |--> vlan10 |---> bond0 -->| eth3 -->| |--> vlan100
  • 34. Compare ethtool -k settings on vlan10 -tx-checksumming: off +tx-checksumming: on - tx-checksum-ip-generic: off + tx-checksum-ip-generic: on -scatter-gather: off - tx-scatter-gather: off +scatter-gather: on + tx-scatter-gather: on -tcp-segmentation-offload: off - tx-tcp-segmentation: off [requested on] - tx-tcp-ecn-segmentation: off [requested on] - tx-tcp-mangleid-segmentation: off [requested on] - tx-tcp6-segmentation: off [requested on] -udp-fragmentation-offload: off [requested on] -generic-segmentation-offload: off [requested on] +tcp-segmentation-offload: on + tx-tcp-segmentation: on + tx-tcp-ecn-segmentation: on + tx-tcp-mangleid-segmentation: on + tx-tcp6-segmentation: on +udp-fragmentation-offload: on +generic-segmentation-offload: on
  • 35. Ha! Easy fix, let’s just enable it: $ sudo ethtool -K vlan10 sg on Actual changes: tx-checksumming: on tx-checksum-ip-generic: on tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: on tx-tcp-mangleid-segmentation: on tx-tcp6-segmentation: on udp-fragmentation-offload: on
  • 36. R in SRE stands for Reboot Kafka restarted
  • 37. It was a bug in systemd all along
  • 38. Logs cluster effect Stretch upgrade Offload fixed
  • 39. DNS cluster effect Stretch upgrade Offload fixed
  • 40. Lessons learned ● It’s important to pay closer attention to seemingly unrelated metrics ● The Linux kernel can be easily traced with perf and bcc tools ○ Tools work out of the box ○ You don’t have to be a developer ● TCP offload is incredibly important and applies to vlan interfaces ● Switching OS on reboot proved to be useful
  • 41. But really it was just an excuse ● Internal blog post about this is from Aug 2017 ● External blog post in Cloudflare blog is from May 2018 ● All to show where ebpf_exporter can be useful ○ Our tool to export hidden kernel metrics with eBPF ○ Can trace any kernel function and hardware counters ○ IO latency histograms, timer counters, TCP retransmits, etc. ○ Exports data in Prometheus (OpenMetrics) format
  • 42. Can be nicely visualized with new Grafana Disk upgrade in production
  • 43. Thank you ● Blog post this talk is based on ● Github for ebpf_exporter: https://github.com/cloudflare/ebpf_exporter ● Slides for ebpf_exporter talk with presenter notes (and a blog post) ○ Disclaimer: contains statistical dinosaur gifs ● Training on ebpf_exporter with Alexander Huynh ○ Look for “Hidden Linux Metrics with Prometheus eBPF Exporter” ○ Wednesday, Oct 31st, 11:45 - 12:30, Cumberland room 3-4 ● We’re hiring Ivan on twitter: @ibobrik

Editor's Notes

  1. Hello, Today we’re going to go through one production issue from start to finish and see how we can apply dynamic tracing to get to the bottom of the problem.
  2. My name is Ivan and I work for a company called Cloudflare, where I focus on performance and efficiency of our products.
  3. To give you some context, these are some of the key areas Cloudflare specializes in. In addition to being a good old CDN service with free unlimited DDOS protection, we try to be at the forefront of innovation with technologies like TLS v1.3, QUIC and edge workers, making the internet faster and more secure for end users and website owners. We’re also the fastest authoritative and recursive DNS provider. Our resolver 1.1.1.1 is privacy-oriented and supports things like DNS over TLS, stopping intermediaries from seeing your DNS requests, not to mention DNSSEC. If you have a website of any size, you should totally put it behind Cloudflare.
  4. Here are some numbers to give you an idea of the scale we operate on. We have 160 datacenters around the world and plan to grow to at least 200 next year. At peak these datacenters process more than 10 million HTTP requests per second. At the same time the very same datacenters serve 4.5 million DNS requests per second across internal and external DNS. That’s a lot of data to analyze and we collect logs into core datacenters for processing and analytics.
  5. I often get frustrated when people show numbers that are not scaled to seconds. I figured I cannot beat them, so I may as well join them. Here you see numbers per day. My favorite one is network capacity, which is 1.73 exabytes per day. As you can see, these numbers make no sense. It gets even weirder when different metrics are scaled to different time units. Please don’t use this as a reference; always scale down to seconds.
  6. Now, to set the scene for this talk specifically, it makes sense to say a little about our hardware and software stack. All machines serving traffic and doing backend analytics are bare metal servers running Debian; at that point in time we were running Jessie. We’re big fans of ephemeral setups and not a single machine has an OS installed on persistent storage. Instead, we boot a minimal immutable initramfs over the network and install all packages and configs on top of that into ramfs with a configuration management system. This means that every machine comes up clean, and the OS and kernel can be swapped with just a reboot. The story starts with my personal desire to update Debian to the latest stable release, which was Stretch at that time. Our plan for this upgrade was quite simple because of our setup: build all necessary packages for both distributions, switch a group of machines to Stretch, fix what’s broken and carry on to the next group. No need to wipe disks, reinstall anything or deal with dependency issues. We also only needed to build one OS image, as opposed to one image per workload. On the edge every machine is the same, so that part was trivial. In core datacenters, where backend out-of-band processing happens, we have different machines doing different workloads, which means a more diverse set of metrics to look at, but we can also switch some groups completely faster.
  7. One of such groups was a set of our Kafka clusters. If you’re not familiar with Kafka, it’s basically a distributed log system. Multiple producers append messages to topics and then multiple consumers read those logs. For the most part we’re using it as a queue with a large on-disk buffer that buys us time to fix issues in consumers without losing data. We have three major clusters: DNS and Logs are small with just 9 nodes each, and HTTP is massive with 106 nodes. You can see the specs for the HTTP cluster at that time on the slides: 128GB of RAM and two Broadwell Xeon CPUs in a NUMA setup with 40 logical CPUs. We opted for 12 SSDs in RAID0 to prevent IO thrashing when consumers fall out of the page cache. Disk-level redundancy is absent in favor of larger usable disk space and higher throughput; we rely on 3x replication instead. In terms of network we had 2x10G NICs in a bonded setup for maximum network throughput. It was not intended to provide any redundancy. We used to have a lot of issues with being network bound, but in the end that was solved by aggressive compression with zstd. Funnily enough, we later opted for 2x25G NICs, just because they are cheaper, even though we are not network bound anymore. Check out our blog post about Kafka compression or a recent one about Gen 9 edge servers if you want to learn more.
  8. So we did our upgrade on the small Kafka clusters and it went pretty well, at least nobody said anything and user-facing metrics looked good. If you were listening to talks yesterday, that’s what apparently should be alerted on, so no alerts fired. On the big HTTP cluster, however, we started seeing issues with consumers timing out and lagging, so we looked closer at the metrics we had. And this is what we saw: one upgraded node was using a lot more CPU than before, 5x more in fact. By itself this is not that big of an issue, you can see that we’re not stressing out CPUs that much. Typical Kafka CPU usage before this upgrade was around 3 logical CPUs out of 40, which leaves a lot of room. Still, having 5x CPU usage was definitely an unexpected outcome. For control datapoints, we compared the problematic machine to another machine where no upgrade happened, and an intermediary node that received a full software stack upgrade on reboot, but not an OS upgrade, which we optimistically bundled with a minor kernel upgrade. Neither of these two nodes experienced the same CPU saturation issues, even though their setups were practically identical.
  9. For debugging CPU saturation issues, we depend on the Linux perf command to find the cause. It’s included with the kernel and on end user distributions you can install it with a package like linux-base or something. The first question that comes to mind when we see CPU saturation issues is what is using the CPU. In tools like top we can see what processes occupy CPU, but with perf you can see which functions inside these processes sit on CPU the most. This covers kernel and user space for well behaved programs that have a way to decode stacks. That includes C/C++ with frame pointers and Go. Here you can see top-like output from perf with the most expensive functions in terms of CPU time. Sorting is a bit confusing, because it sorts by inclusive time, but we’re mostly interested in the “self” column, which shows how often the very tip of the stack is on CPU. In this case most of the time is taken by some spinlock slowpath. Spinlocks in the kernel exist to protect critical sections from concurrent access. There are two reasons to use them: * Critical section is small and is not contended * Lock owner cannot sleep (like interrupts cannot do that) If a spinlock cannot be acquired, the caller burns CPU until it can get hold of the lock. While it may sound like a questionable idea at first, there are legitimate uses for this mechanism. In our situation it seems like the spinlock is heavily contended and half of the CPU cycles are not doing useful work. We can’t tell which lock is causing this from this output, however. There were also other symptoms, so let’s look at them first.
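For reference, the kind of whole-system profile described above can be collected with stock perf; this is only a sketch, and the 99 Hz sampling rate and 30-second window are arbitrary choices:
# Sample on-CPU stacks across all CPUs at 99 Hz for 30 seconds
sudo perf record -a -g -F 99 -- sleep 30
# Summarize which functions sat on CPU the most (self overhead only)
sudo perf report --no-children --stdio --sort symbol | head -n 40
# Or watch it live, as on the next slide
sudo perf top -F 99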
  10. If anything bad happens in production, it’s always a good idea to have a look at dmesg. Messages there can be cryptic, but they can at least point you in the right direction. Fixing an issue is 95% knowing where to find the issue. In that particular case we saw RCU stalls, where RCU stands for read-copy-update. I’m not exactly an expert in this, but it sounds like another synchronization mechanism and it can be affected by spinlocks we saw before. We've seen rare RCU stalls before, and our (suboptimal) solution was to reboot the machine if no other issues can be found. 99% of the time reboot fixed the issue for a long time. However, one can only handle so many reboots before the problem becomes severe enough to warrant a deep dive. In this case we had other clues.
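If you want to quickly check a box for these symptoms, the kernel log can be filtered for the messages shown on the next slides; a minimal sketch:
# Kernel log with human-readable timestamps, filtered to RCU and allocation stalls
sudo dmesg -T | grep -E 'rcu_sched|page allocation stalls' | tail -n 20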
  11. While looking deeper into dmesg, we noticed issues around writing messages to the console. This suggested that we were logging too many errors, and the actual failure may be earlier in the process. Armed with this knowledge, we looked at the very beginning of the message chain.
  12. And this is what we saw. If you work with NUMA machines, you may immediately see “shrink_node” and have a minor PTSD episode. What you should be looking at is the number of missed kernel messages. There were so many errors, journald wasn’t able to keep up. We have console access to work around that, and that’s where we saw page allocation stalls in the second log excerpt. You don't want your page allocations to stall for 5 minutes, especially when it's an order-zero allocation, which is the smallest allocation of one 4 KiB page.
  13. Compared to our control nodes, the only two possible explanations were: a minor kernel upgrade, and the switch from Debian Jessie to Debian Stretch. We suspected the former, since system CPU usage implies a kernel issue. Just to be safe, we both rolled the kernel back from 4.9 to a known good 4.4 and downgraded the affected nodes back to Debian Jessie. This was a reasonable compromise, since we needed to minimize downtime on production nodes. Then we proceeded to look into the issue in isolation. To our surprise, after some bisecting we found that the OS upgrade alone was responsible for our issues; the kernel was off the hook. Now all that remained was to find out what exactly was going on.
  14. Flamegraphs are a great way to visualize stacks that cause CPU usage in the system. We have a wrapper around Brendan Gregg’s flamegraph scripts that removes idle time and enables JVM stacks out of the box. This gives us a way to get an overview of CPU usage in one command.
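Assuming the script above is installed as flamegraph-perf (the name in its usage comment), a typical invocation might look like this; everything after the script name is passed straight to perf record:
# Whole-system flamegraph: all CPUs, call graphs, 99 Hz, 60 seconds
sudo flamegraph-perf -a -g -F 99 -- sleep 60 > flamegraph.svg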
  15. And this is what full-system flamegraphs look like. We have Jessie in the background on the left and Stretch in the foreground on the right. This may be hard to see, but the idea is that each bar is a stack frame and its width corresponds to the frequency of that stack’s appearance, which is a proxy for CPU usage. You can see a fat column of frames on the left on Stretch that’s not present on Jessie. We can see it’s the sendfile syscall and it’s highlighted in purple. It’s also present and highlighted on Jessie, but it’s tiny and quite hard to see. Flamegraphs allow you to click on a frame, which will zoom into stacks containing that frame, generating a sort of sub-flamegraph.
  16. So let’s click on sendfile on Stretch and see what’s going on.
  17. This is what we saw. For somebody who’s not a kernel developer this just looks like a bunch of TCP stuff, which is exactly what I saw. Some colleagues suggested that the differences in the graphs may be due to TCP offload being disabled, but upon checking our NIC settings, we found that the feature flags were identical. You can also see some spinlocks at the tip of the flamegraph, which reinforces our initial findings with perf top. Let’s see what else we can figure out from here.
  18. To find out what’s going on with the system, we’ll be using bcc tools. Linux kernel has a VM that allows us to attach lightweight and safe probes to trace the kernel. eBPF itself is a hot topic and there are talks that explore it in great detail, slides for this talk link to them if you are interested. To clarify, VM here is more like JVM that provides runtime and not like KVM that provides hardware virtualization. You can compile code down to this VM from any language, so don’t look surprised when one day you’ll see javascript running in the kernel. I warned you. For the sake of brevity let’s just say that there’s a collection of readily available utilities that can help you debug various parts of the kernel and underlying hardware. That collection is called BCC tools and we’re going to use some of these to get to the bottom of our issue. On this slide you can see how different subsystems can be traced with different tools.
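If bcc is installed from distribution packages, the ready-made tools live under /usr/share/bcc/tools and can be run as-is; the function pattern and durations below are only illustrations:
# Count calls to kernel functions matching a wildcard, printing totals every second
sudo /usr/share/bcc/tools/funccount -T -i 1 'tcp_send*'
# Block I/O latency histogram: 1-second intervals, printed 5 times
sudo /usr/share/bcc/tools/biolatency 1 5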
  19. To trace latency distributions of sendfile syscalls between Jessie and Stretch, we’re going to use funclatency. It takes a function name and prints exponential latency histogram for the function calls. Here we print latency histogram for do_sendfile, which is sendfile syscall function, in microseconds, every second. You can see that most of the calls on Jessie hover between 8 and 31 microseconds. Is that good or bad? I don’t know, but a good way to find out is to compare against another system.
  20. Now let’s look at what’s going on with Stretch. I had to cut some parts, because the histogram was not fitting on the slide. If on Jessie we saw most of the calls complete in under 31 microseconds, here that number is 511 microseconds, which is a whopping 16x jump in latency.
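To keep the comparison fair, the same funclatency invocation can be run on a Jessie node and a Stretch node for a fixed window; the hostnames below are hypothetical:
# 10-second latency histogram of do_sendfile on each host, in microseconds
for host in kafka-jessie kafka-stretch; do
  echo "== $host"
  ssh "$host" sudo /usr/share/bcc/tools/funclatency -u -d 10 do_sendfile
done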
  21. In the flamegraphs, you can see timers being set at the tip (the mod_timer function is responsible for that), with these timers taking locks. We can count the number of function calls instead of measuring their latency, and this is where the funccount tool comes in. Feeding mod_timer to it as an argument, we can see how many calls there were every second. Here we have Jessie on the left and Stretch on the right. On Stretch we installed 3x more timers than on Jessie. That’s not a 16x difference, but still something.
  22. If we look at the number of locks taken for these timers by running funccount on lock_timer_base function, we can see an even bigger difference, around 10x this time. To sum up: on Stretch we installed 3x more timers, resulting in 10x the amount of contention. It definitely seems like we’re onto something.
  23. We can look at the kernel source code to figure out which timers are being scheduled based on the flamegraph, but that seems like a tedious task. Instead, we can use perf tool again to gather some stats on this for us. There’s a bunch of tracepoints in the kernel that provide insight into timer subsystem. We’re going to use timer_start for our needs.
  24. Here we record all timers started during 10 seconds and then print the function names they were going to trigger, with respective counts. On Stretch we install 12x more tcp_write_timer timers; that sounds like something that can cause issues. Remember: we are on a bandwidth-bound workload where the interface is 20G, so that’s a lot of bytes to move.
  25. Taking specific flamegraphs of the timers revealed the differences in their operation. It’s probably hard to see, but tcp_push_one really stands out on Stretch. Let’s dig in.
  26. The traces showed huge variations of tcp_sendmsg and tcp_push_one within sendfile, which is expected from the flamegraphs before.
  27. To further introspect, we leveraged a kernel feature available since 4.9: the ability to count and aggregate stacks in the kernel. BCC tools include the stackcount tool that does exactly that, so let’s take advantage of it.
  28. The most popular Jessie stack is on the left and the most popular Stretch stack is on the right. There were a few much less popular stacks too, but there’s only so much one can fit on the slides. The Stretch stack was too long; “…” stands for the same frames as the highlighted section of the Jessie stack. The two stacks are mostly the same and it’s not exactly fun to spot the difference by eye, so let’s just look at the diff on the next slide.
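The diff on the next slide can be produced by simply redirecting the stackcount output on each distribution into a file and comparing the two captures; the jessie.txt and stretch.txt names match the diff header:
# On each host: aggregate stacks leading to tcp_push_one, printed every 10 seconds (Ctrl-C to stop)
sudo /usr/share/bcc/tools/stackcount -i 10 tcp_push_one > stacks-$(hostname).txt
# Rename the captures to jessie.txt / stretch.txt, then diff the most popular stacks
diff -u jessie.txt stretch.txt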
  29. We see 5 extra functions in the middle of the stack, starting with tcp_sendpage. Time to look at the source code. Usually I just google the function name and it gives me a result to elixir.bootlin.com, where I swap “latest” to my kernel version. Source code there allows you to click on identifiers and jump around the code to navigate.
  30. This is what the tcp_sendpage function looks like; I pasted it verbatim from the kernel source. From tcp_sendpage our stack jumps into sock_no_sendpage. If you look up what NETIF_F_SG means, you’ll find it’s the scatter-gather feature that segmentation offload relies on. Segmentation offload is a technique where the kernel doesn’t split the TCP stream into packets itself, but instead offloads this job to the NIC. This makes a big difference when you want to send large chunks of data over high-speed links. That’s exactly what we are doing, and we definitely want to have offload enabled.
  31. Let’s take a pause and see how we configure network on our machines. Our 2x10G NIC provides eth2 and eth3, which we then bond into bond0 interface. On top of that bond0 we create two vlan interfaces, one for public internet and one for internal network.
  32. It turned out that we had segmentation offload enabled on only a few of our interfaces: eth2, eth3, and bond0. When we checked NIC settings for offload earlier, we only checked the physical interfaces and the bonded one, but ignored the vlan interfaces, where offload was indeed missing.
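This is the check we should have done the first time around: walk every interface, vlans included, instead of stopping at the physical NICs. A minimal sketch:
# Offload status for every interface, including vlan10 and vlan100
for dev in /sys/class/net/*; do
  dev=$(basename "$dev")
  echo "== $dev"
  sudo ethtool -k "$dev" | grep -E 'scatter-gather:|tcp-segmentation-offload:|generic-segmentation-offload:'
done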
  33. We compared ethtool output for vlan interface and there was our issue in plain sight.
  34. We can just enable TCP offload by enabling scatter-gather (which is what “sg” stands for) and be done with it. Easy, right? Imagine our disappointment when this did not work. So much work, a clear indication that this was the cause, and yet the fix did not work.
  35. The last missing piece we found was that offload changes are applied only during connection initiation. We turned Kafka off and back on again to start offloading and immediately saw positive effects, which is the green line. This is not the 5x change I mentioned at the beginning, because we were experimenting on a lightly loaded node to avoid disruptions.
  36. Our network interfaces are managed by systemd-networkd, so it turned out that the missing offload settings were a bug in systemd in the end. It’s not clear whether upstream or Debian patches are responsible for this, however. In the meantime, we work around the upstream issue by enabling offload features automatically on boot if they come up disabled on vlan interfaces.
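The exact workaround isn't shown in the talk, but a boot-time sketch along those lines could look like the following; it assumes our vlanN interface naming, and remember that existing connections only pick the change up once they reconnect:
#!/bin/sh
# Re-enable scatter-gather (and with it TSO/GSO) on vlan interfaces where it came up disabled
for dev in /sys/class/net/vlan*; do
  [ -e "$dev" ] || continue
  dev=$(basename "$dev")
  if ethtool -k "$dev" | grep -q '^scatter-gather: off'; then
    ethtool -K "$dev" sg on
  fi
done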
  37. Having a fix enabled, we rebooted our logs Kafka cluster to upgrade to the latest kernel, and on 5 day CPU usage history you can see clear positive results.
  38. On DNS cluster results were more dramatic because of the higher load. On this screenshot only one node is fixed, but you can see how much better it behaves compared to the rest.
  39. The first lesson here is to pay closer attention to metrics during major upgrades. We did not see major CPU changes on a moderately loaded cluster and did not expect to see any effects on fully loaded machines. In the end we were not upgrading Kafka, which was the main consumer of user CPU, or the kernel, which was consuming system CPU. The second lesson is how useful perf and bcc tools were at pointing us to where the issue was. These tools work out of the box, they are safe and do not require any third-party kernel modules. More importantly, they do not require the operator to be a kernel expert; you just need a basic understanding of the concepts. Another lesson is how important TCP offload is and how its importance grows non-linearly with traffic. It was unexpected that supposedly purely virtual vlan interfaces could be affected by offload, but it turned out they were. Challenge your assumptions often, I guess. Lastly, we used our ability to swap OS and kernels on reboot to the fullest. Having no OS installed on disk meant we never had to reinstall anything and could iterate quickly.
  40. The internal blog post about this incident was published in August 2017; a heavily truncated external blog post went out in May 2018. That external blog post is what this talk is based on. All of it illustrates how the tool we wrote can be used. Where during debugging we were using bcc tools ad hoc to count timers firing in the kernel, we could’ve had a permanent metric for this and noticed the issue sooner just by seeing an increase on a graph. This is what ebpf_exporter allows you to have: you can trace any function in the kernel (and in userspace) at very low overhead and create metrics in Prometheus format from it. For example, you can have a latency histogram for disk IO as a metric, which is not normally possible with procfs or anything else.
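Once ebpf_exporter is running on a host, it is scraped like any other Prometheus exporter; as a quick sanity check you can curl its metrics endpoint. The port below is the registered default for ebpf_exporter (9435) at the time of writing, so treat it as an assumption:
# Peek at the metrics the exporter publishes
curl -s http://localhost:9435/metrics | grep -v '^#' | head -n 20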
  41. Here’s a slide from my presentation about ebpf_exporter, which shows the level of detail you can get. On the left you can see IO wait time from /proc/diskstats, which is what Linux provides, and on the right you can see a heatmap of IO latency, which is what ebpf_exporter enables. With the histograms you can see how many IOs landed in a particular bucket, and things like multimodal distributions become visible. You can also see how many IOs went above some threshold, allowing you to alert on this. The same goes for timers: the kernel does not keep a count of which timers are firing anywhere you could collect it from.
  42. That’s all I had to talk about today. On the slides you have some links on the topic. Slides with speaker notes will be available on the LISA18 website and I’ll also tweet the link. I encourage you to look at my talk on ebpf_exporter itself, which goes into details about why histograms are so great. It involves dinosaur gifs in a very scientific way you probably do not expect, so make sure to check that out. My colleague Alex will be doing a training on ebpf_exporter tomorrow if you want to learn more about that, please come and talk to us. Slides have the information on time and location. If you want to learn more about eBPF itself, you can find Brendan Gregg around and ask him as well as myself.