Content caching is one of the most effective ways to dramatically improve the performance of a web site. In this webinar, we’ll deep-dive into NGINX’s caching abilities and investigate the architecture used, debugging techniques and advanced configuration. By the end of the webinar, you’ll be well equipped to configure NGINX to cache content exactly as you need. View full webinar on demand at http://nginx.com/resources/webinars/content-caching-nginx/
The BPF, or Berkeley Packet Filter, mechanism was first introduced in Linux in 1997 in version 2.1.75. It has seen a number of extensions over the years. Recently, in versions 3.15 - 3.19, it received a major overhaul which drastically expanded its applicability. This talk will cover how the instruction set looks today and why, along with its architecture, capabilities, interface, and just-in-time compilers. We will also talk about how it's being used in different areas of the kernel, like tracing and networking, and about future plans.
USENIX LISA2021 talk by Brendan Gregg (https://www.youtube.com/watch?v=_5Z2AU7QTH4). This talk is a deep dive that describes how BPF (eBPF) works internally on Linux, and dissects some modern performance observability tools. Details covered include the kernel BPF implementation: the verifier, JIT compilation, and the BPF execution environment; the BPF instruction set; different event sources; and how BPF is used by user space, using bpftrace programs as an example. This includes showing how bpftrace is compiled to LLVM IR and then BPF bytecode, and how per-event data and aggregated map data are fetched from the kernel.
In a world where compute is paramount, it is all too easy to overlook the importance of storage and IO in the performance and optimization of Spark jobs.
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
Luca Canali, a data engineer at CERN, presented on performance troubleshooting using Apache Spark metrics at the UnifiedDataAnalytics #SparkAISummit. CERN runs large Hadoop and Spark clusters to process over 300 PB of data from the Large Hadron Collider experiments. Luca discussed how to gather, analyze, and visualize Spark metrics to identify bottlenecks and improve performance.
This document summarizes Chen Yang's presentation on vectorized query execution in Apache Spark at Facebook. The key points are: 1) Spark is the largest SQL query engine at Facebook and uses columnar formats like ORC to improve storage efficiency. 2) Vectorized processing can improve performance over row-at-a-time processing by reducing per-row overhead and improving cache locality. 3) Facebook has implemented a vectorized ORC reader and writer in Spark that shows up to 8x speedup on microbenchmarks compared to the row-at-a-time approach.
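The contrast between row-at-a-time and vectorized processing described above can be sketched in a few lines of Python. This is an illustration of the idea only, not Facebook's or Spark's implementation: the function names, the `price` and `qty` columns, and the batch size are all hypothetical, and pure Python will not reproduce the reported speedups. It only shows how the work is reorganized from per-row dispatch into per-batch loops over columns.

```python
from typing import Dict, List


def total_row_at_a_time(rows: List[Dict[str, float]], min_qty: float) -> float:
    """Filter and aggregate one row at a time (per-row field lookups)."""
    total = 0.0
    for row in rows:
        if row["qty"] >= min_qty:
            total += row["price"] * row["qty"]
    return total


def total_vectorized(
    columns: Dict[str, List[float]], min_qty: float, batch_size: int = 1024
) -> float:
    """Filter and aggregate column batches (hypothetical columnar layout)."""
    prices, qtys = columns["price"], columns["qty"]
    total = 0.0
    for start in range(0, len(qtys), batch_size):
        price_batch = prices[start:start + batch_size]
        qty_batch = qtys[start:start + batch_size]
        # Selection vector: positions in this batch that pass the filter.
        selected = [i for i, q in enumerate(qty_batch) if q >= min_qty]
        # One tight loop per batch over contiguous column data.
        total += sum(price_batch[i] * qty_batch[i] for i in selected)
    return total
```

Both functions compute the same result; the vectorized form touches each column contiguously and pays loop-setup costs once per batch rather than once per row, which is where the reduced per-row overhead and better cache locality come from in a real engine.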
eBPF (extended Berkeley Packet Filters) is a modern kernel technology that can be used to introduce dynamic tracing into a system that wasn't prepared or instrumented in any way. The tracing programs run in the kernel, are guaranteed to never crash or hang your system, and can probe every module and function -- from the kernel to user-space frameworks such as Node and Ruby. In this workshop, you will experiment with Linux dynamic tracing first-hand. First, you will explore BCC, the BPF Compiler Collection, which is a set of tools and libraries for dynamic tracing. Many of your tracing needs will be answered by BCC, and you will experiment with memory leak analysis, generic function tracing, kernel tracepoints, static tracepoints in user-space programs, and the "baked" tools for file I/O, network, and CPU analysis. You'll be able to choose between working on a set of hands-on labs prepared by the instructors, or trying the tools out on your own test system. Next, you will hack on some of the bleeding edge tools in the BCC toolkit, and build a couple of simple tools of your own. You'll be able to pick from a curated list of GitHub issues for the BCC project, a set of hands-on labs with known "school solutions", and an open-ended list of problems that need tools for effective analysis. At the end of this workshop, you will be equipped with a toolbox for diagnosing issues in the field, as well as a framework for building your own tools when the generic ones do not suffice.
This talk discusses Linux profiling using perf_events (also called "perf") based on Netflix's use of it. It covers how to use perf to get CPU profiling working and overcome common issues. The speaker will give a tour of perf_events features and show how Netflix uses it to analyze performance across their massive Amazon EC2 Linux cloud. They rely on tools like perf for customer satisfaction, cost optimization, and developing open source tools like NetflixOSS. Key aspects covered include why profiling is needed, a crash course on perf, CPU profiling workflows, and common "gotchas" to address like missing stacks, symbols, or profiling certain languages and events.
This document discusses tracing in the Linux kernel. It describes various tracing mechanisms like ftrace, tracepoints, kprobes, perf, and eBPF. Ftrace allows tracing functions via compiler instrumentation or dynamically. Tracepoints define custom trace events that can be inserted at specific points. Kprobes and related probes like jprobes allow tracing kernel functions. Perf provides performance monitoring capabilities. eBPF enables custom tracing programs to be run efficiently in the kernel via just-in-time compilation. Tracing tools like perf, systemtap, and LTTng provide user interfaces.
Using the new extended Berkeley Packet Filter capabilities in Linux to improve the performance of auditing security-relevant kernel events around network, file and process actions.
In the cloud native community, eBPF is gaining popularity, and it can often be the best solution for challenges that call for deep observability of a system. eBPF is currently being embraced by major players. Mydbops co-founder Kabilesh P.R (MySQL and Mongo consultant) illustrates debugging Linux issues with eBPF: a brief introduction to BPF and eBPF, BPF internals, and the tools in action for faster resolution.
The document discusses Linux networking architecture and covers several key topics. It first describes the basic structure and layers of the Linux networking stack, including the network device interface, network layer protocols like IP, the transport layer, and sockets. It then discusses how network packets are managed in Linux through the use of socket buffers and associated functions. The document also provides an overview of the data link layer and protocols like Ethernet and PPP, and how they are implemented in Linux.
FOSDEM15 SDN developer room talk. DPDK performance: How to not just do a demo with DPDK. The Intel DPDK provides a platform for building high performance Network Function Virtualization applications. But it is hard to get high performance unless certain design tradeoffs are made. This talk focuses on the lessons learned in creating the Brocade vRouter using DPDK. It covers some of the architecture, locking and low-level issues that all have to be dealt with to achieve 80 million packets per second forwarding.
eBPF is an exciting new technology that is poised to transform Linux performance engineering. eBPF enables users to dynamically and programmatically trace any kernel or user space code path, safely and efficiently. However, understanding eBPF is not so simple. The goal of this talk is to give audiences a fundamental understanding of eBPF, how it interconnects existing Linux tracing technologies, and how it provides a powerful platform to solve any Linux performance problem.
Here is a bpftrace program to measure scheduler latency for ICMP echo requests:
    #!/usr/local/bin/bpftrace
    // Record a start timestamp, keyed by thread ID, when icmp_send is entered.
    kprobe:icmp_send
    {
        @start[tid] = nsecs;
    }
    // Only compute a delta when a start timestamp exists for this thread.
    kprobe:__netif_receive_skb_core
    /@start[tid]/
    {
        @diff[tid] = hist(nsecs - @start[tid]);
        delete(@start[tid]);
    }
    // Print the histograms on exit, then clear them.
    END
    {
        print(@diff);
        clear(@diff);
    }
This traces the time between the icmp_send kernel function (when the packet is queued for transmit) and the __netif_receive_skb_core function (when the response packet is received). The
Spark SQL works very well with structured row-based data. Vectorized readers and writers for Parquet/ORC can make I/O much faster. It also uses WholeStageCodeGen to improve performance through Java JIT-compiled code. However, the Java JIT usually does not make good use of the latest SIMD instructions for complicated queries. Apache Arrow provides a columnar in-memory layout and SIMD-optimized kernels, as well as Gandiva, an LLVM-based SQL engine. These native libraries can accelerate Spark SQL by reducing the CPU usage for both I/O and execution.
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg. There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well as observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes. This tutorial is an updated and extended version of an earlier talk that summarizes the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real world problems, important context that is typically not included in the standard documentation.
Cisco Meraki Cloud Networking
Highlights of Velocity 2010: http://en.oreilly.com/velocity2010
This document provides tips and best practices for high performance server programming. It discusses avoiding blocking, using efficient algorithms and data structures, separating I/O from business logic, and tuning for bottlenecks. It also covers various I/O models like blocking, non-blocking, and asynchronous I/O. Key aspects of designing high performance servers include using non-blocking I/O, event-driven architectures, and avoiding excessive threading.
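The non-blocking, event-driven design mentioned above can be sketched with Python's standard `selectors` module. This is a minimal sketch under stated assumptions, not the document's own code: the function name `run_echo_loop` and the `max_events` cap are hypothetical, the loop is bounded only to keep the example finite, and a production server would also buffer writes and handle `BlockingIOError` on `send`.

```python
import selectors
import socket


def run_echo_loop(listener: socket.socket, max_events: int) -> int:
    """Serve echo clients on a single thread; return the number of events handled.

    `listener` must already be bound and listening. The loop stops after
    `max_events` readiness events, or when nothing becomes ready for 1 second.
    """
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    # data=None marks the listening socket; client sockets carry data="client".
    sel.register(listener, selectors.EVENT_READ, data=None)
    handled = 0
    try:
        while handled < max_events:
            events = sel.select(timeout=1.0)
            if not events:
                break  # nothing ready; stop instead of spinning
            for key, _mask in events:
                if key.data is None:
                    # Listener is readable: accept without blocking.
                    conn, _addr = listener.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ, data="client")
                else:
                    conn = key.fileobj
                    payload = conn.recv(4096)
                    if payload:
                        # Simplification: assumes the small reply fits in the send buffer.
                        conn.send(payload)
                    else:
                        # Empty read means the peer closed the connection.
                        sel.unregister(conn)
                        conn.close()
                handled += 1
    finally:
        # Unregister everything; close client sockets but leave the listener to the caller.
        for key in list(sel.get_map().values()):
            sel.unregister(key.fileobj)
            if key.data is not None:
                key.fileobj.close()
        sel.close()
    return handled
```

One thread multiplexes every connection through the selector, so no call in the loop waits on a single slow client; the I/O handling stays separate from whatever business logic would replace the echo step.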