Trying to figure out why your application is responding late can be difficult, especially if the cause is interference from the operating system. This talk briefly covers how to write a C program that analyzes what in the Linux system is interfering with your application. It uses trace-cmd to enable kernel trace events as well as tracing of lock functions, and then gives a quick tutorial on using libtracecmd to read the resulting trace.dat file and uncover the cause of interference to your application.
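The talk's approach is a C program built on libtracecmd, but the core idea, measuring the gap between a task's wakeup and the moment it actually gets the CPU, can be sketched in a few lines. The sample lines below are synthetic, loosely imitating `trace-cmd report` text; a real tool should read trace.dat through libtracecmd rather than parse text.

```python
import re

# Synthetic lines in the spirit of `trace-cmd report` output; real output
# differs in detail and should be read via libtracecmd, not parsed as text.
SAMPLE = """\
myapp-1234  [000]  100.000100: sched_wakeup: comm=myapp pid=1234
kworker-77  [000]  100.000150: sched_switch: prev_comm=idle next_comm=kworker
myapp-1234  [000]  100.003900: sched_switch: prev_comm=kworker next_comm=myapp
"""

LINE = re.compile(r"\S+\s+\[\d+\]\s+([0-9.]+):\s+(\w+):\s+(.*)")

def wakeup_latency(text, pid, comm):
    """Seconds between a task's wakeup and the sched_switch that runs it,
    a common measure of scheduling interference."""
    woke_at = None
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        ts, event, fields = float(m.group(1)), m.group(2), m.group(3)
        if event == "sched_wakeup" and f"pid={pid}" in fields:
            woke_at = ts
        elif (event == "sched_switch" and woke_at is not None
                and f"next_comm={comm}" in fields):
            return ts - woke_at
    return None

print(round(wakeup_latency(SAMPLE, 1234, "myapp") * 1e6))  # → 3800 (microseconds)
```

In the sample, myapp is woken at 100.000100 but only scheduled in at 100.003900, so something else held the CPU for 3.8 ms.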
This document provides information on various debugging and profiling tools that can be used with Ruby, including:
- lsof to list open files for a process
- strace to trace system calls and signals
- tcpdump to dump network traffic
- google perftools profiler for CPU profiling
- pprof to analyze profiling data
It also discusses how some of these tools have helped identify specific Ruby performance issues, such as excessive calls to sigprocmask and memcpy calls slowing down EventMachine when threads are used.
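As an illustration of how a finding like "excessive sigprocmask calls" surfaces, here is a small parser that ranks syscalls from an `strace -c`-style summary table. The table below is made up; only the column order mirrors real strace output.

```python
# Rank syscalls by call count from an `strace -c`-style summary table.
# The figures below are synthetic; real tables come from strace itself.
SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 61.20    0.006120           3      2040           sigprocmask
 22.40    0.002240          11       200           read
 16.40    0.001640          16       100         3 futex
"""

def top_syscalls(text, n=2):
    rows = []
    for line in text.splitlines()[2:]:     # skip the two header lines
        parts = line.split()
        # The errors column may be blank, but the first four columns and the
        # trailing syscall name are always present.
        name, calls = parts[-1], int(parts[3])
        rows.append((name, calls))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

print(top_syscalls(SUMMARY))  # → [('sigprocmask', 2040), ('read', 200)]
```

A count wildly out of proportion to the work done, as with sigprocmask here, is the usual starting point for the kind of investigation the document describes.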
This document discusses PostgreSQL and Solaris as a low-cost platform for medium to large scale critical scenarios. It provides an overview of PostgreSQL, highlighting features like MVCC, PITR, and ACID compliance. It describes how Solaris and PostgreSQL integrate well, with benefits like DTrace support, scalability on multicore/multiprocessor systems, and Solaris Cluster support. Examples are given for installing PostgreSQL on Solaris using different methods, configuring zones for isolation, using ZFS for storage, and monitoring performance with DTrace scripts.
How to understand what is happening on the server? / Alexander Krizhanovsky (NatSys Lab., ...), Ontico
We start a server (a database, a web server, or something of our own) and do not get the desired RPS. We run top and see that 100% of the CPU is being consumed. What next? What is the processor time being spent on? Are there knobs we can turn to improve performance? And if CPU usage is not high, where do we look next?
We will walk through several performance-problem scenarios, review the available performance analysis tools, and work through a methodology for Linux performance optimization, answering the question of which knobs to turn, and how.
Presented at LISA18: https://www.usenix.org/conference/lisa18/presentation/babrou
This is a technical dive into how we used eBPF to solve real-world issues uncovered during an innocent OS upgrade. We'll see how we debugged a 10x CPU increase in Kafka after a Debian upgrade and what lessons we learned. We'll go from high-level effects like increased CPU, to flamegraphs showing where the problem lies, to tracing timers and function calls in the Linux kernel.
The focus is on tools that operational engineers can use to debug performance issues in production. This particular issue happened at Cloudflare on a Kafka cluster doing 100 Gbps of ingress and many multiples of that in egress.
The document discusses reverse engineering the firmware of Swisscom's Centro Grande modems. It identifies several vulnerabilities found, including a command overflow issue that allows complete control of the device by exceeding the input buffer, and multiple buffer overflow issues that can be exploited to execute code remotely by crafting specially formatted XML files. Details are provided on the exploitation techniques and timeline of coordination with Swisscom to address the vulnerabilities.
Performance tweaks and tools for Linux (Joe Damato), Ontico
The document discusses various Linux performance analysis tools including lsof to list open files, strace to trace system calls, tcpdump to dump network traffic, perftools from Google for profiling CPU usage, and a Ruby library called perftools.rb for profiling Ruby code. Examples are provided for using these tools to analyze memory usage, slow queries, Ruby interpreter signals, thread scheduling overhead, and identifying hot spots in Ruby web applications.
This document discusses ways to provide quality of service (QoS) without using traditional QoS mechanisms for modern storage systems like Ceph. It describes using techniques like limiting client I/O, traffic shaping at gateways, adjusting iSCSI queue depths, and using Linux traffic control (tc) tools to provide bandwidth caps or inject latency. While upstream Ceph efforts work on native QoS, these techniques can provide some control over performance for different clients in the interim. The document cautions that solutions like tc are complex to configure for many clients and may drop packets.
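The bandwidth caps mentioned above (for example, via tc's tbf qdisc) boil down to a token-bucket policy, which can be sketched independently of tc. This is an illustration of the policy only, not of tc's or Ceph's implementation.

```python
class TokenBucket:
    """Minimal token-bucket limiter: `rate` bytes/s refill, `burst` bytes cap.
    A sketch of the policy a shaper like tc's tbf applies, not its code."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = burst          # start with a full bucket
        self.last = 0.0

    def allow(self, nbytes, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False                 # a shaper would queue or drop here

bucket = TokenBucket(rate=1000, burst=500)   # 1 kB/s cap, 500 B burst
print(bucket.allow(400, now=0.0))   # → True  (within the initial burst)
print(bucket.allow(400, now=0.0))   # → False (only 100 tokens left)
print(bucket.allow(400, now=1.0))   # → True  (one second refilled 1000 tokens)
```

The complexity the document cautions about comes from maintaining one such bucket (plus queueing and classification rules) per client.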
The document summarizes Maycon Vitali's presentation on hacking embedded devices. It includes an agenda covering extracting firmware from devices using tools like BusPirate and flashrom, decompressing firmware to view file systems and binaries, emulating binaries using QEMU, reverse engineering code to find vulnerabilities, and details four vulnerabilities discovered in Ubiquiti networking devices designated as CVEs. The presentation aims to demonstrate common weaknesses in embedded device security and how tools can be used to analyze and hack these ubiquitous connected systems.
- The document discusses various Linux system log files such as /var/log/messages, /var/log/secure, and /var/log/cron and provides examples of log entries.
- It also covers log rotation tools like logrotate and logwatch that are used to manage log files.
- Networking topics like IP addressing, subnet masking, routing, ARP, and tcpdump for packet sniffing are explained along with examples.
Embedded Recipes 2018 - Finding sources of Latency In your system - Steven Ro... (Anne Nicolas)
Having just an RTOS is not enough for a real-time system. The hardware must be deterministic, as must the applications that run on the system. When you are missing deadlines, the first thing to do is find the source of the latency that caused the issue. It could be the hardware, the operating system, the application, or a combination of the above. This talk will discuss how to pinpoint the latency using tools that come with the Linux kernel, and will walk through a few cases that caused issues.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Talk by Brendan Gregg for USENIX LISA 2019: Linux Systems Performance. Abstract: "
Systems performance is an effective discipline for performance analysis and tuning, and can help you find performance wins for your applications and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas of Linux systems performance: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (Ftrace, bcc/BPF, and bpftrace/BPF), and much advice about what is and isn't important to learn. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
This document provides an overview of Linux performance monitoring tools including mpstat, top, htop, vmstat, iostat, free, strace, and tcpdump. It discusses what each tool measures and how to use it to observe system performance and diagnose issues. The tools presented provide visibility into CPU usage, memory usage, disk I/O, network traffic, and system call activity which are essential for understanding workload performance on Linux systems.
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring (NETWAYS)
Nowadays system administrators have great choices when it comes to Linux performance profiling and monitoring. The challenge is to pick the appropriate tools and interpret their results correctly.
This talk is a chance to take a tour through various performance profiling and benchmarking tools, focusing on their benefit for every sysadmin.
More than 25 different tools are presented. Ranging from well known tools like strace, iostat, tcpdump or vmstat to new features like Linux tracepoints or perf_events. You will also learn which tools can be monitored by Icinga and which monitoring plugins are already available for that.
At the end the goal is to gather reference points to look at, whenever you are faced with performance problems.
Take the chance to close your knowledge gaps and learn how to get the most out of your system.
A 2015 presentation introducing users to Java profiling. The YourKit Profiler is used for concrete examples. The following topics are covered:
1) When to profile
2) Profiler sampling
3) Profiler instrumentation
4) Where to Start
5) Macro vs micro benchmarking
A Close Encounter with Real World and Odd Perf Issues (Riyaj Shamsudeen)
This document discusses a performance issue where a database experienced high CPU usage in kernel mode. Tracing tools identified that detaching from multiple shared memory segments during connection release was causing the high CPU usage. The database server had a NUMA architecture, which led the database instance to create multiple shared memory segments across NUMA nodes. Increasing the shared memory segment size limit did not resolve it, as the instance was optimizing for NUMA.
The document discusses exploiting a vulnerability in Cisco ASA firewall devices. It begins with background on the target device and vulnerability, then outlines steps for getting access to the firmware, debugging the target, and identifying the vulnerability through static and dynamic analysis. The document then covers techniques for triggering the vulnerability and developing a controlled exploit to achieve remote code execution without user interaction.
This document provides an introduction to DTrace and discusses its key features and capabilities. It covers:
1. What DTrace is and how it can be used to trace operating systems and programs with very low overhead.
2. The different ways DTrace can be used, including tracing system calls, kernel functions, user processes, and custom probes added to programs.
3. How DTrace scripts are structured using probes, filters, and actions, and the variables that can be used, such as timestamps.
4. Examples of using DTrace to trace network activity by probe name, argument definitions, and creating DTrace programs.
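The probe/filter/action structure in point 3 can be made concrete with a short D script. The process name myapp is a placeholder, and the script is illustrative rather than taken from the document:

```d
/* Histogram of write(2) latency for one process.
   Each clause is: probe description / predicate / { actions } */
syscall::write:entry
/execname == "myapp"/            /* predicate: filter to one process */
{
    self->ts = timestamp;        /* thread-local variable, nanoseconds */
}

syscall::write:return
/self->ts/
{
    @lat["write latency (ns)"] = quantize(timestamp - self->ts);
    self->ts = 0;
}
```

When the script exits, DTrace prints the aggregation as a power-of-two latency histogram, with the low overhead the document emphasizes.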
Shrimp: A Rather Practical Example Of Application Development With RESTinio a... (Yauheni Akhotnikau)
Description of a demo project for serving images using the Actor Model and an embedded HTTP server. The project is implemented in C++17 with SObjectizer and RESTinio (open-source products from stiffstream).
Creating "Secure" PHP applications, Part 2, Server Hardeningarchwisp
The document provides guidance on server hardening techniques. It discusses using netstat to view listening services on a server and using update-rc.d or chkconfig to disable unnecessary services from starting at boot. It also recommends enabling access control lists (ACLs) in file system mounts, using SELinux or AppArmor to enforce mandatory access controls, and setting reasonable PHP memory limits to prevent potential denial of service attacks. The document stresses the importance of only allowing approved applications to execute and knowing the resource limits of the server to avoid potential outages.
Varnish is an HTTP accelerator that acts as a reverse proxy and cache. It is very fast, largely because it delegates work to kernel functions. It relies on a massively multithreaded architecture that is partly event driven. It maps the cache store into memory using mmap and writes directly from mapped memory for maximum performance. Logging includes all request headers. Wikia uses Varnish across 4 datacenters, with rapid cache invalidations handled through a RabbitMQ queue. SSDs and tuning help optimize performance.
This document discusses the crash reporting mechanism in Tizen. It describes the crash client, which handles crash signals and generates crash reports. It covers Samsung's crash-work-sdk and Intel's corewatcher crash clients. It also discusses the crash server that receives reports and the CrashDB web interface. Finally, it mentions crash reason location algorithms.
While probably the most prominent, Docker is not the only tool for building and managing containers. Originally meant to be a "chroot on steroids" to help debug systemd, systemd-nspawn provides a fairly uncomplicated approach to work with containers. Being part of systemd, it is available on most recent distributions out-of-the-box and requires no additional dependencies.
This deck will introduce a few concepts involved in containers and will guide you through the steps of building a container from scratch. The payload will be a simple service, which will be automatically activated by systemd when the first request arrives.
re:Invent 2019 BPF Performance Analysis at Netflix (Brendan Gregg)
This document provides an overview of Brendan Gregg's presentation on BPF performance analysis at Netflix. It discusses:
- Why BPF is changing the Linux OS model to become more event-based and microkernel-like.
- The internals of BPF including its origins, instruction set, execution model, and how it is integrated into the Linux kernel.
- How BPF enables a new class of custom, efficient, and safe performance analysis tools for analyzing various Linux subsystems like CPUs, memory, disks, networking, applications, and the kernel.
- Examples of specific BPF-based performance analysis tools developed by Netflix, AWS, and others for analyzing tasks, scheduling, page faults
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing (ScyllaDB)
The document compares the performance of running a MySQL database benchmark (Sysbench) on virtual machines versus bare metal machines. On Fedora, the benchmark achieved 6-7% higher transactions per second, queries per second, and lower latency when run on the bare metal host compared to the virtual machine guest. Similarly, on Debian, the benchmark achieved significantly higher transactions per second (over 500 vs under 80) and lower latency when run on bare metal. Tracing tools like trace-cmd can be used to analyze the additional overhead introduced by the virtualization layer.
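The relative overhead quoted above is simple arithmetic; a quick sketch using the approximate Debian figures from the summary (both numbers are rounded):

```python
def overhead_pct(bare_metal, vm):
    """Relative throughput loss of the VM versus bare metal, in percent."""
    return 100.0 * (bare_metal - vm) / bare_metal

# Approximate Debian sysbench figures from the summary: ~500 TPS bare metal
# versus ~80 TPS in the guest.
print(round(overhead_pct(500, 80)))  # → 84
```

An overhead that large is exactly the kind of anomaly the summary suggests investigating with trace-cmd rather than accepting as the cost of virtualization.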
OSSNA 2017 Performance Analysis Superpowers with Linux BPF (Brendan Gregg)
Talk by Brendan Gregg for OSSNA 2017. "Advanced performance observability and debugging have arrived built into the Linux 4.x series, thanks to enhancements to Berkeley Packet Filter (BPF, or eBPF) and the repurposing of its sandboxed virtual machine to provide programmatic capabilities to system tracing. Netflix has been investigating its use for new observability tools, monitoring, security uses, and more. This talk will be a deep dive into these new tracing, observability, and debugging capabilities, which sooner or later will be available to everyone who uses Linux. Whether you’re doing analysis over an ssh session, or via a monitoring GUI, BPF can be used to provide an efficient, custom, and deep level of detail into system and application performance.
This talk will also demonstrate the new open source tools that have been developed, which make use of kernel- and user-level dynamic tracing (kprobes and uprobes), and kernel- and user-level static tracing (tracepoints). These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and a whole lot more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations."
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug... (ScyllaDB)
In this presentation, we explore how standard profiling and monitoring methods may fall short in identifying bottlenecks in low-latency data ingestion workflows. Instead, we showcase the power of simple yet clever methods that can uncover hidden performance limitations.
Attendees will discover unconventional techniques, including clever logging, targeted instrumentation, and specialized metrics, to pinpoint bottlenecks accurately. Real-world use cases will be presented to demonstrate the effectiveness of these methods. By the end of the session, attendees will be equipped with alternative approaches to identify bottlenecks and optimize their low-latency data ingestion workflows for high throughput.
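One of the techniques alluded to, targeted instrumentation with tail-aware metrics, can be sketched as a tiny timing helper. The stage name and the choice of p99 are illustrative, not taken from the talk:

```python
import time
from contextlib import contextmanager

# Keep raw samples per stage so tail latency stays visible;
# an average alone would hide the bottleneck.
samples = {}

@contextmanager
def timed(stage):
    t0 = time.perf_counter()
    try:
        yield
    finally:
        samples.setdefault(stage, []).append(time.perf_counter() - t0)

def p99(stage):
    xs = sorted(samples[stage])
    return xs[min(len(xs) - 1, int(len(xs) * 0.99))]

# Wrap only the suspect stage of the pipeline, not the whole request.
for _ in range(100):
    with timed("decode"):
        pass            # stand-in for the real ingestion stage

print(f"decode p99: {p99('decode'):.6f}s")
```

Comparing p99 across stages wrapped this way quickly shows which stage caps throughput, which is the spirit of the "simple yet clever" methods the abstract describes.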
Mitigating the Impact of State Management in Cloud Stream Processing Systems (ScyllaDB)
Stream processing is a crucial component of modern data infrastructure, but constructing an efficient and scalable stream processing system can be challenging. Decoupling compute and storage architecture has emerged as an effective solution to these challenges, but it can introduce high latency issues, especially when dealing with complex continuous queries that necessitate managing extra-large internal states.
In this talk, we focus on addressing the high latency issues associated with S3 storage in stream processing systems that employ a decoupled compute and storage architecture. We delve into the root causes of latency in this context and explore various techniques to minimize the impact of S3 latency on stream processing performance. Our proposed approach is to implement a tiered storage mechanism that leverages a blend of high-performance and low-cost storage tiers to reduce data movement between the compute and storage layers while maintaining efficient processing.
Throughout the talk, we will present experimental results that demonstrate the effectiveness of our approach in mitigating the impact of S3 latency on stream processing. By the end of the talk, attendees will have gained insights into how to optimize their stream processing systems for reduced latency and improved cost-efficiency.
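The tiered-storage idea can be sketched as a small in-memory tier in front of a slow backing store standing in for S3. The API below is hypothetical, not that of the system described in the talk:

```python
from collections import OrderedDict

class TieredStore:
    """Sketch of a two-tier state store: a small in-memory LRU tier in front
    of a slow backing tier (standing in for S3). Illustrative API only."""
    def __init__(self, capacity, slow_tier):
        self.fast = OrderedDict()      # LRU order: oldest first
        self.capacity = capacity
        self.slow = slow_tier          # any dict-like slow store
        self.slow_reads = 0            # each one models an S3 round trip

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)        # refresh LRU position
            return self.fast[key]
        self.slow_reads += 1                  # the latency we want to avoid
        value = self.slow[key]
        self.put(key, value)                  # promote to the fast tier
        return value

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        if len(self.fast) > self.capacity:
            self.fast.popitem(last=False)     # evict least recently used

store = TieredStore(capacity=2, slow_tier={"a": 1, "b": 2, "c": 3})
store.get("a"); store.get("b"); store.get("a"); store.get("a")
print(store.slow_reads)  # → 2 (only the first read of each hot key is slow)
```

For a hot working set that fits the fast tier, nearly all reads avoid the slow tier, which is how tiering mitigates S3 latency for large operator state.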
Similar to Using Libtracecmd to Analyze Your Latency and Performance Troubles
This talk will also demonstrate the new open source tools that have been developed, which make use of kernel- and user-level dynamic tracing (kprobes and uprobes), and kernel- and user-level static tracing (tracepoints). These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and a whole lot more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations."
Similar to Using Libtracecmd to Analyze Your Latency and Performance Troubles (20)
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...ScyllaDB
In this presentation, we explore how standard profiling and monitoring methods may fall short in identifying bottlenecks in low-latency data ingestion workflows. Instead, we showcase the power of simple yet clever methods that can uncover hidden performance limitations.
Attendees will discover unconventional techniques, including clever logging, targeted instrumentation, and specialized metrics, to pinpoint bottlenecks accurately. Real-world use cases will be presented to demonstrate the effectiveness of these methods. By the end of the session, attendees will be equipped with alternative approaches to identify bottlenecks and optimize their low-latency data ingestion workflows for high throughput.
Mitigating the Impact of State Management in Cloud Stream Processing SystemsScyllaDB
Stream processing is a crucial component of modern data infrastructure, but constructing an efficient and scalable stream processing system can be challenging. Decoupling compute and storage architecture has emerged as an effective solution to these challenges, but it can introduce high latency issues, especially when dealing with complex continuous queries that necessitate managing extra-large internal states.
In this talk, we focus on addressing the high latency issues associated with S3 storage in stream processing systems that employ a decoupled compute and storage architecture. We delve into the root causes of latency in this context and explore various techniques to minimize the impact of S3 latency on stream processing performance. Our proposed approach is to implement a tiered storage mechanism that leverages a blend of high-performance and low-cost storage tiers to reduce data movement between the compute and storage layers while maintaining efficient processing.
Throughout the talk, we will present experimental results that demonstrate the effectiveness of our approach in mitigating the impact of S3 latency on stream processing. By the end of the talk, attendees will have gained insights into how to optimize their stream processing systems for reduced latency and improved cost-efficiency.
Measuring the Impact of Network Latency at TwitterScyllaDB
Widya Salim and Victor Ma will outline the causal impact analysis, framework, and key learnings used to quantify the impact of reducing Twitter's network latency.
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...ScyllaDB
BlazingMQ is a new open source* distributed message queuing system developed at and published by Bloomberg. It provides highly-performant queues to applications for asynchronous, efficient, and reliable communication. This system has been used at scale at Bloomberg for eight years, where it moves terabytes of data and billions of messages across tens of thousands of queues in production every day.
BlazingMQ provides highly-available, fault-tolerant queues courtesy of replication based on the Raft consensus algorithm. In addition, it provides a rich set of enterprise message routing strategies, enabling users to implement a variety of scenarios for message processing.
Written in C++ from the ground up, BlazingMQ has been architected with low latency as one of its core requirements. This has resulted in some unique design and implementation choices at all levels of the system, such as its lock-free threading model, custom memory allocators, compact wire protocol, multi-hop network topology, and more.
This talk will provide an overview of BlazingMQ. We will then delve into the system’s core design principles, architecture, and implementation details in order to explore the crucial role they play in its performance and reliability.
*BlazingMQ will be released as open source between now and P99 (exact timing is still TBD)
Noise Canceling RUM by Tim Vereecke, AkamaiScyllaDB
Noisy Real User Monitoring (RUM) data can ruin your P99!
We introduce a fresh concept called ""Human Visible Navigations"" (HVN) to tackle this risk; we focus on the experiences you actually care about when talking about the speed of our sites:
- Human: We exclude noise coming from bots and synthetic measurements.
- Visible: We remove any partial or fully hidden experiences. These tend to be very slow but users don’t see this slowness.
- Navigations: We ignore lightning fast back-forward navigations which usually have few optimisation opportunities.
Adopting Human Visible Navigations provides you with these key benefits:
- Fewer changes staying below the radar
- Fewer data fluctuations
- Fewer blindspots when finding bottlenecks
- Better correlation with business metrics
This is supported by plenty of real world examples coming from the world's largest scale modeling site (6M Monthly visits) in combination with aggregated data from the brand new rumarchive.com (open source)
After attending this session; your P99 and other percentiles will become less noisy and easier to tune!
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...ScyllaDB
In this session, Tanel introduces a new open source eBPF tool for efficiently sampling both on-CPU events and off-CPU events for every thread (task) in the OS. Linux standard performance tools (like perf) allow you to easily profile on-CPU threads doing work, but if we want to include the off-CPU timing and reasons for the full picture, things get complicated. Combining eBPF task state arrays with periodic sampling for profiling allows us to get both a system-level overview of where threads spend their time, even when blocked and sleeping, and allow us to drill down into individual thread level, to understand why.
Performance Budgets for the Real World by Tammy EvertsScyllaDB
Performance budgets have been around for more than ten years. Over those years, we’ve learned a lot about what works, what doesn’t, and what we need to improve. In this session, Tammy revisits old assumptions about performance budgets and offers some new best practices. Topics include:
• Understanding performance budgets vs. performance goals
• Aligning budgets with user experience
• Pros and cons of Core Web Vitals
• How to stay on top of your budgets to fight regressions
Reducing P99 Latencies with Generational ZGCScyllaDB
With the low-latency garbage collector ZGC, GC pause times are no longer a big problem in Java. With sub-millisecond pause times there are instead other things in the GC and JVM that can cause application threads to experience unexpected latencies. This talk will dig into a specific use where the GC pauses are no longer the cause of unexpected latencies and look at how adding generations to ZGC help lower the p99 application latencies.
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000XScyllaDB
Linters are a type of database! They are a collection of lint rules — queries that look for rule violations to report — plus a way to execute those queries over a source code dataset.
This is a case study about using database ideas to build a linter that looks for breaking changes in Rust library APIs. Maintainability and performance are key: new Rust releases tend to have mutually-incompatible ways of representing API information, and we cannot afford to reimplement and optimize dozens of rules for each Rust version separately. Fortunately, databases don't require rewriting queries when the underlying storage format or query plan changes! This allows us to ship massive optimizations and support multiple Rust versions without making any changes to the queries that describe lint rules.
Ship now, optimize later"" can be a sustainable development practice after all — join us to see how!
How Netflix Builds High Performance Applications at Global ScaleScyllaDB
We all want to build applications that are blazingly fast. We also want to scale them to users all over the world. Can the two happen together? Can users in the slowest of environments also get a fast experience? Learn how we do this at Netflix: how we understand every user's needs and preferences and build high performance applications that work for every user, every time.
Conquering Load Balancing: Experiences from ScyllaDB DriversScyllaDB
Load balancing seems simple on the surface, with algorithms like round-robin, but the real world loves throwing curveballs. Join me in this session as we delve into the intricacies of load balancing within ScyllaDB Drivers. Discover firsthand experiences from our journey in driver development, where we employed the Power of Two Choices algorithm, optimized the implementation of load balancing in Rust Driver, mitigated cloud costs through zone-aware load balancing and combated the issue of overloading a particular core of ScyllaDB. Be prepared to delve into the practical and theoretical aspects of load balancing, gaining valuable insights along the way.
Interaction Latency: Square's User-Centric Mobile Performance MetricScyllaDB
Mobile performance metrics often take inspiration from the backend world and measure resource usage (CPU usage, memory usage, etc) and workload durations (how long a piece of code takes to run).
However, mobile apps are used by humans and the app performance directly impacts their experience, so we should primarily track user-centric mobile performance metrics. Following the lead of tech giants, the mobile industry at large is now adopting the tracking of app launch time and smoothness (jank during motion).
At Square, our customers spend most of their time in the app long after it's launched, and they don't scroll much, so app launch time and smoothness aren't critical metrics. What should we track instead?
This talk will introduce you to Interaction Latency, a user-centric mobile performance metric inspired from the Web Vital metric Interaction to Next Paint"" (web.dev/inp). We'll go over why apps need to track this, how to properly implement its tracking (it's tricky!), how to aggregate this metric and what thresholds you should target.
How to Avoid Learning the Linux-Kernel Memory ModelScyllaDB
The Linux-kernel memory model (LKMM) is a powerful tool for developing highly concurrent Linux-kernel code, but it also has a steep learning curve. Wouldn't it be great to get most of LKMM's benefits without the learning curve?
This talk will describe how to do exactly that by using the standard Linux-kernel APIs (locking, reference counting, RCU) along with a simple rules of thumb, thus gaining most of LKMM's power with less learning. And the full LKMM is always there when you need it!
99.99% of Your Traces are Trash by Paige CruzScyllaDB
Distributed tracing is still finding its footing in many organizations today, one challenge to overcome is the data volume - keeping 100% of your traces is expensive and unnecessary. Enter sampling - head vs tail how do you decide? Let’s look at the design of Sifter and get familiar with why tail-based sampling is the way to enact a cost-effective tracing solution while actually increasing the system’s observability.
Square's Lessons Learned from Implementing a Key-Value Store with RaftScyllaDB
To put it simply, Raft is used to make a use case (e.g., key-value store, indexing system) more fault tolerant to increase availability using replication (despite server and network failures). Raft has been gaining ground due to its simplicity without sacrificing consistency and performance.
Although we'll cover Raft's building blocks, this is not about the Raft algorithm; it is more about the micro-lessons one can learn from building fault-tolerant, strongly consistent distributed systems using Raft. Things like majority agreement rule (quorum), write-ahead log, split votes & randomness to reduce contention, heartbeats, split-brain syndrome, snapshots & logs replay, client requests dedupe & idempotency, consistency guarantees (linearizability), leases & stale reads, batching & streaming, parallelizing persisting & broadcasting, version control, and more!
And believe it or not, you might be using some of these techniques without even realizing it!
This is inspired by Raft paper (raft.github.io), publications & courses on Raft, and an attempt to implement a key-value store using Raft as a side project.
A Deep Dive Into Concurrent React by Matheus AlbuquerqueScyllaDB
Writing fluid user interfaces becomes more and more challenging as the application complexity increases. In this talk, we’ll explore how proper scheduling improves your app’s experience by diving into some of the concurrent React features, understanding their rationales, and how they work under the hood.
The Latency Stack: Discovering Surprising Sources of LatencyScyllaDB
Usually, when an API call is slow, developers blame ourselves and our code. We held a lock too long, or used a blocking operation, or built an inefficient query. But often, the simple picture of latency as “the time a server takes to process a message” hides a great deal of end-to-end complexity. Debugging tail latencies requires unpacking the abstractions that we normally ignore: virtualization, hidden queues, and network behavior.
In this talk, I’ll describe how developers can diagnose more sources of delay and failure by building a more realistic and broad understanding of networked services. I’ll give some real-world cases when high end-to-end latency or elevated failure rates occurred due to factors we ordinarily might not even measure. Some examples include TCP SYN retransmission; virtualization on the client; and surprising behavior from AWS load balancers. Unfortunately, many measurement techniques don’t cover anything but the portion most directly under developer control. But developers can do better by comparing multiple measurements, applying Little’s law, investing in eBPF probes, and paying attention to the network layer.
Understanding API performance to find and fix issues faster ultimately means understanding the entire stack: the client, your code, and the underlying infrastructure.
From its vantage point in the kernel, eBPF provides a platform for building a new generation of infrastructure tools for things like observability, security and networking. These kinds of facilities used to be implemented as libraries, and then in container environments they were often deployed as sidecars. In this talk let's consider why eBPF can offer numerous advantages over these models, particularly when it comes to performance.
Dev Dives: Mining your data with AI-powered Continuous DiscoveryUiPathCommunity
Want to learn how AI and Continuous Discovery can uncover impactful automation opportunities? Watch this webinar to find out more about UiPath Discovery products!
Watch this session and:
👉 See the power of UiPath Discovery products, including Process Mining, Task Mining, Communications Mining, and Automation Hub
👉 Watch the demo of how to leverage system data, desktop data, or unstructured communications data to gain deeper understanding of existing processes
👉 Learn how you can benefit from each of the discovery products as an Automation Developer
🗣 Speakers:
Jyoti Raghav, Principal Technical Enablement Engineer @UiPath
Anja le Clercq, Principal Technical Enablement Engineer @UiPath
⏩ Register for our upcoming Dev Dives July session: Boosting Tester Productivity with Coded Automation and Autopilot™
👉 Link: https://bit.ly/Dev_Dives_July
This session was streamed live on June 27, 2024.
Check out all our upcoming Dev Dives 2024 sessions at:
🚩 https://bit.ly/Dev_Dives_2024
In this follow-up session on knowledge and prompt engineering, we will explore structured prompting, chain of thought prompting, iterative prompting, prompt optimization, emotional language prompts, and the inclusion of user signals and industry-specific data to enhance LLM performance.
Join EIS Founder & CEO Seth Earley and special guest Nick Usborne, Copywriter, Trainer, and Speaker, as they delve into these methodologies to improve AI-driven knowledge processes for employees and customers alike.
Transcript: Details of description part II: Describing images in practice - T...BookNet Canada
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and slides: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
Test Management as Chapter 5 of ISTQB Foundation. Topics covered are Test Organization, Test Planning and Estimation, Test Monitoring and Control, Test Execution Schedule, Test Strategy, Risk Management, Defect Management
This slide deck is a deep dive the Salesforce latest release - Summer 24, by the famous Stephen Stanley. He has examined the release notes very carefully, and summarised them for the Wellington Salesforce user group, virtual meeting June 27 2024.
9 Ways Pastors Will Use AI Everyday By 2029
These future use cases are only a handful of the many many options generative AI is providing pastors and leaders everywhere. If you learn how AI might enhance and support your ministry, you'll enter into a world that's full of hope for the Gospel.
Learn more at http://www.AIforChurchLeaders.com and http://www.churchtechtoday.com
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.
How to Improve Your Ability to Solve Complex Performance ProblemsScyllaDB
This talk is really about problem solving. It’s about how we think about problems and how we resolve those problems in a deeply technical context. The main goal of the talk is the relay the lessons learned from a couple of decades working with and observing some of the best performance troubleshooters in the world.
The talk will be broken into 3 main parts.
1. Explain the basic process we must go through to solve a complex performance problem
2. Discuss some of the main factors that can inhibit our efforts
3. Discuss some of the techniques we can apply to improve our chances, including an almost fool proof method to reach a successful outcome
Specific technical examples from large enterprise customers using relational databases (Oracle primarily) will be used to illustrate the concepts.
Coordinate Systems in FME 101 - Webinar SlidesSafe Software
If you’ve ever had to analyze a map or GPS data, chances are you’ve encountered and even worked with coordinate systems. As historical data continually updates through GPS, understanding coordinate systems is increasingly crucial. However, not everyone knows why they exist or how to effectively use them for data-driven insights.
During this webinar, you’ll learn exactly what coordinate systems are and how you can use FME to maintain and transform your data’s coordinate systems in an easy-to-digest way, accurately representing the geographical space that it exists within. During this webinar, you will have the chance to:
- Enhance Your Understanding: Gain a clear overview of what coordinate systems are and their value
- Learn Practical Applications: Why we need datams and projections, plus units between coordinate systems
- Maximize with FME: Understand how FME handles coordinate systems, including a brief summary of the 3 main reprojectors
- Custom Coordinate Systems: Learn how to work with FME and coordinate systems beyond what is natively supported
- Look Ahead: Gain insights into where FME is headed with coordinate systems in the future
Don’t miss the opportunity to improve the value you receive from your coordinate system data, ultimately allowing you to streamline your data analysis and maximize your time. See you there!
Blockchain and Cyber Defense Strategies in new genre timesanupriti
Explore robust defense strategies at the intersection of blockchain technology and cybersecurity. This presentation delves into proactive measures and innovative approaches to safeguarding blockchain networks against evolving cyber threats. Discover how secure blockchain implementations can enhance resilience, protect data integrity, and ensure trust in digital transactions. Gain insights into cutting-edge security protocols and best practices essential for mitigating risks in the blockchain ecosystem.
The presentation will delve into the ASIMOV project, a novel initiative that leverages Retrieval-Augmented Generation (RAG) to provide precise, domain-specific assistance to telecommunications engineers and technicians. The session will focus on the unique capabilities of Milvus, the chosen vector database for the project, and its advantages over other vector databases.
Attending this session will give you a deeper understanding of the potential of RAG and Milvus DB in telecommunications engineering. You will learn how to address common challenges in the field and enhance the efficiency of their operations. The session will equip you with the knowledge to make informed decisions about the choice of vector databases, and how best to use them for your use-cases
GDG Cloud Southlake #34: Neatsun Ziv: Automating AppsecJames Anderson
The lecture titled "Automating AppSec" delves into the critical challenges associated with manual application security (AppSec) processes and outlines strategic approaches for incorporating automation to enhance efficiency, accuracy, and scalability. The lecture is structured to highlight the inherent difficulties in traditional AppSec practices, emphasizing the labor-intensive triage of issues, the complexity of identifying responsible owners for security flaws, and the challenges of implementing security checks within CI/CD pipelines. Furthermore, it provides actionable insights on automating these processes to not only mitigate these pains but also to enable a more proactive and scalable security posture within development cycles.
The Pains of Manual AppSec:
This section will explore the time-consuming and error-prone nature of manually triaging security issues, including the difficulty of prioritizing vulnerabilities based on their actual risk to the organization. It will also discuss the challenges in determining ownership for remediation tasks, a process often complicated by cross-functional teams and microservices architectures. Additionally, the inefficiencies of manual checks within CI/CD gates will be examined, highlighting how they can delay deployments and introduce security risks.
Automating CI/CD Gates:
Here, the focus shifts to the automation of security within the CI/CD pipelines. The lecture will cover methods to seamlessly integrate security tools that automatically scan for vulnerabilities as part of the build process, thereby ensuring that security is a core component of the development lifecycle. Strategies for configuring automated gates that can block or flag builds based on the severity of detected issues will be discussed, ensuring that only secure code progresses through the pipeline.
Triaging Issues with Automation:
This segment addresses how automation can be leveraged to intelligently triage and prioritize security issues. It will cover technologies and methodologies for automatically assessing the context and potential impact of vulnerabilities, facilitating quicker and more accurate decision-making. The use of automated alerting and reporting mechanisms to ensure the right stakeholders are informed in a timely manner will also be discussed.
Identifying Ownership Automatically:
Automating the process of identifying who owns the responsibility for fixing specific security issues is critical for efficient remediation. This part of the lecture will explore tools and practices for mapping vulnerabilities to code owners, leveraging version control and project management tools.
Three Tips to Scale the Shift Left Program:
Finally, the lecture will offer three practical tips for organizations looking to scale their Shift Left security programs. These will include recommendations on fostering a security culture within development teams, employing DevSecOps principles to integrate security throughout the development
Tool Support for Testing as Chapter 6 of ISTQB Foundation 2018. Topics covered are Tool Benefits, Test Tool Classification, Benefits of Test Automation and Risk of Test Automation
Leveraging AI for Software Developer Productivity.pptxpetabridge
Supercharge your software development productivity with our latest webinar! Discover the powerful capabilities of AI tools like GitHub Copilot and ChatGPT 4.X. We'll show you how these tools can automate tedious tasks, generate complete syntax, and enhance code documentation and debugging.
In this talk, you'll learn how to:
- Efficiently create GitHub Actions scripts
- Convert shell scripts
- Develop Roslyn Analyzers
- Visualize code with Mermaid diagrams
And these are just a few examples from a vast universe of possibilities!
Packed with practical examples and demos, this presentation offers invaluable insights into optimizing your development process. Don't miss the opportunity to improve your coding efficiency and productivity with AI-driven solutions.
UiPath Community Day Kraków: Devs4Devs ConferenceUiPathCommunity
We are honored to launch and host this event for our UiPath Polish Community, with the help of our partners - Proservartner!
We certainly hope we have managed to spike your interest in the subjects to be presented and the incredible networking opportunities at hand, too!
Check out our proposed agenda below 👇👇
08:30 ☕ Welcome coffee (30')
09:00 Opening note/ Intro to UiPath Community (10')
Cristina Vidu, Global Manager, Marketing Community @UiPath
Dawid Kot, Digital Transformation Lead @Proservartner
09:10 Cloud migration - Proservartner & DOVISTA case study (30')
Marcin Drozdowski, Automation CoE Manager @DOVISTA
Pawel Kamiński, RPA developer @DOVISTA
Mikolaj Zielinski, UiPath MVP, Senior Solutions Engineer @Proservartner
09:40 From bottlenecks to breakthroughs: Citizen Development in action (25')
Pawel Poplawski, Director, Improvement and Automation @McCormick & Company
Michał Cieślak, Senior Manager, Automation Programs @McCormick & Company
10:05 Next-level bots: API integration in UiPath Studio (30')
Mikolaj Zielinski, UiPath MVP, Senior Solutions Engineer @Proservartner
10:35 ☕ Coffee Break (15')
10:50 Document Understanding with my RPA Companion (45')
Ewa Gruszka, Enterprise Sales Specialist, AI & ML @UiPath
11:35 Power up your Robots: GenAI and GPT in REFramework (45')
Krzysztof Karaszewski, Global RPA Product Manager
12:20 🍕 Lunch Break (1hr)
13:20 From Concept to Quality: UiPath Test Suite for AI-powered Knowledge Bots (30')
Kamil Miśko, UiPath MVP, Senior RPA Developer @Zurich Insurance
13:50 Communications Mining - focus on AI capabilities (30')
Thomasz Wierzbicki, Business Analyst @Office Samurai
14:20 Polish MVP panel: Insights on MVP award achievements and career profiling
Sustainability requires ingenuity and stewardship. Did you know Pigging Solutions pigging systems help you achieve your sustainable manufacturing goals AND provide rapid return on investment.
How? Our systems recover over 99% of product in transfer piping. Recovering trapped product from transfer lines that would otherwise become flush-waste, means you can increase batch yields and eliminate flush waste. From raw materials to finished product, if you can pump it, we can pig it.
2. Steven Rostedt
Software Engineer
■ One of the original authors of the PREEMPT_RT patch set
■ Creator and maintainer of ftrace
■ Creator of “make localmodconfig”
■ Creator and maintainer of the “ktest.pl” Linux testing framework
7. Tracing kernel latency on your application
■ The kernel is a black box
● Your application is at the whims of the kernel scheduler
● Interrupts can cause delays in your application
● Kernel lock contention could add to the latency
■ Tracing can give you insight into the happenings of the kernel
● Monitor the scheduling decisions the kernel is making
● Record when interrupts are happening and for how long
● See how long kernel locks are held
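The events listed above can also be enabled programmatically. As one illustration, here is a minimal sketch using libtracefs, the companion library to libtracecmd (this assumes libtracefs 1.x is installed; build with `pkg-config --cflags --libs libtracefs`, and check the tracefs_event_enable(3) man page for exact signatures):

```c
#include <stdio.h>
#include <tracefs.h>	/* libtracefs; build with: pkg-config --cflags --libs libtracefs */

/* Sketch: create a private tracing instance and enable the kinds of
 * events described above: scheduling decisions and interrupts.
 * Requires root (or suitable tracefs permissions).
 */
int main(void)
{
	struct tracefs_instance *inst = tracefs_instance_create("latency-debug");

	if (!inst) {
		fprintf(stderr, "tracefs unavailable (need root?)\n");
		return 1;
	}
	/* Monitor the scheduling decisions the kernel is making */
	tracefs_event_enable(inst, "sched", "sched_switch");
	tracefs_event_enable(inst, "sched", "sched_wakeup");
	/* Record when interrupts are happening */
	tracefs_event_enable(inst, "irq", NULL);	/* NULL = all irq events */
	tracefs_trace_on(inst);
	/* ... run the workload under test, then: */
	tracefs_trace_off(inst);
	tracefs_instance_destroy(inst);
	return 0;
}
```

The same effect can be had from the command line with `trace-cmd record -e sched -e irq`, which is what the talk itself uses.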
8. Using cyclictest
■ cyclictest tests the latency of the system
● Calls nanosleep() and sees when it woke up to when it expected to wake up
9. Using cyclictest
■ cyclictest tests the latency of the system
● Calls nanosleep() and sees when it woke up to when it expected to wake up
■ The Real Time Linux developers use this to test the jitter of the system
● Loads are run while cyclictest is measuring the latency
● Requires the latency to stay below a threshold
● May run for weeks or months
10. Using cyclictest
■ cyclictest tests the latency of the system
● Calls nanosleep() and sees when it woke up to when it expected to wake up
■ The Real Time Linux developers use this to test the jitter of the system
● Loads are run while cyclictest is measuring the latency
● Requires the latency to stay below a threshold
● May run for weeks or months
■ I will use cyclictest as an example for this talk
11. Using cyclictest
■ cyclictest tests the latency of the system
● Calls nanosleep() and sees when it woke up to when it expected to wake up
■ The Real Time Linux developers use this to test the jitter of the system
● Loads are run while cyclictest is measuring the latency
● Requires the latency to stay below a threshold
● May run for weeks or months
■ I will use cyclictest as an example for this talk
● But this works for any application
15. cyclictest and tracing
■ Can break the trace when a latency is greater than a given threshold
■ Will write into the kernel tracing buffer
16. cyclictest and tracing
■ Can break the trace when a latency is greater than a given threshold
■ Will write into the kernel tracing buffer
-b USEC --breaktrace=USEC send “breaktrace” command when latency > USEC
--tracemark write a “tracemark” when -b latency is exceeded
36. libtracecmd to read the trace.dat file
■ Automate the latency process
■ First search from the end of the trace
● Find the marker or flag that tells you where the problem happened
37. libtracecmd to read the trace.dat file
■ Automate the latency process
■ First search from the end of the trace
● Find the marker or flag that tells you where the problem happened
■ Continue backwards looking for other events
■ Search forward to collect timings
38. int main (int argc, char **argv)
{
struct tracecmd_input *handle;
struct tep_handle *tep;
struct data data;
cpu_set_t *cpu_set;
size_t cpu_size;
int cpus;
int ret;
handle = tracecmd_open(argv[1], 0);
tep = tracecmd_get_tep(handle);
init_data(tep, &data);
tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false);
printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat);
/* Now we know what CPU it is on, look at just this CPU */
cpus = tep_get_cpus(tep);
cpu_size = CPU_ALLOC_SIZE(cpus);
cpu_set = CPU_ALLOC(cpus);
CPU_ZERO_S(cpu_size, cpu_set);
CPU_SET_S(data.cpu, cpu_size, cpu_set);
/* Find where cyclictest was scheduled in */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_sched, &data, true);
/* Find where cyclictest timer went off */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_timer_expire, &data, true);
41. static int find_trace_marker(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct tep_event *marker;
static struct trace_seq seq;
struct data *d = data;
const char *str;
int thresh;
int type;
int lat;
marker = d->marker;
type = tep_data_type(tep, record);
if (type != marker->id)
return 0;
/* Make sure that the print has the data we want */
if (!seq.buffer)
trace_seq_init(&seq);
tep_print_event(tep, &seq, record, "%s", TEP_PRINT_INFO);
trace_seq_terminate(&seq);
42. static int find_trace_marker(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct tep_event *marker;
static struct trace_seq seq;
struct data *d = data;
const char *str;
int thresh;
int type;
int lat;
marker = d->marker;
type = tep_data_type(tep, record);
if (type != marker->id)
return 0;
/* Make sure that the print has the data we want */
if (!seq.buffer)
trace_seq_init(&seq);
tep_print_event(tep, &seq, record, "%s", TEP_PRINT_INFO);
trace_seq_terminate(&seq);
Skip non-marker events
43. str = strstr(seq.buffer, "hit latency threshold");
if (!str) {
/* This is not the string you are looking for */
trace_seq_reset(&seq);
return 0;
}
sscanf(str, "hit latency threshold (%d > %d)", &lat, &thresh);
d->cpu = record->cpu;
d->lat = lat;
d->thresh = thresh;
d->pid = tep_data_pid(tep, record);
d->marker_time = record->ts;
trace_seq_destroy(&seq);
seq.buffer = NULL;
/* Stop the iterator, as we will now only search the current CPU */
return 1;
}
Parse the tracemarker
44. str = strstr(seq.buffer, "hit latency threshold");
if (!str) {
/* This is not the string you are looking for */
trace_seq_reset(&seq);
return 0;
}
sscanf(str, "hit latency threshold (%d > %d)", &lat, &thresh);
d->cpu = record->cpu;
d->lat = lat;
d->thresh = thresh;
d->pid = tep_data_pid(tep, record);
d->marker_time = record->ts;
trace_seq_destroy(&seq);
seq.buffer = NULL;
/* Stop the iterator, as we will now only search the current CPU */
return 1;
}
Record which CPU
45. int main (int argc, char **argv)
{
struct tracecmd_input *handle;
struct tep_handle *tep;
struct data data;
cpu_set_t *cpu_set;
size_t cpu_size;
int cpus;
int ret;
handle = tracecmd_open(argv[1], 0);
tep = tracecmd_get_tep(handle);
init_data(tep, &data);
tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false);
printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat);
/* Now we know what CPU it is on, look at just this CPU */
cpus = tep_get_cpus(tep);
cpu_size = CPU_ALLOC_SIZE(cpus);
cpu_set = CPU_ALLOC(cpus);
CPU_ZERO_S(cpu_size, cpu_set);
CPU_SET_S(data.cpu, cpu_size, cpu_set);
/* Find where cyclictest was scheduled in */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_sched, &data, true);
/* Find where cyclictest timer went off */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_timer_expire, &data, true);
46. int main (int argc, char **argv)
{
struct tracecmd_input *handle;
struct tep_handle *tep;
struct data data;
cpu_set_t *cpu_set;
size_t cpu_size;
int cpus;
int ret;
handle = tracecmd_open(argv[1], 0);
tep = tracecmd_get_tep(handle);
init_data(tep, &data);
tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false);
printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat);
/* Now we know what CPU it is on, look at just this CPU */
cpus = tep_get_cpus(tep);
cpu_size = CPU_ALLOC_SIZE(cpus);
cpu_set = CPU_ALLOC(cpus);
CPU_ZERO_S(cpu_size, cpu_set);
CPU_SET_S(data.cpu, cpu_size, cpu_set);
/* Find where cyclictest was scheduled in */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_sched, &data, true);
/* Find where cyclictest timer went off */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_timer_expire, &data, true);
Only follow 1 CPU
47. int main (int argc, char **argv)
{
struct tracecmd_input *handle;
struct tep_handle *tep;
struct data data;
cpu_set_t *cpu_set;
size_t cpu_size;
int cpus;
int ret;
handle = tracecmd_open(argv[1], 0);
tep = tracecmd_get_tep(handle);
init_data(tep, &data);
tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false);
printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat);
/* Now we know what CPU it is on, look at just this CPU */
cpus = tep_get_cpus(tep);
cpu_size = CPU_ALLOC_SIZE(cpus);
cpu_set = CPU_ALLOC(cpus);
CPU_ZERO_S(cpu_size, cpu_set);
CPU_SET_S(data.cpu, cpu_size, cpu_set);
/* Find where cyclictest was scheduled in */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_sched, &data, true);
/* Find where cyclictest timer went off */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_timer_expire, &data, true);
Look for sched_switch
48. static int find_sched(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct tep_event *sched;
unsigned long long val;
struct data *d = data;
int type;
sched = d->sched;
type = tep_data_type(tep, record);
if (type != sched->id)
return 0;
tep_read_number_field(d->next_pid, record->data, &val);
if (val != d->pid)
return 0;
d->sched_time = record->ts;
return -1;
}
49. static int find_sched(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct tep_event *sched;
unsigned long long val;
struct data *d = data;
int type;
sched = d->sched;
type = tep_data_type(tep, record);
if (type != sched->id)
return 0;
tep_read_number_field(d->next_pid, record->data, &val);
if (val != d->pid)
return 0;
d->sched_time = record->ts;
return -1;
}
Exit if not sched_switch
50. static int find_sched(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct tep_event *sched;
unsigned long long val;
struct data *d = data;
int type;
sched = d->sched;
type = tep_data_type(tep, record);
if (type != sched->id)
return 0;
tep_read_number_field(d->next_pid, record->data, &val);
if (val != d->pid)
return 0;
d->sched_time = record->ts;
return -1;
}
Record sched_switch time
51. int main (int argc, char **argv)
{
struct tracecmd_input *handle;
struct tep_handle *tep;
struct data data;
cpu_set_t *cpu_set;
size_t cpu_size;
int cpus;
int ret;
handle = tracecmd_open(argv[1], 0);
tep = tracecmd_get_tep(handle);
init_data(tep, &data);
tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false);
printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat);
/* Now we know what CPU it is on, look at just this CPU */
cpus = tep_get_cpus(tep);
cpu_size = CPU_ALLOC_SIZE(cpus);
cpu_set = CPU_ALLOC(cpus);
CPU_ZERO_S(cpu_size, cpu_set);
CPU_SET_S(data.cpu, cpu_size, cpu_set);
/* Find where cyclictest was scheduled in */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_sched, &data, true);
/* Find where cyclictest was woken up */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_wakeup, &data, true);
Look for sched_waking
52. static int find_wakeup(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct tep_format_field *waking_pid;
struct tep_event *waking;
unsigned long long val;
struct data *d = data;
int flags;
int type;
waking = d->waking;
waking_pid = d->waking_pid;
type = tep_data_type(tep, record);
if (type != waking->id)
return 0;
tep_read_number_field(waking_pid, record->data, &val);
if (val != d->pid)
return 0;
/* Found the wakeup! */
d->wakeup_time = record->ts;
return -1;
}
Record wake up time
53. int main (int argc, char **argv)
{
[..]
/* Find where cyclictest timer went off */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_timer_expire, &data, true);
/* Find where the timer was added (the start of this cycle) */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_timer_start, &data, true);
printf("expected expire: %lld\n", data.timer_expires_expect);
printf("timer expired: %lld\n", data.timer_expire);
printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
printf("wake up time: %lld\n", data.wakeup_time);
printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
printf("scheduled time: %lld\n", data.sched_time);
printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
printf("marker time: %lld\n", data.marker_time);
printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time);
tracecmd_close(handle);
return 0;
}
Look for timer expire event
54. static int find_timer_expire(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct tep_event *timer;
unsigned long long val;
struct data *d = data;
int type;
timer = d->hrtimer_expire;
type = tep_data_type(tep, record);
if (type != timer->id)
return 0;
tep_read_number_field(d->hrtimer_expire_ptr, record->data, &val);
d->hrtimer = val;
tep_read_number_field(d->timer_now, record->data, &val);
d->timer_expire = record->ts;
d->timer_delta = val - record->ts;
return -1;
}
55. static int find_timer_expire(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct tep_event *timer;
unsigned long long val;
struct data *d = data;
int type;
timer = d->hrtimer_expire;
type = tep_data_type(tep, record);
if (type != timer->id)
return 0;
tep_read_number_field(d->hrtimer_expire_ptr, record->data, &val);
d->hrtimer = val;
tep_read_number_field(d->timer_now, record->data, &val);
d->timer_expire = record->ts;
d->timer_delta = val - record->ts;
return -1;
}
Calculate timer event
time to ring buffer time
56. int main (int argc, char **argv)
{
[..]
/* Find where cyclictest was woken up */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_wakeup, &data, true);
/* Find where the timer was added (the start of this cycle) */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_timer_start, &data, true);
printf("expected expire: %lld\n", data.timer_expires_expect);
printf("timer expired: %lld\n", data.timer_expire);
printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
printf("wake up time: %lld\n", data.wakeup_time);
printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
printf("scheduled time: %lld\n", data.sched_time);
printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
printf("marker time: %lld\n", data.marker_time);
printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time);
tracecmd_close(handle);
return 0;
}
Look for timer start event
57. static int find_timer_start(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct tep_event *timer;
unsigned long long val;
struct data *d = data;
int type;
timer = d->hrtimer_start;
type = tep_data_type(tep, record);
if (type != timer->id)
return 0;
tep_read_number_field(d->hrtimer_start_ptr, record->data, &val);
if (val != d->hrtimer)
return 0;
tep_read_number_field(d->hrtimer_start_expires, record->data, &val);
d->timer_expires_expect = val - d->timer_delta;
d->timer_start = record->ts;
return -1;
}
58. static int find_timer_start(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct tep_event *timer;
unsigned long long val;
struct data *d = data;
int type;
timer = d->hrtimer_start;
type = tep_data_type(tep, record);
if (type != timer->id)
return 0;
tep_read_number_field(d->hrtimer_start_ptr, record->data, &val);
if (val != d->hrtimer)
return 0;
tep_read_number_field(d->hrtimer_start_expires, record->data, &val);
d->timer_expires_expect = val - d->timer_delta;
d->timer_start = record->ts;
return -1;
}
Calculate expected time
vs actual time
59. int main (int argc, char **argv)
{
[..]
/* Find where cyclictest was woken up */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_wakeup, &data, true);
/* Find where the timer was added (the start of this cycle) */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_timer_start, &data, true);
printf("expected expire: %lld\n", data.timer_expires_expect);
printf("timer expired: %lld\n", data.timer_expire);
printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
printf("wake up time: %lld\n", data.wakeup_time);
printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
printf("scheduled time: %lld\n", data.sched_time);
printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
printf("marker time: %lld\n", data.marker_time);
printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time);
tracecmd_close(handle);
return 0;
}
60. Now analyze the trace.dat file
~# trace-cyclic trace-cyclic.dat
cpu=1 threshold=100 latency=121
expected expire: 818035615415764
timer expired: 818035615416617
jitter: (853)
wake up time: 818035615416993
from timer: (376)
scheduled time: 818035615535531
from wakeup: (118538)
marker time: 818035615546097
from schedule:(10566)
Time in nanoseconds
61. Now analyze the trace.dat file
~# trace-cyclic trace-cyclic.dat
cpu=1 threshold=100 latency=121
expected expire: 818035615415764
timer expired: 818035615416617
jitter: (853)
wake up time: 818035615416993
from timer: (376)
scheduled time: 818035615535531
from wakeup: (118538)
marker time: 818035615546097
from schedule:(10566)
Problem here!
62. Can analyze any events you want
■ This is just a small example of what you can do
63. Can analyze any events you want
■ This is just a small example of what you can do
■ Let’s look at locks
64. Can analyze any events you want
■ This is just a small example of what you can do
■ Let’s look at locks
■ Using function tracing, trace all locking functions
● function tracing gives you the parent function too
● Can see where a lock was called
65. Can analyze any events you want
■ This is just a small example of what you can do
■ Let’s look at locks
■ Using function tracing, trace all locking functions
● function tracing gives you the parent function too
● Can see where a lock was called
■ Trace both the lock and unlock functions
● Can get the latency of how long they are held
67. int main (int argc, char **argv)
{
[..]
/* Find where cyclictest was woken up */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_wakeup, &data, true);
/* Find where the timer was added (the start of this cycle) */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_timer_start, &data, true);
/* Now that we know all the times, look for the locks (going forward now) */
tracecmd_iterate_events(handle, cpu_set, cpu_size,
find_locks, &data);
printf("expected expire: %lld\n", data.timer_expires_expect);
printf("timer expired: %lld\n", data.timer_expire);
printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
printf("wake up time: %lld\n", data.wakeup_time);
printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
printf("scheduled time: %lld\n", data.sched_time);
printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
printf("marker time: %lld\n", data.marker_time);
printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time);
68. int main (int argc, char **argv)
{
[..]
/* Find where cyclictest was woken up */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_wakeup, &data, true);
/* Find where the timer was added (the start of this cycle) */
tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
find_timer_start, &data, true);
/* Now that we know all the times, look for the locks (going forward now) */
tracecmd_iterate_events(handle, cpu_set, cpu_size,
find_locks, &data);
printf("expected expire: %lld\n", data.timer_expires_expect);
printf("timer expired: %lld\n", data.timer_expire);
printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
printf("wake up time: %lld\n", data.wakeup_time);
printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
printf("scheduled time: %lld\n", data.sched_time);
printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
printf("marker time: %lld\n", data.marker_time);
printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time);
Go forward and find all locks
69. static int find_locks(struct tracecmd_input *handle, struct tep_record *record,
int cpu, void *data)
{
struct tep_handle *tep = tracecmd_get_tep(handle);
struct data *d = data;
enum data_state state;
if (record->ts == d->marker_time)
return -1;
if (record->ts < d->timer_expires_expect)
state = STATE_START;
else if (record->ts < d->timer_expire)
state = STATE_EXPECT;
else if (record->ts < d->wakeup_time)
state = STATE_EXPIRED;
else if (record->ts < d->sched_time)
state = STATE_WAKEUP;
else
state = STATE_SCHED;
handle_locks(tep, record, data, state);
return 0;
}
Use saved times to know where
the lock happened
70. static int handle_locks(struct tep_handle *tep, struct tep_record *record,
struct data *data, enum data_state state)
{
struct lock_stack *lock;
struct lock_list *llist;
unsigned long long pip, ip;
const char *func;
bool start = true;
int type;
int id;
switch (state) {
case STATE_START:
llist = &data->locks_start;
break;
case STATE_EXPECT:
llist = &data->locks_start_expire;
break;
case STATE_EXPIRED:
llist = &data->locks_timer;
break;
case STATE_WAKEUP:
llist = &data->locks_wakeup;
break;
case STATE_SCHED:
llist = &data->locks_sched;
break;
}
type = tep_data_type(tep, record);
if (!data->function || data->function->id != type)
return 0;
Save lock information
with associated state
71. tep_read_number_field(data->function_ip, record->data, &ip);
tep_read_number_field(data->function_pip, record->data, &pip);
func = tep_find_function(tep, ip);
id = find_start_lock(func);
if (id < 0) {
id = find_stop_lock(func);
if (id < 0)
return 1;
start = false;
}
if (start) {
push_lock(data, id, pip, record->ts, func);
return 1;
}
/* Pop the locks until we find our id */
lock = NULL;
do {
if (lock)
free(lock);
lock = pop_stack(data);
} while (lock && lock->id != id);
if (!lock)
return 1;
/* Add the total time of this lock */
end_timer(&llist->list, pip, record->ts, lock->name);
start_timer(&llist->list, pip, lock->time, NULL);
free(lock);
return 1;
}
Calculate how long lock was held
72. static const char *locks[] = {
"spin_", "raw_spin_", "_raw_spin_", "__raw_spin_",
"read_", "raw_read_", "_raw_read_", "__raw_read_",
"write_", "raw_write_", "_raw_write_", "__raw_write_",
};
static int find_start_lock(const char *func)
{
const char *lock;
int len;
int i;
for (i = 0; i < ARRAY_SIZE(locks); i++) {
lock = locks[i];
len = strlen(lock);
if (strncmp(lock, func, len) != 0)
continue;
if (strncmp(func + len, "lock", 4) == 0 ||
strncmp(func + len, "try", 3) == 0)
return i;
break;
}
return -1;
}
All known lock types
73. static const char *locks[] = {
"spin_", "raw_spin_", "_raw_spin_", "__raw_spin_",
"read_", "raw_read_", "_raw_read_", "__raw_read_",
"write_", "raw_write_", "_raw_write_", "__raw_write_",
};
static int find_start_lock(const char *func)
{
const char *lock;
int len;
int i;
for (i = 0; i < ARRAY_SIZE(locks); i++) {
lock = locks[i];
len = strlen(lock);
if (strncmp(lock, func, len) != 0)
continue;
if (strncmp(func + len, "lock", 4) == 0 ||
strncmp(func + len, "try", 3) == 0)
return i;
break;
}
return -1;
}
See if it is a lock (not unlock)
74. int main (int argc, char **argv)
{
[..]
tracecmd_iterate_events(handle, cpu_set, cpu_size,
find_locks, &data);
printf("expected expire: %lld\n", data.timer_expires_expect);
printf("timer expired: %lld\n", data.timer_expire);
printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
printf("wake up time: %lld\n", data.wakeup_time);
printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
printf("scheduled time: %lld\n", data.sched_time);
printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
printf("marker time: %lld\n", data.marker_time);
printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time);
print_lock_list(tep, "Locks taken and released from start to expected time:",
&data.locks_start);
print_lock_list(tep, "Locks held between expected time and timer:",
&data.locks_start_expire);
print_lock_list(tep, "Locks taken from timer to wakeup:", &data.locks_timer);
print_lock_list(tep, "Locks taken from wake up to sched:", &data.locks_wakeup);
print_lock_list(tep, "Locks taken from sched to print:", &data.locks_sched);
76. Analyze the locks in the trace
~# trace-cyclic trace-cyclic.dat
cpu=4 threshold=100 latency=102
expected expire: 850947083216603
timer expired: 850947083217641
jitter: (1038)
wake up time: 850947083217999
from timer: (358)
scheduled time: 850947083316970
from wakeup: (98971)
marker time: 850947083328710
from schedule:(11740)
Locks taken and released from start to expected time:
do_nanosleep+0x5f total time: 610 (1 time) [_raw_spin_lock_irqsave]
dequeue_task_rt+0x28 total time: 180 (1 time) [_raw_spin_lock]
finish_task_switch.isra.0+0x9b total time: 1450 (2 times) [_raw_spin_lock]
get_next_timer_interrupt+0x7b total time: 155 (7 times) [_raw_spin_lock]
hrtimer_get_next_event+0x47 total time: 140 (7 times) [_raw_spin_lock_irqsave]
hrtimer_next_event_without+0x67 total time: 161 (7 times) [_raw_spin_lock_irqsave]
sched_ttwu_pending+0xed total time: 1522 (1 time) [_raw_spin_lock]
poll_freewait+0x3d total time: 182 (7 times) [_raw_spin_lock_irqsave]
[..]
77. Analyze the locks in the trace
~# trace-cyclic trace-cyclic.dat
cpu=4 threshold=100 latency=102
expected expire: 850947083216603
timer expired: 850947083217641
jitter: (1038)
wake up time: 850947083217999
from timer: (358)
scheduled time: 850947083316970
from wakeup: (98971)
marker time: 850947083328710
from schedule:(11740)
Locks taken and released from start to expected time:
do_nanosleep+0x5f total time: 610 (1 time) [_raw_spin_lock_irqsave]
dequeue_task_rt+0x28 total time: 180 (1 time) [_raw_spin_lock]
finish_task_switch.isra.0+0x9b total time: 1450 (2 times) [_raw_spin_lock]
get_next_timer_interrupt+0x7b total time: 155 (7 times) [_raw_spin_lock]
hrtimer_get_next_event+0x47 total time: 140 (7 times) [_raw_spin_lock_irqsave]
hrtimer_next_event_without+0x67 total time: 161 (7 times) [_raw_spin_lock_irqsave]
sched_ttwu_pending+0xed total time: 1522 (1 time) [_raw_spin_lock]
poll_freewait+0x3d total time: 182 (7 times) [_raw_spin_lock_irqsave]
[..]
Biggest difference
78. Analyze the locks in the trace
sigprocmask+0x85 total time: 199 (4 times) [_raw_spin_lock_irq]
n_tty_read+0x217 total time: 198 (2 times) [_raw_spin_lock_irqsave]
n_tty_read+0x5d9 total time: 233 (2 times) [_raw_spin_lock_irqsave]
try_to_wake_up+0x77 total time: 183 (1 time) [_raw_spin_lock_irqsave]
__wake_up_common_lock+0x7e total time: 415 (2 times) [_raw_spin_lock_irqsave]
tcp_poll+0x2b total time: 178 (1 time) [_raw_spin_lock_irqsave]
tcp_sendmsg+0x19 total time: 231 (1 time) [_raw_spin_lock_bh]
nf_conntrack_tcp_packet+0x8e4 total time: 447 (1 time) [_raw_spin_lock_bh]
sch_direct_xmit+0x43 total time: 168 (1 time) [_raw_spin_lock]
Locks held between expected time and timer:
__hrtimer_run_queues+0x120 total time: 456 (1 time) [_raw_spin_lock_irqsave]
Locks taken from wake up to sched:
enqueue_task_rt+0x1e8 total time: 169 (1 time) [_raw_spin_lock]
try_to_wake_up+0x246 total time: 1674 (1 time) [_raw_spin_lock]
try_to_wake_up+0x251 total time: 2346 (1 time) [_raw_spin_lock_irqsave]
hrtimer_interrupt+0x11d total time: 222 (1 time) [_raw_spin_lock_irq]
sch_direct_xmit+0x13a total time: 168948 (1 time) [_raw_spin_lock]
__dev_queue_xmit+0x7fe total time: 279 (1 time) [_raw_spin_lock]
rcu_note_context_switch+0x386 total time: 237 (1 time) [_raw_spin_lock]
Locks taken from sched to print:
finish_task_switch.isra.0+0x9b total time: 2056 (1 time) [_raw_spin_lock]
79. Analyze the locks in the trace
sigprocmask+0x85 total time: 199 (4 times) [_raw_spin_lock_irq]
n_tty_read+0x217 total time: 198 (2 times) [_raw_spin_lock_irqsave]
n_tty_read+0x5d9 total time: 233 (2 times) [_raw_spin_lock_irqsave]
try_to_wake_up+0x77 total time: 183 (1 time) [_raw_spin_lock_irqsave]
__wake_up_common_lock+0x7e total time: 415 (2 times) [_raw_spin_lock_irqsave]
tcp_poll+0x2b total time: 178 (1 time) [_raw_spin_lock_irqsave]
tcp_sendmsg+0x19 total time: 231 (1 time) [_raw_spin_lock_bh]
nf_conntrack_tcp_packet+0x8e4 total time: 447 (1 time) [_raw_spin_lock_bh]
sch_direct_xmit+0x43 total time: 168 (1 time) [_raw_spin_lock]
Locks held between expected time and timer:
__hrtimer_run_queues+0x120 total time: 456 (1 time) [_raw_spin_lock_irqsave]
Locks taken from wake up to sched:
enqueue_task_rt+0x1e8 total time: 169 (1 time) [_raw_spin_lock]
try_to_wake_up+0x246 total time: 1674 (1 time) [_raw_spin_lock]
try_to_wake_up+0x251 total time: 2346 (1 time) [_raw_spin_lock_irqsave]
hrtimer_interrupt+0x11d total time: 222 (1 time) [_raw_spin_lock_irq]
sch_direct_xmit+0x13a total time: 168948 (1 time) [_raw_spin_lock]
__dev_queue_xmit+0x7fe total time: 279 (1 time) [_raw_spin_lock]
rcu_note_context_switch+0x386 total time: 237 (1 time) [_raw_spin_lock]
Locks taken from sched to print:
finish_task_switch.isra.0+0x9b total time: 2056 (1 time) [_raw_spin_lock]
Long lock held!