Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud computing introduces new challenges for performance
analysis, for both customers and operators of the cloud. Beyond
monitoring a scaling environment, issues within a system can be
complicated by tenants competing for the same resources while
remaining invisible to each other. Other factors include rapidly changing
production code and wildly unpredictable traffic surges. For
performance analysis in the Joyent public cloud, we use a variety of
tools including Dynamic Tracing, which allows us to create custom
tools and metrics and to explore new concepts. In this presentation
I'll discuss a collection of these tools and the metrics that they
measure. While these are DTrace-based, the focus of the talk is on
which metrics are proving useful for analyzing real cloud issues.
Video: https://youtu.be/eO94l0aGLCA?t=3m37s . Talk by Brendan Gregg for ACM Applicative 2016
"System Methodology - Holistic Performance Analysis on Modern Systems
Traditional systems performance engineering makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. For modern systems, we can choose the metrics, and can choose ones we need to support new holistic performance analysis methodologies. These methodologies provide faster, more accurate, and more complete analysis, and can provide a starting point for unfamiliar systems.
Methodologies are especially helpful for modern applications and their workloads, which can pose extremely complex problems with no obvious starting point. There are also continuous deployment environments such as the Netflix cloud, where these problems must be solved in shorter time frames. Fortunately, with advances in system observability and tracers, we have virtually endless custom metrics to aid performance analysis. The problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
System methodologies provide a starting point for analysis, as well as guidance for quickly moving through the metrics to root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. The focus is on single systems (any operating system), including single cloud instances, and quickly locating performance issues or exonerating the system. Many methodologies will be discussed, along with recommendations for their implementation, which may be as documented checklists of tools, or custom dashboards of supporting metrics. In general, you will learn to think differently about your systems, and how to ask better questions."
Tracing Summit 2014, Düsseldorf. What can Linux learn from DTrace: what went well, and what didn't go well, on its path to success? This talk will discuss not just the DTrace software, but lessons from the marketing and adoption of a system tracer, and an inside look at how DTrace was really deployed and used in production environments. It will also cover ongoing problems with DTrace, and how Linux may surpass them and continue to advance the field of system tracing. A world expert and core contributor to DTrace, Brendan now works at Netflix on Linux performance with the various Linux tracers (ftrace, perf_events, eBPF, SystemTap, ktap, sysdig, LTTng, and the DTrace Linux ports), and will summarize his experiences and suggestions for improvements. He has also been contributing to various tracers: recently promoting ftrace and perf_events adoption through articles and front-end scripts, and testing eBPF.
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg. "
At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such as SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
Talk from SREcon2016 by Brendan Gregg. Video: https://www.usenix.org/conference/srecon16/program/presentation/gregg . "There's limited time for performance analysis in the emergency room. When there is a performance-related site outage, the SRE team must analyze and solve complex performance issues as quickly as possible, and under pressure. Many performance tools and techniques are designed for a different environment: an engineer analyzing their system over the course of hours or days, and given time to try dozens of tools: profilers, tracers, monitoring tools, benchmarks, as well as different tunings and configurations. But when Netflix is down, minutes matter, and there's little time for such traditional systems analysis. As with aviation emergencies, short checklists and quick procedures can be applied by the on-call SRE staff to help solve performance issues as quickly as possible.
In this talk, I'll cover a checklist for Linux performance analysis in 60 seconds, as well as other methodology-derived checklists and procedures for cloud computing, with examples of performance issues for context. Whether you are solving crises in the SRE war room, or just have limited time for performance engineering, these checklists and approaches should help you find some quick performance wins. Safe flying."
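The 60-second checklist referred to above was later published on the Netflix technology blog; a sketch of it follows (assumes a Linux host with the sysstat package installed for mpstat, pidstat, and sar):

```shell
# Linux performance analysis in 60 seconds -- a sketch of the checklist
# described above; run the commands in order.
uptime                  # load averages: is load rising or falling?
dmesg | tail            # recent kernel errors: OOM kills, TCP drops
vmstat 1 5              # run queue length, memory, swap, system-wide CPU
mpstat -P ALL 1 5       # per-CPU balance: is a single CPU hot?
pidstat 1 5             # per-process CPU usage over time
iostat -xz 1 5          # disk I/O: IOPS, throughput, await, %util
free -m                 # memory usage, including the buffer/page cache
sar -n DEV 1 5          # network device throughput: rx/tx kB/s
sar -n TCP,ETCP 1 5     # TCP connections and retransmits
top -b -n 1 | head -20  # overall check in batch mode, catch anything missed
```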
Java Performance Analysis on Linux with Flame Graphs
This document discusses using Linux perf_events (perf) profiling tools to analyze Java performance on Linux. It describes how perf can provide complete visibility into Java, JVM, GC and system code but that Java profilers have limitations. It presents the solution of using perf to collect mixed-mode flame graphs that include Java method names and symbols. It also discusses fixing issues with broken Java stacks and missing symbols on x86 architectures in perf profiles.
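A sketch of that mixed-mode workflow follows; the application jar, paths, and 30-second duration are illustrative, and it assumes Linux perf, perf-map-agent, and the FlameGraph scripts are installed:

```shell
# Mixed-mode CPU flame graph for Java, as a sketch.
# -XX:+PreserveFramePointer lets perf walk Java stack frames (JDK 8u60+).
java -XX:+PreserveFramePointer -jar app.jar &        # app.jar is illustrative

perf record -F 99 -a -g -- sleep 30                  # sample all CPUs at 99 Hz for 30s
# write /tmp/perf-<PID>.map so perf can resolve JIT-compiled Java methods
perf-map-agent/bin/create-java-perf-map.sh "$(pgrep -f app.jar)"
perf script | FlameGraph/stackcollapse-perf.pl | \
    FlameGraph/flamegraph.pl --color=java > mixed-mode.svg
```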
This document summarizes a presentation on flame graphs for profiling CPU and memory performance on FreeBSD. It introduces flame graphs as a way to visualize stack profiles to easily compare performance across systems. Examples are given profiling MySQL workload CPU usage on two hosts to identify a 30% performance difference. Commands are provided to generate flame graphs from DTrace profiles of CPU stack sampling and page faults.
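The DTrace recipe is roughly as follows (a sketch; the "mysqld" process name and 60-second duration are illustrative, and the FlameGraph scripts are assumed to be in the current directory):

```shell
# CPU flame graph via DTrace user-level stack sampling (run as root).
dtrace -x ustackframes=100 \
    -n 'profile-99 /execname == "mysqld"/ { @[ustack()] = count(); }
        tick-60s { exit(0); }' \
    -o out.stacks
./FlameGraph/stackcollapse.pl out.stacks | \
    ./FlameGraph/flamegraph.pl > mysql-cpu.svg
```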
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg.
There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well as observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes.
This tutorial updates and extends an earlier talk that summarized the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real world problems — important context that is typically not included in the standard documentation.
My talk for BayLISA, Oct 2013, launching the Systems Performance book. Operating system performance analysis and tuning leads to a better end-user experience and lower costs, especially for cloud computing environments that pay by the operating system instance. This book covers concepts, strategy, tools and tuning for Unix operating systems, with a focus on Linux- and Solaris-based systems. The book covers the latest tools and techniques, including static and dynamic tracing, to get the most out of your systems.
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your AMI is one of the core foundations for running applications and services effectively on Amazon EC2. In this session, you'll learn how to optimize your AMI, including how you can measure and diagnose system performance and tune parameters for improved CPU and network performance. We'll cover application-specific examples from Netflix on how optimized AMIs can lead to improved performance.
This document provides a performance engineer's predictions for computing performance trends in 2021 and beyond. The engineer discusses trends in processors, memory, disks, networking, runtimes, kernels, hypervisors, and observability. For processors, predictions include multi-socket systems becoming less common, the future of simultaneous multithreading being unclear, practical core count limits being reached in the 2030s, and more processor vendors including ARM-based and RISC-V options. Memory predictions focus on many workloads being memory-bound currently.
Introduction to DTrace (Dynamic Tracing), written by Brendan Gregg and delivered in 2007. While aimed at a Solaris-based audience, this introduction is still largely relevant today (2012). Since then, DTrace has appeared in other operating systems (Mac OS X, FreeBSD, and is being ported to Linux), and many user-level providers have been developed to aid tracing of other languages.
Linux Performance Analysis: New Tools and Old Secrets
Talk for USENIX/LISA 2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many tools, from high level to low level, to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit and which are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these solve issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
MeetBSDCA 2014 Performance Analysis for BSD, by Brendan Gregg. A tour of five relevant topics: observability tools, methodologies, benchmarking, profiling, and tracing. Tools summarized include pmcstat and DTrace.
EuroBSDcon 2017 System Performance Analysis Methodologies
keynote by Brendan Gregg. "Traditional performance monitoring makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. Modern BSD has advanced tracers and PMC tools, providing virtually endless metrics to aid performance analysis. It's time we really used them, but the problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
There's a new way to approach performance analysis that can guide you through the metrics. Instead of starting with traditional metrics and figuring out their use, you start with the questions you want answered then look for metrics to answer them. Methodologies can provide these questions, as well as a starting point for analysis and guidance for locating the root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, chain graphs, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. Many methodologies will be discussed, from the production proven to the cutting edge, along with recommendations for their implementation on BSD systems. In general, you will learn to think differently about analyzing your systems, and make better use of the modern tools that BSD provides."
The document summarizes a talk on container performance analysis. It discusses identifying bottlenecks at the host, container, and kernel level using various Linux performance tools. It then provides an overview of how containers work in Linux using namespaces and control groups (cgroups). Finally, it demonstrates some example commands like docker stats, systemd-cgtop, and bcc/BPF tools that can be used to analyze containers and cgroups from the host system.
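The host-side commands mentioned in the summary can be sketched as follows (the bcc tool path varies by distribution, and the cgroup paths shown are for cgroup v1):

```shell
# Container resource usage from the host, as a sketch.
docker stats --no-stream                  # per-container CPU, memory, net, block I/O
systemd-cgtop                             # top-like view of per-cgroup resource usage
# raw cgroup v1 memory counters, one file per container:
cat /sys/fs/cgroup/memory/docker/*/memory.usage_in_bytes
sudo /usr/share/bcc/tools/runqlat 5 1     # bcc/BPF: run-queue (scheduler) latency histogram
```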
Keynote for PerconaLive 2018 by Brendan Gregg. Video: https://youtu.be/sV3XfrfjrPo?t=30m51s . "At over one thousand code commits per week, it's hard to keep up with Linux developments. This keynote will summarize recent Linux performance features, for a wide audience: the KPTI patches for Meltdown, eBPF for performance observability, Kyber for disk I/O scheduling, BBR for TCP congestion control, and more. This is about exposure: knowing what exists, so you can learn and use it later when needed. Get the most out of your systems, whether they are databases or application servers, with the latest Linux kernels and exciting features."
Delivered at the FISL13 conference in Brazil: http://www.youtube.com/watch?v=K9w2cipqfvc
This talk introduces the USE Method: a simple strategy for performing a complete check of system performance health, identifying common bottlenecks and errors. This methodology can be used early in a performance investigation to quickly identify the most severe system performance issues, and is a methodology the speaker has used successfully for years in both enterprise and cloud computing environments. Checklists have been developed to show how the USE Method can be applied to Solaris/illumos-based and Linux-based systems.
Many hardware and software resource types have been commonly overlooked, including memory and I/O busses, CPU interconnects, and kernel locks. Any of these can become a system bottleneck. The USE Method provides a way to find and identify these.
This approach focuses on the questions to ask of the system, before reaching for the tools. Tools that are ultimately used include all the standard performance tools (vmstat, iostat, top), and more advanced tools, including dynamic tracing (DTrace), and hardware performance counters.
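As an illustration, one pass of the USE Method over the main Linux resources might look like the following (a sketch; column names vary by tool version, and mpstat/iostat/sar require the sysstat package):

```shell
# USE Method sketch: for each resource, check Utilization, Saturation, Errors.
vmstat 1 5          # CPU utilization (us+sy); saturation ("r" > CPU count)
free -m             # memory utilization; heavy swap use hints at saturation
iostat -xz 1 5      # disk utilization (%util); saturation (request queue length)
sar -n DEV 1 5      # network interface throughput vs. line rate (utilization)
netstat -s | grep -i retrans    # network saturation: TCP retransmits
dmesg | tail        # errors: OOM kills, I/O errors, NIC resets
```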
Other performance methodologies are included for comparison: the Problem Statement Method, Workload Characterization Method, and Drill-Down Analysis Method.
Video: http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x ; This talk for SCaLE11x covers system performance analysis methodologies and the Linux tools to support them, so that you can get the most out of your systems and solve performance issues quickly. This includes a wide variety of tools, including basics like top(1), advanced tools like perf, and new tools like the DTrace for Linux prototypes.
Delivered as plenary at USENIX LISA 2013. video here: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
This document discusses the challenges of building and debugging DIRT (data-intensive real-time) applications in production. It provides examples from the mobile push-to-talk app Voxer, which is described as a canonical DIRT app. Specific issues covered include application restarts inducing latency bubbles, dropped TCP connections causing latency outliers, and identifying sources of slow disk I/O. Tools like DTrace are highlighted as being essential for instrumentation and problem diagnosis in DIRT apps.
HTTP applications concentrate many performance issues:
- They are a common way to let internal and external users access and modify data.
- They rely on a delivery chain containing many elements, each of which is a performance driver: browser, workstation, network, front-end server, application server, file server, database, images, etc.
- They raise specific troubleshooting issues: among others, end-user feedback is based on the concept of a page, while most network-based performance analysis is based on each transaction/object within the page (HTML, CSS, images, scripts, etc.).
This one-hour webinar will enable you to:
- Understand the challenges of performance troubleshooting for HTTP Applications.
- View a series of concrete diagnostic cases with the newest version of Performance Vision.
Performance analysis aims to capture, analyze, and evaluate key components of performance through systematic observation. Coaches observe to better understand technical, tactical, behavioral, and physical aspects of performance. They then provide feedback to improve future practice. Coaches use various methods like notation, video, biomechanics, tests, and questionnaires to gather both qualitative and quantitative data on performance. Technology applications and software programs help support detailed analysis.
A presentation from SEO Campixx Barcamp 2011 in Berlin. Web performance optimization is about making websites faster. Here I discuss different measures and show their impact on competitive advantage and possibly on Google rankings. Undeniably, better performance leads to more sales and better usability in terms of bounce rates. View image slides here: http://b0i.de/wpopresentation
A presentation that provides an overview of software testing approaches including "schools" of software testing and a variety of testing techniques and practices.
"DTracing the Cloud", Brendan Gregg, illumosday 2012
Cloud computing facilitates rapid deployment and scaling, often pushing high load at applications under continual development. DTrace allows immediate analysis of issues on live production systems even in these demanding environments – no need to restart or run a special debug kernel.
For the illumos kernel, DTrace has been enhanced to support cloud computing, providing more observation capabilities to zones as used by Joyent SmartMachine customers. DTrace is also frequently used by the cloud operators to analyze systems and verify performance isolation of tenants.
This talk covers DTrace in the illumos-based cloud, showing examples of real-world performance wins.
A brief talk on systems performance for the July 2013 meetup "A Midsummer Night's System", video: http://www.youtube.com/watch?v=P3SGzykDE4Q. This summarizes how systems performance has changed from the 1990s to today. This was the reason for writing a new book on systems performance: to provide a reference that is up to date, covering new tools, technologies, and methodologies.
LinuxCon Europe, 2014. Video: https://www.youtube.com/watch?v=SN7Z0eCn0VY . There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This talk summarizes the three types of performance tools: observability, benchmarking, and tuning, providing a tour of what exists and why they exist. Advanced tools including those based on tracepoints, kprobes, and uprobes are also included: perf_events, ktap, SystemTap, LTTng, and sysdig. You'll gain a good understanding of the performance tools landscape, knowing what to reach for to get the most out of your systems.
Measuring the Performance of Single Page Applications
Single page applications are a problem for RUM tools because there are no easy ways to tell when a new page component has been requested asynchronously as a result of an intentional user action. Many network requests are back-end service calls initiated periodically by the app – for example, a ping to check if content has been updated, or to check if the current user should still be signed in to their account.
Even with requests that are initiated by a user action, not all may fit into the definition of a “page view.” For example, a user typing into a search box that has auto-complete capabilities will often result in network requests, but these requests result in very small amounts of data transfer, happen very frequently, and do not count toward page views. The scene is further complicated by SPA frameworks like Angular, Backbone, and others.
In this talk, we’ll learn about some of the tricks used by boomerang to measure the performance of single page applications, going as far as capturing errors and waterfall information across browsers.
Application Performance Management - Solving the Performance Puzzle
The document outlines a methodology for Application Performance Management (APM). It discusses various components of an APM strategy including top-down monitoring, bottom-up monitoring, reporting and analytics, and aligning with ITIL processes. Top-down monitoring focuses on real-time application monitoring using techniques like synthetic transactions. Bottom-up monitoring ties into infrastructure monitoring tools. Reporting and analytics is used to analyze performance data and establish baselines. APM supports various ITIL processes like incident management, problem management and service level management.
An Introduction to Software Performance Engineering
Software performance engineering is becoming increasingly important to businesses as they look to improve the non-functional performance of applications and get more out of IT investments. By leveraging performance engineering techniques, IT professionals can be indispensable in building and optimizing scalable systems. This introductory course will teach you the essentials of software performance engineering, including:
• The performance challenges faced by Enterprise IT today
• What is software performance engineering (SPE)?
• Best practices for building scalable software systems
• The approaches to integrating SPE into IT project lifecycles
• Common frameworks for measuring application performance and service levels
• The impact of SPE on software developers, testers, capacity planners, and other IT professionals
• Case studies from the finance, retail, and insurance industries
Instructor: Walter Kuketz, SVP and CTO, Collaborative Consulting
This training is sponsored by Correlsense, Collaborative Consulting,
and New Horizons
Learn how Site24x7 gives you end-to-end application performance visibility for your Java, .NET and Ruby web transactions with metrics of all components starting from URLs to SQL queries.
The document discusses effective use of performance analysis in coaching rugby. It provides examples of how performance analysis has evolved from basic notation to advanced digital tools for game analysis, technique analysis, and player tracking. It emphasizes using permanent records of performance to stimulate athlete learning in both training and competition environments. The coach's role is to develop a strategic, periodized approach to performance analysis to support long-term athlete development.
Using dynaTrace to optimise application performance
The document discusses Nisa Retail's use of dynaTrace to improve service and cut costs. It provides an overview of Nisa Retail and Intechnica, and how dynaTrace was implemented at Nisa Retail to deliver business value. dynaTrace provided end-to-end application monitoring across all tiers, full transaction tracing, and proactive service level engineering to help optimize performance. This improved the user experience and helped Nisa Retail do more with less staff and budget.
Dynatrace is an APM solution that provides deep visibility into application performance across complex, distributed environments. It uses PurePath technology to capture timing and code-level context for all transactions end-to-end. This allows Dynatrace to identify performance issues and their root causes faster than other tools. Dynatrace can monitor Apache Tomcat servers and provide metrics on JVM performance, database queries, requests, and more. It helps diagnose common issues like inefficient database access, microservice problems, and coding issues.
Video: https://www.youtube.com/watch?v=uibLwoVKjec . Talk by Brendan Gregg for Sysdig CCWFS 2016. Abstract:
"You have a system with an advanced programmatic tracer: do you know what to do with it? Brendan has used numerous tracers in production environments, and has published hundreds of tracing-based tools. In this talk he will share tips and know-how for creating CLI tracing tools and GUI visualizations, to solve real problems effectively. Programmatic tracing is an amazing superpower, and this talk will show you how to wield it!"
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Instrumenting the real-time web: Node.js in production
This document discusses instrumenting and running Node.js applications in production environments. It describes how Node.js is well-suited for building "DIRTy" real-time web applications due to its asynchronous and event-driven architecture. The document advocates for using dynamic instrumentation tools like DTrace to measure latency in Node.js and visualize latency data through techniques like 4D heatmaps to debug performance issues.
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
This document provides an overview of systemd and how it differs from traditional init systems. It discusses systemd units and how to manage services using systemctl. It covers customizing units using drop-ins, managing resources with cgroups, converting init scripts, and using the systemd journal. The presentation aims to demystify systemd and provide administrators with practical guidance on using its main features.
This document summarizes a presentation about tuning parallel code on Solaris. It discusses:
1) Using tools like DTrace, prstat, and vmstat to analyze performance issues like thread scheduling and I/O problems in parallel applications on Solaris.
2) Two examples of using DTrace to analyze thread scheduling and troubleshoot I/O performance problems in a virtualized Windows server.
3) How the examples demonstrated using DTrace to identify unbalanced thread scheduling and discover that a domain controller was disabling disk write caching, slowing performance.
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree is an extension to Device Tree to describe all the hardware on an SoC, including heterogeneous CPU clusters and secure resources not typically visible to an Operating System like Linux. This full view allows the System Device Tree to be the "One true source" of the entire hardware description and helps to prevent the common (and hard-to-debug) problem of conflicting resources and system consistency. Lopper is an Open Source framework to parse and manipulate System Device Tree. With Lopper, it is possible to generate multiple traditional Device Trees from a single larger System Device Tree. This presentation will provide an overview of System Device Tree and will discuss the latest updates of the specification and tooling. The talk will illustrate multiple use-cases for System Device Tree with concrete examples, such as Linux running on the more powerful CPU cluster and Zephyr running on a smaller Cortex-R cluster. It will also show how to use Lopper to generate multiple traditional Device Trees targeting different OSes, not just Linux but also Zephyr/other RTOSes. Finally, an end-to-end demo based on Yocto to build a complete heterogeneous system with multiple OSes and RTOSes running on different clusters on a single reference board will be shown.
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and Accelerated Computing (GPU and FPGA) instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
PostgreSQL High Availability in a Containerized World
This document discusses high availability for PostgreSQL in a containerized environment. It outlines typical enterprise requirements for high availability including recovery time objectives and recovery point objectives. Shared storage-based high availability is described as well as the advantages and disadvantages of PostgreSQL replication. The use of Linux containers and orchestration tools like Kubernetes and Consul for managing containerized PostgreSQL clusters is also covered. The document advocates for using PostgreSQL replication along with services and self-healing tools to provide highly available and scalable PostgreSQL deployments in modern container environments.
Performance optimization to meet the requirements of a 4G core system
1. The document discusses using OpenStack for a 4G core network, including performance issues and solutions when virtualizing the EPC network functions using OpenStack.
2. Key performance issues identified include high CPU usage, competing for CPU resources, latency, throughput, and packet loss. Solutions proposed are CPU pinning, NUMA awareness, hugepages, DPDK, SR-IOV, and offloading processing to smart NICs.
3. Going forward, the next steps discussed are using OVS-DPDK for offloading, SDN, containers, and cloud architectures for 5G.
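The CPU pinning proposed above can be sketched in a few lines. This is a minimal illustration using Python's standard-library `os.sched_setaffinity` on Linux, not anything OpenStack- or EPC-specific:

```python
# Sketch: pin the current process to one CPU, illustrating the CPU-pinning
# technique mentioned above. Linux-only; os.sched_setaffinity is the stdlib
# wrapper for the sched_setaffinity(2) syscall.
import os

def pin_to_cpu(cpu: int) -> set:
    """Restrict the calling process to a single CPU and return the new mask."""
    os.sched_setaffinity(0, {cpu})      # 0 = the calling process
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    before = os.sched_getaffinity(0)
    after = pin_to_cpu(min(before))     # pin to the lowest allowed CPU
    print(f"affinity before={sorted(before)} after={sorted(after)}")
```

In a real deployment the pinning is done by the hypervisor or OpenStack (e.g. `hw:cpu_policy=dedicated`), but the effect on the guest workload is the same: the thread stops migrating between cores, preserving cache warmth and NUMA locality.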
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
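The "embarrassingly parallel" pattern the talk describes can be sketched outside R as well. Here is the same idea in Python (the talk itself uses R and SparklyR): each simulation run is independent, so a process pool can run them simultaneously. The Monte Carlo pi estimate is an illustrative stand-in for any independent iteration:

```python
# Embarrassingly parallel simulation: each trial is independent, so a
# process pool maps over them with no coordination between workers.
from multiprocessing import Pool
import random

def simulate_once(seed: int) -> float:
    """One independent Monte Carlo trial: estimate pi from 10,000 points."""
    rng = random.Random(seed)
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
               for _ in range(10_000))
    return 4.0 * hits / 10_000

def simulate_parallel(n_runs: int = 8) -> float:
    """Run n_runs trials across a pool of worker processes."""
    with Pool() as pool:
        estimates = pool.map(simulate_once, range(n_runs))
    return sum(estimates) / len(estimates)

if __name__ == "__main__":
    print(f"pi ~= {simulate_parallel():.3f}")
```

The R equivalents (`parallel::parLapply`, `foreach` with `%dopar%`) follow the same shape: a pure function mapped over independent inputs.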
Vinetalk: The missing piece for cluster managers to enable accelerator sharing
Vinetalk is a software abstraction layer that allows cluster managers like Mesos and Kubernetes to offer fractions of GPU resources, enabling more efficient sharing of accelerators. Existing cluster managers cannot share accelerators because device drivers do not support it. Vinetalk implements an abstraction layer that decouples executors from vendor-specific drivers, representing accelerators as virtual access queues. This allows multiple tasks to concurrently use the same physical accelerator. Vinetalk has been shown to reduce queuing times for tasks sharing a GPU compared to Mesos alone. It is also easier for developers to use, hiding proprietary device APIs, and has a low overhead of 1-5%, mostly due to memory transfers.
This document provides an introduction to CUDA programming. It discusses the programmer's view of the GPU as a co-processor with its own memory, and how GPUs are well-suited for data-parallel applications with many independent computations. It describes how CUDA uses a grid of blocks of threads to run kernels in parallel. Memory is organized into global, constant, shared, and local memory. Kernels launch a grid of blocks, and threads within blocks can cooperate through shared memory and synchronization.
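The grid-of-blocks-of-threads model described above can be sketched in plain Python. Every (block, thread) pair computes one element using the same global-index arithmetic a CUDA kernel would (`i = blockIdx * blockDim + threadIdx`); the names deliberately mirror CUDA's, but this is a serial emulation for illustration, not the CUDA API:

```python
# Serial emulation of a CUDA-style kernel launch: grid_dim blocks of
# block_dim threads, each "thread" computing one output element.

def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Body of one 'thread': out[i] = a*x[i] + y[i] for its global index i."""
    i = block_idx * block_dim + thread_idx
    if i < len(x):                      # bounds guard, as in real kernels
        out[i] = a * x[i] + y[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Run the kernel body once per (block, thread) pair, serially."""
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, block_dim, t, *args)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
out = [0.0] * 5
launch(saxpy_kernel, 2, 4, 2.0, x, y, out)   # 2 blocks x 4 threads covers 5 elements
print(out)
```

On a GPU the two loops in `launch` disappear: all (block, thread) pairs run concurrently, which is why each thread must independently compute its own index and guard against running off the end of the data.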
Spark and Deep Learning frameworks with distributed workloads
The increasing complexity of learning algorithms and deep neural networks, combined with size of data and parameters, has made it challenging to exploit existing large-scale data processing pipelines for training and inference.
Approaches are outlined for preprocessing, training, inference, and deployment across datasets that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks.
This document provides an overview of Oracle performance tuning fundamentals. It discusses key concepts like wait events, statistics, CPU utilization, and the importance of understanding the operating system, database, and business needs. It also introduces tools for monitoring performance like AWR, ASH, and dynamic views. The goal is to establish a foundational understanding of Oracle performance concepts and monitoring techniques.
This document discusses tuning Oracle GoldenGate for optimal performance. It begins with an overview of GoldenGate architecture and use cases, then discusses the importance of baseline monitoring. Key metrics to monitor are identified as lag times, checkpoint information, CPU usage, memory usage, and disk I/O. The document provides examples of commands to gather baseline data on these metrics. It then discusses configuring GoldenGate for parallel processing using multiple process groups to optimize performance. Overall it provides guidance on setting baselines and configuring GoldenGate to minimize lag times and resource utilization.
The document discusses challenges with processor benchmarking and provides recommendations. It summarizes a case study where a popular CPU benchmark claimed a new processor was 2.6x faster than Intel, but detailed analysis found the benchmark was testing division speed, which accounted for only 0.1% of cycles on Netflix servers. The document advocates for low-level, active benchmarking and profiling over statistical analysis. It also provides a checklist for evaluating benchmarks and cautions that increased processor complexity and cloud environments make accurate benchmarking more difficult.
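The "active benchmarking" the document advocates means analyzing what a benchmark actually exercises while it runs, rather than trusting its headline number. A small sketch of the failure mode described above: two loops with identical shape but different instruction mixes can time very differently, so a "CPU benchmark" dominated by one operation (here, division) says little about a real workload:

```python
# Active benchmarking sketch: time two microbenchmarks with the same loop
# shape but different instruction mixes, to show how narrow a "CPU score"
# can be. Timings are environment-dependent; the point is to inspect, not
# to trust a single number.
import timeit

def div_loop(n=10_000):
    s = 0.0
    for i in range(1, n):
        s += 1.0 / i          # the operation a naive benchmark may fixate on
    return s

def mixed_loop(n=10_000):
    s = 0
    for i in range(1, n):
        s += (i * 3) % 7      # different instruction mix, same loop shape
    return s

div_t = timeit.timeit(div_loop, number=20)
mix_t = timeit.timeit(mixed_loop, number=20)
print(f"division-heavy: {div_t:.4f}s  mixed: {mix_t:.4f}s")
```

The next active-benchmarking step would be profiling the benchmark itself (e.g. with `perf`) to confirm which instructions dominate its cycles, which is exactly how the 0.1%-of-cycles finding in the case study was made.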
Performance Wins with eBPF: Getting Started (2021)
This document provides an overview of using eBPF (extended Berkeley Packet Filter) to quickly get performance wins as a sysadmin. It recommends installing BCC and bpftrace tools to easily find issues like periodic processes, misconfigurations, unexpected TCP sessions, or slow file system I/O. A case study examines using biosnoop to identify which processes were causing disk latency issues. The document suggests thinking like a sysadmin first by running tools, then like a programmer if a problem requires new tools. It also outlines recommended frontends depending on use cases and provides references to learn more about BPF.
Systems@Scale 2021 BPF Performance Getting Started
Talk for Facebook Systems@Scale 2021 by Brendan Gregg: "BPF (eBPF) tracing is the superpower that can analyze everything, helping you find performance wins, troubleshoot software, and more. But with many different front-ends and languages, and years of evolution, finding the right starting point can be hard. This talk will make it easy, showing how to install and run selected BPF tools in the bcc and bpftrace open source projects for some quick wins. Think like a sysadmin, not like a programmer."
Talk by Brendan Gregg for USENIX LISA 2021. https://www.youtube.com/watch?v=5nN1wjA_S30 . "The future of computer performance involves clouds with hardware hypervisors and custom processors, servers running a new type of BPF software to allow high-speed applications and kernel customizations, observability of everything in production, new Linux kernel technologies, and more. This talk covers interesting developments in systems and computing performance, their challenges, and where things are headed."
USENIX LISA2021 talk by Brendan Gregg (https://www.youtube.com/watch?v=_5Z2AU7QTH4). This talk is a deep dive that describes how BPF (eBPF) works internally on Linux, and dissects some modern performance observability tools. Details covered include the kernel BPF implementation: the verifier, JIT compilation, and the BPF execution environment; the BPF instruction set; different event sources; and how BPF is used by user space, using bpftrace programs as an example. This includes showing how bpftrace is compiled to LLVM IR and then BPF bytecode, and how per-event data and aggregated map data are fetched from the kernel.
Keynote by Brendan Gregg for the eBPF summit, 2020. How to get started finding performance wins using the BPF (eBPF) technology. This short talk covers the quickest and easiest way to find performance wins using BPF observability tools on Linux.
The document discusses performance analysis methodologies, beginning with some anti-methodologies like blaming others or only using familiar tools. It then covers common methodologies like using ad hoc checklists of steps, characterizing the workload, and performing drill-down analysis using tools like the USE method and latency analysis to diagnose a database slowdown issue caused by memory pressure.
USENIX ATC 2017: Visualizing Performance with Flame Graphs - Brendan Gregg
Talk by Brendan Gregg for USENIX ATC 2017.
"Flame graphs are a simple stack trace visualization that helps answer an everyday problem: how is software consuming resources, especially CPUs, and how did this change since the last software version? Flame graphs have been adopted by many languages, products, and companies, including Netflix, and have become a standard tool for performance analysis. They were published in "The Flame Graph" article in the June 2016 issue of Communications of the ACM, by their creator, Brendan Gregg.
This talk describes the background for this work, and the challenges encountered when profiling stack traces and resolving symbols for different languages, including for just-in-time compiler runtimes. Instructions will be included for generating mixed-mode flame graphs on Linux, and examples from our use at Netflix with Java. Advanced flame graph types will be described, including differential, off-CPU, chain graphs, memory, and TCP events. Finally, future work and unsolved problems in this area will be discussed."
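Flame graphs are built from "folded" stacks: identical stack traces are merged and counted, then drawn as rectangles whose width is proportional to the count. A minimal sketch of that folding step (the real pipeline is the `stackcollapse-*` and `flamegraph.pl` scripts; the sample stacks below are made up):

```python
# Collapse sampled stack traces into the "folded" one-line-per-stack format
# that flamegraph.pl consumes: "frame;frame;frame count".
from collections import Counter

def fold(samples):
    """samples: list of stack traces, each a root-first list of frame names.
    Returns sorted 'a;b;c N' lines, one per unique stack."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

samples = [
    ["main", "parse", "read"],
    ["main", "parse", "read"],
    ["main", "render"],
]
for line in fold(samples):
    print(line)
```

Feeding such lines to `flamegraph.pl` produces the interactive SVG; the shared `main` prefix becomes one wide base rectangle with `parse` and `render` stacked above it.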
Slides for JavaOne 2015 talk by Brendan Gregg, Netflix (video/audio, of some sort, hopefully pending: follow @brendangregg on twitter for updates). Description: "At Netflix we dreamed of one visualization to show all CPU consumers: Java methods, GC, JVM internals, system libraries, and the kernel. With the help of Oracle this is now possible on x86 systems using system profilers (eg, Linux perf_events) and the new JDK option -XX:+PreserveFramePointer. This lets us create Java mixed-mode CPU flame graphs, exposing all CPU consumers. We can also use system profilers to analyze memory page faults, TCP events, storage I/O, and scheduler events, also with Java method context. This talk describes the background for this work, instructions for generating Java mixed-mode flame graphs, and examples from our use at Netflix where Java on x86 is the primary platform for the Netflix cloud."
ACM Applicative System Methodology 2016 - Brendan Gregg
Video: https://youtu.be/eO94l0aGLCA?t=3m37s . Talk by Brendan Gregg for ACM Applicative 2016
"System Methodology - Holistic Performance Analysis on Modern Systems
Traditional systems performance engineering makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. For modern systems, we can choose the metrics, and can choose ones we need to support new holistic performance analysis methodologies. These methodologies provide faster, more accurate, and more complete analysis, and can provide a starting point for unfamiliar systems.
Methodologies are especially helpful for modern applications and their workloads, which can pose extremely complex problems with no obvious starting point. There are also continuous deployment environments such as the Netflix cloud, where these problems must be solved in shorter time frames. Fortunately, with advances in system observability and tracers, we have virtually endless custom metrics to aid performance analysis. The problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
System methodologies provide a starting point for analysis, as well as guidance for quickly moving through the metrics to root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. The focus is on single systems (any operating system), including single cloud instances, and quickly locating performance issues or exonerating the system. Many methodologies will be discussed, along with recommendations for their implementation, which may be as documented checklists of tools, or custom dashboards of supporting metrics. In general, you will learn to think differently about your systems, and how to ask better questions."
Tracing Summit 2014, Düsseldorf. What can Linux learn from DTrace: what went well, and what didn't go well, on its path to success? This talk will discuss not just the DTrace software, but lessons from the marketing and adoption of a system tracer, and an inside look at how DTrace was really deployed and used in production environments. It will also cover ongoing problems with DTrace, and how Linux may surpass them and continue to advance the field of system tracing. A world expert and core contributor to DTrace, Brendan now works at Netflix on Linux performance with the various Linux tracers (ftrace, perf_events, eBPF, SystemTap, ktap, sysdig, LTTng, and the DTrace Linux ports), and will summarize his experiences and suggestions for improvements. He has also been contributing to various tracers: recently promoting ftrace and perf_events adoption through articles and front-end scripts, and testing eBPF.
How Netflix Tunes EC2 Instances for Performance - Brendan Gregg
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg. "
At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such as SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
SREcon 2016 Performance Checklists for SREs - Brendan Gregg
Talk from SREcon2016 by Brendan Gregg. Video: https://www.usenix.org/conference/srecon16/program/presentation/gregg . "There's limited time for performance analysis in the emergency room. When there is a performance-related site outage, the SRE team must analyze and solve complex performance issues as quickly as possible, and under pressure. Many performance tools and techniques are designed for a different environment: an engineer analyzing their system over the course of hours or days, and given time to try dozens of tools: profilers, tracers, monitoring tools, benchmarks, as well as different tunings and configurations. But when Netflix is down, minutes matter, and there's little time for such traditional systems analysis. As with aviation emergencies, short checklists and quick procedures can be applied by the on-call SRE staff to help solve performance issues as quickly as possible.
In this talk, I'll cover a checklist for Linux performance analysis in 60 seconds, as well as other methodology-derived checklists and procedures for cloud computing, with examples of performance issues for context. Whether you are solving crises in the SRE war room, or just have limited time for performance engineering, these checklists and approaches should help you find some quick performance wins. Safe flying."
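The first item on a 60-second Linux checklist is typically load averages (`uptime`). A sketch of reading them programmatically; the field layout of `/proc/loadavg` (1/5/15-minute averages, runnable/total tasks, last PID) is stable across modern kernels, though the sample line below is made up:

```python
# Parse the Linux /proc/loadavg format: "1min 5min 15min runnable/total lastpid".
# A load1 rising well above load15 suggests load is increasing, and is worth
# drilling into with vmstat/mpstat, per the checklist.

def parse_loadavg(text: str) -> dict:
    one, five, fifteen, tasks, last_pid = text.split()
    runnable, total = tasks.split("/")
    return {
        "load1": float(one), "load5": float(five), "load15": float(fifteen),
        "runnable": int(runnable), "tasks": int(total), "last_pid": int(last_pid),
    }

def read_loadavg(path="/proc/loadavg") -> dict:
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())

sample = "0.52 0.58 0.59 1/452 31612"
print(parse_loadavg(sample))
```

In an emergency the three averages answer one question fast: is load rising, falling, or steady? That alone decides whether the incident is ongoing or already passed.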
Java Performance Analysis on Linux with Flame Graphs - Brendan Gregg
This document discusses using Linux perf_events (perf) profiling tools to analyze Java performance on Linux. It describes how perf can provide complete visibility into Java, JVM, GC and system code but that Java profilers have limitations. It presents the solution of using perf to collect mixed-mode flame graphs that include Java method names and symbols. It also discusses fixing issues with broken Java stacks and missing symbols on x86 architectures in perf profiles.
This document summarizes a presentation on flame graphs for profiling CPU and memory performance on FreeBSD. It introduces flame graphs as a way to visualize stack profiles to easily compare performance across systems. Examples are given profiling MySQL workload CPU usage on two hosts to identify a 30% performance difference. Commands are provided to generate flame graphs from DTrace profiles of CPU stack sampling and page faults.
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg.
There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well as observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes.
This tutorial updates and extends an earlier talk that summarized the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real world problems — important context that is typically not included in the standard documentation.
Systems Performance: Enterprise and the Cloud - Brendan Gregg
My talk for BayLISA, Oct 2013, launching the Systems Performance book. Operating system performance analysis and tuning leads to a better end-user experience and lower costs, especially for cloud computing environments that pay by the operating system instance. This book covers concepts, strategy, tools and tuning for Unix operating systems, with a focus on Linux- and Solaris-based systems. The book covers the latest tools and techniques, including static and dynamic tracing, to get the most out of your systems.
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013 - Amazon Web Services
Your AMI is one of the core foundations for running applications and services effectively on Amazon EC2. In this session, you'll learn how to optimize your AMI, including how you can measure and diagnose system performance and tune parameters for improved CPU and network performance. We'll cover application-specific examples from Netflix on how optimized AMIs can lead to improved performance.
This document provides a performance engineer's predictions for computing performance trends in 2021 and beyond. The engineer discusses trends in processors, memory, disks, networking, runtimes, kernels, hypervisors, and observability. For processors, predictions include multi-socket systems becoming less common, the future of simultaneous multithreading being unclear, practical core count limits being reached in the 2030s, and more processor vendors including ARM-based and RISC-V options. Memory predictions focus on many workloads being memory-bound currently.
Introduction to DTrace (Dynamic Tracing), written by Brendan Gregg and delivered in 2007. While aimed at a Solaris-based audience, this introduction is still largely relevant today (2012). Since then, DTrace has appeared in other operating systems (Mac OS X, FreeBSD, and is being ported to Linux), and, many user-level providers have been developed to aid tracing of other languages.
Linux Performance Analysis: New Tools and Old Secrets - Brendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
MeetBSDCA 2014 Performance Analysis for BSD, by Brendan Gregg. A tour of five relevant topics: observability tools, methodologies, benchmarking, profiling, and tracing. Tools summarized include pmcstat and DTrace.
EuroBSDcon 2017 System Performance Analysis Methodologies - Brendan Gregg
Keynote by Brendan Gregg. "Traditional performance monitoring makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. Modern BSD has advanced tracers and PMC tools, providing virtually endless metrics to aid performance analysis. It's time we really used them, but the problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
There's a new way to approach performance analysis that can guide you through the metrics. Instead of starting with traditional metrics and figuring out their use, you start with the questions you want answered then look for metrics to answer them. Methodologies can provide these questions, as well as a starting point for analysis and guidance for locating the root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, chain graphs, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. Many methodologies will be discussed, from the production proven to the cutting edge, along with recommendations for their implementation on BSD systems. In general, you will learn to think differently about analyzing your systems, and make better use of the modern tools that BSD provides."
The document summarizes a talk on container performance analysis. It discusses identifying bottlenecks at the host, container, and kernel level using various Linux performance tools. It then provides an overview of how containers work in Linux using namespaces and control groups (cgroups). Finally, it demonstrates some example commands like docker stats, systemd-cgtop, and bcc/BPF tools that can be used to analyze containers and cgroups from the host system.
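The cgroup side of the talk above comes down to reading plain files: cgroups expose each container's resource limits and consumption as text under `/sys/fs/cgroup`. A sketch of computing memory utilization from cgroup v2 figures; the file names (`memory.current`, `memory.max`) are the real v2 interface, but the sample contents below are made up:

```python
# cgroup v2 memory accounting: memory.current holds bytes in use,
# memory.max holds the limit, or the literal word "max" for unlimited.

def memory_pct(current: str, maximum: str):
    """Return memory utilization as a fraction, or None when unlimited."""
    if maximum.strip() == "max":        # v2 spells "no limit" as "max"
        return None
    return int(current) / int(maximum)

# Sample values: 512 MiB used of a 1 GiB limit.
pct = memory_pct("536870912\n", "1073741824\n")
print(f"container memory utilization: {pct:.0%}")
```

Tools like `docker stats` and `systemd-cgtop` are doing essentially this across every cgroup, which is why they can report per-container metrics from the host with no agent inside the container.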
Linux Performance 2018 (PerconaLive keynote) - Brendan Gregg
Keynote for PerconaLive 2018 by Brendan Gregg. Video: https://youtu.be/sV3XfrfjrPo?t=30m51s . "At over one thousand code commits per week, it's hard to keep up with Linux developments. This keynote will summarize recent Linux performance features, for a wide audience: the KPTI patches for Meltdown, eBPF for performance observability, Kyber for disk I/O scheduling, BBR for TCP congestion control, and more. This is about exposure: knowing what exists, so you can learn and use it later when needed. Get the most out of your systems, whether they are databases or application servers, with the latest Linux kernels and exciting features."
Delivered at the FISL13 conference in Brazil: http://www.youtube.com/watch?v=K9w2cipqfvc
This talk introduces the USE Method: a simple strategy for performing a complete check of system performance health, identifying common bottlenecks and errors. This methodology can be used early in a performance investigation to quickly identify the most severe system performance issues, and is a methodology the speaker has used successfully for years in both enterprise and cloud computing environments. Checklists have been developed to show how the USE Method can be applied to Solaris/illumos-based and Linux-based systems.
Many hardware and software resource types have been commonly overlooked, including memory and I/O busses, CPU interconnects, and kernel locks. Any of these can become a system bottleneck. The USE Method provides a way to find and identify these.
This approach focuses on the questions to ask of the system, before reaching for the tools. Tools that are ultimately used include all the standard performance tools (vmstat, iostat, top), and more advanced tools, including dynamic tracing (DTrace), and hardware performance counters.
Other performance methodologies are included for comparison: the Problem Statement Method, Workload Characterization Method, and Drill-Down Analysis Method.
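The USE Method's per-resource questions can be expressed directly as code. This sketch encodes the decision logic described above; the threshold value is an illustrative assumption, not part of the method itself:

```python
# USE method per resource: check Errors first, then Saturation, then
# Utilization, and report findings (or "ok") for each resource.

def use_check(resource, utilization, saturation, errors, util_threshold=0.7):
    """Return USE-method findings for one resource.
    utilization: 0..1; saturation: queued work (e.g. run-queue length);
    errors: error count since the last check."""
    findings = []
    if errors:
        findings.append(f"{resource}: {errors} errors -- investigate first")
    if saturation > 0:
        findings.append(f"{resource}: saturated (queue={saturation})")
    if utilization >= util_threshold:
        findings.append(f"{resource}: high utilization ({utilization:.0%})")
    return findings or [f"{resource}: ok"]

for res, u, s, e in [("CPU", 0.95, 6, 0),
                     ("disk", 0.30, 0, 0),
                     ("NIC", 0.40, 0, 12)]:
    for line in use_check(res, u, s, e):
        print(line)
```

The value of the method is the iteration itself: every resource, including the commonly overlooked ones (busses, interconnects, kernel locks), gets the same three questions, so nothing is skipped just because no familiar tool reports on it.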
Video: http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x ; This talk for SCaLE11x covers system performance analysis methodologies and the Linux tools to support them, so that you can get the most out of your systems and solve performance issues quickly. This includes a wide variety of tools, including basics like top(1), advanced tools like perf, and new tools like the DTrace for Linux prototypes.
Delivered as plenary at USENIX LISA 2013. video here: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
Real-time in the real world: DIRT in production - bcantrill
This document discusses the challenges of building and debugging DIRT (data-intensive real-time) applications in production. It provides examples from the mobile push-to-talk app Voxer, which is described as a canonical DIRT app. Specific issues covered include application restarts inducing latency bubbles, dropped TCP connections causing latency outliers, and identifying sources of slow disk I/O. Tools like DTrace are highlighted as being essential for instrumentation and problem diagnosis in DIRT apps.
HTTP applications concentrate many performance issues:
- They are a common way to let internal & external users access and modify data.
- They rely on a delivery chain which contains many elements, all of which are performance drivers: browser, workstation, network, front-end server, application server, file server, database, images, etc.
- They raise specific troubleshooting issues: among others, the end user feedback is based on the concept of page, while most network based performance analysis is based on every transaction / object in the page (html, css, image, script, etc.)
This one-hour webinar will enable you to:
- Understand the challenges of performance troubleshooting for HTTP Applications.
- View a series of concrete diagnostic cases with Performance Vision newest version.
Performance analysis aims to capture, analyze, and evaluate key components of performance through systematic observation. Coaches observe to better understand technical, tactical, behavioral, and physical aspects of performance. They then provide feedback to improve future practice. Coaches use various methods like notation, video, biomechanics, tests, and questionnaires to gather both qualitative and quantitative data on performance. Technology applications and software programs help support detailed analysis.
A presentation from SEO Campixx Barcamp 2011 in Berlin. Web Performance Optimization is about making websites faster. Here I discuss different measures and show their impact on competitive advantage and possibly on Google rankings. Undeniably, better performance leads to more sales and better usability in terms of bounce rates. View image slides here: http://b0i.de/wpopresentation
A presentation that provides an overview of software testing approaches including "schools" of software testing and a variety of testing techniques and practices.
"DTracing the Cloud", Brendan Gregg, illumosday 2012
Cloud computing facilitates rapid deployment and scaling, often pushing high load at applications under continual development. DTrace allows immediate analysis of issues on live production systems even in these demanding environments – no need to restart or run a special debug kernel.
For the illumos kernel, DTrace has been enhanced to support cloud computing, providing more observation capabilities to zones as used by Joyent SmartMachine customers. DTrace is also frequently used by the cloud operators to analyze systems and verify performance isolation of tenants.
This talk covers DTrace in the illumos-based cloud, showing examples of real-world performance wins.
A brief talk on systems performance for the July 2013 meetup "A Midsummer Night's System", video: http://www.youtube.com/watch?v=P3SGzykDE4Q. This summarizes how systems performance has changed from the 1990's to today. This was the reason for writing a new book on systems performance, to provide a reference that is up to date, covering new tools, technologies, and methodologies.
LinuxCon Europe, 2014. Video: https://www.youtube.com/watch?v=SN7Z0eCn0VY . There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This talk summarizes the three types of performance tools: observability, benchmarking, and tuning, providing a tour of what exists and why they exist. Advanced tools including those based on tracepoints, kprobes, and uprobes are also included: perf_events, ktap, SystemTap, LTTng, and sysdig. You'll gain a good understanding of the performance tools landscape, knowing what to reach for to get the most out of your systems.
Measuring the Performance of Single Page Applications - Nicholas Jansma
Single page applications are a problem for RUM tools because there are no easy ways to tell when a new page component has been requested asynchronously as a result of an intentional user action. Many network requests are back-end service calls initiated periodically by the app – for example, a ping to check if content has been updated, or to check if the current user should still be signed in to their account.
Even with requests that are initiated by a user action, not all may fit into the definition of a “page view.” For example, a user typing into a search box that has auto-complete capabilities will often result in network requests, but these requests result in very small amounts of data transfer, happen very frequently, and do not count toward page views. The scene is further complicated by SPA frameworks like Angular, Backbone, and others.
In this talk, we’ll learn about some of the tricks used by boomerang to measure the performance of single page applications, going as far as capturing errors and waterfall information across browsers.
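The heuristic described above (ignore background polls, and ignore user-initiated requests that are too small and frequent to be page views) could be sketched as follows. The field names and thresholds here are illustrative assumptions, not boomerang's actual API or logic:

```python
# Hypothetical sketch of classifying SPA network requests as page views
# vs. background noise, per the heuristics described above. The request
# fields and the byte threshold are illustrative assumptions.

def is_page_view(request, min_bytes=10_000):
    """Guess whether a request represents a page view in an SPA."""
    if not request.get("user_initiated", False):
        return False  # periodic pings, session keep-alives, etc.
    if request.get("bytes", 0) < min_bytes:
        return False  # e.g. auto-complete: user-initiated but tiny and frequent
    return True

assert not is_page_view({"user_initiated": False, "bytes": 50_000})  # background poll
assert not is_page_view({"user_initiated": True, "bytes": 300})      # auto-complete
assert is_page_view({"user_initiated": True, "bytes": 250_000})      # route change
```

Real RUM tools combine several such signals (timing proximity to a click or route change, resource type, and so on) rather than size alone.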
Application Performance Management - Solving the Performance Puzzle - LDragich
The document outlines a methodology for Application Performance Management (APM). It discusses various components of an APM strategy including top-down monitoring, bottom-up monitoring, reporting and analytics, and aligning with ITIL processes. Top-down monitoring focuses on real-time application monitoring using techniques like synthetic transactions. Bottom-up monitoring ties into infrastructure monitoring tools. Reporting and analytics is used to analyze performance data and establish baselines. APM supports various ITIL processes like incident management, problem management and service level management.
An Introduction to Software Performance Engineering - Correlsense
Software performance engineering is becoming increasingly important to businesses as they look to improve the non-functional performance of applications and get more out of IT investments. By leveraging performance engineering techniques, IT professionals can be indispensable in building and optimizing scalable systems. This introductory course will teach you the essentials of software performance engineering, including:
• The performance challenges faced by Enterprise IT today
• What is software performance engineering (SPE)?
• Best practices for building scalable software systems
• The approaches to integrating SPE into IT project lifecycles
• Common frameworks for measuring application performance and service levels
• The impact of SPE on software developers, testers, capacity planners, and other IT professionals
• Case studies from the finance, retail, and insurance industries
Instructor: Walter Kuketz, SVP and CTO, Collaborative Consulting
This training is sponsored by Correlsense, Collaborative Consulting,
and New Horizons
Learn how Site24x7 gives you end-to-end application performance visibility for your Java, .NET and Ruby web transactions with metrics of all components starting from URLs to SQL queries.
The document discusses effective use of performance analysis in coaching rugby. It provides examples of how performance analysis has evolved from basic notation to advanced digital tools for game analysis, technique analysis, and player tracking. It emphasizes using permanent records of performance to stimulate athlete learning in both training and competition environments. The coach's role is to develop a strategic, periodized approach to performance analysis to support long-term athlete development.
Using dynaTrace to optimise application performance - Richard Bishop
The document discusses Nisa Retail's use of dynaTrace to improve service and cut costs. It provides an overview of Nisa Retail and Intechnica, and how dynaTrace was implemented at Nisa Retail to deliver business value. dynaTrace provided end-to-end application monitoring across all tiers, full transaction tracing, and proactive service level engineering to help optimize performance. This improved the user experience and helped Nisa Retail do more with less staff and budget.
Dynatrace is an APM solution that provides deep visibility into application performance across complex, distributed environments. It uses PurePath technology to capture timing and code-level context for all transactions end-to-end. This allows Dynatrace to identify performance issues and their root causes faster than other tools. Dynatrace can monitor Apache Tomcat servers and provide metrics on JVM performance, database queries, requests, and more. It helps diagnose common issues like inefficient database access, microservice problems, and coding issues.
Video: https://www.youtube.com/watch?v=uibLwoVKjec . Talk by Brendan Gregg for Sysdig CCWFS 2016. Abstract:
"You have a system with an advanced programmatic tracer: do you know what to do with it? Brendan has used numerous tracers in production environments, and has published hundreds of tracing-based tools. In this talk he will share tips and know-how for creating CLI tracing tools and GUI visualizations, to solve real problems effectively. Programmatic tracing is an amazing superpower, and this talk will show you how to wield it!"
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter... - DataStax Academy
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
Leveraging Cassandra for real-time multi-datacenter public cloud analytics - Julien Anguenot
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Container Performance Analysis, Brendan Gregg (Netflix) - Docker, Inc.
The document summarizes a talk on container performance analysis. It discusses identifying bottlenecks at the host, container, and kernel level using various Linux performance tools. It also provides an overview of how containers work in Linux using namespaces and control groups (cgroups). Specifically, it demonstrates analyzing resource usage and limitations for containers using tools like docker stats, systemd-cgtop, and investigating namespaces.
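One example of the cgroup-level analysis mentioned above is quantifying CPU throttling from the counters the CPU controller exposes in cpu.stat (cgroup v2 lists nr_periods and nr_throttled as "key value" lines). A minimal sketch:

```python
# Sketch: compute the fraction of CFS periods in which a container's
# cgroup was CPU-throttled, from cpu.stat-style counters. The input
# format follows cgroup v2's cpu.stat ("key value" lines).

def throttled_fraction(cpu_stat_text):
    """Parse cpu.stat content and return nr_throttled / nr_periods."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods

sample = "usage_usec 874321\nnr_periods 1000\nnr_throttled 250\nthrottled_usec 91000"
assert throttled_fraction(sample) == 0.25  # throttled in 25% of periods
```

A consistently high fraction suggests the container's CPU quota, not the host, is the bottleneck, which is the kind of "reverse diagnosis" the talk describes.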
Instrumenting the real-time web: Node.js in production - bcantrill
This document discusses instrumenting and running Node.js applications in production environments. It describes how Node.js is well-suited for building "DIRTy" real-time web applications due to its asynchronous and event-driven architecture. The document advocates for using dynamic instrumentation tools like DTrace to measure latency in Node.js and visualize latency data through techniques like 4D heatmaps to debug performance issues.
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
This document provides an overview of systemd and how it differs from traditional init systems. It discusses systemd units and how to manage services using systemctl. It covers customizing units using drop-ins, managing resources with cgroups, converting init scripts, and using the systemd journal. The presentation aims to demystify systemd and provide administrators with practical guidance on using its main features.
This document summarizes a presentation about tuning parallel code on Solaris. It discusses:
1) Using tools like DTrace, prstat, and vmstat to analyze performance issues like thread scheduling and I/O problems in parallel applications on Solaris.
2) Two examples of using DTrace to analyze thread scheduling and troubleshoot I/O performance problems in a virtualized Windows server.
3) How the examples demonstrated using DTrace to identify unbalanced thread scheduling and discover that a domain controller was disabling disk write caching, slowing performance.
System Device Tree and Lopper: Concrete Examples - ELC NA 2022 - Stefano Stabellini
System Device Tree is an extension to Device Tree to describe all the hardware on an SoC, including heterogeneous CPU clusters and secure resources not typically visible to an Operating System like Linux. This full view allows the System Device Tree to be the "One true source" of the entire hardware description and helps to prevent the common (and hard-to-debug) problem of conflicting resources and system consistency. Lopper is an Open Source framework to parse and manipulate System Device Tree. With Lopper, it is possible to generate multiple traditional Device Trees from a single larger System Device Tree. This presentation will provide an overview of System Device Tree and will discuss the latest updates of the specification and tooling. The talk will illustrate multiple use-cases for System Device Tree with concrete examples, such as Linux running on the more powerful CPU cluster and Zephyr running on a smaller Cortex-R cluster. It will also show how to use Lopper to generate multiple traditional Device Trees targeting different OSes, not just Linux but also Zephyr/other RTOSes. Finally, an end-to-end demo based on Yocto to build a complete heterogeneous system with multiple OSes and RTOSes running on different clusters on a single reference board will be shown.
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ... - Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and Accelerated Computing (GPU and FPGA) instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
PostgreSQL High Availability in a Containerized World - Jignesh Shah
This document discusses high availability for PostgreSQL in a containerized environment. It outlines typical enterprise requirements for high availability including recovery time objectives and recovery point objectives. Shared storage-based high availability is described as well as the advantages and disadvantages of PostgreSQL replication. The use of Linux containers and orchestration tools like Kubernetes and Consul for managing containerized PostgreSQL clusters is also covered. The document advocates for using PostgreSQL replication along with services and self-healing tools to provide highly available and scalable PostgreSQL deployments in modern container environments.
1. The document discusses using OpenStack for a 4G core network, including performance issues and solutions when virtualizing the EPC network functions using OpenStack.
2. Key performance issues identified include high CPU usage, competing for CPU resources, latency, throughput, and packet loss. Solutions proposed are CPU pinning, NUMA awareness, hugepages, DPDK, SR-IOV, and offloading processing to smart NICs.
3. Going forward, the next steps discussed are using OVS-DPDK for offloading, SDN, containers, and cloud architectures for 5G.
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
Vinetalk is a software abstraction layer that allows cluster managers like Mesos and Kubernetes to offer fractions of GPU resources, enabling more efficient sharing of accelerators. Existing cluster managers cannot share accelerators because device drivers do not support it. Vinetalk implements an abstraction layer that decouples executors from vendor-specific drivers, representing accelerators as virtual access queues. This allows multiple tasks to concurrently use the same physical accelerator. Vinetalk has been shown to reduce queuing times for tasks sharing a GPU compared to Mesos alone. It is also easier for developers to use, hiding proprietary device APIs, and has a low overhead of 1-5%, mainly due to memory transfers.
002 - Introduction to CUDA Programming_1.ppt - ceyifo9332
This document provides an introduction to CUDA programming. It discusses the programmer's view of the GPU as a co-processor with its own memory, and how GPUs are well-suited for data-parallel applications with many independent computations. It describes how CUDA uses a grid of blocks of threads to run kernels in parallel. Memory is organized into global, constant, shared, and local memory. Kernels launch a grid of blocks, and threads within blocks can cooperate through shared memory and synchronization.
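The grid-of-blocks-of-threads model described above gives every thread a unique global index; in CUDA C each thread computes it as blockIdx.x * blockDim.x + threadIdx.x. A plain-Python sketch of that enumeration (a simulation for illustration, not device code):

```python
# Sketch of CUDA's 1-D global thread index calculation, simulated in
# Python. In real CUDA C, each device thread computes:
#   idx = blockIdx.x * blockDim.x + threadIdx.x

def global_indices(grid_dim, block_dim):
    """Enumerate the global index each thread in a 1-D grid computes."""
    return [block * block_dim + thread
            for block in range(grid_dim)       # blockIdx.x
            for thread in range(block_dim)]    # threadIdx.x

# A grid of 4 blocks x 256 threads covers 1024 elements, one per thread.
assert global_indices(4, 256) == list(range(1024))
```

This is why kernel launches are sized so that grid_dim * block_dim covers the data: each independent computation maps to exactly one thread.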
Spark and Deep Learning frameworks with distributed workloads - S N
The increasing complexity of learning algorithms and deep neural networks, combined with size of data and parameters, has made it challenging to exploit existing large-scale data processing pipelines for training and inference.
Approaches are outlined for preprocessing, training, inference, and deployment across datasets that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks.
This document provides an overview of Oracle performance tuning fundamentals. It discusses key concepts like wait events, statistics, CPU utilization, and the importance of understanding the operating system, database, and business needs. It also introduces tools for monitoring performance like AWR, ASH, and dynamic views. The goal is to establish a foundational understanding of Oracle performance concepts and monitoring techniques.
This document discusses tuning Oracle GoldenGate for optimal performance. It begins with an overview of GoldenGate architecture and use cases, then discusses the importance of baseline monitoring. Key metrics to monitor are identified as lag times, checkpoint information, CPU usage, memory usage, and disk I/O. The document provides examples of commands to gather baseline data on these metrics. It then discusses configuring GoldenGate for parallel processing using multiple process groups to optimize performance. Overall it provides guidance on setting baselines and configuring GoldenGate to minimize lag times and resource utilization.
Similar to Performance Analysis: new tools and concepts from the cloud (20)
The document discusses challenges with processor benchmarking and provides recommendations. It summarizes a case study where a popular CPU benchmark claimed a new processor was 2.6x faster than Intel, but detailed analysis found the benchmark was testing division speed, which accounted for only 0.1% of cycles on Netflix servers. The document advocates for low-level, active benchmarking and profiling over statistical analysis. It also provides a checklist for evaluating benchmarks and cautions that increased processor complexity and cloud environments make accurate benchmarking more difficult.
Performance Wins with eBPF: Getting Started (2021) - Brendan Gregg
This document provides an overview of using eBPF (extended Berkeley Packet Filter) to quickly get performance wins as a sysadmin. It recommends installing BCC and bpftrace tools to easily find issues like periodic processes, misconfigurations, unexpected TCP sessions, or slow file system I/O. A case study examines using biosnoop to identify which processes were causing disk latency issues. The document suggests thinking like a sysadmin first by running tools, then like a programmer if a problem requires new tools. It also outlines recommended frontends depending on use cases and provides references to learn more about BPF.
Talk for Facebook Systems@Scale 2021 by Brendan Gregg: "BPF (eBPF) tracing is the superpower that can analyze everything, helping you find performance wins, troubleshoot software, and more. But with many different front-ends and languages, and years of evolution, finding the right starting point can be hard. This talk will make it easy, showing how to install and run selected BPF tools in the bcc and bpftrace open source projects for some quick wins. Think like a sysadmin, not like a programmer."
Computing Performance: On the Horizon (2021) - Brendan Gregg
Talk by Brendan Gregg for USENIX LISA 2021. https://www.youtube.com/watch?v=5nN1wjA_S30 . "The future of computer performance involves clouds with hardware hypervisors and custom processors, servers running a new type of BPF software to allow high-speed applications and kernel customizations, observability of everything in production, new Linux kernel technologies, and more. This talk covers interesting developments in systems and computing performance, their challenges, and where things are headed."
USENIX LISA2021 talk by Brendan Gregg (https://www.youtube.com/watch?v=_5Z2AU7QTH4). This talk is a deep dive that describes how BPF (eBPF) works internally on Linux, and dissects some modern performance observability tools. Details covered include the kernel BPF implementation: the verifier, JIT compilation, and the BPF execution environment; the BPF instruction set; different event sources; and how BPF is used by user space, using bpftrace programs as an example. This includes showing how bpftrace is compiled to LLVM IR and then BPF bytecode, and how per-event data and aggregated map data are fetched from the kernel.
Performance Wins with BPF: Getting Started - Brendan Gregg
Keynote by Brendan Gregg for the eBPF summit, 2020. How to get started finding performance wins using the BPF (eBPF) technology. This short talk covers the quickest and easiest way to find performance wins using BPF observability tools on Linux.
Talk for YOW! by Brendan Gregg. "Systems performance studies the performance of computing systems, including all physical components and the full software stack to help you find performance wins for your application and kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (ftrace, bcc/BPF, and bpftrace/BPF), advice about what is and isn't important to learn, and case studies to see how it is applied. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
re:Invent 2019 BPF Performance Analysis at Netflix - Brendan Gregg
This document provides an overview of Brendan Gregg's presentation on BPF performance analysis at Netflix. It discusses:
- Why BPF is changing the Linux OS model to become more event-based and microkernel-like.
- The internals of BPF including its origins, instruction set, execution model, and how it is integrated into the Linux kernel.
- How BPF enables a new class of custom, efficient, and safe performance analysis tools for analyzing various Linux subsystems like CPUs, memory, disks, networking, applications, and the kernel.
- Examples of specific BPF-based performance analysis tools developed by Netflix, AWS, and others for analyzing tasks, scheduling, page faults, and more.
UM2019 Extended BPF: A New Type of Software - Brendan Gregg
BPF (Berkeley Packet Filter) has evolved from a limited virtual machine for efficient packet filtering to a new type of software called extended BPF. Extended BPF allows custom, efficient, and production-safe performance analysis tools and observability programs to be run in the Linux kernel. It enables new event-based applications running as BPF programs attached to various kernel events like kprobes, uprobes, tracepoints, sockets, and more. Major companies like Facebook, Google, and Netflix are using BPF programs for tasks like intrusion detection, container security, firewalling, and observability, with over 150,000 AWS instances running BPF programs. BPF provides a new program model and security features compared with kernel modules.
Talk by Brendan Gregg for USENIX LISA 2019: Linux Systems Performance. Abstract: "
Systems performance is an effective discipline for performance analysis and tuning, and can help you find performance wins for your applications and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas of Linux systems performance: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (Ftrace, bcc/BPF, and bpftrace/BPF), and much advice about what is and isn't important to learn. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
This document discusses Brendan Gregg's opinions on various tracing tools including sysdig, perf, ftrace, eBPF, bpftrace, and BPF perf tools. It provides a table comparing the scope, capability, and ease of use of these tools. It then gives an example of using BPF perf tools to analyze readahead performance. Finally, it outlines desired additions to tracing capabilities and BPF helpers as well as challenges in areas like function tracing without frame pointers.
Here is a bpftrace program to measure the latency of ICMP echo requests, from transmit to receive:
#!/usr/local/bin/bpftrace
kprobe:icmp_send {
	@start[tid] = nsecs;
}
kprobe:__netif_receive_skb_core /@start[tid]/ {
	@diff = hist(nsecs - @start[tid]);
	delete(@start[tid]);
}
END {
	print(@diff);
	clear(@diff);
}
This traces the time between the icmp_send kernel function (when the packet is queued for transmit) and the __netif_receive_skb_core function (when the response packet is received), and prints the result as a power-of-two latency histogram. The /@start[tid]/ filter ensures the receive probe only records a delta when a matching start timestamp exists, avoiding bogus values for unrelated packets.
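The hist() action aggregates the time deltas into power-of-two buckets in kernel context. As a rough illustration of that bucketing (a sketch of the idea, not bpftrace's or the kernel's implementation):

```python
# Sketch of a power-of-two (log2) latency histogram, similar in spirit
# to bpftrace's hist() aggregation. For illustration only.

def log2_bucket(ns):
    """Return the power-of-two bucket lower bound for a latency value."""
    if ns < 1:
        return 0
    b = 1
    while b * 2 <= ns:
        b *= 2
    return b

def hist(samples):
    """Aggregate latency samples (in ns) into log2 buckets."""
    buckets = {}
    for ns in samples:
        b = log2_bucket(ns)
        buckets[b] = buckets.get(b, 0) + 1
    return buckets

if __name__ == "__main__":
    latencies = [900, 1100, 1500, 3000, 70000]
    for lo, count in sorted(hist(latencies).items()):
        print(f"[{lo}, {lo * 2}) ns: {count}")
```

Log2 buckets keep per-event cost and memory constant regardless of sample count, which is why in-kernel tracers prefer histograms over emitting every raw latency to user space.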
This document summarizes Brendan Gregg's experiences working at Netflix for over 4.5 years. Some key points include:
- The company culture at Netflix is openly documented and encourages independent decision making, open communication, and sharing information broadly.
- Gregg's first meeting involved an expected "intense debate" but was actually professional and respectful.
- Netflix values judgment, communication, curiosity, courage, and other traits that allow the culture and architecture to complement each other.
- The cloud architecture is designed to be resilient through practices like chaos engineering and rapid deployments without approvals, in line with the culture of freedom and responsibility.
The document describes a biolatency tool that traces block device I/O latency using eBPF. It discusses how the tool was originally written in the bcc framework using C/BPF, but has since been rewritten in the bpftrace framework using a simpler one-liner script. It provides examples of the bcc and bpftrace implementations of biolatency.
YOW2018 Cloud Performance Root Cause Analysis at Netflix - Brendan Gregg
Keynote by Brendan Gregg for YOW! 2018. Video: https://www.youtube.com/watch?v=03EC8uA30Pw . Description: "At Netflix, improving the performance of our cloud means happier customers and lower costs, and involves root cause analysis of applications, runtimes, operating systems, and hypervisors, in an environment of 150k cloud instances that undergo numerous production changes each week. Apart from the developers who regularly optimize their own code, we also have a dedicated performance team to help with any issue across the cloud, and to build tooling to aid in this analysis. In this session we will summarize the Netflix environment, procedures, and tools we use and build to do root cause analysis on cloud performance issues. The analysis performed may be cloud-wide, using self-service GUIs such as our open source Atlas tool, or focused on individual instances, and use our open source Vector tool, flame graphs, Java debuggers, and tooling that uses Linux perf, ftrace, and bcc/eBPF. You can use these open source tools in the same way to find performance wins in your own environment."
Talk by Brendan Gregg and Martin Spier for the Linkedin Performance Engineering meetup on Nov 8, 2018. FlameScope is a visualization for performance profiles that helps you study periodic activity, variance, and perturbations, with a heat map for navigation and flame graphs for code analysis.
Talk by Brendan Gregg for All Things Open 2018. "At over one thousand code commits per week, it's hard to keep up with Linux developments. This keynote will summarize recent Linux performance features, for a wide audience: the KPTI patches for Meltdown, eBPF for performance observability and the new open source tools that use it, Kyber for disk I/O scheduling, BBR for TCP congestion control, and more. This is about exposure: knowing what exists, so you can learn and use it later when needed. Get the most out of your systems with the latest Linux kernels and exciting features."
Talk for USENIX LISA17: "Containers pose interesting challenges for performance monitoring and analysis, requiring new analysis methodologies and tooling. Resource-oriented analysis, as is common with systems performance tools and GUIs, must now account for both hardware limits and soft limits, as implemented using cgroups. A reverse diagnosis methodology can be applied to identify whether a container is resource constrained, and by which hard or soft resource. The interaction between the host and containers can also be examined, and noisy neighbors identified or exonerated. Performance tooling can need special usage or workarounds to function properly from within a container or on the host, to deal with different privilege levels and name spaces. At Netflix, we're using containers for some microservices, and care very much about analyzing and tuning our containers to be as fast and efficient as possible. This talk will show you how to identify bottlenecks in the host or container configuration, in the applications by profiling in a container environment, and how to dig deeper into kernel and container internals."
- great designers with an eye for UX/UI with 10+ years of experience
- project managers with development background who speak both tech and non-tech
- QA specialists
- Conversion Rate Optimisation - CRO experts
They are all working together to provide you with the best possible service. We are passionate about WordPress, and we love creating custom solutions that help our clients achieve their goals.
At WPRiders, we are committed to building long-term relationships with our clients. We believe in accountability, in doing the right thing, as well as in transparency and open communication. You can read more about WPRiders on the About us page.
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Chris Swan
Have you noticed the OpenSSF Scorecard badges on the official Dart and Flutter repos? It's Google's way of showing that they care about security. Practices such as pinning dependencies, branch protection, required reviews, continuous integration tests etc. are measured to provide a score and accompanying badge.
You can do the same for your projects, and this presentation will show you how, with an emphasis on the unique challenges that come up when working with Dart and Flutter.
The session will provide a walkthrough of the steps involved in securing a first repository, and then what it takes to repeat that process across an organization with multiple repos. It will also look at the ongoing maintenance involved once scorecards have been implemented, and how aspects of that maintenance can be better automated to minimize toil.
Best Practices for Effectively Running dbt in Airflow.pdfTatiana Al-Chueyr
As a popular open-source library for analytics engineering, dbt is often used in combination with Airflow. Orchestrating and executing dbt models as DAGs ensures an additional layer of control over tasks, observability, and provides a reliable, scalable environment to run dbt models.
This webinar will cover a step-by-step guide to Cosmos, an open source package from Astronomer that helps you easily run your dbt Core projects as Airflow DAGs and Task Groups, all with just a few lines of code. We’ll walk through:
- Standard ways of running dbt (and when to utilize other methods)
- How Cosmos can be used to run and visualize your dbt projects in Airflow
- Common challenges and how to address them, including performance, dependency conflicts, and more
- How running dbt projects in Airflow helps with cost optimization
Webinar given on 9 July 2024
How RPA Help in the Transportation and Logistics Industry.pptxSynapseIndia
Revolutionize your transportation processes with our cutting-edge RPA software. Automate repetitive tasks, reduce costs, and enhance efficiency in the logistics sector with our advanced solutions.
Sustainability requires ingenuity and stewardship. Did you know Pigging Solutions pigging systems help you achieve your sustainable manufacturing goals AND provide rapid return on investment.
How? Our systems recover over 99% of product in transfer piping. Recovering trapped product from transfer lines that would otherwise become flush-waste, means you can increase batch yields and eliminate flush waste. From raw materials to finished product, if you can pump it, we can pig it.
Performance Analysis: new tools and concepts from the cloud
1. Performance Analysis:
new tools and concepts
from the cloud
Brendan Gregg
Lead Performance Engineer, Joyent
brendan.gregg@joyent.com
SCaLE10x
Jan, 2012
2. whoami
• I do performance analysis
• I also write performance tools out of necessity
• Was Brendan @ Sun Microsystems, Oracle,
now Joyent
3. Joyent
• Cloud computing provider
• Cloud computing software
• SmartOS
• host OS, and guest via OS virtualization
• Linux, Windows
• guest via KVM
4. Agenda
• Data
• Example problems & solutions
• How cloud environments complicate performance
• Theory
• Performance analysis
• Summarize new tools & concepts
• This talk uses SmartOS and DTrace to illustrate
concepts that are applicable to most OSes.
5. Data
• Example problems:
• CPU
• Memory
• Disk
• Network
• Some have neat solutions, some messy, some none
• This is real world
• Some I’ve covered before, some I haven’t
7. CPU utilization: problem
• Would like to identify:
• single or multiple CPUs at 100% utilization
• average, minimum and maximum CPU utilization
• CPU utilization balance (tight or loose distribution)
• time-based characteristics
changing/bursting? burst interval, burst length
• For small to large environments
• entire datacenters or clouds
15. CPU utilization
• Available in Cloud Analytics (Joyent)
• Clicking highlights and shows details; eg, hostname:
16. CPU utilization
• Utilization heat map also suitable and used for:
• disks
• network interfaces
• Utilization as a metric can be a bit misleading
• really a percent busy over a time interval
• devices may accept more work at 100% busy
• may not directly relate to performance impact
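As a toy illustration (not from the talk) of why "percent busy over a time interval" can mislead: two devices can report identical utilization while behaving very differently. The numbers and workloads below are made up.

```python
def utilization_pct(busy_periods_ms, interval_ms):
    """Percent of the interval spent busy, from (start, end) busy spans in ms."""
    busy = sum(end - start for start, end in busy_periods_ms)
    return 100.0 * busy / interval_ms

steady = [(i * 100, i * 100 + 50) for i in range(10)]  # 50 ms busy every 100 ms
bursty = [(0, 500)]                                    # one 500 ms burst, then idle

print(utilization_pct(steady, 1000))  # 50.0
print(utilization_pct(bursty, 1000))  # 50.0 -- same number, different queueing behavior
```

During the burst, the second device is saturated and queues work; the interval average hides that.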
18. CPU usage
• Given a CPU is hot, what is it doing?
• Beyond just vmstat’s usr/sys ratio
• Profiling (sampling at an interval) the program
counter or stack back trace
• user-land stack for %usr
• kernel stack for %sys
• Many tools can do this to some degree
• Developer Studios/DTrace/oprofile/...
23. CPU usage: Flame Graphs
• Just some Perl that turns DTrace output into an
interactive SVG: mouse-over elements for details
• It’s on github
• http://github.com/brendangregg/FlameGraph
• Works on kernel stacks, and both user+kernel
• Shouldn’t be hard to have it process oprofile, etc.
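The input that flamegraph.pl consumes is just aggregated stacks in a "frame1;frame2;... count" folded format; that's why other profilers' output is easy to adapt. A minimal Python sketch of the collapse step, with made-up frame names:

```python
from collections import Counter

# profiled stack samples, root-first / leaf-last; frame names are hypothetical
samples = [
    ("main", "do_work", "hash_lookup"),
    ("main", "do_work", "hash_lookup"),
    ("main", "do_io", "write"),
]

# fold identical stacks into one line each, with a sample count
folded = Counter(";".join(stack) for stack in samples)
for stack, count in sorted(folded.items()):
    print(stack, count)
# main;do_io;write 1
# main;do_work;hash_lookup 2
```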
24. CPU usage: on the Cloud
• Flame Graphs were born out of necessity on
Cloud environments:
• Perf issues need quick resolution
(you just got hackernews’d)
• Everyone is running different versions of everything
(don’t assume you’ve seen the last of old CPU-hot
code-path issues that have been fixed)
25. CPU usage: summary
• Data can be available
• For cloud computing: easy for operators to fetch on
OS virtualized environments; otherwise agent driven,
and possibly other difficulties (access to CPU
instrumentation counter-based interrupts)
• Using a new visualization
26. CPU latency
• CPU dispatcher queue latency
• thread is ready-to-run, and waiting its turn
• Observable in coarse ways:
• vmstat’s r
• high load averages
• Less coarse, with microstate accounting
• prstat -mL’s LAT
• How much is it affecting application performance?
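DTrace summarizes latencies in-kernel as power-of-two distributions via its quantize() aggregation. A minimal Python imitation of that bucketing (a sketch, not DTrace itself):

```python
def quantize(values):
    """Count values into power-of-two buckets, like DTrace's quantize()."""
    buckets = {}
    for v in values:
        b = 1
        while b * 2 <= v:   # find largest power of two <= v
            b *= 2
        buckets[b] = buckets.get(b, 0) + 1
    return buckets

# e.g. nanosecond latencies -> bucketed counts
print(quantize([700, 900, 1500, 3000, 70000]))
# {512: 2, 1024: 1, 2048: 1, 65536: 1}
```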
27. CPU latency: zonedispqlat.d
• Using DTrace to trace kernel scheduler events:
# ./zonedispqlat.d
Tracing...
Note: outliers (> 1 secs) may be artifacts due to the use of scalar globals
(sorry).

CPU disp queue latency by zone (ns):

  dbprod-045
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             10210
            2048 |@@@@@@@@@@                               3829
            4096 |@                                        514
            8192 |                                         94
           16384 |                                         0
           32768 |                                         0
           65536 |                                         0
          131072 |                                         0
          262144 |                                         0
          524288 |                                         0
         1048576 |                                         1
         2097152 |                                         0
         4194304 |                                         0
         8388608 |                                         1
        16777216 |                                         0
[...]
28. CPU latency: zonedispqlat.d
• CPU dispatcher queue latency by zonename
(zonedispqlat.d), work in progress:
#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
	printf("Tracing...\n");
	printf("Note: outliers (> 1 secs) may be artifacts due to the ");
	printf("use of scalar globals (sorry).\n\n");
}

sched:::enqueue
{
	/* save timestamp on enqueue */
	/* scalar global (I don't think this can be thread local) */
	start[args[0]->pr_lwpid, args[1]->pr_pid] = timestamp;
}

sched:::dequeue
/this->start = start[args[0]->pr_lwpid, args[1]->pr_pid]/
{
	/* calculate delta on dequeue */
	this->time = timestamp - this->start;
	/* workaround since zonename isn't a member of args[1]... */
	this->zone = ((proc_t *)args[1]->pr_addr)->p_zone->zone_name;
	@[stringof(this->zone)] = quantize(this->time);
	start[args[0]->pr_lwpid, args[1]->pr_pid] = 0;
}

tick-1sec
{
	printf("CPU disp queue latency by zone (ns):\n");
	printa(@);
	trunc(@);
}
29. CPU latency: zonedispqlat.d
• Instead of zonename, this could be process name, ...
• Tracing scheduler enqueue/dequeue events and
saving timestamps costs CPU overhead
• they are frequent
• I’d prefer to only trace dequeue, and reuse the
existing microstate accounting timestamps
• but one problem is a clash between unscaled and
scaled timestamps
30. CPU latency: on the Cloud
• With virtualization, you can have:
high CPU latency with idle CPUs,
due to an instance consuming its CPU quota
• OS virtualization
• not visible in vmstat r
• is visible as part of prstat -mL’s LAT
• more kstats recently added to SmartOS
including nsec_waitrq (total run queue wait by zone)
• Hardware virtualization
• vmstat st (stolen)
31. CPU latency: caps
• CPU cap latency from the host (zonecapslat.d):
#!/usr/sbin/dtrace -s

#pragma D option quiet

sched:::cpucaps-sleep
{
	start[args[0]->pr_lwpid, args[1]->pr_pid] = timestamp;
}

sched:::cpucaps-wakeup
/this->start = start[args[0]->pr_lwpid, args[1]->pr_pid]/
{
	this->time = timestamp - this->start;
	/* workaround since zonename isn't a member of args[1]... */
	this->zone = ((proc_t *)args[1]->pr_addr)->p_zone->zone_name;
	@[stringof(this->zone)] = quantize(this->time);
	start[args[0]->pr_lwpid, args[1]->pr_pid] = 0;
}

tick-1sec
{
	printf("CPU caps latency by zone (ns):\n");
	printa(@);
	trunc(@);
}
32. CPU latency: summary
• Partial data available
• New tools/metrics created
• although current DTrace solutions have overhead;
we should be able to improve that
• although, new kstats may be sufficient
34. Memory: problem
• Riak database has endless memory growth.
• expected 9GB, after two days:
$ prstat -c 1
Please wait...
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 21722 103        43G   40G cpu0    59    0  72:23:41 2.6% beam.smp/594
 15770 root     7760K  540K sleep   57    0  23:28:57 0.9% zoneadmd/5
    95 root        0K    0K sleep   99  -20   7:37:47 0.2% zpool-zones/166
 12827 root      128M   73M sleep  100    -   0:49:36 0.1% node/5
 10319 bgregg     10M 6788K sleep   59    0   0:00:00 0.0% sshd/1
 10402 root       22M  288K sleep   59    0   0:18:45 0.0% dtrace/1
[...]
• Eventually hits paging and terrible performance
• needing a restart
• Is this a memory leak?
Or application growth?
35. Memory: scope
• Identify the subsystem and team responsible
Subsystem
Team
Application
Voxer
Riak
Basho
Erlang
Ericsson
SmartOS
Joyent
36. Memory: heap profiling
• What is in the heap?
$ pmap 14719
14719:  beam.smp
0000000000400000       2168K r-x--  /opt/riak/erts-5.8.5/bin/beam.smp
000000000062D000        328K rw---  /opt/riak/erts-5.8.5/bin/beam.smp
000000000067F000    4193540K rw---  /opt/riak/erts-5.8.5/bin/beam.smp
00000001005C0000    4194296K rw---    [ anon ]
00000002005BE000    4192016K rw---    [ anon ]
0000000300382000    4193664K rw---    [ anon ]
00000004002E2000    4191172K rw---    [ anon ]
00000004FFFD3000    4194040K rw---    [ anon ]
00000005FFF91000    4194028K rw---    [ anon ]
00000006FFF4C000    4188812K rw---    [ anon ]
00000007FF9EF000     588224K rw---    [ heap ]
[...]
• ... and why does it keep growing?
• Would like to answer these in production
• Without restarting apps. Experimentation (backend=mmap,
other allocators) wasn’t working.
37. Memory: heap profiling
• libumem was used for multi-threaded performance
• libumem == user-land slab allocator
• detailed observability can be enabled, allowing heap
profiling and leak detection
• While designed with speed and production use in
mind, it still comes with some cost (time and space),
and isn't on by default.
• UMEM_DEBUG=audit
42. Memory: heap growth
• Tracing why the heap grows via brk():
# dtrace -n 'syscall::brk:entry /execname == "beam.smp"/ { ustack(); }'
dtrace: description 'syscall::brk:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
 10     18                        brk:entry
              libc.so.1`_brk_unlocked+0xa
              libumem.so.1`vmem_sbrk_alloc+0x84
              libumem.so.1`vmem_xalloc+0x669
              libumem.so.1`vmem_alloc+0x14f
              libumem.so.1`vmem_xalloc+0x669
              libumem.so.1`vmem_alloc+0x14f
              libumem.so.1`umem_alloc+0x72
              libumem.so.1`malloc+0x59
              libstdc++.so.6.0.14`_Znwm+0x20
              libstdc++.so.6.0.14`_Znam+0x9
              eleveldb.so`_ZN7leveldb9ReadBlockEPNS_16RandomAccessFileERKNS_11Rea...
              eleveldb.so`_ZN7leveldb5Table11BlockReaderEPvRKNS_11ReadOptionsERKN...
              eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIterator13InitDataBl...
              eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIterator4SeekERKNS_5...
              eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIterator4SeekERKNS_5...
              eleveldb.so`_ZN7leveldb12_GLOBAL__N_115MergingIterator4SeekERKNS_5S...
              eleveldb.so`_ZN7leveldb12_GLOBAL__N_16DBIter4SeekERKNS_5SliceE+0xcc
              eleveldb.so`eleveldb_get+0xd3
              beam.smp`process_main+0x6939
              beam.smp`sched_thread_func+0x1cf
              beam.smp`thr_wrapper+0xbe

This shows the user-land stack trace for every heap growth.
43. Memory: heap growth
• More DTrace showed the size of the malloc()s
causing the brk()s:
# dtrace -x dynvarsize=4m -n '
    pid$target::malloc:entry { self->size = arg0; }
    syscall::brk:entry /self->size/ { printf("%d bytes", self->size); }
    pid$target::malloc:return { self->size = 0; }' -p 17472
dtrace: description 'pid$target::malloc:entry ' matched 7 probes
CPU     ID                    FUNCTION:NAME
  0     44                        brk:entry 8343520 bytes
  0     44                        brk:entry 8343520 bytes
[...]
• These 8 Mbyte malloc()s grew the heap
• Even though the heap has Gbytes not in use
• This is starting to look like an OS issue
44. Memory: allocator internals
• More tools were created:
• Show memory entropy (+ malloc - free)
along with heap growth, over time
• Show codepath taken for allocations
compare successful with unsuccessful (heap growth)
• Show allocator internals: sizes, options, flags
• And run in the production environment
• Briefly: tracing frequent allocations does cost overhead
• Casting light into what was a black box
46. Memory: solution
• These new tools and metrics pointed to the
allocation algorithm “instant fit”
• Someone had suggested this earlier; the tools provided solid evidence that this
really was the case here
• A new version of libumem was built to force use of
VM_BESTFIT
• and added by Robert Mustacchi as a tunable:
UMEM_OPTIONS=allocator=best
• Customer restarted Riak with new libumem version
• Problem solved
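A toy model (not libumem's actual code) of why an "instant fit" policy can keep growing the heap while gigabytes of free memory sit unused: to stay constant-time, instant fit only takes segments from the next power-of-two size class above the request, so a just-under-8 MB free segment never satisfies a just-under-8 MB request, and the allocator falls back to growing the heap. Best fit scans every free segment. Segment sizes here are illustrative.

```python
def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p

def instant_fit(free_segments, size):
    """Take a segment only from the power-of-two class covering the request."""
    want = next_pow2(size)
    for seg in free_segments:
        if seg >= want:
            free_segments.remove(seg)
            return seg
    return None  # miss: allocator must grow the heap via brk()/sbrk()

def best_fit(free_segments, size):
    """Scan all free segments and take the smallest one that fits."""
    fits = [seg for seg in free_segments if seg >= size]
    if not fits:
        return None
    seg = min(fits)
    free_segments.remove(seg)
    return seg

free = [8_300_000, 8_350_000]           # ~8 MB free segments on the heap
print(instant_fit(free[:], 8_343_520))  # None -> heap grows despite free memory
print(best_fit(free[:], 8_343_520))     # 8350000 -> reuses an existing segment
```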
47. Memory: on the Cloud
• With OS virtualization, you can have:
Paging without scanning
• paging == swapping blocks with physical storage
• swapping == swapping entire threads between main
memory and physical storage
• Resource control paging is unrelated to the page
scanner, so, no vmstat scan rate (sr) despite
anonymous paging
• More new tools: DTrace sysinfo:::anonpgin by
process name, zonename
48. Memory: summary
• Superficial data available, detailed info not
• not by default
• Many new tools were created
• not easy, but made possible with DTrace
51. Disk: on the Cloud
• Tenants can’t see each other
• Maybe a neighbor is doing a backup?
• Maybe a neighbor is running a benchmark?
• Can’t see their processes (top/prstat)
• Blame what you can’t see
52. Disk: VFS
• Applications usually talk to a file system
• and are hurt by file system latency
• Disk I/O can be:
• unrelated to the application: asynchronous tasks
• inflated from what the application requested
• deflated from what the application requested
• blind to issues caused higher up the kernel stack
53. Disk: issues with iostat(1)
• Unrelated:
• other applications / tenants
• file system prefetch
• file system dirty data flushing
• Inflated:
• rounded up to the next file system record size
• extra metadata for on-disk format
• read-modify-write of RAID5
54. Disk: issues with iostat(1)
• Deflated:
• read caching
• write buffering
• Blind:
• lock contention in the file system
• CPU usage by the file system
• file system software bugs
• file system queue latency
55. Disk: issues with iostat(1)
• blind (continued):
• disk cache flush latency (if your file system does it)
• file system I/O throttling latency
• I/O throttling is a new ZFS feature for cloud
environments
• adds artificial latency to file system I/O to throttle it
• added by Bill Pijewski and Jerry Jelenik of Joyent
58. Disk: file system latency
• Tracing zfs events using zfsslower.d:
# ./zfsslower.d 10
TIME                 PROCESS  D   KB   ms FILE
2011 May 17 01:23:12 mysqld   R   16   19 /z01/opt/mysql5-64/data/xxxxx/xxxxx.ibd
2011 May 17 01:23:13 mysqld   W   16   10 /z01/var/mysql/xxxxx/xxxxx.ibd
2011 May 17 01:23:33 mysqld   W   16   11 /z01/var/mysql/xxxxx/xxxxx.ibd
2011 May 17 01:23:33 mysqld   W   16   10 /z01/var/mysql/xxxxx/xxxxx.ibd
2011 May 17 01:23:51 httpd    R   56   14 /z01/home/xxxxx/xxxxx/xxxxx/xxxxx/xxxxx
^C
• Argument is the minimum latency in milliseconds
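The core logic of zfsslower.d is just threshold filtering over traced file-I/O events; a plain-Python sketch of the same idea, with hypothetical event records standing in for what the DTrace script captures:

```python
# made-up traced events: process, direction (R/W), size, latency, filename
events = [
    {"process": "mysqld", "d": "R", "kb": 16, "ms": 19, "file": "/z01/a.ibd"},
    {"process": "mysqld", "d": "W", "kb": 16, "ms": 2,  "file": "/z01/b.ibd"},
    {"process": "httpd",  "d": "R", "kb": 56, "ms": 14, "file": "/z01/index"},
]

def slower_than(events, min_ms):
    """Keep only I/O at or above the minimum latency (the script's argument)."""
    return [e for e in events if e["ms"] >= min_ms]

for e in slower_than(events, 10):
    print(e["process"], e["d"], e["kb"], e["ms"], e["file"])
```

Only the 19 ms and 14 ms events survive the 10 ms threshold; the overhead win is that fast (common) I/O is never emitted.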
59. Disk: file system latency
• Can trace this from other locations too:
• VFS layer: filter on desired file system types
• syscall layer: filter on file descriptors for file systems
• application layer: trace file I/O calls
60. Disk: file system latency
• And using SystemTap:
# ./vfsrlat.stp
Tracing... Hit Ctrl-C to end
^C
[..]
ext4 (ns):
  value |-------------------------------------------------- count
    256 |                                                       0
    512 |                                                       0
   1024 |                                                      16
   2048 |                                                      17
   4096 |                                                       4
   8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16321
  16384 |                                                      50
  32768 |                                                       1
  65536 |                                                      13
 131072 |                                                       0
 262144 |                                                       0
• Traces vfs.read to vfs.read.return, and gets the FS
type via: $file->f_path->dentry->d_inode->i_sb->s_type->name
• Warning: this script has crashed ubuntu/CentOS; I’m told RHEL is better
61. Disk: file system visualizations
• File system latency as a heat map (Cloud Analytics):
• This screenshot shows severe outliers
62. Disk: file system visualizations
• Sometimes the heat map is very surprising:
• This screenshot is from the Oracle ZFS Storage Appliance
65. Network: problem
• TCP SYNs queue in-kernel until they are accept()ed
• The queue length is the TCP listen backlog
• may be set in listen()
• and limited by a system tunable (usually 128)
• on SmartOS: tcp_conn_req_max_q
• What if the queue remains full
• eg, application is overwhelmed with other work,
• or CPU starved
• ... and another SYN arrives?
66. Network: TCP listen drops
• Packet is dropped by the kernel
• fortunately a counter is bumped:
$ netstat -s | grep Drop
tcpTimRetransDrop   =     56     tcpTimKeepalive     =  2582
tcpTimKeepaliveProbe=   1594     tcpTimKeepaliveDrop =    41
tcpListenDrop       =3089298     tcpListenDropQ0     =     0
tcpHalfOpenDrop     =      0     tcpOutSackRetrans   =1400832
icmpOutDrops        =      0     icmpOutErrors       =     0
sctpTimRetrans      =      0     sctpTimRetransDrop  =     0
sctpTimHearBeatProbe=      0     sctpTimHearBeatDrop =     0
sctpListenDrop      =      0     sctpInClosed        =     0
• Remote host waits, and then retransmits
• TCP retransmit interval; usually 1 or 3 seconds
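A toy model (not kernel code) of the mechanism above: SYNs queue until accept()ed, and once the queue reaches the backlog limit, further SYNs are dropped and the tcpListenDrop counter is bumped. The arrival and accept rates below are made up.

```python
def simulate(arrivals, accepts_per_tick, backlog):
    """Count SYN drops for per-tick SYN arrival counts against a listen backlog."""
    queued, drops = 0, 0
    for syns in arrivals:
        for _ in range(syns):
            if queued < backlog:
                queued += 1       # SYN queues until the app calls accept()
            else:
                drops += 1        # queue full: kernel drops; client retransmits
        queued = max(0, queued - accepts_per_tick)
    return drops

# an overwhelmed accept loop: 10 SYNs/tick, only 2 accept()s/tick, backlog of 8
print(simulate([10] * 5, 2, 8))   # 34
```

Raising the backlog only buys time; if the accept rate stays below the arrival rate, drops are inevitable.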
70. Network: tcpconnreqmaxq.d
tcp_conn_req_cnt_q distributions:

  cpid:3063  max_q:8
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
               1 |                                         0

  cpid:11504  max_q:128
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@             7279
               1 |@@                                      405
               2 |@                                       255
               4 |@                                       138
               8 |                                        81
              16 |                                        83
              32 |                                        62
              64 |                                        67
             128 |                                        34
             256 |                                        0

tcpListenDrops:
  cpid:11504  max_q:128                          34

(The queue length is measured on each SYN event; max_q is the
tcp_conn_req_max_q value in use.)
71. Network: tcplistendrop.d
• More details can be fetched as needed:
# ./tcplistendrop.d
TIME                  SRC-IP          PORT       DST-IP           PORT
2012 Jan 19 01:22:49  10.17.210.103   25691  ->  192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.108   18423  ->  192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.116   38883  ->  192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.117   10739  ->  192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.112   27988  ->  192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.106   28824  ->  192.192.240.212  80
2012 Jan 19 01:22:49  10.12.143.16    65070  ->  192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.100   56392  ->  192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.99    24628  ->  192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.98    11686  ->  192.192.240.212  80
2012 Jan 19 01:22:49  10.17.210.101   34629  ->  192.192.240.212  80
[...]
• Just tracing the drop code-path
• Don’t need to pay the overhead of sniffing all packets
72. Network: DTrace code
• Key code from tcplistendrop.d:
fbt::tcp_input_listener:entry { self->mp = args[1]; }
fbt::tcp_input_listener:return { self->mp = 0; }

mib:::tcpListenDrop
/self->mp/
{
	this->iph = (ipha_t *)self->mp->b_rptr;
	this->tcph = (tcph_t *)(self->mp->b_rptr + 20);
	printf("%-20Y %-18s %-5d -> %-18s %-5d\n", walltimestamp,
	    inet_ntoa(&this->iph->ipha_src),
	    ntohs(*(uint16_t *)this->tcph->th_lport),
	    inet_ntoa(&this->iph->ipha_dst),
	    ntohs(*(uint16_t *)this->tcph->th_fport));
}
• This uses the unstable interface fbt provider
• a stable tcp provider now exists, which is better for
more common tasks - like connections by IP
73. Network: summary
• For TCP, while many counters are available, they are
system wide integers
• Custom tools can show more details
• addresses and ports
• kernel state
• needs kernel access and dynamic tracing
74. Data Recap
• Problem types
  • CPU utilization    scalability
  • CPU usage          scalability
  • CPU latency        observability
  • Memory             observability
  • Disk               observability
  • Network            observability
75. Data Recap
• Problem types, solution types
  • CPU utilization    scalability
  • CPU usage          scalability
  • CPU latency        observability
  • Memory             observability
  • Disk               observability
  • Network            observability
• Solution types: visualizations, metrics
78. Performance Issues
• Strategy
• Step 1: is there a problem?
• Step 2: which subsystem/team is responsible?
• Difficult to get past these steps without reliable
metrics
79. Problem Space
• Myths
• Vendors provide good metrics with good coverage
• The problem is to line-graph them
• Realities
• Metrics can be wrong, incomplete and misleading,
requiring time and expertise to interpret
• Line graphs can hide issues
80. Problem Space
• Cloud computing confuses matters further:
• hiding metrics from neighbors
• throttling performance due to invisible neighbors
81. Example Problems
• Included:
• Understanding utilization across 5,312 CPUs
• Using disk I/O metrics to explain application
performance
• A lack of metrics for memory growth, packet drops, ...
82. Example Solutions: tools
• Device utilization heat maps for CPUs
• Flame graphs for CPU profiling
• CPU dispatcher queue latency by zone
• CPU caps latency by zone
• malloc() size profiling
• Heap growth stack backtraces
• File system latency distributions
• File system latency tracing
• TCP accept queue length distribution
• TCP listen drop tracing with details
83. Key Concepts
• Visualizations
• heat maps for device utilization and latency
• flame graphs
• Custom metrics often necessary
• Latency-based for issue analysis
• If coding isn’t practical/timely, use dynamic tracing
• Cloud Computing
• Provide observability (often to show what the problem isn’t)
• Develop new metrics for resource control effects
84. DTrace
• Many problems were only solved thanks to DTrace
• In the SmartOS cloud environment:
• The compute node (global zone) can DTrace
everything (except for KVM guests, for which it has a
limited view: resource I/O + some MMU events, so far)
• SmartMachines (zones) have the DTrace syscall,
profile (their user-land only), pid and USDT providers
• Joyent Cloud Analytics uses DTrace from the global
zone to give extended details to customers
85. Performance
• The more you know, the more you don’t
• Hopefully I’ve turned some unknown-unknowns
into known-unknowns
86. Thank you
• Resources:
• http://dtrace.org/blogs/brendan
• More CPU utilization visualizations:
http://dtrace.org/blogs/brendan/2011/12/18/visualizing-device-utilization/
• Flame Graphs: http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
and http://github.com/brendangregg/FlameGraph
• More iostat(1) & file system latency discussion:
http://dtrace.org/blogs/brendan/tag/filesystem-2/
• Cloud Analytics:
• OSCON slides: http://dtrace.org/blogs/dap/files/2011/07/ca-oscon-data.pdf
• Joyent: http://joyent.com
• brendan@joyent.com