SlideShare a Scribd company logo
Brought to you by
Three Perspectives on
Measuring Latency
Geoffrey Beausire
SRE at Criteo
Measuring Latency
■ Focusing on measuring latency for monitoring
■ Service Level Indicators (SLI) and Objectives (SLO)
● https://sre.google/sre-book/service-level-objectives/
■ You can also check out How to Measure Latency by Heinrich Hartmann
● https://www.p99conf.io/session/how-to-measure-latency/
A Practical Example
What we do:
● AdTech
● Auction based (Real-time bidding)
● Highest bidder gets to display an ad
● Participate in 10M auctions per second
Challenges:
● Need a lot of data to bid the right price
● Low response time (<100ms P99)

Recommended for you

Continuous Performance Testing
Continuous Performance TestingContinuous Performance Testing
Continuous Performance Testing

Applying the power of Continuous Delivery to performance testing. Process, techniques, best practices. This talk describes a pragmatic approach to building a robust performance testing strategy.

continuous deliveryperformancetesting
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional Management

Ricardo Jiménez-Peris, PhD, researcher and founder of LeanXcale, explains how his latest invention allows linear scale of relational databases.

big datadatabaseiot
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)

This document discusses building resilient predictive data pipelines. It begins by distinguishing between ETL and predictive data pipelines, noting that predictive pipelines require high availability with downtimes of less than an hour. The document then outlines design goals for resilient data pipelines, including being scalable, available, instrumented/monitored/alert-enabled, and quickly recoverable. It proposes using AWS services like SQS, SNS, S3, and Auto Scaling Groups to build such pipelines. The document also recommends using Apache Airflow for workflow automation and scheduling to reliably manage pipelines as directed acyclic graphs. It presents an architecture using these techniques and assesses how well it meets the outlined design goals.

airflowcloud computingdata pipelines
What’s a Key/value store:
■ Allow to retrieve a value with a key
■ Trade features (scan/relational) for scalability and performance
■ Commonly used for cache
At Criteo:
■ 217 clusters spread in 7 data centers
■ 250M QPS
■ SLO: 99% of request <1ms
■ 1 trillions records
■ Powered by Memcached and Aerospike
■ 3500 servers (Bare metal Kubernetes)
Criteo’s Key/Value Store Infrastructure
The Challenges
Key/Value Store (KVS) - Challenges
■ On the critical path of every request
■ Used by many teams and many applications
■ Many servers = more incidents (hardware, network, …)
Key/Value Store (KVS) - Monitoring goals
■ Alerted when there is an issue on our service
● Avoid false positives or non actionable alerts
■ Troubleshoot root causes quickly and efficiently (client, server, in-between)
■ Expose SLI that our users can trust
■ Help users resolve issues autonomously

Recommended for you

Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...

Chronix is a time series database designed specifically for anomaly detection in operational data. It offers several advantages over general purpose time series databases: 1) Chronix uses domain specific optimizations like optional timestamp compression, custom data records, and compression techniques tailored to the repetitive patterns in operational data. 2) It provides a programming interface to pre-compute representations of time series data and add domain-specific columns to speed up anomaly detection queries. 3) Chronix supports exploratory and correlating analyses through its multi-dimensional storage and ability to query on any combination of attributes. It also offers high-level domain-specific analysis functions evaluated server-side.

time seriesbig datadatabase
Webinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computingWebinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computing

Create-Net is a research center that offers cloud computing research, consulting, training, and webinars. This webinar discusses monitoring in the cloud computing era, beginning with introductions to Ceilometer and Monasca. Ceilometer is OpenStack's metering framework that collects data from OpenStack services through agents and notifications. It stores data in a database and provides an API. Monasca is a monitoring as a service platform that processes metrics and events at scale through microservices and stores data for querying and visualization. The webinar concludes with a discussion of trends in cloud monitoring.

ceilometermonascaopenstack
Get the most out of your AWS Redshift investment while keeping cost down
Get the most out of your AWS Redshift investment while keeping cost downGet the most out of your AWS Redshift investment while keeping cost down
Get the most out of your AWS Redshift investment while keeping cost down

Amazon Redshift offers many powerful features. Yet, there are many instances where customers encounter sloppy performance and cost upheavals beyond control. Scaling AWS Redshift clusters to meet the increasing compute and reporting needs, while ensuring optimal cost, performance and security standards is quite a challenge for many organizations. This webinar covered the following, • Understand key design/architectural considerations of AWS Redshift • Tips & Tricks to optimize Cost & Performance • How Agilisium helped clients reduce AWS Redshift run cost up to 40% Presented by: Jay Palaniappan - CTO & Head of Innovation Labs || Smitha Basavaraju - Big Data Architect || Arun Chinnadurai - Associate Director – BD

#costoptimization#performanceoptimization#datawarehousing
Key/Value Store (KVS) - Monitoring goals
■ Alerted when there is an issue on our service
● Avoid false positives or non actionable alerts
■ Troubleshoot root causes quickly and efficiently (client, server, in-between)
■ Expose SLI that our users can trust
■ Help users resolve issues autonomously
To achieve this we need high quality latency measurements!
Where to Measure Latency?
The Key/Value Store Infrastructure
KVS Cluster
Client
KVS Cluster
KVS Cluster
Client
Server Side Monitoring
KVS Cluster
Client
KVS Cluster
KVS Cluster
Client

Recommended for you

High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive Microservices

High-speed reactive microservices (HSRM) are microservices that are in-memory, non-blocking, own their data through leasing, and use streams and batching. They provide advantages like lower costs, ability to handle more traffic with fewer resources, and cohesive codebases. The example service described handles 30k recommendations/second on a single thread through batching, streaming, and data faulting. The document discusses attributes of HSRM like single writer rules and service stores, and related concepts like reactive programming, streams, and service sharding.

reactiveqbitcloud
In Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging serviceIn Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging service

The document discusses in-flux limiting for a multi-tenant logging service. It describes Symantec's logging and metrics architecture using Kafka, Elasticsearch, and InfluxDB. It addresses the issue of ingestion spikes overwhelming InfluxDB and presents a solution to normalize event rates using buffers that allocate ingestion quotas per tenant. The design implements rate limiting using a scheduled task pattern in Storm to track each tenant's event rate over a configurable window and throttle events if the threshold is exceeded.

hadoop summit
Performance eng prakash.sahu
Performance eng prakash.sahuPerformance eng prakash.sahu
Performance eng prakash.sahu

This document discusses performance engineering for batch and web applications. It begins by outlining why performance testing is important. Key factors that influence performance testing include response time, throughput, tuning, and benchmarking. Throughput represents the number of transactions processed in a given time period and should increase linearly with load. Response time is the duration between a request and first response. Tuning improves performance by configuring parameters without changing code. The performance testing process involves test planning, creating test scripts, executing tests, monitoring tests, and analyzing results. Methods for analyzing heap dumps and thread dumps to identify bottlenecks are also provided. The document concludes with tips for optimizing PostgreSQL performance by adjusting the shared_buffers configuration parameter.

performancen+1wily introscope
Client
Node
Server
Client
Node
Server
Latency measurements
Client
Node
Storage
Server
Latency measurements
Node
Storage
Software
Node
Storage
Server
Client
Node
Storage
Server
Node
Storage
Server

Recommended for you

Rate limits and all about
Rate limits and all aboutRate limits and all about
Rate limits and all about

The presentation explains how to setup rate limits, how to work with 429 code, how rate limits are implemented in kubernetes, cni, loadbalancer and so on

ratelimitsmicroservicessystemdesign
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performance

The objective of the engagement is for Citi to have an understanding and path forward to monitor their Confluent Platform and - Platform Monitoring - Maintenance and Upgrade

Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong

Integrating content delivery networks into your application infrastructure can offer many benefits, including major performance improvements for your applications. So understanding how CDNs perform — especially for your specific use cases — is vital. However, testing for measurement is complicated and nuanced, and results in metric overload and confusion. It's becoming increasingly important to understand measurement techniques, what they're telling you, and how to apply them to your actual content. In this session, we'll examine the challenges around measuring CDN performance and focus on the different methods for measurement. We'll discuss what to measure, important metrics to focus on, and different ways that numbers may mislead you. More specifically, we'll cover: Different techniques for measuring CDN performance Differentiating between network footprint and object delivery performance Choosing the right content to test Core metrics to focus on and how each impacts real traffic Understanding cache hit ratio, why it can be misleading, and how to measure for it

future of cdnscdnanalytics
Node
Storage
Software
Node
Storage
Server
Client
Node
Storage
Server
Node
Storage
Server
Node
Storage
Software
Node
Storage
Server
Client
Node
Storage
Server
Node
Storage
Server
Node
Storage
Software
Node
Storage
Server
Client
Node
Storage
Server
Node
Storage
Server
Latency measurements
Client
Service A
Service B
Service C

Recommended for you

Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce

We’ll present details about Argus, a time-series monitoring and alerting platform developed at Salesforce to provide insight into the health of infrastructure as an alternative to systems such as Graphite and Seyren.

salesforceargushbase
Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce

Tom Valine and Bhinav Sura (Salesforce) We’ll present details about Argus, a time-series monitoring and alerting platform developed at Salesforce to provide insight into the health of infrastructure as an alternative to systems such as Graphite and Seyren.

argushbasehbasecon
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...

BlazingMQ is a new open source* distributed message queuing system developed at and published by Bloomberg. It provides highly-performant queues to applications for asynchronous, efficient, and reliable communication. This system has been used at scale at Bloomberg for eight years, where it moves terabytes of data and billions of messages across tens of thousands of queues in production every day. BlazingMQ provides highly-available, fault-tolerant queues courtesy of replication based on the Raft consensus algorithm. In addition, it provides a rich set of enterprise message routing strategies, enabling users to implement a variety of scenarios for message processing. Written in C++ from the ground up, BlazingMQ has been architected with low latency as one of its core requirements. This has resulted in some unique design and implementation choices at all levels of the system, such as its lock-free threading model, custom memory allocators, compact wire protocol, multi-hop network topology, and more. This talk will provide an overview of BlazingMQ. We will then delve into the system’s core design principles, architecture, and implementation details in order to explore the crucial role they play in its performance and reliability. *BlazingMQ will be released as open source between now and P99 (exact timing is still TBD)

Client
Service A
Service B
Service C
Latency measurements
Server Side Monitoring - Summary
■ Paramount to understand the behavior of the system aka why it is slow
■ Service time should be measured as close as possible from the client
■ Troubleshooting performance is easier knowing the latency of dependencies:
● Latency with storage
● Latency with other servers
■ Metrics need to be high precision: average is not enough, at least p99!
Server Side - Downsides
■ Doesn’t take into account what is between the user and the server (network,
load balancer, API gateways, kernel, etc..)
■ Might not exist or be limited/low resolution
■ Might be impacted directly by user workloads (heavy queries)
This can make it hard to depend on them for SLx (and alerting)
Client Server

Recommended for you

Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...

At Knewton we operate across five different VPCs a total of 29 clusters, each ranging from 3 nodes to 24 nodes. For a team of three to maintain this is not herculean, however good tools to diagnose issues and gather information in a distributed manner are vital to moving quickly and minimizing engineering time spent. The database team at Knewton has been successfully using a combination of Ansible and custom open sourced tools to maintain and improve the Cassandra deployment at Knewton. I will be talking about several of these tools and giving examples of how we are using them. Specifically I will discuss the cassandra-tracing tool, which analyzes the contents of the system_traces keyspace, and the cassandra-stat tool, which gives real-time output of the operations of a cassandra cluster. Distributed administration with ad-hoc Ansible will also be covered and I will walk through examples of using these commands to identify and remediate clusterwide issues. About the Speaker Jeffrey Berger Lead Database Engineer, Knewton Dr. Jeffrey Berger is currently the lead database engineer at Knewton, an education tech startup in NYC. He joined the tech scene in NYC in 2013 and spent two years working with MongoDB, becoming a certified MongoDB administrator and a MongoDB Master. He received his Cassandra Administrator certification at Cassandra Summit 2015. He holds a Ph.D. in Theoretical Physics from Penn State and spent several years working on high energy nuclear interactions.

eventc*real-time output
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way

This document discusses Cassandra drivers and how to optimize queries. It begins with an introduction to Cassandra drivers and examples of basic usage in Java, Python and Ruby. It then covers the differences between synchronous and asynchronous queries. Prepared statements and consistency levels are also discussed. The document explores how consistency levels, driver policies and node outages impact performance and latency. Hinted handoff is described as a performance optimization that stores hints for missed writes on down nodes. Lastly, it provides best practices around driver usage.

big dataapache cassandranosql
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...

In this presentation, we explore how standard profiling and monitoring methods may fall short in identifying bottlenecks in low-latency data ingestion workflows. Instead, we showcase the power of simple yet clever methods that can uncover hidden performance limitations. Attendees will discover unconventional techniques, including clever logging, targeted instrumentation, and specialized metrics, to pinpoint bottlenecks accurately. Real-world use cases will be presented to demonstrate the effectiveness of these methods. By the end of the session, attendees will be equipped with alternative approaches to identify bottlenecks and optimize their low-latency data ingestion workflows for high throughput.

Client Server
Hardware/
Virtualisation/
Kernel
Client Server
Hardware/
Virtualisation/
Kernel
Network
Client Server
Hardware/
Virtualisation/
Kernel
Network
Load Balancer/
API gateway
Client side monitoring
KVS Cluster
Client
KVS Cluster
KVS Cluster
Client

Recommended for you

Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems

Stream processing is a crucial component of modern data infrastructure, but constructing an efficient and scalable stream processing system can be challenging. Decoupling compute and storage architecture has emerged as an effective solution to these challenges, but it can introduce high latency issues, especially when dealing with complex continuous queries that necessitate managing extra-large internal states. In this talk, we focus on addressing the high latency issues associated with S3 storage in stream processing systems that employ a decoupled compute and storage architecture. We delve into the root causes of latency in this context and explore various techniques to minimize the impact of S3 latency on stream processing performance. Our proposed approach is to implement a tiered storage mechanism that leverages a blend of high-performance and low-cost storage tiers to reduce data movement between the compute and storage layers while maintaining efficient processing. Throughout the talk, we will present experimental results that demonstrate the effectiveness of our approach in mitigating the impact of S3 latency on stream processing. By the end of the talk, attendees will have gained insights into how to optimize their stream processing systems for reduced latency and improved cost-efficiency.

Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter

Widya Salim and Victor Ma will outline the causal impact analysis, framework, and key learnings used to quantify the impact of reducing Twitter's network latency.

Noise Canceling RUM by Tim Vereecke, Akamai
Noise Canceling RUM by Tim Vereecke, AkamaiNoise Canceling RUM by Tim Vereecke, Akamai
Noise Canceling RUM by Tim Vereecke, Akamai

Noisy Real User Monitoring (RUM) data can ruin your P99! We introduce a fresh concept called ""Human Visible Navigations"" (HVN) to tackle this risk; we focus on the experiences you actually care about when talking about the speed of our sites: - Human: We exclude noise coming from bots and synthetic measurements. - Visible: We remove any partial or fully hidden experiences. These tend to be very slow but users don’t see this slowness. - Navigations: We ignore lightning fast back-forward navigations which usually have few optimisation opportunities. Adopting Human Visible Navigations provides you with these key benefits: - Fewer changes staying below the radar - Fewer data fluctuations - Fewer blindspots when finding bottlenecks - Better correlation with business metrics This is supported by plenty of real world examples coming from the world's largest scale modeling site (6M Monthly visits) in combination with aggregated data from the brand new rumarchive.com (open source) After attending this session; your P99 and other percentiles will become less noisy and easier to tune!

Client Side Monitoring - Summary
Advantages:
■ End to end view of the service: actual response time
Downsides:
■ Might require standardized clients
■ Precision requires a lot of metrics (histograms)
■ Impacted by other factors (latency will increase if the application is starved
for CPU, GC pauses…)
How Can We Improve?
Blackbox Monitoring
KVS Cluster
Client
KVS Cluster
KVS Cluster
Client
Probe (fake client)
Blackbox Monitoring
Goals of the probe:
■ Simulate the behavior of clients
■ Repeatable and stable over time
■ Highly isolated (avoid interference)

Recommended for you

Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts

Understanding the impacts of running a containerized Go application inside Kubernetes with a focus on the CPU.

Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...

In this session, Tanel introduces a new open source eBPF tool for efficiently sampling both on-CPU events and off-CPU events for every thread (task) in the OS. Linux standard performance tools (like perf) allow you to easily profile on-CPU threads doing work, but if we want to include the off-CPU timing and reasons for the full picture, things get complicated. Combining eBPF task state arrays with periodic sampling for profiling allows us to get both a system-level overview of where threads spend their time, even when blocked and sleeping, and allow us to drill down into individual thread level, to understand why.

Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts

Performance budgets have been around for more than ten years. Over those years, we’ve learned a lot about what works, what doesn’t, and what we need to improve. In this session, Tammy revisits old assumptions about performance budgets and offers some new best practices. Topics include: • Understanding performance budgets vs. performance goals • Aligning budgets with user experience • Pros and cons of Core Web Vitals • How to stay on top of your budgets to fight regressions

■ Select a number of operations (they should reflect the behavior of users)
● Write (Create/Update)
● Read
● Delete
■ Schedule these operations at repeated interval on every cluster
■ Add a lot of observability (high resolution histograms, logs, tracing, …)
■ Optimize for latency (not throughput or cpu usage)
Implementation
Implementation You can also measure:
■ Availability: can be derived from
latency operations
■ Durability: check that no data
was lost
■ Cross datacenter replication
■ …
■ Select a number of operations (they should reflect the behavior of users)
● Write (Create/Update)
● Read
● Delete
■ Schedule these operations at repeated interval on every cluster
■ Add a lot of observability (high resolution histograms, logs, tracing, …)
■ Optimize for latency (not throughput or cpu usage)
Implementation You can also measure:
■ Availability: can be derived from
latency operations
■ Durability: check that no data
was lost
■ Cross datacenter replication
■ …
■ Select a number of operations (they should reflect the behavior of users)
● Write (Create/Update)
● Read
● Delete
■ Schedule these operations at repeated interval on every cluster
■ Add a lot of observability (high resolution histograms, logs, tracing, …)
■ Optimize for latency (not throughput or cpu usage)
The resulting metrics can be used as Service Level Indicators (SLI)!
In Practice
Internally blackbox monitoring is used actively on many tech:
■ Key/Value stores: Memcached, Aerospike
■ Kafka
■ Metrics (Graphite, Prometheus, VictoriaMetrics)
■ Logs (ELK)
■ Other databases: Ceph, ScyllaDB, Elasticsearch
We open-sourced:
https://github.com/criteo/blackbox-prober

Recommended for you

Using Libtracecmd to Analyze Your Latency and Performance Troubles
Using Libtracecmd to Analyze Your Latency and Performance TroublesUsing Libtracecmd to Analyze Your Latency and Performance Troubles
Using Libtracecmd to Analyze Your Latency and Performance Troubles

Trying to figure out why your application is responding late can be difficult, especially if it is because of interference from the operating system. This talk will briefly go over how to write a C program that can analyze what in the Linux system is interfering with your application. It will use trace-cmd to enable kernel trace events as well as tracing lock functions, and it will then go over a quick tutorial on how to use libtracecmd to read the created trace.dat file to uncover what is the cause of interference to you application.

Reducing P99 Latencies with Generational ZGC
Reducing P99 Latencies with Generational ZGCReducing P99 Latencies with Generational ZGC
Reducing P99 Latencies with Generational ZGC

With the low-latency garbage collector ZGC, GC pause times are no longer a big problem in Java. With sub-millisecond pause times there are instead other things in the GC and JVM that can cause application threads to experience unexpected latencies. This talk will dig into a specific use where the GC pauses are no longer the cause of unexpected latencies and look at how adding generations to ZGC help lower the p99 application latencies.

5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X

Linters are a type of database! They are a collection of lint rules — queries that look for rule violations to report — plus a way to execute those queries over a source code dataset. This is a case study about using database ideas to build a linter that looks for breaking changes in Rust library APIs. Maintainability and performance are key: new Rust releases tend to have mutually-incompatible ways of representing API information, and we cannot afford to reimplement and optimize dozens of rules for each Rust version separately. Fortunately, databases don't require rewriting queries when the underlying storage format or query plan changes! This allows us to ship massive optimizations and support multiple Rust versions without making any changes to the queries that describe lint rules. Ship now, optimize later"" can be a sustainable development practice after all — join us to see how!

Blackbox Monitoring - Summary
Advantages:
■ End to end view of the service
■ Reliable over time
■ Precise
■ Easy to create
Downsides:
■ Artificial
Conclusion
■ Each approaches have trade-offs:
● Server side: precise but lacks the big picture
● Client side: end-to-end but noisy
● Blackbox: end-to-end and reliable but synthetic
■ Measure everything!
■ Use all of them
Brought to you by
Geoffrey Beausire
g.beausire@criteo.com
@geobeau
is hiring: careers.criteo.com

More Related Content

Similar to Three Perspectives on Measuring Latency

Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
 Adventures in Observability: How in-house ClickHouse deployment enabled Inst... Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
Altinity Ltd
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
Lucidworks
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Amazon Web Services
 
Continuous Performance Testing
Continuous Performance TestingContinuous Performance Testing
Continuous Performance Testing
Mark Price
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional Management
Ricardo Jimenez-Peris
 
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)
Sid Anand
 
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Florian Lautenschlager
 
Webinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computingWebinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computing
CREATE-NET
 
Get the most out of your AWS Redshift investment while keeping cost down
Get the most out of your AWS Redshift investment while keeping cost downGet the most out of your AWS Redshift investment while keeping cost down
Get the most out of your AWS Redshift investment while keeping cost down
Agilisium Consulting
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive Microservices
Rick Hightower
 
In Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging serviceIn Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging service
DataWorks Summit/Hadoop Summit
 
Performance eng prakash.sahu
Performance eng prakash.sahuPerformance eng prakash.sahu
Performance eng prakash.sahu
Dr. Prakash Sahu
 
Rate limits and all about
Rate limits and all aboutRate limits and all about
Rate limits and all about
Alexander Tokarev
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performance
confluent
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong
Fastly
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
HBaseCon
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
HBaseCon
 
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
ScyllaDB
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
DataStax
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
DataStax Academy
 

Similar to Three Perspectives on Measuring Latency (20)

Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
 Adventures in Observability: How in-house ClickHouse deployment enabled Inst... Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
 
Continuous Performance Testing
Continuous Performance TestingContinuous Performance Testing
Continuous Performance Testing
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional Management
 
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)
 
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
 
Webinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computingWebinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computing
 
Get the most out of your AWS Redshift investment while keeping cost down
Get the most out of your AWS Redshift investment while keeping cost downGet the most out of your AWS Redshift investment while keeping cost down
Get the most out of your AWS Redshift investment while keeping cost down
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive Microservices
 
In Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging serviceIn Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging service
 
Performance eng prakash.sahu
Performance eng prakash.sahuPerformance eng prakash.sahu
Performance eng prakash.sahu
 
Rate limits and all about
Rate limits and all aboutRate limits and all about
Rate limits and all about
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performance
 
Measuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrongMeasuring CDN performance and why you're doing it wrong
Measuring CDN performance and why you're doing it wrong
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
 

More from ScyllaDB

Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
ScyllaDB
 
Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems
ScyllaDB
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
ScyllaDB
 
Noise Canceling RUM by Tim Vereecke, Akamai
Noise Canceling RUM by Tim Vereecke, AkamaiNoise Canceling RUM by Tim Vereecke, Akamai
Noise Canceling RUM by Tim Vereecke, Akamai
ScyllaDB
 
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
ScyllaDB
 
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
ScyllaDB
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
ScyllaDB
 
Using Libtracecmd to Analyze Your Latency and Performance Troubles
Using Libtracecmd to Analyze Your Latency and Performance TroublesUsing Libtracecmd to Analyze Your Latency and Performance Troubles
Using Libtracecmd to Analyze Your Latency and Performance Troubles
ScyllaDB
 
Reducing P99 Latencies with Generational ZGC
Reducing P99 Latencies with Generational ZGCReducing P99 Latencies with Generational ZGC
Reducing P99 Latencies with Generational ZGC
ScyllaDB
 
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
ScyllaDB
 
How Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global ScaleHow Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global Scale
ScyllaDB
 
Conquering Load Balancing: Experiences from ScyllaDB Drivers
Conquering Load Balancing: Experiences from ScyllaDB DriversConquering Load Balancing: Experiences from ScyllaDB Drivers
Conquering Load Balancing: Experiences from ScyllaDB Drivers
ScyllaDB
 
Interaction Latency: Square's User-Centric Mobile Performance Metric
Interaction Latency: Square's User-Centric Mobile Performance MetricInteraction Latency: Square's User-Centric Mobile Performance Metric
Interaction Latency: Square's User-Centric Mobile Performance Metric
ScyllaDB
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
ScyllaDB
 
99.99% of Your Traces are Trash by Paige Cruz
99.99% of Your Traces are Trash by Paige Cruz99.99% of Your Traces are Trash by Paige Cruz
99.99% of Your Traces are Trash by Paige Cruz
ScyllaDB
 
Square's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with RaftSquare's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with Raft
ScyllaDB
 
Making Python 100x Faster with Less Than 100 Lines of Rust
Making Python 100x Faster with Less Than 100 Lines of RustMaking Python 100x Faster with Less Than 100 Lines of Rust
Making Python 100x Faster with Less Than 100 Lines of Rust
ScyllaDB
 
A Deep Dive Into Concurrent React by Matheus Albuquerque
A Deep Dive Into Concurrent React by Matheus AlbuquerqueA Deep Dive Into Concurrent React by Matheus Albuquerque
A Deep Dive Into Concurrent React by Matheus Albuquerque
ScyllaDB
 
The Latency Stack: Discovering Surprising Sources of Latency
The Latency Stack: Discovering Surprising Sources of LatencyThe Latency Stack: Discovering Surprising Sources of Latency
The Latency Stack: Discovering Surprising Sources of Latency
ScyllaDB
 
eBPF vs Sidecars by Liz Rice at Isovalent
eBPF vs Sidecars by Liz Rice at IsovalenteBPF vs Sidecars by Liz Rice at Isovalent
eBPF vs Sidecars by Liz Rice at Isovalent
ScyllaDB
 

More from ScyllaDB (20)

Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
 
Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
 
Noise Canceling RUM by Tim Vereecke, Akamai
Noise Canceling RUM by Tim Vereecke, AkamaiNoise Canceling RUM by Tim Vereecke, Akamai
Noise Canceling RUM by Tim Vereecke, Akamai
 
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
 
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
 
Using Libtracecmd to Analyze Your Latency and Performance Troubles
Using Libtracecmd to Analyze Your Latency and Performance TroublesUsing Libtracecmd to Analyze Your Latency and Performance Troubles
Using Libtracecmd to Analyze Your Latency and Performance Troubles
 
Reducing P99 Latencies with Generational ZGC
Reducing P99 Latencies with Generational ZGCReducing P99 Latencies with Generational ZGC
Reducing P99 Latencies with Generational ZGC
 
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
 
How Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global ScaleHow Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global Scale
 
Conquering Load Balancing: Experiences from ScyllaDB Drivers
Conquering Load Balancing: Experiences from ScyllaDB DriversConquering Load Balancing: Experiences from ScyllaDB Drivers
Conquering Load Balancing: Experiences from ScyllaDB Drivers
 
Interaction Latency: Square's User-Centric Mobile Performance Metric
Interaction Latency: Square's User-Centric Mobile Performance MetricInteraction Latency: Square's User-Centric Mobile Performance Metric
Interaction Latency: Square's User-Centric Mobile Performance Metric
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
 
99.99% of Your Traces are Trash by Paige Cruz
99.99% of Your Traces are Trash by Paige Cruz99.99% of Your Traces are Trash by Paige Cruz
99.99% of Your Traces are Trash by Paige Cruz
 
Square's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with RaftSquare's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with Raft
 
Making Python 100x Faster with Less Than 100 Lines of Rust
Making Python 100x Faster with Less Than 100 Lines of RustMaking Python 100x Faster with Less Than 100 Lines of Rust
Making Python 100x Faster with Less Than 100 Lines of Rust
 
A Deep Dive Into Concurrent React by Matheus Albuquerque
A Deep Dive Into Concurrent React by Matheus AlbuquerqueA Deep Dive Into Concurrent React by Matheus Albuquerque
A Deep Dive Into Concurrent React by Matheus Albuquerque
 
The Latency Stack: Discovering Surprising Sources of Latency
The Latency Stack: Discovering Surprising Sources of LatencyThe Latency Stack: Discovering Surprising Sources of Latency
The Latency Stack: Discovering Surprising Sources of Latency
 
eBPF vs Sidecars by Liz Rice at Isovalent
eBPF vs Sidecars by Liz Rice at IsovalenteBPF vs Sidecars by Liz Rice at Isovalent
eBPF vs Sidecars by Liz Rice at Isovalent
 

Recently uploaded

WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
Lidia A.
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
Vijayananda Mohire
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
Andrey Yasko
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
Larry Smarr
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
Liveplex
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
Enterprise Wired
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
BookNet Canada
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
rajancomputerfbd
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
Toru Tamaki
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
Yevgen Sysoyev
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
Stephanie Beckett
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
Mark Billinghurst
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Bert Blevins
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
Aurora Consulting
 

Recently uploaded (20)

WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
 

Three Perspectives on Measuring Latency

  • 1. Brought to you by Three Perspectives on Measuring Latency Geoffrey Beausire SRE at Criteo
  • 2. Measuring Latency ■ Focusing on measuring latency for monitoring ■ Service Level Indicators (SLI) and Objectives (SLO) ● https://sre.google/sre-book/service-level-objectives/ ■ You can also check out How to Measure Latency by Heinrich Hartmann ● https://www.p99conf.io/session/how-to-measure-latency/
  • 4. What we do: ● AdTech ● Auction based (Real-time bidding) ● Highest bidder gets to display an ad ● Participate in 10M auctions per second Challenges: ● Need a lot of data to bid the right price ● Low response time (<100ms P99)
  • 5. What’s a Key/value store: ■ Allow to retrieve a value with a key ■ Trade features (scan/relational) for scalability and performance ■ Commonly used for cache At Criteo: ■ 217 clusters spread in 7 data centers ■ 250M QPS ■ SLO: 99% of request <1ms ■ 1 trillions records ■ Powered by Memcached and Aerospike ■ 3500 servers (Bare metal Kubernetes) Criteo’s Key/Value Store Infrastructure
  • 7. Key/Value Store (KVS) - Challenges ■ On the critical path of every request ■ Used by many teams and many applications ■ Many servers = more incidents (hardware, network, …)
  • 8. Key/Value Store (KVS) - Monitoring goals ■ Alerted when there is an issue on our service ● Avoid false positives or non actionable alerts ■ Troubleshoot root causes quickly and efficiently (client, server, in-between) ■ Expose SLI that our users can trust ■ Help users resolve issues autonomously
  • 9. Key/Value Store (KVS) - Monitoring goals ■ Alerted when there is an issue on our service ● Avoid false positives or non actionable alerts ■ Troubleshoot root causes quickly and efficiently (client, server, in-between) ■ Expose SLI that our users can trust ■ Help users resolve issues autonomously To achieve this we need high quality latency measurements!
  • 10. Where to Measure Latency?
  • 11. The Key/Value Store Infrastructure KVS Cluster Client KVS Cluster KVS Cluster Client
  • 12. Server Side Monitoring KVS Cluster Client KVS Cluster KVS Cluster Client
  • 21. Client Service A Service B Service C Latency measurements
  • 22. Server Side Monitoring - Summary ■ Paramount to understand the behavior of the system aka why it is slow ■ Service time should be measured as close as possible from the client ■ Troubleshooting performance is easier knowing the latency of dependencies: ● Latency with storage ● Latency with other servers ■ Metrics need to be high precision: average is not enough, at least p99!
  • 23. Server Side - Downsides ■ Doesn’t take into account what is between the user and the server (network, load balancer, API gateways, kernel, etc..) ■ Might not exist or be limited/low resolution ■ Might be impacted directly by user workloads (heavy queries) This can make it hard to depend on them for SLx (and alerting)
  • 28. Client side monitoring KVS Cluster Client KVS Cluster KVS Cluster Client
  • 29. Client Side Monitoring - Summary Advantages: ■ End to end view of the service: actual response time Downsides: ■ Might require standardized clients ■ Precision requires a lot of metrics (histograms) ■ Impacted by other factors (latency will increase if the application is starved for CPU, GC pauses…)
  • 30. How Can We Improve?
  • 31. Blackbox Monitoring KVS Cluster Client KVS Cluster KVS Cluster Client Probe (fake client)
  • 32. Blackbox Monitoring Goals of the probe: ■ Simulate the behavior of clients ■ Repeatable and stable over time ■ Highly isolated (avoid interference)
  • 33. ■ Select a number of operations (they should reflect the behavior of users) ● Write (Create/Update) ● Read ● Delete ■ Schedule these operations at repeated interval on every cluster ■ Add a lot of observability (high resolution histograms, logs, tracing, …) ■ Optimize for latency (not throughput or cpu usage) Implementation
  • 34. Implementation You can also measure: ■ Availability: can be derived from latency operations ■ Durability: check that no data was lost ■ Cross datacenter replication ■ … ■ Select a number of operations (they should reflect the behavior of users) ● Write (Create/Update) ● Read ● Delete ■ Schedule these operations at repeated interval on every cluster ■ Add a lot of observability (high resolution histograms, logs, tracing, …) ■ Optimize for latency (not throughput or cpu usage)
  • 35. Implementation You can also measure: ■ Availability: can be derived from latency operations ■ Durability: check that no data was lost ■ Cross datacenter replication ■ … ■ Select a number of operations (they should reflect the behavior of users) ● Write (Create/Update) ● Read ● Delete ■ Schedule these operations at repeated interval on every cluster ■ Add a lot of observability (high resolution histograms, logs, tracing, …) ■ Optimize for latency (not throughput or cpu usage) The resulting metrics can be used as Service Level Indicators (SLI)!
  • 36. In Practice Internally blackbox monitoring is used actively on many tech: ■ Key/Value stores: Memcached, Aerospike ■ Kafka ■ Metrics (Graphite, Prometheus, VictoriaMetrics) ■ Logs (ELK) ■ Other databases: Ceph, ScyllaDB, Elasticsearch We open-sourced: https://github.com/criteo/blackbox-prober
  • 37. Blackbox Monitoring - Summary Advantages: ■ End to end view of the service ■ Reliable over time ■ Precise ■ Easy to create Downsides: ■ Artificial
  • 38. Conclusion ■ Each approaches have trade-offs: ● Server side: precise but lacks the big picture ● Client side: end-to-end but noisy ● Blackbox: end-to-end and reliable but synthetic ■ Measure everything! ■ Use all of them
  • 39. Brought to you by Geoffrey Beausire g.beausire@criteo.com @geobeau is hiring: careers.criteo.com