Golang observability
(in practice)
Eran Levy
@levyeran
https://medium.com/@levyeran
We're hiring!
Agenda
● Cloud native observability
● Logs
● Metrics
● Tracing
● Best practices
● Where do we go next
● Q?
WIIFM?
● Know the available tools for observability
● How to get started?
● Best practices

Microservices might be good for your business...
But understanding what's going on is
another story
(Image: Netflix)
Observability
“Observability”, according to this definition, is a superset
of “monitoring”, providing certain benefits and insights
that “monitoring” tools come a cropper at. - Cindy
Sridharan
“Observability”, on the other hand, aims to provide highly
granular insights into the behavior of systems
along with rich context, perfect for debugging
purposes. - Cindy Sridharan
Understand the full cycle of a given flow and gain insights by asking questions along the way
(Twitter engineering blog)
Like!
Logs Metrics Traces
Let's drill down...

Logs
● Search for a specific pattern in a given time-window or dig into application
specific logs
● Write logs to stdout/stderr and let the k8s cluster take care of shipping
them to a central logging infrastructure
● Pick the right package for your needs:
○ Built-in “log” package - not structured, not leveled, mostly for dev - standard logger with timestamps
○ Logrus - JSON format, structured, leveled, hooks (note: hooks take a lock)
○ uber-go/zap - fast (benchmarks: https://github.com/uber-go/zap/tree/master/benchmarks),
structured, leveled - performance focused: it avoids string formatting, reflection and small
allocations, which are CPU-intensive
○ golang/glog - if performance and volume are highly important, you might consider this one -
I haven't had the chance to use it
Demo
Logs - Best Practices
● Logs are expensive! String formatting and interface{} reflection are CPU
intensive
● Aim for log standardization, e.g. common fields and standard messages - it
helps in prod
● Prefer actionable log messages and avoid maintaining too many log levels, e.g.
warn
● Don’t manage logging concurrency yourself - the packages already take care of that
● Hooks (e.g. in logrus) - use them wisely (mutex locks)
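The practices above (structured, leveled, standardized fields, written to stdout) can be sketched with the standard library alone. This is a minimal sketch that uses log/slog (Go 1.21+, newer than this talk) as a stand-in for Logrus or zap; the field names "service", "env", "user_id" and "latency_ms" and the helpers newLogger and logLine are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
	"os"
)

// newLogger returns a JSON logger that attaches the same standard fields
// to every record. The field names are illustrative, not a convention
// taken from the talk.
func newLogger(w io.Writer, service, env string) *slog.Logger {
	h := slog.NewJSONHandler(w, &slog.HandlerOptions{Level: slog.LevelInfo})
	return slog.New(h).With("service", service, "env", env)
}

// logLine writes one record into a buffer and decodes it again, so the
// resulting fields can be inspected.
func logLine(service string) map[string]any {
	var buf bytes.Buffer
	log := newLogger(&buf, service, "prod")
	log.Debug("dropped: below the configured level")
	log.Info("token request served", "user_id", 42, "latency_ms", 5)

	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		panic(err)
	}
	return rec
}

func main() {
	// In a container, write to stdout and let the cluster ship the logs.
	log := newLogger(os.Stdout, "tokenizer", "prod")
	log.Info("token request served", "user_id", 42, "latency_ms", 5)

	// The common fields are present on every record.
	fmt.Println(logLine("tokenizer")["service"])
}
```

Key/value pairs are passed as arguments instead of being formatted into the message, which keeps the message text standard and the fields searchable.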
Another log aggregation approach - Loki by Grafana

Metrics
● Metrics provide quantitative information about processes running inside the
system, including counters, gauges, and histograms (OpenTelemetry)
● Measure business impact and user experience -
○ Add custom metrics
○ build dashboards
○ generate alerts
● “The four golden signals of monitoring are latency, traffic, errors, and
saturation.” (Google SRE)
● Modern metrics are stored in a time-series database as a metric name plus
key/value tags, which together form a multi-dimensional space
grpc_io_server_server_latency_count{grpc_server_method="tokenizer.Tokenizer/GetTokens"} 7
(source: sysdig.com)
Integrate your metrics backend of choice
● Prefer vendor-neutral APIs such as OpenCensus (soon OpenTelemetry) over
dedicated stats backend clients (e.g. the Prometheus Go client)
● Metrics aren’t sampled - every measurement is counted, so you can spot percentile latencies, e.g. p99
● Client libraries usually aggregate the collected metrics data in-process and
send it to the backend server (Prometheus, Stackdriver, Honeycomb, others)
● Standardize your KPIs to build meaningful dashboards
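To see why unsampled metrics matter for tail latency, here is a small sketch of a nearest-rank percentile over every recorded latency. The percentile and sampleLatencies helpers and the latency values are hypothetical; real client libraries usually aggregate into histogram buckets in-process instead of keeping raw values.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of
// values, or 0 for an empty input. The input slice is not modified.
func percentile(values []float64, p float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)

	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// sampleLatencies builds 100 hypothetical request latencies in ms:
// 98 fast requests and 2 slow ones.
func sampleLatencies() []float64 {
	lat := make([]float64, 0, 100)
	for i := 0; i < 98; i++ {
		lat = append(lat, 5)
	}
	return append(lat, 250, 900)
}

func main() {
	lat := sampleLatencies()
	fmt.Println("p50:", percentile(lat, 50))
	fmt.Println("p99:", percentile(lat, 99))
}
```

The median stays at the fast value while p99 lands on a slow request. If most measurements were dropped by sampling, the two slow requests could easily be missed.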
OpenCensus service approach
OpenTelemetry adopts that approach
● Agent vs Agentless
● Collector
● Demo docker-compose

Before we move on - OpenCensus terminology
● Measure - the metric type we are going to record, e.g. latency in ms
● Measurement - a recorded data point, e.g. 5 ms
● Aggregation - count, sum, distribution
● Exporter - sends the data to the backend of choice
● View - couples an aggregation, a measure and tags
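How the terms above fit together can be sketched with a toy model. This is deliberately not the OpenCensus Go API: every type and function here (Measure, Measurement, Aggregation, View, Exporter, demoView) is a simplified stand-in written for illustration, with only count and sum aggregations and a single tag key.

```go
package main

import "fmt"

// Measure names what is recorded and in which unit.
type Measure struct {
	Name, Unit string
}

// Measurement is one recorded data point for a Measure, with its tags.
type Measurement struct {
	Measure Measure
	Value   float64
	Tags    map[string]string
}

// Aggregation selects how data points are combined.
type Aggregation int

const (
	Count Aggregation = iota
	Sum
)

// View couples a measure, an aggregation and the tag key to group by.
type View struct {
	Measure     Measure
	Aggregation Aggregation
	TagKey      string
	rows        map[string]float64 // aggregated value per tag value
}

// Record folds a measurement into the view; other measures are ignored.
func (v *View) Record(m Measurement) {
	if m.Measure != v.Measure {
		return
	}
	if v.rows == nil {
		v.rows = make(map[string]float64)
	}
	key := m.Tags[v.TagKey]
	switch v.Aggregation {
	case Count:
		v.rows[key]++
	case Sum:
		v.rows[key] += m.Value
	}
}

// Exporter ships aggregated rows to a backend of choice.
type Exporter interface {
	Export(v *View)
}

// stdoutExporter prints rows instead of sending them anywhere.
type stdoutExporter struct{}

func (stdoutExporter) Export(v *View) {
	for tag, val := range v.rows {
		fmt.Printf("%s{%s=%q} %v\n", v.Measure.Name, v.TagKey, tag, val)
	}
}

// demoView records three latency measurements into a count view.
func demoView() *View {
	latency := Measure{Name: "server_latency", Unit: "ms"}
	v := &View{Measure: latency, Aggregation: Count, TagKey: "method"}
	for _, ms := range []float64{5, 7, 3} {
		v.Record(Measurement{
			Measure: latency,
			Value:   ms,
			Tags:    map[string]string{"method": "GetTokens"},
		})
	}
	return v
}

func main() {
	var exp Exporter = stdoutExporter{}
	exp.Export(demoView())
}
```

Swapping the exporter changes the backend without touching the recording code, which is the point of going through a vendor-neutral API.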
Demo
Distributed Tracing
● Tracing, aka distributed tracing, provides insight into the full life-cycles, aka traces, of requests to
the system, allowing you to pinpoint failures and performance issues (OpenTelemetry)
● Enables engineers to understand which services participated in a given end-to-end trace
Integrate your tracing system of choice
● Prefer vendor-neutral APIs such as OpenTracing/OpenCensus (soon
OpenTelemetry) over a dedicated tracing client
● Trace critical business operations and calls to other services (ServiceA -> DB)
● Context propagation is key - use “context” to propagate traces where
possible
● Prefer sidecar agents over calling backend services directly where
possible (e.g. Zipkin receives requests through its collector)
● The OpenCensus agent is an interesting approach that gives you more
flexibility (e.g. changing the backend service dynamically)
● Large systems can produce a large number of traces - heavy traffic and resource
intensive - so choose the right sampling strategy
Remember the OpenCensus service approach?
Same goes here…
Jaeger agent
Jaeger has something similar, but it is Jaeger-oriented - you can of course use
it as well, but you won’t get all the benefits that OC can provide

Demo
Best practices
- Standardization is key - tags (tracing, e.g. Semantic Conventions), fields
(logs), metrics
- Enable engineers to create alerts based on their metrics easily (e.g. Helm charts)
- Prefer sidecar agents over calling backend services directly where
possible (agent vs agentless)
- Prefer vendor-neutral APIs and instrumentation packages
- Choose a tracer sampling strategy - huge traffic, resource intensive
Where do we go next?
● OpenTelemetry
(source: opentelemetry.io)
● Cloudevents.io
● Evolving architecture - Trace graph
● Use traces to spot problems that affect KPIs
Questions?

More Related Content

What's hot

Continuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event KeynoteContinuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event Keynote
Weaveworks
 
Timeseries - data visualization in Grafana
Timeseries - data visualization in GrafanaTimeseries - data visualization in Grafana
Timeseries - data visualization in Grafana
OCoderFest
 
Prometheus 101
Prometheus 101Prometheus 101
Prometheus 101
Paul Podolny
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)
Lucas Jellema
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
Knoldus Inc.
 
Terraform and Weave GitOps: Build a Fully Automated Application Stack
Terraform and Weave GitOps: Build a Fully Automated Application StackTerraform and Weave GitOps: Build a Fully Automated Application Stack
Terraform and Weave GitOps: Build a Fully Automated Application Stack
Weaveworks
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
Lhouceine OUHAMZA
 
OSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdf
OSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdfOSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdf
OSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdf
NETWAYS
 
GitOps - Operation By Pull Request
GitOps - Operation By Pull RequestGitOps - Operation By Pull Request
GitOps - Operation By Pull Request
Kasper Nissen
 
Monitoring kubernetes with prometheus
Monitoring kubernetes with prometheusMonitoring kubernetes with prometheus
Monitoring kubernetes with prometheus
Brice Fernandes
 
Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021
InfluxData
 
Log analysis using elk
Log analysis using elkLog analysis using elk
Log analysis using elk
Rushika Shah
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
Grafana Labs
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basics
Juraj Hantak
 
Linking Metrics to Logs using Loki
Linking Metrics to Logs using LokiLinking Metrics to Logs using Loki
Linking Metrics to Logs using Loki
Knoldus Inc.
 
Grafana
GrafanaGrafana
Grafana
NoelMc Grath
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
Kevin Brockhoff
 
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
Hyojun Jeon
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to Prometheus
Julien Pivotto
 
Logs/Metrics Gathering With OpenShift EFK Stack
Logs/Metrics Gathering With OpenShift EFK StackLogs/Metrics Gathering With OpenShift EFK Stack
Logs/Metrics Gathering With OpenShift EFK Stack
Josef Karásek
 

What's hot (20)

Continuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event KeynoteContinuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event Keynote
 
Timeseries - data visualization in Grafana
Timeseries - data visualization in GrafanaTimeseries - data visualization in Grafana
Timeseries - data visualization in Grafana
 
Prometheus 101
Prometheus 101Prometheus 101
Prometheus 101
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Terraform and Weave GitOps: Build a Fully Automated Application Stack
Terraform and Weave GitOps: Build a Fully Automated Application StackTerraform and Weave GitOps: Build a Fully Automated Application Stack
Terraform and Weave GitOps: Build a Fully Automated Application Stack
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 
OSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdf
OSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdfOSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdf
OSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdf
 
GitOps - Operation By Pull Request
GitOps - Operation By Pull RequestGitOps - Operation By Pull Request
GitOps - Operation By Pull Request
 
Monitoring kubernetes with prometheus
Monitoring kubernetes with prometheusMonitoring kubernetes with prometheus
Monitoring kubernetes with prometheus
 
Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021
 
Log analysis using elk
Log analysis using elkLog analysis using elk
Log analysis using elk
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basics
 
Linking Metrics to Logs using Loki
Linking Metrics to Logs using LokiLinking Metrics to Logs using Loki
Linking Metrics to Logs using Loki
 
Grafana
GrafanaGrafana
Grafana
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
 
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to Prometheus
 
Logs/Metrics Gathering With OpenShift EFK Stack
Logs/Metrics Gathering With OpenShift EFK StackLogs/Metrics Gathering With OpenShift EFK Stack
Logs/Metrics Gathering With OpenShift EFK Stack
 

Similar to Go Observability (in practice)

Challenges of monitoring distributed systems
Challenges of monitoring distributed systemsChallenges of monitoring distributed systems
Challenges of monitoring distributed systems
Nenad Bozic
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Brian Brazil
 
Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)
Brian Brazil
 
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
AgileNetwork
 
Observability for Application Developers (1)-1.pptx
Observability for Application Developers (1)-1.pptxObservability for Application Developers (1)-1.pptx
Observability for Application Developers (1)-1.pptx
OpsTree solutions
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
distributedtracing
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
Itiel Shwartz
 
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
AppDynamics
 
Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)
Brian Brazil
 
I pushed in production :). Have a nice weekend
I pushed in production :). Have a nice weekendI pushed in production :). Have a nice weekend
I pushed in production :). Have a nice weekend
Nicolas Carlier
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Guglielmo Iozzia
 
The differing ways to monitor and instrument
The differing ways to monitor and instrumentThe differing ways to monitor and instrument
The differing ways to monitor and instrument
Jonah Kowall
 
Sql server lesson12
Sql server lesson12Sql server lesson12
Sql server lesson12
Ala Qunaibi
 
Sql server lesson12
Sql server lesson12Sql server lesson12
Sql server lesson12
Ala Qunaibi
 
How to apply machine learning into your CI/CD pipeline
How to apply machine learning into your CI/CD pipelineHow to apply machine learning into your CI/CD pipeline
How to apply machine learning into your CI/CD pipeline
Alon Weiss
 
Monitoring - deeper dive
Monitoring  - deeper diveMonitoring  - deeper dive
Monitoring - deeper dive
Robert Kubiś
 
Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)
Brian Brazil
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
Guido Schmutz
 
Mastering AIOps with Deep Learning
Mastering AIOps with Deep LearningMastering AIOps with Deep Learning
Mastering AIOps with Deep Learning
Jorge Cardoso
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Brian Brazil
 

Go Observability (in practice)

  • 1. Golang observability (in practice) Eran Levy @levyeran https://medium.com/@levyeran
  • 3. Agenda ● Cloud native observability ● Logs ● Metrics ● Tracing ● Best practices ● Where do we go next ● Q?
  • 4. WIIFM? ● Know the available tools for observability ● How to get started? ● Best practices
  • 5. Microservices might be good for your business... But understanding what's going on is another story (Image: Netflix)
  • 6. Observability “Observability”, according to this definition, is a superset of “monitoring”, providing certain benefits and insights that “monitoring” tools come a cropper at. - Cindy Sridharan “Observability”, on the other hand, aims to provide highly granular insights into the behavior of systems along with rich context, perfect for debugging purposes. - Cindy Sridharan Understand the full cycle of a given flow and gain insights by asking your questions along the way (Twitter engineering blog)
  • 8. Logs, Metrics, Traces - let's drill down...
  • 9. Logs ● Search for a specific pattern in a given time window, or dig into application-specific logs ● Write logs to stdout/stderr and let the k8s cluster ship them to a central logging infrastructure ● Pick the right package for your needs: ○ Built-in “log” package - not structured, not leveled, mostly for dev - standard log lines with a timestamp ○ Logrus - JSON format, structured, leveled, hooks (note that hooks take a lock) ○ uber-go/zap - fast (benchmarks: https://github.com/uber-go/zap/tree/master/benchmarks), structured, leveled - performance focused, because string formatting, reflection and small allocations are CPU-intensive ○ golang/glog - worth considering if performance and volume are highly important (I didn't get the chance to use it)
  • 10. Demo
  • 11. Logs - Best Practices ● Logs are expensive! String formatting and reflection on interface{} values are CPU-intensive ● Aim for log standardization, e.g. common fields and standard messages - it helps in prod ● Prefer logging actionable messages and avoid maintaining too many log levels (e.g. warn) ● Don’t manage logging concurrency yourself - the packages already take care of that ● Hooks (e.g. in Logrus) - use them wisely (mutex locks)
  • 12. Another log aggregation approach - Loki by Grafana
  • 13. Metrics ● Metrics provide quantitative information about processes running inside the system, including counters, gauges, and histograms (OpenTelemetry) ● Measure business impact and user experience: ○ add custom metrics ○ build dashboards ○ generate alerts ● “The four golden signals of monitoring are latency, traffic, errors, and saturation.” (Google SRE) ● Modern metrics are stored in a time-series database as a metric name plus key/value tags, which together create a multi-dimensional space, e.g. grpc_io_server_server_latency_count{grpc_server_method="tokenizer.Tokenizer/GetTokens"} 7
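The "metric name plus key/value tags" model can be sketched with a small in-process counter. This is a dependency-free illustration, not the API of any metrics client; the `Counter` type and its methods are hypothetical. Each distinct tag set becomes its own time series, which is what creates the multi-dimensional space.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Counter (hypothetical type) is a monotonically increasing metric with one
// series per distinct tag set.
type Counter struct {
	mu     sync.Mutex
	name   string
	series map[string]uint64 // key: rendered tag set
}

func NewCounter(name string) *Counter {
	return &Counter{name: name, series: make(map[string]uint64)}
}

// seriesKey renders tags as sorted k="v" pairs so that the same tag set
// always maps to the same series.
func seriesKey(tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, tags[k]))
	}
	return strings.Join(parts, ",")
}

// Inc adds one to the series identified by tags.
func (c *Counter) Inc(tags map[string]string) {
	key := seriesKey(tags)
	c.mu.Lock()
	defer c.mu.Unlock()
	c.series[key]++
}

// Lines renders every series as `name{tags} value`, sorted for stable output.
func (c *Counter) Lines() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	lines := make([]string, 0, len(c.series))
	for key, n := range c.series {
		lines = append(lines, fmt.Sprintf("%s{%s} %d", c.name, key, n))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	latencyCount := NewCounter("grpc_io_server_server_latency_count")
	tags := map[string]string{"grpc_server_method": "tokenizer.Tokenizer/GetTokens"}
	for i := 0; i < 7; i++ {
		latencyCount.Inc(tags)
	}
	for _, line := range latencyCount.Lines() {
		fmt.Println(line)
	}
}
```

Running it prints the same series line that the slide shows as an example.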
  • 15. Integrate your metrics backend of choice ● Prefer vendor-neutral APIs such as OpenCensus (soon OpenTelemetry) over dedicated stats backend clients (e.g. the Prometheus Go client) ● Metrics aren’t sampled, so you can spot percentile latencies, e.g. p99 ● Client libraries usually aggregate the collected metrics data in-process and send it to the backend server (Prometheus, Stackdriver, Honeycomb, others) ● Standardize your KPIs to build meaningful dashboards
  • 16. OpenCensus service approach (OpenTelemetry adopts the same approach) ● Agent vs agentless ● Collector ● Demo: docker-compose
  • 17. Before we move on - OpenCensus terminology ● Measure - the metric type that we are going to record, e.g. latency in ms ● Measurement - a recorded data point, e.g. 5 ms ● Aggregation - count, sum, distribution ● Exporter - the exporter for your backend of choice ● View - the coupling of an aggregation, a measure and tags
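How these terms relate can be sketched as a tiny data model. The types below are hypothetical and written only for this illustration; they are not the OpenCensus Go API, and the distribution aggregation is left out for brevity.

```go
package main

import "fmt"

// Measure describes what is recorded and in which unit.
type Measure struct{ Name, Unit string }

// Measurement is one recorded data point for a Measure, plus its tags.
type Measurement struct {
	Measure Measure
	Value   float64
	Tags    map[string]string
}

// Aggregation selects how data points are combined.
type Aggregation int

const (
	Count Aggregation = iota
	Sum
)

// View couples a measure, an aggregation and a tag key.
type View struct {
	Name    string
	Measure Measure
	TagKey  string
	Agg     Aggregation
	rows    map[string]float64 // aggregated value per tag value
}

// Record folds a measurement into the view; other measures are ignored.
func (v *View) Record(m Measurement) {
	if m.Measure != v.Measure {
		return
	}
	if v.rows == nil {
		v.rows = make(map[string]float64)
	}
	tag := m.Tags[v.TagKey]
	switch v.Agg {
	case Count:
		v.rows[tag]++
	case Sum:
		v.rows[tag] += m.Value
	}
}

// Exporter ships aggregated rows to a backend of choice.
type Exporter func(viewName string, rows map[string]float64)

// Export hands the current rows to the exporter.
func (v *View) Export(e Exporter) { e(v.Name, v.rows) }

func main() {
	latency := Measure{Name: "server_latency", Unit: "ms"}
	view := &View{Name: "latency_sum_by_method", Measure: latency, TagKey: "method", Agg: Sum}

	for _, ms := range []float64{5, 7} {
		view.Record(Measurement{
			Measure: latency,
			Value:   ms,
			Tags:    map[string]string{"method": "GetTokens"},
		})
	}

	// A stand-in exporter that prints instead of calling a backend.
	view.Export(func(name string, rows map[string]float64) {
		fmt.Println(name, rows)
	})
}
```

Swapping the exporter function changes the backend without touching the recording code, which is the flexibility the vendor-neutral approach is after.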
  • 18. Demo
  • 19. Distributed Tracing ● Tracing, aka distributed tracing, provides insight into the full life-cycles, aka traces, of requests to the system, allowing you to pinpoint failures and performance issues (OpenTelemetry) ● Enables engineers to understand which services participated in a given end-to-end trace
  • 22. Integrate your tracing system of choice ● Prefer vendor-neutral APIs such as OpenTracing/OpenCensus (soon OpenTelemetry) over a dedicated tracing client ● Trace critical business operations and calls to other services (ServiceA -> DB) ● Context propagation is the key - use “context” to propagate traces where possible ● Prefer sidecar agents over calling backend services directly where possible (e.g. Zipkin receives requests through its collector) ● The OpenCensus agent is an interesting approach that gives you more flexibility (e.g. dynamically changing the backend service) ● Large systems can produce a large number of traces, which means heavy traffic and resource usage - choose the right sampling strategy
  • 23. Remember the OpenCensus service approach? Same goes here…
  • 24. Jaeger agent - Jaeger has something similar, but it is Jaeger-oriented. You can use it as well, but you won’t get all the benefits that OpenCensus (OC) can provide
  • 25. Demo
  • 26. Best practices - Standardization is key: tags (tracing, e.g. Semantic Conventions), fields (logs), metrics - Make it easy for engineers to create alerts based on their metrics (e.g. Helm charts) - Prefer sidecar agents over calling backend services directly where possible (agent vs agentless) - Prefer vendor-neutral APIs and instrumentation packages - Choose a tracer sampling strategy: traces mean huge traffic and are resource intensive
  • 27. Where do we go next? ● OpenTelemetry (source: opentelemetry.io)
  • 28. Where do we go next? ● Cloudevents.io ● Evolving architecture - Trace graph ● Use traces to spot problems that affect KPIs