OpenTelemetry For Architects

Presented by Kevin Brockhoff
Apache 2.0 Licensed

Our Agenda
● Where are current observability patterns falling short?
● Who is OpenTelemetry and why should I care?
● What are some recommended OpenTelemetry deployment architectures?
● How can I use OpenTelemetry to incrementally improve telemetry collection in applications?

Level Setting
● Have you used the ELK stack or another log aggregator?
● Have you used an APM system?
● Have you used distributed tracing before?
● Have you used OpenCensus?
● Have you used OpenTracing?

Who am I?
● Kevin Brockhoff - Senior Consultant, Daugherty Business Solutions
○ Solving difficult cloud adoption challenges for Daugherty's Fortune 500 clients
○ OpenTelemetry committer since the early stages of the project
○ GitHub: https://github.com/kbrockhoff
○ LinkedIn: https://www.linkedin.com/in/kevin-brockhoff-a557877/

Why observability?
● Microservices create complex interactions.
● Failures don't exactly repeat.
● Debugging multi-tenancy is painful.
● Monitoring can no longer help us.
[Figure: Cynefin Framework, "Complex" domain]

Metrics Concepts
● Gauges
○ Instantaneous point-in-time value (e.g. CPU utilization)
● Cumulative counters
○ Cumulative sums of data since process start (e.g. request counts)
● Cumulative histograms
○ Grouped counters for a range of buckets (e.g. 0-10ms, 11-20ms)
● Rates
○ Typically the derivative of a counter (e.g. requests per second)
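The four metric concepts above can be sketched in a few lines of plain Python. This is an illustrative toy, not the OpenTelemetry metrics API: the names `CumulativeCounter`, `CumulativeHistogram`, and `rate`, as well as the bucket bounds, are assumptions made for the example.

```python
from bisect import bisect_left


class CumulativeCounter:
    """Monotonic sum since process start (e.g. a request count)."""

    def __init__(self):
        self.value = 0

    def add(self, amount=1):
        if amount < 0:
            raise ValueError("cumulative counters only go up")
        self.value += amount


class CumulativeHistogram:
    """One cumulative count per bucket, keyed by upper bound in ms."""

    def __init__(self, upper_bounds_ms):
        self.upper_bounds_ms = sorted(upper_bounds_ms)
        # One extra bucket catches everything above the last bound.
        self.counts = [0] * (len(self.upper_bounds_ms) + 1)

    def record(self, value_ms):
        # bisect_left places a value equal to a bound into that bound's bucket.
        self.counts[bisect_left(self.upper_bounds_ms, value_ms)] += 1


def rate(previous_value, current_value, interval_seconds):
    """Rate = change in a cumulative counter divided by elapsed time."""
    return (current_value - previous_value) / interval_seconds


requests = CumulativeCounter()
latency = CumulativeHistogram([10, 20])  # buckets: <=10ms, <=20ms, >20ms

for duration_ms in (4, 9, 15, 42):
    requests.add()
    latency.record(duration_ms)

cpu_utilization_gauge = 0.37  # a gauge is just the latest sampled value
```

A backend would usually compute the rate itself from two scrapes of the counter, which is why the counter is reported as a cumulative sum rather than as a pre-computed rate.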

Basic Observability Metrics Methods
● USE - Utilization, Saturation, and Errors
○ Resource-scoped
● RED - Rate, Errors, and Duration
○ Request-scoped
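As a rough illustration of the request-scoped RED method, the sketch below summarizes one time window of requests. The `RequestRecord` type, the `red_summary` function, and the sample values are hypothetical; a real system would derive these numbers from counters and histograms rather than from raw records.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    duration_ms: float
    is_error: bool


def red_summary(records, window_seconds):
    """Summarize Rate, Errors, and Duration for one time window."""
    total = len(records)
    errors = sum(1 for r in records if r.is_error)
    durations = sorted(r.duration_ms for r in records)
    return {
        "rate_per_second": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        # Upper median; production systems usually also track p95/p99.
        "median_duration_ms": durations[total // 2] if total else None,
    }


window = [
    RequestRecord(12.0, False),
    RequestRecord(48.0, True),
    RequestRecord(20.0, False),
    RequestRecord(31.0, False),
]
summary = red_summary(window, window_seconds=2)
```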

Tracing Concepts
● Span
○ Represents a single unit of work in a system.
● Trace
○ Defined implicitly by its spans. A trace can be thought of as a directed acyclic graph of spans where the edges between spans are defined as parent/child relationships.
● Distributed Context
○ Contains the tracing identifiers, tags, and options that are propagated from parent to child spans.
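The relationship between spans, traces, and distributed context can be sketched with a small data model. This is a simplified illustration, not the OpenTelemetry API: `SpanContext`, `Span`, and `start_span` are names assumed for the example, and the `baggage` dictionary stands in for the propagated tags.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpanContext:
    trace_id: str  # shared by every span in the trace
    span_id: str   # unique per span
    baggage: dict = field(default_factory=dict)  # propagated key/value tags


@dataclass
class Span:
    name: str
    context: SpanContext
    parent_span_id: Optional[str] = None


def start_span(name, parent: Optional[Span] = None) -> Span:
    """Create a root span, or a child that inherits the parent's context."""
    if parent is None:
        trace_id = secrets.token_hex(16)  # 128-bit trace id
        baggage = {}
        parent_span_id = None
    else:
        trace_id = parent.context.trace_id
        baggage = dict(parent.context.baggage)  # copy, so children cannot mutate the parent
        parent_span_id = parent.context.span_id
    context = SpanContext(trace_id, secrets.token_hex(8), baggage)
    return Span(name, context, parent_span_id)


root = start_span("GET /checkout")
root.context.baggage["tenant"] = "acme"
child = start_span("SELECT orders", parent=root)
grandchild = start_span("serialize rows", parent=child)

# The trace is never created as an object of its own: it is whatever set of
# spans share a trace id, linked by their parent/child edges.
trace = [s for s in (root, child, grandchild)
         if s.context.trace_id == root.context.trace_id]
```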

Observability 1.0 Limitations
● Data ends up in 3 different datastores.
● Different types of data are not correlated with each other.
● Observability is not necessarily insight.

Operational Complexity Growth
                 2010                              2020
Circuit Breaker  Homegrown w/ 3 configs            Resilience4J w/ 14 configs
Retries          End user clicks submit again      Resilience4J w/ 7 configs
Health Check     HTTP server and DB are live       Kubernetes liveness, readiness, and startup
                                                   probes with 5 timing configs per probe
Alerts           Unread count on circuit breaker   ???
                 opened email folder

Observability 2.0 - PoC
● Deep Linking Metrics and Traces with OpenTelemetry, OpenMetrics and M3 - Rob Skillington (presentation at KubeCon North America 2019)
○ Click on point in metrics graph to get representative traces
○ Click on trace span to get system metrics from server that produced the span
○ Click on trace span to get all application logs emitted during span
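The three navigation paths above all rely on shared identifiers across the metrics, trace, and log stores. The sketch below illustrates that idea with in-memory lists; it is not the mechanism used in the referenced talk, and every field name (`exemplar_trace_id`, `host`, `start`, `end`, and so on) is a hypothetical choice for the example.

```python
# Three separate "datastores", joined only by trace id, span id, host, and time.
metric_points = [
    {"name": "http_latency_ms", "value": 950, "exemplar_trace_id": "t1"},
    {"name": "http_latency_ms", "value": 12, "exemplar_trace_id": "t2"},
]
spans = [
    {"trace_id": "t1", "span_id": "s1", "host": "node-a", "start": 100, "end": 105},
    {"trace_id": "t2", "span_id": "s2", "host": "node-b", "start": 101, "end": 102},
]
host_metrics = [
    {"host": "node-a", "time": 103, "cpu": 0.97},
    {"host": "node-a", "time": 200, "cpu": 0.20},
    {"host": "node-b", "time": 101, "cpu": 0.15},
]
logs = [
    {"trace_id": "t1", "span_id": "s1", "message": "retrying payment call"},
    {"trace_id": "t2", "span_id": "s2", "message": "ok"},
]


def traces_for_metric_point(point):
    """Metric point -> representative spans, via the attached trace id."""
    return [s for s in spans if s["trace_id"] == point["exemplar_trace_id"]]


def host_metrics_for_span(span):
    """Span -> system metrics from the same host during the span."""
    return [m for m in host_metrics
            if m["host"] == span["host"] and span["start"] <= m["time"] <= span["end"]]


def logs_for_span(span):
    """Span -> log lines emitted while the span was active."""
    return [entry for entry in logs
            if (entry["trace_id"], entry["span_id"]) == (span["trace_id"], span["span_id"])]


slow_point = max(metric_points, key=lambda p: p["value"])
slow_span = traces_for_metric_point(slow_point)[0]
```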

OpenTelemetry Project
Sandbox Project

OpenCensus + OpenTracing = OpenTelemetry
● OpenCensus:
○ Provides APIs and instrumentation that allow you to collect application metrics and distributed traces.
○ Provides oc-service and oc-agent middleware.
● OpenTracing:
○ Provides APIs for distributed tracing with implementations provided by tracing backend vendors.
● OpenTelemetry:
○ An effort to combine distributed tracing, metrics and logging into a single set of system components and language-specific libraries.

OpenTelemetry Project
● Specification
○ API (for application developers)
○ SDK Implementations
○ Transport Protocol (Protobuf - gRPC)
● Collector (middleware)
● SDKs (various stages of maturity)
○ C++
○ C# (Auto-instrument/Manual)
○ Erlang
○ Go
○ JavaScript (Browser/Node)
○ Java (Auto-instrument/Manual)
■ Android compatibility
○ PHP
○ Python (Auto-instrument/Manual)
○ Ruby
○ Rust
○ Swift

Open Source Observability Platforms Supported

W3C Distributed Tracing Working Group
● Trace Context – Level 1 - Recommendation
● Propagation format for distributed trace context: Baggage (rec-track)
● Trace Context: AMQP protocol (rec-track)
● Trace Context: MQTT protocol (rec-track)
● Trace Response Headers (rec-track)
● Trace Context Protocols Registry – Group Note
● Trace Context: binary protocol (rec-track)
● Trace Interchange Format (rec-track)
● Trace State Ids Registry (note)

Trace Context HTTP Headers
traceparent: 00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01
  Fields: version, trace-id (128 bit), parent-id (64 bit), trace-flags (8 bit)
tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
  Vendor-specific key/value pairs
Baggage: userId=sergey,serverNode=DF:28,isProduction=false
  Draft Baggage header specification
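To make the header layout concrete, here is a minimal parsing sketch for the `traceparent` and `tracestate` values shown above. It assumes the version 00 format only and is not a complete validator. The function names `parse_traceparent` and `parse_tracestate` are made up for this example and are not part of any OpenTelemetry library.

```python
import re

# version - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<trace_flags>[0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Split a traceparent header into its four fields plus a sampled flag."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    fields = match.groupdict()
    # All-zero trace-id and parent-id values are invalid in Trace Context.
    if set(fields["trace_id"]) == {"0"} or set(fields["parent_id"]) == {"0"}:
        raise ValueError("trace-id and parent-id must not be all zeros")
    # The lowest bit of trace-flags is the "sampled" flag.
    fields["sampled"] = bool(int(fields["trace_flags"], 16) & 0x01)
    return fields


def parse_tracestate(header):
    """Split comma-separated vendor entries into a dict, preserving order."""
    entries = {}
    for member in header.split(","):
        member = member.strip()
        if member:
            key, _, value = member.partition("=")
            entries[key] = value
    return entries


parent = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01")
state = parse_tracestate("rojo=00f067aa0ba902b7,congo=t61rcWkgMzE")
```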

Kubernetes Deployment - Proof of Concept
# First collector configuration on the slide
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus]
      exporters: [otlp]
# Second collector configuration on the slide
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [prometheus]
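The two configurations above chain one collector into another: what the first exports, the second must be able to receive. The sketch below models the pipelines as plain dictionaries and checks that each signal is wired through. The `unconnected_signals` helper is hypothetical, and matching by component name is a simplification, since real connectivity also depends on endpoints and ports.

```python
# Pipelines copied from the two collector configurations (processors omitted).
first_tier = {
    "traces": {"receivers": ["otlp"], "exporters": ["otlp"]},
    "metrics": {"receivers": ["otlp", "prometheus"], "exporters": ["otlp"]},
}
second_tier = {
    "traces": {"receivers": ["otlp"], "exporters": ["jaeger"]},
    "metrics": {"receivers": ["otlp"], "exporters": ["prometheus"]},
}


def unconnected_signals(upstream, downstream):
    """Return signals whose upstream exporters match no downstream receiver."""
    broken = []
    for signal, pipeline in upstream.items():
        receivers = set(downstream.get(signal, {}).get("receivers", []))
        if not receivers & set(pipeline["exporters"]):
            broken.append(signal)
    return broken


problems = unconnected_signals(first_tier, second_tier)
```

An empty result means every signal exported by the first tier has a matching receiver in the second tier.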

Kubernetes Deployment - External Backends
# First collector configuration on the slide
service:
  pipelines:
    traces:
      receivers: [otlp, zipkin]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus]
      exporters: [otlp]
# Second collector configuration on the slide
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [commercial...]
    metrics:
      receivers: [otlp]

Kubernetes Deployment - Service Mesh
# First collector configuration on the slide
service:
  pipelines:
    traces:
      receivers: [zipkin]
      exporters: [otlp]
    metrics:
      receivers: [statsd, prometheus]
      exporters: [otlp]
# Second collector configuration on the slide
service:
  pipelines:
    traces:
      receivers: [otlp]
    metrics:
      receivers: [otlp]

Application Server on VM Deployment
# First collector configuration on the slide
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [statsd, otlp]
      exporters: [otlp]
# Second collector configuration on the slide
service:
  pipelines:
    traces:
      receivers: [otlp]
    metrics:
      receivers: [otlp]

Greenfield Project Evolution
● Proof of Concept Demos
○ Sample App w/auto-instrumentation & direct exporters -> Jaeger & Prometheus
● Initial Development
○ Application libraries w/manual instrumentation -> In-memory and/or logging exporter
● Deployments during Development
○ Application w/SDK -> Collector (OTLP receiver) -> Cloud platform native monitoring
● Production
○ Applications w/SDK on hybrid cloud -> Collector (OTLP receiver) -> Latest and greatest enterprise-wide observability platform

Already Instrumented Applications
● OpenCensus
○ Application -> Collector (OpenCensus receiver) -> Backend
● OpenTracing
○ Application w/OT + OpenTracing shim + SDK -> Collector (OTLP receiver) -> Backend
● Spring Boot
○ Application w/Micrometer -> Collector (Prometheus receiver) -> Backend
○ Application w/Spring Cloud Sleuth -> Collector (Zipkin receiver) -> Backend
● AWS
○ Application w/X-Ray SDK -> Collector (X-Ray receiver) -> Backend(s)

Non-instrumented Applications
● Java
○ Launch with OpenTelemetry Java Agent (support for 61 widely-used frameworks and libraries)
● JavaScript/TypeScript
○ Add handlers/wrappers at key places or use Node auto-instrumentation
● Microservice in any language
○ Deploy Envoy proxy as sidecar
● Infrastructure
○ Move to public cloud. AWS, Azure, and GCP are all incorporating the OpenTelemetry Collector in their infrastructure
