OpenTelemetry For Architects
Presented by Kevin Brockhoff
Apache 2.0 Licensed
Our Agenda
● Where are current observability patterns falling short?
● Who is OpenTelemetry and why should I care?
● What are some recommended OpenTelemetry deployment architectures?
● How can I use OpenTelemetry to incrementally improve telemetry collection in applications?
Level Setting
● Have you used ELK stack or other log aggregator?
● Have you used an APM system?
● Have you used distributed tracing before?
● Have you used OpenCensus?
● Have you used OpenTracing?
Who am I?
● Kevin Brockhoff - Senior Consultant, Daugherty Business Solutions
○ Solving difficult cloud adoption challenges for Daugherty's Fortune 500 clients
○ OpenTelemetry committer since early stages of the project
○ GitHub:
○ LinkedIn:

Observability 2.0
Why observability?
● Microservices create complex interactions.
● Failures don't exactly repeat.
● Debugging multi-tenancy is painful.
● Monitoring can no longer help us.
Cynefin Framework: Complex
Observability 1.0
Metrics Concepts
● Gauges
○ Instantaneous point-in-time value (e.g. CPU utilization)
● Cumulative counters
○ Cumulative sums of data since process start (e.g. request counts)
● Cumulative histogram
○ Grouped counters for a range of buckets (e.g. 0-10ms, 11-20ms)
● Rates
○ The derivative of a counter, typically (e.g. requests per second)
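The four metric types above can be sketched in a few lines of Python. This is a minimal illustration and not the OpenTelemetry metrics API: the class names, the `rate` helper, and the choice of inclusive upper bucket bounds are all assumptions made for the example.

```python
import bisect


class Gauge:
    """Instantaneous point-in-time value; each set() overwrites the last one."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class CumulativeCounter:
    """Monotonic sum of everything recorded since process start."""

    def __init__(self):
        self.total = 0

    def add(self, amount=1):
        if amount < 0:
            raise ValueError("cumulative counters only go up")
        self.total += amount


class CumulativeHistogram:
    """One counter per bucket; bounds are inclusive upper limits (e.g. in ms)."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra slot collects values above the largest bound.
        self.counts = [0] * (len(self.upper_bounds) + 1)

    def record(self, value):
        # bisect_left returns the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.upper_bounds, value)] += 1


def rate(previous_total, current_total, interval_seconds):
    """Approximate the derivative of a counter between two scrapes."""
    return (current_total - previous_total) / interval_seconds
```

With bounds of `[10, 20]`, a 10 ms observation lands in the first bucket and an 11 ms observation in the second. A counter that moves from 100 to 160 over 60 seconds yields a rate of 1 request per second.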

Basic Observability Metrics Methods
● USE - Utilization, Saturation, and Errors
○ Resource-scoped
● RED - Rate, Errors, and Duration
○ Request-scoped
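As a rough illustration of the request-scoped RED method, the sketch below derives rate, error ratio, and duration figures from a window of request records. The `Request` record, the `red_summary` function, and its output keys are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    failed: bool


def red_summary(requests, window_seconds):
    """Summarize one observation window as Rate, Errors, and Duration."""
    if not requests:
        return {"rate": 0.0, "error_ratio": 0.0,
                "mean_duration_ms": None, "max_duration_ms": None}
    durations = [r.duration_ms for r in requests]
    errors = sum(1 for r in requests if r.failed)
    return {
        "rate": len(requests) / window_seconds,       # requests per second
        "error_ratio": errors / len(requests),        # share of failed requests
        "mean_duration_ms": sum(durations) / len(durations),
        "max_duration_ms": max(durations),
    }
```

A production system would more likely report duration as histogram percentiles than as a mean; the mean is used here only to keep the sketch short.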
Tracing Concepts
● Span
○ Represents a single unit of work in a system.
● Trace
○ Defined implicitly by its spans. A trace can be thought of as a directed acyclic graph of spans where the edges between spans are defined as parent/child relationships.
● Distributed Context
○ Contains the tracing identifiers, tags, and options that are propagated from parent to child spans.
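The relationship between spans, traces, and propagated context can be sketched with a small data model. This is an illustrative toy, not the OpenTelemetry SDK: the `Span` fields, `start_span`, and `children_by_parent` are assumed names, and the ID sizes (128-bit trace ID, 64-bit span ID) follow the W3C Trace Context sizes shown later in the deck.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str                      # shared by every span in the trace
    span_id: str                       # unique to this unit of work
    parent_id: Optional[str] = None    # edge to the parent span, None for the root
    attributes: dict = field(default_factory=dict)


def start_span(name, parent=None):
    """Create a root span, or a child that inherits the parent's trace ID."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return Span(name, trace_id, secrets.token_hex(8), parent_id)


def children_by_parent(spans):
    """Rebuild the trace graph: the trace is defined implicitly by its spans."""
    graph = {}
    for span in spans:
        graph.setdefault(span.parent_id, []).append(span.name)
    return graph
```

Nothing stores the trace as an object of its own. Grouping spans by `trace_id` and following `parent_id` edges is enough to recover the graph.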
Observability 1.0 Limitations
● Data ends up in 3 different datastores.
● Different types of data are not correlated with each other.
● Observability is not necessarily insight.
Operational Complexity Growth
● Circuit Breaker
○ 2010: Homegrown w/ 3 configs
○ 2020: Resilience4J w/ 14 configs
● Retries
○ 2010: End user clicks submit again
○ 2020: Resilience4J w/ 7 configs
● Health Check
○ 2010: HTTP server and DB are live
○ 2020: Kubernetes liveness, readiness, and startup probes with 5 timing configs per probe
● Alerts
○ 2010: Unread count on circuit breaker opened email folder
○ 2020: ???

From Observability 1.0 to 2.0
Observability 2.0 - PoC
● Deep Linking Metrics and Traces with OpenTelemetry, OpenMetrics and M3 - Rob Skillington (Presentation @ KubeCon North America 2019)
○ Click on point in metrics graph to get representative traces
○ Click on trace span to get system metrics from server that produced the span
○ Click on trace span to get all application logs emitted during span
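One way to make "click on a point in a metrics graph to get representative traces" possible is to store a trace ID next to each metric sample, an idea usually called an exemplar. The sketch below assumes that mechanism for illustration; it is not taken from the proof of concept itself, and the class and method names are hypothetical.

```python
import bisect


class LatencyHistogramWithExemplars:
    """Latency histogram that remembers one example trace ID per bucket."""

    def __init__(self, upper_bounds_ms):
        self.upper_bounds_ms = sorted(upper_bounds_ms)
        self.counts = [0] * (len(self.upper_bounds_ms) + 1)
        self.exemplars = [None] * len(self.counts)

    def _bucket(self, duration_ms):
        return bisect.bisect_left(self.upper_bounds_ms, duration_ms)

    def record(self, duration_ms, trace_id):
        index = self._bucket(duration_ms)
        self.counts[index] += 1
        # Keep the most recent trace seen in this bucket as its exemplar.
        self.exemplars[index] = trace_id

    def representative_trace(self, duration_ms):
        """Return a trace ID for the bucket that contains duration_ms, if any."""
        return self.exemplars[self._bucket(duration_ms)]
```

A dashboard could then turn a click on the slow end of the latency graph into a trace lookup by the returned ID.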
OpenTelemetry Project
CNCF Sandbox Project
OpenCensus + OpenTracing = OpenTelemetry
● OpenCensus:
○ Provides APIs and instrumentation that allow you to collect application metrics and distributed traces.
○ Provides oc-service and oc-agent middleware.
● OpenTracing:
○ Provides APIs for distributed tracing with implementations provided by tracing backend vendors.
● OpenTelemetry:
○ An effort to combine distributed tracing, metrics and logging into a single set of system components and language-specific libraries.

OpenTelemetry Project
● Specification
○ API (for application developers)
○ SDK Implementations
○ Transport Protocol (Protobuf - gRPC)
● Collector (middleware)
● SDKs (various stages of maturity)
○ C++
○ C# (Auto-instrument/Manual)
○ Erlang
○ Go
○ JavaScript (Browser/Node)
○ Java (Auto-instrument/Manual)
■ Android compatibility
○ PHP
○ Python (Auto-instrument/Manual)
○ Ruby
○ Rust
○ Swift
Open Source Observability Platforms Supported
W3C Distributed Tracing Working Group
● Trace Context – Level 1 - Recommendation
● Propagation format for distributed trace context: Baggage (rec-track)
● Trace Context: AMQP protocol (rec-track)
● Trace Context: MQTT protocol (rec-track)
● Trace Response Headers (rec-track)
● Trace Context Protocols Registry – Group Note
● Trace Context: binary protocol (rec-track)
● Trace Interchange Format (rec-track)
● Trace State Ids Registry (note)

Trace Context HTTP Headers
traceparent: 00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01
○ Fields: version, trace-id (128 bit), parent-id (64 bit), trace-flags (8 bit)
tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
○ Vendor-specific key/value pairs
Baggage: userId=sergey,serverNode=DF:28,isProduction=false
○ Draft Baggage header specification
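To make the header layout concrete, here is a minimal parsing sketch that uses the sample values from this slide. The function names are hypothetical, and the list parser is deliberately simplified: it ignores per-member properties and the size limits that the specifications define.

```python
import re

# version - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value):
    """Split a traceparent header into its four fields."""
    match = TRACEPARENT.match(value.strip())
    if not match:
        raise ValueError("malformed traceparent header")
    version, trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id is invalid")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest flag bit = sampled
    }


def parse_list_header(value):
    """Parse comma-separated key=value members (tracestate or baggage)."""
    pairs = {}
    for member in value.split(","):
        key, separator, item = member.strip().partition("=")
        if separator:
            pairs[key] = item
    return pairs


def child_traceparent(parent, new_span_id):
    """Build the header for an outgoing call: same trace, new parent-id."""
    flags = "01" if parent["sampled"] else "00"
    return f"{parent['version']}-{parent['trace_id']}-{new_span_id}-{flags}"
```

The trace ID stays constant across every hop, while each service replaces the parent-id with the ID of its own span before calling downstream.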
Deployment Architectures
Kubernetes Deployment - Proof of Concept
# Collector config 1
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]

# Collector config 2
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [prometheus]
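The receivers, processors, and exporters in each pipeline form a simple chain: data comes in, each processor transforms it in the listed order, and every exporter receives the result. The toy model below illustrates only that flow. It is not how the Collector is implemented, and `resource_processor` and `run_pipeline` are hypothetical names.

```python
def resource_processor(attributes):
    """Return a processor that stamps fixed resource attributes on each item."""
    def process(batch):
        return [{**item, **attributes} for item in batch]
    return process


def run_pipeline(received, processors, exporters):
    """Push one batch through the processors in order, then to every exporter."""
    batch = list(received)
    for process in processors:
        batch = process(batch)
    for export in exporters:
        export(batch)
    return batch
```

For example, passing `[{"name": "span-a"}]` through `resource_processor({"k8s.cluster": "demo"})` delivers `{"name": "span-a", "k8s.cluster": "demo"}` to each exporter, which is roughly the role the `resource` processor plays in the configurations above.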

Kubernetes Deployment - External Backends
# Collector config 1
service:
  pipelines:
    traces:
      receivers: [otlp, zipkin]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]

# Collector config 2
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
Kubernetes Deployment - Service Mesh
# Collector config 1
service:
  pipelines:
    traces:
      receivers: [zipkin]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
    metrics:
      receivers: [statsd, prometheus]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]

# Collector config 2
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
Application Server on VM Deployment
# Collector config 1
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
    metrics:
      receivers: [statsd, otlp]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]

# Collector config 2
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
Instrumentation Strategies

Greenfield Project Evolution
● Proof of Concept Demos
○ Sample App w/auto-instrumentation & direct exporters -> Jaeger & Prometheus
● Initial Development
○ Application libraries w/manual instrumentation -> In-memory and/or logging exporter
● Deployments during Development
○ Application w/SDK -> Collector (OTLP receiver) -> Cloud platform native monitoring
● Production
○ Applications w/SDK on hybrid cloud -> Collector (OTLP receiver) -> Latest and greatest enterprise-wide observability platform
Already Instrumented Applications
● OpenCensus
○ Application -> Collector (OpenCensus receiver) -> Backend
● OpenTracing
○ Application w/OT + OpenTracing shim + SDK -> Collector (OTLP receiver) -> Backend
● Spring Boot
○ Application w/Micrometer -> Collector (Prometheus receiver) -> Backend
○ Application w/Spring Cloud Sleuth -> Collector (Zipkin receiver) -> Backend
○ Application w/X-Ray SDK -> Collector (X-Ray receiver) -> Backend(s)
Non-instrumented Applications
● Java
○ Launch with OpenTelemetry Java Agent (support for 61 widely-used frameworks and
● JavaScript/TypeScript
○ Add handlers/wrappers at key places or Node auto-instrumentation
● Microservice in any language
○ Deploy Envoy proxy as sidecar
● Infrastructure
○ Move to public cloud. AWS, Azure, GCP are all incorporating the OpenTelemetry collector in their infrastructure
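The "add handlers/wrappers at key places" strategy can be sketched as a decorator that times a function and records the outcome. This is a standard-library stand-in for illustration, not the OpenTelemetry API: `traced`, `FINISHED_SPANS`, and `lookup_order` are hypothetical names, and a real application would hand the finished span to an SDK exporter rather than a list.

```python
import functools
import time

FINISHED_SPANS = []  # stand-in for an exporter; a real setup would use an SDK


def traced(name):
    """Wrap a function so every call is recorded as a finished span."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = None
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                error = type(exc).__name__  # note the failure, then re-raise
                raise
            finally:
                FINISHED_SPANS.append({
                    "name": name,
                    "duration_s": time.perf_counter() - start,
                    "error": error,
                })
        return wrapper
    return decorator


@traced("lookup_order")
def lookup_order(order_id):
    if order_id < 0:
        raise ValueError("order_id must be non-negative")
    return {"id": order_id}
```

Because the wrapper leaves the function's return value and exceptions unchanged, it can be added to existing code one call site at a time.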
Thank you!


