Observability Shivagami Gugan

Observability
Shivagami Gugan
Technology Transformation Leader
SRE * DevOps * Practitioner

Performance Impacts the Business
1. Walmart found that for every 1 second improvement in page load time, conversions increased by 2%
2. Mobify found that each 100ms improvement in their homepage's load time resulted in a 1.11% increase in conversion
“SLOW is the new DOWN”

Performance in Complex Architectures
● Systems have become inherently very complex
● There is a gap (whitespace) in the area of “Integrated Visibility”
Distributedness

Practitioner’s view of Observability
If you miss the state changes, you will not know which workload is being serviced by which resource.
With transience, resources spin up and down constantly, so the entity itself changes with every state change.
Remember, aggregation is the biggest enemy: it “kills” variety, making the information useless.
Technology, frameworks and analytics that cater to high cardinality, fast retrieval and meaningful deciphering in near real time are key.
Complexity
Metadata variance due to high cardinality
Distributedness & transaction depth
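
The warning about aggregation can be made concrete with a small sketch. The request records, pod names and latency values below are hypothetical, chosen only for illustration: a fleet-wide average looks tolerable, while the same data grouped by a high-cardinality dimension (here, the pod) shows where the problem sits.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical request latencies (ms), tagged with the pod that served them.
requests = [
    {"pod": "pod-a", "latency_ms": 100},
    {"pod": "pod-a", "latency_ms": 110},
    {"pod": "pod-b", "latency_ms": 90},
    {"pod": "pod-b", "latency_ms": 100},
    {"pod": "pod-c", "latency_ms": 900},
    {"pod": "pod-c", "latency_ms": 1100},
]


def overall_mean(records):
    """Aggregate everything into one number (the variety is lost)."""
    return mean(r["latency_ms"] for r in records)


def mean_by(records, key):
    """Keep one dimension and average within each of its values."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["latency_ms"])
    return {k: mean(v) for k, v in groups.items()}


fleet_average = overall_mean(requests)   # one number for the whole fleet
per_pod = mean_by(requests, "pod")       # one number per pod

print(fleet_average)
print(per_pod)
```

The single fleet figure cannot say which resource is serving slow requests; the per-pod breakdown can, at the cost of storing and querying one series per pod.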

Logs, Events, Metrics and Tracing
Digital Business
• Business Metrics View
– Checkout Abandonment
– Customer Churn
– Revenue per Location
Demand & Workload
• RED Metrics View
– Request throughput
– Errors
– Duration (Latency, Response time)
Resources
• USE Metrics View
– Utilization
– Saturation
– Errors
Context
• Distributed Tracing
– Dependency on downstream
– Service Maps
– End-to-End Transaction (hotspots, logic flaws)
Google’s Golden Signals
– Latency
– Traffic
– Errors
– Saturation
As applications become more distributed and ephemeral, with multiple dependencies, BUILD BETTER INSIGHTS INTO YOUR SYSTEM
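
As a rough illustration of the RED view listed above, the sketch below derives rate, error ratio and a duration percentile from a window of request records. The `Request` type, the sample data, the choice of HTTP 5xx as "error" and the nearest-rank p95 are all assumptions made for this example, not something the slides prescribe.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP-style status code (assumed for this sketch)


def red_metrics(requests, window_seconds):
    """Compute Rate, Errors and Duration for one observation window."""
    total = len(requests)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    if durations:
        # Nearest-rank 95th percentile, computed with integer arithmetic.
        rank = max(1, (95 * total + 99) // 100)
        p95 = durations[rank - 1]
    else:
        p95 = None
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "p95_ms": p95,
    }


# Hypothetical 10-second window: 18 fast successes and 2 slow failures.
window = [Request(duration_ms=10.0 * i, status=200) for i in range(1, 19)]
window += [Request(duration_ms=500.0, status=500),
           Request(duration_ms=1000.0, status=503)]

metrics = red_metrics(window, window_seconds=10)
print(metrics)
```

A percentile is used for duration rather than a mean so that a handful of slow requests stays visible.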

Law of Requisite Variety
“If a system is to be stable, the number of states of its control
mechanism must be greater than or equal to the number of states in
the system being controlled”
- W. Ross Ashby
What are the Varieties?
Version changes: deployed upgrades of service versions.
Topological changes: new components that appear and disappear in the system landscape and affect dependencies between existing running components.
Component property changes: changing labels and tags of components.
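
One informal way to read the law for observability is as a counting argument: the observer must be able to tell apart at least as many states as the system can be in. The sketch below is only an illustration of that idea, not Ashby's formalism; the state records and the `variety` helper are hypothetical.

```python
def variety(states, dimensions):
    """Count the distinct states visible when only `dimensions` are observed."""
    return len({tuple(s[d] for d in dimensions) for s in states})


# Hypothetical states a service passed through, covering the three
# kinds of variety named above: version, topology and component labels.
system_states = [
    {"version": "v1", "topology": "api->db", "labels": "team=a"},
    {"version": "v2", "topology": "api->db", "labels": "team=a"},
    {"version": "v2", "topology": "api->cache->db", "labels": "team=a"},
    {"version": "v2", "topology": "api->cache->db", "labels": "team=b"},
]

ALL_DIMENSIONS = ("version", "topology", "labels")

system_variety = variety(system_states, ALL_DIMENSIONS)
# An observer that records only the deployed version conflates states.
observer_variety = variety(system_states, ("version",))
has_requisite_variety = observer_variety >= system_variety

print(system_variety, observer_variety, has_requisite_variety)
```

Here the version-only observer sees two states where the system has four, so topology and label changes pass unnoticed.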

Instrumenting with Agents vs. Instrumenting with Libraries
• Instrumenting with Agents
• Outside-in approach
• An external agent runs alongside your application to introspect your code at run time
• The agent decides what calls to measure and what metadata to extract, based on specifications from external configs
• More complete coverage, but often loses context
● Instrumenting with Libraries
● Inside-out approach
● The developer includes a trace library and configures spans that allow the code to participate in distributed tracing
● When the app runs, trace spans are generated asynchronously and dispatched to a persistence store, preferably hooked to a backend analytics engine
● Highly context-driven, and hence leaves breadcrumbs along the code path
“These are not cannibalistic approaches, they can well play in concert”

Inflection point: Observability-driven development?
• Evolving technology, evolving user-friendly analytics – the Observer technology should be more competent than the Observed technology
• Dev and Ops? Go the Dev way: there are so many fundamental changes that Ops can’t keep pace
• What is “Staging like Prod”? Does it exist?
• Developers need to own the code, with the ability to deploy it and debug/test in Prod
• Good practice: a merge happens only when proper Observability hooks are baked into the code
• Never accept a PR until you understand its instrumentation
• Distributed tracing and building breadcrumbs are fundamental for building reliable systems
• Observability Driven Design makes DevOps and SRE principles fuller
• Give Developers the privilege to “You Build, You Run, You Monitor”
