Debugging Microservices
Key challenges and techniques
• Staff engineer at Lohika
• More than 17 years in IT
• Primary focus on JVM-based languages, big data, and microservices
About me
• Key challenges of debugging microservices
• Observability
– Logging
– Monitoring
– Tracing
• Debugging tools for Kubernetes
– Telepresence v1 and v2
Agenda
The challenge
Monolithic application
• Single process
• Holistic view
• Simple infrastructure
• Can be deployed/debugged
locally
Microservice application
• Multiple processes
• Fragmented view
• Complex infrastructure
• Local deployment/debug
can be an issue
The challenge (most optimistic figures)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.370.9611&rep=rep1&type=pdf
Observability
Monitoring
• Provides a high-level view of system health and performance (Grafana, Prometheus, VictoriaMetrics)
Logging
• Keeps a record of input data, processing, and results in the application (Elasticsearch, Fluent Bit, Kibana)
Tracing
• Provides insights into specific operations (OpenTracing, Jaeger)
Monitoring
Why
• A way to get a bird's-eye view of infrastructure and service health
• A way to get information about the performance of the system and its individual components
• A way to be alerted on SLA/SLO violations
What
• Infrastructure health and resource utilization
• Application and individual service health and resource utilization
• Application and individual service performance
• Application and individual service errors
Monitoring – How?
• Define naming conventions for the metrics (see the sketch after this list)
• Structure dashboards
• Build dashboards to be used with predefined techniques, e.g., layer peeling and exemplars
• Build dashboards for infrastructure and applications, i.e., follow a methodology (USE, RED)
• Build dashboards for specific services, e.g., Java and Spring
• Avoid having too many custom dashboards and too much data
• Avoid high cardinality when using tags
• Avoid false-positive alerts
• Look for predefined dashboards, e.g., for Spring
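A minimal sketch of the naming and tagging advice, using Micrometer (a common metrics facade on the JVM); the metric name, tag names, and service name below are illustrative, not a prescribed scheme:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class OrderMetrics {

    private final Counter ordersCreated;

    OrderMetrics(MeterRegistry registry) {
        // One agreed convention, e.g. <noun>_<action>_<unit>, all lower case.
        // Tags carry only low-cardinality dimensions (service name, outcome),
        // never unbounded values such as user ids or order ids.
        this.ordersCreated = Counter.builder("orders_created_total")
                .tag("service", "order-service")
                .tag("outcome", "success")
                .register(registry);
    }

    void onOrderCreated() {
        ordersCreated.increment();
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        OrderMetrics metrics = new OrderMetrics(registry);
        metrics.onOrderCreated();
        System.out.println(registry.get("orders_created_total").counter().count()); // 1.0
    }
}
```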
USE and RED
USE
• Utilization: the proportion of the resource that is used; 100% utilization means no more work can be accepted
• Saturation: the degree to which the resource has extra work that it can't service, often queued
• Errors: the count of error events
RED
• Rate: the number of requests our service is serving
• Errors: the number of failed requests
• Duration: the amount of time it takes to process a request
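As an illustration, all three RED signals can come from a single Micrometer Timer: its count over time gives the rate, an outcome tag separates errors, and the recorded latencies give the duration. A sketch, with a hypothetical request handler:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class RedMetrics {

    static final MeterRegistry registry = new SimpleMeterRegistry();

    static void handle() {
        String outcome = "success";
        Timer.Sample sample = Timer.start(registry);
        try {
            process(); // hypothetical request handler
        } catch (RuntimeException e) {
            outcome = "error"; // Errors: failed requests get their own tag value
            throw e;
        } finally {
            // Rate comes from the timer's count over time; Duration from the
            // recorded latencies (and their percentiles/histograms).
            sample.stop(Timer.builder("http_server_requests")
                    .tag("outcome", outcome)
                    .register(registry));
        }
    }

    static void process() { /* ... */ }

    public static void main(String[] args) {
        handle();
        Timer timer = registry.get("http_server_requests").tags("outcome", "success").timer();
        System.out.println(timer.count()); // 1
    }
}
```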
Logging
Why
• Monitoring and troubleshooting the application (for engineers)
• Helping operations
• Security and compliance
• A way to be alerted on SLA/SLO violations
What
• Application events:
– Availability events (startup/shutdown)
– Resources (connectivity issues)
– Threats
– Errors
– Processing events
• Security/audit and compliance events (highly dependent on requirements):
– Login/logout
– Attempts to access unauthorized data
– User actions
Logging – How?
• Centralize logging
• Align on the log format and levels
• Use structured logs
• Make it possible to correlate a request across services (see the sketch below)
• Log messages, like code, will be read by other engineers: write them with those readers in mind
• Do not trust clocks
• Do not log sensitive information
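A minimal sketch of structured, correlated logging with SLF4J's MDC; it assumes a log encoder (e.g., a JSON layout) that emits MDC entries as fields, and the field names are illustrative:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    void processPayment(String traceId, String spanId, String paymentId) {
        // Put the trace context into the MDC so every log line written while
        // handling this request carries the same correlation ids.
        MDC.put("trace_id", traceId);
        MDC.put("span_id", spanId);
        try {
            // Structured message: stable text plus key=value arguments,
            // and no sensitive data (card numbers, tokens) in the payload.
            log.info("payment processing started paymentId={}", paymentId);
        } finally {
            MDC.clear(); // do not leak context to the next request on this thread
        }
    }
}
```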
Tracing with Jaeger
Why
• A way to get details about an individual request/event
• A way to get insights into performance
• A way to see cross-service dependencies
• Statistics on time spent
• Compare traces
• Share traces
What
• Timings and logs for:
– Database calls
– Calls to other services
– Message queues
– Heavy processing
Tracing – How?
• Pick either OpenTracing or OpenTelemetry:
– OpenTelemetry is a merger of OpenTracing and OpenCensus
– OpenTelemetry is newer and provides a metrics API as well
• Key concepts (see the sketch below):
– Span: a named, timed operation representing a piece of the workflow
– A span contains: operation name, start and finish timestamps, tags, logs, and context
– A span may contain other spans
– Tracer: the Tracer interface creates Spans and understands how to Inject (serialize) and Extract (deserialize) their metadata across process boundaries
– A new trace is started whenever a new Span is created without a reference to a parent Span
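A minimal sketch of these concepts with the OpenTracing API and the Jaeger Java client; the service and operation names are illustrative:

```java
import io.jaegertracing.Configuration;
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.tag.Tags;

public class TracingExample {

    public static void main(String[] args) {
        // Configuration.fromEnv reads the JAEGER_* environment variables.
        Tracer tracer = Configuration.fromEnv("order-service").getTracer();

        // A span created without a parent reference starts a new trace.
        Span parent = tracer.buildSpan("process-order").start();
        Tags.COMPONENT.set(parent, "order-service");

        // A child span nests inside the parent via asChildOf.
        Span child = tracer.buildSpan("load-customer").asChildOf(parent).start();
        child.log("customer loaded from database"); // timestamped span log
        child.finish();

        parent.finish();
    }
}
```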
Tracing – How?
• Add OpenTracing support to your application, e.g., opentracing-spring-jaeger-cloud-starter
• Add additional libraries where needed, e.g., for gRPC (opentracing-grpc)
• If the application is written in several languages, align on span tags and names, and implement decorators (see the sketch below)
• Ensure the trace id and span id are used as correlation ids in logs
• If you have a service mesh, inter-service communication tracing comes for free and can be integrated with Jaeger; you may also look at tools like Kiali
• If your application uses Zipkin, it can still be switched to Jaeger easily
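One way to keep names and tags aligned across polyglot services is a tiny shared decorator per language. A hypothetical sketch (the helper and the agreed tag vocabulary are assumptions, not a standard API):

```java
import io.opentracing.Span;
import io.opentracing.Tracer;

// Hypothetical helper: each codebase (Java, Go, Node, ...) ships an equivalent,
// so span names and tag keys stay identical across services.
public final class SpanDecorator {

    private SpanDecorator() {
    }

    public static Span startServerSpan(Tracer tracer, String operation, String peerService) {
        Span span = tracer.buildSpan(operation).start();
        span.setTag("span.kind", "server");       // agreed tag vocabulary
        span.setTag("peer.service", peerService); // who is calling us
        return span;
    }
}
```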
Tracing – How?
• Install and configure Jaeger:
– Client: libraries that implement the OpenTracing API and send data onward (see the sketch below)
– Agent: a network daemon that listens on UDP and forwards data to the collector
– Collector: receives spans and writes them to storage
– Storage: holds the spans (Cassandra, Elasticsearch, Kafka)
– Query: provides an API to read trace data from storage
– Ingester: reads data from Kafka and stores it in the storage
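From the application's side, this chain starts with the client pointing at an agent. A minimal sketch with the Jaeger Java client; the agent host name and the log-spans choice are illustrative:

```java
import io.jaegertracing.Configuration;
import io.jaegertracing.Configuration.ReporterConfiguration;
import io.jaegertracing.Configuration.SenderConfiguration;
import io.opentracing.Tracer;

public class JaegerClientSetup {

    public static void main(String[] args) {
        // Point the client at the agent's UDP endpoint (default port 6831).
        SenderConfiguration sender = new SenderConfiguration()
                .withAgentHost("jaeger-agent")
                .withAgentPort(6831);

        ReporterConfiguration reporter = ReporterConfiguration.fromEnv()
                .withSender(sender)
                .withLogSpans(true); // also log spans locally, handy while debugging

        Tracer tracer = new Configuration("order-service")
                .withReporter(reporter)
                .getTracer();
        System.out.println("tracer ready: " + tracer);
    }
}
```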
Tracing – How?
• Configure sampling (see the sketch below):
– Constant
– Probabilistic
– Rate limiting
– Remote
• Configure autoscaling for the collectors
• Provide enough resources to the storage
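With the Jaeger Java client the sampler can be chosen in code or via the JAEGER_SAMPLER_TYPE / JAEGER_SAMPLER_PARAM environment variables; the 1% probability below is only an example value:

```java
import io.jaegertracing.Configuration;
import io.jaegertracing.Configuration.SamplerConfiguration;
import io.opentracing.Tracer;

public class SamplingSetup {

    public static void main(String[] args) {
        // "const", "probabilistic", "ratelimiting" and "remote" map to the
        // sampler types listed above; param 0.01 keeps roughly 1% of traces.
        SamplerConfiguration sampler = SamplerConfiguration.fromEnv()
                .withType("probabilistic")
                .withParam(0.01);

        Tracer tracer = new Configuration("order-service")
                .withSampler(sampler)
                .getTracer();
        System.out.println("tracer ready: " + tracer);
    }
}
```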
Recap
• So:
– The complex infrastructure is monitored, and there is visibility into it
– Jaeger attempts to provide a holistic view
– Centralized logging and OpenTracing make it possible to trace a request through multiple processes
• How to troubleshoot, then:
– Identify what version is deployed
– Punish people who use latest instead of a specific deployment version
– Use metrics to check service and infrastructure health and resource consumption
– Find the error(s) in the logs and, filtering by trace id, find the root operation
– Find the corresponding operations in Jaeger, analyze the calls, and compare them with the logs
– Build a hypothesis, then test or debug it
How can we debug services in Kubernetes?
• Port forwarding and remote debugging
• Tools like Telepresence and Squash
• Use cases:
– Issues reproduced only on the cluster
– Services accessible only from the cluster
– No ability to run the service(s) locally
– Cloud-native technologies
What does Telepresence try to solve?
How does it solve it?
• Telepresence v1:
– Exports env vars and swaps the deployment's container for a proxy
– Forwards the ports that the service exposes
– Routes all traffic through the proxy
• To achieve that, run:
– telepresence --swap-deployment {serviceName} --namespace {namespaceName} --env-json ~/telepresence-legacy.json
• In other words:
– The service runs locally but has access to all the resources in the cluster; no debugging information is passed over the network, and no time is spent on container build/upload/deploy
Telepresence v1
• Telepresence v1 is a cool and reliable tool that does not require any cluster configuration
• Telepresence v1 is great but has significant limitations:
– Only one service at a time can be debugged
– The service is fully replaced, so all traffic goes to your machine
• Thus, Telepresence v2 was implemented
Telepresence v2
• Access all resources in the cluster as if your machine were deployed there
– telepresence connect
• Debug multiple services at a time
– Execute multiple intercept commands and point them to different local ports
• Intercept specific ports
– telepresence list
– kubectl get service example-service --output yaml
– telepresence intercept example-service --port 8080:http --env-file ~/example-service-intercept.env
• Intercept specific requests
– telepresence intercept example-service --port 8080:http --env-file ~/example-service-intercept.env --preview-url=true
• Share dev environments
Telepresence v2
• Requires cluster-level configuration
• Is not as stable as v1
Telepresence v2 cons
• Brew updates you to the latest version by default, which may require cluster reconfiguration
• It cannot intercept more than one port on a service
• It does not substitute the pod, so if you consume messages your breakpoint may not be hit
• It does not work with certain service meshes
So, what should I use?
• Use both 🙂
• V1 suits cases where:
– there is more than one port to intercept
– you need to consume messages from queues or Kafka
– it is OK to swap the deployment
• V2 suits cases where you need to:
– connect to cluster resources without extra port forwards
– intercept a specific port
– intercept specific requests
So, how would I do that?
• Install v2:
– To install a specific version (2.3.5), use:
– sudo curl -fL https://app.getambassador.io/download/tel2/darwin/amd64/2.3.5/telepresence -o /usr/local/bin/telepresence
– sudo chmod a+x /usr/local/bin/telepresence
• Install v1:
– brew install --cask macfuse
– brew install datawire/blackbird/telepresence-legacy
– ln -s /usr/local/Cellar/telepresence-legacy/0.109/bin/telepresence /usr/local/bin/tel
Thank You!
Tracing – Trace timeline
Tracing – Dependency graph