What to consider when monitoring microservices

http://particular.net
Sean Farmar holds the world record for answering the most
NServiceBus questions - even more than Udi.
With over 20 years of experience, he specializes in providing
simple solutions for complex business requirements using
NServiceBus and applying SOA principles inspired by Udi
Dahan.
As a solution architect with Particular Software, the creators of
NServiceBus, Sean provides support, training and consulting
for customers using NServiceBus and the Particular Platform.
A professional geek, William works for Particular Software
writing amazing software like NServiceBus. Passionate about
the web and security, he is engaged in a sordid love affair with
JavaScript, and spends most of his free time trying to convince
others of its beauty and elegance.
When not behind his laptop hacking away, this amateur beer
enthusiast can often be found playing boardgames or drinking
cold-brew coffee.

William Brander http://particular.net Sean Farmar

Agenda
• Introduction
• A Philosophy on Monitoring
• How things change when they’re distributed
• Monitoring Metrics
• Q & A

An average production system
Database
• Is the web server up?
• Is the database up?
• Can the web server talk to the db?

What are you actually monitoring?
• Business Capability: Can users access the system?
• Application: Is my application process running?
• Infrastructure: Are my servers running?

A Monitoring Philosophy
• Monitoring Area: Business Capability, Application, Infrastructure
• Monitoring Concern: Capacity, Performance, Health

Monitoring Concerns
Business Capability
• Health: Can users access the system?
• Performance: Are we meeting our SLAs?
• Capacity: What is the impact of adding another customer?
Application
• Health: Is my application generating exceptions?
• Performance: How quickly is my system processing messages?
• Capacity: Can I handle month end batch jobs?
Infrastructure
• Health: Is the server up?
• Performance: Is there high CPU?
• Capacity: Do I have enough disk space?

A Monitoring Philosophy
• Monitoring Area: Business Capability, Application, Infrastructure
• Monitoring Concern: Capacity, Performance, Health
• Interaction Type: Proactive, Reactive, Passive

An average production system
Database
• Is the web server up?
• Is the database up?
• Can the web server talk to the db?
Monitoring Area: Infrastructure. Monitoring Concern: Health. Interaction Type: Passive.

Going Distributed
UI
BL
DAL
DB
Email
PDF
CRM

Going Distributed
Email
PDF
CRM
SQL

Monitoring distributed systems
Multiple processes and servers and queues
We want to monitor the time it takes for a message to be processed
We need to monitor the message queues

Queue Length
• Queue length is an indicator of work still outstanding
• High queue length doesn’t necessarily indicate a problem though
• Stable or decreasing is good
• Increasing is bad
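
The point about trend mattering more than absolute length can be sketched in a few lines of Python. This is only an illustration: the function name `queue_length_trend`, the `tolerance` parameter, and the sample values are assumptions made for this example and are not part of NServiceBus. It fits a least-squares slope to equally spaced queue length samples and classifies the trend.

```python
from statistics import mean


def queue_length_trend(samples, tolerance=0.5):
    """Classify equally spaced queue length samples by their least-squares slope.

    `tolerance` (messages per sample interval) is a hypothetical threshold
    below which a slope is treated as noise.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = list(range(len(samples)))
    x_bar = mean(xs)
    y_bar = mean(samples)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "stable"


# A long but steady queue is not flagged; a short but growing one is.
print(queue_length_trend([900, 905, 898, 902, 901]))
print(queue_length_trend([10, 40, 90, 150, 230]))
```

The first series has a high queue length but no upward trend, which matches the slide's point that a high queue length alone does not necessarily indicate a problem.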

Processing Time
[Diagram: processing time of a message]

Processing Time and fault tolerance
• Processing Time does not include error handling time
• Avoid losing data due to exceptions or temporary connectivity issues
• If all else fails, move the message to the error queue

Exception Diagnostics – Immediate Retries
• Input Queue → Invoke
• On failure (✘): Immediate Retries x n
• If all retries fail (✘): Start Delayed Retries (Timeout Queue)

Exception Diagnostics – Delayed Retries
• Timeout Queue → Invoke
• On failure (✘): return to timeout queue, retry x times
• If all fails (✘): move to Error Queue
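
The two retry stages can be sketched as a small Python function. This is a simplified illustration of the flow on the slides, not how NServiceBus implements it: `process_with_retries`, its parameters, and the `make_flaky_handler` helper are all hypothetical names, and the timeout queue is simulated with a plain `time.sleep`.

```python
import time


def process_with_retries(message, handler, error_queue,
                         immediate_retries=2, delayed_retries=1,
                         delay_seconds=0.0):
    """Try a handler with immediate retries, then delayed retries.

    Returns True if the handler eventually succeeds. If every attempt
    fails, the message is appended to `error_queue` and False is returned.
    """
    for delayed_round in range(delayed_retries + 1):
        if delayed_round > 0:
            # Stand-in for parking the message in a timeout queue.
            time.sleep(delay_seconds * delayed_round)
        # One initial attempt plus the immediate retries.
        for _ in range(immediate_retries + 1):
            try:
                handler(message)
                return True
            except Exception:
                continue
    # If all else fails, move the message to the error queue.
    error_queue.append(message)
    return False


def make_flaky_handler(failures):
    """Hypothetical handler that raises `failures` times, then succeeds."""
    attempts = []

    def handler(message):
        attempts.append(message)
        if len(attempts) <= failures:
            raise RuntimeError("temporary connectivity issue")

    return handler, attempts


errors = []
handler, attempts = make_flaky_handler(failures=4)
print(process_with_retries("order-1", handler, errors), len(attempts), errors)
```

With the default settings each round makes three attempts and there are two rounds, so a handler that fails four times succeeds on the fifth attempt, during the delayed retry round, and nothing reaches the error queue.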

Detecting Connectivity
• Distributed systems typically work when other parts aren’t available
• How do you know the endpoint you’re sending messages to is
actually processing messages?

Critical time
• Critical time = the entire time taken to process a message successfully

• Critical Time is the total duration from when a message is created to when it is processed
• Critical Time = Time in Queue + Processing Time + Retry Time + Network Time
• Stable or decreasing could be good
• Increasing is bad
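
As a worked example of the formula above, here is a minimal Python sketch. The `MessageTimings` class, its field names, and the sample values (in seconds) are assumptions made for illustration and do not come from NServiceBus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageTimings:
    """Hypothetical per-message timings, all in seconds."""
    time_in_queue: float
    processing_time: float
    retry_time: float
    network_time: float

    @property
    def critical_time(self):
        # Critical Time = Time in Queue + Processing Time
        #                 + Retry Time + Network Time
        return (self.time_in_queue + self.processing_time
                + self.retry_time + self.network_time)


timings = MessageTimings(time_in_queue=4.0, processing_time=0.5,
                         retry_time=1.5, network_time=0.25)
print(timings.critical_time)
```

In this made-up example the handler itself takes only half a second, yet most of the critical time is spent waiting in the queue and in retries, which processing time alone would not reveal.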

Putting these together
• Each of these metrics presents a piece of the puzzle
• Look at them from an endpoint’s perspective, not per message
• Looking at them together gives great insight into your system
[Charts: Critical Time, Processing Time, Queue Length]
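
One way to look at the metrics from an endpoint's perspective rather than per message is to aggregate per-message samples by endpoint. The sketch below is an assumption-laden illustration: the function `summarize_by_endpoint`, the tuple layout, and the endpoint names "Sales" and "Billing" are invented for this example.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_endpoint(samples):
    """Average per-message timings for each endpoint.

    `samples` is an iterable of (endpoint, critical_time, processing_time)
    tuples, with times in seconds.
    """
    grouped = defaultdict(list)
    for endpoint, critical, processing in samples:
        grouped[endpoint].append((critical, processing))
    return {
        endpoint: {
            "avg_critical_time": mean(c for c, _ in values),
            "avg_processing_time": mean(p for _, p in values),
        }
        for endpoint, values in grouped.items()
    }


samples = [
    ("Sales", 2.0, 0.5),
    ("Sales", 4.0, 1.5),
    ("Billing", 1.0, 1.0),
]
print(summarize_by_endpoint(samples))
```

Comparing the two averages per endpoint shows how much of an endpoint's critical time is actual handler work and how much is waiting.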

Keeping your eye on everything
• These 5 metrics can give a lot of insight
• Some individual metrics are meaningful
• But most tell a story when combined with others
• Let the monitoring philosophy guide what you focus on
• NServiceBus already provides a lot of these metrics for you!
• Letting you focus on monitoring the metrics that impact your business

Learn more
• Try NServiceBus + the Particular Service Platform
• https://docs.particular.net/tutorials/quickstart/
• Take a look at the NServiceBus.Metrics NuGet package
• Follow us to find out about the next webinar in the series!

Thank you!
@farmar sean.farmar@particular.net
@williambza william.brander@particular.net
https://www.particular.net
