What to consider when monitoring microservices
- 1. http://particular.net
Sean Farmar holds the world record for answering the most
NServiceBus questions - even more than Udi.
With over 20 years of experience, he specializes in providing
simple solutions for complex business requirements using
NServiceBus and applying SOA principles inspired by Udi
Dahan.
As a solution architect with Particular Software, the creators of
NServiceBus, Sean provides support, training and consulting
for customers using NServiceBus and the Particular Platform.
A professional geek, William works for Particular Software
writing amazing software like NServiceBus. Passionate about
the web and security, he is engaged in a sordid love affair with
JavaScript, and spends most of his free time trying to convince
others of its beauty and elegance.
When not behind his laptop hacking away, this amateur beer
enthusiast can often be found playing boardgames or drinking
cold-brew coffee.
- 3. Agenda
• Introduction
• A Philosophy on Monitoring
• How things change when they’re distributed
• Monitoring Metrics
• Q & A
- 4. An average production system
• Is the web server up?
• Is the database up?
• Can the web server talk to the database?
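At the infrastructure level, all three questions can be answered with simple reachability probes. The sketch below is a minimal Python illustration that assumes a plain TCP connect is a good enough "is it up?" signal. The host names and ports in `CHECKS` are hypothetical, and a real check for the last question would have to run from the web server host itself.

```python
import socket
from typing import Dict, Tuple


def can_connect(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


# Hypothetical hosts and ports, for illustration only.
CHECKS: Dict[str, Tuple[str, int]] = {
    "Is the web server up?": ("web01.example.internal", 443),
    "Is the database up?": ("db01.example.internal", 5432),
}


def run_checks(checks: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Probe every (host, port) pair and report which ones accepted a connection."""
    return {question: can_connect(host, port) for question, (host, port) in checks.items()}
```

A port that accepts connections says nothing about whether the application behind it is healthy, which is why these checks only cover the infrastructure layer.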
- 5. What are you actually monitoring?
• Infrastructure: Are my servers running?
• Application: Is my application process running?
• Business capability: Can users access the system?
- 7. Monitoring Concerns
Infrastructure
• Health: Is the server up?
• Performance: Is there high CPU?
• Capacity: Do I have enough disk space?
Application
• Health: Is my application generating exceptions?
• Performance: How quickly is my system processing messages?
• Capacity: Can I handle month-end batch jobs?
Business capability
• Health: Can users access the system?
• Performance: Are we meeting our SLAs?
• Capacity: What is the impact of adding another customer?
- 9. An average production system
• Checking that the web server and database are up and can talk to each other is passive health monitoring at the infrastructure level
- 14. Queue Length
• Queue length is an indicator of work still outstanding
• A high queue length doesn’t necessarily indicate a problem, though; the trend is what matters
• Stable or decreasing is good
• Increasing is bad
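Because the trend matters more than the absolute value, a monitor can look at the slope of recent queue-length samples instead of a fixed threshold. A minimal Python sketch, assuming samples are taken at a regular interval; the function name and the `tolerance` value are hypothetical.

```python
from typing import Sequence


def queue_length_trend(samples: Sequence[int], tolerance: float = 0.5) -> str:
    """Classify queue-length samples as 'increasing' or 'stable or decreasing'.

    Uses the least-squares slope in messages per sample interval.
    `tolerance` is a hypothetical threshold below which growth is treated as noise.
    """
    n = len(samples)
    if n < 2:
        return "stable or decreasing"
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    return "increasing" if slope > tolerance else "stable or decreasing"
```

With this approach a long but draining queue such as `[500, 480, 470, 460]` is classified as "stable or decreasing", while a short but growing one such as `[10, 20, 35, 50]` is flagged as "increasing".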
- 16. Processing Time and fault tolerance
• Processing Time does not include error handling time
• Avoid losing data due to exceptions or temporary connectivity issues
• If all else fails, move the message to the error queue
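The sketch below shows these ideas in a toy, in-memory endpoint written in Python. Only the successful handler invocation is timed, failed attempts are retried, and a message that keeps failing is moved to an error queue instead of being dropped. The `Endpoint` class, its parameters and the retry count are hypothetical and are not the NServiceBus API.

```python
import time
from typing import Callable, List


class Endpoint:
    """Toy in-memory endpoint (hypothetical, not the NServiceBus API)."""

    def __init__(self, handler: Callable[[str], None], max_attempts: int = 3):
        self.handler = handler
        self.max_attempts = max_attempts
        self.processing_times: List[float] = []  # seconds, successful attempts only
        self.error_queue: List[str] = []

    def process(self, message: str) -> bool:
        for _ in range(self.max_attempts):
            start = time.perf_counter()
            try:
                self.handler(message)
            except Exception:
                continue  # failed attempt: its duration is not recorded as processing time
            self.processing_times.append(time.perf_counter() - start)
            return True
        # All attempts failed: keep the message for later inspection instead of losing it.
        self.error_queue.append(message)
        return False
```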
- 18. Exception Diagnostics – Delayed Retries
• Invoke the handler
• If it fails, return the message to the timeout queue
• Retry x times from the timeout queue
• If all retries fail, move the message to the error queue
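The delayed-retry flow can be simulated with a virtual clock. This Python sketch assumes the delay grows linearly with each round (10 s, 20 s, 30 s) and leaves immediate retries out for brevity; the policy class, its field names and those values are illustrative assumptions rather than NServiceBus configuration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DelayedRetryPolicy:
    # Hypothetical settings for illustration; not NServiceBus defaults or API.
    max_delayed_retries: int = 3
    time_increase_s: float = 10.0

    def delays(self) -> List[float]:
        """Delay before each delayed retry, growing linearly: 10 s, 20 s, 30 s, ..."""
        return [self.time_increase_s * (i + 1) for i in range(self.max_delayed_retries)]


def run_with_delayed_retries(handler: Callable[[str], None], message: str,
                             policy: DelayedRetryPolicy) -> Tuple[str, float]:
    """Simulate the flow; return (outcome, total seconds spent in the timeout queue)."""
    waited = 0.0
    schedule = [0.0] + policy.delays()  # the first invocation has no delay
    for delay in schedule:
        waited += delay  # the message sits in the timeout queue for `delay` seconds
        try:
            handler(message)
            return "processed", waited
        except Exception:
            continue  # return to the timeout queue for the next round
    return "error queue", waited
```

A message that never succeeds ends up in the error queue after waiting 60 simulated seconds in total; one that fails only once is processed after a 10-second delay.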
- 19. Detecting Connectivity
• Distributed systems are typically designed to keep working when other parts aren’t available
• How do you know the endpoint you’re sending messages to is actually processing messages?
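One common approach is heartbeats: each endpoint periodically sends a small message to a monitoring service, and an endpoint that stays quiet for longer than a grace period is flagged. A minimal Python sketch, assuming the monitor records the timestamp of each endpoint's last heartbeat; the endpoint names and the 30-second grace period are hypothetical.

```python
from typing import Dict, List


def unresponsive_endpoints(last_heartbeat: Dict[str, float], now: float,
                           grace_period_s: float = 30.0) -> List[str]:
    """Return the names of endpoints whose last heartbeat is older than the grace period.

    `last_heartbeat` maps endpoint name -> timestamp (seconds) of its latest heartbeat.
    """
    return sorted(name for name, ts in last_heartbeat.items()
                  if now - ts > grace_period_s)
```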
- 22. Critical Time
• Critical Time is the total duration from when a message is created to when it is processed
• Critical Time = Time in Queue + Processing Time + Retry Time + Network Time
• Stable or decreasing could be good
• Increasing is bad
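In practice critical time is usually derived from timestamps rather than by summing its parts. A minimal Python sketch, assuming each message carries the time it was sent and the receiver records when processing started and ended; the field names are hypothetical. Subtracting processing time from critical time leaves the combined time in queue, retry time and network time. Because the timestamps often come from different machines, any clock skew between them ends up in the result.

```python
from dataclasses import dataclass


@dataclass
class MessageTiming:
    # Hypothetical per-message timestamps, in seconds since a common epoch.
    sent_at: float             # when the message was created and sent
    processing_started: float  # when the receiving handler started
    processing_ended: float    # when the receiving handler finished


def critical_time(t: MessageTiming) -> float:
    """End-to-end duration from creation until processing finished."""
    return t.processing_ended - t.sent_at


def processing_time(t: MessageTiming) -> float:
    """Time spent in the handler only."""
    return t.processing_ended - t.processing_started


def waiting_time(t: MessageTiming) -> float:
    """Time in queue + retry time + network time, lumped together."""
    return critical_time(t) - processing_time(t)
```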
- 23. Putting these together
• Each of these metrics presents a piece of the puzzle
• Look at them from an endpoint’s perspective, not per message
• Looking at them together gives great insight into your system
(Charts: Critical Time, Processing Time and Queue Length, shown side by side)
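Looking at the metrics from an endpoint's perspective means aggregating the per-message samples. A minimal Python sketch, assuming each sample is a hypothetical `(endpoint, critical_time_s, processing_time_s)` tuple; a real monitor would also window the data by time.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def per_endpoint_averages(
        samples: Iterable[Tuple[str, float, float]]) -> Dict[str, Tuple[float, float]]:
    """Aggregate per-message samples into per-endpoint averages.

    Returns endpoint -> (average critical time, average processing time).
    """
    grouped: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for endpoint, critical, processing in samples:
        grouped[endpoint].append((critical, processing))
    return {
        ep: (mean(c for c, _ in vals), mean(p for _, p in vals))
        for ep, vals in grouped.items()
    }
```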
- 24. Keeping your eye on everything
• These 5 metrics can give a lot of insight
• Some metrics are meaningful on their own
• But most tell a story when combined with others
• Let the monitoring philosophy guide what you focus on
• NServiceBus already provides a lot of these metrics for you!
• That lets you focus on monitoring the metrics that impact your business
- 25. Learn more
• Try NServiceBus + the Particular Service Platform
• https://docs.particular.net/tutorials/quickstart/
• Take a look at the NServiceBus.Metrics NuGet package
• Follow us to find out about the next webinar in the series!