Sumo Logic confidential
Kubernetes Monitoring & Best Practices
Ajeet Singh Raina
Twitter: @ajeetsraina
GitHub: ajeetraina
• Principal Development Engineer at DellEMC
• 1st half of my career was in CGI & VMware
• 2nd half of my career has been in System Integration Testing
• Docker Captain (since 2016)
• Docker Bangalore Meetup Organizer (8800+ Registered Users)
• DockerLabs Incubator ~ 1700+ Slack Members
• Frequent Blogger - www.collabnix.com
Suresh Govindachetty
• Enterprise Sales Engineer at Sumo Logic
• Formerly with Citrix, HPE, Nortel
• Mostly in Presales, Networking and Security
Massive shift in monitoring requirements: from host-based monitoring to “container-specific & service-oriented monitoring”
Containers & Kubernetes: The New Reality
[Figure: App and Server stacks compared across Traditional Software Architecture, Containerized Architecture, and Orchestrated Containerized Architecture]
Traditional Monitoring Solution
[Figure: Monitoring agent shown against Bare Metal System, Hypervisor, Virtual Machines, and Containers]
In a Monolithic World… What to Monitor?
- Application
- Hosts on which the applications get deployed
In a Cloud Native World… What to Monitor?
- Hosts
- Kubernetes Platform
- Docker Containers
- Containerized Microservices
Benefits of Containers & Kubernetes
- Portability
- Scalability
- Rolling Updates
- Service Discovery
- Load Balancing
- Self Healing
- Secure
While Kubernetes solves old problems,
it introduces new ones.
Current Challenges in Kubernetes Monitoring & Troubleshooting
K8s is powerful… but complex!
$ kubectl create -f web.yaml
Everything in K8s is, by design, ephemeral.
Cascading Failures
- Container Communication
- Increased Dependencies
- Changing Architecture
More & Noisy Metrics (100x)
- Container Unique Metrics
- Ephemeral Data
- False Positives
Methodology Switch
Cattle (Containers):
o Named with strings of numbers
o Almost identical
o Ephemeral
o Sick: get a new one
Pet (K8s Services):
o 1 or more identical Pods
o Specific name (kube_app, kube_name)
o Give context to container metrics
o Sick: nurse back to health
Visualizing Kubernetes Objects
[Figure: a Namespace containing Service A, Service B, and Service C, with Pods C1, C2, and C3 and their Containers; Services are labeled Pet, Containers are labeled Cattle]
K8s Monitoring Strategies & Methods
- Remote polling (K8s metric/event APIs)
- Node-based (agent per host / DaemonSets)
- Sidecars (agent per Pod)
- Logs & APM
K8s Metrics - Monitoring the Kubernetes Cluster
Node resource utilization
- Network bandwidth
- Disk utilization
- CPU
- Memory
The number of nodes
- Number of nodes available
- What you are paying for
- Discover what the cluster is being used for
Running pods
- Is the number of nodes available sufficient?
- Can they handle the entire workload in case a node fails?
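The question of whether the remaining nodes can handle the entire workload when a node fails can be sketched as a simple capacity check. This is only an illustration: the `Node` fields and the sample cluster are hypothetical, and the check compares aggregate CPU requests with aggregate capacity, so it ignores per-pod scheduling constraints and memory.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity: float   # allocatable CPU cores on the node (hypothetical input)
    cpu_requested: float  # sum of CPU requests of the pods scheduled on the node


def survives_node_failure(nodes: list[Node]) -> bool:
    """Return True if, after losing any single node, the remaining nodes
    still have enough total CPU capacity for all requested CPU."""
    total_requested = sum(n.cpu_requested for n in nodes)
    total_capacity = sum(n.cpu_capacity for n in nodes)
    return all(total_capacity - n.cpu_capacity >= total_requested for n in nodes)


# Hypothetical three-node cluster used only for illustration.
cluster = [
    Node("node-a", cpu_capacity=4.0, cpu_requested=2.5),
    Node("node-b", cpu_capacity=4.0, cpu_requested=2.0),
    Node("node-c", cpu_capacity=4.0, cpu_requested=1.5),
]
print(survives_node_failure(cluster))
```

A result of `False` would suggest that the cluster is running without enough headroom to lose a node.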
K8s Metrics - Monitoring Pods
Kubernetes Metrics
- Monitor how a specific pod and its deployment are being handled.
- The number of instances a pod has at the moment and how many were expected.
- How the in-progress deployment is going (how many instances were changed from an older version to a new one), health checks, and some network data available through network services.
Container Metrics
- Collected by cAdvisor and exposed by Heapster, which queries every node about the running containers.
- Metrics like CPU, network, and memory usage compared with the maximum allowed are the highlights.
Application Metrics
- Developed by the application itself and related to the business rules it addresses.
- For example, a database application exposing metrics related to index state and statistics concerning tables and relationships.
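The Kubernetes-level view of a rollout described above (how many instances exist versus how many were expected, and how many have moved to the new version) can be sketched from a Deployment's status. The sketch below assumes the status has already been fetched as a dictionary; `replicas`, `updatedReplicas`, and `availableReplicas` are Deployment status fields, while the function name and the sample values are hypothetical.

```python
def rollout_progress(status: dict, desired: int) -> dict:
    """Summarize an in-progress rollout from a Deployment status dictionary."""
    current = status.get("replicas", 0)
    updated = status.get("updatedReplicas", 0)
    available = status.get("availableReplicas", 0)
    return {
        "desired": desired,
        "current": current,
        "updated": updated,
        "available": available,
        # Instances still running the older version.
        "old_version": max(current - updated, 0),
        "complete": current == desired and updated == desired and available == desired,
    }


# Hypothetical status of a Deployment halfway through a rolling update.
sample_status = {"replicas": 5, "updatedReplicas": 3, "availableReplicas": 4}
progress = rollout_progress(sample_status, desired=4)
print(progress)
```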
Sources of Metrics in Kubernetes
Node Metrics from node_exporter
- node_exporter installed as a DaemonSet
- 1 instance per node
- Standard host metrics: load average, CPU, memory, disk, network
- 100 unique series in a typical node
Container Metrics from cAdvisor
- Also called “K8s Core Metrics”
- Embedded into the kubelet, so we scrape the kubelet to get container metrics
- For each container on the node:
  - CPU usage
  - Filesystem read/write/limits
  - Memory usage and limits
  - Network transmit/receive/dropped
K8s Metrics from the K8s API Server
- Metrics about the performance of the K8s API server
- Performance of controller work queues
- Request rates and latencies
- etcd helper cache work queues and cache performance
- General process status (file descriptors, memory, CPU seconds)
- Go runtime status (GC, memory, threads)
Sources of Metrics in Kubernetes
K8s-derived metrics from kube-state-metrics
- Counts & metadata about many K8s types
- Count of many 'nouns'
- Resource limits
- Container states: ready/restarts/running/terminated/waiting
etcd metrics from etcd
- etcd is the "master of all truth" within a K8s cluster
- Leader existence and leader change rate
- Disk write performance
- Inbound gRPC stats
- etcd_http_received_total
- etcd_http_failed_total
- etcd_http_successful_duration_*
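Counters such as `etcd_http_received_total` and `etcd_http_failed_total` only become useful once they are compared between two scrapes. A minimal sketch, assuming each scrape has already been parsed into a dictionary of counter values (the function name and sample numbers are hypothetical):

```python
def failure_ratio(prev: dict[str, float], curr: dict[str, float]) -> float:
    """Fraction of etcd HTTP requests that failed between two scrapes."""
    received = curr["etcd_http_received_total"] - prev["etcd_http_received_total"]
    failed = curr["etcd_http_failed_total"] - prev["etcd_http_failed_total"]
    if received <= 0:
        # No traffic, or the counter was reset (for example by a restart).
        return 0.0
    return failed / received


# Hypothetical counter values from two consecutive scrapes.
previous = {"etcd_http_received_total": 1000.0, "etcd_http_failed_total": 10.0}
current = {"etcd_http_received_total": 1200.0, "etcd_http_failed_total": 15.0}
print(failure_ratio(previous, current))
```

Here 5 of the 200 requests received between the scrapes failed, a ratio of 0.025.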
Kubernetes Monitoring Best Practices
#1: Collect Metrics at the Container Level but Alert at the Service Level
$ cat /etc/docker/daemon.json
{
"metrics-addr" : "127.0.0.1:9323",
"experimental" : true
}
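One way to read this practice: keep the per-container samples, but evaluate the alert condition on the service they belong to. The sketch below assumes each sample already carries a `service` label; the field names, sample values, and threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def services_over_threshold(samples: list[dict], threshold_pct: float) -> list[str]:
    """Return the services whose average container CPU exceeds the threshold."""
    by_service = defaultdict(list)
    for sample in samples:
        by_service[sample["service"]].append(sample["cpu_pct"])
    return sorted(
        service for service, values in by_service.items() if mean(values) > threshold_pct
    )


# Hypothetical container-level samples, each tagged with its service.
samples = [
    {"container": "a1f3", "service": "checkout", "cpu_pct": 95.0},
    {"container": "b7c2", "service": "checkout", "cpu_pct": 40.0},
    {"container": "c9d8", "service": "checkout", "cpu_pct": 45.0},
    {"container": "d2e4", "service": "search", "cpu_pct": 88.0},
    {"container": "e5f6", "service": "search", "cpu_pct": 92.0},
]
print(services_over_threshold(samples, threshold_pct=80.0))
```

A single hot `checkout` container does not fire an alert because the service as a whole averages 60%, while `search` does because every replica is busy.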
#2: Monitor Service Level Objectives (SLOs) per Service per Route
• Error Rate per Service per route
• Latency per Service per route
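Both indicators can be computed by grouping request records on the (service, route) pair. This is a sketch under assumptions: the record fields are hypothetical, an error is taken to be any status of 500 or above, and latency is summarized as a nearest-rank 95th percentile.

```python
import math
from collections import defaultdict


def slo_report(requests: list[dict]) -> dict:
    """Compute error rate and p95 latency for each (service, route) pair."""
    grouped = defaultdict(list)
    for req in requests:
        grouped[(req["service"], req["route"])].append(req)

    report = {}
    for key, reqs in grouped.items():
        errors = sum(1 for r in reqs if r["status"] >= 500)
        latencies = sorted(r["latency_ms"] for r in reqs)
        # Nearest-rank percentile: the smallest value covering 95% of requests.
        rank = max(math.ceil(0.95 * len(latencies)), 1)
        report[key] = {
            "error_rate": errors / len(reqs),
            "p95_latency_ms": latencies[rank - 1],
        }
    return report


# Hypothetical request log entries.
requests = [
    {"service": "cart", "route": "/add", "status": 200, "latency_ms": 20},
    {"service": "cart", "route": "/add", "status": 200, "latency_ms": 30},
    {"service": "cart", "route": "/add", "status": 500, "latency_ms": 250},
    {"service": "cart", "route": "/add", "status": 200, "latency_ms": 25},
    {"service": "cart", "route": "/view", "status": 200, "latency_ms": 10},
    {"service": "cart", "route": "/view", "status": 200, "latency_ms": 12},
]
report = slo_report(requests)
print(report)
```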
#3: Infra Metrics: Utilization
- Resource availability for Pods vs. allocation
- Verify that every Pod/Container has a limit (best practice)
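The limit check can be automated against pod specifications. The sketch below assumes a pod manifest has already been loaded into a dictionary following the standard `spec.containers[].resources.limits` layout; the function name and the sample pod are hypothetical.

```python
def containers_without_limits(pod: dict) -> list[str]:
    """Return the names of containers missing a CPU or memory limit."""
    missing = []
    for container in pod.get("spec", {}).get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        if "cpu" not in limits or "memory" not in limits:
            missing.append(container["name"])
    return missing


# Hypothetical pod manifest: the sidecar declares no resource limits.
pod = {
    "metadata": {"name": "web-0"},
    "spec": {
        "containers": [
            {"name": "web", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
            {"name": "sidecar"},
        ]
    },
}
print(containers_without_limits(pod))
```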
#4: Always Alert on High Disk Usage
• Monitor ALL disk volumes, including the root file system.
• The Prometheus node_exporter exposes per-device filesystem metrics for tracking this.
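A disk alert usually comes down to comparing used space with total size for every mounted volume. The sketch below assumes the values were taken from node_exporter's `node_filesystem_size_bytes` and `node_filesystem_avail_bytes` series and flattened into dictionaries; the field names, the 80% threshold, and the sample data are hypothetical.

```python
def disks_over_threshold(filesystems: list[dict], threshold_pct: float = 80.0) -> list[tuple]:
    """Return (mountpoint, used percent) for volumes at or above the threshold."""
    alerts = []
    for fs in filesystems:
        size = fs["size_bytes"]
        if size <= 0:
            continue  # skip pseudo filesystems that report no size
        used_pct = 100.0 * (size - fs["avail_bytes"]) / size
        if used_pct >= threshold_pct:
            alerts.append((fs["mountpoint"], round(used_pct, 1)))
    return alerts


# Hypothetical values for two volumes on one node, including the root file system.
filesystems = [
    {"mountpoint": "/", "size_bytes": 100_000_000_000, "avail_bytes": 10_000_000_000},
    {"mountpoint": "/var/lib/docker", "size_bytes": 200_000_000_000, "avail_bytes": 120_000_000_000},
]
print(disks_over_threshold(filesystems))
```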
#5: Never Ignore kube-system
• Total DNS Requests - Resource Issue, Scaling Limits, Application Bug
• DNS Request Time - High Latency
• Quorum Loss in the cluster/Failure in Leader Election
• Unusually High Snapshot Duration
• Network criticality
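Quorum loss can be reasoned about with simple arithmetic: a majority-based store such as etcd needs more than half of its members healthy. A minimal sketch, with hypothetical function names:

```python
def quorum_size(total_members: int) -> int:
    """Smallest number of members that forms a majority."""
    return total_members // 2 + 1


def has_quorum(total_members: int, healthy_members: int) -> bool:
    """True if enough members are healthy to keep a majority."""
    return healthy_members >= quorum_size(total_members)


# A three-member cluster tolerates one failure, but not two.
print(has_quorum(3, 2), has_quorum(3, 1))
```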
#6: Consistent Metadata Enrichment
Tag the individual components of Kubernetes so that they can provide context for your services
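Enrichment can be pictured as joining each raw sample with the metadata of the pod that produced it. In the sketch below, the lookup table, the chosen tag keys (`namespace`, `deployment`, `service`), and the sample are all hypothetical; a real collector would populate the table from the Kubernetes API.

```python
def enrich(sample: dict, pod_metadata: dict[str, dict]) -> dict:
    """Return a copy of the sample with the same set of tags attached every time."""
    meta = pod_metadata.get(sample["pod"], {})
    enriched = dict(sample)
    for key in ("namespace", "deployment", "service"):
        # Always emit every tag so downstream queries can rely on it.
        enriched[key] = meta.get(key, "unknown")
    return enriched


# Hypothetical metadata lookup keyed by pod name.
pod_metadata = {
    "web-7d9f-abcde": {"namespace": "shop", "deployment": "web", "service": "storefront"},
}
sample = {"pod": "web-7d9f-abcde", "container": "nginx", "memory_bytes": 52428800}
enriched = enrich(sample, pod_metadata)
print(enriched)
```

Emitting the same keys for every sample, even when the value is `unknown`, is what keeps the tagging consistent.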
#7: No Better KPI than API: Track the API Gateway for Microservices in order to automatically detect application issues
Discoverability - Infrastructure vs. Service View
Infrastructure-centric Viewpoint
- Complex
- Slow to find and troubleshoot issues
- Disconnected from the customer reality
Service-centric Viewpoint
- Simple to understand
- Quick to find and troubleshoot issues
- Tightly connected to the customer reality
Sumo Logic K8s Monitoring and Troubleshooting
• Delivers a best-in-class, end-to-end Kubernetes monitoring and troubleshooting experience.
• Open source collectors (Fluent Bit, Fluentd, Prometheus, Falco)
• Visualize K8s hierarchies through Deployment, Service, Node, and Namespace views
• Honeycomb visualization - quick overview of data in a visually digestible way.
• Simplified monitoring and troubleshooting
• Correlation of logs, metrics, events, and security
• Integrated security with Falco + partner apps
Data Collection with Sumo Logic
Our Kubernetes Partner Apps - Security
App / Purpose / Details (app logos not captured in the transcript):
• SecOps: Provides a comprehensive monitoring and analysis solution for detecting vulnerabilities and potential threats throughout your environment, including hosts, containers, images, and registry.
• SecOps: Helps you detect, investigate, and remediate vulnerabilities, insecure configurations, and compliance violations across all container and Kubernetes environments.
• SecOps: Provides granular security and compliance control monitoring to DevSecOps teams throughout the cloud native application lifecycle, from development to runtime in production.
• SecOps: Gives customers the ability to detect, investigate, and remediate vulnerabilities in software artifacts across their deployment environments.
Ecosystem - Unified K8s DevOps and SecOps Monitoring
- CI/CD: CircleCI, Codefresh, Armory, Harness
- DevOps: Kubernetes, Amazon EKS, Google Kubernetes Service, Azure Kubernetes Service
- SecOps: Falco, Twistlock, StackRox, Aqua, Tigera, JFrog Xray
It’s Demo Time…
References
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/
https://www.sumologic.com/lp/kubernetes-monitoring-app
Thank You