SlideShare a Scribd company logo
Z E B R I U M
Webinar: Using Machine Learning
on K8s Logs to Find Root Cause Faster
Zebrium Proprietary & Confidential
7 Best Log Management
Tools for Kubernetes
LARRY LANCASTER
Founder/CTO
Zebrium
ARAN KHANNA
Co-Founder/CEO
Reserved.ai
GAVIN COHEN
VP Product
Zebrium
Your app is broken, now what?
Your app is broken, now what?
Demo Part 1
Microservices demo app
Deployed on a K8s cluster using minikube
Detected: incident opened
PagerDuty Incident #1652:
HTTP Status: 500 Internal Server Error
Priority: P1
Description: Service not responding
Monitoring dashboard Incident management tool
Time to hunt through your logs…
▪ Look for “bad” (errors, critical,
alert, etc.) & rare events
▪ Focus on clusters of these,
especially across services/hosts
▪ Leverage intuition and experience
to keep exploring…
Imagine Automating the RCA Process?
Send your logs, zero training…
Identifies root cause without hunting
Your app is broken, now what?
Demo Part 2
What the machine learning found
when analyzing the logs from the
broken app
Welcome!
Aran Khanna
Co-Founder/CEO @ Reserved.ai
@arankhanna
About Reserved.ai
Reserved.AI plans, forecasts, and manages your cloud
resources for you
▪ Provides automated attribution, alerting and scenario modeling for
complex multi-cloud environment
▪ Enables automated commitment management, cashflow
forecasting and tax optimization of cloud resources
▪ Guarantees buyback of over-committed resources
▪ Requires only a simple 5-minute IAM role installation
The problem – Before Zebrium
▪ Cloud native stack in AWS using Kubernetes
▪ Flask, Celery, Redis, PostgreSQL, Airflow, React and more
▪ Each instance of each component generates lots of log events
▪ Prior to Zebrium, manual log searching when there was a
problem
▪ vi, grep, awk, custom scripts
▪ Lots of time wasted by critical engineers to find root cause
▪ Potential for problems to go unnoticed
I was skeptical!
ARE YOU
TELLING ME
AI CAN SHOW
ROOT CAUSE?
Very early success
▪ Soon after trial started
▪ Unexpected AWS API change occurred
▪ Would have caused a service disruption had it not been caught
▪ Zebrium picked up the problem and correlated it with events from
all the services that were impacted.
▪ Root cause report showed us what we needed
▪ We resolved problem quickly
Root cause reports are amazing
GPT-3 is amazing
▪ Some recent examples
Using Zebrium to Find Root Cause
Automatic root cause analysis
!
Monitoring, APM,
Logging, Help Desk
or any other tool
ML Root Cause
Report
2
3
1
With incident management tool
(PagerDuty, Opsgenie, Slack, etc.)
When something breaks: Scan root cause summaries
around time of interest (existing or on-demand)
Without incident management tool
How the ML works
ML-driven structuring,
event categorization &
pattern learning
1 2
Anomaly scoring for
each incoming event
Raw log streams
Cluster of abnormally
correlated anomalies
across log streams
3
Metrics
(optional)
Identify correlated
metrics anomalies 5 Automatic root
cause report
Optional
Feedback
6
4
Correlated Rare & Error Events
ML-driven Root Cause Analysis :
Examples of outages, crashes, failures, etc.
Bugs
Undefined variables, Undefined index,
mismatched data structures
Missing function arguments, input
validation failed
SQL syntax errors
Inter-Service Interactions
Auth & connection failures
Rate limit failures
API version mis-match errors
Container Orchestration
Missing containers, pods
Crashloopbackoffs
failed leader elections
Kafka, Redis, Databases
Kafka broker pod failure
SQL schema errors
Redis performance issues
Database connectivity issues
Security
Excessive auth failures
Invalid Root login attempts
WAF attacks
XSS vulnerabilities
Infrastructure
OOM process failures
Out of space, CPU saturation
Object storage failures
Network corruption, VM failovers
Note: No pre-built rules
Thank you!
hello@zebrium.com @ZebriumAI
Free trial: www.zebrium.com
“Zebrium caught a vendor API change which caused
problems downstream.” - Aran Khanna, CEO
“Zebrium helped us root cause obscure error
messages that didn’t seem to match our code."
– Yosef Deray, Lead Software Engineer
“Zebrium shrunk MTTR from 3 hours to <10 mins, and
caught a Redis performance issue that was silently
impacting users.” - Jason Johnson – SVP of InfoTech
Zebrium saves us hours per major incident, and in
aggregate, has saved us days.”
Mike Heyeck - Cloud Lead
“Zebrium identified root cause for every failure
induced using Litmus Chaos Engine.” - Murat
Karslioglu, VP of Product
Top 50 AI
Companies
7 Best Log
Tools for
Kubernetes
Zebrium named a
Gartner Cool Vendor

More Related Content

What's hot

DevOps In Azure: Deliver Value With Automation
DevOps In Azure: Deliver Value With AutomationDevOps In Azure: Deliver Value With Automation
DevOps In Azure: Deliver Value With Automation
Utkarsh Pandey
 
AWS Summit - Trends in Advanced Monitoring for AWS environments
AWS Summit - Trends in Advanced Monitoring for AWS environmentsAWS Summit - Trends in Advanced Monitoring for AWS environments
AWS Summit - Trends in Advanced Monitoring for AWS environments
Andreas Grabner
 
(R)evolutionize APM
(R)evolutionize APM(R)evolutionize APM
(R)evolutionize APM
Andreas Grabner
 
Cloud Application Security: Lessons Learned
Cloud Application Security: Lessons LearnedCloud Application Security: Lessons Learned
Cloud Application Security: Lessons Learned
Jason Chan
 
Monitoring As Code: How to Integrate App Monitoring Into Your Developer Cycle
Monitoring As Code: How to Integrate App Monitoring Into Your Developer CycleMonitoring As Code: How to Integrate App Monitoring Into Your Developer Cycle
Monitoring As Code: How to Integrate App Monitoring Into Your Developer Cycle
Atlassian
 
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Andreas Grabner
 
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
CodeOps Technologies LLP
 
Four Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance ProblemsFour Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance Problems
Andreas Grabner
 
How Verizon Innovates Through AI-Driven DevOps with Dynatrace
How Verizon Innovates Through AI-Driven DevOps with DynatraceHow Verizon Innovates Through AI-Driven DevOps with Dynatrace
How Verizon Innovates Through AI-Driven DevOps with Dynatrace
Amazon Web Services
 
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
Vadym Kazulkin
 
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
Vadym Kazulkin
 
apidays LIVE Helsinki & North - Serverless Bots in a Blink by Rachel White, D...
apidays LIVE Helsinki & North - Serverless Bots in a Blink by Rachel White, D...apidays LIVE Helsinki & North - Serverless Bots in a Blink by Rachel White, D...
apidays LIVE Helsinki & North - Serverless Bots in a Blink by Rachel White, D...
apidays
 
Building Cloud-agnostic Serverless APIs
Building Cloud-agnostic Serverless APIsBuilding Cloud-agnostic Serverless APIs
Building Cloud-agnostic Serverless APIs
Postman
 
How to build a social network on serverless
How to build a social network on serverlessHow to build a social network on serverless
How to build a social network on serverless
Yan Cui
 
Troubleshooting serverless applications
Troubleshooting serverless applicationsTroubleshooting serverless applications
Troubleshooting serverless applications
Yan Cui
 
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyDocker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Andreas Grabner
 
London WebPerf Meetup: End-To-End Performance Problems
London WebPerf Meetup: End-To-End Performance ProblemsLondon WebPerf Meetup: End-To-End Performance Problems
London WebPerf Meetup: End-To-End Performance Problems
Andreas Grabner
 
Lessons Learned from building a serverless API
Lessons Learned from building  a serverless APILessons Learned from building  a serverless API
Lessons Learned from building a serverless API
Pam Rucinque
 
DreamFactory Essentials Webinar
DreamFactory Essentials WebinarDreamFactory Essentials Webinar
DreamFactory Essentials Webinar
DreamFactory
 
HSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy ChecksHSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy Checks
Andreas Grabner
 

What's hot (20)

DevOps In Azure: Deliver Value With Automation
DevOps In Azure: Deliver Value With AutomationDevOps In Azure: Deliver Value With Automation
DevOps In Azure: Deliver Value With Automation
 
AWS Summit - Trends in Advanced Monitoring for AWS environments
AWS Summit - Trends in Advanced Monitoring for AWS environmentsAWS Summit - Trends in Advanced Monitoring for AWS environments
AWS Summit - Trends in Advanced Monitoring for AWS environments
 
(R)evolutionize APM
(R)evolutionize APM(R)evolutionize APM
(R)evolutionize APM
 
Cloud Application Security: Lessons Learned
Cloud Application Security: Lessons LearnedCloud Application Security: Lessons Learned
Cloud Application Security: Lessons Learned
 
Monitoring As Code: How to Integrate App Monitoring Into Your Developer Cycle
Monitoring As Code: How to Integrate App Monitoring Into Your Developer CycleMonitoring As Code: How to Integrate App Monitoring Into Your Developer Cycle
Monitoring As Code: How to Integrate App Monitoring Into Your Developer Cycle
 
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
 
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
 
Four Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance ProblemsFour Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance Problems
 
How Verizon Innovates Through AI-Driven DevOps with Dynatrace
How Verizon Innovates Through AI-Driven DevOps with DynatraceHow Verizon Innovates Through AI-Driven DevOps with Dynatrace
How Verizon Innovates Through AI-Driven DevOps with Dynatrace
 
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
 
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
Revolutionize DevOps with ML capabilities. Introduction to Amazon CodeGuru an...
 
apidays LIVE Helsinki & North - Serverless Bots in a Blink by Rachel White, D...
apidays LIVE Helsinki & North - Serverless Bots in a Blink by Rachel White, D...apidays LIVE Helsinki & North - Serverless Bots in a Blink by Rachel White, D...
apidays LIVE Helsinki & North - Serverless Bots in a Blink by Rachel White, D...
 
Building Cloud-agnostic Serverless APIs
Building Cloud-agnostic Serverless APIsBuilding Cloud-agnostic Serverless APIs
Building Cloud-agnostic Serverless APIs
 
How to build a social network on serverless
How to build a social network on serverlessHow to build a social network on serverless
How to build a social network on serverless
 
Troubleshooting serverless applications
Troubleshooting serverless applicationsTroubleshooting serverless applications
Troubleshooting serverless applications
 
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyDocker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
 
London WebPerf Meetup: End-To-End Performance Problems
London WebPerf Meetup: End-To-End Performance ProblemsLondon WebPerf Meetup: End-To-End Performance Problems
London WebPerf Meetup: End-To-End Performance Problems
 
Lessons Learned from building a serverless API
Lessons Learned from building  a serverless APILessons Learned from building  a serverless API
Lessons Learned from building a serverless API
 
DreamFactory Essentials Webinar
DreamFactory Essentials WebinarDreamFactory Essentials Webinar
DreamFactory Essentials Webinar
 
HSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy ChecksHSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy Checks
 

Similar to Using Machine Learning on K8s Logs to Find Root Cause Faster

AWS Observability Made Simple
AWS Observability Made SimpleAWS Observability Made Simple
AWS Observability Made Simple
Luciano Mammino
 
Meetup callback
Meetup callbackMeetup callback
Meetup callback
Wayne Scarano
 
Integrating Splunk into your Spring Applications
Integrating Splunk into your Spring ApplicationsIntegrating Splunk into your Spring Applications
Integrating Splunk into your Spring Applications
Damien Dallimore
 
Abusing bleeding edge web standards for appsec glory
Abusing bleeding edge web standards for appsec gloryAbusing bleeding edge web standards for appsec glory
Abusing bleeding edge web standards for appsec glory
Priyanka Aash
 
How to address operational aspects effectively with Agile practices - Matthew...
How to address operational aspects effectively with Agile practices - Matthew...How to address operational aspects effectively with Agile practices - Matthew...
How to address operational aspects effectively with Agile practices - Matthew...
Skelton Thatcher Consulting Ltd
 
Resolving problems & high availability
Resolving problems & high availabilityResolving problems & high availability
Resolving problems & high availability
Zend by Rogue Wave Software
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
baoyin
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
Adrian Cockcroft
 
VodQA_Parallelizingcukes_AmanKing
VodQA_Parallelizingcukes_AmanKingVodQA_Parallelizingcukes_AmanKing
VodQA_Parallelizingcukes_AmanKing
vodQA
 
Reducing Build Time
Reducing Build TimeReducing Build Time
Reducing Build Time
Aman King
 
VodQA_ParallelizingCukes_AmanKing
VodQA_ParallelizingCukes_AmanKingVodQA_ParallelizingCukes_AmanKing
VodQA_ParallelizingCukes_AmanKing
poojaelkunchwar
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scale
Balvinder Hira
 
Serverless in production, an experience report (London js community)
Serverless in production, an experience report (London js community)Serverless in production, an experience report (London js community)
Serverless in production, an experience report (London js community)
Yan Cui
 
Supercharge Your Product Development with Continuous Delivery & Serverless Co...
Supercharge Your Product Development with Continuous Delivery & Serverless Co...Supercharge Your Product Development with Continuous Delivery & Serverless Co...
Supercharge Your Product Development with Continuous Delivery & Serverless Co...
Amazon Web Services
 
AWS Lambda from the trenches (Serverless London)
AWS Lambda from the trenches (Serverless London)AWS Lambda from the trenches (Serverless London)
AWS Lambda from the trenches (Serverless London)
Yan Cui
 
Serverless in production, an experience report (NDC London 2018)
Serverless in production, an experience report (NDC London 2018)Serverless in production, an experience report (NDC London 2018)
Serverless in production, an experience report (NDC London 2018)
Yan Cui
 
Serverless in production, an experience report (NDC London, 31 Jan 2018)
Serverless in production, an experience report (NDC London, 31 Jan 2018)Serverless in production, an experience report (NDC London, 31 Jan 2018)
Serverless in production, an experience report (NDC London, 31 Jan 2018)
Domas Lasauskas
 
Miniature Guide to Operational Features - EdinDevOps - SkeltonThatcher
Miniature Guide to Operational Features - EdinDevOps - SkeltonThatcherMiniature Guide to Operational Features - EdinDevOps - SkeltonThatcher
Miniature Guide to Operational Features - EdinDevOps - SkeltonThatcher
Skelton Thatcher Consulting Ltd
 
20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software
DevOps Chicago
 
Serverless in production (O'Reilly Software Architecture)
Serverless in production (O'Reilly Software Architecture)Serverless in production (O'Reilly Software Architecture)
Serverless in production (O'Reilly Software Architecture)
Yan Cui
 

Similar to Using Machine Learning on K8s Logs to Find Root Cause Faster (20)

AWS Observability Made Simple
AWS Observability Made SimpleAWS Observability Made Simple
AWS Observability Made Simple
 
Meetup callback
Meetup callbackMeetup callback
Meetup callback
 
Integrating Splunk into your Spring Applications
Integrating Splunk into your Spring ApplicationsIntegrating Splunk into your Spring Applications
Integrating Splunk into your Spring Applications
 
Abusing bleeding edge web standards for appsec glory
Abusing bleeding edge web standards for appsec gloryAbusing bleeding edge web standards for appsec glory
Abusing bleeding edge web standards for appsec glory
 
How to address operational aspects effectively with Agile practices - Matthew...
How to address operational aspects effectively with Agile practices - Matthew...How to address operational aspects effectively with Agile practices - Matthew...
How to address operational aspects effectively with Agile practices - Matthew...
 
Resolving problems & high availability
Resolving problems & high availabilityResolving problems & high availability
Resolving problems & high availability
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
VodQA_Parallelizingcukes_AmanKing
VodQA_Parallelizingcukes_AmanKingVodQA_Parallelizingcukes_AmanKing
VodQA_Parallelizingcukes_AmanKing
 
Reducing Build Time
Reducing Build TimeReducing Build Time
Reducing Build Time
 
VodQA_ParallelizingCukes_AmanKing
VodQA_ParallelizingCukes_AmanKingVodQA_ParallelizingCukes_AmanKing
VodQA_ParallelizingCukes_AmanKing
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scale
 
Serverless in production, an experience report (London js community)
Serverless in production, an experience report (London js community)Serverless in production, an experience report (London js community)
Serverless in production, an experience report (London js community)
 
Supercharge Your Product Development with Continuous Delivery & Serverless Co...
Supercharge Your Product Development with Continuous Delivery & Serverless Co...Supercharge Your Product Development with Continuous Delivery & Serverless Co...
Supercharge Your Product Development with Continuous Delivery & Serverless Co...
 
AWS Lambda from the trenches (Serverless London)
AWS Lambda from the trenches (Serverless London)AWS Lambda from the trenches (Serverless London)
AWS Lambda from the trenches (Serverless London)
 
Serverless in production, an experience report (NDC London 2018)
Serverless in production, an experience report (NDC London 2018)Serverless in production, an experience report (NDC London 2018)
Serverless in production, an experience report (NDC London 2018)
 
Serverless in production, an experience report (NDC London, 31 Jan 2018)
Serverless in production, an experience report (NDC London, 31 Jan 2018)Serverless in production, an experience report (NDC London, 31 Jan 2018)
Serverless in production, an experience report (NDC London, 31 Jan 2018)
 
Miniature Guide to Operational Features - EdinDevOps - SkeltonThatcher
Miniature Guide to Operational Features - EdinDevOps - SkeltonThatcherMiniature Guide to Operational Features - EdinDevOps - SkeltonThatcher
Miniature Guide to Operational Features - EdinDevOps - SkeltonThatcher
 
20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software
 
Serverless in production (O'Reilly Software Architecture)
Serverless in production (O'Reilly Software Architecture)Serverless in production (O'Reilly Software Architecture)
Serverless in production (O'Reilly Software Architecture)
 

More from LibbySchulze

Running distributed tests with k6.pdf
Running distributed tests with k6.pdfRunning distributed tests with k6.pdf
Running distributed tests with k6.pdf
LibbySchulze
 
Extending Kubectl.pptx
Extending Kubectl.pptxExtending Kubectl.pptx
Extending Kubectl.pptx
LibbySchulze
 
Enhancing Data Protection Workflows with Kanister And Argo Workflows
Enhancing Data Protection Workflows with Kanister And Argo WorkflowsEnhancing Data Protection Workflows with Kanister And Argo Workflows
Enhancing Data Protection Workflows with Kanister And Argo Workflows
LibbySchulze
 
Fallacies in Platform Engineering.pdf
Fallacies in Platform Engineering.pdfFallacies in Platform Engineering.pdf
Fallacies in Platform Engineering.pdf
LibbySchulze
 
Intro to Fluvio.pptx.pdf
Intro to Fluvio.pptx.pdfIntro to Fluvio.pptx.pdf
Intro to Fluvio.pptx.pdf
LibbySchulze
 
Enhance your Kafka Infrastructure with Fluvio.pptx
Enhance your Kafka Infrastructure with Fluvio.pptxEnhance your Kafka Infrastructure with Fluvio.pptx
Enhance your Kafka Infrastructure with Fluvio.pptx
LibbySchulze
 
CNCF On-Demand Webinar_ LitmusChaos Project Updates.pdf
CNCF On-Demand Webinar_ LitmusChaos Project Updates.pdfCNCF On-Demand Webinar_ LitmusChaos Project Updates.pdf
CNCF On-Demand Webinar_ LitmusChaos Project Updates.pdf
LibbySchulze
 
Oh The Places You'll Sign.pdf
Oh The Places You'll Sign.pdfOh The Places You'll Sign.pdf
Oh The Places You'll Sign.pdf
LibbySchulze
 
Rancher MasterClass - Avoiding-configuration-drift.pptx
Rancher  MasterClass - Avoiding-configuration-drift.pptxRancher  MasterClass - Avoiding-configuration-drift.pptx
Rancher MasterClass - Avoiding-configuration-drift.pptx
LibbySchulze
 
vFunction Konveyor Meetup - Why App Modernization Projects Fail - Aug 2022.pptx
vFunction Konveyor Meetup - Why App Modernization Projects Fail - Aug 2022.pptxvFunction Konveyor Meetup - Why App Modernization Projects Fail - Aug 2022.pptx
vFunction Konveyor Meetup - Why App Modernization Projects Fail - Aug 2022.pptx
LibbySchulze
 
CNCF Live Webinar: Low Footprint Java Containers with GraalVM
CNCF Live Webinar: Low Footprint Java Containers with GraalVMCNCF Live Webinar: Low Footprint Java Containers with GraalVM
CNCF Live Webinar: Low Footprint Java Containers with GraalVM
LibbySchulze
 
EnRoute-OPA-Integration.pdf
EnRoute-OPA-Integration.pdfEnRoute-OPA-Integration.pdf
EnRoute-OPA-Integration.pdf
LibbySchulze
 
AirGap_zusammen_neu.pdf
AirGap_zusammen_neu.pdfAirGap_zusammen_neu.pdf
AirGap_zusammen_neu.pdf
LibbySchulze
 
Copy of OTel Me All About OpenTelemetry The Current & Future State, Navigatin...
Copy of OTel Me All About OpenTelemetry The Current & Future State, Navigatin...Copy of OTel Me All About OpenTelemetry The Current & Future State, Navigatin...
Copy of OTel Me All About OpenTelemetry The Current & Future State, Navigatin...
LibbySchulze
 
OTel Me All About OpenTelemetry The Current & Future State, Navigating the Pr...
OTel Me All About OpenTelemetry The Current & Future State, Navigating the Pr...OTel Me All About OpenTelemetry The Current & Future State, Navigating the Pr...
OTel Me All About OpenTelemetry The Current & Future State, Navigating the Pr...
LibbySchulze
 
CNCF_ A step to step guide to platforming your delivery setup.pdf
CNCF_ A step to step guide to platforming your delivery setup.pdfCNCF_ A step to step guide to platforming your delivery setup.pdf
CNCF_ A step to step guide to platforming your delivery setup.pdf
LibbySchulze
 
CNCF Online - Data Protection Guardrails using Open Policy Agent (OPA).pdf
CNCF Online - Data Protection Guardrails using Open Policy Agent (OPA).pdfCNCF Online - Data Protection Guardrails using Open Policy Agent (OPA).pdf
CNCF Online - Data Protection Guardrails using Open Policy Agent (OPA).pdf
LibbySchulze
 
Securing Windows workloads.pdf
Securing Windows workloads.pdfSecuring Windows workloads.pdf
Securing Windows workloads.pdf
LibbySchulze
 
Securing Windows workloads.pdf
Securing Windows workloads.pdfSecuring Windows workloads.pdf
Securing Windows workloads.pdf
LibbySchulze
 
Advancements in Kubernetes Workload Identity for Azure
Advancements in Kubernetes Workload Identity for AzureAdvancements in Kubernetes Workload Identity for Azure
Advancements in Kubernetes Workload Identity for Azure
LibbySchulze
 

More from LibbySchulze (20)

Running distributed tests with k6.pdf
Running distributed tests with k6.pdfRunning distributed tests with k6.pdf
Running distributed tests with k6.pdf
 
Extending Kubectl.pptx
Extending Kubectl.pptxExtending Kubectl.pptx
Extending Kubectl.pptx
 
Enhancing Data Protection Workflows with Kanister And Argo Workflows
Enhancing Data Protection Workflows with Kanister And Argo WorkflowsEnhancing Data Protection Workflows with Kanister And Argo Workflows
Enhancing Data Protection Workflows with Kanister And Argo Workflows
 
Fallacies in Platform Engineering.pdf
Fallacies in Platform Engineering.pdfFallacies in Platform Engineering.pdf
Fallacies in Platform Engineering.pdf
 
Intro to Fluvio.pptx.pdf
Intro to Fluvio.pptx.pdfIntro to Fluvio.pptx.pdf
Intro to Fluvio.pptx.pdf
 
Enhance your Kafka Infrastructure with Fluvio.pptx
Enhance your Kafka Infrastructure with Fluvio.pptxEnhance your Kafka Infrastructure with Fluvio.pptx
Enhance your Kafka Infrastructure with Fluvio.pptx
 
CNCF On-Demand Webinar_ LitmusChaos Project Updates.pdf
CNCF On-Demand Webinar_ LitmusChaos Project Updates.pdfCNCF On-Demand Webinar_ LitmusChaos Project Updates.pdf
CNCF On-Demand Webinar_ LitmusChaos Project Updates.pdf
 
Oh The Places You'll Sign.pdf
Oh The Places You'll Sign.pdfOh The Places You'll Sign.pdf
Oh The Places You'll Sign.pdf
 
Rancher MasterClass - Avoiding-configuration-drift.pptx
Rancher  MasterClass - Avoiding-configuration-drift.pptxRancher  MasterClass - Avoiding-configuration-drift.pptx
Rancher MasterClass - Avoiding-configuration-drift.pptx
 
vFunction Konveyor Meetup - Why App Modernization Projects Fail - Aug 2022.pptx
vFunction Konveyor Meetup - Why App Modernization Projects Fail - Aug 2022.pptxvFunction Konveyor Meetup - Why App Modernization Projects Fail - Aug 2022.pptx
vFunction Konveyor Meetup - Why App Modernization Projects Fail - Aug 2022.pptx
 
CNCF Live Webinar: Low Footprint Java Containers with GraalVM
CNCF Live Webinar: Low Footprint Java Containers with GraalVMCNCF Live Webinar: Low Footprint Java Containers with GraalVM
CNCF Live Webinar: Low Footprint Java Containers with GraalVM
 
EnRoute-OPA-Integration.pdf
EnRoute-OPA-Integration.pdfEnRoute-OPA-Integration.pdf
EnRoute-OPA-Integration.pdf
 
AirGap_zusammen_neu.pdf
AirGap_zusammen_neu.pdfAirGap_zusammen_neu.pdf
AirGap_zusammen_neu.pdf
 
Copy of OTel Me All About OpenTelemetry The Current & Future State, Navigatin...
Copy of OTel Me All About OpenTelemetry The Current & Future State, Navigatin...Copy of OTel Me All About OpenTelemetry The Current & Future State, Navigatin...
Copy of OTel Me All About OpenTelemetry The Current & Future State, Navigatin...
 
OTel Me All About OpenTelemetry The Current & Future State, Navigating the Pr...
OTel Me All About OpenTelemetry The Current & Future State, Navigating the Pr...OTel Me All About OpenTelemetry The Current & Future State, Navigating the Pr...
OTel Me All About OpenTelemetry The Current & Future State, Navigating the Pr...
 
CNCF_ A step to step guide to platforming your delivery setup.pdf
CNCF_ A step to step guide to platforming your delivery setup.pdfCNCF_ A step to step guide to platforming your delivery setup.pdf
CNCF_ A step to step guide to platforming your delivery setup.pdf
 
CNCF Online - Data Protection Guardrails using Open Policy Agent (OPA).pdf
CNCF Online - Data Protection Guardrails using Open Policy Agent (OPA).pdfCNCF Online - Data Protection Guardrails using Open Policy Agent (OPA).pdf
CNCF Online - Data Protection Guardrails using Open Policy Agent (OPA).pdf
 
Securing Windows workloads.pdf
Securing Windows workloads.pdfSecuring Windows workloads.pdf
Securing Windows workloads.pdf
 
Securing Windows workloads.pdf
Securing Windows workloads.pdfSecuring Windows workloads.pdf
Securing Windows workloads.pdf
 
Advancements in Kubernetes Workload Identity for Azure
Advancements in Kubernetes Workload Identity for AzureAdvancements in Kubernetes Workload Identity for Azure
Advancements in Kubernetes Workload Identity for Azure
 

Recently uploaded

一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
taqyea
 
Massey University degree offer diploma Transcript
Massey University degree offer diploma TranscriptMassey University degree offer diploma Transcript
Massey University degree offer diploma Transcript
ubufe
 
一比一原版(brunel毕业证书)英国布鲁内尔大学毕业证如何办理
一比一原版(brunel毕业证书)英国布鲁内尔大学毕业证如何办理一比一原版(brunel毕业证书)英国布鲁内尔大学毕业证如何办理
一比一原版(brunel毕业证书)英国布鲁内尔大学毕业证如何办理
taqyea
 
University of Otago degree offer diploma Transcript
University of Otago degree offer diploma TranscriptUniversity of Otago degree offer diploma Transcript
University of Otago degree offer diploma Transcript
ubufe
 
一比一原版(oregon毕业证书)俄勒冈大学毕业证如何办理
一比一原版(oregon毕业证书)俄勒冈大学毕业证如何办理一比一原版(oregon毕业证书)俄勒冈大学毕业证如何办理
一比一原版(oregon毕业证书)俄勒冈大学毕业证如何办理
taqyea
 
一比一原版澳洲巴拉特大学毕业证(utas毕业证书)如何办理
一比一原版澳洲巴拉特大学毕业证(utas毕业证书)如何办理一比一原版澳洲巴拉特大学毕业证(utas毕业证书)如何办理
一比一原版澳洲巴拉特大学毕业证(utas毕业证书)如何办理
taqyea
 
一比一原版(mqu毕业证)麦考瑞大学毕业证如何办理
一比一原版(mqu毕业证)麦考瑞大学毕业证如何办理一比一原版(mqu毕业证)麦考瑞大学毕业证如何办理
一比一原版(mqu毕业证)麦考瑞大学毕业证如何办理
taqyea
 
10th International Conference on Networks, Mobile Communications and Telema...
10th International Conference on Networks, Mobile Communications and   Telema...10th International Conference on Networks, Mobile Communications and   Telema...
10th International Conference on Networks, Mobile Communications and Telema...
ijp2p
 
202254.com全网最高清影视香蕉影视,热门电影推荐,热门电视剧在线观看,免费电影,电影在线,在线观看。球华人在线電視劇,免费点播,免费提供最新高清的...
202254.com全网最高清影视香蕉影视,热门电影推荐,热门电视剧在线观看,免费电影,电影在线,在线观看。球华人在线電視劇,免费点播,免费提供最新高清的...202254.com全网最高清影视香蕉影视,热门电影推荐,热门电视剧在线观看,免费电影,电影在线,在线观看。球华人在线電視劇,免费点播,免费提供最新高清的...
202254.com全网最高清影视香蕉影视,热门电影推荐,热门电视剧在线观看,免费电影,电影在线,在线观看。球华人在线電視劇,免费点播,免费提供最新高清的...
ffg01100
 
very nice project on internet class 10.pptx
very nice project on internet class 10.pptxvery nice project on internet class 10.pptx
very nice project on internet class 10.pptx
bazukagaming6
 
Future Trends What's Next for UI UX Design on Websites
Future Trends What's Next for UI UX Design on WebsitesFuture Trends What's Next for UI UX Design on Websites
Future Trends What's Next for UI UX Design on Websites
Serva AppLabs
 
一比一原版(爱大毕业证书)英国爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)英国爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)英国爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)英国爱丁堡大学毕业证如何办理
taqyea
 
一比一原版(bu毕业证书)英国伯恩茅斯大学毕业证如何办理
一比一原版(bu毕业证书)英国伯恩茅斯大学毕业证如何办理一比一原版(bu毕业证书)英国伯恩茅斯大学毕业证如何办理
一比一原版(bu毕业证书)英国伯恩茅斯大学毕业证如何办理
taqyea
 
2023. Archive - Gigabajtos selfpublisher homepage
2023. Archive - Gigabajtos selfpublisher homepage2023. Archive - Gigabajtos selfpublisher homepage
2023. Archive - Gigabajtos selfpublisher homepage
Zsolt Nemeth
 
About Alibaba company and brief general information regarding how to trade on...
About Alibaba company and brief general information regarding how to trade on...About Alibaba company and brief general information regarding how to trade on...
About Alibaba company and brief general information regarding how to trade on...
Erkinjon Erkinov
 
一比一原版(ubc毕业证书)英属哥伦比亚大学毕业证如何办理
一比一原版(ubc毕业证书)英属哥伦比亚大学毕业证如何办理一比一原版(ubc毕业证书)英属哥伦比亚大学毕业证如何办理
一比一原版(ubc毕业证书)英属哥伦比亚大学毕业证如何办理
taqyea
 
一比一原版(city毕业证书)英国剑桥大学毕业证如何办理
一比一原版(city毕业证书)英国剑桥大学毕业证如何办理一比一原版(city毕业证书)英国剑桥大学毕业证如何办理
一比一原版(city毕业证书)英国剑桥大学毕业证如何办理
taqyea
 
一比一原版(heriotwatt毕业证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt毕业证书)英国赫瑞瓦特大学毕业证如何办理一比一原版(heriotwatt毕业证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt毕业证书)英国赫瑞瓦特大学毕业证如何办理
taqyea
 
cyber-security-training-presentation-q320.ppt
cyber-security-training-presentation-q320.pptcyber-security-training-presentation-q320.ppt
cyber-security-training-presentation-q320.ppt
LiamOConnor52
 
一比一原版(hull毕业证书)英国赫尔大学毕业证如何办理
一比一原版(hull毕业证书)英国赫尔大学毕业证如何办理一比一原版(hull毕业证书)英国赫尔大学毕业证如何办理
一比一原版(hull毕业证书)英国赫尔大学毕业证如何办理
taqyea
 

Recently uploaded (20)

一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
 
Massey University degree offer diploma Transcript
Massey University degree offer diploma TranscriptMassey University degree offer diploma Transcript
Massey University degree offer diploma Transcript
 
一比一原版(brunel毕业证书)英国布鲁内尔大学毕业证如何办理
一比一原版(brunel毕业证书)英国布鲁内尔大学毕业证如何办理一比一原版(brunel毕业证书)英国布鲁内尔大学毕业证如何办理
一比一原版(brunel毕业证书)英国布鲁内尔大学毕业证如何办理
 
University of Otago degree offer diploma Transcript
University of Otago degree offer diploma TranscriptUniversity of Otago degree offer diploma Transcript
University of Otago degree offer diploma Transcript
 
一比一原版(oregon毕业证书)俄勒冈大学毕业证如何办理
一比一原版(oregon毕业证书)俄勒冈大学毕业证如何办理一比一原版(oregon毕业证书)俄勒冈大学毕业证如何办理
一比一原版(oregon毕业证书)俄勒冈大学毕业证如何办理
 
一比一原版澳洲巴拉特大学毕业证(utas毕业证书)如何办理
一比一原版澳洲巴拉特大学毕业证(utas毕业证书)如何办理一比一原版澳洲巴拉特大学毕业证(utas毕业证书)如何办理
一比一原版澳洲巴拉特大学毕业证(utas毕业证书)如何办理
 
一比一原版(mqu毕业证)麦考瑞大学毕业证如何办理
一比一原版(mqu毕业证)麦考瑞大学毕业证如何办理一比一原版(mqu毕业证)麦考瑞大学毕业证如何办理
一比一原版(mqu毕业证)麦考瑞大学毕业证如何办理
 
10th International Conference on Networks, Mobile Communications and Telema...
10th International Conference on Networks, Mobile Communications and   Telema...10th International Conference on Networks, Mobile Communications and   Telema...
10th International Conference on Networks, Mobile Communications and Telema...
 
202254.com全网最高清影视香蕉影视,热门电影推荐,热门电视剧在线观看,免费电影,电影在线,在线观看。球华人在线電視劇,免费点播,免费提供最新高清的...
202254.com全网最高清影视香蕉影视,热门电影推荐,热门电视剧在线观看,免费电影,电影在线,在线观看。球华人在线電視劇,免费点播,免费提供最新高清的...202254.com全网最高清影视香蕉影视,热门电影推荐,热门电视剧在线观看,免费电影,电影在线,在线观看。球华人在线電視劇,免费点播,免费提供最新高清的...
202254.com全网最高清影视香蕉影视,热门电影推荐,热门电视剧在线观看,免费电影,电影在线,在线观看。球华人在线電視劇,免费点播,免费提供最新高清的...
 
very nice project on internet class 10.pptx
very nice project on internet class 10.pptxvery nice project on internet class 10.pptx
very nice project on internet class 10.pptx
 
Future Trends What's Next for UI UX Design on Websites
Future Trends What's Next for UI UX Design on WebsitesFuture Trends What's Next for UI UX Design on Websites
Future Trends What's Next for UI UX Design on Websites
 
一比一原版(爱大毕业证书)英国爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)英国爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)英国爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)英国爱丁堡大学毕业证如何办理
 
一比一原版(bu毕业证书)英国伯恩茅斯大学毕业证如何办理
一比一原版(bu毕业证书)英国伯恩茅斯大学毕业证如何办理一比一原版(bu毕业证书)英国伯恩茅斯大学毕业证如何办理
一比一原版(bu毕业证书)英国伯恩茅斯大学毕业证如何办理
 
2023. Archive - Gigabajtos selfpublisher homepage
2023. Archive - Gigabajtos selfpublisher homepage2023. Archive - Gigabajtos selfpublisher homepage
2023. Archive - Gigabajtos selfpublisher homepage
 
About Alibaba company and brief general information regarding how to trade on...
About Alibaba company and brief general information regarding how to trade on...About Alibaba company and brief general information regarding how to trade on...
About Alibaba company and brief general information regarding how to trade on...
 
一比一原版(ubc毕业证书)英属哥伦比亚大学毕业证如何办理
一比一原版(ubc毕业证书)英属哥伦比亚大学毕业证如何办理一比一原版(ubc毕业证书)英属哥伦比亚大学毕业证如何办理
一比一原版(ubc毕业证书)英属哥伦比亚大学毕业证如何办理
 
一比一原版(city毕业证书)英国剑桥大学毕业证如何办理
一比一原版(city毕业证书)英国剑桥大学毕业证如何办理一比一原版(city毕业证书)英国剑桥大学毕业证如何办理
一比一原版(city毕业证书)英国剑桥大学毕业证如何办理
 
一比一原版(heriotwatt毕业证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt毕业证书)英国赫瑞瓦特大学毕业证如何办理一比一原版(heriotwatt毕业证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt毕业证书)英国赫瑞瓦特大学毕业证如何办理
 
cyber-security-training-presentation-q320.ppt
cyber-security-training-presentation-q320.pptcyber-security-training-presentation-q320.ppt
cyber-security-training-presentation-q320.ppt
 
一比一原版(hull毕业证书)英国赫尔大学毕业证如何办理
一比一原版(hull毕业证书)英国赫尔大学毕业证如何办理一比一原版(hull毕业证书)英国赫尔大学毕业证如何办理
一比一原版(hull毕业证书)英国赫尔大学毕业证如何办理
 

Using Machine Learning on K8s Logs to Find Root Cause Faster

  • 1. Z E B R I U M Webinar: Using Machine Learning on K8s Logs to Find Root Cause Faster Zebrium Proprietary & Confidential 7 Best Log Management Tools for Kubernetes LARRY LANCASTER Founder/CTO Zebrium ARAN KHANNA Co-Founder/CEO Reserved.ai GAVIN COHEN VP Product Zebrium
  • 2. Your app is broken, now what?
  • 3. Your app is broken, now what? Demo Part 1 Microservices demo app Deployed on a K8s cluster using minikube
  • 4. Detected: incident opened PagerDuty Incident #1652: HTTP Status: 500 Internal Server Error Priority: P1 Description: Service not responding Monitoring dashboard Incident management tool
  • 5. Time to hunt through your logs… ▪ Look for “bad” (errors, critical, alert, etc.) & rare events ▪ Focus on clusters of these, especially across services/hosts ▪ Leverage intuition and experience to keep exploring…
  • 6. Imagine Automating the RCA Process? Send your logs, zero training… Identifies root cause without hunting
  • 7. Your app is broken, now what? Demo Part 2 What the machine learning found when analyzing the logs from the broken app
  • 8. Welcome! Aran Khanna Co-Founder/CEO @ Reserved.ai @arankhanna
  • 9. About Reserved.ai Reserved.AI plans, forecasts, and manages your cloud resources for you ▪ Provides automated attribution, alerting and scenario modeling for complex multi-cloud environment ▪ Enables automated commitment management, cashflow forecasting and tax optimization of cloud resources ▪ Guarantees buyback of over-committed resources ▪ Requires only a simple 5-minute IAM role installation
  • 10. The problem – Before Zebrium ▪ Cloud native stack in AWS using Kubernetes ▪ Flask, Celery, Redis, PostgreSQL, Airflow, React and more ▪ Each instance of each component generates lots of log events ▪ Prior to Zebrium, manual log searching when there was a problem ▪ vi, grep, awk, custom scripts ▪ Lots of time wasted by critical engineers to find root cause ▪ Potential for problems to go unnoticed
  • 11. I was skeptical! ARE YOU TELLING ME AI CAN SHOW ROOT CAUSE?
  • 12. Very early success ▪ Soon after trial started ▪ Unexpected AWS API change occurred ▪ Would have caused a service disruption had it not been caught ▪ Zebrium picked up the problem and correlated it with events from all the services that were impacted. ▪ Root cause report showed us what we needed ▪ We resolved problem quickly
  • 13. Root cause reports are amazing
  • 14. GPT-3 is amazing ▪ Some recent examples
  • 15. Using Zebrium to Find Root Cause Automatic root cause analysis ! Monitoring, APM, Logging, Help Desk or any other tool ML Root Cause Report 2 3 1 With incident management tool (PagerDuty, Opsgenie, Slack, etc.) When something breaks: Scan root cause summaries around time of interest (existing or on-demand) Without incident management tool
  • 16. How the ML works ML-driven structuring, event categorization & pattern learning 1 2 Anomaly scoring for each incoming event Raw log streams Cluster of abnormally correlated anomalies across log streams 3 Metrics (optional) Identify correlated metrics anomalies 5 Automatic root cause report Optional Feedback 6 4 Correlated Rare & Error Events
  • 17. ML-driven Root Cause Analysis : Examples of outages, crashes, failures, etc. Bugs Undefined variables, Undefined index, mismatched data structures Missing function arguments, input validation failed SQL syntax errors Inter-Service Interactions Auth & connection failures Rate limit failures API version mis-match errors Container Orchestration Missing containers, pods Crashloopbackoffs failed leader elections Kafka, Redis, Databases Kafka broker pod failure SQL schema errors Redis performance issues Database connectivity issues Security Excessive auth failures Invalid Root login attempts WAF attacks XSS vulnerabilities Infrastructure OOM process failures Out of space, CPU saturation Object storage failures Network corruption, VM failovers Note: No pre-built rules
  • 18. Thank you! hello@zebrium.com @ZebriumAI Free trial: www.zebrium.com “Zebrium caught a vendor API change which caused problems downstream.” - Aran Khanna, CEO “Zebrium helped us root cause obscure error messages that didn’t seem to match our code." – Yosef Deray, Lead Software Engineer “Zebrium shrunk MTTR from 3 hours to <10 mins, and caught a Redis performance issue that was silently impacting users.” - Jason Johnson – SVP of InfoTech Zebrium saves us hours per major incident, and in aggregate, has saved us days.” Mike Heyeck - Cloud Lead “Zebrium identified root cause for every failure induced using Litmus Chaos Engine.” - Murat Karslioglu, VP of Product Top 50 AI Companies 7 Best Log Tools for Kubernetes Zebrium named a Gartner Cool Vendor