Lessons from Cloud
Scaling Prometheus metrics in Kubernetes with Telegraf
The curious case of the missing metrics
One Label too far...
© 2019 InfluxData. All rights reserved. 3
The Suspects
● Prometheus
● Kubernetes
● Gateway
● Queryd
Prometheus
http://gateway.twodotoh.svc.cluster.local:9999/metrics
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prod_twodotoh
    kubernetes_sd_configs:
      - role: service
Kubernetes
InfluxCloud
[Diagram: Ingress, Gateway pods, and Queryd pods]
Problem: Prometheus Debugging is Hard
prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.01"} 0.012562015
prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.05"} 0.012562015
prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.5"} 0.012562015
prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.9"} 0.012562015
prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.99"} 0.012562015
prometheus_target_sync_length_seconds_sum{scrape_job="prod_twodotoh"} 0.012562015
prometheus_target_sync_length_seconds_count{scrape_job="prod_twodotoh"} 1
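Reading this output by eye is part of what makes debugging hard. As a minimal sketch (a simplified parser written for this example, not an official Prometheus client), the lines above can be parsed and reduced to a mean sync time as `_sum / _count`. The function names are hypothetical.

```python
import re

# Sample lines copied from the /metrics output shown on the slide.
SAMPLE = """\
prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.5"} 0.012562015
prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.99"} 0.012562015
prometheus_target_sync_length_seconds_sum{scrape_job="prod_twodotoh"} 0.012562015
prometheus_target_sync_length_seconds_count{scrape_job="prod_twodotoh"} 1
"""

# Simplified patterns: they assume every line has a label set and that
# label values contain no quotes or closing braces.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Parse exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        samples.append((name, dict(LABEL_RE.findall(raw_labels)), float(value)))
    return samples


def mean_seconds(samples, metric, job):
    """Mean observation for one scrape job, computed as _sum / _count."""
    total = count = None
    for name, labels, value in samples:
        if labels.get("scrape_job") != job:
            continue
        if name == metric + "_sum":
            total = value
        elif name == metric + "_count":
            count = value
    if total is None or not count:
        return None  # no data for this job
    return total / count


samples = parse_samples(SAMPLE)
mean = mean_seconds(samples, "prometheus_target_sync_length_seconds", "prod_twodotoh")
```

With a count of 1, every quantile and the mean collapse to the same single observation, which is why all the values on the slide are identical.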
Problem: Prometheus Scaling is Hard
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prod_twodotoh_ns_a
    kubernetes_sd_configs:
      - role: service
        namespaces:
          names:
            - a

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prod_twodotoh_ns_b
    kubernetes_sd_configs:
      - role: service
        namespaces:
          names:
            - b
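Splitting scrapes by namespace means maintaining one near-identical config per namespace. A minimal sketch of generating them instead of copying by hand; the `scrape_config` helper and its `prefix` parameter are hypothetical, and the output is built as plain text to avoid a YAML dependency.

```python
def scrape_config(namespace, prefix="prod_twodotoh", interval="15s"):
    """Render one Prometheus config scoped to a single namespace."""
    return "\n".join([
        "global:",
        f"  scrape_interval: {interval}",
        "scrape_configs:",
        f"  - job_name: {prefix}_ns_{namespace}",
        "    kubernetes_sd_configs:",
        "      - role: service",
        "        namespaces:",
        "          names:",
        f"            - {namespace}",
    ])


# One config per namespace; the job name always follows the namespace.
configs = {ns: scrape_config(ns) for ns in ["a", "b"]}
```

Even generated, each new namespace is another Prometheus config to deploy and watch, which is the manual process the sidecar approach avoids.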
Solution: Isolation with Telegraf Sidecar
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "gateway"
  labels:
spec:
  serviceName: "gateway"  # valid only on a StatefulSet
  replicas: 100
  selector:  # required by apps/v1; must match the pod template labels
    matchLabels:
      app: "gateway"
  template:
    metadata:
      name: "gateway"
      labels:
        app: "gateway"
    spec:
      containers:
        - name: "telegraf"
          image: "docker.io/library/telegraf:1.12"
        - name: "gateway"
          image: "quay.io/influxdb/gateway:latest"
[[inputs.internal]]
[[inputs.prometheus]]
urls = ["http://127.0.0.1:9999/metrics"]
[[outputs.influxdb]]
urls = ["$MONITOR_HOST"]
database = "$MONITOR_DATABASE"
timeout = "5s"
[[outputs.influxdb_v2]]
urls=["http://us-west-2-1.aws.cloud2.influxdata.c
token = "$TOKEN"
organization = "$ORG"
bucket = "$BUCKET"
timeout = "5s"
namepass = ["internal"]
Problem: A Prometheus sample has one and only one value
http://gateway.twodotoh.svc.cluster.local:9999/metrics
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prod_twodotoh
    kubernetes_sd_configs:
      - role: service
    metric_relabel_configs:
      - regex: user_agent
        action: labeldrop
Solution: Influx for more context
http://gateway.twodotoh.svc.cluster.local:9999/metrics
[[inputs.internal]]
[[inputs.prometheus]]
urls = ["http://127.0.0.1:9999/metrics"]
[[processors.converter]]
[processors.converter.tags]
string = ["user_agent"]
[[outputs.influxdb]]
urls = ["$MONITOR_HOST"]
database = "$MONITOR_DATABASE"
timeout = "5s"
[[outputs.influxdb_v2]]
urls=["http://us-west-2-1.aws.cloud2.influxdata.com"]
token = "$TOKEN"
organization = "$ORG"
bucket = "$BUCKET"
timeout = "5s"
namepass = ["internal"]
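To see why moving `user_agent` from a tag to a field matters: in both Prometheus and InfluxDB, each distinct combination of label (tag) values is its own series, while InfluxDB fields are not part of the series key. A minimal sketch that counts series both ways; the request data is invented for illustration.

```python
def series_count(points, tag_keys):
    """Count distinct series: one per unique combination of tag values."""
    return len({tuple(point[key] for key in tag_keys) for point in points})


# Hypothetical request metrics: 1000 clients, each with its own user agent.
points = [
    {"handler": "/query", "status": "200", "user_agent": f"client/{i}"}
    for i in range(1000)
]

# user_agent as a tag: every client creates a new series.
with_tag = series_count(points, ["handler", "status", "user_agent"])

# user_agent converted to a string field: it no longer affects the series key.
as_field = series_count(points, ["handler", "status"])
```

Here the series count drops from 1000 to 1, and the user agent string is still stored with each point rather than being dropped.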
Problem: Is there a way to prevent this?
http://gateway.twodotoh.svc.cluster.local:9999/metrics
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prod_twodotoh
    kubernetes_sd_configs:
      - role: service
    metric_relabel_configs:
      - regex: user_agent
        action: labeldrop
Solution: Telegraf Guard Rails
http://gateway.twodotoh.svc.cluster.local:9999/metrics
[[inputs.internal]]
[[inputs.prometheus]]
urls = ["http://127.0.0.1:9999/metrics"]
[[processors.tag_limit]]
limit = 4
## List of tags to preferentially preserve
keep = ["handler", "method", "status"]
[[outputs.influxdb]]
urls = ["$MONITOR_HOST"]
database = "$MONITOR_DATABASE"
timeout = "5s"
[[outputs.influxdb_v2]]
urls=["http://us-west-2-1.aws.cloud2.influxdata.com"]
token = "$TOKEN"
organization = "$ORG"
bucket = "$BUCKET"
timeout = "5s"
namepass = ["internal"]
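The guard rail in this config caps the number of tags per metric while preferring the ones listed in `keep`. A rough sketch of that idea, not Telegraf's actual implementation: the alphabetical order used to pick among non-preferred tags is an assumption made here so the result is deterministic.

```python
def limit_tags(tags, limit, keep):
    """Return at most `limit` tags, dropping tags outside `keep` first."""
    if len(tags) <= limit:
        return dict(tags)  # already within the limit, nothing to drop
    preferred = [key for key in sorted(tags) if key in keep]
    others = [key for key in sorted(tags) if key not in keep]
    survivors = (preferred + others)[:limit]
    return {key: tags[key] for key in survivors}


# Hypothetical metric carrying more tags than the limit allows.
tags = {
    "handler": "/query",
    "method": "POST",
    "status": "200",
    "user_agent": "curl/7.64",
    "path": "/x",
    "zone": "b",
}

limited = limit_tags(tags, limit=4, keep=["handler", "method", "status"])
```

A developer can add a new label without asking anyone, and the worst case is bounded: the three preferred tags survive and the total never exceeds four.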
Problem: Hard to Rotate Prom Passwords
http://gateway.twodotoh.svc.cluster.local:9999/metrics
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prod_twodotoh
    kubernetes_sd_configs:
      - role: service
    bearer_token_file: /etc/hunter2
Solution: Per Pod Credentials
http://gateway.twodotoh.svc.cluster.local:9999/metrics
[[inputs.internal]]
[[inputs.prometheus]]
urls = ["http://127.0.0.1:9999/metrics"]
bearer_token = "/etc/telegraf/hunter2"
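With the token stored as a file inside each pod, rotating it means replacing that file. A minimal sketch of the pattern, assuming the scraper reads the file at request time; whether Telegraf re-reads the file on every scrape is not stated on the slide, and `auth_headers` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def auth_headers(token_path):
    """Read the bearer token from disk and build the Authorization header."""
    token = Path(token_path).read_text().strip()
    return {"Authorization": f"Bearer {token}"}


# A temporary file stands in for the token file mounted into the pod.
with tempfile.TemporaryDirectory() as tmp:
    token_file = os.path.join(tmp, "hunter2")
    Path(token_file).write_text("first-token\n")
    before = auth_headers(token_file)

    Path(token_file).write_text("rotated-token\n")  # simulate a rotation
    after = auth_headers(token_file)
```

Because the header is rebuilt from the file each time, the next request after a rotation uses the new token without a config change.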
Lessons
Scaling is NOT More Manual Processes
Scaling is NOT saying “You’re Doing it Wrong”
Scaling IS Empowering Developers
Scaling IS Predictability of Failure Modes
The time when we were watching the watchers...
Problem: Am I scraping all the pods?
http://gateway.twodotoh.svc.cluster.local:9999/metrics
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prod_twodotoh
    kubernetes_sd_configs:
      - role: service
Solution: Telegraf K8s Inventory
[[inputs.internal]]
[[inputs.kube_inventory]]
url = "http://1.1.1.1:10255"
[[outputs.influxdb]]
urls = ["$MONITOR_HOST"]
database = "$MONITOR_DATABASE"
timeout = "5s"
[[outputs.influxdb_v2]]
urls=["http://us-west-2-1.aws.cloud2.influxdata.com"]
token = "$TOKEN"
organization = "$ORG"
bucket = "$BUCKET"
timeout = "5s"
namepass = ["internal"]
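Once the cluster inventory is stored alongside the scraped metrics, "am I scraping all the pods?" becomes a set difference. A minimal sketch; the pod names are invented, and how the two lists are queried from the database is left out.

```python
def unscraped_pods(inventory, reporting):
    """Pods the cluster knows about that sent no metrics."""
    return sorted(set(inventory) - set(reporting))


# Hypothetical data: pods listed by the inventory vs. pods seen in metrics.
inventory = ["gateway-0", "gateway-1", "gateway-2", "queryd-0"]
reporting = ["gateway-0", "gateway-2", "queryd-0"]

missing = unscraped_pods(inventory, reporting)
```

Any non-empty result is a pod whose metrics are silently missing, which is a natural condition to alert on.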
Prometheus Scraping Designs
Scaling even more
Scaling even more with Influx Enterprise
[Diagram: Load Balancer]
Scaling even more with Kafka and Influx Enterprise
[Diagram: Kafka]
Core Idea
● Measure and test metrics scaling
○ Are you missing metrics?
● Decentralize metrics gathering
○ Consider metrics as part of the program
● Empower Developers
○ They know their metrics best. Give them control over local tooling.
First Order Conclusion
● Too easy to shoot yourself in the foot with Prometheus metrics.
● Too much in Prometheus needs operations heroes.
● Too difficult to express vital information about your program in Prometheus without a ton of centralized control.
● One mistake can impact everyone.
Second Order Conclusion
● Prometheus is not descriptive enough.
● Extremely difficult to change over time.
● The metrics game is not a solved problem.
○ OpenTelemetry?
○ SNMP?
● Probably not one answer to everything.
Future
● Flux into Telegraf
○ Processor for transformation
○ Moving the program near the data
○ Flux Output
○ Monitoring and alerting at edge
● Telegraf Flux scripts hosted in InfluxDB API
○ Runtime plugins without re-compiling
○ Sampling rules from server-side
■ Aggregation on server with input to client
● What else?
Thank You!
The time when collecting metrics impacted storage...
Measure, measure, measure
Problem: Prometheus metrics are heavyweight

More Related Content

What's hot

Kubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesKubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best Practices
Ajeet Singh Raina
 
GitOps with ArgoCD
GitOps with ArgoCDGitOps with ArgoCD
GitOps with ArgoCD
CloudOps2005
 
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
InfraEngineer
 
Helm - Application deployment management for Kubernetes
Helm - Application deployment management for KubernetesHelm - Application deployment management for Kubernetes
Helm - Application deployment management for Kubernetes
Alexei Ledenev
 
Intro to Knative
Intro to KnativeIntro to Knative
Intro to Knative
Christian Posta
 
GitHub Actions in action
GitHub Actions in actionGitHub Actions in action
GitHub Actions in action
Oleksii Holub
 
GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)
GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)
GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)
Phil Wilkins
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Henning Jacobs
 
Continuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event KeynoteContinuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event Keynote
Weaveworks
 
An overview of the Kubernetes architecture
An overview of the Kubernetes architectureAn overview of the Kubernetes architecture
An overview of the Kubernetes architecture
Igor Sfiligoi
 
Istio service mesh introduction
Istio service mesh introductionIstio service mesh introduction
Istio service mesh introduction
Kyohei Mizumoto
 
Modern vSphere Monitoring and Dashboard using InfluxDB, Telegraf and Grafana
Modern vSphere Monitoring and Dashboard using InfluxDB, Telegraf and GrafanaModern vSphere Monitoring and Dashboard using InfluxDB, Telegraf and Grafana
Modern vSphere Monitoring and Dashboard using InfluxDB, Telegraf and Grafana
InfluxData
 
Kubernetes design principles, patterns and ecosystem
Kubernetes design principles, patterns and ecosystemKubernetes design principles, patterns and ecosystem
Kubernetes design principles, patterns and ecosystem
Sreenivas Makam
 
Gitops: the kubernetes way
Gitops: the kubernetes wayGitops: the kubernetes way
Gitops: the kubernetes way
sparkfabrik
 
GitOps is IaC done right
GitOps is IaC done rightGitOps is IaC done right
GitOps is IaC done right
Chen Cheng-Wei
 
Argocd up and running
Argocd up and runningArgocd up and running
Argocd up and running
Raphaël PINSON
 
Kubernetes Basics
Kubernetes BasicsKubernetes Basics
Kubernetes Basics
Antonin Stoklasek
 
Final terraform
Final terraformFinal terraform
Final terraform
Gourav Varma
 
Introduction to Docker Compose
Introduction to Docker ComposeIntroduction to Docker Compose
Introduction to Docker Compose
Ajeet Singh Raina
 
Gitops Hands On
Gitops Hands OnGitops Hands On
Gitops Hands On
Brice Fernandes
 

What's hot (20)

Kubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesKubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best Practices
 
GitOps with ArgoCD
GitOps with ArgoCDGitOps with ArgoCD
GitOps with ArgoCD
 
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
 
Helm - Application deployment management for Kubernetes
Helm - Application deployment management for KubernetesHelm - Application deployment management for Kubernetes
Helm - Application deployment management for Kubernetes
 
Intro to Knative
Intro to KnativeIntro to Knative
Intro to Knative
 
GitHub Actions in action
GitHub Actions in actionGitHub Actions in action
GitHub Actions in action
 
GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)
GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)
GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
Continuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event KeynoteContinuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event Keynote
 
An overview of the Kubernetes architecture
An overview of the Kubernetes architectureAn overview of the Kubernetes architecture
An overview of the Kubernetes architecture
 
Istio service mesh introduction
Istio service mesh introductionIstio service mesh introduction
Istio service mesh introduction
 
Modern vSphere Monitoring and Dashboard using InfluxDB, Telegraf and Grafana
Modern vSphere Monitoring and Dashboard using InfluxDB, Telegraf and GrafanaModern vSphere Monitoring and Dashboard using InfluxDB, Telegraf and Grafana
Modern vSphere Monitoring and Dashboard using InfluxDB, Telegraf and Grafana
 
Kubernetes design principles, patterns and ecosystem
Kubernetes design principles, patterns and ecosystemKubernetes design principles, patterns and ecosystem
Kubernetes design principles, patterns and ecosystem
 
Gitops: the kubernetes way
Gitops: the kubernetes wayGitops: the kubernetes way
Gitops: the kubernetes way
 
GitOps is IaC done right
GitOps is IaC done rightGitOps is IaC done right
GitOps is IaC done right
 
Argocd up and running
Argocd up and runningArgocd up and running
Argocd up and running
 
Kubernetes Basics
Kubernetes BasicsKubernetes Basics
Kubernetes Basics
 
Final terraform
Final terraformFinal terraform
Final terraform
 
Introduction to Docker Compose
Introduction to Docker ComposeIntroduction to Docker Compose
Introduction to Docker Compose
 
Gitops Hands On
Gitops Hands OnGitops Hands On
Gitops Hands On
 

Similar to Scaling Prometheus Metrics in Kubernetes with Telegraf | Chris Goller | InfluxData

The rise of microservices
The rise of microservicesThe rise of microservices
The rise of microservices
Cloud Technology Experts
 
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOpsHybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Weaveworks
 
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOpsHybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Sonja Schweigert
 
Zoo keeper in the wild
Zoo keeper in the wildZoo keeper in the wild
Zoo keeper in the wild
datamantra
 
A hitchhiker‘s guide to the cloud native stack
A hitchhiker‘s guide to the cloud native stackA hitchhiker‘s guide to the cloud native stack
A hitchhiker‘s guide to the cloud native stack
QAware GmbH
 
A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17
A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17
A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17
Mario-Leander Reimer
 
InfluxDB Live Product Training
InfluxDB Live Product TrainingInfluxDB Live Product Training
InfluxDB Live Product Training
InfluxData
 
PRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdfPRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdf
AvinashDesireddy
 
Kubernetes Security Workshop
Kubernetes Security WorkshopKubernetes Security Workshop
Kubernetes Security Workshop
Mirantis
 
Taming the Tiger: Tips and Tricks for Using Telegraf
Taming the Tiger: Tips and Tricks for Using TelegrafTaming the Tiger: Tips and Tricks for Using Telegraf
Taming the Tiger: Tips and Tricks for Using Telegraf
InfluxData
 
Introduction to PaaS and Heroku
Introduction to PaaS and HerokuIntroduction to PaaS and Heroku
Introduction to PaaS and Heroku
Tapio Rautonen
 
P4 Introduction
P4 Introduction P4 Introduction
P4 Introduction
Netronome
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
Arvind Kumar G.S
 
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
Datacratic
 
Free GitOps Workshop
Free GitOps WorkshopFree GitOps Workshop
Free GitOps Workshop
Weaveworks
 
Running your Spring Apps in the Cloud Javaone 2014
Running your Spring Apps in the Cloud Javaone 2014Running your Spring Apps in the Cloud Javaone 2014
Running your Spring Apps in the Cloud Javaone 2014
cornelia davis
 
Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021
InfluxData
 
Industrial IoT bootcamp
Industrial IoT bootcampIndustrial IoT bootcamp
Industrial IoT bootcamp
Lothar Schubert
 
OSDC 2019 | Introducing Kudo – Kubernetes Operators the easy way by Matt Jarvis
OSDC 2019 | Introducing Kudo – Kubernetes Operators the easy way by Matt JarvisOSDC 2019 | Introducing Kudo – Kubernetes Operators the easy way by Matt Jarvis
OSDC 2019 | Introducing Kudo – Kubernetes Operators the easy way by Matt Jarvis
NETWAYS
 
Dockerize a Django app elegantly
Dockerize a Django app elegantlyDockerize a Django app elegantly
Dockerize a Django app elegantly
frentrup
 

Similar to Scaling Prometheus Metrics in Kubernetes with Telegraf | Chris Goller | InfluxData (20)

The rise of microservices
The rise of microservicesThe rise of microservices
The rise of microservices
 
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOpsHybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
 
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOpsHybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
 
Zoo keeper in the wild
Zoo keeper in the wildZoo keeper in the wild
Zoo keeper in the wild
 
A hitchhiker‘s guide to the cloud native stack
A hitchhiker‘s guide to the cloud native stackA hitchhiker‘s guide to the cloud native stack
A hitchhiker‘s guide to the cloud native stack
 
A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17
A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17
A Hitchhiker’s Guide to the Cloud Native Stack. #CDS17
 
InfluxDB Live Product Training
InfluxDB Live Product TrainingInfluxDB Live Product Training
InfluxDB Live Product Training
 
PRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdfPRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdf
 
Kubernetes Security Workshop
Kubernetes Security WorkshopKubernetes Security Workshop
Kubernetes Security Workshop
 
Taming the Tiger: Tips and Tricks for Using Telegraf
Taming the Tiger: Tips and Tricks for Using TelegrafTaming the Tiger: Tips and Tricks for Using Telegraf
Taming the Tiger: Tips and Tricks for Using Telegraf
 
Introduction to PaaS and Heroku
Introduction to PaaS and HerokuIntroduction to PaaS and Heroku
Introduction to PaaS and Heroku
 
P4 Introduction
P4 Introduction P4 Introduction
P4 Introduction
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
 
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
 
Free GitOps Workshop
Free GitOps WorkshopFree GitOps Workshop
Free GitOps Workshop
 
Running your Spring Apps in the Cloud Javaone 2014
Running your Spring Apps in the Cloud Javaone 2014Running your Spring Apps in the Cloud Javaone 2014
Running your Spring Apps in the Cloud Javaone 2014
 
Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021
 
Industrial IoT bootcamp
Industrial IoT bootcampIndustrial IoT bootcamp
Industrial IoT bootcamp
 
OSDC 2019 | Introducing Kudo – Kubernetes Operators the easy way by Matt Jarvis
OSDC 2019 | Introducing Kudo – Kubernetes Operators the easy way by Matt JarvisOSDC 2019 | Introducing Kudo – Kubernetes Operators the easy way by Matt Jarvis
OSDC 2019 | Introducing Kudo – Kubernetes Operators the easy way by Matt Jarvis
 
Dockerize a Django app elegantly
Dockerize a Django app elegantlyDockerize a Django app elegantly
Dockerize a Django app elegantly
 

More from InfluxData

Announcing InfluxDB Clustered
Announcing InfluxDB ClusteredAnnouncing InfluxDB Clustered
Announcing InfluxDB Clustered
InfluxData
 
Best Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow EcosystemBest Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow Ecosystem
InfluxData
 
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
InfluxData
 
Power Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDBPower Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDB
InfluxData
 
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
InfluxData
 
Build an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING StackBuild an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING Stack
InfluxData
 
Meet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using RustMeet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using Rust
InfluxData
 
Introducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud DedicatedIntroducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud Dedicated
InfluxData
 
Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB
InfluxData
 
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
InfluxData
 
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
InfluxData
 
Introducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage EngineIntroducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage Engine
InfluxData
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena
InfluxData
 
Understanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage EngineUnderstanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage Engine
InfluxData
 
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBStreamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
InfluxData
 
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
InfluxData
 
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
InfluxData
 
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
InfluxData
 

More from InfluxData (20)

Announcing InfluxDB Clustered
Announcing InfluxDB ClusteredAnnouncing InfluxDB Clustered
Announcing InfluxDB Clustered
 
Best Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow EcosystemBest Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow Ecosystem
 
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
 
Power Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDBPower Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDB
 
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
 
Build an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING StackBuild an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING Stack
 
Meet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using RustMeet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using Rust
 
Introducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud DedicatedIntroducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud Dedicated
 
Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB
 
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
 
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
 
Introducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage EngineIntroducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage Engine
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena
 

Scaling Prometheus Metrics in Kubernetes with Telegraf | Chris Goller | InfluxData

  • 1. Lessons from Cloud Scaling Prometheus metrics in Kubernetes with Telegraf
  • 2. The curious case of the missing metrics One Label too far...
  • 3. © 2019 InfluxData. All rights reserved. The Suspects
      ● Prometheus
      ● Kubernetes
      ● Gateway
      ● Queryd
  • 4. Prometheus http://gateway.twodotoh.svc.cluster.local:9999/metrics
  • 5. Prometheus http://gateway.twodotoh.svc.cluster.local:9999/metrics
      global:
        scrape_interval: 15s
      scrape_configs:
        - job_name: prod_twodotoh
          kubernetes_sd_configs:
            - role: service
  • 6. Kubernetes
  • 7. InfluxCloud (diagram labels: Ingress, Gateway ×3, Queryd ×3)
  • 8. Problem: Prometheus Debugging is Hard
      prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.01"} 0.012562015
      prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.05"} 0.012562015
      prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.5"} 0.012562015
      prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.9"} 0.012562015
      prometheus_target_sync_length_seconds{scrape_job="prod_twodotoh",quantile="0.99"} 0.012562015
      prometheus_target_sync_length_seconds_sum{scrape_job="prod_twodotoh"} 0.012562015
      prometheus_target_sync_length_seconds_count{scrape_job="prod_twodotoh"} 1
  • 9. Problem: Prometheus Scaling is Hard
      global:
        scrape_interval: 15s
      scrape_configs:
        - job_name: prod_twodotoh_ns_a
          kubernetes_sd_configs:
            - role: service
              namespaces:
                names:
                  - a

      global:
        scrape_interval: 15s
      scrape_configs:
        - job_name: prod_twodotoh_ns_a
          kubernetes_sd_configs:
            - role: service
              namespaces:
                names:
                  - b
  • 10. Solution: Isolation with Telegraf Sidecar
  • 11. Solution: Isolation with Telegraf Sidecar
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: "gateway"
        labels:
      spec:
        serviceName: "gateway"
        replicas: 100
        template:
          metadata:
            name: "gateway"
            labels:
              app: "gateway"
          spec:
            containers:
              - name: "telegraf"
                image: "docker.io/library/telegraf:1.12"
              - name: "gateway"
                image: "quay.io/influxdb/gateway:latest"

      [[inputs.internal]]
      [[inputs.prometheus]]
        urls = ["http://127.0.0.1:9999/metrics"]
      [[outputs.influxdb]]
        urls = ["$MONITOR_HOST"]
        database = "$MONITOR_DATABASE"
        timeout = "5s"
      [[outputs.influxdb_v2]]
        urls=["http://us-west-2-1.aws.cloud2.influxdata.c
        token = "$TOKEN"
        organization = "$ORG"
        bucket = "$BUCKET"
        timeout = "5s"
        namepass = ["internal"]
  • 12. Solution: Isolation with Telegraf Sidecar
  • 13. Problem: Prom has 1 and only 1 value http://gateway.twodotoh.svc.cluster.local:9999/metrics
      global:
        scrape_interval: 15s
      scrape_configs:
        - job_name: prod_twodotoh
          kubernetes_sd_configs:
            - role: service
          metric_relabel_configs:
            - regex: user_agent
              action: labeldrop
  • 14. Solution: Influx for more context http://gateway.twodotoh.svc.cluster.local:9999/metrics
      [[inputs.internal]]
      [[inputs.prometheus]]
        urls = ["http://127.0.0.1:9999/metrics"]
      [[processors.converter]]
        [processors.converter.tags]
          string = ["user_agent"]
      [[outputs.influxdb]]
        urls = ["$MONITOR_HOST"]
        database = "$MONITOR_DATABASE"
        timeout = "5s"
      [[outputs.influxdb_v2]]
        urls=["http://us-west-2-1.aws.cloud2.influxdata.com"]
        token = "$TOKEN"
        organization = "$ORG"
        bucket = "$BUCKET"
        timeout = "5s"
        namepass = ["internal"]
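The converter configuration on slide 14 turns the `user_agent` tag into a string field. A minimal Python sketch of why that helps, assuming the usual InfluxDB notion that a series is identified by measurement name plus tag set; the measurement name `http_requests`, the sample values, and both helper functions are hypothetical and are not Telegraf's implementation:

```python
def tags_to_string_fields(tags, fields, names):
    """Move the named tags into string fields, returning new (tags, fields)."""
    new_tags = {k: v for k, v in tags.items() if k not in names}
    new_fields = dict(fields)
    for name in names:
        if name in tags:
            new_fields[name] = str(tags[name])
    return new_tags, new_fields


def series_key(measurement, tags):
    """Identify a series by its measurement name and sorted tag set."""
    return (measurement, tuple(sorted(tags.items())))


# Two hypothetical points that differ only in their user agent.
points = [
    ({"handler": "/query", "user_agent": "curl/7.64.1"}, {"count": 1}),
    ({"handler": "/query", "user_agent": "Go-http-client/1.1"}, {"count": 2}),
]

before = {series_key("http_requests", tags) for tags, _ in points}
after = {
    series_key("http_requests", tags_to_string_fields(tags, fields, ["user_agent"])[0])
    for tags, fields in points
}
print(len(before), len(after))  # 2 1
```

The user agent is still stored with each point as a field, so the context survives, while the number of distinct series no longer grows with every new client string.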
  • 15. Problem: Is there a way to prevent? http://gateway.twodotoh.svc.cluster.local:9999/metrics
      global:
        scrape_interval: 15s
      scrape_configs:
        - job_name: prod_twodotoh
          kubernetes_sd_configs:
            - role: service
          metric_relabel_configs:
            - regex: user_agent
              action: labeldrop
  • 16. Solution: Telegraf Guard Rails http://gateway.twodotoh.svc.cluster.local:9999/metrics
      [[inputs.internal]]
      [[inputs.prometheus]]
        urls = ["http://127.0.0.1:9999/metrics"]
      [[processors.tag_limit]]
        limit = 4
        ## List of tags to preferentially preserve
        keep = ["handler", "method", "status"]
      [[outputs.influxdb]]
        urls = ["$MONITOR_HOST"]
        database = "$MONITOR_DATABASE"
        timeout = "5s"
      [[outputs.influxdb_v2]]
        urls=["http://us-west-2-1.aws.cloud2.influxdata.com"]
        token = "$TOKEN"
        organization = "$ORG"
        bucket = "$BUCKET"
        timeout = "5s"
        namepass = ["internal"]
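The guard rail on slide 16 caps the number of tags per metric while preferring the tags listed in `keep`. A rough Python sketch of that idea follows; it is an illustration only, not Telegraf's code. The order in which non-preferred tags are dropped (sorted by key) is an assumption of this sketch, and the sample tag values are hypothetical:

```python
def limit_tags(tags, limit, keep):
    """Return a copy of `tags` trimmed toward at most `limit` entries.

    Tags named in `keep` are never dropped. Other tags are dropped in
    sorted key order (an assumption of this sketch) until the limit is met.
    """
    result = dict(tags)
    for key in sorted(tags):
        if len(result) <= limit:
            break
        if key not in keep:
            del result[key]
    return result


# Hypothetical metric with one tag more than the limit allows.
sample = {
    "handler": "/api/v2/write",
    "method": "POST",
    "status": "204",
    "user_agent": "curl/7.64.1",
    "request_id": "abc123",
}
limited = limit_tags(sample, limit=4, keep={"handler", "method", "status"})
print(len(limited))
```

With a limit of 4 and three preferred tags, at most one other tag survives, so a newly added label cannot silently multiply the number of series.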
  • 17. Problem: Hard to Rotate Prom Passwords http://gateway.twodotoh.svc.cluster.local:9999/metrics
      global:
        scrape_interval: 15s
      scrape_configs:
        - job_name: prod_twodotoh
          kubernetes_sd_configs:
            - role: service
          bearer_token_file: /etc/hunter2
  • 18. Solution: Per Pod Credentials http://gateway.twodotoh.svc.cluster.local:9999/metrics
      [[inputs.internal]]
      [[inputs.prometheus]]
        urls = ["http://127.0.0.1:9999/metrics"]
        bearer_token = "/etc/telegraf/hunter2"
  • 19. Lessons
      Scaling is NOT More Manual Processes
      Scaling is NOT saying “You’re Doing it Wrong”
      Scaling IS Empowering Developers
      Scaling IS Predictability of Failure Modes
  • 20. The time when we were Watching the watchers...
  • 21. Problem: Am I scraping all the pods? http://gateway.twodotoh.svc.cluster.local:9999/metrics
      global:
        scrape_interval: 15s
      scrape_configs:
        - job_name: prod_twodotoh
          kubernetes_sd_configs:
            - role: service
  • 22. Solution: Telegraf K8s Inventory
      [[inputs.internal]]
      [[inputs.kube_inventory]]
        url = "http://1.1.1.1:10255"
      [[outputs.influxdb]]
        urls = ["$MONITOR_HOST"]
        database = "$MONITOR_DATABASE"
        timeout = "5s"
      [[outputs.influxdb_v2]]
        urls=["http://us-west-2-1.aws.cloud2.influxdata.com"]
        token = "$TOKEN"
        organization = "$ORG"
        bucket = "$BUCKET"
        timeout = "5s"
        namepass = ["internal"]
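Once an inventory of pods is collected alongside the scraped metrics, the question on slide 21 reduces to a set comparison. A small Python sketch, assuming both lists have already been queried from the metrics store; the pod names and the `missing_targets` helper are hypothetical:

```python
def missing_targets(inventory_pods, scraped_pods):
    """Return pods that exist in the inventory but produced no metrics."""
    return sorted(set(inventory_pods) - set(scraped_pods))


# Hypothetical results of two queries: pods known to Kubernetes,
# and pods that actually reported metrics in the same window.
inventory = ["gateway-0", "gateway-1", "queryd-0"]
scraped = ["gateway-0", "queryd-0"]
print(missing_targets(inventory, scraped))  # ['gateway-1']
```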
  • 24. Scaling even more
  • 25. Scaling even more with Influx Enterprise (diagram label: Load Balancer)
  • 26. Scaling even more with Kafka and Influx Enterprise (diagram label: Kafka)
  • 27. Core Idea
      ● Measure and test metrics scaling
        ○ Are you missing metrics?
      ● Decentralize metrics gathering
        ○ Consider metrics as part of the program
      ● Empower Developers
        ○ They know their metrics the best. Allow them local tooling control
  • 28. First Order Conclusion
      ● Too easy to shoot yourself in the foot with Prometheus metrics.
      ● Too much in Prometheus needs operation heroes.
      ● Too difficult to express vital information in Prometheus about your program without a ton of centralized control.
      ● One mistake can impact everyone.
  • 29. Second Order Conclusion
      ● Prometheus is not descriptive enough.
      ● Extremely difficult to change over time.
      ● The metrics game is not a solved problem.
        ○ OpenTelemetry?
        ○ SNMP?
      ● Probably not one answer to everything.
  • 30. Future
      ● Flux into Telegraf
        ○ Processor for transformation
        ○ Moving the program near the data
        ○ Flux Output
        ○ Monitoring and alerting at edge
      ● Telegraf Flux scripts hosted in InfluxDB API
        ○ Runtime plugins without re-compiling
        ○ Sampling rules from server-side
          ■ Aggregation on server with input to client
      ● What else?
  • 31. Thank You!
  • 32. The time when collecting metrics impacted storage... Measure, measure, measure
  • 33. Problem: Prometheus metrics are heavy weight