This document discusses using Keptn to automate service level indicator (SLI) evaluation and performance validation with service level objectives (SLOs). It describes two use cases: 1) automating SLI evaluation over a timeframe, and 2) integrating performance validation as a self-service capability. The document outlines how Keptn works under the hood, including defining SLIs and SLOs in YAML and scoring SLIs against SLO criteria. It demonstrates integrating Keptn with existing pipelines and monitoring tools. Finally, it discusses options for installing only the Keptn quality-gate functionality or the full Keptn platform.
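The scoring step can be sketched in a few lines. This is a minimal illustration of weighted SLI-versus-SLO scoring with pass/warning thresholds, not Keptn's actual implementation; the field names (`max`, `warn_max`, `weight`) are hypothetical stand-ins for what an slo.yaml would express.

```python
# Illustrative sketch of SLO-style scoring (not Keptn's actual code):
# each SLI value is checked against "pass" and "warning" criteria,
# and the weighted share of passing SLIs yields a total score.

def evaluate(sli_values, objectives, pass_pct=90, warn_pct=75):
    """Score SLI values against per-indicator criteria.

    objectives: list of dicts with 'sli', 'max' (pass threshold),
    'warn_max' (warning threshold) and 'weight' -- hypothetical
    fields loosely modeled on an slo.yaml file.
    """
    total = sum(o["weight"] for o in objectives)
    earned = 0.0
    for o in objectives:
        value = sli_values[o["sli"]]
        if value <= o["max"]:          # full credit: within target
            earned += o["weight"]
        elif value <= o["warn_max"]:   # half credit: warning zone
            earned += o["weight"] / 2
    score = 100 * earned / total
    if score >= pass_pct:
        return score, "pass"
    if score >= warn_pct:
        return score, "warning"
    return score, "fail"

slis = {"response_time_p95": 480, "error_rate": 2.5}
slos = [
    {"sli": "response_time_p95", "max": 500, "warn_max": 600, "weight": 2},
    {"sli": "error_rate", "max": 1.0, "warn_max": 3.0, "weight": 1},
]
score, result = evaluate(slis, slos)
```

Here the response time passes fully while the error rate lands in the warning zone, so the weighted score falls between the two thresholds and the evaluation result is "warning".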
Cloud Native Night, July 2020, online: talk by Jürgen Etzlstorfer (@jetzlstorfer, Dynatrace) == Please download slides if blurred! == Abstract: Prometheus is considered a foundational building block when running applications on Kubernetes and has become the de-facto open-source standard for visibility and monitoring in Kubernetes environments. Your first steps when operating Prometheus are most probably configuring scraping to pull metrics from your services, building dashboards on top of your data with Grafana, or defining alerts for important metrics breaching thresholds in your production environment. As soon as you are comfortable with Prometheus as your weapon of choice, your next challenges will be scaling and managing Prometheus for your whole fleet of applications and environments. As the journey “From Zero to Prometheus Hero” is not trivial, you will find obstacles on the way. In this talk we highlight the most common challenges we have seen and provide guidance on how to overcome them. Finally, we discuss a solution to get there more quickly and build automated, future-proof observability with Prometheus, with Keptn as one possible implementation. About Jürgen: Jürgen is a core contributor to the Keptn open-source project and responsible for the strategy and integration of self-healing techniques and tools into the Keptn framework. He also loves to share his experience, most recently at conferences on Kubernetes-based technologies and automation. More information: Overview: https://github.com/keptn/community Github: https://github.com/keptn/keptn Website: https://keptn.sh Google Group: https://groups.google.com/forum/#!forum/keptn Twitter: https://twitter.com/keptnProject ________________________________________________ Follow us on: https://twitter.com/qaware https://www.linkedin.com/company/qaware-gmbh https://github.com/qaware www.qaware.de
This talk was given at DevSecOps Days Boston and the DevOps & Security Meetup Vienna in 2021. Automatic release validation, aka quality gates, is not a new concept, but it often covers only functional or performance metrics. Keptn’s open SLO-based evaluation allows DevSecOps teams to have their favorite security tool report metrics such as the number of detected vulnerabilities against SLOs as part of delivery automation.
The document discusses challenges with scaling Prometheus monitoring as applications and environments grow. Common issues include lack of centralized configuration management, significant manual configuration work, and configurations becoming out of sync. The presentation proposes using GitOps and code generators to address these challenges. It also introduces Keptn as a solution to automate Prometheus and Grafana configuration based on service level indicators and objectives defined in YAML files. Keptn provides an event-driven control plane for continuous delivery and automated operations.
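The code-generator idea behind keeping configurations in sync can be sketched as follows: render all Prometheus scrape configuration from a single service inventory, so per-environment configs cannot drift apart. This is an illustrative sketch, not Keptn's actual generator, and the service fields are hypothetical.

```python
# Sketch of the code-generator approach: derive Prometheus scrape
# configuration from one service inventory so that per-environment
# configs cannot drift out of sync (illustrative, not Keptn's code;
# service fields are hypothetical).

SERVICES = [
    {"name": "carts", "env": "staging", "port": 8080},
    {"name": "carts", "env": "production", "port": 8080},
]

def render_scrape_configs(services):
    """Produce one scrape job per service/environment pair, shaped
    like the entries under Prometheus' scrape_configs section."""
    return [
        {
            "job_name": f'{s["name"]}-{s["env"]}',
            "static_configs": [
                {"targets": [f'{s["name"]}.{s["env"]}:{s["port"]}']}
            ],
        }
        for s in services
    ]

configs = render_scrape_configs(SERVICES)
```

Adding a service or environment then means touching the inventory once, with the generator (ideally driven through GitOps) fanning the change out to every Prometheus instance.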
This talk was given at the Online Kubernetes Meetup July 2020 as well as at DevOps Fusion 2020. The talk discusses three major problems in current delivery and operations: too much time spent on delivery, hard-to-maintain monolithic delivery pipelines, and a lack of auto-remediation of production problems. The talk focuses on new approaches to solving these problems, inspired by SRE practices and event-driven architectures. As an implementation of such a new approach we use Keptn (www.keptn.sh), a CNCF open-source project.
Rodrigo Faria's presentation on Developing your Plugin. The presentation was given during the Nagios World Conference North America held Oct 13th - Oct 16th, 2014 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/conference
Yelp’s ad platform handles millions of ad requests every day. To generate ad metrics and analytics in real time, they built their ad event tracking and analysis pipeline on top of Spark Streaming. It allows Yelp to manage a large number of active ad campaigns and greatly reduce over-delivery. It also enables them to share ad metrics with advertisers in a more timely fashion. This session starts with an overview of the entire pipeline and then focuses on two specific challenges in the event-consolidation part of the pipeline that Yelp had to solve. The first challenge concerns joining multiple data sources together to generate a single stream of ad events that feeds into various downstream systems. That involves solving several problems unique to real-time applications, such as windowed processing and handling of event delays. The second challenge concerns state management across code deployments and application restarts. Throughout the session, the speakers share best practices for the design and development of large-scale Spark Streaming pipelines for production environments.
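The windowed-join and event-delay problem can be illustrated outside Spark with a toy in-memory joiner: impressions are buffered as state, late-arriving clicks are matched only inside an allowed delay window, and a watermark bounds how long state is kept. This is a sketch of the idea, not Yelp's Spark Streaming implementation, and all event fields are hypothetical.

```python
# Toy stream joiner illustrating windowed joins with event delays
# (a sketch of the concept, not Yelp's actual pipeline).

class WindowedJoiner:
    def __init__(self, max_delay):
        self.max_delay = max_delay  # how long to wait for the matching event
        self.impressions = {}       # ad_id -> buffered impression event
        self.joined = []            # consolidated ad events for downstream

    def on_impression(self, event):
        self.impressions[event["ad_id"]] = event

    def on_click(self, event):
        imp = self.impressions.get(event["ad_id"])
        # join only if the click falls inside the allowed window
        if imp and 0 <= event["ts"] - imp["ts"] <= self.max_delay:
            self.joined.append({**imp, "clicked_at": event["ts"]})

    def expire(self, watermark):
        # drop impressions older than the watermark so state stays bounded
        self.impressions = {k: v for k, v in self.impressions.items()
                            if watermark - v["ts"] <= self.max_delay}

joiner = WindowedJoiner(max_delay=60)
joiner.on_impression({"ad_id": "a1", "ts": 100})
joiner.on_click({"ad_id": "a1", "ts": 130})   # within window -> joined
joiner.on_click({"ad_id": "a2", "ts": 140})   # no impression -> dropped
joiner.expire(watermark=200)                  # a1 impression aged out
```

The state-management challenge in the talk is exactly the `impressions` dictionary here: across deployments and restarts it must be checkpointed and restored, not rebuilt from scratch.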
This document discusses how Puppet can be used to set up and manage a minimum viable BI (business intelligence) infrastructure at Stylight. It provides tips for running Puppet in standalone mode on Windows machines, using scheduled tasks to regularly sync configurations and run scripts, and defining reusable classes and definitions to avoid duplicating configurations. It also covers how Puppet can help implement a lean approach to ranking models through a multi-stage evaluation process using Solr and A/B testing.
Puppet is widely known in the DevOps community, but not so popular among data teams. Nevertheless, Puppet can easily empower your data teams. This talk presents hands-on experience of using Puppet for different data topics, starting with configuring a Windows machine for Business Intelligence and finishing with advanced ranking infrastructures based on Puppet. The talk walks you through the process of setting up a standalone Puppet configuration that is used for provisioning a Windows machine for Business Intelligence purposes such as Tableau and Talend Big Data configurations, ETL scheduling, etc. The second part of the talk covers a use case of Puppet for enabling a lean ranking infrastructure.
Slide deck from the Vienna DevOps & Security Meetup. This talk covers Keptn, an open-source, event-driven control plane for continuous delivery and automated operations on Kubernetes.
AWS Lambda has changed the way we deploy and run software, but this new serverless paradigm has created new challenges to old problems - how do you test a cloud-hosted function locally? How do you monitor them? What about logging and config management? And how do we start migrating from existing architectures? In this talk Yan and Scott will discuss solutions to these challenges by drawing from real-world experience running Lambda in production and migrating from an existing monolithic architecture.
The document discusses continuous deployment practices at Outbrain, an online content recommendation company. It emphasizes the importance of short feedback loops between code changes and user exposure through practices like deploying new code multiple times daily and testing code changes automatically before deployment. Infrastructure is codified and deployment is automated using tools like Chef to further streamline the process.
Slides from the DevOps Training in Ho Chi Minh City, Vietnam. The source code is available at https://gitlab.com/ctrabold/devops-training
This document describes how a robot assessor can automate the process of vulnerability assessments by executing common security tools. The robot assessor uses heuristics to discover services on a target, determine which tools to run, execute those tools via APIs, and record the results. This allows vulnerability assessments to be initiated with a single command, freeing up analysts to focus on analysis rather than repetitive tasks. Several examples are provided of how the robot assessor would automate running tools like nmap, Nikto, sqlmap, and more.
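The heuristic core of the robot assessor — mapping discovered services to the tools that should run against them — can be sketched as a rule table plus a planner. This is an illustration of the idea, not the presented implementation; it emits the commands as a dry run rather than executing anything, and the exact tool flags are illustrative.

```python
# Sketch of the "robot assessor" heuristic: map discovered services
# to the security tools that should run against them, then emit the
# commands an analyst would otherwise type by hand (dry run only;
# nothing is executed, and tool flags are illustrative).

TOOL_RULES = {
    "http":  ["nmap -sV -p {port} {host}", "nikto -h http://{host}:{port}"],
    "https": ["nmap -sV -p {port} {host}", "nikto -h https://{host}:{port}"],
    "mysql": ["nmap -sV -p {port} {host}"],
}

def plan_assessment(host, services):
    """Given services discovered on a target, return the list of tool
    invocations to run; a real assessor would execute these via tool
    APIs and record the results."""
    commands = []
    for svc in services:
        for template in TOOL_RULES.get(svc["name"], []):
            commands.append(template.format(host=host, port=svc["port"]))
    return commands

plan = plan_assessment("10.0.0.5", [{"name": "http", "port": 80},
                                    {"name": "mysql", "port": 3306}])
```

Kicking off an assessment then reduces to the single command the document describes: discover services, call the planner, and hand each resulting invocation to the corresponding tool wrapper.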
This document summarizes a presentation about controlling technical debt with continuous delivery. It discusses using tools for continuous inspection of code to detect debt, automated code fixes to reduce debt incrementally, and integrating fixes into the continuous delivery pipeline to continuously pay down debt over time. Key aspects covered include metrics and tools to measure debt, automated fixes for common code issues, code transformation techniques to fix issues safely, and a WalkMod pipeline API to integrate fixes into the delivery process.
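The "continuously pay down debt" idea can be sketched as a ratchet-style gate in the delivery pipeline: the build passes only if the measured debt metric has not grown, and the baseline tightens whenever debt shrinks. This is an illustrative sketch of the pipeline-integration concept, not the WalkMod API or the presented tooling.

```python
# Sketch of a "debt ratchet" quality gate for a delivery pipeline
# (illustrative only): the build fails if measured technical debt
# grows beyond the previous baseline, and the baseline is lowered
# whenever debt actually decreases.

def debt_gate(baseline, current, tolerance=0.0):
    """Return (passed, new_baseline).

    Debt may shrink or stay flat; any growth beyond the tolerance
    breaks the build so debt can only trend downward over time.
    """
    passed = current <= baseline * (1 + tolerance)
    # ratchet the baseline down whenever debt decreases
    return passed, (min(baseline, current) if passed else baseline)

ok, baseline = debt_gate(baseline=120.0, current=110.0)  # debt shrank -> pass
bad, _ = debt_gate(baseline=110.0, current=150.0)        # debt grew -> fail
```

Feeding the gate from a continuous-inspection tool and running the automated fixes before it gives each pipeline run a chance to lower the baseline rather than merely hold it.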
This document provides advice on preparing serverless applications for production based on the author's experience deploying 170 Lambda functions to production. It covers important areas to consider like testing at the unit, integration, and acceptance levels; setting up CI/CD pipelines; monitoring, logging, and alerting; distributed tracing; security; and configuration management. The author emphasizes the importance of testing end-to-end without mocking external services, setting up production-ready monitoring and metrics dashboards, and choosing deployment frameworks that are tried and tested.