From the course: DevOps Foundations
Operational feedback: Incident response and retrospectives
- Remember how we said that all of our systems are sociotechnical systems and humans are a part of their resilient operation?
- Yeah, well, you can do all the other stuff right. You can have great design and development and testing and great monitoring, but things are still going to break.
- Since this is absolutely no surprise, part of the job is to get really good at responding to and remediating problems in your production system, which we affectionately refer to as incidents.
- Incident response is an activity that needs to be practiced. It's the place where in-depth system knowledge and a cool head make all the difference.
- There are three general activities you want to be good at for incident response: troubleshooting, which means understanding the system enough to be able to diagnose and remediate the problem; automation, which means having tooling already created to speed up and make safe your information gathering and remediation activities; and communication. Incident response often requires a team of…
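As a rough illustration of the automation point above (tooling prepared ahead of time to make information gathering fast and safe), here is a minimal Python sketch. It is not from the course. The check names and commands in `DIAGNOSTICS` are hypothetical placeholders, and a real team would substitute its own read-only commands. The sketch assumes each check is a command that only reads state, and it wraps every command in a timeout so a hung check cannot stall the responder.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone

# Hypothetical, read-only checks used only for illustration.
# A real runbook would list its own commands here.
DIAGNOSTICS = {
    "python_version": [sys.executable, "--version"],
    "echo_check": [sys.executable, "-c", "print('ok')"],
}


def run_check(cmd, timeout=5.0):
    """Run one command with a timeout and never raise,
    so a single failing check cannot block the others."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
        output = (result.stdout or result.stderr).strip()
        return {"ok": result.returncode == 0, "output": output}
    except (subprocess.TimeoutExpired, OSError) as exc:
        # Covers hung commands and commands that cannot be started.
        return {"ok": False, "output": f"{type(exc).__name__}: {exc}"}


def gather(diagnostics=DIAGNOSTICS, timeout=5.0):
    """Collect every check into one timestamped report."""
    report = {
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "checks": {},
    }
    for name, cmd in diagnostics.items():
        report["checks"][name] = run_check(cmd, timeout)
    return report


if __name__ == "__main__":
    # Print the report as JSON so it can be pasted into an incident channel.
    print(json.dumps(gather(), indent=2))
```

Keeping the checks in a plain dictionary is one possible design choice: it lets responders review and extend the list before an incident, instead of improvising commands during one.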
Contents
- What is site reliability engineering? (3m)
- Building for reliability: Theory (3m 45s)
- Building for reliability: Practice (5m 57s)
- Operational feedback: Observability (4m 42s)
- Operational feedback: Incident response and retrospectives (4m 42s)
- Your DevOps SRE toolchain (6m 22s)