Embracing the Fire: How Outages Shaped My Tech Career
My first job in Silicon Valley, in 2009, was at an already-acquired “startup” called Tellme Networks. If you ever called 411 in the early 2000s, you absolutely used the Tellme platform, which specialized in telephony-based applications. I worked in a Network Operations Center (NOC), where I quickly took to writing software during my 12-hour weekend shifts, sparking a career-long obsession with building thoughtful tools. During my two weekday swing shifts, I gravitated toward implementing large, complex changes to the system.
The primary job in a NOC is to ensure the system's reliability. This was after Google coined the term Site Reliability Engineer (SRE), but before many of my colleagues and I went on to become SREs at various companies.
Jumping into the Fire: Handling Outages
SREs are known for jumping into the fire pit. Early in my career, I found that I was quite good at mitigating outages, even though I didn’t much like the stress that came with them. In Season 7 of Game of Thrones, there’s a quote where Daenerys says, “We all enjoy what we are good at,” and Jon, brooding momentarily, replies, “I don’t.” That was me. I hated outages, but I was really good at them.