Finding Harmony in Reliability
“Site Reliability Engineering” is an ambiguous field and role. I spend a lot of time thinking about ways to distill what I do into metaphors or analogies that might help others understand it better. One that I’ve been thinking about recently is the comparison of an SRE to a music producer.

When I graduated from college, I initially embarked on a career as an aspiring music producer and audio engineer. What energized and motivated me was collaborating with other musicians on their songs and arrangements. Being a good producer involves understanding the vision of the group, listening carefully to each individual musician’s parts, pushing them to get the perfect take, and then ensuring everything fits together as a single composition. It requires nitpicking on both a micro level, ensuring that every note played is in tune and on time, and a macro level, thinking about how each part influences the resulting song. You have to navigate egos and help others feel comfortable under pressure in order to get the best out of them. The dynamics are complicated, and you need to take in all kinds of inputs, interpret the vision and goals of each artist, and then find the path to get there.
I find that my career in Site Reliability Engineering has many parallels to this previous life of mine. Instead of bands of musicians, I work with teams of software engineers who are crafting a different type of art, tapping into another form of creativity, under different sorts of pressures. Much as I did when producing music, I observe and analyze the outputs and help the creators tweak or operate their systems to fulfill their vision or requirements. Sometimes this involves homing in on particular notes (metrics) and determining how they affect the overall composition, or figuring out how the introduction of a new part might impact the resulting “song” or “arrangement”, which in my world is a large distributed system.
In both of my career paths, working with and enabling the people behind the art is just as important as the art itself. There needs to be trust and cohesion in working together to accomplish a shared vision. The partnership needs to be balanced: the SRE is an exoskeleton to the software engineering teams, extending and enhancing their capabilities and filling in small gaps, but in the end, those teams need to be the pilots and make the moves. In music, the producer often doesn’t perform every part of every song, or they’d need to go on tour with the musician, and that wouldn’t scale. Similarly, organizations usually don’t fund enough SREs to be the primary operators of every service, so teams need to be able to operate their own services.