Meet the Author
Lex Neva
Author

AIOps: Prove It!
I’ve read a steadily increasing stream of articles about using AI in SRE, and I have yet to find one that inspires my trust. Each article makes impressive claims about the capabilities of AI and the way it can be applied to SRE tasks, but the vast majority are light on details.

Always. Enable. Keepalives.
As part of our recent failure testing project, we ran into an interesting failure mode involving the OpenTelemetry SDK for Go. In this post, we’ll show you why our apps stopped sending telemetry for over 15 minutes and how we enabled keepalives to prevent this kind of failure from happening in the future.

Destroy on Friday: The Big Day 🧨 A Chaos Engineering Experiment – Part 2
In my last blog post, I explained why we decided to destroy one third of our infrastructure in production just to see what would happen. This is part two, where I go over the big day. How did our chaos engineering experiment go? Find out below!

Deploy on Friday? How About Destroy on Friday! A Chaos Engineering Experiment – Part 1
We recently took a daring step to test and improve the reliability of the Honeycomb service: we abruptly destroyed one third of the infrastructure in our production environment using AWS’s Fault Injection Service. You might be wondering why the heck we did something so drastic. In this post, we’ll go over why we did it and how we made sure that it wouldn’t impact our service.

Should Every Incident Get a Retro?
At a recent training session, Jeli spent a great deal of time covering incident retrospectives and what makes an incident worthy of studying. My colleague Ben Hartshorne asked a fascinating question, which I’ll paraphrase here: We’ve been talking about what makes an incident interesting, but what about the reverse? Are there aspects of an incident that would make you say, “We probably shouldn’t bother doing a retrospective on this one?”

The Incident Retrospective Ground Rules
I joined Honeycomb as a Staff Site Reliability Engineer (SRE) midway through September, and it’s been a wild ride so far. One thing I was especially excited about was the opportunity to see Honeycomb’s incident retrospective process from the inside. I wasn’t disappointed!