Meet the Author

Nick Travaglini

Senior Technical Customer Success Manager

Nick is a Technical Customer Success Manager with years of experience working with software infrastructure for developers and data scientists at companies like Solano Labs, GE Digital, and Domino Data Lab. He loves a good complex, socio-technical system, so much so that the concept was the focus of his MA research. Outside of work, he enjoys exercising, reading, and philosophizing.

February 13, 2025

Preempting Problems in a Sociotechnical System

Here at Honeycomb, we emphasize that organizations are sociotechnical systems. At a high level, that means that “wet-brained” people and the stuff they do are irreducible to “dry-brained” computations. That cashes out as the inability to ultimately remove or replace people in organizations with computers, in spite of what artificial general intelligence (AGI) ideologues would have you believe. The best that such artifacts can do is “relieve labor-intensive toil,” as my colleagues Charity and Phillip put it.

Culture

Teams & Collaboration

October 23, 2024

Determining a CoPE’s Efficacy—and Everything After

As discussed in the first article in this series, a Center of Production Excellence (CoPE) is a more or less formal, provisional subsystem within an organization. Its purpose is to act from within to change that organization so that it’s more capable of achieving production excellence. The series has, to date, focused mainly on how best to construct such a subsystem and what activities it should pursue. In this concluding post, however, I want to return to the point of a CoPE, discuss signs of success, and evaluate the impacts it’s having.

Culture

Teams & Collaboration

September 18, 2024

A CoPE’s Duty: Indexing on Prod

Building a center of production excellence (CoPE) starts with indexing on production. Here’s why. Odds are that a software engineer today is really focused on one place: pre-prod. Short for “pre-production,” this is slang for an environment where software code operates in a prototype phase of its development lifecycle. Common sense would have one believe that this is a safe space, a workbench of sorts, where problems can be found and remediated. Then, once engineers are reasonably certain everything’s working properly, they advance it to a matching environment called production, where the code behaves like it did in pre-prod and it merely needs to be managed by an operations team. That story is a comforting lie.

Culture

Software Engineering

September 17, 2024

An Ode to Events

At this point, it’s almost passé to write a blog post comparing events to the three pillars. Nobody really wants to give up their position. Regardless, I’m going to talk about how great events are and use some analogies to try to get that across. Maybe these will help folks learn to really appreciate them and to depreciate a certain understanding of the three pillars. Or maybe not.

Observability

August 15, 2024

A CoPE’s Guide to Alert Management

Alerts are a perennial topic, and a CoPE will need to engage with them. The bounds of this problem space are formed by two types of alerts. Reactive alerts (in Honeycomb, we call these Triggers) fire after some event, like crossing a predetermined boundary. Proactive alerts (Burn Alerts based on Honeycomb’s SLO feature) give notice before a threshold is crossed; in the case of SLOs, that means before failing to meet the stated objective.
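To make the distinction concrete, here is a minimal Python sketch of the two alert styles. It is an illustration only, not how Honeycomb implements Triggers or Burn Alerts: the function names, the error-budget bookkeeping, and the straight-line extrapolation of the recent burn rate are all assumptions made for this example.

```python
def reactive_alert(value: float, threshold: float) -> bool:
    """Reactive style: fires only once the boundary has already been crossed."""
    return value > threshold


def hours_until_budget_exhausted(
    allowed_failures: float,
    failures_so_far: float,
    recent_failures: float,
    recent_hours: float,
) -> float:
    """Estimate how long the remaining error budget lasts.

    Hypothetical model: assume failures keep arriving at the rate
    observed over the most recent `recent_hours`.
    """
    remaining = allowed_failures - failures_so_far
    if remaining <= 0:
        return 0.0  # budget already spent
    if recent_failures <= 0:
        return float("inf")  # nothing is burning the budget right now
    burn_per_hour = recent_failures / recent_hours
    return remaining / burn_per_hour


def proactive_alert(
    allowed_failures: float,
    failures_so_far: float,
    recent_failures: float,
    recent_hours: float,
    lookahead_hours: float,
) -> bool:
    """Proactive style: fires when the budget is projected to run out
    within `lookahead_hours`, before the objective is actually missed."""
    projected = hours_until_budget_exhausted(
        allowed_failures, failures_so_far, recent_failures, recent_hours
    )
    return projected <= lookahead_hours
```

With a budget of 1,000 allowed failures, 400 already used, and 50 failures in the last hour, the projection is 12 hours, so a 24-hour lookahead would fire while a 4-hour lookahead would not. The reactive check, by contrast, stays silent until the measured value is already past its threshold.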

Culture

Software Engineering

August 8, 2024

The CoPE and Other Teams, Part 2: Custom Instrumentation and Telemetry Pipelines

The previous post laid out the basic idea of instrumentation and how OpenTelemetry’s auto-instrumentation can get teams started. However, you can’t rely only on auto-instrumentation. This post will discuss the limitations in more detail and how a CoPE can help teams overcome them.

Culture

Software Engineering

July 25, 2024

The CoPE and Other Teams, Part 1: Introduction & Auto-Instrumentation

The CoPE is made to affect, meaning change, how things work. The disruption it produces is a feature, not a bug. That disruption pushes things away from a locally optimal, comfortable state that generates diminishing returns. It sets things on a course of exploration to find new terrains which may benefit it more—and for longer.

Culture

Software Engineering

July 10, 2024

Staffing Up Your CoPE

Getting the right people working in the CoPE is crucial to success because these change agents must limber up the organization and promote the flexibility necessary to perform resilience.

Culture

Software Engineering

May 29, 2024

Independent, Involved, Informed, and Informative: The Characteristics of a CoPE

In part one of our CoPE series, we analogized the CoPE with safety departments. David Woods says that those safety departments must be independent, involved, informed, and informative. In this post, we’ll elaborate on what each of those characteristics means, why the CoPE should also match those qualifications, and how to achieve that status.

Culture

Software Engineering

May 15, 2024

Establishing and Enabling a Center of Production Excellence

Software is in a crisis. This is nothing new. Complex distributed systems are perpetually in a state far from equilibrium, operating in what Richard Cook has called a “degraded mode.” It’s through a combination of technical artifacts, organizational practices and policies, and pure gumption that they manage to maintain themselves through time.

Culture

Software Engineering

July 17, 2023

Evolving by Involving

In this post, we’re going to lay out the guiding principle that unifies the diverse world of Customer Success (CS) as we see it, and show how we put it into practice.

Software Engineering

Teams & Collaboration

February 2, 2023

Autocatalytic Adoption: Harnessing Patterns to Promote Honeycomb in Your Organization

When an organization signs up for Honeycomb at the Enterprise account level, part of their support package is an assigned Technical Customer Success Manager. As one of these TCSMs, part of my responsibilities is helping a central observability team develop a strategy to help their colleagues learn how to make use of the product. At a minimum, this means making sure that they can log in, that relevant data is available, that they receive training on how to query, and perhaps that they collaborate with the rest of Honeycomb’s CS department to solve problems as they arise.

Observability

Teams & Collaboration

September 27, 2022

Sense and Signals

Part of understanding a complex, distributed software system as a socio-technical system means taking seriously that the signals the stewards receive aren’t just chatter.

Observability
