
What Customers Want When Things Break: An On-Call Reframe

Step back to what the customer actually wants when things break, and the answer is short: the service works, the fix is fast, the customer is informed, and the problem does not repeat. PagerDuty, Grafana, Datadog, Sentry, and Slack each do their job, and each does it well. But those are the team’s tools, not the customer’s vocabulary. The existing stack reflects a specific constraint: every layer of it was designed around what humans need in order to do incident response. When the responder changes from human to agent, the right question isn’t “what does each tool become.” That’s tool-first thinking, and it accidentally preserves the existing shape. The cleaner question goes layer by layer: does this layer’s function exist for the customer, or does its form exist because humans need it? Functions survive, because customers are permanent. Forms are up for renegotiation when the actor doing the work isn’t a human anymore. ...

May 13, 2026 · 12 min · Jared Lee

Time to Customer Awareness: the Incident KPI No One Measures

In most incident response pipelines, customers begin experiencing impact at minute zero. Within a few minutes, the first support tickets arrive. By minute ten, customer-initiated posts appear in shared channels. Public status pages are typically updated somewhere between thirty and ninety minutes after impact begins. The gap between when a customer first feels impact and when that customer is formally told is a measurable, important, and almost universally unmeasured KPI. Call it Time to Customer Awareness, or TTC-Aware. ...
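
As a rough illustration of how the gap could be measured, here is a minimal Python sketch. It assumes you can recover two timestamps per incident: when customers first felt impact and when they were first formally told. The function name `time_to_customer_awareness` and the example timestamps are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def time_to_customer_awareness(first_impact: datetime,
                               first_formal_notice: datetime) -> timedelta:
    """Return the gap between first customer impact and first formal notice.

    Both arguments are assumptions about what your incident records contain;
    how you determine "first impact" is up to your own data.
    """
    if first_formal_notice < first_impact:
        raise ValueError("formal notice cannot precede first impact")
    return first_formal_notice - first_impact


# Hypothetical incident: impact starts at 14:00, status page updated at 14:45.
impact = datetime(2026, 2, 1, 14, 0)
status_page_update = datetime(2026, 2, 1, 14, 45)

ttc_aware = time_to_customer_awareness(impact, status_page_update)
print(f"TTC-Aware: {ttc_aware.total_seconds() / 60:.0f} minutes")
```

The harder part in practice is likely agreeing on the two timestamps, not the subtraction: "first impact" might come from the earliest support ticket or from telemetry, and "formally told" might mean the first status page update or the first direct message to affected customers.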

February 1, 2026 · 6 min · Jared Lee