AI SRE agents on-call

A friendly engineer and a small robot working side by side at a cozy desk

The Five Phases of AI Agency: Which Decision You Moved Into the Agent

Agent autonomy is usually pictured as a ladder — a low rung where the agent does a little on its own, a high rung where it runs nearly unattended, and the job is to climb. There’s a sharper way to read the same progression. Each step up isn’t more autonomy in some general sense; it’s one specific decision moving out of human judgment, or out of code, and into the agent. ...

A neat-looking report sitting on the path between an agent and the live data source it should have queried

The Accidental Middle Layer: How Human Reports Destabilize Agent Retrieval

Every week, an automation pulls PagerDuty data and writes a trends report — top services, alert noise by rotation. Humans read it at the weekly review and adjust priorities. The report goes into Notion or the team wiki. A week later, someone asks an agent — “what are the recent PD trends?” The agent has MCP access to both PagerDuty and the team’s docs. Retrieval finds the report. It matches the query perfectly: same words, polished structure, topical headings. The agent answers from the report. ...

A robot at an on-call desk sorting incoming alerts while a human engineer drinks coffee with one tidy summary on the screen

What Customers Want When Things Break: An On-Call Reframe

Step back to what the customer actually wants when things break, and the answer is short — service works, fix is fast, customer informed, no repeats. PagerDuty, Grafana, Datadog, Sentry, Slack — each does its job, and each does it well. But those are the team’s tools, not the customer’s vocabulary. The implementation reflects a specific constraint: every layer of the existing stack was designed around what humans need to do incident response. When the responder changes from human to agent, the right question isn’t “what does each tool become.” That’s tool-first thinking, and it accidentally preserves the existing shape. The cleaner question is layer by layer: does this layer’s function exist for the customer, or does its form exist because humans need it? Functions survive — customers are permanent. Forms are up for renegotiation when the actor doing the work isn’t a human anymore. ...

One robot typing a triage note, a second robot reviewing it with a clipboard, sticky-note prompt patches floating between them

The Self-Evolving Triage Agent

Most production AI agents are static. The prompt you ship is the prompt you keep — until somebody reads enough output to notice a recurring failure, edits by hand, and ships a fix. The cycle takes weeks; most failures slip through it. A self-evolving agent shortens that loop, but it has to solve one structural problem first: a model auditing its own output is bad at it. Same-model audit produces self-deception — the auditor rationalises the same blind spot the triage agent just exhibited. The fix isn’t more prompt engineering; it’s a different model. ...

Engineer redacting a document on the left, another holding a sieve catching dangerous symbols on the right, robot in the middle behind two walls

Two Boundaries: Redact Before You Prompt, Sanitize Before You Render

The LLM sits between two boundaries you have to defend. On the input side, treat it as an untrusted destination for sensitive strings. On the output side, treat what it produces as user-input-controlled — because prompt injection makes it so. The LLM is not a trusted insider. Most threat models for LLMs in production treat the model as the security boundary: “the prompt says don’t leak the API key.” That’s a wish, not a control. The real boundaries are around the model — at the data going in and the data coming out. ...
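As a rough illustration of the two boundaries, here is a minimal sketch. The regex patterns and the function names `redact` and `sanitize_for_html` are assumptions made for this example, not taken from the article, and a real deployment would maintain a much fuller pattern list and pick an output encoder that matches where the text is rendered.

```python
import html
import re

# Hypothetical patterns for illustration only; extend for your own secrets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                         # AWS access key ID shape
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]+"),               # bearer tokens
    re.compile(r"(?i)(api[_-]?key|password)\s*[=:]\s*\S+"),  # key=value style secrets
]


def redact(text: str) -> str:
    """Input boundary: replace secret-looking strings before they reach the prompt."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def sanitize_for_html(model_output: str) -> str:
    """Output boundary: treat model output as untrusted and escape it before rendering."""
    return html.escape(model_output)


if __name__ == "__main__":
    print(redact("login failed: password=hunter2 for key AKIAABCDEFGHIJKLMNOP"))
    print(sanitize_for_html("<script>alert(1)</script>"))
```

Both functions sit outside the model call, so neither depends on the prompt being obeyed.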

A horizontal three-stage funnel — many envelopes flowing in, a gear filter, a small tagging robot, then a larger investigating robot at the end

Pre-filter, Classify, Triage: A Three-Stage Funnel for Token Budget

A cheap classifier should pay for the expensive investigator. Three stages — deterministic pre-filter, lightweight classify, full triage — let the cost shape match the value shape on a queue of mixed-value events. Spend tokens where the work is. The easy way to build an agent on a queue is one prompt that does everything: receive an event, decide if it matters, pull the data, write the response. One prompt, one model call, simple to ship. ...
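The funnel shape can be sketched in a few lines. Everything here is hypothetical: the `Event` fields, the filter rules, and the stand-in `classify` and `triage` functions, which in practice would wrap a small model call and a full agent run respectively.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Event:
    source: str
    severity: str
    message: str


def pre_filter(event: Event) -> bool:
    """Stage 1: deterministic rules, zero tokens. True means the event continues."""
    if event.severity == "info":
        return False
    if "heartbeat" in event.message.lower():
        return False
    return True


def classify(event: Event) -> str:
    """Stage 2: placeholder for a small, cheap model call."""
    # Assumption: a simple severity heuristic stands in for the lightweight classifier.
    return "investigate" if event.severity in {"critical", "error"} else "ignore"


def triage(event: Event) -> str:
    """Stage 3: placeholder for the expensive investigating agent."""
    return f"full investigation queued for {event.source}: {event.message}"


def run_funnel(events: Iterable[Event]) -> List[str]:
    """Only events that survive both cheap stages reach the expensive one."""
    results = []
    for event in events:
        if not pre_filter(event):
            continue
        if classify(event) != "investigate":
            continue
        results.append(triage(event))
    return results
```

The point of the structure is that the expensive call appears exactly once, at the end, behind two cheaper gates.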

Engineer pair-investigating with an AI on the left, robot alone in the server room on the right

Investigate Locally, Triage Server-Side

The same triage agent running on your laptop and running server-side off an alert webhook looks like the same agent — but how much agency you give the LLM should be completely different on each side. Locally, you should let the LLM stretch out: hand it the MCPs you’ve set up for investigation, let it pick which tool to call, let it choose its own path, let it backtrack and try a different hypothesis. Discovery is a feature. You’re standing right there — every step shows up on your screen, and you can redirect at any time. ...

Engineer handing tools to an AI robot on the floor

Context Engineering for Operational AI Agents

Most AI agent setups that disappoint their teams don’t disappoint because the model is wrong. They disappoint because the agent was asked to reason about systems it can’t see. A triage agent without PagerDuty access produces a vague analysis. An on-call agent without metrics hallucinates a root cause from alert titles. The agent isn’t bad; it’s undercontexted. Context engineering is the long game, and it has a structure. Specifically, it has four techniques — not a staircase. Each one matches a different shape of context source, and which ones apply depends on what you already have. A team with a CLI-heavy internal stack will spend most of its effort on technique 3. A team whose vendors all expose public MCPs might never touch technique 2. What remains true, regardless, is that technique 4 — bundling — is what turns any subset of the others into a team asset. ...

On-call engineer with AI robot partner at the desk

The Slack-Native AI On-Call Agent That Stands Shift With You

The limiting factor on how quickly an on-call engineer resolves an incident is rarely the rate at which they make decisions. It is the rate at which they can look at things — logs, metrics, traces, dashboards, SSH sessions — and assemble a picture of what the system is doing. Decision-making is cheap once the picture exists. Building the picture is where the time goes. This work is structurally single-seat. To bring a colleague in, an engineer has to translate what they have already seen: which log lines mattered, when the metric started climbing, and what the network trace confirmed. The translation cost is high enough that most on-call engineers defer involving others until the problem forces it. The team is nominally 24/7; in practice, investigations happen alone. ...

Confused customer with laptop while engineer watches dashboards

Time to Customer Awareness: the Incident KPI No One Measures

In most incident response pipelines, customers begin experiencing impact at minute zero. Within a few minutes, the first support tickets arrive. By minute ten, customer-initiated posts appear in shared channels. Public status pages are typically updated somewhere between minute thirty and minute ninety. The gap between when a customer first feels impact and when that customer is formally told is a measurable, important, and almost universally unmeasured KPI. Call it Time-to-Customer Awareness, or TTC-Aware. ...
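Measured per incident, the KPI reduces to the difference between two timestamps. A minimal sketch follows, assuming you can recover both from your incident records; the function name and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def time_to_customer_awareness(
    impact_start: datetime, first_customer_notice: datetime
) -> timedelta:
    """Gap between first customer impact and the first formal customer notice."""
    if first_customer_notice < impact_start:
        raise ValueError("notice cannot precede the start of impact")
    return first_customer_notice - impact_start


if __name__ == "__main__":
    impact = datetime(2024, 1, 1, 10, 0)   # illustrative timestamps only
    notice = datetime(2024, 1, 1, 10, 45)
    print(time_to_customer_awareness(impact, notice))
```

The harder part in practice is likely agreeing on which event counts as the start of impact and which counts as the formal notice, since the arithmetic depends entirely on those two definitions.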