Step back to what the customer actually wants from on-call, and the answer is short: a working service, a fast fix, clear communication, no repeats. PagerDuty, Grafana, Datadog, Sentry, Slack — each does its job, and each does it well. But those are the team’s tools, not the customer’s vocabulary. The customer’s ask is four things: the service works; when it breaks it gets fixed fast; I’m told what’s happening; the same problem doesn’t keep happening. That’s the whole list. Every paging tool, every dashboard, every status page, every postmortem template is an implementation detail of how a team delivers those four.
The implementation has a property worth noticing: almost every layer was designed around a human limitation. Humans can’t watch every metric continuously, so observability tools became the eyes of the system. Humans aren’t always reachable, so paging tools solved “find the right person and don’t take no for an answer.” Humans need visual compression to grok large systems, so dashboards became the working surface. Humans investigate alone, so Slack channels became the place a second pair of eyes could join. Humans forget at 3 a.m., so runbooks. Humans don’t naturally learn from incidents, so postmortems.
When the responder changes from human to agent, the right question isn’t “what does each tool become.” That’s tool-first thinking, and it accidentally preserves the existing shape. The cleaner question is layer by layer: does this layer’s function exist for the customer, or does its form exist because humans need it? Functions survive — customers are permanent. Forms are up for renegotiation when the actor doing the work isn’t a human anymore.
The four things the customer actually wants
Whatever tools a team uses, the customer’s ask is the same:
- The service works.
- When it breaks, it gets fixed fast.
- I’m told what’s happening.
- The same problem doesn’t keep happening.
PagerDuty isn’t on this list. Grafana isn’t on this list. Datadog, Sentry, CloudWatch, Slack, your runbook folder, your weekly incident review — none of those show up, because the customer cares about the service, not the stack. The tools exist because someone, somewhere, has to do work to keep those four promises, and the tools shape that work.
The mistake the agent-native conversation usually makes is to ask “what does our paging tool become with AI?” That keeps the implementation in the centre of the picture. The customer’s question is upstream — at the four things, not at the tools we deliver them with.
Why the existing stack looks the way it does
Each layer of the current on-call stack was designed to deliver one or more of those four things, given the constraints of working with humans:
- Detection needs tools because humans can’t watch every metric continuously. Grafana, Datadog, Sentry, and CloudWatch became the eyes of the system.
- Paging needs tools because humans aren’t always reachable. PagerDuty solved “find the right person and don’t take no for an answer.”
- Investigation needs visual surfaces because humans need shape recognition to grok large systems. Dashboards became the working interface.
- Coordination needs a shared channel because humans investigate alone unless invited. Slack became the place where the second pair of eyes joins.
- Runbooks exist because humans forget, especially at 3 a.m.
- Postmortems exist because humans don’t naturally learn from incidents without a forcing ritual.
- Status pages exist because the customer wants a stable URL to check “is it me or is it them.”
Each of these is a workaround for some property of human cognition or human availability. The four customer expectations would still need to be served if the actor doing the work were different — but the shape of the workarounds would be different.
Function vs form
Two things get conflated when people argue about which tools survive: the function a layer serves and the form that function currently takes. The function is what the customer expectation requires done. The form is how it currently gets done, given that the responder was a human.
Take detection. The function — “notice when something is wrong” — has to exist regardless of who responds, because it’s how the customer’s first promise (the service works) is enforced. The form — dashboards staring at a wall, alerts firing into Slack, somebody paged when a threshold trips — is what that function looks like when a human has to be in the loop. The function survives because the customer is permanent. The form is up for re-evaluation because the responder isn’t.
The same split repeats at every layer. Every function survives. The forms change wherever the form encoded a human limitation. Dashboards as the primary working surface, runbooks as PDFs, Slack as the alert dump, paging as the start of every response, single-seat investigation, postmortems as a four-hour homework assignment — those are forms, not functions. Each existed because the responder was human; each is up for renegotiation when the responder isn’t.
This isn’t about “removing humans.” Humans still own accountability. The customer is still a human. What gets removed are the workarounds for the responder being human, in the cases where the responder no longer is.
Walking the layers, customer-first
Detection — survives
Customers care because slow detection means longer downtime. This layer doesn’t move. Metrics, logs, traces, exceptions, synthetic checks, and deploy events all still need to exist, and they still need to fire when something looks wrong.
What changes is the consumer. With a human responder, the detection layer’s job was to wake someone up. With an agent responder, the detection layer’s job is to push the event into the agent’s queue. The event itself is the same. Grafana, Datadog, Sentry, CloudWatch — these survive because the customer still needs fast detection and faithful records.
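A minimal sketch of what “push the event into the agent’s queue” might look like: a normalizer that accepts a webhook body from any detection source and enqueues a common event shape. The field names and payload keys here are illustrative, not any vendor’s actual webhook schema.

```python
from dataclasses import dataclass, field
import queue
import time

@dataclass
class DetectionEvent:
    source: str        # which tool fired, e.g. "grafana" or "sentry"
    service: str       # the affected service, as the team names it
    signal: str        # alert name or exception type
    payload: dict      # the raw webhook body, kept as evidence
    received_at: float = field(default_factory=time.time)

# The agent consumes from this queue instead of a human being paged.
agent_queue: "queue.Queue[DetectionEvent]" = queue.Queue()

def ingest(source: str, body: dict) -> DetectionEvent:
    """Normalize one webhook body and hand it to the agent.
    The 'service'/'alert' keys are placeholders for whatever
    fields the real detection tool sends."""
    event = DetectionEvent(
        source=source,
        service=body.get("service", "unknown"),
        signal=body.get("alert", "unknown"),
        payload=body,
    )
    agent_queue.put(event)
    return event
```

The point of the normalizer is that the detection tools themselves don’t change; only the consumer at the end of the pipe does.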
Investigation — same function, different actor
Customers care because faster investigation means faster fix. This layer doesn’t move either, but the actor shifts. The bulk of investigation moves from a human at a keyboard to an agent calling tools.
This is the layer where the most-visible reorganization happens. Dashboards become evidence rather than the primary working surface — the agent reads the underlying metrics and pulls the panels a human will want to verify. Runbooks become executable playbooks the agent can run step by step, with the read-only steps automated and the risky ones gated by approval. Log search and trace query happen through the agent’s tool surface, not a human’s keyboard. The investigation outputs that were previously assembled in a person’s head become a structured triage note posted to Slack.
The customer doesn’t see any of this. The customer sees: “the service was wobbly for four minutes and recovered, and the status page entry explains what happened.”
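The “structured triage note posted to Slack” could be as simple as a small record type with a rendering method. The fields below are an assumption about what a useful note contains, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TriageNote:
    """Structured output of an agent investigation (illustrative fields)."""
    service: str
    classification: str          # noisy | actionable | unknown | escalate
    summary: str
    evidence: List[str] = field(default_factory=list)  # panel/trace links
    suggested_action: str = ""

    def to_slack(self) -> str:
        """Render the note as the message a human reviewer will read."""
        lines = [
            f"*{self.service}* [{self.classification}]",
            self.summary,
        ]
        lines += [f"• {link}" for link in self.evidence]
        if self.suggested_action:
            lines.append(f"_Suggested:_ {self.suggested_action}")
        return "\n".join(lines)
```

Keeping the note structured rather than free-form is what lets the same object feed the Slack message, the incident record, and the later postmortem draft.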
Decision and action — split by the permission ladder
Customers want decisions correct and fast. This layer splits.
Decisions that follow from the playbook (“this is noise; don’t escalate”; “this looks like the same incident we had last Tuesday”) become the agent’s job. Decisions that require judgment, approval, ownership, or are irreversible stay human. The boundary lives in code, not in the prompt — a permission ladder a prompt-injected request can’t talk its way past:
| Level | Scope | Examples |
|---|---|---|
| 0 — Read-only | Investigation | Check metrics, logs, deploys, Sentry, runbooks, incident history |
| 1 — Low-risk write | Communication artifacts | Slack summary, ticket, draft customer update, propose runbook update |
| 2 — Approval required | Mitigation | Restart, scale, roll back, disable flag, silence alert, resolve incident |
| 3 — Never autonomous | Production-altering | Delete data, change schema, rotate secrets, modify IAM, change billing |
Level 0 is where the first useful version lives. Level 1 follows once the Slack summaries earn trust. Level 2 is gated per-action behind an approval flow. At Level 3 the agent doesn’t get the capability — the only way the action happens is a human pressing the button. The agent doesn’t need unlimited autonomy. It needs clear boundaries, code-enforced.
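A minimal sketch of a code-enforced ladder, using hypothetical action names. The essential property is that the action-to-level mapping lives in code the agent can’t modify, so no prompt-injected request can talk an action up or down a level.

```python
from enum import IntEnum
from typing import Optional

class Permission(IntEnum):
    READ_ONLY = 0          # investigation
    LOW_RISK_WRITE = 1     # communication artifacts
    APPROVAL_REQUIRED = 2  # mitigation
    NEVER_AUTONOMOUS = 3   # production-altering

# Illustrative registry; real action names come from the team's tooling.
ACTION_LEVELS = {
    "query_metrics": Permission.READ_ONLY,
    "post_slack_summary": Permission.LOW_RISK_WRITE,
    "rollback_deploy": Permission.APPROVAL_REQUIRED,
    "rotate_secrets": Permission.NEVER_AUTONOMOUS,
}

def authorize(action: str, approval_token: Optional[str] = None) -> bool:
    """Code-level gate: the agent's prompt cannot change these levels.
    Unknown actions default to the most restrictive level."""
    level = ACTION_LEVELS.get(action, Permission.NEVER_AUTONOMOUS)
    if level <= Permission.LOW_RISK_WRITE:
        return True
    if level == Permission.APPROVAL_REQUIRED:
        return approval_token is not None  # a human pressed approve
    return False  # Level 3: the capability is simply not granted
```

Denying unknown actions by default matters as much as the ladder itself: a new tool integration has to be explicitly classified before the agent can touch it.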
Communication — survives, and gets more attention
Customers care because being kept in the dark is a trust event. This layer doesn’t shrink — it grows, because freeing the human from first-response means there’s more attention available for the customer-facing comms.
The agent can do most of the prep: draft the initial customer update from incident context, fan it out to status page and Slack, draft the post-incident summary from the running thread. The human still presses publish — autopublishing LLM output is a PR incident waiting to happen — but the time between “we know what’s wrong” and “the customer knows what’s wrong” can compress, because the bottleneck (a human drafting prose from a fragmented mental picture) has moved upstream into the agent’s running record.
This is where Time-to-Customer-Awareness improvements live. The agent doesn’t replace the comms; it frees the human to focus on the comms instead of the investigation.
Memory — survives, in a form the agent can use
Customers care because they don’t want the same outage twice. This layer doesn’t disappear either; it changes form.
In the human-only stack, memory lives in postmortem docs, half-stale runbooks, oral history, and the unindexed Slack thread from two months ago. In an agent-native stack, memory becomes structured Markdown the agent can retrieve — service files, incident files, playbook files, escalation policies, severity rubrics — stored in a place humans can read, edit, and review in PRs:
```
/team-memory
  /services
    payments-api.md
    mobile-backend.md
  /playbooks
    api-5xx.md
    queue-lag.md
  /incidents
    2026-05-13-api-5xx.md
  /rules
    escalation-policy.md
    action-permissions.md
```
Humans can read it. Humans can edit it. Agents can retrieve it. Changes can be reviewed. The operational memory isn’t hidden in a black box. Postmortems stop being a separate four-hour task and start being a draft the agent produces from the incident thread, which the human edits and adds nuance to.
Memory is read, not run. Memory is what the agent reads when investigating — service notes, incident records, recurring patterns. A wrong memory line makes the agent slightly less well-informed; it doesn’t change what the agent does. What the agent actually does runs through the playbook and the permission ladder, both human-authored elsewhere in the stack.
So the agent writes memory freely. Git is the audit trail; nothing waits in a queue for approval. The real risk isn’t bad rules getting installed — it’s accumulation (stale contacts, duplicated notes, contradictory observations), and that’s a housekeeping problem solved by a weekly or monthly pass, not a per-change gate.
Where the existing tools land
With the layers reorganized this way, each tool falls into a clearer place:
- Observability tools (Grafana, Datadog, Sentry, CloudWatch) — detection sources for the customer-facing reliability promise, evidence sources for the investigation layer. They survive because the customer still needs fast detection and faithful records.
- Paging tools (PagerDuty, Opsgenie) — accountability layer. Volume drops by an order of magnitude; the function (find the named human, force an acknowledgment, audit the trail) remains. For small teams this may compress to a phone number plus a Slack DM; for compliance-heavy orgs it stays the audit-of-record, just with a sparser dashboard.
- Slack — the human collaboration surface, not the input layer. The place the agent posts its triage and the place humans coordinate.
- Runbook docs — replaced by executable playbooks. The doc form survives only where it’s also the agent’s input.
- Status pages — survive, possibly drafted by the agent with a human pressing publish. Same destination, faster path.
- Postmortem templates — survive, with the agent drafting from the incident thread and the human editing for nuance.
None of these tools look “replaced by AI.” They look reorganized around what the customer was always asking for, with one fewer constraint on how it gets delivered.
The product is the integration layer
The product surface that delivers all of this isn’t another dashboard, and it isn’t another alert channel. It’s an AI first responder layer that sits between the detection sources on one side and the human accountability layer on the other, and serves the four customer expectations end to end.
What this layer has to do well:
- Ingest detection events from the observability stack.
- Pull context from the tools the team already uses.
- Retrieve relevant team memory.
- Run safe playbook steps.
- Classify the alert as noisy, actionable, unknown, or requiring escalation.
- Draft the Slack summary and, when warranted, the customer update.
- Escalate to the accountable human when the evidence says it should.
- Propose memory and playbook updates after each incident.
- Keep an audit trail of everything it saw, did, and recommended.
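The loop those responsibilities imply can be sketched as one function, with every external capability injected as a callable so the trust boundaries stay visible. All names are hypothetical, and the real classifier (model plus rules) is elided behind a parameter.

```python
from typing import Callable, Dict

# Classification outcomes, matching the list above.
NOISY, ACTIONABLE, UNKNOWN, ESCALATE = "noisy", "actionable", "unknown", "escalate"

def first_response(
    event: Dict,
    classify: Callable[[Dict], str],           # model + rules, elided
    run_safe_playbook: Callable[[Dict], str],  # Level-0 reads only
    post_summary: Callable[[str], None],       # Level-1 write: Slack note
    page_human: Callable[[Dict], None],        # accountability layer
) -> str:
    """One pass of the integration layer: investigate, classify,
    then act or escalate. Every branch ends with an artifact a
    human can audit."""
    findings = run_safe_playbook(event)  # read-only investigation first
    verdict = classify(event)
    if verdict == NOISY:
        post_summary(f"Suppressed as noise: {event.get('signal')} ({findings})")
    elif verdict == ACTIONABLE:
        post_summary(f"Actionable: {findings}. Awaiting approval to mitigate.")
    else:  # UNKNOWN or ESCALATE: outside the playbook, find the human
        page_human(event)
        post_summary(f"Escalated to on-call human: {findings}")
    return verdict
```

Injecting the capabilities as parameters is also what makes the escalation behaviour testable without paging anyone.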
The hard part isn’t calling a model. The hard part is context selection, operational taste, trust boundaries, and escalation design — the things that decide which of the four customer expectations actually gets served well. An agent that sees too little hallucinates. An agent that sees too much becomes slow, expensive, and risky. An agent that talks too much becomes noise. An agent that escalates too often is just PagerDuty with better prose. An agent that escalates too little is dangerous.
The product is the judgment layer — and the judgment layer is the part that won’t fall out of the next model release.
Less overhead, not less human
The right framing isn’t “AI replaces on-call.” That’s too broad, and it confuses the customer-facing function with the human-coordination workaround. The accurate framing is: less overhead, not less human. The agent handles the alert by default — detection, investigation, decision, the safe actions — and writes an audit log the human can read whenever they want. It reaches out only when the situation calls for it: an approval at the permission boundary, a judgment outside the playbook, or a customer-facing call.
What’s new isn’t a new role. Most of what the on-call shift has meant up to now — dashboard-hopping, runbook-hunting, building context from scratch at 3 a.m. — was overhead the human absorbed only because the responder was human. An always-on actor that follows the playbook and leaves an audit trail takes most of that off the human’s shift; the human is still there, just not buried in it.
Closing
Step back to the customer’s ask, and it’s short: a service that works, that gets fixed fast when it doesn’t, that tells them what’s happening, and that doesn’t repeat its mistakes. PagerDuty, Grafana, runbooks — those are how the team currently delivers it. Each is doing what it was built for. None of them is what the customer asked about.
Every layer of the on-call stack we have today is an implementation detail of how a team delivers those four things using humans. When the responder isn’t human, the layers don’t all need to look the same. Every function survives — detection, investigation, decision, action, communication, memory, accountability — because the customer still needs each one. The forms that existed because the responder was human — paging volume, dashboards as the primary working surface, runbooks as PDFs at 3 a.m., investigation as a single-seat activity — reorganize around the actor that’s now doing the work.
The on-call stack isn’t being replaced. It’s being rebuilt around what the customer was always asking for, with one fewer constraint on how it gets delivered.