The limiting factor on how quickly an on-call engineer resolves an incident is rarely the rate at which they make decisions. It is the rate at which they can look at things — logs, metrics, traces, dashboards, SSH sessions — and assemble a picture of what the system is doing. Decision-making is cheap once the picture exists. Building the picture is where the time goes.
This work is structurally single-seat. To bring a colleague in, an engineer has to translate what they have already seen: which log lines, when the metric started climbing, and what the network trace confirmed. The translation cost is high enough that most on-call engineers defer involving others until the problem forces it. The team is nominally 24/7; in practice, investigations happen alone.
An AI co-oncall is a pattern designed to address this specific inefficiency. It does not replace the on-call engineer or make decisions. It investigates in parallel with the engineer and produces a structured record of what it sees. The cost of bringing another party — human or machine — up to speed drops because the record is continuously maintained rather than reconstructed after the fact.
What a co-oncall does
An on-call agent that earns the “co-oncall” label has three observable behaviors.
It signals attention through a low-bandwidth channel. When an alert fires and the agent begins investigating, it marks the state with a minimal indicator — typically an emoji reaction on the corresponding Slack thread. The engineer sees at a glance that the incident is being looked at, without reading any prose. The agent’s presence is ambient; it does not demand attention to announce itself.
It writes its findings incrementally into a thread. Each investigative step — a log tailed, a metric queried, an SSH command run — produces a short note in the same thread. By the time the engineer reads the thread, it is an ordered log of what has been tried and what was observed.
It emits a final state. When the first investigative pass is complete, the agent updates its status: resolved (nothing further required) or needs-attention (a human should look). This compresses a triage decision an engineer would otherwise make manually into a single emoji change.
Together, these three behaviors turn the thread into an artifact that is both a running investigation and a handoff document. The record serves both purposes because the agent generates it as a side effect of investigating, not as a separate reporting step.
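The thread lifecycle can be sketched as a small state machine. This is a minimal in-memory model with hypothetical names; a real agent would back these methods with a chat platform's API (for Slack, `reactions.add` and thread-scoped `chat.postMessage` calls):

```python
from dataclasses import dataclass, field

# Hypothetical model of the co-oncall's thread lifecycle. A real agent
# would drive a chat API here; this sketch only shows the three behaviors:
# an attention signal, incremental notes, and a final triage state.

@dataclass
class IncidentThread:
    alert: str
    status: str = "👀"  # attention signal: investigation has started
    notes: list = field(default_factory=list)

    def note(self, text: str) -> None:
        # Each investigative step appends one short entry, in order.
        self.notes.append(text)

    def finalize(self, needs_attention: bool) -> None:
        # The final state compresses the triage decision into one emoji.
        self.status = "⚠️" if needs_attention else "✅"

    def handoff(self) -> str:
        # The thread itself is the escalation document: status plus ordered log.
        return "\n".join([f"{self.status} {self.alert}", *self.notes])

t = IncidentThread("high p99 latency on api-gateway")
t.note("tailed gateway log: connection resets to upstream db")
t.note("queried db pool metric: saturated since 14:02")
t.finalize(needs_attention=True)
print(t.handoff())
```

The `handoff()` output is the same artifact the engineer reads mid-incident; nothing extra is produced at escalation time.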
Escalation becomes a link paste
The most expensive part of escalation in traditional on-call is not the decision to escalate; it is the summarization. An engineer who has been investigating for twenty minutes must reconstruct a coherent account of what was seen, when, and what was tried. This reconstruction frequently takes longer than the investigation itself — one reason on-call engineers defer escalation past the point where it would have been useful.
Because the co-oncall has been writing continuously in the thread, escalation reduces to pasting the thread link. Everything the recipient needs — the initial analysis, the intermediate findings, the current state — is already present and ordered. Escalation friction drops by roughly an order of magnitude.
An order-of-magnitude change in the cost of an action tends to alter how often it is performed. Engineers escalate earlier, pull in colleagues earlier, and acknowledge uncertainty earlier, because the dominant cost of doing so has been removed.
Three capabilities that make this work
The co-oncall pattern depends on three underlying capabilities. Each is independent of the others, but all three are required for the pattern to be useful. Setups missing any one of them degrade in predictable ways.
Tool access
An agent that can describe what should be checked but cannot check it is an assistant that generates work for the engineer rather than completing work alongside them: every suggestion becomes a command the engineer must run and report back on.
What the agent needs is access to the same sources an engineer uses during investigation, delivered through whichever mechanism fits the source:
- Vendor systems — PagerDuty, metrics stores, ticket systems, Grafana. Typically accessed through MCP servers.
- Hosts themselves — tailing logs, inspecting processes, testing network paths. Typically through SSH.
- Internal services — admin portals, deployment systems, service registries. Typically through custom MCP or direct API access.
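A toolset along those lines might be declared as a simple registry. Every name below (servers, hosts, endpoints) is an illustrative assumption, not part of the pattern; the point is that each source an engineer would consult has a machine-usable path:

```python
# Hypothetical toolset registry for a co-oncall agent. All targets are
# made-up placeholders; swap in whatever MCP servers, hosts, and APIs
# your environment actually exposes.

TOOLS = {
    # Vendor systems, typically reached through MCP servers.
    "pagerduty": {"mechanism": "mcp", "target": "mcp://pagerduty"},
    "grafana":   {"mechanism": "mcp", "target": "mcp://grafana"},
    # Hosts themselves, reached over SSH.
    "app-hosts": {"mechanism": "ssh", "target": "oncall@app-hosts.internal"},
    # Internal services, reached via direct API access.
    "deploys":   {"mechanism": "http", "target": "https://deploys.internal/api"},
}

def reachable(source: str) -> bool:
    # Parity check: the agent can investigate a source only if some
    # mechanism covers it; anything uncovered falls back to the human.
    return source in TOOLS

print(reachable("grafana"), reachable("billing-admin"))
```

A registry like this makes the parity gap visible: any source an engineer uses during incidents that has no entry here is a place where the agent's ceiling drops below the engineer's.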
The specific mechanism varies by source; what matters is that the agent’s reach — the set of things it can observe and execute — matches what a human with the same credentials could observe and execute. Without that parity, the agent’s ceiling stays strictly below the engineer’s, and the co-oncall dynamic collapses back into a solo investigation with a verbose assistant.
The pattern parallels what coding agents discovered once they were given bash: the moment an agent can execute rather than merely describe, its utility increases by an order of magnitude.
Memory
An agent without memory treats every incident as its first encounter with the system under investigation. Chronic issues are rediscovered; known patterns are re-derived; alerts that have fired a dozen times with the same root cause are analyzed from scratch.
An agent with per-resource memory accumulates observations across shifts. The memory is keyed on stable identifiers — this host, this service, this endpoint — rather than on session state. Over time, it develops context about which alerts are chronic, which deploys correlate with which regressions, and which thresholds are misconfigured.
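A minimal sketch of such a store, keyed on stable identifiers. The schema and class name are assumptions; any persistent key-value store would serve in practice:

```python
import time
from collections import defaultdict

# Sketch of per-resource memory: observations keyed on stable identifiers
# ("host/db-3", "service/checkout") rather than on session state, so they
# survive across shifts. A real implementation would persist to disk or a
# database; this one is in-memory for illustration.

class ResourceMemory:
    def __init__(self):
        self._store = defaultdict(list)  # key: "kind/identifier"

    def record(self, kind: str, ident: str, observation: str) -> None:
        self._store[f"{kind}/{ident}"].append(
            {"t": time.time(), "obs": observation}
        )

    def recall(self, kind: str, ident: str) -> list:
        # Prior observations become default context for the next
        # incident that touches the same resource.
        return [e["obs"] for e in self._store[f"{kind}/{ident}"]]

mem = ResourceMemory()
mem.record("host", "db-3", "disk-pressure alert; root cause: log rotation off")
mem.record("host", "db-3", "same alert recurred; chronic, threshold too low")
print(mem.recall("host", "db-3"))
```

On the second occurrence of the `db-3` alert, the agent starts from both prior observations rather than from scratch.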
This capability is the qualitative difference between a tool and a teammate. A tool is stateless; it behaves identically on its hundredth invocation and its first. A teammate accumulates knowledge. Memory is the mechanism that places a co-oncall in the second category.
Ambient presence
An agent that posts long messages on every alert quickly exhausts the channel’s attention budget. On-call engineers learn to ignore such agents, and whatever signal they produce is filtered out regardless of content.
An agent that exists through minimal-bandwidth interfaces — emoji reactions, thread-scoped messages, status changes — preserves channel attention for humans. The agent’s communication cost per interaction is small enough that frequent updates remain viable without producing noise.
Ambient presence is a design constraint, not a feature. Agents that violate it lose their usefulness regardless of how good their analysis is.
The compounding effect
The three capabilities compose in a way that becomes more pronounced over time. After a few months of operation, the agent’s memory develops a structure that reflects the actual behavior of the infrastructure: which hosts are unreliable, which services are chronically strained, and which alerts recur on predictable schedules.
This structure becomes the default context for every subsequent investigation. An engineer starting a new on-call shift inherits the accumulated observations of every previous shift, without a manual handoff. Onboarding cost for the observability layer drops correspondingly; a new engineer's effective context approaches that of a tenured engineer within days, because the agent carries the shared history.
This is a different class of leverage from prompt-level optimization. Prompt engineering compounds within a session and resets between sessions. Context-level work, of which memory is one axis, compounds across the team and persists.
Concurrent investigation
The single-seat constraint is most expensive when more than one incident is active at the same time. A human on-call investigates one at a time, serially; the others queue up, get triaged late, or get bounced to someone already occupied. An agent has no such constraint. Each active incident gets its own thread and its own running investigation, in parallel.
The engineer’s role shifts from “investigate in series” to “decide in series on top of parallel investigation.” The reports arrive at roughly the same time; the engineer reads them in priority order — starting with whichever thread has escalated to ⚠️ — and makes decisions one incident at a time. Concurrent investigation was previously impossible for a single person. With a co-oncall, investigation parallelizes while decisions remain serial — and serial decision-making was always the actual bottleneck.
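The parallel-investigation, serial-decision split can be sketched with `asyncio`. The `investigate` body and its triage rule are placeholders (a real agent runs its tool calls per incident there); the structure is what matters: one task per incident, then a single ordered read:

```python
import asyncio

# Sketch of concurrent investigation with serial decisions. The triage
# rule and incident strings are illustrative placeholders.

async def investigate(incident: str) -> tuple:
    await asyncio.sleep(0)                     # stand-in for real tool calls
    needs_attention = "payments" in incident   # illustrative triage rule
    return incident, ("⚠️" if needs_attention else "✅")

async def shift(incidents: list) -> list:
    # Investigations run concurrently, one task per active incident...
    reports = await asyncio.gather(*(investigate(i) for i in incidents))
    # ...but the engineer reads them serially, ⚠️ threads first.
    return sorted(reports, key=lambda r: r[1] != "⚠️")

reports = asyncio.run(shift(["disk full on cache-1", "payments 5xx spike"]))
print(reports)
```

The sort key simply surfaces needs-attention threads first, mirroring the priority-order read the engineer performs in the channel.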
What does not change
The co-oncall does not make decisions. It does not escalate unilaterally, does not roll back deployments, and does not declare incidents resolved. Those actions remain the on-call engineer’s responsibility.
What changes is the investigation itself. The single-seat constraint on looking — the structural reason on-call work has historically been performed alone — loosens. A single engineer plus an agent can cover the investigative surface that previously required either one engineer working longer or two engineers working together, with the translation cost that implied.
Closing
On-call work has historically been performed alone, not because the team is absent, but because investigation resists sharing. Translating what has been observed to a second party costs more than continuing to observe. The cost imbalance favors working alone.
An AI co-oncall restructures that cost. The agent looks in parallel, writes what it sees into a thread continuously, and signals its state through low-bandwidth channels. The engineer arrives at a running log rather than a blank slate; any subsequent party — a colleague pulled in, a post-mortem, the next shift — arrives at the same log.