The Accidental Middle Layer: How Human Reports Destabilize Agent Retrieval

Every week, an automation pulls PagerDuty data and writes a trends report — top services, alert noise by rotation. Humans read it at the weekly review and adjust priorities. The report goes into Notion or the team wiki.

A week later, someone asks an agent — “what are the recent PD trends?” The agent has MCP access to both PagerDuty and the team’s docs. Retrieval finds the report. It matches the query perfectly: same words, polished structure, topical headings. The agent answers from the report.

But the report is a week-old snapshot, framed by whoever wrote it, with whatever indicators they thought mattered. Live PagerDuty data sits one MCP call away — fresher, complete, unframed. The agent should have queried that. It didn’t, because the report was right there and looked authoritative.

No one decided to feed the agent a summary. The report was written for humans. It just happened to land somewhere indexable, and indexable means agent-retrievable. The middle layer was accidental.

Anything indexable becomes agent context

The mental model worth holding: every artifact written for humans that ends up in a searchable place — Notion, internal wikis, a #status-trends channel — joins the agent’s retrievable surface by default. Not because anyone wired it up that way. Because that’s what “we put it in our docs” means once MCP and RAG are sitting on top.

Once it’s retrievable, it competes with substrate. Not supplements — competes. And in many cases, it wins.

The default failure mode

Give an agent a Notion summary and a live PagerDuty MCP. Ask “what are recent PD trends?” In a typical setup — shared docs indexed alongside tools, no source priority in the prompt, no freshness policy — the agent answers from the Notion doc and never calls PagerDuty. This isn’t a hypothetical edge case; it’s what those defaults produce. Worth understanding why:

Reports look like answers; substrate looks like data. A Notion doc has narrative, conclusions, headers — a coherent reply ready to forward. PagerDuty MCP returns rows of incidents that the agent would have to aggregate itself. Retrieval ranks the polished artifact higher because it lexically matches conversational queries, and the agent’s planner takes the cheaper path.
No freshness signal in the query. “Recent PD trends” doesn’t contain “today” or “right now.” A doc titled Weekly PD Trends looks like a perfectly fine match. Nothing in the query tells the agent to suspect staleness.
No source priority by default. The agent doesn’t know which MCP is ground truth and which is a derivative. Nobody told it. So it prefers whatever retrieval ranks highest — and that’s usually the polished derivative.

The agent isn’t doing anything wrong. It’s using what’s there. The design failure is unscoped retrieval — letting derivative reports compete with substrate by default.

Instability, not slowness

The natural first question is “does this actually hurt performance?” Most of the time, the latency and token cost are small enough that you wouldn’t notice. That framing misses the real problem.

The real problem is instability. Retrieval ranking shifts with the state of the index — a new wiki page lands, the reranker gets re-tuned, overlap accumulates — and the same query routes to different sources at different times: substrate one time, the Notion summary the next, a quarterly review after that. The agent doesn’t know which it’s reading. You don’t either.

That instability is invisible in three ways:

The agent sounds confident regardless of source. Polished prose from a stale Notion doc reads identically to a freshly-aggregated substrate answer. There’s no surface tell.
You can’t reproduce the failure. When someone notices the agent gave a stale answer last Tuesday, asking the same question on Thursday may pull a different doc and produce something plausible. The failure isn’t deterministic; it’s structural.
Drift accumulates without alarms. As more human-facing reports accumulate in your knowledge base, overlap grows. The agent’s effective behavior changes — slowly, without any deploy, without anyone touching the agent.

You wouldn’t accept a system that returned floating answers to the same query. You’re already accepting it from your agent — it just speaks fluently enough that you don’t notice.

Coverage overlap is the slow burn

Connecting more MCPs doesn’t linearly cost you performance — agents only call what they need. What hurts is coverage overlap: the same information accessible through multiple paths.

Slack MCP has the original #incidents thread.
Notion MCP has the postmortem written from that thread.
A Notion retro doc consolidates several postmortems into a quarterly summary.

Same event, three retrieval paths, three different framings, three levels of staleness. When a query lands, RAG returns several at once. The agent reconciles paraphrases of paraphrases instead of reading the source.

Overlap only goes one direction. Nobody deletes old reports. Every quarter another layer of derivative lands on the retrieval surface. The instability gets quietly worse, and there’s no event that says “this is when the system started drifting.”

When the report is substrate, not summary

Not every report is purely derived. Some carry information that isn’t in substrate — explicit decisions, root-cause judgments, framings layered on top. “We decided to deprioritize alert-X because it’s been low-signal for six months” isn’t derivable from PagerDuty data. It’s a new fact.

A useful test: could this report be regenerated from substrate alone?

Yes — it’s all derived. Humans may genuinely still need it — that’s a separate question. The one worth considering carefully is whether it belongs in the agent’s retrieval surface at all. The agent has substrate one MCP call away.
No — it carries decisions and interpretations. Those decisions are substrate. Worth extracting them into something small, atomic, dated, queryable — so the agent retrieves them as first-class facts, not paragraphs to parse.

Designing the retrieval surface

The shift isn’t “don’t write reports.” Humans still need them. The shift is recognizing that once agents share your knowledge base, your knowledge base is part of your agent’s runtime — every artifact in it is a candidate for retrieval, whether you intended that or not. The default position becomes prefer substrate over derivatives, and a few specific positions follow:

Separate the human reading surface from the agent retrieval surface. The agent doesn’t need to read everything humans do. Default it to an explicit allowlist — an agent/ folder, or an “index-only-this” convention — rather than letting it index every wiki page humans write. The asymmetry: humans read deliberately, agents retrieve by default, so the burden of choice belongs on the agent side.
Make the preference explicit in the prompt. “Prefer substrate over summaries” doesn’t happen by itself. If the agent has a PagerDuty MCP, “what are PD trends” should route there — but you have to say so, in the system prompt or tool description, or retrieval ranking decides.
Watch closed loops. When AI produces an artifact another agent reads, you’ve built a feedback loop — and loops drift. Decide whether the loop is intentional and worth designing, or accidental and worth breaking.

Where this doesn’t fit

A few assumptions hold this framework up. When they break, the cost shrinks — sometimes to zero.

Substrate has to be reachable. If your tools don’t expose MCPs and the only path to information is the human-written wiki, the wiki isn’t a middle layer — it’s the only layer. The reframe starts to matter when an alternative source exists; before that, the report is the source.

Overlap takes time to accumulate. The cost is in coverage overlap, not in any single doc. A small team with a clean knowledge base may never produce enough overlapping artifacts to feel the instability. Overlap is a function of time, team size, and how many human-facing artifacts the team produces — and it builds quietly.

Some queries are historically scoped. “How did we handle the auth migration last year” is a fine fit for a postmortem doc; the question is about a past framing, not a current state. Reports earn their place for queries where staleness isn’t a failure mode — onboarding, retrospectives, narrative context.

The reframe doesn’t disappear in these cases — it just becomes a smaller part of the design.

Closing

Reports written for humans aren’t going away. The question is what happens after they’re written. Once they land in an indexable place, they join the agent’s retrieval surface — whether anyone wired them up that way or not. And once they’re on that surface, they compete with substrate, not supplement it. Polished, topical, answer-shaped — they often win.

Instability isn’t loud. The agent answers fluently from a week-old report, the prose sounds confident, nobody asks where it came from. That’s the part to watch for.

Anything indexable becomes agent context#

The default failure mode#

Instability, not slowness#

Coverage overlap is the slow burn#

When the report is substrate, not summary#

Designing the retrieval surface#

Where this doesn’t fit#

Closing#