A cheap classifier should pay for the expensive investigator. Three stages — deterministic pre-filter, lightweight classify, full triage — let the cost shape match the value shape on a queue of mixed-value events.
Spend tokens where the work is.
The easy way to build an agent on a queue is one prompt that does everything: receive an event, decide if it matters, pull the data, write the response. One prompt, one model call, simple to ship.
It’s also expensive on a queue you don’t curate. An event or alert pipeline carries a lot of low-value traffic — self-resolving CPU spikes, duplicate alerts inside an alert storm, scheduled-maintenance pages, cron-tick alerts that aren’t real. None of these justifies a full LLM-with-tools run, yet the one-prompt-does-everything design pays MCP and tool tokens on every one.
The fix isn’t to make the prompt cheaper. It’s to stage the agent so the cheapest decision happens first, the next-cheapest decision happens second, and the expensive one only runs on events that earned it.
The funnel
Three stages, in increasing cost order:
| Stage | What it spends | What it does |
|---|---|---|
| 1. Pre-filter | Zero tokens | Deterministic JS rules. Drops obvious skips before the LLM sees anything. |
| 2. Classify | LLM, no MCP, no tools | One prompt, one job: a category tag, a product tag, a one-line trend entry. |
| 3. Triage | LLM + MCP + skills | Full investigation. Only runs for tags marked investigate: true. |
Most events stop at stage 1 or 2. Only a fraction — the ones where the agent’s note is actually useful — pay the stage-3 cost. The cost shape now matches the value shape: cheap things get spent on cheap-to-decide events, expensive things on events that justify it.
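Sketched end to end, the funnel is a short dispatch function. Everything below is illustrative — the field names are made up, and the stage stubs stand in for the real implementations:

```typescript
// End-to-end shape of the funnel. All names and stubs are illustrative.
type Evt = { state: string; category: string };
type Outcome = { status: "skipped" | "tagged" | "triaged"; tag?: string };

// Stage stubs — the real versions are deterministic rules, a no-tools LLM
// call, and a full LLM + MCP run respectively.
function preFilterMatches(e: Evt): boolean { return e.state === "resolved"; }
function classify(e: Evt): string { return e.category; }
function triage(e: Evt, tag: string): void {}

// Only tags configured investigate: true reach the expensive stage.
const routingTable: Record<string, { investigate: boolean }> = {
  noise: { investigate: false },
  availability: { investigate: true },
};

function handleEvent(event: Evt): Outcome {
  if (preFilterMatches(event)) return { status: "skipped" }; // stage 1: zero tokens
  const tag = classify(event);                               // stage 2: one cheap call
  if (routingTable[tag]?.investigate) {
    triage(event, tag);                                      // stage 3: expensive, rare
    return { status: "triaged", tag };
  }
  return { status: "tagged", tag };
}
```

The point of the shape: the expensive call sits behind two gates, and neither gate is the model’s decision to open.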
Stage 1: pre-filter, zero tokens
The cheapest token is the one you don’t spend.
Pre-filter is plain code, no LLM. It runs against deterministic fields on the event — the source, the author type, the tags already on the event, the account name. Each rule in a small rule set is a set of conditions ANDed together; a rule matches only when all of its conditions hold:
- The author is a known automation account.
- The event is a re-fire of one already triaged.
- The state is already terminal — resolved or snoozed.
- The event fired during a known maintenance window or deploy.
- The source environment is non-prod (staging, dev, CI).
If a rule matches, the event gets a triage_skipped tag, the pipeline marks it complete, and the LLM never sees it. The cost of running this stage on a million events is a millisecond per event and zero tokens.
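The rule set above reduces to a few lines of plain code. A minimal sketch — the event fields and rule names are illustrative, not the pipeline’s actual schema, and only three of the five rules are shown:

```typescript
// Minimal pre-filter sketch. Field names and rule names are illustrative.
type QueueEvent = {
  source: string;
  authorType: string;
  state: string;
  environment: string;
};

// Each rule is a predicate; any matching rule skips the event before the LLM.
type Rule = { name: string; match: (e: QueueEvent) => boolean };

const rules: Rule[] = [
  { name: "automation-author", match: (e) => e.authorType === "automation" },
  { name: "already-terminal", match: (e) => e.state === "resolved" || e.state === "snoozed" },
  { name: "non-prod", match: (e) => ["staging", "dev", "ci"].includes(e.environment) },
];

// Returns the matching rule's name (the skip reason), or null to pass through.
function preFilter(e: QueueEvent): string | null {
  const hit = rules.find((r) => r.match(e));
  return hit ? hit.name : null;
}
```

The returned rule name doubles as the audit trail: the triage_skipped tag can carry it, so you can see later which rule dropped which event.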
The temptation is to push these decisions into the classifier — “the LLM can figure it out from context.” It can. It’s just paying for that figuring on every event, including the 30–50% you’d reject deterministically. A rule you can write in JS is a rule that doesn’t need an LLM to evaluate.
Stage 2: classify, one cheap LLM call
Stage 2 is one model call, no tools, no MCP. The prompt has one job: read the alert and emit JSON with a category tag, a product tag, and a one-line trend entry. Nothing else. The output is structurally constrained — it must parse as JSON with three fields, or the run fails and retries.
Critically, classify doesn’t write the response. It only labels the event. The label is what drives routing in stage 3. A noise tag means “don’t investigate, just tag and close.” A duplicate tag means “link to the active incident and close.” A performance tag means “investigate latency.” An availability tag means “investigate service health and errors.” A saturation tag means “investigate resource pressure.” An unclassified tag means “investigate by default, log for review.”
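The structural constraint on the classifier’s output can be enforced in a few lines. A sketch, assuming the three fields are named category, product, and trend — the actual field names may differ:

```typescript
// Stage-2 output contract: the classifier must emit JSON with exactly these
// three string fields, or the run fails and retries. Field names are assumptions.
type Classification = {
  category: string; // e.g. "noise", "duplicate", "performance", ...
  product: string;  // e.g. "payments-api"
  trend: string;    // one-line trend entry for the shared log
};

// Parse the model's raw text; throw on anything that doesn't match the
// contract, so the caller retries the call instead of routing on garbage.
function parseClassification(raw: string): Classification {
  const obj = JSON.parse(raw); // throws on non-JSON → retry
  for (const field of ["category", "product", "trend"]) {
    if (typeof obj[field] !== "string") {
      throw new Error(`classifier output missing field: ${field}`);
    }
  }
  return obj as Classification;
}
```

Throwing on a malformed response is the whole design: the failure is detected before routing, not discovered downstream when the router gets a tag it has never seen.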
Two reasons to keep this stage tool-free:
- Speed. A no-tools call returns in seconds. A tools-enabled call has to enumerate the tool list (thousands of tokens of schema), then potentially round-trip. On a high-volume queue, that latency compounds across every event.
- Cost predictability. A no-tools call has a known token cost: input prompt + small JSON output. A tools-enabled call’s cost depends on what the agent decides to do, which means the budget per event is unbounded.
The classifier is also the agent’s lightest pattern-detection loop. The trend entry it emits — | 2026-04-29T14:08Z | 18095 | availability | payments-api | keywords: 5xx spike, gateway timeout, 503 | — appends to a shared trends log, which gets inlined into the next stage-3 run’s prompt. Stage 3 reads “what alerts have come in recently and what shape were they?” without the agent itself making any extra queries. Memory as a side effect of routing — no separate vector store, no retrieval call, just an append-only file the next prompt reads.
Stage 3: triage, expensive but rare
Stage 3 is the full picture: the model with MCP servers, skills, a domain-specific tool surface. It can query metrics, read logs, hit internal APIs, fetch documentation. It writes the actual operator-facing note.
The reason it’s expensive is also the reason it’s useful: it has agency. It picks which queries to run, what hypotheses to test, what to include in the note. That open-ended power is exactly what a one-prompt-does-everything design has to grant on every event, useful or not.
In the funnel, stage 3 only runs when stage 2 emitted a tag whose investigate flag is true. Tags that don’t trigger investigation get a triage_skipped mark and the pipeline ends. Configuration decides which tags trigger investigation; the LLM doesn’t. The classifier picked the route; the router obeys the routing table.
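A routing table like that might look as follows — a sketch using the tag names from the classify stage; the shape and the helper are assumptions, not the pipeline’s actual config:

```typescript
// The routing table: configuration, not model output, decides which tags
// trigger stage 3. In practice this would live in a small JSON file.
const routes: Record<string, { investigate: boolean; action: string }> = {
  noise:        { investigate: false, action: "tag and close" },
  duplicate:    { investigate: false, action: "link to active incident and close" },
  performance:  { investigate: true,  action: "investigate latency" },
  availability: { investigate: true,  action: "investigate service health and errors" },
  saturation:   { investigate: true,  action: "investigate resource pressure" },
  unclassified: { investigate: true,  action: "investigate by default, log for review" },
};

// The router obeys the table; unknown tags fall back to unclassified,
// which investigates by default rather than silently dropping the event.
function shouldTriage(tag: string): boolean {
  return (routes[tag] ?? routes.unclassified).investigate;
}
```

The fallback direction matters: an unknown tag means the classifier drifted from the table, and the safe default is to pay for an investigation, not to skip one.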
That’s the split that earns the cost. Stage 3 is expensive per call, but it’s only called on the subset where its expense is paying for something useful.
Why this beats one-prompt-does-everything
A one-prompt design wins on simplicity and loses on three things:
- Token waste compounds. If 60% of your events should never trigger an investigation, a one-prompt design pays MCP + tool overhead on all of them. The funnel pays it on 40%.
- Failure modes leak. A flaky MCP server during stage 3 should not break stage-2 routing. In a one-prompt design they’re fused; one bad MCP call corrupts the response. In the funnel, a stage-3 failure leaves a tagged event with no note — recoverable, retriable, and the routing decision is still on disk.
- You can’t A/B individual stages. One-prompt means swapping a classifier improvement also swaps the investigation behavior. The funnel lets each stage iterate independently. The classifier prompt changes more often than the triage prompt; the triage skills churn more often than the classifier categories. Decoupled, both move faster.
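The token-waste point can be made concrete with a back-of-envelope model. All numbers below are placeholders, not measurements; the model also ignores stage-1 drops, which only make the funnel cheaper:

```typescript
// Back-of-envelope cost comparison. Token counts are made-up placeholders;
// only the shape of the comparison matters.
function expectedCostPerEvent(
  investigateRate: number, // fraction of events that reach stage 3
  classifyTokens: number,  // cheap no-tools call
  triageTokens: number     // tools-enabled call, incl. tool-schema overhead
): { onePrompt: number; funnel: number } {
  return {
    // One-prompt: every event pays the full tools-enabled cost.
    onePrompt: triageTokens,
    // Funnel: every event pays the cheap classify; only a fraction pays triage.
    funnel: classifyTokens + investigateRate * triageTokens,
  };
}

// e.g. 40% investigate rate, 500-token classify, 20,000-token triage:
// one-prompt ≈ 20,000 tokens/event, funnel ≈ 500 + 0.4 × 20,000 = 8,500.
```

The crossover is easy to read off: the funnel wins whenever classifyTokens is less than (1 − investigateRate) × triageTokens, which is almost always true when the classify call is tool-free.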
The cost of the funnel is a routing table — a small JSON file that says “for tag X, do Y.” That table is also the place where engineering judgment lives. “We don’t investigate noise events because the response is always ‘auto-resolved, no action needed.’” That sentence belongs in code, not in a prompt.
Where the funnel doesn’t pay off
A funnel earns its complexity on a queue with a wide value distribution — many low-value events, a long tail of high-value ones. On a curated queue where every event is already worth investigating, the pre-filter drops to zero matches and the classify step adds a cost without saving one. The one-prompt design wins there.
The signal that you need the funnel: you can already point to categories of events on your queue and say “we shouldn’t be running the full agent on these.” If you can’t, you don’t have the value distribution that makes staging worth the wiring. If you can, you’re already paying the cost of running the full agent on them anyway.
Closing
The one-prompt design treats every event as if it might justify the full power of the agent. The funnel treats events as a distribution: most don’t, some do, and the cost of running the agent should reflect that.
Three stages, three cost levels:
- Pre-filter — JS rules. Drops the obvious skips for free.
- Classify — one cheap LLM call. Tags every event, drives routing.
- Triage — the expensive call. Runs only on events the classifier marked worth it.
The classifier is the bridge: it translates an unstructured event into a label the deterministic router can act on. Once that label exists, engineering judgment — not the model — decides what happens next.
A queue agent that costs money has to spend it on the events that earn the spend. The funnel makes that explicit, and the rest is just routing.