MCP reliability · May 17, 2026

Tool output budget checklist

MCP tools need output budgets before they need bigger context windows.

A tool call can be correct and still break the agent if it returns too much. Search results, files, transcripts, logs, and nested API responses need bounded output contracts so the model receives the smallest safe evidence, not a context flood.

Fast answer

→ Tool output is part of the route budget. A verbose MCP result can burn more model context than the call that produced it, then make the next planning step slower, more expensive, and less recoverable.
→ A production MCP tool needs an output contract before launch: maximum bytes, maximum records, schema shape, summary rule, artifact handoff, redaction policy, and the exact denial or truncation receipt when the response exceeds budget.
→ The useful test is not whether the tool can return a large JSON blob. It is whether the same route can return the minimum safe result, point to a durable artifact when needed, and prove what was omitted.
→ If the trace cannot explain how many bytes or tokens were returned, why the payload was shaped that way, what artifact holds the full result, and how the agent can request the next page safely, the route is not ready for unattended loops.

The production checklist

Per-route output ceiling

Set a maximum response size by route, not just by server. A search result, file summary, database row read, transcript extract, and browser scrape should not share one generic payload limit.
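
As a minimal sketch of per-route ceilings, a route table might look like the following. The route ids, the `OutputBudget` type, and every number here are hypothetical placeholders, not part of any MCP SDK. The one design choice worth keeping is that an undeclared route fails closed instead of inheriting a server-wide default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputBudget:
    max_bytes: int
    max_records: int


# Hypothetical route ids and limits; tune these per deployment.
ROUTE_BUDGETS = {
    "search.results": OutputBudget(max_bytes=8_000, max_records=10),
    "file.summary": OutputBudget(max_bytes=4_000, max_records=1),
    "db.row_read": OutputBudget(max_bytes=16_000, max_records=50),
}


def budget_for(route_id: str) -> OutputBudget:
    """Look up the ceiling for a route; undeclared routes fail closed."""
    try:
        return ROUTE_BUDGETS[route_id]
    except KeyError:
        raise LookupError(f"no output budget declared for route {route_id!r}") from None
```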

Schema before prose

Return typed fields, stable ids, result counts, omitted-count metadata, and next-page cursors before free-form explanation. Let the model reason over bounded structure instead of raw dumps.
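
One way to picture this is a bounded envelope that carries counts and a cursor alongside the selected fields. The function name `shape_results`, the envelope keys, and the version label below are assumptions for illustration only.

```python
from typing import Any, Optional


def shape_results(
    records: list[dict[str, Any]],
    fields: list[str],
    max_records: int,
    next_cursor: Optional[str] = None,
) -> dict[str, Any]:
    """Return a bounded, typed envelope instead of a raw dump."""
    kept = records[:max_records]
    return {
        "schema_version": "1",  # hypothetical version label
        "fields": fields,
        # Only the approved fields survive; everything else is dropped.
        "results": [{f: r.get(f) for f in fields} for r in kept],
        "raw_count": len(records),
        "returned_count": len(kept),
        "omitted_count": len(records) - len(kept),
        "next_cursor": next_cursor,
    }
```

The model sees how much was left out and how to ask for more, without ever receiving the unselected fields.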

Content-block order

Preserve the original sequence of text, image, file, table, and citation blocks or return an explicit order map. Reordering mixed media can make the agent attach evidence to the wrong step even when every block is present.
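
A rough sketch of an explicit order map, assuming each block is a dict with an `id`, a `type`, and an optional `parent_id` (a made-up shape, not the MCP content-block schema):

```python
from typing import Any


def order_map(blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Record each block's position so a consumer can restore the sequence."""
    return [
        {
            "position": i,
            "block_id": b["id"],
            "type": b["type"],
            "parent_id": b.get("parent_id"),
        }
        for i, b in enumerate(blocks)
    ]


def order_preserved(original: list[dict[str, Any]], returned: list[dict[str, Any]]) -> bool:
    """True only if the returned blocks keep the original id sequence."""
    return [b["id"] for b in original] == [b["id"] for b in returned]
```

A fixture can call `order_preserved` on the tool's input and output to catch a route that quietly moves all images to the end.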

Artifact handoff

When the full payload is too large, write it to a durable artifact or provider object and return a reference, checksum, expiration, access rule, and safe follow-up route instead of flooding context.
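
A minimal sketch of the handoff, assuming a local directory stands in for the durable store. The access rule and follow-up route names are hypothetical labels, and a real deployment would likely write to a provider object store instead.

```python
import hashlib
import time
from pathlib import Path


def hand_off(payload: bytes, artifact_dir: Path, ttl_seconds: int = 3600) -> dict:
    """Write the full payload to disk and return a small reference instead."""
    digest = hashlib.sha256(payload).hexdigest()
    # Content-addressed file name: the artifact id is the SHA-256 hex digest.
    (artifact_dir / digest).write_bytes(payload)
    return {
        "artifact_id": digest,
        "checksum": f"sha256:{digest}",
        "size_bytes": len(payload),
        "expires_at": int(time.time()) + ttl_seconds,
        "access_rule": "same-tenant-read",         # hypothetical policy label
        "follow_up_route": "artifact.read_range",  # hypothetical route id
    }
```

The reference is a few hundred bytes regardless of payload size, and the checksum lets a later read confirm it is looking at the same artifact.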

Summarization boundary

Name whether the tool returned raw data, extracted fields, a lossy summary, or a sampled preview. The receipt should make lossy compression visible before the agent treats it as ground truth.

Redaction and data-use policy

Apply redaction before payload shaping, and record which secret, customer-data, credential, prompt, or topology class was removed. Truncation is not a security control.
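
To make the ordering concrete, here is a small sketch that redacts first and clips second. The regex is a deliberately naive, hypothetical stand-in for a real secret detector, and the receipt keys are assumptions.

```python
import re

# Hypothetical detector: matches key=value pairs for two credential names.
SECRET_PATTERN = re.compile(r"(api_key|password)=\S+")


def redact_then_truncate(text: str, max_chars: int) -> tuple[str, dict]:
    """Redact on the full text first, then clip, and report both steps."""
    redacted, hits = SECRET_PATTERN.subn(r"\1=[REDACTED]", text)
    clipped = redacted[:max_chars]
    receipt = {
        "redactions": hits,
        "protected_class": "credential" if hits else None,
        "truncated": len(redacted) > max_chars,
        "omitted_chars": max(0, len(redacted) - max_chars),
    }
    return clipped, receipt
```

Running the detector on the full text matters because a clip boundary can cut through a secret or its key name and leave a fragment the pattern no longer recognizes.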

Pagination and refill rule

Expose a cursor, range, query refinement, or approval step for more data. Do not let the agent repeat the same oversized call hoping the next response is smaller.
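
A cursor can be sketched as an opaque token that carries the offset and the query fingerprint, so it cannot be replayed against a different query. The encoding and the names `page` and `encode_cursor` are illustrative assumptions. A production cursor would normally be signed or stored server-side.

```python
import base64
import json
from typing import Any, Optional


def encode_cursor(offset: int, fingerprint: str) -> str:
    raw = json.dumps({"o": offset, "q": fingerprint}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def page(
    records: list[Any],
    fingerprint: str,
    max_records: int,
    cursor: Optional[str] = None,
) -> dict[str, Any]:
    """Return one bounded page plus a cursor for the next one, if any."""
    offset = 0
    if cursor is not None:
        state = json.loads(base64.urlsafe_b64decode(cursor))
        if state["q"] != fingerprint:
            raise ValueError("cursor does not belong to this query")
        offset = state["o"]
    chunk = records[offset : offset + max_records]
    end = offset + len(chunk)
    return {
        "results": chunk,
        "omitted_count": len(records) - end,
        "next_cursor": encode_cursor(end, fingerprint) if end < len(records) else None,
    }
```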

Failure fixtures

Test the context-flood cases before the agent discovers them in production.

Oversized search result

Expected: Return top bounded results, total count, omitted count, ranking criteria, and a cursor or refinement hint; do not stream every match into context.

Large file or transcript

Expected: Return section summaries plus artifact reference, byte range, checksum, and follow-up extraction route instead of a full dump.

Nested JSON response

Expected: Flatten or select approved fields, include schema version, and record omitted nested objects in the receipt before the agent plans from partial data.

Mixed text and image blocks

Expected: Return text, images, files, tables, and citations in the route's declared order with block ids and parent ids; do not move all images or artifacts to the end of the response.

Sensitive field in allowed result

Expected: Redact before truncation and record the protected class. A payload that is clipped only after the secret has already been returned fails the gate.

Agent asks for 'everything' again

Expected: Deny or require a narrower query after budget exhaustion. The planner should not bypass the output budget by rephrasing the same broad request.
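
The last fixture can be sketched as a gate keyed on a fingerprint of the structured request rather than on its wording. Everything named here (`fingerprint`, `BudgetGate`, the denial code) is hypothetical. The sketch only catches requests that normalize to the same route and filters; detecting a semantically equivalent rephrasing would need more than this.

```python
import hashlib
import json
from typing import Any


def fingerprint(route_id: str, filters: dict[str, Any]) -> str:
    """Hash the canonical form of the request, independent of key order."""
    canonical = json.dumps(
        {"route": route_id, "filters": filters},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class BudgetGate:
    """Remember which requests exhausted their budget and deny repeats."""

    def __init__(self) -> None:
        self._exhausted: set[str] = set()

    def mark_exhausted(self, fp: str) -> None:
        self._exhausted.add(fp)

    def check(self, fp: str) -> dict[str, Any]:
        if fp in self._exhausted:
            return {
                "decision": "deny",
                "code": "OUTPUT_BUDGET_EXHAUSTED",  # hypothetical code
                "allowed_next_action": "narrow_query_or_request_approval",
            }
        return {"decision": "allow", "code": None, "allowed_next_action": "call"}
```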

Trace evidence

The output receipt should make omitted data auditable.

Once the agent moves on, operators need to know whether it acted on raw data, an extraction, a summary, or a clipped preview. The trace should keep returned payload size, omitted data, redaction, artifact references, and allowed next actions in one place.

route id and tool call id

caller / tenant / workspace

operation class and data class

query / filter fingerprint

policy decision source

output ceiling in bytes / records / tokens

actual bytes and estimated tokens returned

raw count, returned count, and omitted count

schema version and selected fields

content block order and parent ids

redaction rule and protected class

summary / extract / raw-data mode

artifact id, checksum, and expiration

cursor, range, or refill route

policy decision and denial / truncation code

receipt id and allowed next action
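
A subset of these fields can be captured in one record, sketched below. The field names, the `RESULT_TRUNCATED` code, and the next-action labels are assumptions, and the token estimate uses a rough four-bytes-per-token heuristic in place of a real tokenizer.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class OutputReceipt:
    route_id: str
    tool_call_id: str
    mode: str  # "raw", "extract", "summary", or "preview"
    ceiling_bytes: int
    actual_bytes: int
    estimated_tokens: int
    raw_count: int
    returned_count: int
    omitted_count: int
    truncation_code: Optional[str]
    allowed_next_action: str
    receipt_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def build_receipt(
    route_id: str,
    tool_call_id: str,
    mode: str,
    ceiling_bytes: int,
    payload: Any,
    raw_count: int,
    returned_count: int,
) -> OutputReceipt:
    """Measure the serialized payload and record what was left out."""
    actual_bytes = len(json.dumps(payload).encode("utf-8"))
    omitted = raw_count - returned_count
    return OutputReceipt(
        route_id=route_id,
        tool_call_id=tool_call_id,
        mode=mode,
        ceiling_bytes=ceiling_bytes,
        actual_bytes=actual_bytes,
        estimated_tokens=max(1, actual_bytes // 4),  # rough heuristic only
        raw_count=raw_count,
        returned_count=returned_count,
        omitted_count=omitted,
        truncation_code="RESULT_TRUNCATED" if omitted > 0 else None,
        allowed_next_action="next_page" if omitted > 0 else "none",
    )
```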

Database-backed tools

Read-only is not the same as bounded.

Database-backed MCP tools need a result-authority check after the query-authority check. Returning raw customer identifiers, free-text notes, nested JSON, or twenty thousand allowed rows can still overexpose context even when the underlying SQL was read-only.

The receipt should prove why this exact slice was safe for the agent to see, not merely that the agent had permission to run a read.

Query authority

Record a query or filter fingerprint before execution so an audit can tell whether the agent asked for the bounded slice it was allowed to inspect or a broad 'all customers' style read.

Result authority

Treat returned fields, row scope, tenant scope, redaction class, and sample mode as a second permission check. A legal query can still return evidence the agent should not reason over.

Policy decision source

Name the table allowlist, column allowlist, row-level predicate, workspace or tenant rule, and redaction policy that shaped the payload before truncation or summarization.
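
The second permission check might be sketched as a filter that runs on the rows after the query returns. The table name, column allowlist, `tenant_id` column, and decision codes below are all hypothetical; the point is only that the check inspects the result, not the SQL.

```python
from typing import Any

# Hypothetical column allowlist, keyed by table name.
ALLOWED_COLUMNS = {"orders": {"order_id", "status", "created_at"}}


def check_result_authority(
    table: str,
    rows: list[dict[str, Any]],
    tenant_id: str,
    max_rows: int,
) -> dict[str, Any]:
    """Apply table, tenant, column, and row-count rules to returned rows."""
    allowed = ALLOWED_COLUMNS.get(table)
    if allowed is None:
        return {"decision": "deny", "code": "TABLE_NOT_ALLOWED", "rows": []}
    # Any row outside the caller's tenant denies the whole result.
    if any(r.get("tenant_id") != tenant_id for r in rows):
        return {"decision": "deny", "code": "TENANT_SCOPE_VIOLATION", "rows": []}
    kept = [{k: v for k, v in r.items() if k in allowed} for r in rows[:max_rows]]
    dropped = sorted({k for r in rows for k in r} - allowed)
    return {
        "decision": "allow",
        "code": "ROWS_TRUNCATED" if len(rows) > max_rows else None,
        "rows": kept,
        "dropped_columns": dropped,
        "omitted_rows": max(0, len(rows) - max_rows),
        "policy_source": f"column_allowlist:{table}",
    }
```

Even a fully authorized read-only query passes through this step, so free-text notes or extra identifiers never reach the model just because the SELECT was legal.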

Copy-paste route card

Budget the returned evidence before the call runs.

MCP route:
Caller / tenant:
Data class:
Query / filter fingerprint:
Policy decision source:
Max bytes / records / tokens:
Allowed fields / schema:
Content block order rule:
Summary vs raw-data rule:
Artifact handoff rule:
Redaction rule:
Pagination / refill route:
Oversize denial or truncation code:
Receipt fields:

Common misreads

✕ Optimizing provider-call retries while ignoring that the returned payload is what actually explodes the model bill.
✕ Calling a tool read-only and therefore safe, even though it can leak private data or swamp context with unbounded output.
✕ Returning a natural-language summary without saying which fields were dropped, sampled, redacted, or inferred.
✕ Treating mixed content as a bag of blocks. Reordered screenshots, citations, or file excerpts can be as misleading as omitted evidence.
✕ Using truncation as a quiet success path. The agent must know the response is partial before it takes action.
✕ Storing a full artifact without a checksum, expiration, access rule, or route for retrieving a narrower slice later.
✕ Letting the agent retry the same broad query after an output-budget denial instead of requiring a smaller query or human approval.