Runtime MCP Discovery Needs Trust Filters Before Giant Indexes Become Useful

Wrong-tool risk

A local helper and a high-side-effect remote system look equally available because semantic relevance was allowed to outrank authority.

Context tax

The model sees too many mixed-authority candidates and spends tokens exploring options that should have been filtered out upstream.

Auth-blind ranking

The top result may be impossible for the current caller to use safely because auth shape and principal mismatch were hidden until too late.

Freshness theater

A giant directory looks rich even when stale, dead, or auth-broken entries remain in the candidate pool.

The useful question

The runtime discovery question is not “how many tools can the agent find?” It is “how many safe, relevant, caller-appropriate tools can the agent see before it starts choosing?”

1. Giant indexes feel like progress because they improve recall

The current MCP ecosystem really does have a discovery problem. There are too many demos, too many stale entries, and too many directories that make every surface look equally real. A giant index improves one important thing: recall.

If the right tool exists somewhere, broader coverage increases the odds that the agent can discover it. That matters. It is just not the whole problem.

Fresh curated-registry launch notes sharpen the same point. Curated is better than random, because editorial filtering can remove obvious dead ends before the model ever looks. It is still not runtime truth. A registry entry is inventory until the runtime re-applies caller visibility, trust class, auth viability, and freshness for the current lane.

The practical audit is whether curation hands the runtime a cleaner starting set, or whether it turns one editorial label into a false proof of current safety.

The harder problem is selection, and selection gets more dangerous as the candidate pool mixes more trust classes and side-effect profiles together.

Curated registry handoff audit

A curated registry earns trust only when it preserves the separation between editorial review and runtime permission. Before an agent ranks a listed server, the discovery layer should be able to answer four questions without asking the model to infer them from prose.

Editorial inclusion

Who decided this entry belongs in the registry, what evidence did they inspect, and when was that evidence last refreshed?

Install and handshake truth

Does the advertised setup path still complete, and does the live handshake expose the same tools the listing describes?

Caller-specific auth viability

Can this principal authenticate with the intended scope now, or is the listing only proving that some maintainer once connected it?

Runtime narrowing

After policy, tenant, and trust-class filters run, is the server still visible to this caller, and are out-of-scope choices denied with typed evidence?
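
One way to keep these four answers out of prose is to carry them as structured fields next to the listing. The sketch below is illustrative only: `HandoffAudit`, its field names, and the gap labels are assumptions made for this example, not part of MCP or any registry schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class HandoffAudit:
    """Hypothetical audit record attached to one registry listing."""
    reviewed_by: Optional[str]                 # editorial inclusion: who decided
    evidence_refreshed_at: Optional[datetime]  # editorial inclusion: last refresh
    handshake_matches_listing: bool            # live tools match the listed tools
    caller_auth_ok: bool                       # this principal, intended scope, now
    visible_after_filters: bool                # survived policy, tenant, trust filters


def open_questions(audit: HandoffAudit, now: datetime,
                   max_evidence_age: timedelta) -> List[str]:
    """Name each audit question the listing cannot answer for this caller."""
    gaps: List[str] = []
    if audit.reviewed_by is None or audit.evidence_refreshed_at is None:
        gaps.append("editorial_inclusion_unknown")
    elif now - audit.evidence_refreshed_at > max_evidence_age:
        gaps.append("editorial_evidence_stale")
    if not audit.handshake_matches_listing:
        gaps.append("handshake_drift")
    if not audit.caller_auth_ok:
        gaps.append("caller_auth_not_viable")
    if not audit.visible_after_filters:
        gaps.append("hidden_by_runtime_filters")
    return gaps


# Example listing: reviewed, but the evidence is old and this caller cannot auth.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
listing = HandoffAudit(
    reviewed_by="registry-editor",
    evidence_refreshed_at=now - timedelta(days=90),
    handshake_matches_listing=True,
    caller_auth_ok=False,
    visible_after_filters=True,
)
print(open_questions(listing, now, max_evidence_age=timedelta(days=30)))
```

Under this sketch, only listings with an empty gap list would be handed to the ranker; everything else stays inventory.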

Verified vertical discovery audit

Vertical MCP directories are starting to advertise verified professional or domain-specific discovery. That is a stronger recall signal than a random catalog row, but it still needs runtime filters before an agent can act on it.

Verified source

Preserve which upstream registry, licensing body, or primary source verified the entry, not just that a marketplace label says verified.

Jurisdiction and allowed action

A lawyer, clinic, vendor, or domain expert may be valid for one region or job and unsafe for another. The discovery result should carry that boundary before ranking.

Freshness interval

The runtime needs the last verification time and refresh requirement, because stale professional or regulated data can be worse than no match.

No-advice / no-side-effect lane

Discovery, recommendation, outreach, payment, booking, and advice-like actions should stay separate until a policy gate promotes the workflow.
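
The four boundaries above can be sketched as one gate that runs before ranking. This is a minimal sketch under stated assumptions: `VerifiedEntry`, the action names, and the denial labels are hypothetical and only show the shape of the check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class VerifiedEntry:
    """Hypothetical vertical-directory entry; all field names are illustrative."""
    verified_by: str                 # upstream registry or licensing body, not a badge
    regions: FrozenSet[str]          # jurisdictions where the verification holds
    allowed_actions: FrozenSet[str]  # e.g. {"discover"}; nothing else is implied
    verified_at: datetime
    refresh_every: timedelta


def gate(entry: VerifiedEntry, region: str, action: str, now: datetime) -> str:
    """Return "ok" or a typed denial; every check runs before any ranking."""
    if not entry.verified_by:
        return "denied:no_verified_source"
    if region not in entry.regions:
        return "denied:out_of_jurisdiction"
    if now - entry.verified_at > entry.refresh_every:
        return "denied:verification_stale"
    if action not in entry.allowed_actions:
        return "denied:action_needs_policy_promotion"
    return "ok"


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
entry = VerifiedEntry(
    verified_by="example-licensing-body",
    regions=frozenset({"region-a"}),
    allowed_actions=frozenset({"discover"}),
    verified_at=now - timedelta(days=10),
    refresh_every=timedelta(days=30),
)
print(gate(entry, "region-a", "discover", now))  # discovery lane
print(gate(entry, "region-a", "booking", now))   # side-effect lane, not promoted
print(gate(entry, "region-b", "discover", now))  # wrong jurisdiction
```

Keeping `allowed_actions` explicit is what stops a verified discovery result from silently becoming a booking, payment, or advice lane.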

2. Runtime discovery changes the problem from browsing to mediation

A human browsing a directory can apply judgment before clicking anything. They can notice that one tool is local and harmless while another is remote, stale, or broad enough to be dangerous.

An agent does not inherit that judgment by default. If the runtime exposes one giant mixed-authority pool, the model is being asked to solve relevance, availability, safety, and authority all at once.

That is the moment where discovery becomes part of the control plane. The runtime is no longer just describing what exists. It is shaping what choices the model is allowed to consider.

3. The wrong abstraction is “best search over the whole catalog”

Better embeddings or a smarter ranker do not fix the authority problem if the candidate pool is wrong. A local read-mostly helper and a high-side-effect remote business integration should not appear as interchangeable ranking candidates just because both match the same task description.

If the only safety layer is hoping the ranker prefers the harmless one, the runtime has already handed the model control-plane work that should have been solved upstream.

The real job of the discovery layer is to remove bad candidate classes before semantic ranking begins.

4. Trust filters belong before ranking

Trust class

Local helper, read-mostly surface, reversible write tool, high-side-effect execution surface, or shared remote integration. Ranking without this is ranking blast radius by accident.

Auth shape

Public, static key, delegated user auth, or tenant-bound runtime credential. A candidate the caller cannot safely authenticate to is not a real candidate.

Side-effect class

Inspect, write, execute, or egress. These need to be visible before the model starts reasoning, not after the call is already selected.

Network egress class

Fetch, browser, and crawl tools need resolved-IP, redirect, and destination-class policy before they enter the candidate set. A URL argument can point at public content or cloud metadata with the same semantic label.

Host-state boundary

Filesystem, repo, and workspace tools need canonical path, allowed-root, symlink, operation-class, redaction, and denied-neighbor evidence before they rank. A relevant file tool can still expose the wrong project, secret, or host mount.

Caller-visible scope

Generated manifests and gateway policy layers only count when the runtime actually hides what this principal should not see now. Discovery truth is caller-visible scope, not global inventory plus a promise.

Freshness and viability

Handshake, auth viability, failure shape, and stale-entry suppression decide whether the candidate pool is still operational truth.
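
A minimal sketch of filtering on these attributes before any relevance score is consulted. The `Candidate` and `CallerPolicy` types, the class labels, and the denial strings are assumptions chosen to mirror the categories above; a real runtime would also carry the egress, host-state, and caller-scope checks.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    trust_class: str   # e.g. "local_helper", "read_mostly", "high_side_effect"
    auth_shape: str    # e.g. "public", "static_key", "delegated_user", "tenant_bound"
    side_effect: str   # "inspect", "write", "execute", or "egress"
    fresh: bool        # handshake and auth were checked recently enough
    relevance: float   # semantic score, consulted only after filtering


@dataclass(frozen=True)
class CallerPolicy:
    trust_classes: FrozenSet[str]
    auth_shapes: FrozenSet[str]
    side_effects: FrozenSet[str]


def filter_then_rank(pool: List[Candidate],
                     policy: CallerPolicy) -> Tuple[List[Candidate], Dict[str, str]]:
    """Drop disallowed classes first, then rank only the survivors."""
    allowed: List[Candidate] = []
    denied: Dict[str, str] = {}
    for c in pool:
        if c.trust_class not in policy.trust_classes:
            denied[c.name] = "denied:trust_class"
        elif c.auth_shape not in policy.auth_shapes:
            denied[c.name] = "denied:auth_shape"
        elif c.side_effect not in policy.side_effects:
            denied[c.name] = "denied:side_effect"
        elif not c.fresh:
            denied[c.name] = "denied:stale"
        else:
            allowed.append(c)
    allowed.sort(key=lambda c: c.relevance, reverse=True)
    return allowed, denied


pool = [
    Candidate("crm_bulk_update", "high_side_effect", "tenant_bound", "execute", True, 0.93),
    Candidate("old_search", "read_mostly", "public", "inspect", False, 0.80),
    Candidate("local_notes", "local_helper", "public", "inspect", True, 0.71),
]
policy = CallerPolicy(
    trust_classes=frozenset({"local_helper", "read_mostly"}),
    auth_shapes=frozenset({"public"}),
    side_effects=frozenset({"inspect"}),
)
allowed, denied = filter_then_rank(pool, policy)
print([c.name for c in allowed], denied)
```

Note that the highest-relevance entry never reaches the ranker here: it is removed by trust class, which is the point of filtering first.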

Fetch discovery boundary

URL-fetch tools need egress filters before they become candidates

The fresh MCP fetch SSRF finding is a runtime-discovery problem, not only an HTTP-client bug. If a marketplace entry or generated manifest exposes a generic fetch tool, the candidate pool has to know whether that tool can reach cloud metadata, loopback, private networks, or in-cluster services before the model ever sees it as an option.

Classify fetch, browser, and crawl candidates as network-egress tools before semantic ranking starts, not as generic read tools.

Resolve DNS and redirects under policy before promotion: public web targets can rank; cloud metadata, loopback, private ranges, IPv6 ULA, Kubernetes service names, and internal control-plane domains stay hidden or typed-denied.

Keep allowed-public-fetch and denied-metadata-neighbor fixtures beside the listing so runtime discovery can prove the candidate set did not leak infrastructure routes into the model context.
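
The resolved-IP part of that policy can be sketched with Python's standard `ipaddress` module. This is a hedged example, not a complete SSRF defense: the function names and denial labels are hypothetical, the caller is assumed to have already resolved DNS and every redirect hop, and name-level rules (Kubernetes service names, internal control-plane domains) would need a separate check.

```python
import ipaddress
from typing import List, Tuple

IPV6_ULA = ipaddress.ip_network("fc00::/7")


def classify_resolved_ip(addr: str) -> str:
    """Classify one already-resolved address as "public" or a typed denial."""
    ip = ipaddress.ip_address(addr)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot dodge the IPv4 rules.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if ip.is_loopback:
        return "denied:loopback"
    if ip.is_link_local:
        return "denied:link_local"  # covers 169.254.169.254 metadata endpoints
    if ip.version == 6 and ip in IPV6_ULA:
        return "denied:ipv6_ula"
    if ip.is_private:
        return "denied:private"
    if not ip.is_global:
        return "denied:non_global"
    return "public"


def fetch_allowed(resolved_hops: List[str]) -> Tuple[bool, str]:
    """Every hop (initial target plus each redirect) must resolve to public space."""
    if not resolved_hops:
        return False, "denied:unresolved"
    for addr in resolved_hops:
        verdict = classify_resolved_ip(addr)
        if verdict != "public":
            return False, verdict
    return True, "public"


# Allowed-public and denied-neighbor fixtures, kept side by side.
print(fetch_allowed(["8.8.8.8"]))
print(fetch_allowed(["8.8.8.8", "169.254.169.254"]))  # redirect into metadata
print(fetch_allowed(["fd00::1"]))
```

Checking every hop rather than only the first target matters because a public URL can redirect into a metadata or private address.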

Filesystem discovery boundary

Filesystem tools need path-boundary proof before they become candidates

The fresh MCP filesystem scans are runtime-discovery pressure, not a badge-confidence shortcut. If a marketplace entry or generated manifest exposes repo, file, workspace, or local-resource tools, the candidate pool has to know which roots this caller may touch and which neighboring paths must fail closed before the model sees file contents as usable context.

Classify filesystem, repo, workspace, and local-resource candidates as host-state authority before semantic ranking starts, not as generic read helpers.

Normalize the requested path to a canonical path under the caller's allowed root or repo prefix before promotion; parent traversal, sibling workspaces, hidden config, host mounts, and out-of-policy writes stay hidden or typed-denied.

Keep allowed-read/write and denied-neighbor fixtures beside the listing so runtime discovery can prove the candidate set did not leak neighboring projects or secrets into model context.
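
The canonical-path step can be sketched with `pathlib`. This is a minimal example under assumptions: `check_path` and its denial label are hypothetical, it covers only traversal and symlink escape, and hidden-config or write-policy rules would be layered on top. The fixture needs permission to create symlinks.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple


def check_path(allowed_root: str, requested: str) -> Tuple[bool, str]:
    """Canonicalize `requested` under `allowed_root`; deny anything that escapes."""
    root = Path(allowed_root).resolve()
    # resolve() collapses ".." and follows symlinks, so the comparison below
    # sees where the path really lands, not what it looks like.
    target = (root / requested).resolve()
    if target == root or root in target.parents:
        return True, str(target)
    return False, "denied:outside_allowed_root"


# Fixture: an allowed project next to a sibling workspace holding a secret.
base = Path(tempfile.mkdtemp())
project, sibling = base / "project", base / "sibling"
project.mkdir()
sibling.mkdir()
(project / "README.md").write_text("allowed")
(sibling / "secret.txt").write_text("not for this caller")
os.symlink(sibling, project / "shortcut")  # symlink that escapes the root

print(check_path(str(project), "README.md"))              # allowed read
print(check_path(str(project), "../sibling/secret.txt"))  # parent traversal
print(check_path(str(project), "shortcut/secret.txt"))    # symlink escape
```

An absolute `requested` path replaces the root when joined, so it is also denied unless it happens to land inside the allowed root.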

5. The useful discovery surface is the smallest caller-safe subset

A good runtime discovery system should not say, “Here are 14,000 things, good luck.” It should say something closer to, “For this caller, in this environment, under this policy, here are the few candidates that are both relevant enough and safe enough to consider.”

That bounded candidate set lowers context pressure, lowers wrong-tool risk, and makes auditability cleaner because the pool itself reflects policy rather than only search quality.

Bigger catalogs are only better when the runtime gets stricter about what the model is allowed to see.

6. A better runtime-discovery ladder

Discovery ladder

Discoverable: the service exists in an index.
Caller-visible: this principal can actually see it right now.
Trust-classed: side-effect and authority shape are explicit before selection.
Auth-viable: the intended caller can complete auth with the expected scope.
Rankable: only then should semantic search, rules, or LLM ranking choose among the remainder.

That ordering matters. If ranking happens before trust filtering, the system is asking the model to decide blast radius while it decides relevance.

Runtime mediation should optimize for bounded choice first, then better selection inside that bounded set.
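
The ladder can be sketched as an ordered list of checks, where an entry becomes rankable only after passing every earlier rung. The dictionary keys and the `first_failed_rung` helper are assumptions for illustration; anything in the index counts as discoverable by definition.

```python
from typing import Callable, Dict, List, Optional, Tuple

Entry = Dict[str, object]

# Rungs in the order they must be climbed.
LADDER: List[Tuple[str, Callable[[Entry], bool]]] = [
    ("caller_visible", lambda e: bool(e.get("caller_visible"))),
    ("trust_classed", lambda e: bool(e.get("trust_class")) and bool(e.get("side_effect"))),
    ("auth_viable", lambda e: bool(e.get("auth_viable"))),
]


def first_failed_rung(entry: Entry) -> Optional[str]:
    """Return the first rung the entry fails, or None if it is rankable."""
    for name, check in LADDER:
        if not check(entry):
            return name
    return None


def rankable(index: List[Entry], score: Callable[[Entry], float]) -> List[Entry]:
    """Rank only the entries that climbed the whole ladder."""
    survivors = [e for e in index if first_failed_rung(e) is None]
    return sorted(survivors, key=score, reverse=True)


index: List[Entry] = [
    {"name": "a", "caller_visible": True, "trust_class": "read_mostly",
     "side_effect": "inspect", "auth_viable": True, "relevance": 0.6},
    {"name": "b", "caller_visible": True, "trust_class": None,
     "side_effect": None, "auth_viable": True, "relevance": 0.9},
    {"name": "c", "caller_visible": False, "relevance": 0.95},
]
print([first_failed_rung(e) for e in index])
print([e["name"] for e in rankable(index, lambda e: float(e["relevance"]))])
```

Because the relevance score is read only inside `rankable`, a high score cannot rescue an entry that failed an earlier rung.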

7. What a useful evaluator should score here

The strongest evaluation questions are whether the system exposes caller-specific visibility, whether trust class and side-effect class are visible before selection, whether auth shape is legible, whether stale entries are suppressed, and whether the runtime bounds the pool before semantic ranking begins.

That means a useful discovery layer should separate “worth reviewing” from “safe for this caller right now.” If curated-registry inclusion, launch-week enthusiasm, and runtime availability collapse into one badge, the model mistakes editorial confidence for authorization.

That now includes a harder discovery-truth test. Do generated manifests or gateway policy layers actually narrow the live candidate set for this caller? Does an out-of-scope choice yield a typed denial instead of a vague failure? And can the runtime still explain which lane consumed shared quota or backend authority after the remote hop?

Those questions separate search quality from control quality. For agent systems, that separation is not optional.

The fast production shortcut is the same one from “MCP Has a Security Model”: filter first by caller-visible scope, acting principal, and surviving evidence, then let relevance rank the smaller safe set.

Pricing boundary

Trust filters run before the paid route exists

Runtime discovery is still proof work until a safe candidate becomes a callable route. Do not estimate, authorize, or bill a broad directory hit before the narrowed route card says what executes, who pays, and what must fail closed.

Keep candidate search, scoring, route-card inspection, and fit evaluation free while the runtime is still narrowing the pool.

Price only the selected callable route: capability id, provider constraint, credential lane, quota owner, estimate, denied neighbor, and receipt fields should already be named.

If trust filters return no safe candidate, that is a no-call outcome, not a paid fallback to a broader marketplace result.
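
A minimal sketch of that boundary, assuming a route card is a plain dictionary. The field names follow the list above, but `REQUIRED_ROUTE_FIELDS`, `pricing_decision`, and the outcome labels are hypothetical.

```python
from typing import Dict, List

# Fields that must be named before a route can be priced (from the text above).
REQUIRED_ROUTE_FIELDS = (
    "capability_id", "provider_constraint", "credential_lane",
    "quota_owner", "estimate", "denied_neighbor", "receipt_fields",
)


def missing_route_fields(route_card: Dict[str, object]) -> List[str]:
    """List required fields that are absent or explicitly unset."""
    return [f for f in REQUIRED_ROUTE_FIELDS if route_card.get(f) is None]


def pricing_decision(safe_candidates: List[Dict[str, object]]) -> Dict[str, object]:
    """Decide whether anything is billable after trust filters have run."""
    if not safe_candidates:
        # No safe candidate is a no-call outcome; never widen to a broader result.
        return {"outcome": "no_call", "billable": False}
    route = safe_candidates[0]  # the one selected route, already narrowed
    missing = missing_route_fields(route)
    if missing:
        return {"outcome": "needs_route_card", "billable": False, "missing": missing}
    return {"outcome": "priced_route", "billable": True,
            "capability_id": route["capability_id"]}


print(pricing_decision([]))
print(pricing_decision([{"capability_id": "example.search", "estimate": 0.0}]))
```

Search, scoring, and route-card inspection never appear in this function, which keeps them on the free side of the boundary by construction.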

Pricing path: see the MCP discovery pricing boundary / Free-proof guide: separate discovery from paid execution / Execution preflight: scope managed execution

MCP Route Review fork

A filtered discovery result becomes useful only when one route can be reviewed

Runtime discovery should stay free proof while it narrows a giant index. The moment it identifies one MCP route that a team wants to repeat with real credentials, budget, and side effects, the question changes from “which server looks best?” to “can this exact route prove its allowed lane and denied neighbor?”

The selected server/tool is not just relevant; it will run again inside a loop or workflow.

The nearest unsafe neighbor is named: adjacent tenant, path, domain, account, amount, or side-effect class.

Credential lane, caller principal, quota or budget owner, repeat volume, and receipt or typed-denial proof are visible before promotion.


Next honest step

Filter by trust and authority before the model ranks anything

If runtime discovery is part of the control plane, the first production lane should surface only the tools this caller can safely consider, not a giant mixed-authority marketplace.


Production follow-through

If this article reframes discovery as mediation for you, start with the MCP marketplace proof guide, the free-proof versus paid-execution guide, and the pricing boundary. Then use the pages below to sharpen the operator model around the current manifest-and-governance signal: the core security model, the auth-versus-authority split, workflow fit versus trust class, governed capability surfaces, and the checklist for real remote readiness.

MCP Has a Security Model

Scope, principals, and evidence are the pre-ranking filters that keep giant catalogs from becoming mixed-authority traps.

Identity vs Authority

A directory entry only gets safer when authentication narrows discovery and the backend authority still matches the caller after the remote hop.

Workflow Fit vs Trust Class

The right shortlist starts by separating what job the server improves from what authority it carries.

Governed Capability Surfaces

The safer answer is not raw endpoint sprawl. It is a bounded capability surface with visible authority and policy.

Remote MCP Production Readiness Checklist

Auth, scope, tenant isolation, governors, recovery, and auditability belong in one operator checklist.

Fleet follow-through

The candidate pool stays useful only if the runtime stays narrow under load

Once trust filters narrow the pool, the next operator questions are what breaks in the loop, how shared provider budgets are contained, and how credentials stay narrow as more agents come online.

LLM APIs in Agent Loops

What actually breaks once retries, tool use, and unattended execution are live.

Designing Agent Fleets That Survive Rate Limits

How shared provider budgets and retry windows turn discovery and execution into a fleet coordination problem.

API Credentials in Autonomous Agent Fleets

Why bounded discovery still fails if the credential layer widens faster than the trust model.