MCP Marketplaces Are Discovery, Not Production Proof

Current truth boundary

Treat directory inventory as a way to find candidates, not as a claim that a candidate is safe to run. Rhumb separates discovery breadth (1,038 scored services and 415 capability definitions) from the narrower current callable surface: 16 callable providers, strongest for research, extraction, generation, and narrow enrichment.

The mistake: ranking before filtering

A giant MCP directory solves a real problem: agents and developers need to discover what exists. But production selection has a different failure mode. If the model sees every listed server before trust filters run, semantic relevance can outrank authority, freshness, cost, and blast radius.

The operator sequence should be recall first, proof second. Use marketplaces to discover candidates, then collapse the pool to the smallest workflow-safe set before the agent plans with it.
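As a minimal sketch of filtering before ranking, the snippet below drops any candidate that fails a trust check and only then sorts the survivors by relevance. The `Candidate` fields, the `agent_visible` helper, and the server names are hypothetical illustrations, not part of any marketplace or Rhumb API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    relevance: float      # semantic match score from directory search (assumed input)
    trust_verified: bool  # authority and credential checks passed
    fresh: bool           # handshake and schema were checked recently
    within_budget: bool   # quota owner and ceiling are known


def agent_visible(candidates: list[Candidate], limit: int = 3) -> list[Candidate]:
    """Apply trust filters first, then rank only the survivors by relevance."""
    survivors = [
        c for c in candidates
        if c.trust_verified and c.fresh and c.within_budget
    ]
    return sorted(survivors, key=lambda c: c.relevance, reverse=True)[:limit]


pool = [
    Candidate("flashy-fetch", 0.97, trust_verified=False, fresh=True, within_budget=True),
    Candidate("scoped-fetch", 0.81, trust_verified=True, fresh=True, within_budget=True),
    Candidate("stale-fetch", 0.90, trust_verified=True, fresh=False, within_budget=True),
]
print([c.name for c in agent_visible(pool)])  # ['scoped-fetch']
```

The most relevant hit never reaches the agent here because relevance is only consulted after the trust filters have run.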

What each directory signal can and cannot prove

Large marketplace

Useful for

Broad recall. You find projects, servers, skills, and adapters that would be invisible in a hand-built shortlist.

Cannot prove

Inventory volume is not quality. The model may treat a long list as a quality signal and rank a mixed-authority surface before trust filters run.

Curated registry

Useful for

Editorial inclusion, metadata, install hints, and maintenance signals reduce obvious junk.

Cannot prove

A curated listing still does not prove caller-specific auth, runtime scope, typed denials, or freshness at invocation time.

Install wrapper

Useful for

One-click setup and client config make it easier to test the server quickly.

Cannot prove

Convenient install can widen authority if the wrapper writes credentials or tool config without preserving the actor and rollback path.

Fetch or browser server

Useful for

URL tools are attractive because they turn arbitrary web tasks into one visible server capability.

Cannot prove

A generic fetch listing does not prove DNS, redirect, private-network, cloud-metadata, or credential-lane containment for the host that actually runs it.

Filesystem or repo server

Useful for

File and repository tools look high-value because they can give agents immediate context inside a project or workspace.

Cannot prove

A generic filesystem listing does not prove canonical path normalization, allowed roots, symlink handling, sibling-project denial, secret redaction, or write containment for the host that actually runs it.

Quality score

Useful for

A score can make the shortlist more legible and expose static weaknesses faster than manual inspection.

Cannot prove

Static scoring must stay separate from live execution proof. Scores are the map; failure modes are the production test.

The proof filters before promotion

A server should not move from directory hit to agent candidate until these filters are explicit. They are the difference between a useful marketplace and a tool graveyard with better search.

Workflow: what repeat job does this server make safer, cheaper, or more reliable?

Trust class: local helper, read-mostly tool, reversible write, high-side-effect execution, or shared remote integration?

Authority: which principal acts after installation, and which tools are hidden from this caller?

Credential rail: public, static key, delegated user auth, governed key, BYOK, Agent Vault, or provider-pinned account?

Budget owner: whose quota, contract, wallet, or shared provider limit burns when the agent retries?

Denied neighbor: what adjacent tenant, path, domain, row, amount, network range, or action must fail closed?

Egress policy: can fetch, browser, or crawl tools deny cloud metadata, loopback, private ranges, IPv6 ULA, and in-cluster service names before the request leaves the server?

Path policy: can filesystem, repo, workspace, or local-resource tools prove canonical path, allowed root, symlink decision, operation class, redaction rule, and denied sibling/parent/host-mount behavior before a file becomes model context?

Freshness: did handshake, auth, schema, and endpoint behavior pass recently enough for an unattended loop?

Receipt: can an operator reconstruct success, denial, retry, cost, provider drift, and recovery after the call?
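One way to make these filters explicit is a record that blocks promotion while any answer is blank. This is a sketch under assumptions: the `ProofFilters` class and its field names are hypothetical and simply mirror the list above. A filter that does not apply (for example, egress policy for a server with no URL tools) would be recorded with an explicit note rather than left empty.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProofFilters:
    # One field per filter; None or "" means "not yet answered".
    workflow: Optional[str] = None
    trust_class: Optional[str] = None
    authority: Optional[str] = None
    credential_rail: Optional[str] = None
    budget_owner: Optional[str] = None
    denied_neighbor: Optional[str] = None
    egress_policy: Optional[str] = None
    path_policy: Optional[str] = None
    freshness: Optional[str] = None
    receipt: Optional[str] = None


def missing_filters(proof: ProofFilters) -> list[str]:
    """Return the names of filters that still have no explicit answer."""
    return [f.name for f in fields(proof) if not getattr(proof, f.name)]


def promotable(proof: ProofFilters) -> bool:
    """A directory hit becomes an agent candidate only when nothing is missing."""
    return not missing_filters(proof)


partial = ProofFilters(
    workflow="nightly invoice extraction",
    trust_class="read-mostly tool",
)
print(promotable(partial), missing_filters(partial))
```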

Fetch-server marketplace drill

A fetch listing is not ready until the metadata neighbor fails closed.

The fresh MCP fetch SSRF report changes the marketplace checklist: URL-capable servers are not harmless read tools by default. A hosted agent can turn “fetch this page” into cloud metadata or internal-service access unless the listing carries egress policy evidence, not just install instructions.

Do not promote a listed fetch server because it can retrieve a public URL once. Promote it only after the same caller succeeds on an allowed public URL and gets a typed denial for a metadata-neighbor URL.

Record raw URL, normalized host, DNS answers, resolved IP class, redirect chain, credential mode, tenant, quota owner, response-size cap, and retry ceiling as part of the directory proof packet.

If the listing cannot explain whether the hosted server can reach 169.254.169.254, loopback, RFC1918 ranges, IPv6 local addresses, or Kubernetes service names, keep it in review instead of the agent-visible candidate set.
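The address classes named in this drill can be checked with Python's standard `ipaddress` module. The sketch below is illustrative only: `denial_reason` and its denial codes are hypothetical names, and it assumes the caller passes an IP that has already been resolved. A real egress guard would also need to run the check on every DNS answer and every redirect hop, and to pin the connection to the checked address.

```python
import ipaddress
from typing import Optional

# The well-known cloud metadata address named in the drill above.
METADATA_IPS = {ipaddress.ip_address("169.254.169.254")}


def denial_reason(resolved_ip: str) -> Optional[str]:
    """Return a typed denial code for a resolved IP, or None if egress may proceed."""
    ip = ipaddress.ip_address(resolved_ip)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) so it cannot dodge the IPv4 checks.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if ip in METADATA_IPS:
        return "cloud_metadata"
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link_local"
    if ip.is_private:  # covers RFC 1918 ranges and IPv6 ULA (fc00::/7)
        return "private_range"
    if not ip.is_global:
        return "non_global"
    return None


for addr in ["8.8.8.8", "169.254.169.254", "127.0.0.1", "10.0.0.5", "fd00::1"]:
    print(addr, denial_reason(addr))
```

Kubernetes service names are hostnames rather than addresses, so they only reach this check after resolution. That is why the proof packet records DNS answers alongside the raw URL.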

Filesystem-server marketplace drill

A filesystem listing is not ready until the neighboring path fails closed.

The fresh MCP filesystem security scans change the marketplace checklist: file, repo, and workspace tools are not harmless local read helpers by default. A hosted or local agent can turn “read this file” into adjacent-project, secret, parent-directory, or host-mount exposure unless the listing carries path-boundary evidence, not just a scan grade or install command.

Do not promote a listed filesystem or repo server because it can read one happy-path file. Promote it only after the same caller succeeds on an allowed path and gets typed denials for parent traversal, sibling workspace, hidden config, host mount, and out-of-policy write fixtures.

Record requested path, canonical path, allowed root or repo prefix, symlink decision, operation class, credential mode, tenant, quota or budget owner, output class, and redaction rule as part of the directory proof packet.

If the listing cannot explain whether the hosted server can see .env files, SSH keys, adjacent checkouts, parent directories, or mounted host volumes, keep it in review instead of the agent-visible candidate set.
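A minimal sketch of the path-boundary check, using only `pathlib` (Python 3.9+ for `is_relative_to`). The `path_denial` helper, the denial codes, and the `DENIED_NAMES` set are hypothetical. A real server would add a redaction rule, an operation class (read versus write), and its own secret patterns.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Illustrative secret filenames only; not a complete list.
DENIED_NAMES = {".env", "id_rsa", "id_ed25519"}


def path_denial(requested: str, allowed_root: Path) -> Optional[str]:
    """Return a typed denial code for a requested path, or None if it is in policy."""
    root = allowed_root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # runs on the canonical path rather than the string the caller sent.
    canonical = (root / requested).resolve()
    if not canonical.is_relative_to(root):
        return "outside_allowed_root"
    if canonical.name in DENIED_NAMES:
        return "secret_file"
    return None


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "workspace"
    workspace.mkdir()
    for req in ["src/main.py", "../sibling/notes.txt", "/etc/passwd", ".env"]:
        print(req, path_denial(req, workspace))
```

Because the comparison is component-wise on canonical paths, a sibling directory such as `workspace2` is not mistaken for a child of `workspace`, and a symlink that points outside the root resolves to its real target before the check.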

A safer selection flow

1. Use directories for recall

Search marketplaces, registries, GitHub, and docs to gather candidates. Keep this phase broad and cheap, but do not promote anything yet.

2. Collapse by workflow

Remove servers that do not fit the exact repeated job. A generic catalog hit is not useful if the agent still has to improvise the action shape.

3. Apply authority filters

Check trust class, auth shape, caller-visible tool scope, side-effect class, and quota owner before semantic relevance ranks the final set.

4. Run the denied-neighbor drill

Pick the nearest unsafe adjacent target and prove it fails closed with a typed denial before you let the agent repeat the happy path.

5. Keep a receipt

Preserve capability, server/provider, principal, credential mode, cost, denial, outcome, and recovery context so retries do not become folklore.
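Steps 4 and 5 can be sketched together: call the tool on an allowed target and on the nearest unsafe neighbor, and keep a receipt for each call. Everything here is an assumption for illustration: `TypedDenial`, `Receipt`, `run_drill`, and `fake_tool` are hypothetical names, and a real receipt would also carry cost and recovery context.

```python
from dataclasses import dataclass
from typing import Callable


class TypedDenial(Exception):
    """A denial reported with a machine-readable code (hypothetical shape)."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


@dataclass(frozen=True)
class Receipt:
    capability: str
    provider: str
    principal: str
    credential_mode: str
    target: str
    outcome: str           # "success", "denied", or "error"
    denial_code: str = ""  # set only when outcome == "denied"


def run_drill(call: Callable[[str], object], allowed: str, neighbor: str,
              **context: str) -> tuple[bool, list[Receipt]]:
    """Pass only if the allowed target succeeds and the neighbor gets a typed denial."""
    receipts: list[Receipt] = []
    for target in (allowed, neighbor):
        try:
            call(target)
            receipts.append(Receipt(target=target, outcome="success", **context))
        except TypedDenial as denial:
            receipts.append(Receipt(target=target, outcome="denied",
                                    denial_code=denial.code, **context))
        except Exception:
            # An untyped failure is not proof of a boundary, so it does not pass.
            receipts.append(Receipt(target=target, outcome="error", **context))
    passed = receipts[0].outcome == "success" and receipts[1].outcome == "denied"
    return passed, receipts


def fake_tool(target: str) -> str:
    """Stand-in for a real tool call; denies anything outside tenant-a."""
    if not target.startswith("tenant-a/"):
        raise TypedDenial("wrong_tenant")
    return "ok"


passed, receipts = run_drill(
    fake_tool, "tenant-a/report.csv", "tenant-b/report.csv",
    capability="file.read", provider="example-server",
    principal="agent-7", credential_mode="delegated",
)
print(passed, [r.outcome for r in receipts])  # True ['success', 'denied']
```

A tool that succeeds on both targets, or that fails on the neighbor with a generic error, does not pass the drill in this sketch.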

Verified vertical directories need a second proof gate

Fresh MCP submissions are starting to package vertical discovery — lawyers, vendors, marketplaces, data providers — as verified agent surfaces. That is useful, but verified discovery is still not execution authority. Regulated or high-trust verticals need proof that the listing, license, jurisdiction, freshness, and allowed action all match the exact workflow before the agent treats a directory hit as a route.

Separate directory freshness from professional or regulatory verification. A fresh listing can still carry a stale license, jurisdiction, disciplinary status, or practice-area constraint.

Require the route card to preserve verified source, verification time, allowed jurisdiction, caller intent, and explicit non-advice boundary before the agent ranks the candidate.

Run the denied-neighbor test against an adjacent jurisdiction, expired credential, unverified profile, or out-of-scope service instead of only checking the happy-path listing.

Keep recommendation, contact, booking, payment, and advice-like actions as separate authority lanes. Discovery proof should not silently graduate into regulated side effects.
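The second gate can be sketched as a check on the route card before ranking. The `RouteCard` fields, the `route_denial` helper, the denial codes, and the 30-day verification window are all assumptions chosen for illustration. The right window and the right fields depend on the vertical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RouteCard:
    verified_source: str
    verified_at: datetime
    allowed_jurisdictions: frozenset[str]
    allowed_actions: frozenset[str]  # separate authority lanes, e.g. "recommend", "book"


def route_denial(card: RouteCard, jurisdiction: str, action: str, now: datetime,
                 max_age: timedelta = timedelta(days=30)) -> Optional[str]:
    """Return a typed denial code, or None if the card covers this exact request."""
    if now - card.verified_at > max_age:
        return "verification_stale"
    if jurisdiction not in card.allowed_jurisdictions:
        return "jurisdiction_out_of_scope"
    if action not in card.allowed_actions:
        return "action_not_authorized"
    return None


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
card = RouteCard(
    verified_source="example-registry",  # placeholder, not a real source
    verified_at=now - timedelta(days=3),
    allowed_jurisdictions=frozenset({"CA"}),
    allowed_actions=frozenset({"recommend"}),
)
print(route_denial(card, "CA", "recommend", now))  # None
print(route_denial(card, "NY", "recommend", now))  # jurisdiction_out_of_scope
print(route_denial(card, "CA", "book", now))       # action_not_authorized
```

Keeping `allowed_actions` explicit means a card verified for recommendation does not silently authorize booking or payment.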

Where Rhumb fits

Rhumb should not try to be the loudest marketplace. The stronger wedge is workflow-level proof: resolve the capability, estimate the route, choose the credential rail, cap the budget, test the denied neighbor, and preserve the receipt.

The operator starts with a capability or repeat job, not a favorite server name.

The answer should distinguish discovery breadth from the current callable execution surface.

Provider choice, credential rail, budget ceiling, denied neighbor, and trace proof matter more than catalog size.

The team wants to prove one narrow workflow before exposing a wider tool set to an agent loop.

Pricing stays downstream of proof: candidate search, scoring, route-card inspection, and fit evaluation remain free until Rhumb routes a selected capability through a paid execution rail.

Signals that are not enough

Thousands of listed servers

A green install command

A maintainer badge or launch post

Tool count or star count

A demo that never tests the wrong tenant, wrong path, wrong budget, or expired credential

MCP Route Review fit check

A directory hit becomes reviewable only when one route has to repeat.

Marketplaces are good at finding candidates. E-009 is the fork after selection: can you name the exact MCP tool call, the unsafe neighbor that must fail closed, the credential lane and budget owner, the repeat volume, and the receipt or typed denial you would trust? If yes, ask for route review; if no, keep the work in server-listing proof.

E-006 proof sprint

Have one server or workflow you want to promote? Prove the boundary first.

Send the repeat job, candidate server/provider, credential rail, expected volume, denied neighbor, and receipt fields you would need before letting an agent loop. The useful artifact is not another list; it is a proof path for one workflow.
