Treat directory inventory as a way to find candidates, not as a claim that any candidate is safe to run. Rhumb separates discovery breadth (1,038 scored services and 415 capability definitions) from the narrower current callable surface (16 callable providers, strongest for research, extraction, generation, and narrow enrichment).
The mistake: ranking before filtering
A giant MCP directory solves a real problem: agents and developers need to discover what exists. But production selection has a different failure mode. If the model sees every listed server before trust filters run, semantic relevance can outrank authority, freshness, cost, and blast radius.
The operator sequence should be recall first, proof second. Use marketplaces to discover candidates, then collapse the pool to the smallest workflow-safe set before the agent plans with it.
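As a minimal sketch of filtering before ranking, the snippet below drops every candidate that lacks proof and only then sorts the survivors by relevance. The `Candidate` fields, the trust labels, and the `shortlist` helper are hypothetical names chosen for illustration, not part of any real directory or of Rhumb's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    relevance: float    # semantic match score from discovery, 0..1
    trust_class: str    # hypothetical labels: "verified", "community", "unknown"
    auth_scoped: bool   # credentials are limited to the calling principal
    fails_closed: bool  # the denied-neighbor test has passed


# Assumption: only one trust class is acceptable for this workflow.
ALLOWED_TRUST = {"verified"}


def passes_proof(c: Candidate) -> bool:
    """Trust filters run first and never look at relevance."""
    return c.trust_class in ALLOWED_TRUST and c.auth_scoped and c.fails_closed


def shortlist(candidates, limit=3):
    """Filter, then rank: relevance only orders candidates that already passed."""
    safe = [c for c in candidates if passes_proof(c)]
    return sorted(safe, key=lambda c: c.relevance, reverse=True)[:limit]


pool = [
    Candidate("fetch-anything", 0.97, "unknown", False, False),
    Candidate("docs-search", 0.81, "verified", True, True),
    Candidate("repo-reader", 0.74, "verified", True, False),
]
print([c.name for c in shortlist(pool)])  # → ['docs-search']
```

The most relevant entry never reaches the agent here, which is the intended behavior: relevance cannot buy back a missing proof.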
What each directory signal can and cannot prove
Useful for: Broad recall. You find projects, servers, skills, and adapters that would be invisible in a hand-built shortlist.
Cannot prove: The model may treat inventory volume as quality and rank a mixed-authority surface before trust filters run.
Useful for: Editorial inclusion, metadata, install hints, and maintenance signals reduce obvious junk.
Cannot prove: A curated listing still does not prove caller-specific auth, runtime scope, typed denials, or freshness at invocation time.
Useful for: One-click setup and client config make it easier to test the server quickly.
Cannot prove: Convenient install can widen authority if the wrapper writes credentials or tool config without preserving the actor and rollback path.
Useful for: URL tools are attractive because they turn arbitrary web tasks into one visible server capability.
Cannot prove: A generic fetch listing does not prove DNS, redirect, private-network, cloud-metadata, or credential-lane containment for the host that actually runs it.
Useful for: File and repository tools look high-value because they can give agents immediate context inside a project or workspace.
Cannot prove: A generic filesystem listing does not prove canonical path normalization, allowed roots, symlink handling, sibling-project denial, secret redaction, or write containment for the host that actually runs it.
Useful for: A score can make the shortlist more legible and expose static weaknesses faster than manual inspection.
Cannot prove: Static scoring must stay separate from live execution proof. Scores are the map; failure modes are the production test.
The proof filters before promotion
A server should not move from directory hit to agent candidate until these filters are explicit. They are the difference between a useful marketplace and a tool graveyard with better search.
A fetch listing is not ready until the metadata neighbor fails closed.
The fresh MCP fetch SSRF report changes the marketplace checklist: URL-capable servers are not harmless read tools by default. A hosted agent can turn “fetch this page” into cloud metadata or internal-service access unless the listing carries egress policy evidence, not just install instructions.
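One way to picture "the metadata neighbor fails closed" is an egress check that runs before any request is sent. The sketch below is an assumption about how a host could enforce this, not a description of any particular MCP server: `EgressDenied`, `check_ip`, and `check_url` are hypothetical names, and the check covers a single URL only.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


class EgressDenied(Exception):
    """Typed denial: carries a machine-readable reason, not just a message."""

    def __init__(self, reason: str, target: str):
        super().__init__(f"{reason}: {target}")
        self.reason = reason
        self.target = target


def check_ip(ip_text: str) -> None:
    ip = ipaddress.ip_address(ip_text)
    # is_global is False for private, loopback, and link-local ranges,
    # which includes the 169.254.169.254 cloud metadata address.
    if not ip.is_global:
        raise EgressDenied("non_public_address", ip_text)


def check_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Deny unless every address the hostname resolves to is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise EgressDenied("unsupported_url", url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    for info in resolve(parts.hostname, port, type=socket.SOCK_STREAM):
        # Drop any IPv6 zone suffix ("%eth0") before parsing the address.
        check_ip(info[4][0].split("%")[0])
```

A real deployment would also need to repeat the check on every redirect hop and connect to the address that was actually validated, since a second DNS lookup can return a different answer. Those parts are left out of this sketch.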
A filesystem listing is not ready until the neighboring path fails closed.
The fresh MCP filesystem security scans change the marketplace checklist: file, repo, and workspace tools are not harmless local read helpers by default. A hosted or local agent can turn “read this file” into adjacent-project, secret, parent-directory, or host-mount exposure unless the listing carries path-boundary evidence, not just a scan grade or install command.
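Path-boundary evidence can be as small as a resolver that refuses anything outside one allowed root. The following is a minimal sketch under that assumption; `PathDenied` and `resolve_inside` are hypothetical names, and secret redaction and write containment are out of scope here.

```python
from pathlib import Path


class PathDenied(Exception):
    """Typed denial for a path that escapes the allowed root."""

    def __init__(self, reason: str, requested: str):
        super().__init__(f"{reason}: {requested}")
        self.reason = reason
        self.requested = requested


def resolve_inside(root: Path, requested: str) -> Path:
    """Return the canonical path, or raise if it lands outside root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below sees where the path really points. An absolute
    # `requested` replaces root entirely and is then rejected.
    target = (root / requested).resolve()
    if target != root and root not in target.parents:
        raise PathDenied("outside_allowed_root", requested)
    return target
```

With a root of `/work/project`, a request for `src/main.py` resolves normally, while `../sibling/.env`, `/etc/passwd`, or a symlink that points at a sibling project each raise `PathDenied`. That denied neighbor is the case worth testing before the happy path.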
A safer selection flow
Search marketplaces, registries, GitHub, and docs to gather candidates. Keep this phase broad and cheap, but do not promote anything yet.
Remove servers that do not fit the exact repeated job. A generic catalog hit is not useful if the agent still has to improvise the action shape.
Check trust class, auth shape, caller-visible tool scope, side-effect class, and quota owner before semantic relevance ranks the final set.
Pick the nearest unsafe adjacent target and prove it fails closed with a typed denial before you let the agent repeat the happy path.
Preserve capability, server/provider, principal, credential mode, cost, denial, outcome, and recovery context so retries do not become folklore.
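The context listed in the last step can be kept as one structured record per call. This is a sketch only: the field names mirror the list above, but the `Receipt` type, the outcome labels, and the retry rule are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Receipt:
    capability: str
    provider: str
    principal: str
    credential_mode: str
    cost_usd: float
    outcome: str                    # hypothetical labels: "ok", "denied", "error"
    denial: Optional[str] = None    # typed denial reason, if any
    recovery: Optional[str] = None  # what the operator or agent should do next


def to_log_line(receipt: Receipt) -> str:
    """Serialize with stable key order so receipts diff cleanly."""
    return json.dumps(asdict(receipt), sort_keys=True)


def should_retry(receipt: Receipt) -> bool:
    """A typed denial is a policy answer, not a transient fault."""
    return receipt.outcome == "error" and receipt.denial is None


denied = Receipt(
    capability="web.fetch",
    provider="example-fetch-server",
    principal="agent:research-01",
    credential_mode="managed",
    cost_usd=0.0,
    outcome="denied",
    denial="non_public_address",
    recovery="ask operator to allowlist the host",
)
print(should_retry(denied))  # → False
```

Because the denial reason and recovery hint travel with the record, a later retry decision can be read from the receipt instead of being guessed.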
Verified vertical directories need a second proof gate
Fresh MCP submissions are starting to package vertical discovery (lawyers, vendors, marketplaces, data providers) as verified agent surfaces. That is useful, but verified discovery is still not execution authority. Regulated or high-trust verticals need proof that the listing, license, jurisdiction, freshness, and allowed action all match the exact workflow before the agent treats a directory hit as a route.
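A second proof gate for a verified listing could look like the sketch below. It checks license, jurisdiction, allowed action, and freshness against one specific workflow. Every name here (`VerifiedListing`, `gate_failures`, the reason strings) is hypothetical, and the 30-day freshness window is an arbitrary assumption rather than a recommendation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class VerifiedListing:
    name: str
    license_id: Optional[str]
    jurisdictions: FrozenSet[str]
    allowed_actions: FrozenSet[str]
    verified_at: datetime


def gate_failures(
    listing: VerifiedListing,
    jurisdiction: str,
    action: str,
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # assumed window, tune per vertical
) -> List[str]:
    """Return every reason the listing cannot serve this workflow (empty = pass)."""
    failures = []
    if not listing.license_id:
        failures.append("license_missing")
    if jurisdiction not in listing.jurisdictions:
        failures.append("jurisdiction_mismatch")
    if action not in listing.allowed_actions:
        failures.append("action_not_allowed")
    if now - listing.verified_at > max_age:
        failures.append("verification_stale")
    return failures


listing = VerifiedListing(
    name="example-legal-directory",
    license_id="LIC-0001",
    jurisdictions=frozenset({"US-CA"}),
    allowed_actions=frozenset({"search"}),
    verified_at=datetime(2024, 11, 1, tzinfo=timezone.utc),
)
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
print(gate_failures(listing, "US-CA", "book_consultation", now))
# → ['action_not_allowed', 'verification_stale']
```

Returning all failures at once, rather than stopping at the first, keeps the denial specific enough to act on.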
Where Rhumb fits
Rhumb should not try to be the loudest marketplace. The stronger wedge is workflow-level proof: resolve the capability, estimate the route, choose the credential rail, cap the budget, test the denied neighbor, and preserve the receipt.
Signals that are not enough
A directory hit becomes a hardening request only when one route has to repeat.
Marketplaces are good at finding candidates. E-007 is the fork after selection: can you name the exact MCP tool call, the unsafe neighbor that must fail closed, the credential lane and budget owner, the repeat volume, and the receipt or typed denial you would trust? If yes, that is a route-hardening ask, not another server-listing review.
Have one server or workflow you want to promote? Prove the boundary first.
Send the repeat job, candidate server/provider, credential rail, expected volume, denied neighbor, and receipt fields you would need before letting an agent loop. The useful artifact is not another list; it is a proof path for one workflow.