MCP directories · May 1, 2026 · Rhumb
Answer target: MCP marketplace readiness

MCP marketplaces are discovery, not production proof

The MCP ecosystem is getting noisier fast: directories, marketplaces, registries, package managers, skills indexes, and one-click installers all promise more surface area. That is useful for recall. It is not enough for an unattended agent.

Current truth boundary

Treat directory inventory as a way to find candidates, not as a claim that any candidate is safe to run. Rhumb separates discovery breadth (999 scored services and 435 capability definitions) from the narrower current callable surface (18 callable providers, strongest for research, extraction, generation, and narrow enrichment).

The mistake: ranking before filtering

A giant MCP directory solves a real problem: agents and developers need to discover what exists. But production selection has a different failure mode. If the model sees every listed server before trust filters run, semantic relevance can outrank authority, freshness, cost, and blast radius.

The operator sequence should be recall first, proof second. Use marketplaces to discover candidates, then collapse the pool to the smallest workflow-safe set before the agent plans with it.
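A minimal Python sketch of that ordering, under stated assumptions: the `Candidate` type, the `passed_trust_filters` flag, and the server names are hypothetical and stand in for whatever proof step an operator actually runs. The only point it illustrates is that relevance ranks the pool after the trust filter has shrunk it, never before.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    relevance: float            # semantic match to the task, 0.0 to 1.0
    passed_trust_filters: bool  # hypothetical flag set by an earlier proof step


def shortlist(candidates: list[Candidate], limit: int = 3) -> list[Candidate]:
    """Filter on trust first, then rank only the survivors by relevance."""
    trusted = [c for c in candidates if c.passed_trust_filters]
    return sorted(trusted, key=lambda c: c.relevance, reverse=True)[:limit]


pool = [
    Candidate("flashy-unvetted-server", relevance=0.97, passed_trust_filters=False),
    Candidate("vetted-extractor", relevance=0.81, passed_trust_filters=True),
    Candidate("vetted-search", relevance=0.64, passed_trust_filters=True),
]

print([c.name for c in shortlist(pool)])
# ['vetted-extractor', 'vetted-search']
```

The most relevant entry never reaches the planner because it failed the filter, which is the behavior a rank-first pipeline would get wrong.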

What each directory signal can and cannot prove

Large marketplace

Useful for

Broad recall. You find projects, servers, skills, and adapters that would be invisible in a hand-built shortlist.

Cannot prove

Quality. The model may treat inventory volume as quality and rank a mixed-authority surface before trust filters run.

Curated registry

Useful for

Editorial inclusion, metadata, install hints, and maintenance signals reduce obvious junk.

Cannot prove

A curated listing still does not prove caller-specific auth, runtime scope, typed denials, or freshness at invocation time.

Install wrapper

Useful for

One-click setup and client config make it easier to test the server quickly.

Cannot prove

Safe authority. A convenient install can widen authority if the wrapper writes credentials or tool config without preserving the actor and rollback path.

Quality score

Useful for

A score can make the shortlist more legible and expose static weaknesses faster than manual inspection.

Cannot prove

Live behavior. Static scoring must stay separate from live execution proof. Scores are the map; failure modes are the production test.

The proof filters before promotion

A server should not move from directory hit to agent candidate until these filters are explicit. They are the difference between a useful marketplace and a tool graveyard with better search.

Workflow: what repeat job does this server make safer, cheaper, or more reliable?
Trust class: local helper, read-mostly tool, reversible write, high-side-effect execution, or shared remote integration?
Authority: which principal acts after installation, and which tools are hidden from this caller?
Credential rail: public, static key, delegated user auth, governed key, BYOK, Agent Vault, or provider-pinned account?
Budget owner: whose quota, contract, wallet, or shared provider limit burns when the agent retries?
Denied neighbor: what adjacent tenant, path, domain, row, amount, or action must fail closed?
Freshness: did handshake, auth, schema, and endpoint behavior pass recently enough for an unattended loop?
Receipt: can an operator reconstruct success, denial, retry, cost, provider drift, and recovery after the call?
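One way to keep these filters explicit is to record them as fields that must be filled in before promotion. The sketch below is an illustration, not a Rhumb schema: `PromotionRecord`, `missing_filters`, and `can_promote` are hypothetical names, and free-text answers are an assumption made for brevity.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PromotionRecord:
    """One record per candidate server; None means 'not yet answered'."""
    workflow: Optional[str] = None
    trust_class: Optional[str] = None
    authority: Optional[str] = None
    credential_rail: Optional[str] = None
    budget_owner: Optional[str] = None
    denied_neighbor: Optional[str] = None
    freshness: Optional[str] = None
    receipt: Optional[str] = None


def missing_filters(record: PromotionRecord) -> list[str]:
    """Return the names of filters that are still None or blank."""
    return [
        f.name
        for f in fields(record)
        if not (getattr(record, f.name) or "").strip()
    ]


def can_promote(record: PromotionRecord) -> bool:
    """A candidate is promotable only when every filter has an answer."""
    return not missing_filters(record)


draft = PromotionRecord(
    workflow="nightly invoice extraction",
    trust_class="read-mostly tool",
    credential_rail="static key",
)
print(missing_filters(draft))
# ['authority', 'budget_owner', 'denied_neighbor', 'freshness', 'receipt']
```

A filled-in field only shows the question was answered; whether the answer is acceptable is still an operator decision.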

A safer selection flow

1. Use directories for recall

Search marketplaces, registries, GitHub, and docs to gather candidates. Keep this phase broad and cheap, but do not promote anything yet.

2. Collapse by workflow

Remove servers that do not fit the exact repeated job. A generic catalog hit is not useful if the agent still has to improvise the action shape.

3. Apply authority filters

Check trust class, auth shape, caller-visible tool scope, side-effect class, and quota owner before semantic relevance ranks the final set.

4. Run the denied-neighbor drill

Pick the nearest unsafe adjacent target and prove it fails closed with a typed denial before you let the agent repeat the happy path.
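The drill can be sketched as a small test harness. Everything named here is an assumption for illustration: `TypedDenial` is a hypothetical exception standing in for whatever structured refusal the server returns, and `read_invoices` is a stand-in for a real tool call scoped to one tenant.

```python
from typing import Callable


class TypedDenial(Exception):
    """Hypothetical structured refusal carrying a machine-readable reason."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


def denied_neighbor_drill(call: Callable[[str], object], neighbor: str) -> str:
    """Return the denial reason, or raise if the neighbor does not fail closed."""
    try:
        call(neighbor)
    except TypedDenial as denial:
        return denial.reason
    except Exception as exc:
        # An untyped error is not proof: the agent cannot tell denial from outage.
        raise AssertionError(f"untyped failure for {neighbor!r}: {exc!r}") from exc
    raise AssertionError(f"{neighbor!r} was allowed; the boundary is open")


# Stand-in for a real tool call that is scoped to a single tenant.
def read_invoices(tenant: str) -> list[str]:
    if tenant != "tenant-a":
        raise TypedDenial("tenant_out_of_scope")
    return ["inv-001"]


print(denied_neighbor_drill(read_invoices, "tenant-b"))
# tenant_out_of_scope
```

The harness treats both an allowed call and an untyped error as failures, since neither gives the agent a denial it can reason about.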

5. Keep a receipt

Preserve capability, server/provider, principal, credential mode, cost, denial, outcome, and recovery context so retries do not become folklore.
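A receipt with those fields might look like the following sketch. The field names mirror the list above, but the `Receipt` type, the JSON-lines format, and every example value (capability name, provider, principal, cost) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Receipt:
    capability: str
    provider: str
    principal: str
    credential_mode: str
    cost_usd: float
    outcome: str                    # e.g. "success", "denied", "error"
    denial: Optional[str] = None    # typed denial reason, if any
    recovery: Optional[str] = None  # what the operator or agent did next


def to_log_line(receipt: Receipt) -> str:
    """Serialize one receipt as a single JSON line with stable key order."""
    return json.dumps(asdict(receipt), sort_keys=True)


line = to_log_line(
    Receipt(
        capability="document.extract",
        provider="example-provider",
        principal="agent:nightly-batch",
        credential_mode="static key",
        cost_usd=0.004,
        outcome="denied",
        denial="tenant_out_of_scope",
        recovery="skipped tenant, alerted operator",
    )
)
print(line)
```

Writing one line per call, including denials, gives the operator something to replay when a retry pattern needs explaining later.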

Verified vertical directories need a second proof gate

Fresh MCP submissions are starting to package vertical discovery — lawyers, vendors, marketplaces, data providers — as verified agent surfaces. That is useful, but verified discovery is still not execution authority. Regulated or high-trust verticals need proof that the listing, license, jurisdiction, freshness, and allowed action all match the exact workflow before the agent treats a directory hit as a route.

Separate directory freshness from professional or regulatory verification. A fresh listing can still carry a stale license, jurisdiction, disciplinary status, or practice-area constraint.
Require the route card to preserve the verified source, verification time, allowed jurisdiction, caller intent, and an explicit non-advice boundary before the agent ranks the candidate.
Run the denied-neighbor test against an adjacent jurisdiction, expired credential, unverified profile, or out-of-scope service instead of only checking the happy-path listing.
Keep recommendation, contact, booking, payment, and advice-like actions as separate authority lanes. Discovery proof should not silently graduate into regulated side effects.
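The second gate can be sketched as a check that runs against the route card before any action is taken. This is an illustrative shape only: the `RouteCard` fields, the denial reason strings, the 30-day window, and the example register are assumptions, not a defined Rhumb format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RouteCard:
    verified_source: str
    verified_at: datetime
    allowed_jurisdictions: frozenset[str]
    allowed_actions: frozenset[str]  # authority lanes, e.g. {"recommend"}


def route_denial(card: RouteCard, jurisdiction: str, action: str,
                 now: datetime, max_age: timedelta) -> Optional[str]:
    """Return a denial reason, or None when the request is inside proven scope."""
    if now - card.verified_at > max_age:
        return "verification_stale"
    if jurisdiction not in card.allowed_jurisdictions:
        return "jurisdiction_out_of_scope"
    if action not in card.allowed_actions:
        return "action_not_authorized"
    return None


now = datetime(2026, 5, 1, tzinfo=timezone.utc)
card = RouteCard(
    verified_source="example licensing register",
    verified_at=now - timedelta(days=10),
    allowed_jurisdictions=frozenset({"CA"}),
    allowed_actions=frozenset({"recommend"}),
)
max_age = timedelta(days=30)

print(route_denial(card, "CA", "recommend", now, max_age))  # None
print(route_denial(card, "NY", "recommend", now, max_age))  # jurisdiction_out_of_scope
print(route_denial(card, "CA", "booking", now, max_age))    # action_not_authorized
```

Because each lane must be listed explicitly, a card proven for recommendation does not authorize booking or payment by default.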

Where Rhumb fits

Rhumb should not try to be the loudest marketplace. The stronger wedge is workflow-level proof: resolve the capability, estimate the route, choose the credential rail, cap the budget, test the denied neighbor, and preserve the receipt.

The operator starts with a capability or repeat job, not a favorite server name.
The answer should distinguish discovery breadth from the current callable execution surface.
Provider choice, credential rail, budget ceiling, denied neighbor, and trace proof matter more than catalog size.
The team wants to prove one narrow workflow before exposing a wider tool set to an agent loop.
Pricing stays downstream of proof: candidate search, scoring, route-card inspection, and fit evaluation remain free until Rhumb routes a selected capability through a paid execution rail.
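The budget ceiling mentioned above can be enforced before each attempt rather than discovered on the invoice. The sketch below is a generic pattern, not Rhumb's mechanism: `BudgetCap`, `BudgetExceeded`, and the per-attempt estimate are hypothetical, and amounts are tracked in integer cents to avoid floating-point drift.

```python
class BudgetExceeded(Exception):
    """Raised before a call that would push spend past the ceiling."""


class BudgetCap:
    def __init__(self, ceiling_cents: int) -> None:
        self.ceiling_cents = ceiling_cents
        self.spent_cents = 0

    def charge(self, estimate_cents: int) -> None:
        """Reserve spend for one attempt, or refuse before calling out."""
        if self.spent_cents + estimate_cents > self.ceiling_cents:
            raise BudgetExceeded(
                f"ceiling {self.ceiling_cents} reached after {self.spent_cents}"
            )
        self.spent_cents += estimate_cents


cap = BudgetCap(ceiling_cents=10)
attempts = 0
try:
    while True:        # stand-in for an agent retry loop
        cap.charge(4)  # each attempt is estimated at 4 cents
        attempts += 1
except BudgetExceeded:
    pass

print(attempts)
# 2
```

The third attempt is refused before it runs, so the loop stops at 8 cents of the 10-cent ceiling.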

Signals that are not enough

Thousands of listed servers
A green install command
A maintainer badge or launch post
Tool count or star count
A demo that never tests the wrong tenant, wrong path, wrong budget, or expired credential

E-006 proof sprint

Have one server or workflow you want to promote? Prove the boundary first.

Send the repeat job, candidate server/provider, credential rail, expected volume, denied neighbor, and receipt fields you would need before letting an agent loop. The useful artifact is not another list; it is a proof path for one workflow.