What Nobody Tells You About Building a Multi-Provider MCP Server
Every MCP server tutorial follows the same script: install the SDK, define a tool, return a response. That works for a single API. It does not work when you need an agent to reliably choose between, authenticate to, and call 1,000+ different APIs across 92 categories — and handle everything that goes wrong at 3am with no human.
| APIs Scored | Categories | Dimensions | Bugs Covered |
|---|---|---|---|
| 1,000+ | 92 | 20 | 7 |
We built Rhumb, an MCP server that proxies hundreds of real APIs for AI agents. Here's what actually broke, why the tutorials don't cover it, and what you need to know if you're building anything beyond hello-world.
Bug #1: The Slug Aliasing Problem
First surprise: APIs don't have stable identifiers.
Brave's search API appears as both `brave-search-api` and `brave-search`, depending on which documentation page you read. When an agent asks to “search with Brave,” your MCP server needs to know these are the same service.
This isn't unique to Brave. We found alias collisions in payment providers (same company, multiple API versions with different names), communication platforms (SMS vs messaging vs voice — same provider, different “APIs”), and analytics tools (legacy vs v2 naming).
The fix isn't a lookup table. It's a canonical slug system with alias resolution that treats identity as a first-class problem. Every service in Rhumb has exactly one canonical identifier, with aliases mapped explicitly.
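A minimal sketch of what alias resolution can look like. The service names and alias table here are illustrative, not Rhumb's actual registry; the point is that every lookup funnels through one canonical-slug function.

```typescript
// Illustrative alias registry: every alias maps to exactly one canonical slug.
const CANONICAL = new Set(["brave-search", "twilio", "stripe"]);

const ALIASES: Record<string, string> = {
  "brave-search-api": "brave-search", // same service, two documented names
  "twilio-sms": "twilio",             // product-level names collapse to one provider
  "twilio-messaging": "twilio",
};

// Resolve any identifier an agent might use to its canonical slug,
// failing loudly on unknown names instead of guessing.
function resolveSlug(input: string): string {
  const slug = input.trim().toLowerCase();
  if (CANONICAL.has(slug)) return slug;
  const canonical = ALIASES[slug];
  if (!canonical) throw new Error(`Unknown service: ${slug}`);
  return canonical;
}
```

The design choice that matters: unknown slugs throw instead of passing through, so a typo surfaces as an error at resolution time rather than a mysterious 404 three calls later.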
Why tutorials skip this: they show one API. You never hit naming collisions with one API.
Bug #2: Authentication Is Not a Solved Problem
The tutorials say: “Add your API key to the header.” That covers maybe 40% of real APIs.
| Method | How It Works | Share |
|---|---|---|
| Bearer token | `Authorization: Bearer {key}` | ~45% |
| Custom header | `X-API-Key`, `X-Subscription-Token`, `Api-Key` | ~25% |
| Basic Auth | Base64-encoded credentials | ~15% |
| OAuth2 with refresh | Token exchange + refresh cycle | ~10% |
| Query parameter | `?api_key=...` | ~5% |
The problem isn't supporting all five patterns. It's that your MCP server needs to know which pattern each API uses before the agent's first call. If the agent sends a Bearer token to an API expecting X-API-Key, you get a 401 that tells the agent nothing useful.
Worse: some APIs accept the wrong auth method silently and return empty results instead of errors. The agent thinks it worked. It didn't.
What we built: A credential resolution layer that knows the auth pattern for each service. The agent provides a key; Rhumb knows how to present it.
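A sketch of that credential resolution layer. The per-service auth specs below are hypothetical examples (Brave's `X-Subscription-Token` header comes from the table above); the real mapping would cover all five patterns.

```typescript
// Each service declares how it expects credentials to be presented.
type AuthSpec =
  | { kind: "bearer" }
  | { kind: "header"; name: string }
  | { kind: "basic" }
  | { kind: "query"; param: string };

// Illustrative entries; a production registry would be data-driven.
const AUTH_SPECS: Record<string, AuthSpec> = {
  "stripe": { kind: "bearer" },
  "brave-search": { kind: "header", name: "X-Subscription-Token" },
  "some-legacy-api": { kind: "query", param: "api_key" },
};

// The agent supplies a raw key; this layer decides where it goes.
function applyAuth(
  service: string,
  key: string,
  url: URL,
  headers: Record<string, string>,
): void {
  const spec = AUTH_SPECS[service];
  if (!spec) throw new Error(`No auth spec for ${service}`);
  switch (spec.kind) {
    case "bearer":
      headers["Authorization"] = `Bearer ${key}`;
      break;
    case "header":
      headers[spec.name] = key;
      break;
    case "basic":
      headers["Authorization"] = `Basic ${Buffer.from(key + ":").toString("base64")}`;
      break;
    case "query":
      url.searchParams.set(spec.param, key);
      break;
  }
}
```

This also guards against the silent-wrong-auth failure mode: a service with no registered spec is an error, never a best-guess Bearer header.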
Bug #3: The Payload Translation Trap
Your agent constructs a JSON payload. The API expects multipart form data.
This hits hardest with document processing APIs. An agent wants to send a file for parsing. It constructs a reasonable JSON body with the file content. The API returns 400 because it only accepts multipart uploads with specific field names.
The gap between “what the agent sends” and “what the API wants”:
- Parameter naming: `query` vs `q` vs `search_query` vs `prompt`
- Body format: JSON vs form-encoded vs multipart
- Array handling: `tags=a,b,c` vs `tags[]=a&tags[]=b` vs `{"tags": ["a", "b"]}`
- Date formats: ISO 8601 vs Unix timestamps vs custom strings
- Pagination: cursor vs offset vs page-number vs link-header
Why this matters for agents specifically: A human developer reads the docs and adapts. An agent will retry the same malformed request until it hits rate limits.
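One way to close that gap is a per-endpoint spec that maps a canonical request shape to whatever the provider expects. This is a simplified sketch (JSON and form-encoded only; multipart omitted), and the spec fields are illustrative.

```typescript
// Per-endpoint description of how to serialize a request body.
type BodyFormat = "json" | "form";

interface EndpointSpec {
  bodyFormat: BodyFormat;
  // canonical parameter name -> provider's parameter name
  paramNames: Record<string, string>;
}

// Translate a canonical params object into the body the provider wants.
function buildBody(
  spec: EndpointSpec,
  params: Record<string, string>,
): { contentType: string; body: string } {
  const mapped: Record<string, string> = {};
  for (const [key, value] of Object.entries(params)) {
    mapped[spec.paramNames[key] ?? key] = value; // rename, pass through if unmapped
  }
  if (spec.bodyFormat === "json") {
    return { contentType: "application/json", body: JSON.stringify(mapped) };
  }
  return {
    contentType: "application/x-www-form-urlencoded",
    body: new URLSearchParams(mapped).toString(),
  };
}
```

With this in place the agent always sends the same canonical shape, and a wrong serialization becomes a bug in one spec entry rather than a retry loop.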
Bug #4: Error Messages That Lie
Here's a real error response from a production API:
```json
{"error": "An error occurred. Please try again later."}
```
An agent receiving this will try again later. The actual problem? Invalid API key format. Retrying will never help.
Good (Stripe-class)
```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "parameter_missing",
    "param": "amount",
    "message": "Missing required param: amount"
  }
}
```
Bad (more common than you'd think)
```json
{"status": "error", "message": "Bad Request"}
```

```html
<html><body><h1>500 Internal Server Error</h1></body></html>
```
In our scoring of 1,000+ APIs, structured error responses (with error codes, specific parameters, and actionable messages) are a minority. Most APIs return human-readable error strings that agents can't reliably parse.
This is why error handling quality is the single highest-weighted dimension in our AN Score methodology. An API with great docs but bad errors will fail silently in production.
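Defensive error parsing ends up looking something like this: try the structured path first, then fall back to scraping text or HTML into something the agent can at least log and classify. The shape and field names are an assumption, not Rhumb's actual implementation.

```typescript
// Normalize any error response into something an agent can act on.
interface NormalizedError {
  retryable: boolean; // should the agent ever retry this?
  code: string;       // machine-usable code, synthesized from status if absent
  message: string;    // short human-readable excerpt for logs
}

function parseError(status: number, contentType: string, body: string): NormalizedError {
  // 429 and 5xx are plausibly transient; 4xx generally is not.
  const retryable = status === 429 || status >= 500;
  if (contentType.includes("application/json")) {
    try {
      const parsed = JSON.parse(body);
      const err = parsed.error ?? parsed;
      return {
        retryable,
        code:
          typeof err === "object"
            ? err.code ?? err.type ?? `http_${status}`
            : `http_${status}`,
        message: typeof err === "string" ? err : err.message ?? body.slice(0, 200),
      };
    } catch {
      // Claimed JSON but wasn't; fall through to text handling.
    }
  }
  // HTML or plain text: strip tags, collapse whitespace, keep a short excerpt.
  const text = body.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  return { retryable, code: `http_${status}`, message: text.slice(0, 200) };
}
```

The `retryable` flag is the payload that matters: it converts “An error occurred. Please try again later.” on a 401 into a hard stop instead of an infinite retry loop.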
Bug #5: Rate Limits Without Information
Good APIs tell you exactly where you stand:
```http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 247
X-RateLimit-Reset: 1616000000
Retry-After: 30
```
Bad APIs return 429 and nothing else. The agent has to guess: wait 1 second? 10 seconds? 60 seconds? Back off exponentially?
Some APIs have undocumented secondary rate limits. GitHub's REST API has a primary rate limit (5,000/hour for authenticated requests) and a secondary rate limit on “content-creating” endpoints that's lower and not reflected in the rate limit headers. An agent creating issues or comments will hit the secondary limit and get a 403 with a message about “secondary rate limits” that doesn't appear in any getting-started guide.
| Service | AN Score | Headers | Retry-After | Burst Docs |
|---|---|---|---|---|
| Stripe | 8.1 | ✅ | ✅ | ✅ |
| GitHub | 7.8 | ✅ | ⚠️ | ⚠️ (secondary limits not in headers) |
| PayPal | 4.9 | Inconsistent | ❌ | ❌ |
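A per-provider backoff function can encode this hierarchy of information: honor `Retry-After` when present, fall back to the reset header, and only guess (exponential with jitter) when the API says nothing. The caps and jitter range here are illustrative defaults.

```typescript
// Decide how long to wait after a 429/403, using whatever the API disclosed.
// `headers` is assumed to be lowercased; `attempt` is the 0-based retry count.
function backoffMs(headers: Record<string, string>, attempt: number): number {
  // Best case: the API tells us exactly how long to wait.
  const retryAfter = headers["retry-after"];
  if (retryAfter !== undefined) {
    const seconds = Number(retryAfter);
    if (!Number.isNaN(seconds)) return seconds * 1000;
  }
  // Next best: a reset timestamp (Unix seconds) we can wait out.
  const reset = headers["x-ratelimit-reset"];
  if (reset !== undefined) {
    const waitMs = Number(reset) * 1000 - Date.now();
    if (waitMs > 0) return waitMs;
  }
  // No information at all: exponential backoff, capped at 60s, with jitter
  // so a fleet of agents doesn't retry in lockstep.
  const base = Math.min(1000 * 2 ** attempt, 60_000);
  return base + Math.floor(Math.random() * 250);
}
```

Keeping state like `attempt` per provider (not globally) is the point of the section above: Stripe's limits and PayPal's limits have nothing to do with each other.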
Bug #6: The Sandbox Illusion
“Just use the sandbox.” Every API says this. Few deliver.
Real example: PayPal's sandbox requires CAPTCHA verification to create accounts. That one detail drops it from “agent-friendly” to “requires a human for setup.” And setup isn't once — sandbox credentials expire.
Bug #7: The Versioning Time Bomb
APIs change. Response fields get renamed, deprecated, or removed. Versioning is supposed to protect you.
Gold Standard (Stripe)
Explicit API version in every request. Your agent pins a version and gets consistent responses. Forever.
Most APIs
Unversioned endpoints that change without notice. Your agent's response parser breaks silently when a field name changes from `email_address` to `emailAddress`.
The insidious part: breaking changes often affect edge cases first. Your happy-path tests pass. Your agent in production hits the edge case at 3am, fails silently, and you find out Monday morning.
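Two cheap defenses against this: pin the version explicitly where the API supports it, and validate expected fields on every response so drift fails loudly instead of silently. The header name below follows Stripe's `Stripe-Version` convention; the version string and response shape are illustrative.

```typescript
// Return the list of expected fields missing from a response payload.
function missingFields(payload: Record<string, unknown>, required: string[]): string[] {
  return required.filter((field) => !(field in payload));
}

// Pin the API version explicitly; never let it float with the account default.
const headers = { "Stripe-Version": "2024-06-20" };

// Simulated response after an upstream rename: email_address -> emailAddress.
const response = { id: "cus_123", emailAddress: "a@b.co" };

const missing = missingFields(response, ["id", "email_address"]);
if (missing.length > 0) {
  // Surface drift immediately, rather than letting the agent read `undefined`
  // at 3am and carry on as if nothing happened.
  console.error(`Schema drift detected; missing fields: ${missing.join(", ")}`);
}
```

For APIs with no versioning at all, the second half is the only defense: a field check per response is cheap insurance against the Monday-morning discovery.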
What We Learned
After building through all of this, we distilled the problems into a scoring framework. Every API gets evaluated on 20 dimensions across two axes:
Execution (70%)
Can the agent get work done? Error handling, schema stability, idempotency, latency, rate limit transparency.
Access Readiness (30%)
Can the agent get started? Signup friction, auth complexity, docs quality, sandbox, rate limits.
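The two-axis weighting above reduces to simple arithmetic. This sketch assumes both axis scores are already on a 0–10 scale; how the 20 underlying dimensions roll up into each axis is part of the published methodology, not shown here.

```typescript
// Combine the two axis scores using the 70/30 weighting described above.
function anScore(execution: number, accessReadiness: number): number {
  const weighted = 0.7 * execution + 0.3 * accessReadiness;
  return Math.round(weighted * 10) / 10; // one decimal place, like the leaderboard
}
```

So a service that executes flawlessly (9) but is painful to get started with (5) still lands around 7.8, reflecting that execution dominates once an agent is running.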
Some results that surprised us
| Service | AN Score | Takeaway |
|---|---|---|
| Stripe | 8.1 | Genuinely built for automation |
| Twilio | 8.0 | What agent-native almost looks like |
| GitHub | 7.8 | Excellent but sneaky secondary rate limits |
| Resend | 7.8 | Newer email API that got the details right from day one |
| SendGrid | 6.4 | Dominant but showing age in error handling |
| PayPal | 4.9 | CAPTCHA sandbox alone is disqualifying for autonomous use |
| Salesforce | 4.8 | Powerful but OAuth dance is hostile to agents |
The full leaderboard across 92 categories is at rhumb.dev/leaderboard.
The MCP tools are open source: `npx rhumb-mcp` gives your agent access to scores, failure modes, and alternatives at decision time.
If You're Building an MCP Server
- **Treat service identity as a first-class problem.** You will hit naming collisions.
- **Build an auth resolution layer.** Don't make the agent know which header format each API uses.
- **Expect payload translation.** What the agent sends and what the API wants are rarely the same shape.
- **Parse errors defensively.** Most APIs don't return structured errors. Build fallback parsing.
- **Implement rate limit tracking per provider.** Don't share a single backoff strategy across APIs with different limits.
- **Test against production, not just sandboxes.** Many sandboxes are incomplete or behave differently.
- **Pin API versions where possible.** If the API doesn't support versioning, monitor for breaking changes.
The MCP protocol gives you a great transport layer. It tells you nothing about what happens when your tools hit real APIs. That part is on you.
Try It
See how your tools score
We've scored 1,000+ services across 92 categories on 20 dimensions. The methodology is published. The MCP server is open source.