Guide · March 28, 2026 · Updated March 28, 2026 · Pedro Nunes

What Nobody Tells You About Building a Multi-Provider MCP Server

Every MCP server tutorial follows the same script: install the SDK, define a tool, return a response. That works for a single API. It does not work when you need an agent to reliably choose between, authenticate to, and call 1,000+ different APIs across 92 categories — and handle everything that goes wrong at 3am with no human.

APIs Scored: 1,000+ · Categories: 92 · Dimensions: 20 · Bugs Covered: 7

We built Rhumb, an MCP server that proxies hundreds of real APIs for AI agents. Here's what actually broke, why the tutorials don't cover it, and what you need to know if you're building anything beyond hello-world.

Bug #1: The Slug Aliasing Problem

First surprise: APIs don't have stable identifiers.

Brave's search API appears as both brave-search-api and brave-search depending on which documentation page you read. When an agent asks to “search with Brave,” your MCP server needs to know these are the same service.

This isn't unique to Brave. We found alias collisions in payment providers (same company, multiple API versions with different names), communication platforms (SMS vs messaging vs voice — same provider, different “APIs”), and analytics tools (legacy vs v2 naming).

The fix isn't a lookup table. It's a canonical slug system with alias resolution that treats identity as a first-class problem. Every service in Rhumb has exactly one canonical identifier, with aliases mapped explicitly.
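The shape of that alias resolution can be sketched in a few lines. This is an illustrative sketch, not Rhumb's actual implementation; the names (`CANONICAL`, `ALIASES`, `resolveSlug`) and the example slugs beyond Brave's are hypothetical:

```typescript
// Canonical slugs: exactly one identifier per service.
const CANONICAL = new Set(["brave-search-api", "twilio-sms", "stripe-payments"]);

// Aliases mapped explicitly -- never inferred by string similarity.
const ALIASES: Record<string, string> = {
  "brave-search": "brave-search-api",
  "twilio-messaging": "twilio-sms",
  "stripe": "stripe-payments",
};

function resolveSlug(slug: string): string {
  const normalized = slug.trim().toLowerCase();
  if (CANONICAL.has(normalized)) return normalized;
  const canonical = ALIASES[normalized];
  // Failing loudly on unknown slugs beats silently routing to the wrong API.
  if (!canonical) throw new Error(`Unknown service slug: ${slug}`);
  return canonical;
}
```

The key design choice is that aliases are curated data, not a fuzzy-matching heuristic: fuzzy matching is exactly how you end up routing "stripe" to a lookalike.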

Why tutorials skip this: they show one API. You never hit naming collisions with one API.

Bug #2: Authentication Is Not a Solved Problem

The tutorials say: “Add your API key to the header.” That covers maybe 40% of real APIs.

Method | How It Works | Share
Bearer token | Authorization: Bearer {key} | ~45%
Custom header | X-API-Key, X-Subscription-Token, Api-Key | ~25%
Basic Auth | Base64-encoded credentials | ~15%
OAuth2 with refresh | Token exchange + refresh cycle | ~10%
Query parameter | ?api_key=... | ~5%

The problem isn't supporting all five patterns. It's that your MCP server needs to know which pattern each API uses before the agent's first call. If the agent sends a Bearer token to an API expecting X-API-Key, you get a 401 that tells the agent nothing useful.

Worse: some APIs accept the wrong auth method silently and return empty results instead of errors. The agent thinks it worked. It didn't.

What we built: A credential resolution layer that knows the auth pattern for each service. The agent provides a key; Rhumb knows how to present it.
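A minimal version of that layer looks like the sketch below. The registry contents and names are assumptions for illustration (Brave's `X-Subscription-Token` header is real; `some-legacy-api` is made up):

```typescript
type AuthPattern =
  | { kind: "bearer" }
  | { kind: "header"; name: string }
  | { kind: "basic" }
  | { kind: "query"; param: string };

// Per-service auth knowledge lives here, not in the agent's prompt.
const AUTH_REGISTRY: Record<string, AuthPattern> = {
  "stripe-payments": { kind: "bearer" },
  "brave-search-api": { kind: "header", name: "X-Subscription-Token" },
  "some-legacy-api": { kind: "query", param: "api_key" },
};

function applyAuth(service: string, key: string, url: URL, headers: Headers): void {
  const pattern = AUTH_REGISTRY[service];
  if (!pattern) throw new Error(`No auth pattern registered for ${service}`);
  switch (pattern.kind) {
    case "bearer":
      headers.set("Authorization", `Bearer ${key}`);
      break;
    case "header":
      headers.set(pattern.name, key);
      break;
    case "basic":
      headers.set("Authorization", `Basic ${Buffer.from(key).toString("base64")}`);
      break;
    case "query":
      url.searchParams.set(pattern.param, key);
      break;
  }
}
```

The agent's interface stays uniform -- "here is my key for service X" -- while the proxy owns the per-service quirks.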

Bug #3: The Payload Translation Trap

Your agent constructs a JSON payload. The API expects multipart form data.

This hits hardest with document processing APIs. An agent wants to send a file for parsing. It constructs a reasonable JSON body with the file content. The API returns 400 because it only accepts multipart uploads with specific field names.

The gap between “what the agent sends” and “what the API wants”

  • Parameter naming: query vs q vs search_query vs prompt
  • Body format: JSON vs form-encoded vs multipart
  • Array handling: tags=a,b,c vs tags[]=a&tags[]=b vs {"tags":["a","b"]}
  • Date formats: ISO 8601 vs Unix timestamps vs custom strings
  • Pagination: cursor vs offset vs page-number vs link-header
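Two of those gaps -- parameter renaming and array serialization -- can be closed with a small per-service translation rule. A sketch, with hypothetical rule shapes and field names (body-format and date translation would layer on top the same way):

```typescript
type ArrayStyle = "csv" | "json" | "repeat";

interface TranslationRule {
  rename: Record<string, string>; // agent-side name -> API-side name
  arrayStyle: ArrayStyle;
}

// Returns URLSearchParams so the "repeat" style (tags[]=a&tags[]=b)
// can use genuinely repeated keys instead of a joined string.
function translateParams(
  params: Record<string, unknown>,
  rule: TranslationRule,
): URLSearchParams {
  const out = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    const apiKey = rule.rename[key] ?? key;
    if (Array.isArray(value)) {
      if (rule.arrayStyle === "csv") out.set(apiKey, value.join(","));
      else if (rule.arrayStyle === "json") out.set(apiKey, JSON.stringify(value));
      else value.forEach((v) => out.append(`${apiKey}[]`, String(v)));
    } else {
      out.set(apiKey, String(value));
    }
  }
  return out;
}
```

With a rule like `{ rename: { query: "q" }, arrayStyle: "csv" }`, the agent can keep saying `query` and `tags: ["a","b"]` while each API receives its own dialect.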

Why this matters for agents specifically: A human developer reads the docs and adapts. An agent will retry the same malformed request until it hits rate limits.

Bug #4: Error Messages That Lie

Here's a real error response from a production API:

{"error": "An error occurred. Please try again later."}

An agent receiving this will try again later. The actual problem? Invalid API key format. Retrying will never help.

Good (Stripe-class)

{
  "error": {
    "type": "invalid_request_error",
    "code": "parameter_missing",
    "param": "amount",
    "message": "Missing required param: amount"
  }
}

Bad (more common than you'd think)

{"status": "error", "message": "Bad Request"}

<html><body><h1>500 Internal Server Error</h1></body></html>

In our scoring of 1,000+ APIs, structured error responses (with error codes, specific parameters, and actionable messages) are a minority. Most APIs return human-readable error strings that agents can't reliably parse.
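Defensive parsing for this mess roughly means: trust structured fields when they exist, otherwise fall back to status codes and keyword heuristics to decide whether a retry could ever help. A sketch -- the heuristic regex is an illustrative assumption, not a complete taxonomy:

```typescript
interface ParsedError {
  retryable: boolean;
  detail: string;
}

// Phrases that signal a permanent failure no amount of retrying will fix.
const NON_RETRYABLE_HINTS =
  /invalid.*(key|token|credential)|unauthorized|missing required|not found/i;

function parseApiError(status: number, body: string): ParsedError {
  try {
    // 1. Structured JSON with an error code/type is the best case.
    const json = JSON.parse(body);
    const detail: string = json?.error?.message ?? json?.message ?? body;
    const code = json?.error?.code ?? json?.error?.type;
    if (code) {
      return { retryable: status >= 500 || status === 429, detail: `${code}: ${detail}` };
    }
    // 2. JSON but unstructured: lean on status + keyword heuristics.
    return {
      retryable: !NON_RETRYABLE_HINTS.test(detail) && status !== 400 && status !== 401,
      detail,
    };
  } catch {
    // 3. Not JSON at all (HTML error page, plain text): strip tags, keep a snippet.
    const detail = body.replace(/<[^>]+>/g, " ").trim().slice(0, 200);
    return { retryable: status >= 500 && !NON_RETRYABLE_HINTS.test(detail), detail };
  }
}
```

The point isn't that the heuristics are clever; it's that without them, "An error occurred. Please try again later." sends the agent into a retry loop on a bad API key.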

This is why error handling quality is the single highest-weighted dimension in our AN Score methodology. An API with great docs but bad errors will fail silently in production.

Bug #5: Rate Limits Without Information

Good APIs tell you exactly where you stand:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 247
X-RateLimit-Reset: 1616000000
Retry-After: 30

Bad APIs return 429 and nothing else. The agent has to guess: wait 1 second? 10 seconds? 60 seconds? Back off exponentially?
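One way to handle both cases is a wait-time function that honors whatever signals exist and degrades to capped exponential backoff with jitter when the 429 is bare. A sketch, assuming the common `X-RateLimit-*` header convention (individual APIs vary):

```typescript
function backoffMs(headers: Headers, attempt: number): number {
  // Prefer an explicit Retry-After (seconds form; HTTP-date form falls through).
  const retryAfter = headers.get("Retry-After");
  if (retryAfter !== null) {
    const seconds = Number(retryAfter);
    if (!Number.isNaN(seconds)) return seconds * 1000;
  }
  // Next best: a reset timestamp (Unix seconds by convention).
  const reset = headers.get("X-RateLimit-Reset");
  if (reset !== null) {
    const waitMs = Number(reset) * 1000 - Date.now();
    if (waitMs > 0) return waitMs;
  }
  // No information at all: exponential backoff with jitter, capped at 60s.
  const base = Math.min(1000 * 2 ** attempt, 60_000);
  return base / 2 + Math.random() * (base / 2);
}
```

Keep `attempt` counters per provider, not globally -- one chatty API shouldn't slow the agent's calls to every other service.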

Some APIs have undocumented secondary rate limits. GitHub's REST API has a primary rate limit (5,000/hour for authenticated requests) and a secondary rate limit on “content-creating” endpoints that's lower and not reflected in the rate limit headers. An agent creating issues or comments will hit the secondary limit and get a 403 with a message about “secondary rate limits” that doesn't appear in any getting-started guide.

Service | AN Score | Headers | Retry-After | Burst Docs
Stripe | 8.1 | | |
GitHub | 7.8 | ⚠️ | ⚠️ | secondary limits not in headers
PayPal | 4.9 | Inconsistent | |

Bug #6: The Sandbox Illusion

“Just use the sandbox.” Every API says this. Few deliver.

  • Sandbox requires production credentials → defeats the purpose
  • Sandbox has different behavior → you test against a lie
  • Sandbox has stricter rate limits → can't performance test
  • Sandbox doesn't support all endpoints → partial testing only
  • Sandbox requires CAPTCHA to create → agents can't self-provision
  • "Sandbox" is just a flag on production → one wrong call and you bill a real customer

Real example: PayPal's sandbox requires CAPTCHA verification to create accounts. That one detail drops it from “agent-friendly” to “requires a human for setup.” And setup isn't once — sandbox credentials expire.

Bug #7: The Versioning Time Bomb

APIs change. Response fields get renamed, deprecated, or removed. Versioning is supposed to protect you.

Gold Standard (Stripe)

Explicit API version in every request. Your agent pins a version and gets consistent responses. Forever.

Most APIs

Unversioned endpoints that change without notice. Your agent's response parser breaks silently when a field name changes from email_address to emailAddress.

The insidious part: breaking changes often affect edge cases first. Your happy-path tests pass. Your agent in production hits the edge case at 3am, fails silently, and you find out Monday morning.
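A cheap partial defense, when the API offers no version pinning, is a drift check over the fields your parser actually depends on, run against live responses so a silent rename surfaces as an alert instead of a 3am null. A minimal sketch with illustrative field names:

```typescript
// Returns the expected fields missing from a response; empty array means no drift.
function detectSchemaDrift(
  response: Record<string, unknown>,
  expectedFields: string[],
): string[] {
  return expectedFields.filter((field) => !(field in response));
}
```

If the provider renames `email_address` to `emailAddress`, `detectSchemaDrift(body, ["email_address"])` flags it on the first affected response rather than after a weekend of silent failures.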

What We Learned

After building through all of this, we distilled the problems into a scoring framework. Every API gets evaluated on 20 dimensions across two axes:

Execution (70%)

Can the agent get work done? Error handling, schema stability, idempotency, latency, rate limit transparency.

Access Readiness (30%)

Can the agent get started? Signup friction, auth complexity, docs quality, sandbox, rate limits.

Some results that surprised us

Service | AN Score | Takeaway
Stripe | 8.1 | Genuinely built for automation
Twilio | 8.0 | What agent-native almost looks like
GitHub | 7.8 | Excellent but sneaky secondary rate limits
Resend | 7.8 | Newer email API that got the details right from day one
SendGrid | 6.4 | Dominant but showing age in error handling
PayPal | 4.9 | CAPTCHA sandbox alone is disqualifying for autonomous use
Salesforce | 4.8 | Powerful but OAuth dance is hostile to agents

The full leaderboard across 92 categories is at rhumb.dev/leaderboard. The MCP tools are open source: npx rhumb-mcp gives your agent access to scores, failure modes, and alternatives at decision time.

If You're Building an MCP Server

1. Treat service identity as a first-class problem. You will hit naming collisions.

2. Build an auth resolution layer. Don't make the agent know which header format each API uses.

3. Expect payload translation. What the agent sends and what the API wants are rarely the same shape.

4. Parse errors defensively. Most APIs don't return structured errors. Build fallback parsing.

5. Implement rate limit tracking per provider. Don't share a single backoff strategy across APIs with different limits.

6. Test against production, not just sandboxes. Many sandboxes are incomplete or behave differently.

7. Pin API versions where possible. If the API doesn't support versioning, monitor for breaking changes.

The MCP protocol gives you a great transport layer. It tells you nothing about what happens when your tools hit real APIs. That part is on you.

Try It

See how your tools score

We've scored 1,000+ services across 92 categories on 20 dimensions. The methodology is published. The MCP server is open source.