Rhumb Intelligence API research · tool scoring

API Intelligence

Tool autopsies, head-to-head comparisons, and agent infrastructure deep-dives from Rhumb's 1,038-service scored index. Structured for autonomous parsing. Readable by humans.

79 posts
15 comparisons
6 autopsies
47 guides
9 articles

Next honest step

If you are here to make something work, do not get stranded in research mode.

The blog is the evidence layer. If you have one concrete MCP route with an allowed fixture, denied neighbor, authority owner, budget owner, and receipt fields, route it to MCP Route Review. If you are still choosing a capability, use capability-first onboarding or the managed lane.
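
As a rough illustration of that intake bar, the sketch below models one route as a record and reports which of the five items are still missing. The class name, field names, and the example route are hypothetical, chosen only to mirror the wording above; this is not Rhumb's actual intake schema.

```python
from dataclasses import dataclass, field


@dataclass
class RouteCard:
    """Hypothetical intake record for one MCP route (illustrative field names)."""

    route: str
    allowed_fixture: str = ""
    denied_neighbor: str = ""
    authority_owner: str = ""
    budget_owner: str = ""
    receipt_fields: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Return the names of required items that are still empty."""
        required = {
            "allowed_fixture": self.allowed_fixture,
            "denied_neighbor": self.denied_neighbor,
            "authority_owner": self.authority_owner,
            "budget_owner": self.budget_owner,
            "receipt_fields": self.receipt_fields,
        }
        return [name for name, value in required.items() if not value]

    def ready_for_review(self) -> bool:
        """A route is review-ready only when nothing is missing."""
        return not self.missing()


# Example: a draft route that only has its allowed fixture so far.
draft = RouteCard(route="tickets.create", allowed_fixture="create ticket in sandbox project")
print(draft.ready_for_review())
print(draft.missing())
```

A route that still reports missing items would, under this reading, belong in capability-first onboarding or the managed lane rather than in review.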

Current production signal

The live MCP conversation is still about blast radius, security model, auth shape, tenant isolation, recovery, route cards, verified discovery, shared-budget control, loop discipline, and runtime evidence

The freshest production signal keeps clustering around scope constraints, principals, evidence, credential model, reliability or crash handling, tenant isolation, remote-hosted MCP operations, rate-limit pressure, route-card proof, verified vertical discovery, and token-burn discipline inside live loops. If you are here because MCP has to survive real operators instead of a demo, start with the Route Review intake or the eighteen proof pages below.

MCP Route Review

When the route is no longer abstract, bring one allowed fixture, one denied neighbor, authority and budget owners, and receipt fields for review.

MCP Route Hardening Checklist

Make one MCP tool call repeat-safe with a route card, denied neighbor, authority lane, budget ceiling, and proof.

MCP Server Quality Signals for Agents

A production quality standard for workflow fit, trust class, authority, scope, failure shape, and evidence.

MCP Has a Security Model

The shortest honest frame for the current debate: scope, principals, and evidence are the real operator boundary.

Prompt Injection, Scope Constraints, and Blast Radius

Why unconstrained parameters turn prompt injection into arbitrary file, repo, or network reach.

Tool-Level Permission Scoping in MCP

Why backend authority stays too wide until tool visibility, parameter shape, and write boundaries are explicit.

Remote MCP Auth: Identity vs Authority

Why a valid token still is not the control plane unless tool visibility, backend credentials, and evidence stay narrow after auth.

A Production Readiness Checklist for Remote MCP Servers

How to separate liveness from principal model, scope boundaries, tenant isolation, governors, and recovery.

MCP Observability: Logging, Auditing, and Debugging

Why evidence, audit trails, and retry-safe traces are what make quota pain and partial failures governable.

Governed Capability Surfaces

Why tool route cards, no-call branches, side-effect classes, and quota owners matter before an agent chooses a capability.

How to Evaluate MCP Servers

A selection guide for workflow fit, invocation maps, schema legibility, verified vertical claims, and runtime reality checks.

MCP Marketplaces Need Workflow Proof

Why directory matches, vertical verification, and marketplace rows are route candidates, not execution authority.

Runtime MCP Discovery Trust Filters

How live discovery should narrow candidates by caller, trust class, jurisdiction, freshness, and denied-neighbor behavior.

Multi-Tenant MCP Server Design

How shared MCP stays safe only when credentials, tool manifests, resources, and session state stay tenant-aware.

Agent State Management Recovery Patterns

How checkpointing, verification, and explicit recovery paths keep partial failures from turning into morning cleanup.

Designing Agent Fleets That Survive Rate Limits

What changes when one retry storm becomes a shared-budget problem across the whole fleet.

LLM APIs in Agent Loops

Why token burn, fallback churn, and repeated tool chatter usually surface before the visible 429 storm.

MCP Credential Lifecycle in Production

Why expiry, rotation, and revocation need explicit operator handling before a broken tool call becomes the first alert.

API Credentials in Autonomous Agent Fleets

How credential reuse, scope drift, and shared-key rotation turn one auth incident into a fleet-wide outage.

⚖ Comparison
🏆 Adyen leads raw score; Stripe wins as practical default

Payment Provider Profiles for Agent Task Markets

Settlement receipts prove money moved. Provider-profile receipts prove the agent chose the right payment API for the route.

A scorecard for choosing payment APIs in autonomous task markets, with provider-profile receipt fields for Stripe, Adyen, Braintree, Lemon Squeezy, Square, and PayPal.

May 27, 2026 · 8 min read · AN 4.9 – 8.8
⚖ Comparison
🏆 AWS S3 wins

AWS S3 vs Cloudflare R2 vs Backblaze B2 for AI Agents

S3 dominates on execution. R2 on egress economics. B2 on raw storage cost.

AN Scores, egress costs, and agent-native access patterns compared across three object storage APIs.

March 20, 2026 · 8 min read · AN 6.6 – 8.1
⚖ Comparison
🏆 Vercel wins

Vercel vs Netlify vs Render for AI Agents

Vercel leads on execution. Render on simplicity. Netlify on ecosystem features.

Live AN Score data across the three dominant deployment platforms.

March 20, 2026 · 8 min read · AN 6.2 – 7.1
⚖ Comparison
🏆 Datadog wins

Datadog vs New Relic vs Grafana Cloud for AI Agents

Datadog leads on execution. Grafana on openness. New Relic on querying power.

Live AN Score data across the three dominant observability platforms.

March 20, 2026 · 8 min read · AN 7.0 – 7.8
⚖ Comparison
🏆 Stripe wins

Stripe vs Square vs PayPal for AI Agents

Stripe wins by default. Square for physical commerce. PayPal is constraint-driven.

A side-by-side decision page for operators and agents, backed by current Rhumb payment scores.

March 17, 2026 · 9 min read · AN 4.9 – 8.1
⚖ Comparison
🏆 Anthropic wins

Anthropic vs OpenAI vs Google AI for AI Agents

Anthropic leads on execution. OpenAI on ecosystem breadth. Google on multimodal depth.

Live AN Score data across the three dominant model providers.

March 18, 2026 · 9 min read · AN 7.2 – 9.1
⚖ Comparison
🏆 Postmark leads raw score; Resend wins as practical default

Postmark vs Resend vs SendGrid for AI Agents

Postmark leads raw score. Resend wins the self-serve default. SendGrid is the constraint-driven pick for Twilio stacks or marketing breadth.

A current email API scorecard: Postmark leads raw score, Resend remains the practical default, and SendGrid fits Twilio or mixed marketing workflows.

March 17, 2026 · 9 min read · AN 8.5 – 8.9
⚖ Comparison
🏆 Clerk wins

Auth0 vs Clerk vs Firebase Auth for AI Agents

Clerk is the default at 9.0. Auth0 is the enterprise-governance near-tie at 8.9. Firebase Auth is stack-contingent at 6.3.

A current auth API scorecard: Clerk leads raw score, Auth0 is nearly tied for enterprise governance, and Firebase Auth only wins inside GCP/Firebase stacks.

March 17, 2026 · 9 min read · AN 6.3 – 9.0
⚖ Comparison
🏆 Neon wins

Supabase vs PlanetScale vs Neon for AI Agents

Neon edges on raw score. Supabase on confidence and platform breadth. PlanetScale at MySQL scale.

The closest race in any category. All three within 0.5 points.

March 17, 2026 · 9 min read · AN 7.8 – 8.3
⚖ Comparison
🏆 Exa wins

Exa vs Tavily vs Serper vs Brave Search for AI Agents

Exa wins on semantic retrieval. Tavily on agent-first ergonomics. Serper on freshness. Brave on index independence.

Web search API comparison for research loops, with the contract and retrieval failures that matter once agents run unattended.

March 30, 2026 · 9 min read · AN 6.8 – 8.7
⚖ Comparison
🏆 PostHog wins

PostHog vs Mixpanel vs Amplitude for AI Agents

PostHog wins on breadth and agent-friendliness. Amplitude for warehouse-native enterprise.

A side-by-side decision page for analytics APIs.

March 17, 2026 · 9 min read · AN 6.2 – 7.9
⚖ Comparison
🏆 Pipedrive wins

HubSpot vs Salesforce vs Pipedrive for AI Agents

Pipedrive has the least friction. Salesforce has the governance ceiling. HubSpot has the broadest surface.

No CRM is agent-native yet. Three different flavors of friction.

March 17, 2026 · 10 min read · AN 4.6 – 5.2
⚖ Comparison
🏆 Twilio wins

Twilio vs Vonage vs Plivo for AI Agents

Twilio is the default. Vonage is the platform play. Plivo for cost optimization.

Live AN Score comparison across messaging APIs.

March 18, 2026 · 8 min read · AN 5.8 – 8.0
⚖ Comparison
🏆 Linear wins

Linear vs Jira vs Asana for AI Agents

Linear on API ergonomics. Jira on enterprise depth. Asana has the cleanest REST API.

All within 0.5 points — the tightest race in PM tooling.

March 18, 2026 · 9 min read · AN 6.5 – 7.0
⚕ Autopsy
8.0/10

Twilio API Autopsy: What Agent-Native Almost Looks Like

Best-in-class error ergonomics and idempotency. Friction lives in webhook signature verification edge cases.

Simple auth, idempotency, error codes that teach. The highest-scoring API in our database — and the friction that remains.

March 18, 2026 · 10 min read
⚕ Autopsy
8.1/10

Stripe API Autopsy: What 8.1/10 Agent-Native Actually Looks Like

Stripe is the payment ceiling because retries, auth, and error handling are disciplined. The remaining friction is mostly SCA, Radar opacity, and Connect complexity.

Strong auth, first-class idempotency, actionable payment errors, and the remaining failure modes operators can actually plan around.

March 30, 2026 · 10 min read
⚕ Autopsy
7.8/10

Shopify API Autopsy: The GraphQL Bet

GraphQL-only approach creates friction for agents preferring REST. Cost-based rate limiting is non-obvious.

Strong fundamentals gated behind GraphQL complexity. Query cost budgets, forced version migration, cursor pagination everywhere.

March 18, 2026 · 10 min read
⚕ Autopsy
4.8/10

Salesforce API Autopsy: The Enterprise Maze

Governance ceiling is real. Agents can't sign contracts or navigate compliance flows autonomously.

Governance 10.0, payment autonomy 2.0. SOQL barriers, governor limits, sandbox/production split, metadata complexity.

March 18, 2026 · 11 min read
⚕ Autopsy
4.6/10

HubSpot API Autopsy: What Breaks When Agents Try

Six failure modes. No idempotency keys. Rate limits that vary by tier without documentation.

Rate limit traps, cross-hub API inconsistency, OAuth maze, no idempotency — six failure modes dissected.

March 18, 2026 · 10 min read
⚕ Autopsy
6.8/10

We Scored Ourselves: Rhumb's AN Score Self-Assessment

Payment Autonomy scores 9.0, but Sandbox/Test Mode is just 4.0 — we're L3, not L4, by our own standard.

Rhumb applies its own 20-dimension AN Score methodology to itself. Score: 6.8/10 — every dimension, every gap, fully transparent.

March 20, 2026 · 12 min read
→ Guide
Intermediate

API Versioning Is Table Stakes. Agent Readiness Depends on Machine-Parseable Change Communication

An API can be versioned and still be operationally unstable for agents. The real readiness test is whether non-human clients can detect drift in time to fail safely.

April 14, 2026 · 8 min read · ~8 min
→ Guide
Intermediate

Signed MCP Receipts Create Evidence After the Call. They Do Not Make the Call Safe

Signed MCP tool-call receipts improve auditability and forensic confidence after execution, but they do not replace scope control, trust-class filtering, or authority checks before the call.

April 14, 2026 · 8 min read · ~8 min
→ Guide
Intermediate

Tool-Level Permission Scoping in MCP: Why Server Authentication Isn't Enough

Server authentication and tool authorization are different layers. Production MCP needs caller-scoped manifests, typed denials, and auditable tool boundaries.

April 4, 2026 · 8 min read · ~8 min
→ Guide
Intermediate

MCP Observability: Logging, Auditing, and Debugging Agent-Server Interactions in Production

Production MCP observability means principal-aware tool logs, typed errors, session trails, spend attribution, and checkpoints that make partial failure recoverable.

April 3, 2026 · 8 min read · ~8 min
→ Guide
Intermediate

Governed Capabilities Are Becoming the Real Control Plane for Agent Integrations

The safer agent interface is not raw endpoint sprawl and not merely fewer tools. It is a governed capability surface that preserves authority, policy, and failure semantics.

April 14, 2026 · 8 min read · ~8 min
→ Guide
Intermediate

Static MCP Scores Are a Baseline. Runtime Trust Is the Missing Overlay

Static MCP scoring and runtime trust are not competing systems. The useful operator model is baseline evaluation plus live overlays that catch drift, auth breakage, and caller-visible failure patterns.

April 14, 2026 · 8 min read · ~8 min
→ Guide
Intermediate

Flat 'Best MCP Server' Lists Hide the Decision That Actually Matters: Workflow Fit vs Trust Class

The real MCP selection question is not popularity. It is workflow fit plus trust class: what job the server improves, what authority it carries, and how safely it behaves once real use begins.

April 11, 2026 · 8 min read · ~8 min
→ Guide
Intermediate

What Nobody Tells You About Building a Multi-Provider MCP Server

Every MCP tutorial shows you 'hello world.' None of them warn you about the 16 different ways real APIs break when agents call them. Here's what we learned building an MCP server on top of 1,038 scored services.

March 28, 2026 · 10 min read · ~20 min
→ Guide
Intermediate

How APIs Fail When Agents Use Them: A Failure Engineering Guide

Failure mode data matters more than aggregate scores once agents run unattended. This guide maps six API failure categories and the telemetry needed to catch them.

March 31, 2026 · 12 min read · ~12 min
→ Guide
Intermediate

How to Secure Your API Keys for Agent Use

Three credential paths (Rhumb-managed, BYOK, Agent Vault), a storage hierarchy from OS keychain to plaintext, and honest threat modeling. No enterprise-grade theater.

March 18, 2026 · 8 min read · ~20 min
✦ Agent Infrastructure

How AI Agents Get Wallets and Pay for Things

Three paths to agent-autonomous payments: prepaid credits, x402 USDC on Base, and enterprise agent cards. Working code for each.

March 17, 2026 · 11 min read
✦ Access

Which SSO Gives Agents the Most Power?

We tried to bootstrap 23 developer tools autonomously. GitHub unlocked 8. Email unlocked 0. The full agent passport ranking.

March 14, 2026 · 9 min read
✦ Transparency

We Scored Ourselves First — Here's What We Found

Rhumb's initial March 11 self-score baseline: 3.5/10 (Emerging), published before launch. See the later full self-assessment for the current score.

March 11, 2026 · 12 min read
✦ Framework

The WCAG for AI Agents: Why Your Web App Isn't Built for Its Fastest-Growing User Base

Agent Accessibility Guidelines (AAG): 6 interaction channels × 3 compliance levels. The framework for making web apps work for autonomous AI agents.

March 10, 2026 · 10 min read
Agent Index · machine-parseable content map
79 entries
[Comparison] winner=Adyen raw score; Stripe practical default cat=payments
[Comparison] winner=AWS S3 cat=storage
[Comparison] winner=Vercel cat=devops
[Comparison] winner=Datadog cat=monitoring
[Comparison] winner=Stripe cat=payments
[Comparison] winner=Anthropic cat=ai-models
[Comparison] winner=Postmark raw score; Resend practical default cat=email
[Comparison] winner=Clerk cat=auth
[Comparison] winner=Neon cat=databases
[Infrastructure] winner=Scoped, rotating credentials cat=credentials
[Infrastructure] winner=Tier 1 rate-limit surfaces cat=reliability
[Comparison] winner=Pinecone cat=vector-databases
[Comparison] winner=Exa cat=web-search
[Comparison] winner=PostHog cat=analytics
[Comparison] winner=Pipedrive cat=crm
[Comparison] winner=Twilio cat=messaging
[Comparison] winner=Linear cat=project-management
[Autopsy] score=8.0 tier=ready cat=messaging
[Autopsy] score=8.1 tier=ready cat=payments
[Autopsy] score=7.8 tier=ready cat=commerce
[Autopsy] score=4.8 tier=emerging cat=crm
[Autopsy] score=4.6 tier=emerging cat=crm
[Autopsy] score=6.8 tier=developing cat=self-assessment
[Guide] cat=mcp-security
[Guide] cat=credential-lifecycle
[Guide] cat=mcp-architecture
[Guide] cat=production-readiness
[Guide] cat=mcp-reliability
[Guide] cat=mcp-security
[Guide] cat=reliability
[Guide] cat=ai-models
[Guide] cat=migration
[Article] cat=agent-infrastructure
[Article] cat=agent-infrastructure
[Article] cat=agent-readability
[Article] cat=access
[Article] cat=transparency
[Article] cat=framework
[Article] cat=payments
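
For agents consuming the index lines above, a minimal parsing sketch follows. It assumes the format visible in these entries (a bracketed type, then space-separated `key=value` pairs whose values may themselves contain spaces) and is inferred from the lines shown, not from a published Rhumb specification. The function name `parse_entry` is hypothetical, and a value that itself contains a `word=` sequence would be split incorrectly.

```python
import re

# "[Type] key=value key=value ..." as seen in the index lines above.
ENTRY = re.compile(r"^\[(?P<type>\w+)\]\s*(?P<rest>.*)$")
# A value runs lazily until the next " key=" or the end of the line,
# so values such as "Adyen raw score; Stripe practical default" stay whole.
PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_entry(line: str) -> dict[str, str]:
    """Parse one index line into a dict with a 'type' key plus its fields."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"not an index entry: {line!r}")
    entry = {"type": match.group("type")}
    for key, value in PAIR.findall(match.group("rest")):
        entry[key] = value.strip()
    return entry


lines = [
    "[Comparison] winner=Adyen raw score; Stripe practical default cat=payments",
    "[Autopsy] score=8.0 tier=ready cat=messaging",
    "[Guide] cat=mcp-security",
]
for line in lines:
    print(parse_entry(line))
```

Anchoring each value on the next `key=` token rather than splitting on whitespace is what lets multi-word winners survive intact.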