We Scored Ourselves
Rhumb applies its own 20-dimension AN Score methodology to itself. The result: 7.0/10 — Tier L3 (Fluent). Not agent-native yet, by our own standard.
Why Score Ourselves?
We built a scoring system for APIs. If we won't use it on ourselves, why should anyone trust it on others?
This isn't a marketing exercise. We applied the same 20 dimensions, the same weighting, the same severity framework we use for every service in our directory. The result is honest — and honestly humbling. We're L3 (Fluent), not L4 (Native). There are real gaps.
Every score below includes two lines: what we can legitimately claim, and what we can't. If you're evaluating Rhumb, this is the document to read.
Execution Dimensions (70% weight)
API Reliability: 6.0
✅ Reasonable uptime on Railway, structured error handling
⚠️ No published SLA, no status page, limited production traffic history
Error Ergonomics: 8.0
✅ Structured JSON errors, correct HTTP codes, x402 payment instructions, Retry-After headers
⚠️ No machine-readable error code enum beyond HTTP status
Schema Stability: 7.0
✅ Versioned at /v1, consistent response envelope, zero breaking changes
⚠️ API is young — MTBBC (mean time between breaking changes) is undefined because there haven't been enough months to measure
Latency Distribution: 7.5
✅ Proxy P50 overhead 4.1ms, direct calls <200ms
⚠️ No published P99 figures, no multi-region deployment
Idempotency: 8.0
✅ Idempotency keys on execution, x402 replay prevention, GET naturally idempotent
⚠️ Full coverage only on the execution path
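As an illustration of the pattern, a minimal client-side sketch: attach a unique key so a retried write is deduplicated server-side. The `Idempotency-Key` header name is a common convention (e.g. Stripe's), assumed here rather than taken from Rhumb's docs.

```python
import uuid

def idempotent_headers(base=None, key=None):
    """Return request headers carrying an idempotency key.

    Reusing the same key on a retry lets the server deduplicate the
    write; a fresh key marks a genuinely new operation. NOTE: the
    "Idempotency-Key" header name is an assumption, not confirmed
    against Rhumb's API.
    """
    headers = dict(base or {})
    headers["Idempotency-Key"] = key or str(uuid.uuid4())
    return headers

# Same key on both attempts -> the server should execute the write once.
first = idempotent_headers({"Authorization": "Bearer <token>"}, key="run-42")
retry = idempotent_headers({"Authorization": "Bearer <token>"}, key="run-42")
```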
Concurrent Behavior: 6.0
✅ Asyncio handles concurrent requests, per-agent rate limits
⚠️ No explicit documentation of concurrent connection handling or queue behavior
Cold-Start Latency: 7.0
✅ Persistent container (no serverless cold starts), health check keeps warm
⚠️ No published cold-start vs warm figures
Output Structure Quality: 9.0
✅ All structured JSON, consistent envelope, rich score responses with failure modes
⚠️ This is genuinely strong — structured data is in our DNA
State Leakage: 8.0
✅ Stateless by design, no implicit caching, no cross-agent data leakage
⚠️ Rate limit counters are the only per-request state
Graceful Degradation: 6.0
✅ Proxy handles upstream failures, capability execution reports fallbacks
⚠️ No CDN/cache layer for reads, no public health endpoint, single point of failure
Access Readiness Dimensions (30% weight)
Signup Autonomy: 7.0
✅ OAuth signup (GitHub + Google), x402 needs zero signup
⚠️ OAuth requires browser interaction — not ideal for headless agents
Payment Autonomy: 9.0
✅ x402 USDC zero-signup pay-per-call, Stripe prepaid, free tier for discovery
⚠️ No fiat wire/invoice for enterprise
Provisioning Speed: 8.0
✅ x402 instant, OAuth <30s to API key, MCP needs no key for discovery
⚠️ No programmatic key issuance API
Credential Management: 5.0
✅ Key rotation via dashboard
⚠️ Single key per user, no scoped tokens, no key management API — this is thin
Rate Limit Transparency: 7.0
✅ 429 returns Retry-After, per-agent limits enforced
⚠️ No published rate limit docs page, headers only on 429 not all responses
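As a sketch of how an agent can consume that signal, assuming Retry-After carries the delta-seconds form (it can also be an HTTP date, which this toy helper does not handle):

```python
def retry_delay(status_code, headers, default=1.0):
    """Seconds to wait before retrying, honoring Retry-After on 429.

    Toy helper: returns 0 for non-429 statuses, parses only the
    delta-seconds form of Retry-After, and falls back to `default`
    when the header is missing or unparseable.
    """
    if status_code != 429:
        return 0.0
    try:
        return max(float(headers.get("Retry-After")), 0.0)
    except (TypeError, ValueError):
        return default
```

An agent loop would sleep for `retry_delay(...)` seconds before reissuing the request.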
Documentation Quality: 5.0
✅ Methodology, quickstart, glossary, llms.txt, agent-capabilities.json
⚠️ No complete API reference, no Python SDK, OpenAPI disabled for security
Sandbox/Test Mode: 4.0
✅ Free tier as implicit sandbox
⚠️ No dedicated test environment — agents can't safely experiment without production consequences
Autonomy Dimensions (bonus)
Payment Integration: 9.0
✅ x402 USDC native, Stripe programmatic, budget controls, ledger API
⚠️ No programmatic refund API
Governance & Compliance: 6.0
✅ Execution logging, budget enforcement
⚠️ No compliance certs (SOC 2), no audit export, no GDPR deletion endpoint, ToS pending legal review
Web Agent Accessibility: 8.0
✅ Astro static HTML, JSON-LD, agent meta tags, keyboard navigable
⚠️ Missing ARIA labels on some dashboard components
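The headline 7.0 can be reproduced from the per-dimension numbers above. A quick sanity check, assuming a simple mean within each group and the bonus autonomy dimensions excluded from the weighted total (an assumption about the methodology, not a statement of it):

```python
# Per-dimension scores transcribed from the sections above.
execution = [6.0, 8.0, 7.0, 7.5, 8.0, 6.0, 7.0, 9.0, 8.0, 6.0]  # 10 dims, 70%
access = [7.0, 9.0, 8.0, 5.0, 7.0, 5.0, 4.0]                    # 7 dims, 30%

# Weighted average of the two group means.
overall = 0.7 * (sum(execution) / len(execution)) + 0.3 * (sum(access) / len(access))
print(round(overall, 1))  # 7.0
```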
Our Failure Modes
No Sandbox Environment
Agents testing integrations affect production data. A misfire on email.send sends a real email.
Workaround: Use discovery endpoints (free, read-only) for evaluation. Use x402 with small amounts for execution testing.
Single API Key Per User
Cannot scope access per agent or per capability. All-or-nothing access model.
Workaround: Use x402 path (no key needed, per-call payment acts as implicit scoping).
Documentation Gaps
Agent must discover API behavior through trial and error. No OpenAPI spec available publicly.
Workaround: Use MCP tool descriptions and llms.txt for agent-parseable documentation.
No Multi-Region
High latency for non-US users. Single Railway container in us-west.
Workaround: None currently. Plan for multi-region deployment post-launch.
Score Evidence Mostly Documentation-Derived
Scores may not reflect actual runtime behavior; this is disclosed transparently on the methodology page.
Runtime-backed reviews are labeled separately. The ratio is tracked and will be published once it meets our quality floor.
What This Tells Us
We're L3, not L4. We preach agent-native but our own access patterns have friction. The x402 zero-signup path is genuinely L4 — an agent can discover, evaluate, and pay for a capability with no human involvement. But the OAuth path, the dashboard, the credential management? L3 at best.
Sandbox is the biggest miss. Every API we score highly has a test mode. We don't. An agent integrating Rhumb has to make real calls to validate its workflow. For a platform that evaluates API quality for agents, this is ironic.
Our best feature is Payment Autonomy (9.0). x402 on USDC is genuinely novel. Zero signup, zero credential management, pay-per-call with cryptographic proof. This is the future of agent-to-service interaction, and we're one of the first to ship it.
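That zero-signup loop can be sketched as a small client state machine. This illustrates the general x402 pattern (an HTTP 402 response carrying machine-readable payment instructions, retried with payment proof attached), not Rhumb's actual client; the action names are ours.

```python
def x402_step(status_code, payment_attached):
    """Decide the next client action in a pay-per-call exchange.

    Sketch of the x402 pattern: a 402 response carries payment
    instructions; the client settles (e.g. in USDC) and retries the
    same request with proof of payment attached. Action names are
    illustrative, not Rhumb's API.
    """
    if status_code == 402:
        return "payment_rejected" if payment_attached else "pay_and_retry"
    if 200 <= status_code < 300:
        return "done"
    return "error"
```

No human appears anywhere in that loop, which is what makes the path L4.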
Our worst dimension is Sandbox (4.0). We know. It's on the roadmap. But publishing this score with a 4.0 in it — instead of waiting until we fix it — is the whole point. If we only score ourselves when we look good, the methodology is worthless.
See How Your API Scores
We use the same methodology on 258+ services. Check your API's AN Score, or try the MCP server to evaluate programmatically.