Why trust Rhumb?
A scoring system is only as good as its integrity. Here's how we earn trust — and what we're still working on.
Trust should be inspectable
Agent-native trust means we do not ask humans or agents to rely on vibes. Each trust claim on this page should resolve to a public, inspectable artifact.
- Method: published methodology
- Self-audit: Rhumb scored itself first
- Machine-readable entry point: llms.txt and docs (see the sketch after this list)
- Dispute path: public GitHub issue or provider email, both linked below
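As a rough illustration of the machine-readable entry point, the sketch below fetches llms.txt. The URL is a placeholder assumption, not necessarily the canonical path; follow the link in the docs for the real one.

```typescript
// Minimal sketch: fetch the machine-readable entry point.
// The URL below is a placeholder for illustration, not the documented path.
const ENTRY_POINT = "https://rhumb.example/llms.txt"; // hypothetical URL

async function fetchEntryPoint(): Promise<string> {
  const res = await fetch(ENTRY_POINT);
  if (!res.ok) {
    throw new Error(`Entry point unavailable: HTTP ${res.status}`);
  }
  // llms.txt is plain text: a short index pointing agents at the docs.
  return res.text();
}

fetchEntryPoint().then((text) => console.log(text));
```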
Neutrality is non-negotiable
Scores cannot be bought. We do not accept payment for higher scores. Our pricing model charges for operations (API access, webhooks, enterprise features) — never for outcomes (score changes). This is a hard boundary, not a preference.
We scored ourselves first
Before scoring anyone else, we ran the AN Score methodology on Rhumb itself. We scored 3.5 out of 10 (L1 Limited) and published the full breakdown. If we can't be honest about our own shortcomings, why would you trust our assessment of anyone else?
Open methodology
Our scoring methodology is fully documented — 20 dimensions, 2 axes, tier definitions, data sources, limitations. You can read exactly how every score is calculated. The code is open source (MIT).
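As an illustration only, a published score record might be shaped something like the sketch below. The field names are placeholders drawn from the terminology on this page, not the actual schema; the methodology docs are the source of truth.

```typescript
// Hypothetical shape of a score record, for orientation only.
// Field names are illustrative assumptions, not the real schema.
interface DimensionScore {
  dimension: string;          // one of the 20 documented dimensions
  axis: string;               // one of the 2 axes
  score: number;              // contribution to the overall score
  evidence: "documentation-derived" | "runtime-backed";
}

interface ANScore {
  provider: string;
  overall: number;            // e.g. 3.5 out of 10
  tier: string;               // e.g. "L1 Limited"
  dimensions: DimensionScore[];
  methodologyVersion: string; // ties the score to the published methodology
}
```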
Dispute any score
Disagree with a score? File a dispute via GitHub issue or email. Every dispute is reviewed, and outcomes are public. We don't hide from criticism — we use it to improve.
CC BY 4.0 data license
All scores and failure modes are licensed under Creative Commons Attribution 4.0 International. Use them in your research, embed them in your products, cite them in your papers. We just ask for attribution.
Honest about limitations
Most scores start as documentation-derived, built from published API docs and provider claims. These capture what should work, but cannot catch undocumented behaviors, silent schema changes, or production edge cases.
We are actively closing this gap. Over 20% of our reviews are now runtime-backed — supported by real evidence from agent execution, tester-generated probes, and live API calls. Every review labels its evidence source. You can inspect the evidence behind any score via our API.
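As a rough illustration of inspecting evidence (the endpoint path and response shape below are placeholders, not the documented API), a lookup might look like:

```typescript
// Hedged sketch: fetch the evidence behind a provider's score.
// The endpoint is a hypothetical placeholder; consult the API docs for real routes.
async function getEvidence(providerId: string): Promise<unknown> {
  const res = await fetch(
    `https://api.rhumb.example/v1/scores/${providerId}/evidence` // hypothetical endpoint
  );
  if (!res.ok) {
    throw new Error(`Could not load evidence: HTTP ${res.status}`);
  }
  // Each evidence item should carry its source label
  // (documentation-derived or runtime-backed), per the policy above.
  return res.json();
}

getEvidence("example-provider").then((evidence) =>
  console.log(JSON.stringify(evidence, null, 2))
);
```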
We will always be transparent about what we know and how we know it. Our goal is 100% runtime-backed coverage — but we won't hide the gap while we close it.