Why trust Rhumb?
A scoring system is only as good as its integrity. Here's how we earn trust — and what we're still working on.
Trust should be inspectable
Agent-native trust means we do not ask humans or agents to rely on vibes. Each trust claim on this page should resolve to a public, inspectable artifact.
- Method: published methodology
- Self-audit: Rhumb scored itself first
- Machine-readable entry point: llms.txt and docs (see the sketch after this list)
- Dispute path: public GitHub issue or provider email, both linked below
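As a rough illustration of the machine-readable entry point, the sketch below fetches llms.txt. The URL is a placeholder assumption, not necessarily the canonical path; follow the link in the docs for the real one.

```typescript
// Minimal sketch: fetch the machine-readable entry point.
// The URL below is a placeholder for illustration, not the documented path.
const ENTRY_POINT = "https://rhumb.example/llms.txt"; // hypothetical URL

async function fetchEntryPoint(): Promise<string> {
  const res = await fetch(ENTRY_POINT);
  if (!res.ok) {
    throw new Error(`Entry point unavailable: HTTP ${res.status}`);
  }
  // llms.txt is plain text: a short index pointing agents at the docs.
  return res.text();
}

fetchEntryPoint().then((text) => console.log(text));
```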
Neutrality is non-negotiable
Scores cannot be bought. We do not accept payment for higher scores. Our pricing model charges for operations (API access, webhooks, enterprise features) — never for outcomes (score changes). This is a hard boundary, not a preference.
We scored ourselves first
Before scoring anyone else, we ran the AN Score methodology on Rhumb itself. We scored 3.5 out of 10 (L1 Limited) and published the full breakdown. If we can't be honest about our own shortcomings, why would you trust our assessment of anyone else?
Open methodology
Our scoring methodology is fully documented — 20 dimensions, 2 axes, tier definitions, data sources, limitations. You can read exactly how every score is calculated. The code is open source (MIT).
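As an illustration only, a published score record might be shaped something like the sketch below. The field names are placeholders drawn from the terminology on this page, not the actual schema; the methodology docs are the source of truth.

```typescript
// Hypothetical shape of a score record, for orientation only.
// Field names are illustrative assumptions, not the real schema.
interface DimensionScore {
  dimension: string;          // one of the 20 documented dimensions
  axis: string;               // one of the 2 axes
  score: number;              // contribution to the overall score
  evidence: "documentation-derived" | "runtime-backed";
}

interface ANScore {
  provider: string;
  overall: number;            // e.g. 3.5 out of 10
  tier: string;               // e.g. "L1 Limited"
  dimensions: DimensionScore[];
  methodologyVersion: string; // ties the score to the published methodology
}
```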
Dispute any score
Disagree with a score? File a dispute via GitHub issue or email. Every dispute is reviewed, and outcomes are public. We don't hide from criticism — we use it to improve.
CC BY 4.0 data license
All scores and failure modes are licensed under Creative Commons Attribution 4.0 International. Use them in your research, embed them in your products, cite them in your papers. We just ask for attribution.
Honest about limitations
Most scores start as documentation-derived, built from published API docs and provider claims. These capture what should work, but cannot catch undocumented behaviors, silent schema changes, or production edge cases.
We are actively closing this gap. Over 20% of our reviews are now runtime-backed — supported by real evidence from agent execution, tester-generated probes, and live API calls. Every review labels its evidence source. You can inspect the evidence behind any score via our API.
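As a rough illustration of inspecting evidence (the endpoint path and response shape below are placeholders, not the documented API), a lookup might look like:

```typescript
// Hedged sketch: fetch the evidence behind a provider's score.
// The endpoint is a hypothetical placeholder; consult the API docs for real routes.
async function getEvidence(providerId: string): Promise<unknown> {
  const res = await fetch(
    `https://api.rhumb.example/v1/scores/${providerId}/evidence` // hypothetical endpoint
  );
  if (!res.ok) {
    throw new Error(`Could not load evidence: HTTP ${res.status}`);
  }
  // Each evidence item should carry its source label
  // (documentation-derived or runtime-backed), per the policy above.
  return res.json();
}

getEvidence("example-provider").then((evidence) =>
  console.log(JSON.stringify(evidence, null, 2))
);
```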
We will always be transparent about what we know and how we know it. Our goal is 100% runtime-backed coverage — but we won't hide the gap while we close it.