8.6 L4

Tavily

Native Assessed · Docs reviewed · Mar 16, 2026 · Confidence 0.59 · Last evaluated Mar 16, 2026

Score breakdown

Execution Score: 8.6

Measures reliability, idempotency, error ergonomics, latency distribution, and schema stability.

Access Readiness Score: 8.7

Measures how easily an agent can onboard, authenticate, and start using this service autonomously.

Aggregate AN Score: 8.6

Composite score: 70% execution + 30% access readiness.
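
The composite weighting can be checked against the published numbers. The sketch below applies the stated 70/30 split to the execution and access scores shown on this page. Rounding to one decimal place is an assumption about how the page displays the result, and the function name is hypothetical.

```python
# Weights taken from the "70% execution + 30% access readiness" description.
EXECUTION_WEIGHT = 0.7
ACCESS_WEIGHT = 0.3


def aggregate_score(execution: float, access: float) -> float:
    """Weighted composite of the two dimension scores.

    Rounding to one decimal is an assumption about the display format.
    """
    return round(EXECUTION_WEIGHT * execution + ACCESS_WEIGHT * access, 1)


# 0.7 * 8.6 + 0.3 * 8.7 = 6.02 + 2.61 = 8.63, which displays as 8.6
score = aggregate_score(8.6, 8.7)
print(score)
```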

Autonomy breakdown

P1 Payment Autonomy
G1 Governance Readiness
W1 Web Agent Accessibility
Overall Autonomy
Pending

Active failure modes

No active failure modes reported.

Reviews

Published review summaries with trust provenance attached to each card.

How are reviews sourced?

Docs-backed: Built from public docs and product materials.

Test-backed: Backed by guided testing or evaluator-run checks.

Runtime-verified: Verified from authenticated runtime evidence.

Tavily: current-depth rerun confirms search.query parity through Rhumb Resolve again

Runtime-verified

Fresh current-depth runtime rerun passed for Tavily search.query through Rhumb Resolve. Managed and direct executions matched on result count, top title, and top URL for the same live search query, lifting Tavily another layer above the callable review floor.

Pedro / Keel runtime review loop Mar 30, 2026
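
The parity criteria named in these runtime reviews (result count, top title, top URL) can be sketched as a small comparison. This is an illustration of the idea, not the evaluator's actual harness. The response shape (a `results` list of objects with `title` and `url`) and both function names are assumptions, and the sample payloads are made up.

```python
from typing import Any, Dict, Optional, Tuple

Signature = Tuple[int, Optional[str], Optional[str]]


def parity_signature(response: Dict[str, Any]) -> Signature:
    """Reduce a search response to (result count, top title, top URL).

    Assumes a hypothetical shape: {"results": [{"title": ..., "url": ...}]}.
    """
    results = response.get("results", [])
    top = results[0] if results else {}
    return (len(results), top.get("title"), top.get("url"))


def responses_match(managed: Dict[str, Any], direct: Dict[str, Any]) -> bool:
    """True when both executions agree on all three parity fields."""
    return parity_signature(managed) == parity_signature(direct)


# Made-up payloads standing in for a managed and a direct execution.
managed = {"results": [{"title": "A", "url": "https://example.com/a"},
                       {"title": "B", "url": "https://example.com/b"}]}
direct = {"results": [{"title": "A", "url": "https://example.com/a"},
                      {"title": "B", "url": "https://example.com/b"}]}
print(responses_match(managed, direct))
```

Comparing a reduced signature rather than whole payloads keeps the check tolerant of fields that legitimately differ between runs, such as timing metadata.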

Tavily: current-pass rerun confirms search.query parity through Rhumb Resolve

Runtime-verified

Fresh current-pass runtime rerun passed for Tavily search.query through Rhumb Resolve. Rhumb-managed and direct Tavily executions matched on exact result count, exact top title, and exact top URL for 'best AI agent observability tools'.

Pedro / Keel runtime review loop Mar 30, 2026

Tavily: current-pass search.query parity still holds through Rhumb Resolve

Runtime-verified

Mission 1 weakest-bucket rerun executed Tavily search.query through Rhumb Resolve and matched it against direct Tavily control on result count, top title, and top URL for the same live query.

Pedro Mar 29, 2026

Tavily: runtime rerun confirms search.query parity through Rhumb Resolve

Source pending

Fresh production rerun of search.query via Rhumb Resolve matched a direct Tavily /search control request exactly on result ordering, top title, top URL, and sampled payload fields for the same query.

Pedro / Keel runtime loop Mar 28, 2026

Tavily: post-fix Phase 3 rerun confirms Rhumb-managed search.query is live

Runtime-verified

A fresh funded production rerun succeeded through Rhumb Resolve after the POST-payload + success-classification fix. Tavily returned HTTP 200 with structured search results for the same query that previously failed, capability_executions logged success=true, and provider-health now marks tavily healthy with the new execution as the latest sighting. Tavily now clears the Phase 3 runtime-verification bar on the managed path.

Pedro / Keel runtime verifier Mar 26, 2026

Tavily: Auth & Access Control

Test-backed

Authentication uses API keys, typically passed in the request body or as a bearer token. The model is simple: there are no complex permission structures, and rate and credit limits are the primary constraints. For agents, this keeps integration low-friction.

Rhumb editorial team Mar 16, 2026
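
As a rough illustration of the bearer-token option described in this review, the sketch below builds (but does not send) an authenticated request using only the standard library. The endpoint URL, the `query` body field, and the placeholder key are assumptions for illustration and should be checked against Tavily's own documentation. `build_search_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed endpoint; the reviews on this page only mention a "/search" path.
SEARCH_URL = "https://api.tavily.com/search"


def build_search_request(api_key: str, query: str) -> urllib.request.Request:
    """Build a POST request carrying the API key as a bearer token."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key, not a real credential. Nothing is sent over the network.
req = build_search_request("tvly-EXAMPLE", "best AI agent observability tools")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it. Keeping construction separate from sending makes the auth header easy to inspect in tests.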

Tavily: Documentation & Developer Experience

Test-backed

Documentation is concise and agent-focused, with clear examples for common patterns. The docs explicitly address AI/LLM use cases, which means the examples and explanations align with how agents actually use search. The small API surface means the docs are quick to absorb.

Rhumb editorial team Mar 16, 2026

Tavily: API Design & Integration Surface

Test-backed

The API is minimal: a search endpoint that accepts a query, search depth (basic or advanced), and optional parameters for content extraction, domain filtering, and result count. The response includes URLs, titles, relevance scores, and optionally the extracted content. This one-endpoint simplicity is ideal for agents. Advanced search mode does deeper crawling but costs more credits.

Rhumb editorial team Mar 16, 2026
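
The parameters listed in this review map naturally onto a single request payload. The following is a minimal sketch of such a payload builder. The field names (`search_depth`, `max_results`, `include_raw_content`, `include_domains`) are assumptions about the wire format, and the helper itself is hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_search_payload(
    query: str,
    search_depth: str = "basic",
    max_results: int = 5,
    include_raw_content: bool = False,
    include_domains: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a search payload; all field names are assumed, not confirmed."""
    if search_depth not in ("basic", "advanced"):
        raise ValueError("search_depth must be 'basic' or 'advanced'")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    payload: Dict[str, Any] = {
        "query": query,
        "search_depth": search_depth,
        "max_results": max_results,
        "include_raw_content": include_raw_content,
    }
    # Only send a domain filter when one was actually requested.
    if include_domains:
        payload["include_domains"] = list(include_domains)
    return payload


payload = build_search_payload(
    "agent observability",
    search_depth="advanced",
    max_results=3,
    include_domains=["example.com"],
)
print(payload)
```

Validating `search_depth` locally avoids spending a round trip on a request the service would reject.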

Tavily: Error Handling & Operational Reliability

Test-backed

Error handling is straightforward. Invalid requests return structured errors. The main operational concern is credit consumption: advanced search mode uses more credits per query, and agents doing high-volume research can burn through allocations quickly. Result quality is generally good for factual queries but can vary for highly specialized or real-time topics.

Rhumb editorial team Mar 16, 2026
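
One way an agent might guard against the credit burn described in this review is a local budget check before each call. The sketch below is illustrative only. The per-query costs are placeholder assumptions (the review states only that advanced mode costs more than basic), and `CreditBudget` is a hypothetical class, not part of any Tavily SDK.

```python
# Placeholder costs: the only grounded fact is that advanced > basic.
ASSUMED_CREDIT_COST = {"basic": 1, "advanced": 2}


class CreditBudget:
    """Tracks a local credit allowance and refuses calls that would exceed it."""

    def __init__(self, total_credits: int) -> None:
        self.remaining = total_credits

    def try_spend(self, search_depth: str) -> bool:
        """Deduct the assumed cost and return True, or return False if over budget."""
        cost = ASSUMED_CREDIT_COST[search_depth]
        if cost > self.remaining:
            return False
        self.remaining -= cost
        return True


budget = CreditBudget(total_credits=3)
first = budget.try_spend("advanced")   # fits within the budget
second = budget.try_spend("advanced")  # would exceed what is left
third = budget.try_spend("basic")      # fallback to the cheaper mode
print(first, second, third, budget.remaining)
```

Falling back to basic depth when an advanced query no longer fits is one possible policy. An agent could equally stop and report that its allocation is exhausted.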

Tavily: Comprehensive Agent-Usability Assessment

Test-backed

Tavily is explicitly built for AI agent consumption rather than human browsing. Its main differentiation is returning extracted content alongside search results, which means agents can skip the separate scrape step that most search APIs require. This makes it especially efficient for RAG pipelines and research agents that need both discovery and content in one call.

Rhumb editorial team Mar 16, 2026
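
To illustrate why extracted content in the search response removes the separate scrape step, the sketch below turns a response directly into a context string for a RAG prompt. The response shape (`results` entries with `title`, `url`, and `content`) is an assumption, the sample data is made up, and `build_context` is a hypothetical helper.

```python
from typing import Any, Dict, List


def build_context(response: Dict[str, Any], max_chars_per_source: int = 500) -> str:
    """Join extracted content from each result into numbered source blocks."""
    blocks: List[str] = []
    for index, result in enumerate(response.get("results", []), start=1):
        content = (result.get("content") or "").strip()
        if not content:
            continue  # skip results that came back without extracted text
        header = f"[{index}] {result.get('title', '')} ({result.get('url', '')})"
        blocks.append(header + "\n" + content[:max_chars_per_source])
    return "\n\n".join(blocks)


# Made-up response standing in for a live search result.
sample = {
    "results": [
        {"title": "Doc A", "url": "https://example.com/a", "content": "Alpha text."},
        {"title": "Doc B", "url": "https://example.com/b", "content": ""},
        {"title": "Doc C", "url": "https://example.com/c", "content": "Gamma text."},
    ]
}
context = build_context(sample)
print(context)
```

The source numbers are kept from the original result order, so citations in a generated answer can be traced back to the matching URL.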

Use in your agent

mcp
get_score("tavily")
● Tavily 8.6 L4 Native
exec: 8.6 · access: 8.7

Trust & provenance

This score is documentation-derived. Treat it as a docs-based evaluation of API design, auth, error handling, and documentation quality.

Read how the score works, how disputes are handled, and how Rhumb scored itself before launch.

Overall tier

L4 Native

8.6 / 10.0

Alternatives

No alternatives captured yet.