Retry and rate-limit budget checklist
An unattended agent can turn one 429 into a retry storm, one timeout into a duplicate write, or one fallback into unapproved provider spend. The production boundary is not "does the client retry?" It is whether the route can prove when it must stop.
Fast answer
The production checklist
Set max attempts, max elapsed time, max queued delay, max tokens or dollars, and max provider calls per route. Do not inherit a global retry policy blindly across tools with different side effects.
Name the budget owner: user, tenant, workspace, Rhumb-managed lane, customer key, provider account, or explicit test quota. The receipt should show which lane was charged or protected.
Separate read, search, estimate, create, update, send, delete, purchase, and external-message calls. Only replay when the route has an idempotency key or a verified no-side-effect class.
Record the Retry-After header, provider reset time, chosen delay, jitter range, and queue position. Also record whether the model is allowed to ask for a manual recovery step instead of hammering the provider.
Force a timeout after provider acceptance, replay the same request ID, and verify the second call resolves to the original receipt or a typed duplicate denial instead of repeating the side effect.
When the budget is spent, return a typed denial with attempts, elapsed time, quota owner, protected provider, next retry window, and safe recovery path. Do not let the model improvise another route around the budget.
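The budget and denial items above can be sketched as a small pre-attempt check. This is a minimal illustration, not a Rhumb API: `RouteBudget`, `BudgetDenial`, `check_budget`, the denial codes, and the `ask_operator` recovery value are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteBudget:
    """Per-route ceilings (hypothetical shape, one instance per route)."""
    max_attempts: int
    max_elapsed_s: float
    max_provider_calls: int
    quota_owner: str   # e.g. "tenant:acme"; the lane that gets charged
    provider: str      # the provider this budget protects


@dataclass(frozen=True)
class BudgetDenial:
    """Typed denial returned instead of making another attempt."""
    code: str
    attempts: int
    elapsed_s: float
    quota_owner: str
    protected_provider: str
    next_retry_window_s: Optional[float]
    safe_recovery: str


def check_budget(budget: RouteBudget, attempts: int, elapsed_s: float,
                 provider_calls: int,
                 next_retry_window_s: Optional[float] = None
                 ) -> Optional[BudgetDenial]:
    """Return None if one more attempt is allowed, else a typed denial."""
    if attempts >= budget.max_attempts:
        code = "attempts_exhausted"
    elif elapsed_s >= budget.max_elapsed_s:
        code = "elapsed_exhausted"
    elif provider_calls >= budget.max_provider_calls:
        code = "provider_calls_exhausted"
    else:
        return None
    return BudgetDenial(
        code=code,
        attempts=attempts,
        elapsed_s=elapsed_s,
        quota_owner=budget.quota_owner,
        protected_provider=budget.provider,
        next_retry_window_s=next_retry_window_s,
        safe_recovery="ask_operator",  # assumed recovery path for this sketch
    )


budget = RouteBudget(max_attempts=3, max_elapsed_s=30.0,
                     max_provider_calls=5, quota_owner="tenant:acme",
                     provider="example-provider")
print(check_budget(budget, attempts=3, elapsed_s=4.0, provider_calls=3,
                   next_retry_window_s=12.0))
```

Running the check before every attempt, rather than after a failure, keeps the stop decision in the route and out of the model's hands.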
Failure fixtures
Expected: Respect Retry-After or reset metadata, stop at route budget, and receipt the protected quota owner.
Expected: Retry only idempotent or explicitly replay-safe classes; include backoff decision, elapsed ceiling, and final recovery hint.
Expected: Use idempotency key or status lookup before replay. A second side effect is a failed gate, even if the final response is 200.
Expected: Collapse duplicate intent into one receipt or deny after budget exhaustion; do not multiply provider calls because the planner rephrased the task.
Expected: Require a separate budget owner, data-use rule, credential lane, and receipt. Fallback is not a hidden retry path.
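One way to exercise the timeout-after-acceptance fixture is with an in-memory stand-in for the provider. The sketch below assumes the provider honors an idempotency key; `FakeProvider`, `call_with_replay_guard`, and the receipt shape are hypothetical and exist only for this drill.

```python
class FakeProvider:
    """In-memory stand-in for a provider that honors idempotency keys."""

    def __init__(self, drop_first_n_responses: int = 0):
        self.receipts = {}       # idempotency key -> original receipt
        self.side_effects = 0    # how many writes actually happened
        self._drops_left = drop_first_n_responses

    def create(self, idempotency_key: str, payload: dict) -> dict:
        # The write is accepted (at most once per key) before any failure.
        if idempotency_key not in self.receipts:
            self.side_effects += 1
            self.receipts[idempotency_key] = {
                "receipt_id": f"r-{self.side_effects}",
                "payload": payload,
            }
        # Simulate a response lost after the provider accepted the call.
        if self._drops_left > 0:
            self._drops_left -= 1
            raise TimeoutError("response lost after provider acceptance")
        return self.receipts[idempotency_key]


def call_with_replay_guard(provider: FakeProvider, key: str, payload: dict,
                           max_attempts: int) -> dict:
    """Replay only with the same key, and stop at the attempt ceiling."""
    for attempt in range(1, max_attempts + 1):
        try:
            receipt = provider.create(key, payload)
            return dict(receipt, attempts=attempt)
        except TimeoutError:
            continue  # safe to replay: the key collapses duplicates
    raise RuntimeError("retry budget exhausted")


provider = FakeProvider(drop_first_n_responses=1)
result = call_with_replay_guard(provider, "req-123", {"to": "ops"}, max_attempts=3)
# The gate passes only if the replay did not repeat the side effect.
assert provider.side_effects == 1
assert result["receipt_id"] == "r-1"
```

The assertion on `side_effects` is the gate. A final 200 with two writes behind it would still fail.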
Trace evidence
Rate-limit and timeout handling only become operator-grade when every attempt is reconstructable. The evidence should identify the protected budget, the replay decision, the provider response, and the recovery path without depending on the model's explanation after the fact.
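As a rough illustration of a reconstructable attempt, the sketch below records the Retry-After value, the jitter range, and the chosen delay for each retry decision. The field names, `plan_delay`, and the backoff constants are assumptions made for this example. It parses only the delta-seconds form of Retry-After and ignores the HTTP-date form.

```python
import json
import random
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AttemptRecord:
    """One line of trace evidence per attempt (hypothetical field set)."""
    route: str
    attempt: int
    status: int
    quota_owner: str
    replay_decision: str
    retry_after_s: Optional[float]
    jitter_range_s: Tuple[float, float]
    chosen_delay_s: float


def plan_delay(retry_after_header: Optional[str], attempt: int,
               base_s: float = 0.5, cap_s: float = 30.0, rng=random):
    """Return (retry_after_s, jitter_range_s, chosen_delay_s)."""
    retry_after_s = None
    if retry_after_header is not None and retry_after_header.strip().isdigit():
        retry_after_s = float(retry_after_header.strip())
    # Exponential ceiling for the jitter window, capped per route.
    ceiling = min(cap_s, base_s * 2 ** (attempt - 1))
    jitter = rng.uniform(0.0, ceiling)
    # Never retry earlier than the provider asked; add jitter on top.
    chosen = (retry_after_s or 0.0) + jitter
    return retry_after_s, (0.0, ceiling), chosen


retry_after_s, jitter_range, delay = plan_delay("7", attempt=3)
record = AttemptRecord(
    route="example.search", attempt=3, status=429,
    quota_owner="tenant:acme", replay_decision="retry_read_only",
    retry_after_s=retry_after_s, jitter_range_s=jitter_range,
    chosen_delay_s=delay,
)
print(json.dumps(asdict(record)))
```

Emitting the record at decision time, before the sleep, means the evidence survives even if the process dies during the wait.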
Copy-paste route card
MCP route:
Caller / tenant:
Operation and side-effect class:
Quota owner:
Credential lane:
Max attempts:
Max elapsed time:
Token / dollar / provider-call cap:
Retry-after / backoff rule:
Idempotency key or replay guard:
Forbidden fallback routes:
Exhaustion denial code:
Receipt fields:
Common misreads
Related Rhumb guides
Fleet-level capacity patterns once multiple agents share provider quotas and retry pressure.
Trace fields and drills for reconstructing tool decisions after the agent moves on.
Turn one repeat route into a caller, credential, budget, denied-neighbor, and receipt proof packet.