Automatic Retries
Customizable backoff retry logic for failed AI provider requests
Introduction
The AI Gateway automatically retries transient errors from AI providers using configurable strategies, improving reliability without overwhelming providers with rapid successive requests.
Retries use smart failure detection, with configurable maximum attempts and backoff timing, and can be configured either globally or per router.
Why Use Retries
- Improve reliability by automatically recovering from transient provider failures
- Handle network issues by retrying requests that fail due to connectivity problems
- Maintain user experience by transparently recovering from failures
- Reduce manual intervention by automatically handling temporary service disruptions
Rate limit handling is automatic: when a provider returns a 429 status code, the AI Gateway removes it from the load-balancing rotation. Retries are for other kinds of failures, such as 5xx errors and network issues.
Quick Start
Create your configuration
Create `ai-gateway-config.yaml` with a basic retry configuration (3 retries with a 50ms constant delay):
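A minimal sketch of what that file might contain. The field names below (`retries`, `strategy`, `delay`, `max-retries`) are illustrative assumptions, not the authoritative schema; see the Configuration Reference for the exact syntax:

```yaml
# ai-gateway-config.yaml (illustrative field names)
retries:
  strategy: constant   # fixed delay between attempts
  delay: 50ms          # wait ~50ms (plus jitter) before each retry
  max-retries: 3       # stop after 3 retry attempts

# ...your router and provider configuration follows here
```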
Start the gateway
Test retries
Send a request to a provider that might have transient failures. The gateway will automatically retry 5xx errors and network issues.
✅ If the provider returns a 5xx error or network issue, the gateway will automatically retry up to 3 times with 50ms delays!
For complete configuration options and syntax, see the Configuration Reference.
Use Cases
Use case: Production API that needs high reliability and can tolerate slightly higher latency for better success rates.
Total retry time: Up to 7 seconds
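One policy that fits this budget, using the same illustrative field names as the Quick Start example (delays of 1s, 2s, and 4s add up to roughly 7 seconds):

```yaml
retries:
  strategy: exponential
  min-delay: 1s       # first retry after ~1s
  max-delay: 4s       # delays double, capped at 4s
  max-retries: 3      # 1s + 2s + 4s = up to ~7s of waiting
```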
Use case: Background batch processing where reliability is more important than latency.
Total retry time: Up to 62 seconds
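A patient policy along these lines could look like the following (again with assumed field names); five doubling delays starting at 2s add up to 62 seconds:

```yaml
retries:
  strategy: exponential
  min-delay: 2s       # first retry after ~2s
  max-delay: 32s      # generous cap for batch workloads
  max-retries: 5      # 2s + 4s + 8s + 16s + 32s = up to ~62s of waiting
```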
Use case: Real-time applications that need quick failure detection with minimal retry overhead.
Total retry time: Up to 1.5 seconds
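A fast-failing policy matching this budget might look like this (illustrative field names):

```yaml
retries:
  strategy: constant
  delay: 500ms        # short, predictable wait between attempts
  max-retries: 3      # 3 x 500ms = up to ~1.5s of waiting
```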
Use case: High-volume processing with moderate reliability requirements.
Total retry time: Up to 8 seconds
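A moderate policy in this spirit (illustrative field names):

```yaml
retries:
  strategy: constant
  delay: 2s           # steady 2s pause between attempts
  max-retries: 4      # 4 x 2s = up to ~8s of waiting
```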
How Retries Work
The AI Gateway automatically retries failed requests using the configured strategy with smart failure detection.
Request Fails
Request fails with a retryable error (5xx server error or network issue)
Wait Period
AI Gateway waits for the calculated backoff period based on the configured strategy
Retry Request
Request is retried with the same parameters to the same provider
Repeat or Return
Process repeats until success or max retries reached
Retry Strategies
Constant Strategy
Fixed delay between retry attempts, with jitter to prevent a thundering herd.
Timing: 50ms → 50ms → 50ms → fail (150ms of total delay)
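Sketched with the same illustrative field names used above:

```yaml
retries:
  strategy: constant
  delay: 50ms         # same base delay before every retry (±25% jitter applied)
  max-retries: 3
```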
Exponential Strategy
Exponentially increasing delays with jitter and configurable bounds.
Timing: 200ms → 400ms → 800ms → 1.6s → 3.2s → fail
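Sketched with illustrative field names; the doubling sequence above assumes a 200ms starting delay capped at 5s:

```yaml
retries:
  strategy: exponential
  min-delay: 200ms    # starting delay; doubles on each attempt
  max-delay: 5s       # upper bound on any single delay
  max-retries: 5      # 200ms, 400ms, 800ms, 1.6s, 3.2s, then fail
```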
What Gets Retried
| Condition | Retried? | Reason | Examples |
|---|---|---|---|
| 5xx Server Errors | ✅ Yes | Temporary provider issues | `500 Internal Server Error`, `502 Bad Gateway`, `503 Service Unavailable`, `504 Gateway Timeout` |
| Network Transport Errors | ✅ Yes | Connection/network problems | Connection refused, timeouts, DNS failures, TLS handshake errors |
| Stream Interruptions | ✅ Yes | Streaming response failures | Stream ended unexpectedly, transport errors during streaming |
| 4xx Client Errors | ❌ No | Request format/auth issues | `400 Bad Request`, `401 Unauthorized`, `403 Forbidden`, `404 Not Found`, `422 Unprocessable Entity` |
| 429 Rate Limits | ❌ No | Handled by load balancing | Provider temporarily removed from rotation based on `Retry-After` header |
| 2xx Success Responses | ❌ No | Request succeeded | `200 OK`, `201 Created`, `202 Accepted` |
| Auth/Config Errors | ❌ No | Setup/configuration issues | Invalid Helicone API keys, missing auth headers, provider not configured |
| Cache/Storage Errors | ❌ No | Persistent storage issues | Cache operation failures, malformed request/response bodies |
Load Balancing Integration
Retries and load balancing work together to maximize reliability:
- Per-request: When a request fails, retries attempt the same provider multiple times
- Per-provider: If a provider keeps failing requests, health monitoring removes it from the load balancer
- Result: New requests automatically go to healthy providers while failed requests still get retried
Jitter and Backoff
All retry strategies include automatic jitter to prevent thundering herd problems:
- Constant strategy: ±25% random variation in delay timing
- Exponential strategy: ±25% random variation at each backoff level

For example, a configured 200ms delay will land anywhere between roughly 150ms and 250ms on a given attempt.
You can override global retry settings for specific routers by adding a retries
section to individual router configurations.
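For example, a patient global policy could be tightened for a latency-sensitive router like this (the router name and field names are illustrative assumptions, not the confirmed schema):

```yaml
# Global default: patient exponential backoff
retries:
  strategy: exponential
  min-delay: 200ms
  max-delay: 5s
  max-retries: 5

routers:
  realtime-chat:
    # Override for this router only: fail fast with short constant delays
    retries:
      strategy: constant
      delay: 100ms
      max-retries: 2
    # ...load balancing and provider settings for this router
```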