Introduction

The AI Gateway automatically retries transient errors from AI providers using configurable strategies, improving reliability without overwhelming providers with rapid successive requests.

Retries use smart failure detection with configurable maximum attempts and backoff timing, and can be configured either globally or per-router.

Why Use Retries

  • Improve reliability by automatically recovering from transient provider failures
  • Handle network issues by retrying requests that fail due to connectivity problems
  • Maintain user experience by transparently recovering from failures
  • Reduce manual intervention by automatically handling temporary service disruptions

Rate limit handling is automatic: when a provider returns a 429 status code, the AI Gateway temporarily removes that provider from the load balancing rotation. Retries are for other types of failures, such as 5xx errors or network issues.

Quick Start

1. Create your configuration

Create ai-gateway-config.yaml with basic retry configuration (3 retries with 50ms constant delay):

global:
  retries:
    strategy: "constant"
    delay: "50ms"
    max-retries: 3
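
The gateway also needs credentials for the providers it forwards requests to. A minimal sketch, assuming provider keys are read from standard environment variables such as OPENAI_API_KEY (check the provider configuration docs for the exact variable names), set before starting the gateway:

export OPENAI_API_KEY=<your-openai-api-key>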

2. Start the gateway

npx @helicone/ai-gateway@latest --config ai-gateway-config.yaml

3. Test retries

Send a request to a provider that might have transient failures. The gateway will automatically retry 5xx errors and network issues:

curl -X POST http://localhost:8080/ai/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

✅ If the provider returns a 5xx error or the request hits a network issue, the gateway automatically retries up to 3 times with 50ms delays!

For complete configuration options and syntax, see the Configuration Reference.

Use Cases

Use case: Production API that needs high reliability and can tolerate slightly higher latency for better success rates.

global:
  retries:
    strategy: "constant"
    delay: "50ms"
    max-retries: 3

Total retry delay: roughly 150ms across 3 retries (about 50ms before each retry, plus jitter), not counting the time spent on the attempts themselves
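
Use case: Background or batch workloads that can tolerate much longer waits in exchange for a higher chance of eventual success. A sketch reusing the exponential strategy values shown later on this page; the exact delays are illustrative:

global:
  retries:
    strategy: "exponential"
    min-delay: "200ms"
    max-delay: "60s"
    max-retries: 5
    factor: 2.0

Total retry delay: roughly 6 seconds across 5 retries (200ms + 400ms + 800ms + 1.6s + 3.2s, plus jitter)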

How Retries Work

The AI Gateway automatically retries failed requests using the configured strategy with smart failure detection.

1. Request fails: the request fails with a retryable error (a 5xx server error or a network issue).

2. Wait period: the AI Gateway waits for the backoff period calculated from the configured strategy.

3. Retry request: the request is retried with the same parameters against the same provider.

4. Repeat or return: the process repeats until the request succeeds or the maximum number of retries is reached.
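
For example, with the Quick Start configuration above (constant strategy, 50ms delay, 3 retries): the initial attempt fails, the gateway waits roughly 50ms and retries, and if the third retry also fails the last error is returned to the client after roughly 150ms of added delay.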

Retry Strategies

Constant Strategy

Fixed delay between retry attempts with jitter to prevent thundering herd.

retries:
  strategy: "constant"
  delay: "50ms"
  max-retries: 3

Timing: ~50ms before each retry (roughly 50ms → 100ms → 150ms of cumulative delay) → fail

Exponential Strategy

Exponentially increasing delays with jitter and configurable bounds.

retries:
  strategy: "exponential"
  min-delay: "200ms"
  max-delay: "60s"
  max-retries: 5
  factor: 2.0

Timing: 200ms → 400ms → 800ms → 1.6s → 3.2s → fail
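
Assuming the delays follow the usual exponential backoff formula, the nth retry waits roughly min-delay × factor^(n-1), capped at max-delay and adjusted by jitter; that matches the 200ms, 400ms, 800ms, 1.6s, 3.2s progression above.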

What Gets Retried

  • 5xx Server Errors: ✅ retried (temporary provider issues). Examples: 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout
  • Network Transport Errors: ✅ retried (connection/network problems). Examples: connection refused, timeouts, DNS failures, TLS handshake errors
  • Stream Interruptions: ✅ retried (streaming response failures). Examples: stream ended unexpectedly, transport errors during streaming
  • 4xx Client Errors: ❌ not retried (request format/auth issues). Examples: 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 422 Unprocessable Entity
  • 429 Rate Limits: ❌ not retried (handled by load balancing). The provider is temporarily removed from rotation based on the Retry-After header
  • 2xx Success Responses: ❌ not retried (request succeeded). Examples: 200 OK, 201 Created, 202 Accepted
  • Auth/Config Errors: ❌ not retried (setup/configuration issues). Examples: invalid Helicone API keys, missing auth headers, provider not configured
  • Cache/Storage Errors: ❌ not retried (persistent storage issues). Examples: cache operation failures, malformed request/response bodies

Load Balancing Integration

Retries and load balancing work together to maximize reliability:

  • Per-request: When a request fails, retries attempt the same provider multiple times
  • Per-provider: If a provider keeps failing requests, health monitoring removes it from the load balancer
  • Result: New requests automatically go to healthy providers while failed requests still get retried

Jitter and Backoff

All retry strategies include automatic jitter to prevent thundering herd problems:

  • Constant strategy: ±25% random variation in delay timing
  • Exponential strategy: ±25% random variation at each backoff level
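
For example, with the 50ms constant delay from the Quick Start, each retry waits somewhere between roughly 37.5ms and 62.5ms rather than exactly 50ms.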

You can override global retry settings for specific routers by adding a retries section to individual router configurations.
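
A minimal sketch of a per-router override, assuming routers are defined under a top-level routers key and named something like my-router (see the Configuration Reference for the exact syntax):

routers:
  my-router:
    retries:
      strategy: "exponential"
      min-delay: "200ms"
      max-delay: "60s"
      max-retries: 5
      factor: 2.0

This router would use the exponential settings above, while any router without its own retries section keeps the global configuration.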