Never worry about provider outages again. The AI Gateway automatically routes your requests to the best available provider, with instant failover when things go wrong.
Currently, only BYOK (Bring Your Own Keys) and passthrough routing are supported. Passthrough Billing (PTB) is coming soon.

The Problem

Using LLMs in production means dealing with:
  • Provider outages that break your app
  • Rate limits that block your users
  • Regional restrictions that limit availability
  • Vendor lock-in that prevents optimization

The Solution

Provider routing gives you access to the same model across multiple providers. When OpenAI goes down, your app automatically switches to Azure or AWS Bedrock. When you hit rate limits, traffic flows to another provider. All without changing your code.

How It Works

1. You request a model - Your app asks for gpt-4o-mini just like normal.
2. Gateway finds providers - The gateway consults the Model Registry to find all providers offering this model.
3. Smart routing - The gateway applies its sorting algorithm (cheapest first), then attempts providers in order.
4. Automatic failover - If a provider fails, the gateway instantly tries the next one.

The result? Your request succeeds even when providers fail.
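The failover step above can be sketched as a simple loop. This is an illustrative sketch, not the gateway's actual implementation; the `tryProvider` callback and result shape are assumptions:

```typescript
// Hypothetical sketch of the gateway's failover loop.
type Provider = string;

interface ProviderResult {
  ok: boolean;
  body?: string;
}

// Try each candidate provider in order until one succeeds.
function routeWithFailover(
  providers: Provider[],
  tryProvider: (p: Provider) => ProviderResult
): ProviderResult & { provider?: Provider } {
  for (const provider of providers) {
    const result = tryProvider(provider);
    // Success: return immediately, recording which provider served it.
    if (result.ok) return { ...result, provider };
    // Failure: fall through to the next provider in the list.
  }
  // Every provider failed.
  return { ok: false };
}
```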

Requesting a Model

The simplest approach lets the gateway handle everything:
// Just specify the model - gateway handles the rest
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }]
});

// Behind the scenes, the gateway tries:
// OpenAI → Azure OpenAI → AWS Bedrock → Others
// Until one succeeds
The gateway only tries providers where you’ve configured API keys. See Provider Setup to add your keys.

Routing Options

Format: model: "gpt-4o-mini"

Best for:
  • Maximum uptime in production
  • Automatic cost optimization
  • Zero-config reliability

How it works: The gateway tries ALL providers offering this model (sorted by cheapest first). Requests almost never fail, because if one provider is down, another takes over instantly.

Example scenario: Your production chat app needs to stay online no matter what. You don't care which provider serves the request as long as it works.

How the Gateway Finds Models

The Model Registry is our source of truth for which providers support which models. This powers intelligent routing.

Two Ways to Access Models

Option 1: Passthrough Billing (PTB) - Coming Soon

Use Helicone’s API keys in supported regions. Zero configuration required.
model: "gpt-4o-mini"  // Will automatically use Helicone's keys
Available in: Major US/EU regions for popular providers (coming soon)

Option 2: Your Own Keys (BYOK)

Add your provider keys in Provider Settings. The gateway uses YOUR keys for all requests.
When you add a provider deployment, ALL models and regions that provider supports become available through your deployment. PTB fallback for reliability is coming soon.
Example: You add an Azure deployment in Brazil. When you request any Azure-supported model:
model: "gpt-4o-mini"  // Uses your Brazil deployment
The gateway uses your configured deployment for all requests.

Passthrough Routing (Unknown Models)

The gateway forwards ANY model/provider combination, even if not in our registry:
// Brand new model
model: "o3-preview/openai"

// Custom fine-tuned model
model: "ft:gpt-3.5-turbo:my-org::abc123/openai"
Important: Unknown models ONLY route through YOUR deployments (BYOK). The provider’s API determines if the model is valid. Cost tracking is best-effort for unknown models.
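The model strings above follow a "model/provider" shape (with an optional deployment segment, shown later in the EU compliance example). A hypothetical parser, assuming a simple split on "/"; the gateway's real parsing rules may differ:

```typescript
// Hypothetical parser for the "model[/provider[/deployment]]" string
// format (illustrative only, not the gateway's actual code).
interface ModelTarget {
  model: string;
  provider?: string;
  deployment?: string;
}

function parseModelString(raw: string): ModelTarget {
  // The model name itself contains no "/", so the first segment is the
  // model, the second (if present) the provider, the third a deployment.
  const [model, provider, deployment] = raw.split("/");
  return { model, provider, deployment };
}
```

Note that fine-tuned model IDs like `ft:gpt-3.5-turbo:my-org::abc123` use `:` internally, so splitting on `/` leaves them intact.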

Smart Routing Algorithm

When multiple deployments are available, the gateway intelligently selects which to use:

Routing Priority

  1. Your deployments (BYOK) - Always tried first
  2. PTB endpoints - Automatic fallback for reliability (coming soon)

Selection Logic

Within each priority level, we:
  1. Sort by cost - Cheapest deployments first
  2. Load balance - If costs are equal or unknown, randomly distribute requests
Example with multiple Azure deployments: You have Azure Brazil + Azure US deployments. Routing order:
  1. Your cheapest deployment (e.g., Brazil if cheaper)
  2. Your other deployments (e.g., US)
  3. Helicone PTB endpoints (coming soon)
This ensures optimal cost while maintaining reliability.
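The selection logic above can be sketched as an ordering function: BYOK deployments before PTB, cheapest first within each tier, and a random tiebreak for load balancing. The `Deployment` shape and field names here are assumptions for illustration:

```typescript
// Illustrative sketch of the routing order described above; not the
// gateway's real types or implementation.
interface Deployment {
  name: string;
  byok: boolean;          // your own keys vs. Helicone PTB
  costPerToken?: number;  // unknown cost sorts last within its tier
}

function orderDeployments(
  deployments: Deployment[],
  random: () => number = Math.random
): Deployment[] {
  // Attach a random value per deployment so equal-cost deployments
  // are shuffled (this is the load-balancing step).
  return deployments
    .map((d) => ({ d, r: random() }))
    .sort((a, b) => {
      // 1. BYOK tier (0) always comes before PTB tier (1).
      const tierA = a.d.byok ? 0 : 1;
      const tierB = b.d.byok ? 0 : 1;
      if (tierA !== tierB) return tierA - tierB;
      // 2. Within a tier, cheapest first; unknown cost sorts last.
      const costA = a.d.costPerToken ?? Infinity;
      const costB = b.d.costPerToken ?? Infinity;
      if (costA !== costB) return costA - costB;
      // 3. Equal (or both unknown) cost: random tiebreak.
      return a.r - b.r;
    })
    .map((x) => x.d);
}
```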

Failover Triggers

The gateway automatically tries the next provider when encountering these errors:
  • 429: Rate limit errors
  • 401: Authentication errors
  • 400: Context length errors
  • 408: Timeout errors
  • 500+: Server errors
The gateway only attempts providers where you have configured API keys (BYOK). When Passthrough Billing (PTB) launches, the gateway will automatically try Helicone's API keys as a fallback.
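The trigger list above amounts to a status-code check; a minimal sketch (a hypothetical helper, not the gateway's actual code):

```typescript
// Sketch of the failover-trigger check: returns true for the HTTP
// status codes listed above, false for everything else.
function shouldFailover(status: number): boolean {
  // 429 rate limit, 401 auth, 400 context length, 408 timeout.
  const retryable = new Set([429, 401, 400, 408]);
  // Any 5xx server error also triggers failover.
  return retryable.has(status) || status >= 500;
}
```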

Real World Examples

Scenario: OpenAI Outage

Your production app uses gpt-4o-mini. OpenAI goes down at 3am.
// Your code doesn't change
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Process this customer request" }]
});
What happens: Gateway automatically fails over to Azure OpenAI, then AWS Bedrock if needed. Your app stays online, customers never notice.

Scenario: Using Azure Credits

Your company has $100k in Azure credits to burn before year-end.
// Prioritize Azure but keep fallback for reliability
const response = await client.chat.completions.create({
  model: "gpt-4o-mini/azure,gpt-4o-mini",  
  messages: messages
});
What happens: Tries your Azure deployment first (using credits), but falls back to other providers if Azure fails. Balances credit usage with reliability.
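The comma-separated model string above reads as an ordered fallback chain: try the pinned Azure target first, then the unpinned model. A hypothetical sketch of that parsing, assuming the same split rules as the single-target format:

```typescript
// Hypothetical parsing of a comma-separated fallback chain like
// "gpt-4o-mini/azure,gpt-4o-mini" into ordered routing targets
// (illustrative only; the gateway's real rules may differ).
interface ChainTarget {
  model: string;
  provider?: string; // undefined means "any provider"
}

function parseFallbackChain(raw: string): ChainTarget[] {
  return raw.split(",").map((entry) => {
    const [model, provider] = entry.trim().split("/");
    return { model, provider };
  });
}
```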

Scenario: EU Compliance Requirements

GDPR requires EU customer data to stay in EU regions.
// Use your custom EU deployment
await client.chat.completions.create({
  model: "gpt-4o/azure/eu-frankfurt-deployment",  // Your CUID
  messages: messages
});
What happens: Requests ONLY go through your Frankfurt deployment. No data leaves the EU.

Next Steps