Ready to unlock the full power of the AI Gateway? This guide will walk you through creating custom routers with load balancing, caching, and rate limiting. You’ll go from basic routing to production-ready configurations.
Prerequisites: Make sure you’ve completed the main quickstart and have the gateway running with your API keys configured.
What Are Routers?
Think of routers as separate “virtual gateways” within your single AI Gateway deployment. Each router has its own:
- URL endpoint - `http://localhost:8080/router/{name}`
- Load balancing strategy - How requests are distributed across providers
- Provider pool - Which LLM providers are available
- Features - Caching, rate limiting, retries, and more
This lets you have different configurations for different use cases - all from one gateway deployment.
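For example, once multiple routers are defined (like the production and development routers configured later in this guide), a client selects one purely by URL path. A minimal sketch:

```typescript
import { OpenAI } from "openai";

// Same gateway deployment, two different routers, selected only by URL path.
// The router names match the ones defined later in this guide.
const prodClient = new OpenAI({
  baseURL: "http://localhost:8080/router/production",
  apiKey: "fake-api-key", // Placeholder until Helicone auth is enabled
});

const devClient = new OpenAI({
  baseURL: "http://localhost:8080/router/development",
  apiKey: "fake-api-key",
});
```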
Create Your First Router
Step 1: Basic Router Setup
Let’s start with a basic router configuration. Create a file called `ai-gateway-config.yaml`:
```yaml
routers:
  my-router:
    load-balance:
      chat:
        strategy: latency
        providers:
          - openai
          - anthropic
```
What this does:

- Creates a router named `my-router`
- Available at `http://localhost:8080/router/my-router`
- Uses latency-based load balancing between OpenAI and Anthropic
- Automatically routes to whichever provider responds fastest
Save the configuration

Save the YAML above as `ai-gateway-config.yaml` in your current directory.
Restart the gateway
```bash
npx @helicone/ai-gateway@latest --config ai-gateway-config.yaml
```
Test your router
```typescript
import { OpenAI } from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:8080/router/my-router",
  apiKey: "fake-api-key", // Required by SDK, but gateway handles real auth
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from my custom router!" }],
});

console.log(response);
```
🎉 Success! Your request was automatically load-balanced between OpenAI and Anthropic and routed to whichever provider responded faster.
Step 2: Add Intelligent Caching
Now let’s add caching to dramatically reduce costs and improve response times:
```yaml
cache-store:
  type: "in-memory"

routers:
  my-router:
    load-balance:
      chat:
        strategy: latency
        providers:
          - openai
          - anthropic
    cache:
      directive: "max-age=3600"
```
What this adds:

- Caches identical requests for 1 hour
- Subsequent identical requests return instantly from cache
- Can reduce costs by 90%+ for repeated requests
Update your configuration

Replace your `ai-gateway-config.yaml` with the configuration above.
Restart the gateway
```bash
npx @helicone/ai-gateway@latest --config ai-gateway-config.yaml
```
Test caching
Make the same request twice and notice the second one is much faster:
```bash
# First request - goes to provider
time curl -X POST http://localhost:8080/router/my-router/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "To be or not to be?"}]}'

# Second request - returns from cache instantly
time curl -X POST http://localhost:8080/router/my-router/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "To be or not to be?"}]}'
```
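If you prefer to check this from Node instead of curl, here is a minimal sketch that times two identical calls; the second should come back from the cache almost instantly:

```typescript
import { OpenAI } from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:8080/router/my-router",
  apiKey: "fake-api-key", // Gateway handles real provider auth
});

// Send the same request twice and compare wall-clock time.
async function timedRequest(label: string) {
  const start = Date.now();
  await openai.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "To be or not to be?" }],
  });
  console.log(`${label}: ${Date.now() - start}ms`);
}

await timedRequest("first request (goes to provider)");
await timedRequest("second request (served from cache)");
```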
Step 3: Rate Limit per Environment
Real applications need different rate limits for different environments. Rate limiting in the AI Gateway works per-API-key, which requires authentication to identify users. Let’s set up Helicone authentication and create production and development routers with appropriate protections.

Authentication Required: Rate limiting is applied per-API-key, so you need Helicone authentication enabled to track and limit requests for different users.
First, get your Helicone API key:
1. Go to Helicone Settings
2. Click “Generate New Key”
3. Copy the key (starts with `sk-helicone-`)
Set it as an environment variable:
```bash
export HELICONE_CONTROL_PLANE_API_KEY="sk-helicone-your-api-key"
```
Now create the configuration with authentication and rate limits:
```yaml
helicone:
  # Set to `features: observability` to enable observability
  features: auth

routers:
  production:
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m # 1000 requests per minute
    load-balance:
      chat:
        strategy: latency
        providers:
          - openai
          - anthropic
    cache:
      directive: "max-age=1800" # 30 minutes for production freshness

  development:
    rate-limit:
      per-api-key:
        capacity: 100
        refill-frequency: 1h # 100 requests per hour for cost safety
    load-balance:
      chat:
        strategy: latency
        providers:
          - openai
          - anthropic
    cache:
      directive: "max-age=7200" # 2 hours to reduce dev costs
```
What this creates:

| Router | Endpoint | Rate Limit | Use Case |
| --- | --- | --- | --- |
| production | `/router/production...` | 1000/min | High-traffic customer requests |
| development | `/router/development...` | 100/hour | Cost-controlled development |
Update configuration

Replace your `ai-gateway-config.yaml` with the multi-environment config above.
Restart the gateway
```bash
npx @helicone/ai-gateway@latest --config ai-gateway-config.yaml
```
Test your routers
Now requests require authentication. Test each environment:
```bash
# Production router
curl -X POST http://localhost:8080/router/production/chat/completions \
  -H "Authorization: Bearer sk-helicone-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Production test"}]}'

# Development router
curl -X POST http://localhost:8080/router/development/chat/completions \
  -H "Authorization: Bearer sk-helicone-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Development test"}]}'
```
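If a key exceeds its limit, further requests are rejected until the bucket refills. Here’s a client-side sketch of handling that; it assumes the gateway signals rejection with a standard HTTP 429 status, and the retry helper is purely illustrative:

```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/router/production",
  apiKey: "sk-helicone-your-api-key", // Your Helicone API key
});

// Illustrative helper: retry a call a few times with linear backoff
// when the router rejects it for exceeding the per-API-key rate limit.
async function withRateLimitRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err: any) {
      // HTTP 429 is assumed here; any other error is rethrown immediately.
      if (err?.status !== 429 || i >= attempts - 1) throw err;
      await new Promise((resolve) => setTimeout(resolve, 1000 * (i + 1)));
    }
  }
}

const response = await withRateLimitRetry(() =>
  client.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Production test" }],
  })
);
console.log(response.choices[0].message.content);
```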
Step 4: Use in Your Applications
Just change the base URL to use different routers. Remember to use your Helicone API key for authentication:
Python (production router shown; use `/router/development` for the development router):
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/router/production",
    api_key="sk-helicone-...",  # Your Helicone API key
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
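And the Node.js equivalent, sketched here to also pick the router from an environment variable (the `NODE_ENV` check is just one possible convention):

```typescript
import { OpenAI } from "openai";

// Pick the router by runtime environment; one convention among many.
const router = process.env.NODE_ENV === "production" ? "production" : "development";

const client = new OpenAI({
  baseURL: `http://localhost:8080/router/${router}`,
  apiKey: "sk-helicone-...", // Your Helicone API key
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
```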
Key Concepts You’ve Learned
In this guide you configured custom routers with latency-based load balancing, response caching, per-API-key rate limiting, and Helicone authentication.
What’s Next?
You now have a solid foundation with custom routers! Here are the next steps to explore:
Deploy Your Router: Learn how to deploy your router to production.