Create Your First Router
Build your first custom router with load balancing, caching, and rate limiting in 5 minutes
Ready to unlock the full power of the AI Gateway? This guide will walk you through creating custom routers with load balancing, caching, and rate limiting. You’ll go from basic routing to production-ready configurations.
Prerequisites: Make sure you’ve completed the main quickstart and have the gateway running with your API keys configured.
What Are Routers?
Think of routers as separate “virtual gateways” within your single AI Gateway deployment. Each router has its own:
- URL endpoint -
http://localhost:8080/router/{name}
- Load balancing strategy - How requests are distributed across providers
- Provider pool - Which LLM providers are available
- Features - Caching, rate limiting, retries, and more
This lets you have different configurations for different use cases - all from one gateway deployment.
Create Your First Router
Step 1: Basic Router Setup
Let’s start with a basic router configuration. Create a file called ai-gateway-config.yaml
:
What this does:
- Creates a router named
my-router
- Available at
http://localhost:8080/router/my-router
- Uses latency-based load balancing between OpenAI and Anthropic
- Automatically routes to whichever provider responds fastest
Save the configuration
Save the YAML above as ai-gateway-config.yaml
in your current directory.
Restart the gateway
Test your router
🎉 Success! Your request was automatically load-balanced between OpenAI and Anthropic based on which responds faster.
Step 2: Add Intelligent Caching
Now let’s add caching to dramatically reduce costs and improve response times:
What this adds:
- Caches identical requests for 1 hour
- Subsequent identical requests return instantly from cache
- Can reduce costs by 90%+ for repeated requests
Update your configuration
Replace your ai-gateway-config.yaml
with the configuration above.
Restart the gateway
Test caching
Make the same request twice and notice the second one is much faster:
Step 3: Rate Limit per Environment
Real applications need different rate limits for different environments. Rate limiting in the AI Gateway works per-API-key, which requires authentication to identify users. Let’s set up Helicone authentication and create production and development routers with appropriate protections.
Authentication Required: Rate limiting is applied per-API-key, so you need Helicone authentication enabled to track and limit requests for different users.
First, get your Helicone API key:
- Go to Helicone Settings
- Click “Generate New Key”
- Copy the key (starts with
sk-helicone-
) - Set it as an environment variable:
Now create the configuration with authentication and rate limits:
What this creates:
Router | Endpoint | Rate Limit | Use Case |
---|---|---|---|
production | /router/production... | 1000/min | High-traffic customer requests |
development | /router/development... | 100/hour | Cost-controlled development |
Update configuration
Replace your ai-gateway-config.yaml
with the multi-environment config above.
Restart the gateway
Test your routers
Now requests require authentication. Test each environment:
Step 4: Use in Your Applications
Just change the base URL to use different routers. Remember to use your Helicone API key for authentication:
Key Concepts You’ve Learned
Click on these if you’d like to dive deeper:
What’s Next?
You now have a solid foundation with custom routers! Here are the next steps to explore:
Deploy Your Router
Learn how to deploy your router to production