Ready to unlock the full power of the AI Gateway? This guide will walk you through creating custom routers with load balancing, caching, and rate limiting. You’ll go from basic routing to production-ready configurations.

Prerequisites: Make sure you’ve completed the main quickstart and have the gateway running with your API keys configured.

What Are Routers?

Think of routers as separate “virtual gateways” within your single AI Gateway deployment. Each router has its own:

  • URL endpoint - http://localhost:8080/router/{name}
  • Load balancing strategy - How requests are distributed across providers
  • Provider pool - Which LLM providers are available
  • Features - Caching, rate limiting, retries, and more

This lets you have different configurations for different use cases - all from one gateway deployment.
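
For example, once two routers are defined, your application picks between them just by changing the path segment in the base URL. A minimal sketch with the OpenAI Node SDK (the router names production and experiments here are only illustrative):

import { OpenAI } from "openai";

// Two clients pointed at two routers in the same gateway deployment.
// "production" and "experiments" are hypothetical router names.
const production = new OpenAI({
  baseURL: "http://localhost:8080/router/production",
  apiKey: "fake-api-key", // placeholder; the gateway holds the real provider keys
});

const experiments = new OpenAI({
  baseURL: "http://localhost:8080/router/experiments",
  apiKey: "fake-api-key",
});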

Create Your First Router

Step 1: Basic Router Setup

Let’s start with a basic router configuration. Create a file called ai-gateway-config.yaml:

routers:
  my-router:
    load-balance:
      chat:
        strategy: latency
        providers:
          - openai
          - anthropic

What this does:

  • Creates a router named my-router
  • Available at http://localhost:8080/router/my-router
  • Uses latency-based load balancing between OpenAI and Anthropic
  • Automatically routes to whichever provider responds fastest
1. Save the configuration

Save the YAML above as ai-gateway-config.yaml in your current directory.

2. Restart the gateway

npx @helicone/ai-gateway@latest --config ai-gateway-config.yaml
3. Test your router

import { OpenAI } from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:8080/router/my-router",
  apiKey: "fake-api-key", // Required by SDK, but gateway handles real auth
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from my custom router!" }],
});

console.log(response);
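
The gateway returns the standard OpenAI-compatible response shape, so you can pull out just the reply text as usual:

console.log(response.choices[0].message.content);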

🎉 Success! Your request was automatically load-balanced between OpenAI and Anthropic based on which responded faster.

Step 2: Add Intelligent Caching

Now let’s add caching to dramatically reduce costs and improve response times:

cache-store:
  in-memory: {}

routers:
  my-router:
    load-balance:
      chat:
        strategy: latency
        providers:
          - openai
          - anthropic
    cache:
      directive: "max-age=3600"

What this adds:

  • Caches identical requests for 1 hour
  • Subsequent identical requests return instantly from cache
  • Can reduce costs by 90%+ for repeated requests
1. Update your configuration

Replace your ai-gateway-config.yaml with the configuration above.

2. Restart the gateway

npx @helicone/ai-gateway@latest --config ai-gateway-config.yaml
3. Test caching

Make the same request twice and notice the second one is much faster:

# First request - goes to provider
time curl -X POST http://localhost:8080/router/my-router/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "To be or not to be?"}]}'

# Second request - returns from cache instantly
time curl -X POST http://localhost:8080/router/my-router/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "To be or not to be?"}]}'

Step 3: Rate Limit per Environment

Real applications need different rate limits for different environments. Rate limiting in the AI Gateway works per-API-key, which requires authentication to identify users. Let’s set up Helicone authentication and create production and development routers with appropriate protections.

Authentication Required: Rate limiting is applied per-API-key, so you need Helicone authentication enabled to track and limit requests for different users.

First, get your Helicone API key:

  1. Go to Helicone Settings
  2. Click “Generate New Key”
  3. Copy the key (starts with sk-helicone-)
  4. Set it as an environment variable:
export HELICONE_CONTROL_PLANE_API_KEY="sk-helicone-your-api-key"

Now create the configuration with authentication and rate limits:

helicone:
  authentication: true
  observability: false # Set to true to enable observability

cache-store:
  in-memory: {}

routers:
  production:
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m # 1000 requests per minute
    load-balance:
      chat:
        strategy: latency
        providers:
          - openai
          - anthropic
    cache:
      directive: "max-age=1800" # 30 minutes for production freshness

  development:
    rate-limit:
      per-api-key:
        capacity: 100
        refill-frequency: 1h # 100 requests per hour for cost safety
    load-balance:
      chat:
        strategy: latency
        providers:
          - openai
          - anthropic
    cache:
      directive: "max-age=7200" # 2 hours to reduce dev costs

What this creates:

Router        Endpoint               Rate Limit   Use Case
production    /router/production     1000/min     High-traffic customer requests
development   /router/development    100/hour     Cost-controlled development
1. Update configuration

Replace your ai-gateway-config.yaml with the multi-environment config above.

2. Restart the gateway

npx @helicone/ai-gateway@latest --config ai-gateway-config.yaml
3. Test your routers

Now requests require authentication. Test each environment:

# Production router
curl -X POST http://localhost:8080/router/production/chat/completions \
  -H "Authorization: Bearer sk-helicone-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Production test"}]}'

# Development router
curl -X POST http://localhost:8080/router/development/chat/completions \
  -H "Authorization: Bearer sk-helicone-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Development test"}]}'

Step 4: Use in Your Applications

Just change the base URL to use different routers. Remember to use your Helicone API key for authentication:

import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/router/production",
    api_key="sk-helicone-..."  # Your Helicone API key
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
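
The same pattern works with the Node SDK. One common approach is to select the router from an environment variable so identical code can target production or development (ROUTER_NAME and HELICONE_API_KEY are illustrative names, not settings the gateway requires):

import { OpenAI } from "openai";

// Pick the router at runtime; defaults to the cheaper development router.
const router = process.env.ROUTER_NAME ?? "development";

const client = new OpenAI({
  baseURL: `http://localhost:8080/router/${router}`,
  apiKey: process.env.HELICONE_API_KEY, // your sk-helicone-... key
});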

Key Concepts You’ve Learned

In this guide you worked with named routers, latency-based load balancing across providers, response caching with a cache store and cache directives, and per-API-key rate limiting for separate environments.

What’s Next?

You now have a solid foundation with custom routers! Here are the next steps to explore:

  • Deploy Your Router - learn how to deploy your router to production