Intelligent request routing across providers with latency-based P2C (power of two choices) and weighted algorithms
1. Create your configuration file `ai-gateway-config.yaml` with latency-based routing (it automatically picks the fastest provider).
2. Ensure your provider API keys are set.
3. Start the gateway.
4. Test load balancing.
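The exact configuration schema depends on your gateway version; the snippet below is only an illustrative sketch of step 1, and the field names (`routing.strategy`, `providers`, `models`) are assumptions rather than the documented schema:

```yaml
# ai-gateway-config.yaml -- illustrative sketch; field names are assumed
routing:
  strategy: latency        # latency-based P2C + PeakEWMA (the default)
providers:
  - name: openai
    api_key: ${OPENAI_API_KEY}
    models: [gpt-4o-mini]
  - name: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    models: [claude-3-5-haiku]
```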
Two strategies are available:

- **Latency-based (P2C + PeakEWMA)** - the default; routes each request to the provider with the best observed latency.
- **Weighted** - splits traffic across providers according to configured weights.
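To make the default strategy concrete, here is a minimal Python sketch of P2C over Peak-EWMA latency scores. The class and function names are illustrative, not the gateway's internal API; the decay constant is an assumed parameter:

```python
import math
import random
import time

class PeakEwma:
    """Peak-EWMA latency estimator (illustrative): latency spikes
    register instantly, then the score decays exponentially back
    toward newer, lower samples."""
    def __init__(self, decay_s: float = 10.0):  # decay window is an assumption
        self.decay_s = decay_s
        self.value = 0.0
        self.last = time.monotonic()

    def observe(self, latency_s: float) -> float:
        now = time.monotonic()
        elapsed = now - self.last
        self.last = now
        if latency_s > self.value:
            self.value = latency_s  # "peak" behavior: jump up immediately
        else:
            w = math.exp(-elapsed / self.decay_s)
            self.value = w * self.value + (1 - w) * latency_s
        return self.value

def p2c_pick(providers: list[str], score: dict[str, float]) -> str:
    """Power-of-two-choices: sample two providers uniformly at random
    and forward to the one with the lower latency score."""
    a, b = random.sample(providers, 2)
    return a if score[a] <= score[b] else b
```

P2C avoids the herd behavior of always picking the global minimum: only two candidates are compared per request, so stale latency estimates cannot stampede all traffic onto one provider.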
How a request flows through the gateway:

1. **Request arrives** - e.g. a completion request for `gpt-4o-mini`.
2. **Provider selection** - the configured strategy picks a provider.
3. **Health check** - unhealthy providers are skipped.
4. **Request forwarded** - the call is proxied to the chosen provider.
5. **Response & learning** - the response is returned and the provider's latency statistics are updated.

Example models served across providers: `gpt-4o-mini`, `claude-3-5-haiku`, `llama3.2`.
| Use Case | Recommended Strategy |
|---|---|
| Production APIs | Latency-based - automatically optimizes for speed |
| Provider migration | Weighted - gradual traffic shifting with instant rollback |
| A/B testing | Weighted - controlled traffic splits for comparison |
| Cost optimization | Weighted - route more traffic to cheaper providers |
| Compliance routing | Multiple AI Gateways - better isolation |
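The weighted rows above (migration, A/B testing, cost optimization) all reduce to the same mechanism: pick a provider with probability proportional to its weight. A minimal sketch, with hypothetical provider names and weights:

```python
import random

def weighted_pick(weights: dict[str, int], rng=random) -> str:
    """Weighted-strategy sketch: choose a provider with probability
    proportional to its configured weight."""
    providers = list(weights)
    return rng.choices(providers,
                       weights=[weights[p] for p in providers],
                       k=1)[0]

# Gradual migration example (assumed weights): keep 90% of traffic on
# the incumbent, shift 10% to the new provider. Rollback is just
# editing the weights back.
migration_weights = {"openai": 90, "anthropic": 10}
```

Because the split is purely probabilistic per request, shifting traffic or rolling back is a configuration change with no connection draining or restarts required.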
| Feature | Description | Version |
|---|---|---|
| Cost-Optimized Strategy | Route to the cheapest equivalent model: picks the provider that offers the same model (or a configured equivalent) at the lowest price | v2 |
| Model-Level Weighted Strategy | Provider- and model-specific weighting: configure weights for provider+model pairs (e.g. `openai/o1` vs `bedrock/claude-3-5-sonnet`) | v2 |
| Tag-based Routing | Header-driven routing: route requests to specific providers and models based on tags passed via request headers | v3 |
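The cost-optimized strategy described in the table can be sketched as a lookup over a price catalog. Everything here is hypothetical (function name, catalog shape, prices); it only illustrates the "same model or configured equivalent, lowest price" rule:

```python
def cheapest_equivalent(model, catalog, equivalents=None):
    """Cost-optimized routing sketch (names and prices hypothetical):
    among providers offering the requested model or one of its
    configured equivalents, return the cheapest (provider, model)."""
    candidates = {model, *(equivalents or {}).get(model, [])}
    options = [(price, provider, m)
               for (provider, m), price in catalog.items()
               if m in candidates]
    if not options:
        raise LookupError(f"no provider offers {model!r}")
    _, provider, m = min(options)  # tuples sort by price first
    return provider, m
```

For example, if `gpt-4o-mini` is configured with `llama3.2` as an equivalent and a local provider prices `llama3.2` lower, the request is routed there instead.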