Complete reference for configuring your LLM Gateway
The AI Gateway is configured through an `ai-gateway-config.yaml` file that defines how requests are routed, load balanced, and processed across different LLM providers.
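The skeleton below is a minimal sketch of such a file. The top-level key names mirror the sections covered in this reference (routers, caching, rate limiting, Helicone integration, telemetry), but they are illustrative assumptions, not an authoritative schema.

```yaml
# ai-gateway-config.yaml -- illustrative skeleton; key names are assumptions,
# not the exact schema
routers:               # named routers, each with its own routing and load-balancing rules
  production: {}
cache-store: {}        # shared response cache settings (max-size, etc.)
rate-limit-store: {}   # where rate-limit state is kept
helicone: {}           # Helicone integration features (auth, observability, prompts)
telemetry: {}          # logging level and exporter configuration
```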
Each named router is served under its own path, for example http://localhost:8080/router/latency or http://localhost:8080/router/weighted.
Load balancing strategies:

- `latency` - Automatic load balancing that routes to the provider with the lowest latency.
- `weighted` - Distributes requests based on specific percentages; weights must sum to 1.0.
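As a sketch, two routers using these strategies might look like the following. A per-router `load-balance` block with `strategy`, `providers`, and `weight` fields is assumed here; the exact field names may differ from the real schema.

```yaml
routers:
  latency:                    # served at /router/latency
    load-balance:
      chat:
        strategy: latency     # route to the lowest-latency provider
        providers:
          - openai
          - anthropic
  weighted:                   # served at /router/weighted
    load-balance:
      chat:
        strategy: weighted
        providers:
          - provider: openai
            weight: '0.7'     # weights must sum to 1.0
          - provider: anthropic
            weight: '0.3'
```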
Typical `max-size` values for the cache:

- `268435456` (256MB worth of entries)
- `1000` (good for development)
- `10000` (good for moderate traffic)
- `536870912` (512MB worth for high traffic)
Sets the default cache duration (`max-age`) and staleness tolerance (`max-stale`) in seconds for all requests. Optionally override these with `cache-control` request headers.
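A hedged sketch combining a cache store size with default freshness directives follows; the `cache-store`, `in-memory`, and `directive` field names are assumptions used for illustration.

```yaml
# Illustrative only; key names are assumptions
cache-store:
  in-memory:
    max-size: 268435456                         # roughly 256MB worth of entries

routers:
  production:
    cache:
      directive: "max-age=3600, max-stale=1800"  # seconds; overridable via cache-control headers
```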
Where rate-limiting state is kept is controlled by the `rate-limit-store` configuration.
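For example, the store backing rate-limit counters might be selected as sketched below. Only the `rate-limit-store` name comes from this reference; the `in-memory` and `redis` shapes are hypothetical.

```yaml
# Hypothetical shape -- only the rate-limit-store key is documented here
rate-limit-store:
  in-memory: {}                         # assumed: keep counters in process memory
  # redis:
  #   url: "redis://localhost:6379"     # assumed alternative for shared state across instances
```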
Retry strategies:

- `constant` - Fixed delay between retry attempts, with jitter
- `exponential` - Exponentially increasing delays with jitter and configurable bounds
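A sketch of a retry block using the exponential strategy; the delay and bound field names are assumptions for illustration.

```yaml
retries:
  strategy: exponential   # or: constant
  min-delay: 100ms        # assumed field names for the configurable bounds
  max-delay: 30s
  max-retries: 5
```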
Helicone integration features:

- `auth` - Enable authentication for secure API access
- `observability` - Enable authentication and request logging to your Helicone dashboard
- `prompts` - Enable authentication and prompt management with `prompt_id` support
- `all` - Enable all Helicone features (authentication, observability, and prompts)
Set the `HELICONE_CONTROL_PLANE_API_KEY` environment variable with your Helicone API key when deploying the AI Gateway.
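Putting this together, enabling full Helicone integration might look like the sketch below; the `features` field name is an assumption.

```yaml
helicone:
  features: all   # one of: auth, observability, prompts, all (assumed field name)
# Requires HELICONE_CONTROL_PLANE_API_KEY to be set in the deployment environment
```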
Logging levels:

- `"info"` - General information for all modules; recommended for production
- `"info,ai_gateway=debug"` - Info for dependencies, debug for the gateway; recommended for development
Telemetry exporters:

- `stdout` - Export telemetry data to standard output (default)
- `otlp` - Export telemetry data to an OTLP collector endpoint
- `both` - Export to both stdout and the OTLP collector
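A development-oriented telemetry sketch combining the level and exporter options above; the `otlp-endpoint` field and the collector address are assumptions.

```yaml
telemetry:
  level: "info,ai_gateway=debug"           # debug for the gateway, info for dependencies
  exporter: both                           # stdout plus OTLP collector
  otlp-endpoint: "http://localhost:4317"   # assumed field for the collector address
```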
Responses include a `helicone-provider` header showing which provider handled the request, for example `helicone-provider: openai` or `helicone-provider: anthropic`.
Responses also include a `helicone-provider-req-id` header showing the provider's request ID, for example `helicone-provider-req-id: req-12345`, which is useful for request tracing.
Provider health monitoring strategies:

- `error-ratio` - Monitor based on error rate thresholds (currently the only option)
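A hedged sketch of how an error-ratio health monitor could be expressed; the surrounding `monitor`/`health` keys and the threshold fields are assumptions, and only the `error-ratio` type comes from this reference.

```yaml
# Hypothetical shape -- only the error-ratio type is documented here
monitor:
  health:
    type: error-ratio
    ratio: 0.15      # assumed: mark a provider unhealthy above a 15% error rate
    window: 60s      # assumed: rolling window over which the ratio is computed
```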