GCRA-based rate limiting with burst capacity and smooth request throttling
Get your Helicone API key (it starts with sk-helicone-)
Create your configuration
Create ai-gateway-config.yaml with authentication and rate limiting:
Start the gateway
Test rate limiting
Per-API-Key Rate Limiting - Default
Request Arrives
Rate Limit Check
Token Consumption
Request Processing
Token Refill
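The five steps above can be sketched as a small GCRA limiter. This is a minimal illustration, not the gateway's actual implementation: the class name GcraLimiter, the parameters capacity and refill_seconds, and the in-process dictionary are all assumptions made for the example. GCRA keeps a single "theoretical arrival time" (TAT) per key rather than a token counter, so consumption and refill both fall out of one timestamp comparison.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GcraLimiter:
    """Hypothetical GCRA limiter: `capacity` requests per `refill_seconds`."""
    capacity: int          # burst size: requests allowed back-to-back
    refill_seconds: float  # time needed to regain the full capacity
    _tat: Dict[str, float] = field(default_factory=dict)  # key -> theoretical arrival time

    def allow(self, key: str, now: float) -> bool:
        # Time that one request "costs"; the smooth rate is 1 / emission_interval.
        emission_interval = self.refill_seconds / self.capacity
        # How far ahead of `now` the TAT may run before requests are refused.
        burst_tolerance = emission_interval * (self.capacity - 1)

        # Rate limit check: an idle key's TAT catches up to `now` (the refill).
        tat = max(self._tat.get(key, now), now)
        if tat - now > burst_tolerance:
            return False  # over the limit; the caller would typically return HTTP 429

        # Token consumption: push the TAT forward by one emission interval.
        self._tat[key] = tat + emission_interval
        return True


limiter = GcraLimiter(capacity=3, refill_seconds=3.0)
burst = [limiter.allow("key-a", now=0.0) for _ in range(4)]  # 3 pass, 4th is refused
after_wait = limiter.allow("key-a", now=1.0)                 # one slot regained after 1 s
```

Passing the clock in as now keeps the sketch deterministic. A real deployment would read a monotonic clock and keep the TAT in the configured store.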
Level | Description | When Applied |
---|---|---|
Global Rate Limits | Application-wide limits across all routers | Checked first as safety net |
Router-Specific Rate Limits | Individual router limits or opt-out | Checked after global limits pass |
In-Memory Storage
Redis Storage
host-url: Redis connection string, in the form redis://[username:password@]host[:port][/database]
connection-timeout: Connection timeout in seconds (default: 5)
Mixed Storage
Feature | Description | Version |
---|---|---|
Database Storage | Persistent rate limiting state with advanced querying capabilities for analytics and compliance | v2 |
Per-End-User Limits | Rate limits applied to end users via Helicone-User-Id header for SaaS user quotas | v1 |
Per-Team Limits | Rate limits applied to teams for budget and governance controls | v2 |
Per-Team-Member Limits | Rate limits applied to individual team members for governance | v2 |
Spend Limits | Cost-based limits that restrict usage based on dollar amounts spent per time period | v2 |
Usage Limits | Token-based limits that restrict usage based on input/output tokens consumed | v2 |