Set custom rate limits for model provider API calls. Control usage by request count, cost, or custom properties to manage expenses and prevent unintended overuse.
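Rate limits control how many requests can be made with your API key within a given time window. For example, you can limit usage to `1000 requests per day` or `60 requests per minute`. Rate limits help prevent abuse and protect your resources from being overwhelmed by excessive traffic.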
To set up a rate limit, add the `Helicone-RateLimit-Policy` header to your request. The policy applies to all requests made with the specified API key.
The header value is a semicolon-separated policy built from these parameters:
Parameter | Description |
---|---|
`quota` (required) | The maximum number of requests (or cents, if `unit` is `cents`) allowed within the time window. |
`time_window` (required) | The length of the window in seconds, set with `w=`. For example, `w=86400` (60 × 60 × 24 = 86400) sets a one-day window. The minimum is 60 seconds. |
`unit` (optional) | Must be `request` or `cents`. Defaults to `request` if omitted. |
`segment` (optional) | Must be `user` or a custom property name, set with `s=`. Defaults to `global` if omitted. The difference is explained in the Filtering By Segments section. |
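The sketch below assembles a policy value from the parameters in the table. `build_rate_limit_policy` is a hypothetical helper, not part of any Helicone SDK, and the `u=` key used for `unit` is an assumption, since the examples on this page only show `w=` and `s=`. It enforces the two documented constraints: a window of at least 60 seconds and a unit of `request` or `cents`.

```python
from typing import Optional

VALID_UNITS = {"request", "cents"}  # documented allowed units
MIN_WINDOW_SECONDS = 60  # documented minimum time window


def build_rate_limit_policy(
    quota: int,
    time_window: int,
    unit: Optional[str] = None,
    segment: Optional[str] = None,
) -> str:
    """Hypothetical helper: return a policy string such as "1000;w=60;s=user"."""
    if time_window < MIN_WINDOW_SECONDS:
        raise ValueError(f"time_window must be at least {MIN_WINDOW_SECONDS} seconds")
    parts = [str(quota), f"w={time_window}"]
    if unit is not None:
        if unit not in VALID_UNITS:
            raise ValueError(f"unit must be one of {sorted(VALID_UNITS)}")
        parts.append(f"u={unit}")  # ASSUMPTION: "u=" is the key for the unit
    if segment is not None:
        parts.append(f"s={segment}")  # "user" or a custom property name
    # Omitting unit and segment leaves the defaults: request and global.
    return ";".join(parts)
```

For example, `build_rate_limit_policy(1000, 60 * 60 * 24, segment="user")` returns `"1000;w=86400;s=user"`, a daily quota for each user.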
Filtering By Segments
The `s=[segment]` parameter sets the scope of the rate limit for requests made with an API key. You can apply rate limits globally, per user, or per value of a custom property:
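- Global: the limit applies to all requests made with the API key combined (e.g. `1000;w=60`).
- User: the limit applies to each user separately (e.g. `1000;w=60;s=user`). Each request must also include the `Helicone-User-Id` header (e.g. `"Helicone-User-Id": "username"`).
- Custom property: the limit applies to each value of a custom property separately (e.g. `1000;w=60;s=[property_name]`). Each request must also include the matching property header (e.g. `"Helicone-Property-{property_name}": "some label"`).

Here are some examples of the `Helicone-RateLimit-Policy` header: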
Rate Limiting Globally
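`"Helicone-RateLimit-Policy": "10000;w=3600"`
This allows 10,000 requests per hour (3600 seconds) across all requests made with the API key. The `s=[segment]` parameter is omitted because the segment defaults to global.
Rate Limiting By User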
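`"Helicone-RateLimit-Policy": "500000;w=86400;s=user"`
This allows 500,000 requests per day (86400 seconds) for each user, identified by the `Helicone-User-Id` header.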
Rate Limiting By Custom Property
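`"Helicone-RateLimit-Policy": "300;w=1800;s=organization"`
This allows 300 requests every 30 minutes (1800 seconds) for each value of the `organization` custom property, sent as the `Helicone-Property-organization` header.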
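A segmented policy only works if each request also identifies the user or property being limited. The hypothetical helper below (not a Helicone API) builds request headers for the three example policies and refuses to return a `user` or custom-property policy without the matching identifying header. It covers only the rate limit headers, not the rest of a Helicone integration such as authentication or the proxy URL.

```python
from typing import Dict, Optional


def segment_of(policy: str) -> str:
    """Return the s= value of a policy string, or "global" if it is absent."""
    for part in policy.split(";")[1:]:  # skip the leading quota
        key, _, value = part.partition("=")
        if key.strip() == "s":
            return value.strip()
    return "global"


def rate_limit_headers(
    policy: str,
    user_id: Optional[str] = None,
    properties: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: request headers for a Helicone rate limit policy."""
    headers = {"Helicone-RateLimit-Policy": policy}
    if user_id is not None:
        headers["Helicone-User-Id"] = user_id
    for name, value in (properties or {}).items():
        headers[f"Helicone-Property-{name}"] = value

    # Check that the segment named in the policy has its identifying header.
    segment = segment_of(policy)
    if segment == "user" and "Helicone-User-Id" not in headers:
        raise ValueError("s=user requires a Helicone-User-Id header")
    if segment not in ("global", "user") and f"Helicone-Property-{segment}" not in headers:
        raise ValueError(f"s={segment} requires a Helicone-Property-{segment} header")
    return headers


# The three example policies from this section:
global_headers = rate_limit_headers("10000;w=3600")
user_headers = rate_limit_headers("500000;w=86400;s=user", user_id="username")
org_headers = rate_limit_headers(
    "300;w=1800;s=organization", properties={"organization": "some label"}
)
```

With the OpenAI Python SDK, a dict like this could be passed per call through the `extra_headers` argument.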
Responses include the following rate limit headers:
- `Helicone-RateLimit-Limit`: The quota for the number of requests allowed in the time window.
- `Helicone-RateLimit-Policy`: The active rate limit policy.
- `Helicone-RateLimit-Remaining`: The remaining quota in the current window.

Rate limiting by other metrics, such as `tokens` and `cost`, is planned. You will also be able to see in the web UI how close your requests, users, and properties are to hitting their rate limits.
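A client can use the returned `Helicone-RateLimit-Limit` and `Helicone-RateLimit-Remaining` headers to slow down before it hits the limit. The sketch below reads them from a response header mapping and flags when less than a chosen fraction of the quota remains. It assumes both values are plain integers. `read_rate_limit_headers`, `should_back_off`, and the 10% threshold are illustrative choices, not part of Helicone.

```python
from typing import Mapping, NamedTuple, Optional


class RateLimitStatus(NamedTuple):
    limit: Optional[int]
    remaining: Optional[int]
    policy: Optional[str]


def read_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Hypothetical helper: extract Helicone rate limit headers from a response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        if value is None:
            return None
        try:
            return int(value)  # ASSUMPTION: the value is a plain integer
        except ValueError:
            return None

    return RateLimitStatus(
        limit=as_int("helicone-ratelimit-limit"),
        remaining=as_int("helicone-ratelimit-remaining"),
        policy=lowered.get("helicone-ratelimit-policy"),
    )


def should_back_off(status: RateLimitStatus, threshold: float = 0.1) -> bool:
    """True when less than `threshold` of the quota is left in the window."""
    if status.limit is None or status.remaining is None or status.limit <= 0:
        return False  # not enough information to decide
    return status.remaining / status.limit < threshold
```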