Retries
Configure Helicone to automatically retry failed LLM requests, overcoming rate limits and server issues using intelligent exponential backoff.
Who can use this feature: Anyone on any plan.
Introduction
Retrying requests is a common best practice when dealing with overloaded servers or hitting rate limits. These issues typically manifest as HTTP status codes 429 (Too Many Requests) and 500 (Internal Server Error).
For more information on error codes, see the OpenAI API error codes documentation.
Why Retries
- Overcoming rate limits and server overload.
- Reducing the load on the server by spacing out attempts with exponential backoff, which increases the likelihood that a subsequent attempt succeeds.
Quick Start
To get started, set the Helicone-Retry-Enabled header to true.
When a retry happens, the request will be logged in Helicone.
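As a minimal sketch, the header can be added to an existing set of request headers like this. The helper name `with_retries` and the placeholder Authorization value are hypothetical and only for illustration; the only part taken from this page is the Helicone-Retry-Enabled header and its value.

```python
def with_retries(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` with Helicone retries switched on."""
    enabled = dict(headers)
    # The value is the string "true", not the Python boolean True.
    enabled["Helicone-Retry-Enabled"] = "true"
    return enabled


# Placeholder header for illustration only; substitute your real request headers.
request_headers = with_retries({"Authorization": "Bearer <YOUR_API_KEY>"})
print(request_headers)
```

The resulting dictionary can then be passed as the headers of whichever HTTP client or SDK sends the request through Helicone.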
Retry Parameters
You can customize the behavior of the retries feature by setting additional headers in your request.
Parameter | Description |
---|---|
helicone-retry-num | Number of retries |
helicone-retry-factor | The exponential backoff factor used to increase the wait time between subsequent retries. The default is usually 2. |
helicone-retry-min-timeout | Minimum timeout (in milliseconds) between retries |
helicone-retry-max-timeout | Maximum timeout (in milliseconds) between retries |
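To see how these parameters might interact, the sketch below computes a wait schedule using a common form of exponential backoff: the wait starts at the minimum timeout, is multiplied by the factor after each attempt, and is capped at the maximum timeout. This formula is an assumption for illustration, not a confirmed description of Helicone's internals, and the default values in `backoff_schedule` (a hypothetical helper) are example numbers rather than Helicone's defaults.

```python
def backoff_schedule(num_retries: int, factor: float = 2.0,
                     min_timeout_ms: int = 1000,
                     max_timeout_ms: int = 10000) -> list[int]:
    """Wait time in milliseconds before each retry (assumed formula)."""
    delays = []
    for attempt in range(num_retries):
        # Grow the wait geometrically, starting from the minimum timeout.
        delay = min_timeout_ms * factor ** attempt
        # Never wait longer than the maximum timeout.
        delays.append(int(min(delay, max_timeout_ms)))
    return delays


print(backoff_schedule(5))  # [1000, 2000, 4000, 8000, 10000]
```

With these example values, the fifth wait would be 16000 ms uncapped, so the maximum timeout clamps it to 10000 ms.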
Header values must be strings. For example, "helicone-retry-num": "3".
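Because every value must be a string, it can help to convert numeric settings in one place. The following is a small sketch; `retry_headers` and its argument names are hypothetical, while the header names are the ones listed in the table above.

```python
def retry_headers(num=None, factor=None, min_timeout_ms=None, max_timeout_ms=None):
    """Build Helicone retry headers, converting every value to a string."""
    settings = {
        "helicone-retry-num": num,
        "helicone-retry-factor": factor,
        "helicone-retry-min-timeout": min_timeout_ms,
        "helicone-retry-max-timeout": max_timeout_ms,
    }
    headers = {"Helicone-Retry-Enabled": "true"}
    for name, value in settings.items():
        # Skip unset parameters so the service falls back to its own defaults.
        if value is not None:
            headers[name] = str(value)
    return headers


print(retry_headers(num=3, factor=2, min_timeout_ms=1000))
```

Parameters left as `None` are omitted, so only the headers you explicitly set are sent with the request.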
Questions?
Questions or feedback? Reach out to help@helicone.ai or schedule a call with us.