Retries
Retrying requests is a common best practice for dealing with overloaded servers and rate limits. These issues typically surface as HTTP status codes 429 (Too Many Requests) and 500 (Internal Server Error). For more information on error codes, see the OpenAI API error codes documentation.
Exponential Backoff
To handle retries effectively, we use a strategy called exponential backoff. Exponential backoff increases the wait time between retries exponentially, which spreads out the request load and gives the server a chance to recover. The wait time is multiplied by a factor (2 by default) after each subsequent retry.
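For example, with the default factor of 2 and a hypothetical 1 second initial wait, successive retries wait roughly 1, 2, 4, then 8 seconds, up to whatever maximum is configured. A minimal shell sketch of that calculation follows; the 1000 ms minimum and 10000 ms maximum below are assumptions chosen for illustration, not Helicone's internal implementation or documented defaults:

# Illustrative sketch of how exponential backoff delays grow.
# Assumes factor 2, a 1000 ms minimum timeout, and a 10000 ms cap.
factor=2
min_timeout=1000   # milliseconds (assumed for this example)
max_timeout=10000  # milliseconds (assumed for this example)
delay=$min_timeout
for attempt in 1 2 3 4 5; do
  echo "retry $attempt: wait ${delay} ms"
  delay=$(( delay * factor ))
  [ "$delay" -gt "$max_timeout" ] && delay=$max_timeout
done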
Quick Start
To get started, set Helicone-Retry-Enabled to true:
curl https://oai.hconeai.com/v1/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_OPENAI_API_KEY' \
  -H 'Helicone-Auth: Bearer YOUR_API_KEY' \
  -H 'Helicone-Retry-Enabled: true' \
  -d '{
    "model": "text-davinci-003",
    "prompt": "How do I enable retries?"
  }'
Advanced Usage
You can customize the behavior of the retries feature by setting additional headers in your request.
| Parameter | Description |
|---|---|
| helicone-retry-num | Number of retries |
| helicone-retry-factor | Exponential backoff factor |
| helicone-retry-min-timeout | Minimum timeout (in milliseconds) between retries |
| helicone-retry-max-timeout | Maximum timeout (in milliseconds) between retries |
"helicone-retry-num": "3"