Retrying requests is a common best practice when dealing with overloaded servers or hitting rate limits. These issues typically manifest as HTTP status codes 429 (Too Many Requests) and 500 (Internal Server Error). For more information on error codes, see the OpenAI API error codes documentation.

Exponential Backoff

To effectively deal with retries, we use a strategy called exponential backoff. Exponential backoff involves increasing the wait time between retries exponentially, which helps to spread out the request load and gives the server a chance to recover. This is done by multiplying the wait time by a factor (default is 2) for each subsequent retry.
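The strategy above can be sketched in a few lines of Python. This is an illustrative client-side retry loop, not Helicone's implementation; the function and parameter names are ours, chosen to mirror the headers described later in this page.

```python
import time


def retry_with_backoff(fn, max_retries=3, factor=2.0,
                       min_timeout=1.0, max_timeout=30.0):
    """Call fn(), retrying on failure with exponentially growing waits.

    The wait starts at min_timeout seconds and is multiplied by
    `factor` after each failed attempt, capped at max_timeout.
    """
    wait = min_timeout
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries; surface the last error
            time.sleep(wait)
            wait = min(wait * factor, max_timeout)
```

With the default factor of 2, the waits grow 1s, 2s, 4s, ... until the cap, which spreads retried requests out instead of hammering an already overloaded server.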

Quick Start

To get started, set the Helicone-Retry-Enabled header to true:

curl https://oai.helicone.ai/v1/completions \
  -H 'Content-Type: application/json' \
  -H 'Helicone-Auth: Bearer YOUR_API_KEY' \
  -H 'Helicone-Retry-Enabled: true' \
  -d '{
    "model": "text-davinci-003",
    "prompt": "How do I enable retries?"
  }'
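The same request can be built in Python. This is a minimal sketch: the headers and body mirror the curl example, and the commented-out URL assumes Helicone's OpenAI proxy endpoint; substitute your own key and endpoint.

```python
import json

# Headers for a completion request routed through Helicone, with retries
# enabled. YOUR_API_KEY is a placeholder.
headers = {
    "Content-Type": "application/json",
    "Helicone-Auth": "Bearer YOUR_API_KEY",
    "Helicone-Retry-Enabled": "true",
}

payload = json.dumps({
    "model": "text-davinci-003",
    "prompt": "How do I enable retries?",
})

# Send with any HTTP client, e.g.:
# requests.post("https://oai.helicone.ai/v1/completions",
#               headers=headers, data=payload)
```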

Advanced Usage

You can customize the behavior of the retries feature by setting additional headers in your request.

helicone-retry-num          Number of retries
helicone-retry-factor       Exponential backoff factor
helicone-retry-min-timeout  Minimum timeout (in milliseconds) between retries
helicone-retry-max-timeout  Maximum timeout (in milliseconds) between retries
Header values must be strings, for example "helicone-retry-num": "3".
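To see how these settings interact, the schedule of waits can be computed as below. Note this is our assumption of how the headers combine (the minimum timeout growing by the factor and capped at the maximum); the page does not spell out the exact formula Helicone uses.

```python
def backoff_schedule(num, factor, min_timeout_ms, max_timeout_ms):
    """Wait (in ms) before each retry: min_timeout grows by `factor`
    per attempt and is capped at max_timeout."""
    waits, wait = [], min_timeout_ms
    for _ in range(num):
        waits.append(min(wait, max_timeout_ms))
        wait *= factor
    return waits
```

For example, with "helicone-retry-num": "3", a factor of 2, a 1000 ms minimum, and a 3000 ms maximum, the waits are 1000 ms, 2000 ms, then 3000 ms (the third wait, 4000 ms, is clamped to the maximum).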