Who can use this feature: Anyone on any plan.


Retrying requests is a common best practice when servers are overloaded or you hit rate limits. These issues typically surface as HTTP status codes 429 (Too Many Requests) and 500 (Internal Server Error).

For more information on error codes, see the OpenAI API error codes documentation.

Why Retries

  • Overcoming rate limits and server overload.
  • Reducing the load on the server, increasing the likelihood of request success on subsequent attempts.

Quick Start

To get started, set Helicone-Retry-Enabled to true.

# Add the Helicone-Retry-Enabled header and set it to true.
curl https://oai.helicone.ai/v1/completions \
  -H 'Content-Type: application/json' \
  -H 'Helicone-Auth: Bearer YOUR_API_KEY' \
  -H 'Helicone-Retry-Enabled: true' \
  -d '{
    "model": "text-davinci-003",
    "prompt": "How do I enable retries?"
  }'

When a retry happens, the request will be logged in Helicone.
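
For readers calling the API from code rather than the shell, the sketch below builds the same request as the curl command using only the Python standard library. The function name build_completion_request is made up for this illustration, and the request is constructed but not sent, so it can be inspected safely.

```python
import json
import urllib.request

# Placeholder taken from the curl example; substitute a real key.
API_KEY = "YOUR_API_KEY"


def build_completion_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps({"model": "text-davinci-003", "prompt": prompt})
    return urllib.request.Request(
        "https://oai.helicone.ai/v1/completions",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Helicone-Auth": f"Bearer {API_KEY}",
            "Helicone-Retry-Enabled": "true",  # opt in to retries
        },
        method="POST",
    )


request = build_completion_request("How do I enable retries?")
# To actually send it: urllib.request.urlopen(request)
```

Any HTTP client should work the same way, since the feature is controlled entirely by request headers.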

Retries Parameters

You can customize the behavior of the retries feature by setting additional headers in your request.

  • helicone-retry-num: Number of retries.
  • helicone-retry-factor: The exponential backoff factor used to increase the wait time between subsequent retries. The default is usually 2.
  • helicone-retry-min-timeout: Minimum timeout (in milliseconds) between retries.
  • helicone-retry-max-timeout: Maximum timeout (in milliseconds) between retries.
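
To get a feel for how these parameters interact, here is a rough sketch of an exponential backoff schedule. It assumes the common formula min(max_timeout, min_timeout * factor^attempt); this page does not spell out the exact calculation Helicone uses, so treat it as an approximation. The function name and the 1000 ms and 10000 ms defaults are illustrative values, not documented Helicone defaults.

```python
from typing import List


def backoff_schedule(
    num_retries: int,
    factor: float = 2.0,           # the page says the default is usually 2
    min_timeout_ms: int = 1000,    # illustrative value, not a documented default
    max_timeout_ms: int = 10000,   # illustrative value, not a documented default
) -> List[int]:
    """Wait time (ms) before each retry, assuming min(max, min * factor**attempt)."""
    return [
        int(min(max_timeout_ms, min_timeout_ms * factor ** attempt))
        for attempt in range(num_retries)
    ]


print(backoff_schedule(5))
# [1000, 2000, 4000, 8000, 10000]
```

Under this assumption the wait doubles on each attempt until the maximum timeout caps it, which is why the last value stays at 10000 instead of reaching 16000.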

Header values have to be strings. For example, "helicone-retry-num": "3".
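
A small helper can take care of the string conversion. The sketch below is only an illustration: the function retry_headers and its argument names are hypothetical, while the header names are the ones listed above.

```python
from typing import Dict, Optional


def retry_headers(
    num: Optional[int] = None,
    factor: Optional[float] = None,
    min_timeout_ms: Optional[int] = None,
    max_timeout_ms: Optional[int] = None,
) -> Dict[str, str]:
    """Return retry headers with every value converted to a string."""
    headers = {"Helicone-Retry-Enabled": "true"}
    optional = {
        "helicone-retry-num": num,
        "helicone-retry-factor": factor,
        "helicone-retry-min-timeout": min_timeout_ms,
        "helicone-retry-max-timeout": max_timeout_ms,
    }
    for name, value in optional.items():
        if value is not None:  # only send the headers that were explicitly set
            headers[name] = str(value)
    return headers


print(retry_headers(num=3, factor=2))
# {'Helicone-Retry-Enabled': 'true', 'helicone-retry-num': '3', 'helicone-retry-factor': '2'}
```

Parameters left unset are omitted from the result, so the service-side defaults apply to them.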


Questions or feedback? Reach out to help@helicone.ai or schedule a call with us.