Introduction

Retrying requests is a common best practice when dealing with overloaded servers or hitting rate limits. These issues typically manifest as HTTP status codes 429 (Too Many Requests) and 500 (Internal Server Error).

For more information on error codes, see the OpenAI API error codes documentation.

Why Retries

  • Overcoming rate limits and server overload.
  • Reducing the load on the server by spacing out attempts, which increases the likelihood that subsequent attempts succeed.

Quick Start

To get started, set the Helicone-Retry-Enabled header to "true" in your request.

When a retry happens, the request will be logged in Helicone.
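
As a minimal sketch of the step above, the snippet below adds the Helicone-Retry-Enabled header to an existing set of request headers. The helper name with_retries, the base headers, and the placeholder API key are illustrative assumptions, not part of Helicone's API; how the headers are attached depends on the HTTP client or SDK you use.

```python
def with_retries(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` with Helicone retries switched on.

    `with_retries` is a hypothetical helper name used only for illustration.
    """
    merged = dict(headers)
    # The value is the string "true", not the boolean True.
    merged["Helicone-Retry-Enabled"] = "true"
    return merged


# Hypothetical base headers; the Authorization value is a placeholder.
base_headers = {
    "Authorization": "Bearer <OPENAI_API_KEY>",
    "Content-Type": "application/json",
}

request_headers = with_retries(base_headers)
```

Returning a copy keeps the original dictionary unchanged, so the same base headers can be reused for requests that should not be retried.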

Retry Parameters

You can customize the behavior of the retries feature by setting additional headers in your request.

Parameter                    Description
helicone-retry-num           Number of retries
helicone-retry-factor        The exponential backoff factor used to increase the wait time between subsequent retries. The default is usually 2.
helicone-retry-min-timeout   Minimum timeout (in milliseconds) between retries
helicone-retry-max-timeout   Maximum timeout (in milliseconds) between retries
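
To get a feel for how these parameters interact, the sketch below computes a possible wait schedule. It assumes the common exponential backoff formula, wait = min(min_timeout * factor^attempt, max_timeout); the exact formula Helicone applies is not stated here, so treat the function backoff_schedule and its output as an illustration rather than a description of Helicone's behavior.

```python
def backoff_schedule(
    num: int, factor: float, min_timeout_ms: int, max_timeout_ms: int
) -> list[int]:
    """Wait time (ms) before each retry under the ASSUMED formula:
    min(min_timeout * factor ** attempt, max_timeout).
    """
    return [
        min(int(min_timeout_ms * factor**attempt), max_timeout_ms)
        for attempt in range(num)
    ]


# Example values chosen for illustration only.
schedule = backoff_schedule(num=4, factor=2, min_timeout_ms=500, max_timeout_ms=3000)
print(schedule)  # [500, 1000, 2000, 3000]
```

Under this assumption, the wait doubles after each attempt until it reaches the maximum timeout, which then caps every later wait.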

Header values must be strings, including numeric ones. For example, "helicone-retry-num": "3".
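
One way to avoid passing numbers by accident is to build the headers from typed settings and convert each value to a string in one place. The following is a sketch under that assumption; the helper name retry_headers and the example values are hypothetical, while the header names come from the table above.

```python
def retry_headers(
    num: int, factor: float, min_timeout_ms: int, max_timeout_ms: int
) -> dict[str, str]:
    """Build the retry headers, converting every value to a string.

    `retry_headers` is a hypothetical helper, not part of any Helicone SDK.
    """
    return {
        "Helicone-Retry-Enabled": "true",
        "helicone-retry-num": str(num),
        "helicone-retry-factor": str(factor),
        "helicone-retry-min-timeout": str(min_timeout_ms),
        "helicone-retry-max-timeout": str(max_timeout_ms),
    }


# Example values chosen for illustration only.
headers = retry_headers(num=3, factor=2, min_timeout_ms=1000, max_timeout_ms=10000)
```

The resulting dictionary can be merged into the headers of whichever HTTP client or SDK you use to send the request.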