Helicone leverages Cloudflare's global network of servers as proxies for efficient web traffic routing. Cloudflare Workers maintain extremely low latency through their worldwide distribution, resulting in a fast and reliable proxy for your LLM requests.

Benchmarking Helicone’s proxy service

Our experiment:

  • We interleaved 500 requests with unique prompts to both OpenAI and Helicone.

  • Both got the same requests in the same 1s window, and we varied which endpoint went first for each request.

  • We maxed out the prompt context window to make these requests as big as possible.

  • We used text-ada-001.

  • We logged the round-trip latency of each request to both endpoints.
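
The steps above can be sketched as a small harness. This is a minimal illustration, not the actual benchmark code: the stub senders below stand in for real HTTP completion calls (which would hit api.openai.com and Helicone's proxy endpoint with your API key and a text-ada-001 payload), and all names here are hypothetical.

```python
import random
import statistics
import time

def measure(send, prompt):
    """Return the round-trip latency, in seconds, of one request."""
    start = time.perf_counter()
    send(prompt)
    return time.perf_counter() - start

def benchmark(send_openai, send_helicone, prompts):
    """Send each prompt to both endpoints back to back, randomizing
    which endpoint goes first, and collect per-endpoint latencies."""
    openai_lat, helicone_lat = [], []
    for prompt in prompts:
        order = [("openai", send_openai), ("helicone", send_helicone)]
        random.shuffle(order)  # vary which endpoint goes first
        for name, send in order:
            lat = measure(send, prompt)
            (openai_lat if name == "openai" else helicone_lat).append(lat)
    return openai_lat, helicone_lat

# Stub senders stand in for real completion calls.
prompts = [f"unique prompt {i}" for i in range(10)]
oai, hel = benchmark(lambda p: time.sleep(0.001),
                     lambda p: time.sleep(0.001), prompts)
print(statistics.stdev(oai), statistics.stdev(hel))
```

Randomizing which endpoint goes first for each prompt keeps ordering effects (caching, connection reuse) from systematically favoring one side.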


Statistic             OpenAI (s)    Helicone (s)
Standard Deviation    1.12          1.12

The metrics are nearly identical, except that Helicone had a few longer-running requests in the right tail.
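
Tail behavior like this is easiest to see with percentiles rather than the standard deviation alone. A minimal sketch (using a simple nearest-rank percentile and toy data, not the benchmark's actual measurements):

```python
import statistics

def summarize(latencies):
    """Summary stats that expose the right tail (p95/p99)."""
    xs = sorted(latencies)
    def pct(p):
        # Nearest-rank percentile over the sorted sample.
        return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]
    return {
        "mean": statistics.mean(xs),
        "stdev": statistics.stdev(xs),
        "p95": pct(95),
        "p99": pct(99),
    }

# Toy data: the second sample has a few slow requests in the right tail.
base = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95] * 20
tail = base[:-2] + [2.5, 3.0]
print(summarize(base)["p99"], summarize(tail)["p99"])
```

Two latency distributions can share similar central statistics while differing visibly at p99, which is why the right tail is worth reporting separately.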

If you have any suggestions about how to benchmark the latency addition to Helicone’s proxy, drop a note in our Discord!