Helicone leverages Cloudflare’s global network of edge servers to proxy your web traffic efficiently. Because Cloudflare Workers run close to users everywhere in the world, they keep added latency extremely low, which makes for a fast and reliable proxy for your LLM requests.

Benchmarking Helicone’s proxy service

Our experiment:

  • We interleaved 500 requests with unique prompts to both OpenAI and Helicone.

  • Both endpoints received each request within the same 1-second window, and we varied which endpoint went first on each request.

  • We maxed out the prompt context window to make these requests as big as possible.

  • We used text-ada-001.

  • We logged the round-trip latency of each request to each endpoint (see the sketch after this list).
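
For concreteness, here is a minimal sketch of how such an interleaved benchmark can be set up. The endpoint URLs, the Helicone-Auth header, and the prompt construction are assumptions for illustration based on Helicone’s docs; the actual harness may differ.

```python
import os
import random
import time

import requests

# Endpoints under test. The Helicone base URL follows their docs at the time
# of writing; check the current documentation before reusing it.
OPENAI_URL = "https://api.openai.com/v1/completions"
HELICONE_URL = "https://oai.hconeai.com/v1/completions"


def timed_request(url: str, prompt: str) -> float:
    """Send one completion request and return its round-trip latency in seconds."""
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    }
    if url == HELICONE_URL:
        # Helicone authenticates the proxy itself with a separate key.
        headers["Helicone-Auth"] = f"Bearer {os.environ['HELICONE_API_KEY']}"
    body = {"model": "text-ada-001", "prompt": prompt, "max_tokens": 16}
    start = time.perf_counter()
    requests.post(url, headers=headers, json=body, timeout=30).raise_for_status()
    return time.perf_counter() - start


latencies = {"openai": [], "helicone": []}
for i in range(500):
    # Unique prompt per iteration; in the real benchmark each prompt was
    # padded to fill as much of the model's context window as possible.
    prompt = f"Benchmark prompt #{i}: " + "lorem ipsum " * 100
    order = [("openai", OPENAI_URL), ("helicone", HELICONE_URL)]
    random.shuffle(order)  # vary which endpoint goes first on each request
    for name, url in order:
        latencies[name].append(timed_request(url, prompt))
```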

Results

Statistic            OpenAI (s)   Helicone (s)
Mean                 2.21         2.21
Median               2.87         2.90
Standard Deviation   1.12         1.12
Min                  0.14         0.14
Max                  3.56         3.76
p10                  0.52         0.52
p90                  3.27         3.29

The metrics are nearly identical; the one visible difference is a slightly heavier right tail for Helicone, where a few requests ran longer.
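
For reference, summary statistics like those in the table can be reproduced from the raw latency lists with the standard library alone. This is a sketch, not the exact script we ran; the nearest-rank percentile rule here is one of several common conventions.

```python
import statistics


def summarize(samples: list[float]) -> dict[str, float]:
    """Compute the table's summary statistics for one endpoint's latencies."""
    xs = sorted(samples)
    pct = lambda p: xs[round(p / 100 * (len(xs) - 1))]  # nearest-rank percentile
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "stdev": statistics.stdev(xs),
        "min": xs[0],
        "max": xs[-1],
        "p10": pct(10),
        "p90": pct(90),
    }


# e.g. summarize(latencies["openai"]) and summarize(latencies["helicone"])
```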

If you have any suggestions about how to benchmark the latency addition to Helicone’s proxy, drop a note in our Discord!