Helicone leverages Cloudflare’s global network of servers as proxies for efficient web traffic routing. Cloudflare Workers maintain extremely low latency through their worldwide distribution, resulting in a fast and reliable proxy for your LLM requests.
## Benchmarking Helicone’s proxy service
We interleaved 500 requests with unique prompts to both OpenAI and Helicone.
Both got the same requests in the same 1s window, and we varied which endpoint went first for each request.
We maxed out the prompt context window to make these requests as big as possible.
We logged the round-trip latency of each request to both endpoints.
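The interleaving methodology above can be sketched as follows. This is a minimal illustration, not Helicone’s actual benchmark harness: `send_a` and `send_b` are hypothetical callables standing in for a request to OpenAI directly and a request routed through Helicone’s proxy (e.g. an OpenAI client with its base URL pointed at the proxy).

```python
import random
import time


def time_roundtrip(send, prompt):
    """Return the wall-clock seconds for one request round trip."""
    start = time.perf_counter()
    send(prompt)
    return time.perf_counter() - start


def benchmark(send_a, send_b, prompts):
    """Interleave requests to two endpoints, randomizing which goes first.

    send_a / send_b are placeholders for the two endpoints under test
    (direct OpenAI vs. via the Helicone proxy) -- assumptions, not
    Helicone's actual harness.
    """
    latencies_a, latencies_b = [], []
    for prompt in prompts:
        pairs = [(send_a, latencies_a), (send_b, latencies_b)]
        random.shuffle(pairs)  # vary which endpoint is hit first
        for send, bucket in pairs:
            bucket.append(time_roundtrip(send, prompt))
    return latencies_a, latencies_b
```

Randomizing the order each round avoids systematically favoring whichever endpoint benefits from warmed connections or caches.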
| Statistic | OpenAI (s) | Helicone (s) |
|---|---|---|
The metrics are almost the same, except that Helicone had a few requests that ran longer at the right tail.
If you have any suggestions about how to benchmark the latency added by Helicone’s proxy, drop a note in our Discord!