How We Calculate Cost
Learn how Helicone calculates the cost per request for nearly all models, including both streamed and non-streamed requests. Detailed explanations and examples provided.
OpenAI Non-Streaming
OpenAI Non-Streaming are requests made to the OpenAI API where the entire response is delivered in a single payload rather than in a series of streamed chunks.
For these non-streaming requests, OpenAI provides a usage
tag in the response, which includes data such as the number of prompt tokens, completion tokens, and total tokens used.
Here is an example of how the usage
tag might look in a response:
"usage": {
"prompt_tokens": 11,
"completion_tokens": 9,
"total_tokens": 20
},
We capture this data, and we estimate the cost based on the model returned in the response body, using OpenAI’s pricing tables.
OpenAI Streaming
Unlike non-streaming requests, OpenAI streaming requests do not return the usage
tag. Therefore, we rely on 3rd party tokenizers like TikToken to count our tokens. The calculation becomes complex when handling Chat messages, as hidden tokens are abstracted away from the user. To accurately estimate the amount of prompt and completion tokens, we have reverse-engineered the hidden tokens.
Anthropic Requests
In the case of Anthropic requests, there is no supported method for calculating tokens in Typescript. So, we have to manually calculate the tokens using a Python server. For more discussion and details on this topic, see our comments in this thread: https://github.com/anthropics/anthropic-sdk-typescript/issues/16
Developer
For a detailed look at how we calculate LLM costs, please follow this link: https://github.com/Helicone/helicone/tree/main/costs
Please note that these methods are based on our current understanding and may be subject to changes in the future as APIs and token counting methodologies evolve.