How We Calculate Cost
OpenAI Non-Streaming
For non-streaming requests, OpenAI responds with a usage object that contains the token counts needed to calculate the cost. The object generally looks like this:
"usage": {
"prompt_tokens": 11,
"completion_tokens": 9,
"total_tokens": 20
},
We capture this data, and using OpenAI’s pricing tables, we estimate the cost based on the model returned in the response body. You can check OpenAI’s pricing tables here: https://openai.com/pricing#language-models
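As a rough illustration of that calculation, here is a minimal sketch. The names `estimateCostUSD`, `ModelPrice`, and `EXAMPLE_PRICES` are hypothetical, and the prices are placeholders rather than real OpenAI rates. It also assumes prices are quoted per 1,000 tokens.

```typescript
// Shape of the usage object returned by non-streaming OpenAI requests.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Hypothetical price entry, in USD per 1,000 tokens.
interface ModelPrice {
  promptPer1K: number;
  completionPer1K: number;
}

// Placeholder values for illustration only; these are NOT real OpenAI prices.
const EXAMPLE_PRICES: Record<string, ModelPrice> = {
  "example-model": { promptPer1K: 0.001, completionPer1K: 0.002 },
};

// Estimate the cost of one request from the model name and usage object.
// Returns null when the model has no entry in the pricing table.
function estimateCostUSD(
  model: string,
  usage: Usage,
  prices: Record<string, ModelPrice> = EXAMPLE_PRICES
): number | null {
  const price = prices[model];
  if (price === undefined) {
    return null;
  }
  const promptCost = (usage.prompt_tokens / 1000) * price.promptPer1K;
  const completionCost = (usage.completion_tokens / 1000) * price.completionPer1K;
  return promptCost + completionCost;
}
```

Prompt and completion tokens are priced separately because many models charge different rates for each, which is why `total_tokens` alone is not enough.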
OpenAI Streaming
Unlike non-streaming requests, OpenAI streaming requests do not return the usage object. Therefore, we rely on third-party tokenizers like TikToken to count the tokens ourselves. The calculation becomes more complex for chat messages, because each message carries hidden formatting tokens that are not visible to the user. To estimate the number of prompt and completion tokens accurately, we have reverse-engineered these hidden tokens.
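To show the general shape of this kind of counting, here is a minimal sketch, not the exact implementation. The tokenizer is passed in as a plain function (for example, a wrapper around a tiktoken port), so no particular library API is assumed. The overhead values in `ASSUMED_OVERHEAD` follow OpenAI's published cookbook guidance for some chat models and are an assumption here; they can differ between models. All function and type names are hypothetical.

```typescript
interface ChatMessage {
  role: string;
  content: string;
  name?: string;
}

// Any function that maps text to a token count.
type TokenCounter = (text: string) => number;

// Hidden tokens added around chat messages (assumed values, model dependent).
interface ChatOverhead {
  tokensPerMessage: number; // wrapper tokens added to every message
  tokensPerName: number;    // extra tokens when a message has a name
  replyPriming: number;     // tokens that prime the assistant's reply
}

const ASSUMED_OVERHEAD: ChatOverhead = {
  tokensPerMessage: 3,
  tokensPerName: 1,
  replyPriming: 3,
};

// Estimate prompt tokens: visible text plus the hidden per-message overhead.
function countPromptTokens(
  messages: ChatMessage[],
  countTokens: TokenCounter,
  overhead: ChatOverhead = ASSUMED_OVERHEAD
): number {
  let total = overhead.replyPriming;
  for (const message of messages) {
    total += overhead.tokensPerMessage;
    total += countTokens(message.role) + countTokens(message.content);
    if (message.name !== undefined) {
      total += overhead.tokensPerName + countTokens(message.name);
    }
  }
  return total;
}

// Minimal shape of a streamed chat chunk: each chunk carries a text delta.
interface StreamChunk {
  choices: { delta: { content?: string | null } }[];
}

// Estimate completion tokens by joining all deltas and tokenizing once.
function countCompletionTokens(
  chunks: StreamChunk[],
  countTokens: TokenCounter
): number {
  const text = chunks
    .map((chunk) => chunk.choices[0]?.delta.content ?? "")
    .join("");
  return countTokens(text);
}
```

The completion text is joined before tokenizing because chunk boundaries do not have to line up with token boundaries, so counting each chunk separately could overcount.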
Anthropic Requests
For Anthropic requests, there is no supported method for counting tokens in TypeScript, so we calculate the tokens ourselves using a Python server. For more discussion and details on this topic, see our comments in this thread: https://github.com/anthropics/anthropic-sdk-typescript/issues/16
Developer
For a detailed look at our pricing table, please follow this link: https://github.com/Helicone/helicone/blob/main/web/lib/sql/constants.ts#L4-L17
Please note that these methods are based on our current understanding and may change as APIs and token counting methodologies evolve.