OpenAI Non-Streaming

For non-streaming requests, OpenAI responds with a usage tag that contains useful data for calculating the cost. The tag generally looks like this:

"usage": {
	"prompt_tokens": 11,
	"completion_tokens": 9,
	"total_tokens": 20
},

We capture this data, and using OpenAI’s pricing tables, we estimate the cost based on the model returned in the response body. You can check OpenAI’s pricing tables here: https://openai.com/pricing#language-models

OpenAI Streaming

Unlike non-streaming requests, OpenAI streaming requests do not return the usage tag. Therefore, we rely on 3rd party tokenizers like TikToken to count our tokens. The calculation becomes complex when handling Chat messages, as hidden tokens are abstracted away from the user. To accurately estimate the amount of prompt and completion tokens, we have reverse-engineered the hidden tokens.

Anthropic Requests

In the case of Anthropic requests, there is no supported method for calculating tokens in Typescript. So, we have to manually calculate the tokens using a Python server. For more discussion and details on this topic, see our comments in this thread: https://github.com/anthropics/anthropic-sdk-typescript/issues/16

Developer

For a detailed look at how we calculate LLM costs, please follow this link: https://github.com/Helicone/helicone/tree/main/costs

Please note that these methods are based on our current understanding and may be subject to changes in the future as APIs and token counting methodologies evolve.