To accurately calculate costs when using streaming features, you can either include a include_usage key in the body of your request or use the Helicone header.

Method 1: Request Body

client.chat.completions.create(
    model="gpt-4o",
    max_tokens=100,
    messages=[
        {"role": "system", "content": "You are a great storyteller."},
        {"role": "user", "content": "Once upon a time in a galaxy far, far away..."}
    ],
    stream=True,
    stream_options={
        "include_usage": True # set to true
    }
)

Method 2: Helicone Header

You can also enable stream usage by adding the helicone-stream-usage header to your request:

client = OpenAI(
    # ...
    headers={
        "helicone-stream-usage": "true"
    }
)

client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)