Streaming
Helicone smoothly integrates streaming functionality and offers benefits that you can’t find with the standard OpenAI package!
Currently, OpenAI doesn’t return usage statistics such as prompt and completion tokens when streaming. However, Helicone overcomes this limitation by estimating these statistics with the help of the gpt3-tokenizer package, which is designed to work with all tokenized OpenAI GPT models.
All you have to do is import helicone, or import openai through the Helicone package, and the rest of your code works as-is.
The following examples work with or without Helicone!
Streaming mode with synchronous requests
In this mode, the request is made synchronously, but the response is streamed.
Streaming mode with asynchronous requests
In this mode, the request is made asynchronously and the response is streamed. You’ll need to use the `await` keyword when calling `openai.ChatCompletion.acreate`, and use an `async for` loop to iterate over the response.
Enhanced Streaming Support
Helicone now provides significantly improved streaming functionality with several key updates:
Stream Fixes and Improvements
We’ve made several improvements to our stream handling across different LLM providers:
- Better handling of stream interruptions and reconnections
- Enhanced error handling for streaming responses
- Improved compatibility with different LLM provider streaming formats
- More reliable token counting for streamed content
- Accurate timing calculations for streamed responses
New Streaming Methods
The `HeliconeManualLogger` class now includes enhanced methods for working with streams:
- `logStream`: Logs a streaming operation with full control over stream handling
- `logSingleStream`: Simplified method for logging a single ReadableStream
- `logSingleRequest`: Logs a single request with a response body
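As a starting point, here is a minimal sketch of instantiating the logger and recording a non-streaming call with `logSingleRequest`. The constructor options and parameter shapes shown are assumptions based on the method descriptions above; the streaming-oriented methods follow the same pattern and appear in the provider examples below.

```typescript
import { HeliconeManualLogger } from "@helicone/helpers";
import OpenAI from "openai";

// Assumed constructor shape: a Helicone API key is the only option used here
const helicone = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY!,
});

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function main() {
  const body = {
    model: "gpt-4o-mini",
    messages: [{ role: "user" as const, content: "Say hello" }],
  };

  const response = await openai.chat.completions.create(body);

  // logSingleRequest pairs the original request body with the response body
  await helicone.logSingleRequest(body, JSON.stringify(response));

  console.log(response.choices[0].message.content);
}

main();
```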
Asynchronous Stream Parser
Our new asynchronous stream parser significantly improves performance when working with streamed responses:
- Processes stream chunks asynchronously for reduced latency
- Provides more reliable token counting for streamed responses
- Accurately captures time-to-first-token metrics
- Efficiently handles multiple concurrent streams
Using the Enhanced Streaming Features
OpenAI Streaming Example
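A minimal sketch of logging an OpenAI stream with `logSingleStream`, assuming it accepts the request body plus a ReadableStream (the model name and prompt are placeholders):

```typescript
import { HeliconeManualLogger } from "@helicone/helpers";
import OpenAI from "openai";

const helicone = new HeliconeManualLogger({ apiKey: process.env.HELICONE_API_KEY! });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function main() {
  const body = {
    model: "gpt-4o-mini",
    messages: [{ role: "user" as const, content: "Write a haiku about streaming" }],
    stream: true as const,
  };

  const response = await openai.chat.completions.create(body);

  // tee() splits the SDK stream: one branch for the app, one for logging
  const [appStream, logBranch] = response.tee();

  // Hand the logging branch to Helicone as a ReadableStream; it is consumed asynchronously
  helicone.logSingleStream(body, logBranch.toReadableStream());

  for await (const chunk of appStream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}

main();
```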
Together AI Streaming Example
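A similar sketch for Together AI, here reusing the OpenAI SDK pointed at Together’s OpenAI-compatible endpoint (the base URL, model name, and `logSingleStream` parameter shape are assumptions):

```typescript
import { HeliconeManualLogger } from "@helicone/helpers";
import OpenAI from "openai";

const helicone = new HeliconeManualLogger({ apiKey: process.env.HELICONE_API_KEY! });

// Together AI exposes an OpenAI-compatible API, so the OpenAI SDK can be reused
const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: "https://api.together.xyz/v1",
});

async function main() {
  const body = {
    model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages: [{ role: "user" as const, content: "Explain streaming in one sentence" }],
    stream: true as const,
  };

  const response = await together.chat.completions.create(body);
  const [appStream, logBranch] = response.tee();

  helicone.logSingleStream(body, logBranch.toReadableStream());

  for await (const chunk of appStream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}

main();
```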
Anthropic Streaming Example
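And a sketch for Anthropic using `@anthropic-ai/sdk`, assuming its stream object exposes `tee()` and `toReadableStream()` in the same way as the OpenAI SDK (model name and parameters are placeholders):

```typescript
import { HeliconeManualLogger } from "@helicone/helpers";
import Anthropic from "@anthropic-ai/sdk";

const helicone = new HeliconeManualLogger({ apiKey: process.env.HELICONE_API_KEY! });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function main() {
  const body = {
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 256,
    messages: [{ role: "user" as const, content: "Write a haiku about rivers" }],
    stream: true as const,
  };

  const response = await anthropic.messages.create(body);
  const [appStream, logBranch] = response.tee();

  helicone.logSingleStream(body, logBranch.toReadableStream());

  for await (const event of appStream) {
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      process.stdout.write(event.delta.text);
    }
  }
}

main();
```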
Calculating Costs with Streaming
For information on how to accurately calculate costs when using streaming features, please refer to our streaming usage guide.
You can enable accurate cost calculation by either:
- Including `stream_options: { include_usage: true }` in your request
- Adding the `helicone-stream-usage: true` header to your request
This ensures that token usage is properly tracked even when using streaming responses.
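For example, with the OpenAI SDK routed through the Helicone gateway, a streamed request with usage tracking might look like the following sketch (the gateway base URL and auth header follow Helicone’s standard OpenAI integration; the model and prompt are placeholders):

```typescript
import OpenAI from "openai";

// Route requests through the Helicone gateway so each request is logged
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    // Alternatively, enable usage tracking for all streamed requests via this header
    // "helicone-stream-usage": "true",
  },
});

async function main() {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Summarize streaming in two sentences" }],
    stream: true,
    // Ask OpenAI to append a final chunk containing token usage for the stream
    stream_options: { include_usage: true },
  });

  for await (const chunk of stream) {
    if (chunk.usage) {
      // The final chunk carries the usage object when include_usage is enabled
      console.log("usage:", chunk.usage);
    } else {
      process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
    }
  }
}

main();
```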
Vercel App Router Integration
When using the Next.js App Router with Vercel, you can use the `after` function to log streaming responses without blocking the response to the client:
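A sketch of what this can look like in a route handler, assuming Next.js 15’s `after` from `next/server` and the `logSingleStream` method described above (route path, model, and request shape are placeholders):

```typescript
// app/api/chat/route.ts
import { after } from "next/server";
import OpenAI from "openai";
import { HeliconeManualLogger } from "@helicone/helpers";

const helicone = new HeliconeManualLogger({ apiKey: process.env.HELICONE_API_KEY! });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(request: Request) {
  const { prompt } = await request.json();

  const body = {
    model: "gpt-4o-mini",
    messages: [{ role: "user" as const, content: prompt }],
    stream: true as const,
  };

  const response = await openai.chat.completions.create(body);
  const [clientStream, logBranch] = response.tee();

  // Runs after the response has been sent, so logging never delays the client
  after(async () => {
    await helicone.logSingleStream(body, logBranch.toReadableStream());
  });

  // Each chunk is forwarded to the client as newline-delimited JSON
  return new Response(clientStream.toReadableStream());
}
```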
This approach ensures that logging doesn’t delay the response to the user, providing the best possible experience while still capturing all the necessary data.
Learn More
For a comprehensive guide on using the Manual Logger with streaming functionality, check out our Manual Logger with Streaming cookbook.