Caching
Reduce latency and save costs by caching on the edge
Caching temporarily stores responses at the edge, closer to the user, which significantly reduces access time and improves application performance. Because repeated requests are served from the edge rather than the model provider, you get lower latency, faster responses, and a cheaper, more efficient development loop.

To get started, set Helicone-Cache-Enabled to true in the headers, or use the Python or NPM packages to turn it on via parameters.

Curl
curl https://oai.hconeai.com/v1/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Helicone-Cache-Enabled: true' \
  -d '{
    "model": "text-davinci-003",
    "prompt": "How do I enable caching?"
  }'
Python

With the package:

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="How do I cache with helicone?",
    cache=True,
)

Without the package:

openai.api_base = "https://oai.hconeai.com/v1"
openai.Completion.create(
    model="text-davinci-003",
    prompt="How do I cache with helicone?",
    headers={
        "Helicone-Cache-Enabled": "true",
    }
)
Node.js

import { Configuration, OpenAIApi } from "openai";

const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
  basePath: "https://oai.hconeai.com/v1",
  baseOptions: {
    headers: {
      "Helicone-Cache-Enabled": "true",
    },
  },
});
const openai = new OpenAIApi(configuration);
The default cache duration is 7 days. If you want a longer cache, refer to the Cache Parameters section below and add the Cache-Control header to your request. The maximum cache duration is 365 days.
Helicone-Cache-Enabled (required): enables storing to and loading from your cache.
Cache-Control (optional): lets you configure the cache based on the Cloudflare Cache Directive; currently only max-age is supported, but more configuration options are coming soon.

Example of setting the cache to 2592000 seconds, i.e. 30 days:

"Cache-Control": "max-age=2592000"
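As a minimal sketch of passing that header per request, using the same proxy setup shown above (the prompt here is just an illustration):

openai.api_base = "https://oai.hconeai.com/v1"
openai.Completion.create(
    model="text-davinci-003",
    prompt="How do I enable caching?",
    headers={
        "Helicone-Cache-Enabled": "true",
        "Cache-Control": "max-age=2592000",  # cache for 30 days
    }
)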
You can increase the size of the cache bucket so that, once n responses have been cached, subsequent requests randomly return one of the previously cached elements in the bucket. Here is an example with a bucket size of 3:
openai.completion("give me a random number") -> "42"  # Cache Miss
openai.completion("give me a random number") -> "47"  # Cache Miss
openai.completion("give me a random number") -> "17"  # Cache Miss
openai.completion("give me a random number") -> randomly chooses "42" | "47" | "17"  # Cache Hit
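Conceptually, the bucket fills up to its max size with fresh responses, after which requests are served at random from it. The following is a minimal local sketch of that behavior, not Helicone's actual implementation:

import random

bucket = []    # cached responses for one prompt
MAX_SIZE = 3   # analogous to Helicone-Cache-Bucket-Max-Size

def cached_completion(call_model):
    if len(bucket) < MAX_SIZE:
        response = call_model()   # cache miss: call the model
        bucket.append(response)   # store the fresh response
        return response
    return random.choice(bucket)  # cache hit: sample from the bucket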
Simply add the Helicone-Cache-Bucket-Max-Size header with a number to set how large you want your bucket to be.

Note: The maximum number of responses you can store within a bucket is 20. If you want more, you will need to upgrade to an enterprise plan.

Python
openai.api_base = "https://oai.hconeai.com/v1"
openai.Completion.create(
    model="text-davinci-003",
    prompt="Say this is a test",
    headers={
        "Helicone-Auth": "Bearer HELICONE_API_KEY",
        "Helicone-Cache-Enabled": "true",
        "Helicone-Cache-Bucket-Max-Size": "5",
    }
)
Typescript

import { Configuration, OpenAIApi } from "openai";

const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
  basePath: "https://oai.hconeai.com/v1",
  baseOptions: {
    headers: {
      "Helicone-Cache-Enabled": "true",
      "Helicone-Cache-Bucket-Max-Size": "5",
    },
  },
});
const openai = new OpenAIApi(configuration);
We allow dynamic settings to configure whether you want to treat your cache as read-only or write-only. For example, when developing locally you might want to only save to the cache without reading from it, while in production you might only want to read from the cache.

You can set either one of these headers in place of Helicone-Cache-Enabled:

// Only saves to the cache
"Helicone-Cache-Save": "true"
// Only reads from the cache, never saves
"Helicone-Cache-Read": "true"