When developing and testing LLM applications, you often make the same requests repeatedly during debugging and iteration. Caching stores responses on the edge using Cloudflare Workers, eliminating redundant API calls and reducing both latency and costs.
All header values must be strings. For example, "Helicone-Cache-Bucket-Max-Size": "10".
Cache Duration
Set how long responses stay cached using the Cache-Control header:
```json
{
  "Cache-Control": "max-age=3600" // 1 hour
}
```
Common durations:
1 hour: max-age=3600
1 day: max-age=86400
7 days: max-age=604800 (default)
30 days: max-age=2592000
Maximum cache duration is 365 days (max-age=31536000)
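These values are just seconds, so a tiny helper can compute them and clamp to the 365-day ceiling. This is a sketch: the `cacheControl` function below is ours, not part of any Helicone SDK; only the header name and `max-age` syntax come from the docs.

```typescript
// Hypothetical helper (not a Helicone API): build a Cache-Control value
// from hours/days and clamp it to the documented 365-day maximum.
function cacheControl(opts: { hours?: number; days?: number }): string {
  const seconds = (opts.hours ?? 0) * 3600 + (opts.days ?? 0) * 86400;
  return `max-age=${Math.min(seconds, 31536000)}`;
}

// cacheControl({ days: 7 }) produces the default "max-age=604800"
```

You could then pass it into your client config, e.g. `defaultHeaders: { "Cache-Control": cacheControl({ days: 1 }) }`.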
Bucket Size
Control how many different responses are stored for the same request:
```json
{
  "Helicone-Cache-Bucket-Max-Size": "3"
}
```
With a bucket size of 3, up to 3 responses are stored for the same request, and a cache hit randomly returns one of them:
```
openai.completion("give me a random number") -> "42"                  # Cache Miss
openai.completion("give me a random number") -> "47"                  # Cache Miss
openai.completion("give me a random number") -> "17"                  # Cache Miss
openai.completion("give me a random number") -> "42" | "47" | "17"    # Cache Hit
```
Maximum bucket size is 20. Enterprise plans support larger buckets.
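Because every header value must be a string and bucket sizes are capped, a small builder can centralize both rules. A sketch with hypothetical names; only the header names and the 1-20 limit come from the docs above.

```typescript
// Hypothetical helper (not a Helicone API): assemble cache headers,
// coercing the numeric bucket size into the required string form.
function cacheHeaders(bucketSize: number): Record<string, string> {
  if (!Number.isInteger(bucketSize) || bucketSize < 1 || bucketSize > 20) {
    throw new RangeError("bucket size must be 1-20 (larger requires Enterprise)");
  }
  return {
    "Helicone-Cache-Enabled": "true",
    "Helicone-Cache-Bucket-Max-Size": String(bucketSize), // values must be strings
  };
}
```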
Cache Seeds
Create separate cache namespaces using seeds:
```json
{
  "Helicone-Cache-Seed": "user-123"
}
```
Different seeds maintain separate cache states:
```
# Seed: "user-123"
openai.completion("random number") -> "42"
openai.completion("random number") -> "42"  # Same response

# Seed: "user-456"
openai.completion("random number") -> "17"  # Different response
openai.completion("random number") -> "17"  # Consistent per seed
```
Change the seed value to effectively clear your cache for testing.
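One simple pattern for this (ours, not prescribed by Helicone) is to embed a version number in the seed, so bumping a single constant gives you a cold cache:

```typescript
// Sketch: versioned cache seeds. Incrementing CACHE_VERSION changes every
// seed, which effectively clears cached responses for all users at once.
const CACHE_VERSION = 1;
const cacheSeed = (userId: string): string => `user-${userId}-v${CACHE_VERSION}`;

// Per request: { headers: { "Helicone-Cache-Seed": cacheSeed("123") } }
```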
Avoid repeated charges while debugging and iterating on prompts:
```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    "Helicone-Cache-Enabled": "true",
    "Cache-Control": "max-age=86400", // Cache for 1 day during development
  },
});

// This request will be cached
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Explain quantum computing" }],
});

// Subsequent identical requests return the cached response instantly
```
Cache explanations for commonly asked code snippets across your team:
```typescript
// Use a consistent identifier for the code snippet
const codeIdentifier = `code-${codeSnippet.length}-${codeSnippet.slice(0, 20)}`;

const response = await openai.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: `Explain this code:\n\n${codeSnippet}` }],
  },
  {
    headers: {
      "Helicone-Cache-Enabled": "true",
      "Helicone-Cache-Seed": codeIdentifier, // Same code = same cache
      "Cache-Control": "max-age=604800", // Cache for 1 week
    },
  }
);

// Multiple developers asking about the same function get instant responses
```
Cache answers to frequently asked questions about your API or product:
```typescript
const response = await openai.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "How do I authenticate with the API?" }],
  },
  {
    headers: {
      "Helicone-Cache-Enabled": "true",
      "Cache-Control": "max-age=86400", // Cache for 24 hours
      "Helicone-Cache-Bucket-Max-Size": "1", // Consistent answers
    },
  }
);

// Common documentation questions get instant, consistent responses
// Perfect for chatbots, help widgets, and FAQ systems
```