Intelligent LLM response caching to reduce costs and improve latency
Create your configuration
ai-gateway-config.yaml
with basic caching (1-hour TTL with 30-minute stale allowance):Start the gateway
Test caching
helicone-cache: HIT
header!helicone-cache: HIT
header!Multiple Responses (Buckets)
Cache Namespacing (Seeds)
Helicone-Cache-Seed
header.Best for: SaaS apps and multi-tenant systems that need user-level isolationHow it works:seed
valueRequest Arrives
Configuration Merge
Cache Key Generation
Cache Lookup
Cache Hit or Miss
helicone-cache: HIT
headerhelicone-cache: MISS
headerLevel | Description | When Applied |
---|---|---|
Request Headers | Per-request cache control via headers | Overrides all other settings |
Router Configuration | Per-router cache policies | Overrides global defaults |
Global Configuration | Application-wide cache defaults | Used as fallback |
Helicone-Cache-Enabled: true/false
- Enable/disable cachingCache-Control: "max-age=3600"
- Override cache directiveHelicone-Cache-Seed: "custom-seed"
- Set cache namespaceHelicone-Cache-Bucket-Max-Size: 5
- Override bucket sizehelicone-cache: HIT/MISS
- Whether response was served from cachehelicone-cache-bucket-idx: 2
- Index of cache bucket used (0-based)In-Memory Storage
Redis
Use Case | Recommended Approach |
---|---|
Production APIs | 1-hour TTL, buckets 1-3 |
Development/Testing | 24-hour TTL, buckets 5-10 |
Creative applications | 30-min TTL, buckets 10+ |
High-traffic systems | Short TTL (≤2 h), buckets 3-5 |
User-specific caching | Seeds for namespace isolation |
Single instance | In-memory storage |
Multiple instances | Redis storage |
Feature | Description | Version |
---|---|---|
Database Storage | Persistent cache storage with advanced analytics and compliance features | v1 |