```env theme={null}
HELICONE_API_KEY=sk-helicone-...
```
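The examples in this guide pass `os.getenv("HELICONE_API_KEY")` straight to `completion`. If the variable is missing or misspelled, that passes `None`, and the problem only appears later as an authentication error. The sketch below fails fast instead. It is a minimal illustration, and `require_env` is a hypothetical helper, not part of LiteLLM or Helicone:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of LiteLLM or Helicone.
    """
    value = os.getenv(name)
    if not value:
        # Stop before any request is sent, with a message that names the variable
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value
```

After calling `load_dotenv()`, you could pass `api_key=require_env("HELICONE_API_KEY")` wherever the examples use `os.getenv(...)`.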

Install the required dependencies:

```bash theme={null}
pip install litellm python-dotenv
```

Add the `helicone/` prefix to any model name to route requests through Helicone and log them:

```python theme={null}
import os

from litellm import completion
from dotenv import load_dotenv

load_dotenv()

# Route through Helicone by adding the "helicone/" prefix
response = completion(
    model="helicone/gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    api_key=os.getenv("HELICONE_API_KEY"),
)

print(response.choices[0].message.content)
```
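The `helicone/` prefix is a plain string convention on the `model` argument. The Provider Selection and Fallback section extends it with an optional `/provider` suffix and comma-separated fallbacks. The sketch below assembles such strings, assuming exactly the format used in this guide's examples (`helicone/<model>[/<provider>][,<model>[/<provider>]...]`). The function name `helicone_model` is hypothetical and not part of LiteLLM:

```python
from typing import List, Optional, Sequence, Tuple

# A (model, provider) pair; provider=None leaves provider choice to the gateway
Target = Tuple[str, Optional[str]]


def helicone_model(targets: Sequence[Target]) -> str:
    """Build a LiteLLM model string that routes through Helicone.

    Assumes the format shown in this guide:
        helicone/<model>[/<provider>][,<model>[/<provider>]...]
    With several entries, the first is tried first and later ones act as fallbacks.
    """
    if not targets:
        raise ValueError("at least one model is required")
    parts: List[str] = []
    for model, provider in targets:
        if not model:
            raise ValueError("model name must be non-empty")
        # Append "/<provider>" only when a provider is pinned
        parts.append(f"{model}/{provider}" if provider else model)
    # The "helicone/" prefix appears once, at the start of the whole chain
    return "helicone/" + ",".join(parts)


# These reproduce the model strings used elsewhere on this page
print(helicone_model([("gpt-4o", None)]))
print(helicone_model([("claude-4.5-sonnet", "anthropic")]))
print(helicone_model([("gpt-4o", "openai"), ("claude-4.5-sonnet", "anthropic")]))
```

The three calls print `helicone/gpt-4o`, `helicone/claude-4.5-sonnet/anthropic`, and `helicone/gpt-4o/openai,claude-4.5-sonnet/anthropic`, so `completion(model=helicone_model([("gpt-4o", None)]), ...)` is equivalent to the call above.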

## Complete Working Examples

### Basic Completion

```python theme={null}
import os

from litellm import completion
from dotenv import load_dotenv

load_dotenv()

# Simple completion
response = completion(
    model="helicone/gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a fun fact about space"}],
    api_key=os.getenv("HELICONE_API_KEY"),
)

print(response.choices[0].message.content)
```

### Streaming Responses

```python theme={null}
import os

from litellm import completion
from dotenv import load_dotenv

load_dotenv()

# Streaming example: the response is consumed chunk by chunk as it is generated
response = completion(
    model="helicone/claude-4.5-sonnet",
    messages=[{"role": "user", "content": "Write a short story about a robot learning to paint"}],
    stream=True,
    api_key=os.getenv("HELICONE_API_KEY"),
)

print("🤖 Assistant (streaming):")
for chunk in response:
    # Skip chunks whose delta carries no text
    content = getattr(chunk.choices[0].delta, "content", None)
    if content:
        print(content, end="", flush=True)
print("\n")
```

### Custom Properties and Session Tracking

Add metadata to track and filter your requests:

```python theme={null}
import os

from litellm import completion
from dotenv import load_dotenv

load_dotenv()

response = completion(
    model="helicone/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather like?"}],
    api_key=os.getenv("HELICONE_API_KEY"),
    metadata={
        # Session headers group related requests into one session
        "Helicone-Session-Id": "session-abc-123",
        "Helicone-Session-Name": "Weather Assistant",
        # Associates the request with one of your users
        "Helicone-User-Id": "user-789",
        # Helicone-Property-* headers add custom properties you can filter on
        "Helicone-Property-Environment": "production",
        "Helicone-Property-App-Version": "2.1.0",
        "Helicone-Property-Feature": "weather-query",
    },
)

print(response.choices[0].message.content)
```

## Provider Selection and Fallback

Helicone's AI Gateway can choose a provider for you, use a provider you pin, or fail over between providers:

```python theme={null}
import os

from litellm import completion
from dotenv import load_dotenv

load_dotenv()

# Automatic routing (cheapest provider)
response = completion(
    model="helicone/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("HELICONE_API_KEY"),
)

# Manual provider selection: append "/<provider>" to the model name
response = completion(
    model="helicone/claude-4.5-sonnet/anthropic",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("HELICONE_API_KEY"),
)

# Multiple provider fallback chain (comma-separated)
# Try OpenAI first, then Anthropic if it fails
response = completion(
    model="helicone/gpt-4o/openai,claude-4.5-sonnet/anthropic",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("HELICONE_API_KEY"),
)
```

## Advanced Features

### Caching

Enable caching to reduce costs and latency for repeated requests:

```python theme={null}
import os

from litellm import completion
from dotenv import load_dotenv

load_dotenv()

# Enable caching for this request
response = completion(
    model="helicone/gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    api_key=os.getenv("HELICONE_API_KEY"),
    metadata={"Helicone-Cache-Enabled": "true"},
)
print(response.choices[0].message.content)

# Subsequent identical requests will be served from cache
response2 = completion(
    model="helicone/gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    api_key=os.getenv("HELICONE_API_KEY"),
    metadata={"Helicone-Cache-Enabled": "true"},
)
print(response2.choices[0].message.content)
```

### Rate Limiting

Apply a rate limit policy to cap how many requests are allowed in a time window:

```python theme={null}
import os

from litellm import completion
from dotenv import load_dotenv

load_dotenv()

response = completion(
    model="helicone/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_key=os.getenv("HELICONE_API_KEY"),
    metadata={
        # Policy format "<quota>;w=<window in seconds>": 100 requests per 60 seconds
        "Helicone-RateLimit-Policy": "100;w=60",
    },
)

print(response.choices[0].message.content)
```

## Related Documentation

- Learn about Helicone's AI Gateway features and capabilities
- Configure intelligent routing and automatic failover
- Browse all available models and providers
- Add metadata to track and filter your requests
- Track multi-turn conversations and user sessions
- Configure rate limits for your applications
- Reduce costs and latency with intelligent caching
- Official LiteLLM documentation