Introduction

LiteLLM is an open-source library that provides a unified interface for calling LLM APIs across many providers.

Integration Steps

1. Create a Helicone account and generate an API key.

2. Set your Helicone API key as an environment variable:

HELICONE_API_KEY=sk-helicone-...

3. Install LiteLLM and python-dotenv:

pip install litellm python-dotenv

4. Use LiteLLM with Helicone

Add the helicone/ prefix to any model name to log requests with Helicone:
import os
from litellm import completion
from dotenv import load_dotenv

load_dotenv()

# Route through Helicone by adding "helicone/" prefix
response = completion(
    model="helicone/gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    api_key=os.getenv("HELICONE_API_KEY")
)

print(response.choices[0].message.content)

5. While you're here, why not give us a star on GitHub? It helps us a lot!

Complete Working Examples

Basic Completion

import os
from litellm import completion
from dotenv import load_dotenv

load_dotenv()

# Simple completion
response = completion(
    model="helicone/gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a fun fact about space"}],
    api_key=os.getenv("HELICONE_API_KEY")
)

print(response.choices[0].message.content)
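
The response object follows the OpenAI response format, so token usage is reported on it as well. A minimal sketch, assuming the usage field is populated the way the OpenAI SDK populates it:
# Inspect the token usage reported for this request
usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")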

Streaming Responses

import os
from litellm import completion
from dotenv import load_dotenv

load_dotenv()

# Streaming example
response = completion(
    model="helicone/claude-4.5-sonnet",
    messages=[{"role": "user", "content": "Write a short story about a robot learning to paint"}],
    stream=True,
    api_key=os.getenv("HELICONE_API_KEY")
)

print("🤖 Assistant (streaming):")
for chunk in response:
    if hasattr(chunk.choices[0].delta, 'content') and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print("\n")

Custom Properties and Session Tracking

Add metadata to track and filter your requests:
import os
from litellm import completion
from dotenv import load_dotenv

load_dotenv()

response = completion(
    model="helicone/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather like?"}],
    api_key=os.getenv("HELICONE_API_KEY"),
    metadata={
        "Helicone-Session-Id": "session-abc-123",
        "Helicone-Session-Name": "Weather Assistant",
        "Helicone-User-Id": "user-789",
        "Helicone-Property-Environment": "production",
        "Helicone-Property-App-Version": "2.1.0",
        "Helicone-Property-Feature": "weather-query"
    }
)

print(response.choices[0].message.content)
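
Reusing the same Helicone-Session-Id across related calls groups them into a single session in Helicone. A minimal sketch of a follow-up request in the same session, using only the metadata keys shown above (the conversation content is illustrative):
# Follow-up request in the same session: reuse the session metadata
followup = completion(
    model="helicone/gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What's the weather like?"},
        {"role": "assistant", "content": response.choices[0].message.content},
        {"role": "user", "content": "And tomorrow?"}
    ],
    api_key=os.getenv("HELICONE_API_KEY"),
    metadata={
        "Helicone-Session-Id": "session-abc-123",
        "Helicone-Session-Name": "Weather Assistant",
        "Helicone-User-Id": "user-789"
    }
)

print(followup.choices[0].message.content)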

Provider Selection and Fallback

Helicone’s AI Gateway supports automatic failover between providers:
import os
from litellm import completion
from dotenv import load_dotenv

load_dotenv()

# Automatic routing (cheapest provider)
response = completion(
    model="helicone/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("HELICONE_API_KEY")
)

# Manual provider selection
response = completion(
    model="helicone/claude-4.5-sonnet/anthropic",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("HELICONE_API_KEY")
)

# Multiple provider fallback chain
# Try OpenAI first, then Anthropic if it fails
response = completion(
    model="helicone/gpt-4o/openai,claude-4.5-sonnet/anthropic",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("HELICONE_API_KEY")
)
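
If every provider in the fallback chain fails, the call raises an exception like any other LiteLLM request. A minimal sketch of guarding the call, assuming you prefer to degrade gracefully rather than let the error propagate:
# Handle the case where all providers in the chain fail
try:
    response = completion(
        model="helicone/gpt-4o/openai,claude-4.5-sonnet/anthropic",
        messages=[{"role": "user", "content": "Hello!"}],
        api_key=os.getenv("HELICONE_API_KEY")
    )
    print(response.choices[0].message.content)
except Exception as e:
    # All providers failed (or another request error occurred)
    print(f"Request failed: {e}")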

Advanced Features

Caching

Enable caching to reduce costs and latency for repeated requests:
import os
from litellm import completion
from dotenv import load_dotenv

load_dotenv()

# Enable caching for this request
response = completion(
    model="helicone/gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    api_key=os.getenv("HELICONE_API_KEY"),
    metadata={
        "Helicone-Cache-Enabled": "true"
    }
)

print(response.choices[0].message.content)

# Subsequent identical requests will be served from cache
response2 = completion(
    model="helicone/gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    api_key=os.getenv("HELICONE_API_KEY"),
    metadata={
        "Helicone-Cache-Enabled": "true"
    }
)

print(response2.choices[0].message.content)
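
Cache hits normally return much faster than fresh completions. One way to confirm the cache is working is to time both calls; a small sketch using only the standard library (exact timings will vary):
import time

def timed_completion():
    # Identical request with caching enabled, so repeats can be served from cache
    start = time.perf_counter()
    result = completion(
        model="helicone/gpt-4o",
        messages=[{"role": "user", "content": "What is 2+2?"}],
        api_key=os.getenv("HELICONE_API_KEY"),
        metadata={"Helicone-Cache-Enabled": "true"}
    )
    return result, time.perf_counter() - start

first, first_seconds = timed_completion()
second, second_seconds = timed_completion()
print(f"First call:  {first_seconds:.2f}s")
print(f"Second call: {second_seconds:.2f}s (cached responses usually return faster)")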

Rate Limiting

Apply rate limiting policies to control request rates:
import os
from litellm import completion
from dotenv import load_dotenv

load_dotenv()

response = completion(
    model="helicone/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_key=os.getenv("HELICONE_API_KEY"),
    metadata={
        "Helicone-Rate-Limit-Policy": "basic-100"
    }
)

print(response.choices[0].message.content)
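
When a request exceeds the policy, it is rejected rather than queued. A minimal sketch of handling that case, assuming LiteLLM surfaces the rejection as its OpenAI-style RateLimitError (catch a generic exception if your version maps it differently):
from litellm.exceptions import RateLimitError

try:
    response = completion(
        model="helicone/gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        api_key=os.getenv("HELICONE_API_KEY"),
        metadata={"Helicone-Rate-Limit-Policy": "basic-100"}
    )
    print(response.choices[0].message.content)
except RateLimitError:
    # The rate limit policy rejected this request; back off and retry later
    print("Rate limit exceeded for policy 'basic-100'; retrying later")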