What is Proxy Integration?
Using Helicone as a proxy allows your LiteLLM requests to flow through Helicone’s infrastructure before reaching the LLM provider. This enables powerful features like:
- Request Caching - Save money by reusing identical requests
- Rate Limiting - Control your API usage
- Retries - Automatically retry failed requests
- Advanced Logging - Capture detailed metrics and request/response payloads
Unlike callback-based integration, proxy integration works at the network level and can provide more functionality for supported providers.
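These features are opt-in and controlled per request through Helicone headers. A minimal sketch (the header names follow Helicone's documented conventions; confirm the exact values in the Helicone docs before relying on them):

import os
import litellm

litellm.api_base = "https://oai.helicone.ai/v1"
litellm.headers = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Cache-Enabled": "true",           # reuse responses for identical requests
    "Helicone-Retry-Enabled": "true",           # automatically retry failed requests
    "Helicone-RateLimit-Policy": "100;w=3600",  # e.g. 100 requests per hour
}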
OpenAI and Azure Integration
Helicone offers native proxy support for OpenAI and Azure through LiteLLM. This is the simplest integration method.
Implementation Steps
1. Install dependencies
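Only the litellm package is needed for this path:

pip install litellm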
2. Configure LiteLLM to use the Helicone proxy
import os
import litellm

# Point LiteLLM at Helicone's OpenAI proxy and authenticate with your Helicone API key
litellm.api_base = "https://oai.helicone.ai/v1"
litellm.headers = {"Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}"}
3. Make API calls as usual
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "how does a court case get to the Supreme Court?"}],
    metadata={
        # Keys prefixed with Helicone-Property- are logged as custom properties
        "Helicone-Property-Hello": "World"
    }
)
print(response)
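The response follows the OpenAI response format, so the generated text is available at response.choices[0].message.content, and the request should appear in your Helicone dashboard with the Hello property attached.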
Gemini Integration
Gemini integration requires a special approach because LiteLLM serves Gemini requests through its Vertex AI code path. We need to apply a monkey patch alongside LiteLLM's Router to route requests through Helicone correctly.
Why a Monkey Patch?
Gemini’s API structure differs from OpenAI’s, and LiteLLM’s default proxy handling doesn’t properly route Gemini requests through Helicone. The patch modifies LiteLLM’s internal URL handling to correctly work with Helicone’s Gemini proxy endpoints.
Implementation Steps
1. Install dependencies

pip install litellm

(asyncio is part of Python's standard library and does not need to be installed separately.)
2. Set up environment variables
import os

HELICONE_API_KEY = os.getenv("HELICONE_API_KEY")
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")

if not HELICONE_API_KEY or not GEMINI_API_KEY:
    print("Error: HELICONE_API_KEY and GEMINI_API_KEY must be set.")
3. Configure the LiteLLM Router
import litellm
from litellm.router import Router

MODEL_NAME = "gemini-1.5-flash-latest"

router = Router(
    model_list=[
        {
            # Alias used when calling the router
            "model_name": "gemini-hconeai",
            "litellm_params": {
                "model": f"gemini/{MODEL_NAME}",
                # Send requests to Helicone's gateway instead of Google directly
                "api_base": f"https://gateway.helicone.ai/v1beta/models/{MODEL_NAME}:generateContent?key={GEMINI_API_KEY}",
                "extra_headers": {
                    "Content-Type": "application/json",
                    "Helicone-Auth": f"Bearer {HELICONE_API_KEY}",
                    # Tells the gateway where to forward the request
                    "Helicone-Target-Url": "https://generativelanguage.googleapis.com"
                },
                "custom_llm_provider": "gemini"
            }
        }
    ]
)
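Here gateway.helicone.ai acts as Helicone's generic gateway: it logs the request and forwards it to the upstream named by the Helicone-Target-Url header, in this case Google's Generative Language API.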
4. Apply the monkey patch
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode
from litellm.llms.vertex_ai.vertex_llm_base import VertexBase

# Keep a reference to the original implementation so non-Helicone requests still work
_original_check_proxy = VertexBase._check_custom_proxy

def fixed_check_custom_proxy(
    self, api_base: str | None, custom_llm_provider: str,
    gemini_api_key: str | None, endpoint: str, stream: bool | None,
    auth_header: str | None, url: str
) -> tuple[str | None, str]:
    # Only intercept Gemini requests that are routed through the Helicone gateway
    is_helicone_gemini_proxy = (
        api_base is not None
        and "gateway.helicone.ai" in api_base
        and custom_llm_provider == "gemini"
    )
    if is_helicone_gemini_proxy:
        # Use the configured api_base verbatim; the API key is already in its query string
        final_url = api_base
        auth = None
        if stream:
            # Streaming responses require the alt=sse query parameter
            parsed = urlparse(final_url)
            qs = parse_qs(parsed.query)
            qs['alt'] = ['sse']
            final_url = urlunparse((parsed.scheme, parsed.netloc, parsed.path, parsed.params, urlencode(qs, doseq=True), parsed.fragment))
        return (auth, final_url)
    elif _original_check_proxy:
        # Fall back to LiteLLM's default behavior for everything else
        return _original_check_proxy(
            self, api_base, custom_llm_provider, gemini_api_key,
            endpoint, stream, auth_header, url
        )
    else:
        return (auth_header, url)

VertexBase._check_custom_proxy = fixed_check_custom_proxy
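Because the patch replaces the method on the VertexBase class itself, it takes effect for routers created before or after this point; it only needs to run once, before the first request is made.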
5. Make API calls
import asyncio

async def main():
    response = await router.acompletion(
        model="gemini-hconeai",  # the alias defined in the Router's model_list
        messages=[{"role": "user", "content": "Say 'Hello World'"}]
    )
    print(f"Response: {response.choices[0].message.content}")

if __name__ == "__main__":
    asyncio.run(main())
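Since the patch also handles the streaming case (appending alt=sse to the URL), streaming through the same route should work as well. A minimal sketch, assuming LiteLLM's standard OpenAI-style streaming interface:

async def stream_demo():
    stream = await router.acompletion(
        model="gemini-hconeai",
        messages=[{"role": "user", "content": "Count to five."}],
        stream=True  # exercises the alt=sse branch of the patch
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

asyncio.run(stream_demo())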
Using with Other Providers
For LLM providers beyond OpenAI, Azure, and Gemini, the integration approach varies by provider:
- Each provider has a specific Helicone proxy URL format (e.g., OpenAI uses oai.helicone.ai/v1)
- Some providers require the Helicone-Target-Url header, while others don't
# Example pattern (will vary by provider)
litellm.api_base = "<provider-specific-helicone-endpoint>"
litellm.headers = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    # Additional headers may be required depending on the provider
}
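For instance, Anthropic has a dedicated Helicone proxy domain; a sketch of what that configuration might look like (the endpoint and model name here are illustrative, so verify both against Helicone's and Anthropic's docs):

# Hypothetical Anthropic example -- verify the endpoint before use
litellm.api_base = "https://anthropic.helicone.ai"
litellm.headers = {"Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}"}

response = litellm.completion(
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Hello"}],
)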
Consult the Helicone documentation for provider-specific proxy endpoints and required headers. If your provider isn’t explicitly supported, reach out to the Helicone team for guidance.
For callback-based integration with LiteLLM, see our LiteLLM Callbacks Integration guide.