LLM Security

Introduction

Generative AI is quickly changing the cybersecurity landscape. Helicone provides built-in security measures powered by Meta’s state-of-the-art security models to protect your LLM applications.

Adversarial Instructions

Indirect Injection

Data Exfiltration

Phishing

Security Implementation

Helicone’s LLM security is powered by two advanced models from Meta:

Prompt Guard (86M): A specialized model for detecting:
- Direct prompt injections
- Indirect/embedded malicious instructions
- Jailbreak attempts
- Multi-language attacks (supports 8 languages)

Advanced Security Analysis: Optional deeper security analysis using Meta’s Llama Guard (3.8B) for comprehensive threat detection across 14 categories:

Category	Description
Violent Crimes	Violence toward people or animals
Non-Violent Crimes	Financial crimes, property crimes, cyber crimes
Sex-Related Crimes	Trafficking, assault, harassment
Child Exploitation	Any content related to child abuse
Defamation	False statements harming reputation
Specialized Advice	Unauthorized financial/medical/legal advice
Privacy	Handling of sensitive personal information
Intellectual Property	Copyright and IP violations
Indiscriminate Weapons	Creation of dangerous weapons
Hate Speech	Content targeting protected characteristics
Suicide & Self-Harm	Content promoting self-injury
Sexual Content	Adult content and erotica
Elections	Misinformation about voting
Code Interpreter Abuse	Malicious code execution attempts

Quick Start

To enable LLM security in Helicone, simply add Helicone-LLM-Security-Enabled: true to your request headers. For advanced security analysis using Llama Guard, add Helicone-LLM-Security-Advanced: true:

curl https://oai.helicone.ai/v1/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <YOUR_API_KEY>' \
  -H 'Helicone-LLM-Security-Enabled: true' \
  -H 'Helicone-LLM-Security-Advanced: true' \
  -d '{
    "model": "text-davinci-003",
    "prompt": "How do I enable LLM security with helicone?",
}'

Security Checks

When LLM Security is enabled, Helicone:

Analyzes each user message using Meta’s Prompt Guard model (86M parameters) to detect:
- Direct jailbreak attempts
- Indirect injection attacks
- Malicious content in 8 languages (English, French, German, Hindi, Italian, Portuguese, Spanish, Thai)
When advanced security is enabled (Helicone-LLM-Security-Advanced: true), activates Meta’s Llama Guard (3.8B) model for:
- Deeper content analysis across 14 threat categories
- Higher accuracy threat detection
- More nuanced understanding of context and intent

Blocks detected threats and returns an error response:

{
  "success": false,
  "error": {
    "code": "PROMPT_THREAT_DETECTED",
    "message": "Prompt threat detected. Your request cannot be processed.",
    "details": "See your Helicone request page for more info."
  }
}

Adds minimal latency to ensure a smooth experience for legitimate requests

Advanced Security Features

Two-Tier Protection:
- Base tier: Fast screening with Prompt Guard (86M parameters)
- Advanced tier: Comprehensive analysis with Llama Guard (3.8B parameters)
Multilingual Support: Detects threats across 8 languages
Low Base Latency: Initial screening uses the lightweight Prompt Guard model
High Accuracy:
- Base: Over 97% detection rate on jailbreak attempts
- Advanced: Enhanced accuracy with Llama Guard’s larger model
Customizable: Security thresholds can be adjusted based on your application’s needs

Need more help?

Getting Started

Integrations

Tracing

Prompts & Evals

Cloud AI Gateway

References

Introduction

Example

Example

Security Implementation

Quick Start

Security Checks

Advanced Security Features

Getting Started

Integrations

Tracing

Prompts & Evals

Cloud AI Gateway

References

​Introduction

​Security Implementation

​Quick Start

​Security Checks

​Advanced Security Features

Introduction

Security Implementation

Quick Start

Security Checks

Advanced Security Features