March 2025 Update: We’ve enhanced our webhook implementation to provide a unified request_response_url field that contains both request and response data in a single object. This improves performance and simplifies data retrieval.

Top use cases

  • Scoring: Score requests based on custom logic.
  • Data ETL: Move data from one system to another.
  • Automations / Alerts: Trigger actions automatically, such as sending a Slack notification or triggering a webhook to an external tool.

Setting up webhooks

Head over to the webhooks page to set up a webhook.

Webhook configuration UI

Add the webhook URL and select the events you want to trigger on.

Copy the HMAC key and add it to your webhook's environment; you will use it to validate the signature of each incoming webhook request.
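
For example, in a Next.js project the key could live in a local environment file; the variable name below is simply the one read by the handler example later on this page.

# .env.local
HELICONE_WEBHOOK_SECRET=<HMAC key copied from the webhooks page>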

Configure your webhook route

For startups, we recommend using Cloudflare Workers or Vercel Edge Functions for webhooks; they are simple to set up and scale well.

We have a prebuilt Cloudflare worker that you can use as a starting point.
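
If you go the Cloudflare route, a minimal Worker might look like the sketch below. It is only a sketch: it assumes the helicone-signature header and HMAC-SHA256 scheme shown in the Next.js example later on this page, a HELICONE_WEBHOOK_SECRET binding on the Worker, and that the signature is computed over the raw request body. The prebuilt worker linked above is the authoritative starting point.

export default {
  async fetch(
    request: Request,
    env: { HELICONE_WEBHOOK_SECRET: string }
  ): Promise<Response> {
    const rawBody = await request.text();

    // Recompute the HMAC-SHA256 signature with the Web Crypto API
    const key = await crypto.subtle.importKey(
      "raw",
      new TextEncoder().encode(env.HELICONE_WEBHOOK_SECRET),
      { name: "HMAC", hash: "SHA-256" },
      false,
      ["sign"]
    );
    const mac = await crypto.subtle.sign(
      "HMAC",
      key,
      new TextEncoder().encode(rawBody)
    );
    const signature = [...new Uint8Array(mac)]
      .map((b) => b.toString(16).padStart(2, "0"))
      .join("");

    if (signature !== request.headers.get("helicone-signature")) {
      return new Response("Unauthorized", { status: 401 });
    }

    // Parse and handle the payload (see the payload fields below)
    const payload = JSON.parse(rawBody);
    console.log("Received webhook for request:", payload.request_id);

    return new Response("OK", { status: 200 });
  },
};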

The webhook endpoint is a POST route that accepts the following JSON body:

POST /webhook

The body of the request will contain the following fields:

  • request_id: The request ID of the request that triggered the webhook.
  • user_id: The identifier of the user who made the request (if available).
  • request_body: The body of the request that triggered the webhook (truncated if it exceeds 10KB).
  • response_body: The body of the response returned for that request (truncated if it exceeds 10KB).
  • request_response_url: A pre-signed URL for fetching the full, untruncated request and response data.

Example - NextJS

import crypto from "crypto";
import type { NextApiRequest, NextApiResponse } from "next";

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse
) {
  const {
    request_id,
    user_id,
    request_body,
    response_body,
    request_response_url,
  } = req.body;

  // STEP 1: Validate the signature of the webhook request
  const secret = process.env.HELICONE_WEBHOOK_SECRET;
  if (!secret) {
    return res.status(500).json({ error: "Webhook secret is not configured" });
  }
  const hmac = crypto.createHmac("sha256", secret);
  hmac.update(JSON.stringify(req.body)); // Use the entire body for validation
  const signature = hmac.digest("hex");
  if (signature !== req.headers["helicone-signature"]) {
    return res.status(401).json({ error: "Unauthorized" });
  }

  // STEP 2: Do something with the webhook data
  console.log(request_id, user_id, request_body, response_body);

  // STEP 3: Optionally fetch the full request/response data if needed
  if (request_response_url) {
    try {
      const response = await fetch(request_response_url);
      const fullData = await response.json();
      console.log("Full request/response data:", fullData);
      // Process the complete data...
    } catch (error) {
      console.error("Error fetching full data:", error);
    }
  }

  return res.status(200).json({ message: "Webhook received" });
}

Webhook Configuration

When setting up a webhook, you can configure the following options:

  1. Destination URL: The URL where webhook payloads will be sent.
  2. Sample Rate: Control what percentage of requests trigger webhooks (0-100%).
  3. Property Filters: Only send webhooks for requests with specific properties.
  4. Include Data: Toggle whether to include additional data like costs and S3 links (enabled by default).

Standard Webhook Payload

By default, webhooks send a minimal payload containing the request ID, the user ID (when one was set on the original request), and truncated request/response bodies:

{
  "request_id": "uuid-string",
  "user_id": "user-identifier-string", // Only included if set in the original request
  "request_body": "request-body-string",
  "response_body": "response-body-string"
}

Enhanced Webhook Payload

When the includeData option is enabled, webhooks include additional useful information:

{
  "request_id": "uuid-string",
  "user_id": "user-identifier-string", // Only included if set in the original request
  "request_body": "request-body-string",
  "response_body": "response-body-string",
  "request_response_url": "https://s3-url-containing-full-request-and-response",
  "model": "gpt-3.5-turbo",
  "provider": "openai",
  "metadata": {
    "cost": 0.0015,
    "promptTokens": 10,
    "completionTokens": 15,
    "totalTokens": 25,
    "latencyMs": 1200
  }
}

The enhanced payload provides:

  • User Identifier: The user_id field helps track which user or entity made the request (only included when explicitly set in the original request)
  • Combined S3 URL: A single URL that provides access to both the complete request and response data
  • Model Information: The model and provider used
  • Cost Data: Calculated cost based on token usage
  • Token Counts: Prompt, completion, and total token counts
  • Latency: Request-to-response time in milliseconds
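
If your handler is written in TypeScript, a type along the lines of the sketch below can make the payload easier to work with. It is inferred from the fields documented above rather than taken from an official type definition, so validate it against the payloads you actually receive.

interface HeliconeWebhookPayload {
  request_id: string;
  // Only present when a user ID was set on the original request
  user_id?: string;
  // Truncated if the original body exceeds 10KB
  request_body: string;
  response_body: string;
  // The fields below are only included when the Include Data option is enabled
  request_response_url?: string;
  model?: string;
  provider?: string;
  metadata?: {
    cost: number;
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
    latencyMs: number;
  };
}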

Metadata Fields Explained

The metadata object contains valuable information about the request:

  • cost: Estimated cost of the request in USD, calculated based on the model’s pricing and token usage
  • promptTokens: Number of tokens in the prompt/request
  • completionTokens: Number of tokens in the completion/response
  • totalTokens: Total number of tokens used in the request (promptTokens + completionTokens)
  • latencyMs: Time taken to process the request in milliseconds

These metadata fields are particularly useful for:

  • Cost tracking and budget management
  • Performance monitoring and optimization
  • Usage analytics and reporting
  • Identifying potential issues with specific requests

This additional data makes it easier to track costs and analyze performance without making additional API calls.

Working with the Combined Request/Response URL

The request_response_url field provides a pre-signed S3 URL that contains both the complete request and response data in a single JSON object. This approach offers several advantages:

  1. Complete Data Access: Get the full, untruncated request and response data, including all fields and metadata.
  2. Single Request: Retrieve both request and response with a single HTTP call.
  3. Structured Format: The data is returned in a structured JSON format that’s easy to parse and process.

Example: Fetching and Processing the Combined Data

Here’s how to fetch and process the data from the request_response_url:

async function processWebhook(webhookData) {
  // Extract the combined URL and metadata from the webhook payload
  const { request_response_url, metadata } = webhookData;

  // Process metadata if available
  if (metadata) {
    console.log("Request cost:", metadata.cost);
    console.log("Total tokens:", metadata.totalTokens);
    console.log("Latency (ms):", metadata.latencyMs);

    // Example: Track costs by model
    trackModelCost(webhookData.model, metadata.cost);

    // Example: Alert on high latency
    if (metadata.latencyMs > 5000) {
      sendLatencyAlert(webhookData.request_id, metadata.latencyMs);
    }
  }

  if (request_response_url) {
    try {
      // Fetch the combined data
      const response = await fetch(request_response_url);
      const combinedData = await response.json();

      // Now you have access to both request and response data
      const { request, response: llmResponse } = combinedData;

      // Process the request data
      console.log("Request model:", request.model);
      console.log("User message:", request.messages[0].content);

      // Process the response data
      console.log("Response content:", llmResponse.choices[0].message.content);
      console.log("Token usage:", llmResponse.usage.total_tokens);

      // Perform your custom logic here
      // ...
    } catch (error) {
      console.error("Error fetching combined data:", error);
    }
  }
}

Common Use Cases

The combined request/response data is particularly useful for:

  1. Advanced Analytics: Analyze the full request and response to extract insights about your LLM usage.
  2. Cost Tracking: Access detailed token usage information to track costs across different models and requests.
  3. Quality Monitoring: Evaluate the quality of responses based on the complete context of the request.
  4. Data Archiving: Store the complete interaction data for compliance or historical analysis.
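
As a sketch of the archiving case, the function below fetches the combined data and hands it to a storage helper. saveToArchive is a hypothetical placeholder for whatever store you use (S3, GCS, a database, and so on).

// Placeholder storage helper: swap in S3, GCS, a database, etc.
async function saveToArchive(key: string, value: unknown): Promise<void> {
  console.log("archiving", key, JSON.stringify(value).length, "bytes");
}

async function archiveInteraction(webhookData: {
  request_id: string;
  request_response_url?: string;
}) {
  if (!webhookData.request_response_url) return;

  // Fetch the full, untruncated request/response data
  const res = await fetch(webhookData.request_response_url);
  if (!res.ok) {
    throw new Error(`Failed to fetch combined data: ${res.status}`);
  }
  const combinedData = await res.json();

  // Store the complete interaction keyed by request ID
  await saveToArchive(`interactions/${webhookData.request_id}.json`, {
    archivedAt: new Date().toISOString(),
    ...combinedData,
  });
}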

Webhook Security

Securing your webhook implementation is critical to protect sensitive data and prevent unauthorized access. Follow these best practices:

Signature Verification

Always validate the webhook signature to ensure requests are coming from Helicone:

import crypto from "crypto";

function verifyWebhookSignature(payload, signature, secret) {
  const hmac = crypto.createHmac("sha256", secret);
  hmac.update(JSON.stringify(payload));
  const calculatedSignature = hmac.digest("hex");

  const expected = Buffer.from(calculatedSignature, "hex");
  const received = Buffer.from(signature, "hex");

  // timingSafeEqual throws if the buffers differ in length, so check that first
  if (expected.length !== received.length) {
    return false;
  }

  return crypto.timingSafeEqual(expected, received);
}

// In your webhook handler
const isValid = verifyWebhookSignature(
  req.body,
  req.headers["helicone-signature"],
  process.env.HELICONE_WEBHOOK_SECRET
);

if (!isValid) {
  return res.status(401).json({ error: "Invalid signature" });
}

Secret Management

  • Store your webhook secret securely using environment variables or a secrets manager
  • Never hardcode the secret in your application code
  • Rotate the webhook secret periodically for enhanced security

HTTPS Only

  • Only use HTTPS endpoints for your webhooks
  • Configure proper TLS/SSL settings on your server
  • Ensure certificates are valid and up-to-date

Rate Limiting

Implement rate limiting on your webhook endpoint to protect against potential abuse:

// Example using Express rate limiter
const rateLimit = require("express-rate-limit");

const webhookLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // limit each IP to 100 requests per windowMs
  message: "Too many webhook requests, please try again later",
});

app.post("/webhook", webhookLimiter, (req, res) => {
  // Your webhook handler
});

Access Control

  • Restrict access to your webhook processing logic
  • Implement proper authentication for any systems that access webhook data
  • Use the principle of least privilege for services processing webhook data

Logging and Monitoring

  • Log all webhook requests (excluding sensitive data)
  • Monitor for unusual patterns or failed signature verifications
  • Set up alerts for potential security incidents
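
As a rough sketch of that kind of logging, the snippet below records receipts and counts signature failures; the logger and metrics objects are illustrative stand-ins for your own tooling, and only non-sensitive identifiers are logged.

// Illustrative stand-ins for your own logging and metrics clients
const logger = { info: (msg: string, meta: object) => console.log(msg, meta) };
const metrics = { increment: (name: string) => console.log("metric:", name) };

function logWebhookEvent(requestId: string, signatureValid: boolean) {
  // Log the receipt without request/response bodies or other sensitive data
  logger.info("webhook received", { requestId, signatureValid });

  if (!signatureValid) {
    // A spike here is a good candidate for an automated security alert
    metrics.increment("webhook.signature_failures");
  }
}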

By implementing these security measures, you can ensure your webhook integration remains secure and reliable.

Troubleshooting Webhooks

Common Issues

  1. Missing or Invalid Signature

    • Ensure you’re using the correct HMAC key provided in the Helicone dashboard.
    • Verify that you’re calculating the signature using the entire request body.
  2. URL Expiration

    • The request_response_url is a pre-signed URL that expires after 30 minutes. Make sure to fetch the data promptly after receiving the webhook.
  3. Large Payloads

    • Remember that request and response bodies in the webhook payload are truncated if they exceed 10KB. Always use the request_response_url for complete data.
  4. Webhook Timeouts

    • Webhook delivery will time out after 2 minutes. Ensure your endpoint responds quickly, and consider using a queue for processing long-running tasks.
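
One way to keep the endpoint fast, as suggested above, is to acknowledge the delivery immediately and push the payload onto a queue for asynchronous processing. The sketch below assumes an Express-style handler; queue.enqueue is a hypothetical placeholder for your queueing system (SQS, Pub/Sub, BullMQ, and so on).

// Hypothetical queue client; replace with SQS, Pub/Sub, BullMQ, etc.
const queue = {
  enqueue: async (job: unknown) => {
    /* push the job to your queue here */
  },
};

app.post("/webhook", async (req, res) => {
  // Signature verification omitted here; see the Webhook Security section above

  // Hand the payload off for background processing
  await queue.enqueue({ receivedAt: Date.now(), payload: req.body });

  // Acknowledge immediately so delivery stays well under the 2-minute timeout
  res.status(200).json({ message: "Webhook queued" });
});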

Debugging Tips

  1. Local Testing

    • Use a tunneling tool (for example, ngrok) to expose your local webhook endpoint so you can receive and inspect real deliveries during development.
  2. Logging

    • Implement comprehensive logging in your webhook handler to track received payloads and any processing errors.
  3. Retry Logic

    • Consider implementing retry logic in your webhook consumer to handle temporary failures when fetching the request_response_url data (see the sketch after this list).
  4. Webhook Monitoring

    • Monitor webhook deliveries in the Helicone dashboard to identify any patterns of failures or issues.
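
For the retry logic mentioned in the list above, a simple exponential backoff around the request_response_url fetch could look like the sketch below. The attempt count and delays are arbitrary; keep the total retry window well inside the 30-minute URL expiry.

// Sketch: fetch the combined data with retries and exponential backoff
async function fetchWithRetry(url: string, maxAttempts = 3): Promise<unknown> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return await res.json();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Wait 1s, 2s, 4s, ... before the next attempt
        await new Promise((resolve) =>
          setTimeout(resolve, 1000 * 2 ** (attempt - 1))
        );
      }
    }
  }

  throw lastError;
}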

Using Webhook Metadata for Analytics

The metadata included in webhook payloads provides valuable insights for monitoring and analyzing your LLM usage. Here are some common ways to leverage this data:

Cost Tracking

// Example: Aggregate costs by model
function trackCostsByModel(webhookData) {
  const { model, metadata } = webhookData;
  if (model && metadata?.cost) {
    // Update your database or analytics system
    db.incrementModelCost(model, metadata.cost);

    // Check against budget limits
    const currentSpend = db.getModelSpend(model);
    if (currentSpend > MODEL_BUDGET_LIMITS[model]) {
      sendBudgetAlert(model, currentSpend);
    }
  }
}

Performance Monitoring

// Example: Track latency percentiles
function trackLatency(webhookData) {
  const { metadata, model } = webhookData;
  if (metadata?.latencyMs) {
    // Add to your metrics system
    metrics.recordLatency(model, metadata.latencyMs);

    // Alert on slow responses
    if (metadata.latencyMs > LATENCY_THRESHOLDS[model]) {
      alerts.sendLatencyAlert({
        model,
        latency: metadata.latencyMs,
        requestId: webhookData.request_id,
        timestamp: new Date(),
      });
    }
  }
}

Token Usage Analysis

// Example: Monitor token efficiency
function analyzeTokenEfficiency(webhookData) {
  const { metadata } = webhookData;
  if (metadata?.promptTokens && metadata?.completionTokens) {
    // Calculate prompt-to-completion ratio
    const ratio = metadata.promptTokens / metadata.completionTokens;

    // Log for analysis
    analytics.logTokenRatio(webhookData.request_id, ratio);

    // Identify potentially inefficient prompts
    if (ratio > 10) {
      // Example threshold
      flagInefficientPrompt(webhookData.request_id, ratio);
    }
  }
}

By integrating these analytics into your webhook handler, you can gain real-time insights into your LLM usage patterns, costs, and performance metrics.

User Tracking with user_id

The user_id field in webhook payloads enables powerful user-specific analytics and monitoring:

// Example: Track usage by user
function trackUserUsage(webhookData) {
  const { user_id, metadata, model } = webhookData;

  if (user_id && metadata) {
    // Record usage in your database
    db.recordUserActivity({
      userId: user_id,
      requestId: webhookData.request_id,
      model: model,
      cost: metadata.cost,
      tokens: metadata.totalTokens,
      timestamp: new Date(),
    });

    // Check if user is approaching usage limits
    const userUsage = db.getUserMonthlyUsage(user_id);
    const userLimit = db.getUserLimit(user_id);

    if (userUsage > userLimit * 0.8) {
      notifications.sendUsageWarning(user_id, userUsage, userLimit);
    }
  }
}

Setting the user_id

To ensure the user_id is included in your webhook payloads:

  1. When making requests through Helicone, include the Helicone-User-Id header:

    const response = await fetch("https://oai.hconeai.com/v1/chat/completions", {
      headers: {
        "Helicone-Auth": "Bearer YOUR_HELICONE_API_KEY",
        "Helicone-User-Id": "user-123", // This will be included in webhooks
        // ... other headers
      },
      // ... request body
    });
    
  2. Alternatively, include the user ID in your request properties:

    const response = await fetch("https://oai.hconeai.com/v1/chat/completions", {
      headers: {
        "Helicone-Auth": "Bearer YOUR_HELICONE_API_KEY",
        "Helicone-Property-UserId": "user-123", // This will be included in webhooks
        // ... other headers
      },
      // ... request body
    });
    

Important: The user_id field will only be included in webhook payloads when it has been explicitly set using one of the methods above. If no user ID is provided in the original request, this field will be omitted from the webhook payload.

The user_id field makes it possible to build user-specific analytics, implement per-user rate limiting, and track usage patterns across your application.
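
As one illustration of per-user rate limiting, the sketch below keeps a naive in-memory sliding window keyed on user_id; the limits shown are arbitrary, and in production you would back this with a shared store such as Redis.

// Naive per-user sliding-window limiter (in-memory; use a shared store in production)
const userWindows = new Map<string, number[]>();
const WINDOW_MS = 60_000; // 1-minute window
const MAX_REQUESTS_PER_WINDOW = 100;

function isUserOverLimit(userId: string): boolean {
  const now = Date.now();

  // Keep only the timestamps that fall inside the current window
  const recent = (userWindows.get(userId) ?? []).filter((t) => now - t < WINDOW_MS);
  recent.push(now);
  userWindows.set(userId, recent);

  return recent.length > MAX_REQUESTS_PER_WINDOW;
}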

Fetching Historical Requests

In addition to receiving real-time webhooks, you may need to access historical request data. Helicone provides an API for retrieving past requests:

Using the Helicone API

You can fetch historical requests using the Helicone API:

async function fetchHistoricalRequests(userId, startDate, endDate) {
  // fetch() has no "params" option, so build the query string onto the URL
  const params = new URLSearchParams({
    startDate: startDate.toISOString(),
    endDate: endDate.toISOString(),
    // Additional filters as needed
  });
  if (userId) {
    params.set("userId", userId);
  }

  const response = await fetch(`https://api.helicone.ai/v1/request?${params}`, {
    method: "GET",
    headers: {
      Authorization: `Bearer ${HELICONE_API_KEY}`,
      "Content-Type": "application/json",
    },
  });

  const data = await response.json();
  return data.requests;
}

Common Use Cases for Historical Data

  1. Audit Trails: Create comprehensive audit logs of all LLM interactions for compliance purposes.
  2. Usage Reports: Generate periodic reports on usage patterns and costs.
  3. Training Data Collection: Gather high-quality examples for fine-tuning models.
  4. Retroactive Analysis: Analyze past interactions to identify patterns or issues.

Combining Webhooks and Historical Data

For a complete solution, you can use webhooks for real-time processing and the API for historical data:

// Real-time processing via webhook
app.post("/webhook", async (req, res) => {
  // Process incoming webhook data
  processWebhookData(req.body);
  res.status(200).send("OK");
});

// Scheduled job for historical analysis
async function runWeeklyReport() {
  const endDate = new Date();
  const startDate = new Date(endDate);
  startDate.setDate(startDate.getDate() - 7);

  // Fetch last week's data
  const requests = await fetchHistoricalRequests(null, startDate, endDate);

  // Generate and send report
  const report = generateUsageReport(requests);
  await sendReportToStakeholders(report);
}

This dual approach ensures you have both immediate access to new data and the ability to analyze historical trends.