Scores
Add custom scoring metrics to your LLM requests and experiments. Evaluate prompt performance, compare results across datasets, and quantify model outputs for continuous improvement.
Who can use this feature: Anyone on any plan.
Introduction
Helicone’s scores API lets you score your requests and experiments, so you can evaluate the performance of your prompts and compare different experiments and datasets. For example, an image classification application might use one score for how accurately the model assigns images to the correct categories, and another for the confidence level of the model’s predictions.
Example: Experiment scores.
Why Scores
Scoring requests allows you to:
- Evaluate the performance of your prompts.
- Compare the scores of different experiments and datasets.
Helicone does not currently support automatic scoring, but you can write your own scoring logic and submit the resulting scores via the API.
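As a rough illustration of submitting scores from your own code, the sketch below builds the HTTP request for one logged request. The endpoint path (`/v1/request/{requestId}/score`), the `{ scores: ... }` body shape, and the bearer-token header are assumptions made for this example, and `buildScoreRequest` is a hypothetical helper name; confirm the details against the Helicone API reference before relying on them.

```typescript
// Assumed base URL and endpoint shape; verify against the Helicone API reference.
const HELICONE_API_BASE = "https://api.helicone.ai";

interface ScoreRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: builds (but does not send) the request that would
// attach a set of named numeric scores to a single logged request.
function buildScoreRequest(
  requestId: string,
  apiKey: string,
  scores: Record<string, number>
): ScoreRequest {
  return {
    url: `${HELICONE_API_BASE}/v1/request/${encodeURIComponent(requestId)}/score`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // Assumed body shape: a single "scores" object keyed by score name.
      body: JSON.stringify({ scores }),
    },
  };
}
```

Keeping request construction separate from the network call makes the logic easy to unit test. To send the scores, pass the result to `fetch`, for example `await fetch(request.url, request.init)`.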
Quick Start
Option 1: Using the Request Page
You can add scores to your requests directly from the request page:
Example: Adding scores on the request page.
Option 2: Setting up your own Scoring Webhook
You can set up your own scoring webhook to score your requests. Here’s an example of how you can do this with Cloudflare Workers:
Create a Webhook
To receive request and response data, create a webhook that points to your Scoring Worker URL.
Customize your scoring logic
You can customize the scoring logic in the index.js file in your Scoring Worker.
// You can customize the scoring function below and add more scores as needed.
function calculateScore(data: HeliconeRequest): Record<string, number> {
  if (data.response_body) {
    return {
      vocabulary_diversity: calculateVocabularyDiversity(data.response_body),
      // Add more scores here
    };
  }
  return {};
}
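The snippet above calls calculateVocabularyDiversity without showing its body. One plausible way to write such a helper is a type-token ratio: the number of distinct words divided by the total number of words. The version below is only a sketch under that assumption, not necessarily the implementation used in the Scoring Worker, and it assumes the response body has already been reduced to a plain string.

```typescript
// Hypothetical implementation: type-token ratio of the text.
// Returns a value between 0 and 1; higher means less word repetition.
function calculateVocabularyDiversity(text: string): number {
  // Lowercase the text and extract word-like tokens (letters, digits, apostrophes).
  const words: string[] = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  if (words.length === 0) {
    // No words at all: report zero diversity rather than dividing by zero.
    return 0;
  }
  const uniqueWords = new Set(words);
  return uniqueWords.size / words.length;
}
```

For example, "the cat and the dog" contains five words, four of them distinct, so the function returns 0.8.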
Local Testing
To test your scoring logic locally, use wrangler secrets to set the appropriate value for HELICONE_AUTH.
$ wrangler secret put HELICONE_AUTH
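Then publish the Scoring Worker with wrangler deploy (to run it on your own machine instead, wrangler dev starts a local development server):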
$ wrangler deploy