When running evaluation frameworks to measure model performance, you need visibility into how well your AI applications are performing across different metrics. Scores let you report evaluation results from any framework to Helicone, providing centralized observability for accuracy, hallucination rates, helpfulness, and custom metrics.

Helicone doesn’t run evaluations for you; it is not an evaluation framework. Instead, we provide a centralized location to report and analyze evaluation results from any framework (like RAGAS, LangSmith, or custom evaluations), giving you unified observability across all your evaluation metrics.

Why use Scores

  • Centralize evaluation results: Report scores from any evaluation framework for unified monitoring and analysis
  • Track model performance over time: Visualize how accuracy, hallucination rates, and other metrics evolve
  • Compare experiments side-by-side: Evaluate different prompts, models, or configurations with consistent metrics

Quick Start

1

Run your evaluation

Use your evaluation framework or custom logic to assess model responses and generate scores (integers or booleans) for metrics like accuracy, helpfulness, or safety.
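As a minimal sketch of this step, the hypothetical `evaluate_response` helper below (not part of any framework) produces the two value types Helicone accepts: an integer metric on a 0-100 scale and a boolean pass/fail check.

```python
import re

# Hypothetical custom evaluation logic: crude word-overlap "accuracy"
# plus a trivial boolean safety check. Illustrative only.
def evaluate_response(answer: str, reference: str) -> dict:
    answer_words = set(re.findall(r"[a-z]+", answer.lower()))
    reference_words = set(re.findall(r"[a-z]+", reference.lower()))
    # Integer score: fraction of reference words found in the answer, scaled to 0-100
    overlap = len(answer_words & reference_words) / max(len(reference_words), 1)
    # Boolean score: naive check that no sensitive term leaked into the answer
    is_safe = "password" not in answer.lower()
    return {"accuracy": int(overlap * 100), "is_safe": is_safe}

scores = evaluate_response(
    answer="The capital of France is Paris.",
    reference="Paris is the capital of France.",
)
# {'accuracy': 100, 'is_safe': True}
```

Any real evaluation logic works here; the only requirement is that the final values are integers or booleans.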

2

Report scores to Helicone

Send evaluation results using the Helicone API:

// Get the request ID from response headers
const requestId = response.headers.get("helicone-id");

// Report evaluation scores
await fetch(`https://api.helicone.ai/v1/request/${requestId}/score`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${HELICONE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    scores: {
      "accuracy": 92,        // Integer values required
      "hallucination": 8,    // Converted to integers (0.08 * 100)
      "helpfulness": 85,
      "is_safe": true        // Booleans supported
    }
  })
});

3

View score analytics

Analyze evaluation results in the Helicone dashboard to track performance trends, compare experiments, and identify areas for improvement.

Scores are processed with a 10-minute delay by default for analytics aggregation.

API Format

Request Structure

The scores API expects this exact format:

Field     Type      Description                               Required    Example
scores    object    Key-value pairs of evaluation metrics     Yes         {"accuracy": 92}

Score Values

Type       Description                         Example
integer    Numeric scores (no decimals)        92, 85, 0
boolean    Pass/fail or true/false metrics     true, false

Float values like 0.92 are rejected. Convert to integers first: 0.92 becomes 92 (multiply by 100 and round).
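A small conversion helper can enforce this before reporting. The `to_int_scores` function below is a hypothetical sketch, not part of the Helicone SDK; note that booleans must be checked before integers, since `bool` is a subclass of `int` in Python.

```python
# Hypothetical helper: normalize raw metric values into API-safe scores.
def to_int_scores(raw: dict) -> dict:
    safe = {}
    for name, value in raw.items():
        if isinstance(value, bool):
            safe[name] = value               # booleans pass through unchanged
        elif isinstance(value, float):
            safe[name] = round(value * 100)  # 0.92 -> 92 on a 0-100 scale
        else:
            safe[name] = int(value)          # already-integer metrics
    return safe

print(to_int_scores({"accuracy": 0.92, "hallucination": 0.08, "is_safe": True}))
# {'accuracy': 92, 'hallucination': 8, 'is_safe': True}
```

This assumes your float metrics are on a 0-1 scale; adjust the multiplier if your framework reports a different range.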

Use Cases

Evaluate retrieval-augmented generation for accuracy and hallucination:

import os

import requests
from ragas import evaluate
from ragas.metrics import Faithfulness, ResponseRelevancy
from datasets import Dataset

# Read the Helicone API key from the environment
HELICONE_API_KEY = os.environ["HELICONE_API_KEY"]

# Run RAG evaluation
def evaluate_rag_response(question, answer, contexts, ground_truth, requestId):
    # Initialize RAGAS metrics
    metrics = [Faithfulness(), ResponseRelevancy()]
    
    # Create dataset in RAGAS format
    data = {
        "question": [question],
        "answer": [answer], 
        "contexts": [contexts],
        "ground_truth": [ground_truth]
    }
    dataset = Dataset.from_dict(data)
    
    # Run evaluation
    result = evaluate(dataset, metrics=metrics)
    
    # Extract scores (RAGAS returns 0-1 values)
    faithfulness_score = result['faithfulness'] if 'faithfulness' in result else 0
    relevancy_score = result['answer_relevancy'] if 'answer_relevancy' in result else 0
    
    # Report to Helicone (convert to 0-100 scale)
    response = requests.post(
        f"https://api.helicone.ai/v1/request/{requestId}/score",
        headers={
            "Authorization": f"Bearer {HELICONE_API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "scores": {
                "faithfulness": int(faithfulness_score * 100),
                "answer_relevancy": int(relevancy_score * 100)
            }
        }
    )
    response.raise_for_status()  # surface reporting failures early
    
    return result

# Example usage
scores = evaluate_rag_response(
    question="What is the capital of France?",
    answer="The capital of France is Paris.",
    contexts=["France is a country in Europe. Paris is its capital."],
    ground_truth="Paris",
    requestId="your-request-id-here"
)