# Datasets

> Curate and export LLM request/response data for fine-tuning, evaluation, and analysis

Transform your LLM requests into curated datasets for model fine-tuning, evaluation, and analysis. Helicone Datasets let you select, organize, and export your best examples with just a few clicks.

## Why Use Datasets

<CardGroup cols={2}>
  <Card title="Fine-Tuning" icon="brain">
    Create training datasets from your best requests for custom model fine-tuning
  </Card>

  <Card title="Model Evaluation" icon="chart-bar">
    Build evaluation sets to test model performance and compare different versions
  </Card>

  <Card title="Quality Control" icon="shield-check">
    Curate high-quality examples to improve prompt engineering and model outputs
  </Card>

  <Card title="Data Analysis" icon="magnifying-glass">
    Export structured data for external analysis and research
  </Card>
</CardGroup>

## Creating Datasets

### From the Requests Page

The easiest way to create datasets is by selecting requests from your logs:

<Steps>
  <Step title="Filter your requests">
    Use [custom properties](/observability/custom-properties) and filters to find the requests you want

    <Frame>
      <img src="https://mintcdn.com/helicone/tEQUFyBH7IjDxuEd/images/datasets/filters.webp?fit=max&auto=format&n=tEQUFyBH7IjDxuEd&q=85&s=a83618a1afd919bf3739f36aa1fbe635" alt="Filtering requests with custom properties and search criteria" width="1588" height="466" data-path="images/datasets/filters.webp" />
    </Frame>
  </Step>

  <Step title="Select requests">
    Check the boxes next to requests you want to include in your dataset

    <Frame>
      <img src="https://mintcdn.com/helicone/tEQUFyBH7IjDxuEd/images/datasets/datasets-select.webp?fit=max&auto=format&n=tEQUFyBH7IjDxuEd&q=85&s=edc642c3c188ce17e5029e2b155c413f" alt="Selecting multiple requests to add to dataset" width="1662" height="1678" data-path="images/datasets/datasets-select.webp" />
    </Frame>
  </Step>

  <Step title="Add to dataset">
    Click "Add to Dataset" and choose to create a new dataset or add to an existing one

    <Frame>
      <img src="https://mintcdn.com/helicone/tEQUFyBH7IjDxuEd/images/datasets/dataset-add.webp?fit=max&auto=format&n=tEQUFyBH7IjDxuEd&q=85&s=5f44725314a4dc6ffb77d97fadd01d30" alt="Adding selected requests to a dataset" width="958" height="612" data-path="images/datasets/dataset-add.webp" />
    </Frame>
  </Step>
</Steps>

### Via API

Create datasets programmatically for automated workflows:

```typescript theme={null}
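// Assumes HELICONE_API_KEY is already defined, e.g. loaded from an environment variable

// Create a new dataset
const response = await fetch('https://api.helicone.ai/v1/helicone-dataset', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${HELICONE_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    name: 'Customer Support Examples',
    description: 'High-quality support interactions for fine-tuning'
  })
});
if (!response.ok) {
  throw new Error(`Failed to create dataset (status ${response.status})`);
}

const dataset = await response.json();

// Add a request to the dataset; replace the placeholder with the ID of a logged request
const requestId = '<your-request-id>';
const addResponse = await fetch(`https://api.helicone.ai/v1/helicone-dataset/${dataset.id}/request/${requestId}`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${HELICONE_API_KEY}`
  }
});
if (!addResponse.ok) {
  throw new Error(`Failed to add request ${requestId} (status ${addResponse.status})`);
}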
```
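
To add several requests in one go, the single-request call above can be wrapped in a loop. The sketch below is one way to do that, assuming the per-request endpoint shown above. `addRequestUrl`, `addRequestsToDataset`, and the `PostFn` type are illustrative names, not part of a Helicone SDK. The HTTP call is passed in as a function so the loop can be exercised without network access.

```typescript
// Minimal subset of a fetch Response that the loop relies on.
type PostResult = { ok: boolean; status: number };
type PostFn = (url: string) => Promise<PostResult>;

const DATASET_BASE_URL = 'https://api.helicone.ai/v1/helicone-dataset';

// Builds the per-request endpoint used in the snippet above.
function addRequestUrl(datasetId: string, requestId: string): string {
  return `${DATASET_BASE_URL}/${encodeURIComponent(datasetId)}/request/${encodeURIComponent(requestId)}`;
}

// Adds each request in turn and returns the IDs that failed so they can be retried.
async function addRequestsToDataset(
  datasetId: string,
  requestIds: string[],
  post: PostFn
): Promise<string[]> {
  const failed: string[] = [];
  for (const requestId of requestIds) {
    const result = await post(addRequestUrl(datasetId, requestId));
    if (!result.ok) failed.push(requestId);
  }
  return failed;
}

// In an application, `post` could wrap fetch with the same header as above:
// const post: PostFn = (url) =>
//   fetch(url, { method: 'POST', headers: { Authorization: `Bearer ${HELICONE_API_KEY}` } });
```

Sending requests sequentially keeps the sketch simple; whether batching or parallel calls are appropriate depends on rate limits that are not covered here.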

## Building Quality Datasets

### The Curation Process

Transform raw requests into high-quality training data through careful curation:

<Steps>
  <Step title="Collect broadly, then filter">
    Start by adding many potential examples, then narrow down to the best ones. It's easier to remove weak examples than to track down missing ones later.
  </Step>

  <Step title="Review each example">
    <Frame>
      <img src="https://mintcdn.com/helicone/tEQUFyBH7IjDxuEd/images/datasets/datasets-edit.webp?fit=max&auto=format&n=tEQUFyBH7IjDxuEd&q=85&s=9645250ae945aabe6c476c5f3e138871" alt="Dataset curation interface showing request details for review" width="2150" height="1426" data-path="images/datasets/datasets-edit.webp" />
    </Frame>

    Examine each request/response pair for:

    * **Accuracy** - Is the response correct and helpful?
    * **Consistency** - Does it match the style and format you want?
    * **Completeness** - Does it fully address the user's request?
  </Step>

  <Step title="Remove poor examples">
    Delete any examples that are:

    * Incorrect or misleading
    * Off-topic or irrelevant
    * Inconsistent with your desired behavior
    * Edge cases that might confuse the model
  </Step>

  <Step title="Balance your dataset">
    Ensure you have:

    * Examples covering all common use cases
    * Both simple and complex queries
    * A mix of examples whose distribution matches real usage
  </Step>
</Steps>

<Note>
  **Quality beats quantity** - 50-100 carefully curated examples often outperform thousands of uncurated ones. Focus on consistency and correctness over volume.
</Note>
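
Most of the review above needs human judgment, but a few mechanical checks can be scripted first. The following is a minimal sketch, assuming each row has been loaded into a hypothetical `Example` shape (this is not Helicone's export schema). It drops rows with an empty response or a repeated prompt, which is an added heuristic rather than a step from the list above, and counts rows per category to help with balancing.

```typescript
// Hypothetical shape for a row under review; not Helicone's export schema.
interface Example {
  id: string;
  category: string; // e.g. the value of a custom property such as "billing"
  userMessage: string;
  assistantResponse: string;
}

// Drops rows with an empty response or a prompt already seen (case-insensitive).
function dropEmptyAndDuplicates(examples: Example[]): Example[] {
  const seenPrompts = new Set<string>();
  return examples.filter((ex) => {
    const key = ex.userMessage.trim().toLowerCase();
    if (ex.assistantResponse.trim() === '' || seenPrompts.has(key)) return false;
    seenPrompts.add(key);
    return true;
  });
}

// Counts examples per category so the mix can be compared with real usage.
function countByCategory(examples: Example[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const ex of examples) {
    counts[ex.category] = (counts[ex.category] ?? 0) + 1;
  }
  return counts;
}
```

Checks like these only narrow the pile; accuracy, consistency, and completeness still need to be reviewed by a person.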

### Dataset Dashboard

Access all your datasets at [us.helicone.ai/datasets](https://us.helicone.ai/datasets):

<Frame caption="Manage all your curated datasets in one place">
  <img src="https://mintcdn.com/helicone/tEQUFyBH7IjDxuEd/images/datasets/datasets-dashboard.webp?fit=max&auto=format&n=tEQUFyBH7IjDxuEd&q=85&s=55696116bd4314baf8670839821a4edf" alt="Helicone datasets dashboard with list of datasets and their metadata" width="2322" height="1198" data-path="images/datasets/datasets-dashboard.webp" />
</Frame>

From the dashboard you can:

* **Track progress** - Monitor dataset size and last updated time
* **Access datasets** - Click to view and curate contents
* **Export data** - Download datasets when ready for fine-tuning
* **Maintain quality** - Regularly review and improve your collections

## Exporting Data

### Export Formats

Download your datasets in various formats:

<Frame caption="Export options for downloading your dataset">
  <img src="https://mintcdn.com/helicone/tEQUFyBH7IjDxuEd/images/datasets/datasets-export.webp?fit=max&auto=format&n=tEQUFyBH7IjDxuEd&q=85&s=1a7abf38f7c5c58910ee65ecf23df7c0" alt="Dataset export dialog showing different format options" width="1074" height="958" data-path="images/datasets/datasets-export.webp" />
</Frame>

<Tabs>
  <Tab title="Fine-Tuning (JSONL)">
    Matches OpenAI's chat fine-tuning format, with one JSON object per line:

    ```json theme={null}
    {"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi there!"}]}
    {"messages": [{"role": "user", "content": "Help me"}, {"role": "assistant", "content": "I'd be happy to help!"}]}
    ```

    Ready to use directly with OpenAI's fine-tuning API.
  </Tab>

  <Tab title="Analysis (CSV)">
    Structured format for spreadsheet analysis:

    ```csv theme={null}
    request_id,created_at,model,prompt_tokens,completion_tokens,cost,user_message,assistant_response
    req_123,2024-01-15,gpt-4o,50,100,0.002,"Hello","Hi there!"
    req_124,2024-01-15,gpt-4o,45,95,0.0019,"Help me","I'd be happy to help!"
    ```

    Import into Excel, Google Sheets, or data analysis tools.
  </Tab>
</Tabs>
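
Before uploading a JSONL export for fine-tuning, it can help to check that every line has the shape shown in the Fine-Tuning tab. The function below is a rough sketch of such a check. `validateFineTuningJsonl` is an illustrative name, and the rule that each record should contain at least one assistant message is an assumption about what a useful training example needs, not a documented Helicone guarantee.

```typescript
// Returns a list of problems; an empty list means every record passed.
// Record numbers count non-empty lines, starting at 1.
function validateFineTuningJsonl(jsonl: string): string[] {
  const problems: string[] = [];
  const records = jsonl.split('\n').filter((line) => line.trim() !== '');

  records.forEach((line, index) => {
    const recordNo = index + 1;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      problems.push(`record ${recordNo}: not valid JSON`);
      return;
    }
    if (parsed === null || typeof parsed !== 'object') {
      problems.push(`record ${recordNo}: expected a JSON object`);
      return;
    }
    const messages = (parsed as { messages?: unknown }).messages;
    if (!Array.isArray(messages) || messages.length === 0) {
      problems.push(`record ${recordNo}: missing or empty "messages" array`);
      return;
    }
    const wellFormed = messages.every(
      (m) => typeof m?.role === 'string' && typeof m?.content === 'string'
    );
    if (!wellFormed) {
      problems.push(`record ${recordNo}: each message needs a string "role" and "content"`);
    } else if (!messages.some((m) => m.role === 'assistant')) {
      // Assumption: a training example without an assistant turn has nothing to learn from.
      problems.push(`record ${recordNo}: no assistant message`);
    }
  });

  return problems;
}
```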

### API Export

Retrieve dataset contents programmatically:

```typescript theme={null}
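// Query dataset contents; datasetId is the ID of an existing dataset
const response = await fetch(`https://api.helicone.ai/v1/helicone-dataset/${datasetId}/query`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${HELICONE_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    limit: 100, // maximum number of rows to return
    offset: 0   // number of rows to skip; increase to page through results
  })
});
if (!response.ok) {
  throw new Error(`Dataset query failed (status ${response.status})`);
}

const data = await response.json();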
```
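
For datasets larger than one page, the `limit` and `offset` parameters can be advanced in a loop. The helper below is a sketch of that pattern. It assumes each page can be reduced to an array of rows and that a page shorter than `limit` marks the end of the dataset; the response shape is not specified here, so the page fetcher is passed in as a function. `fetchAllRows` and `FetchPage` are illustrative names.

```typescript
// Fetches one page of rows; in practice this would wrap the POST .../query call above
// and extract the row array from the response.
type FetchPage<T> = (limit: number, offset: number) => Promise<T[]>;

// Collects rows page by page until a short (or empty) page is returned.
async function fetchAllRows<T>(fetchPage: FetchPage<T>, limit = 100): Promise<T[]> {
  if (limit <= 0) throw new RangeError('limit must be positive');
  const rows: T[] = [];
  for (let offset = 0; ; offset += limit) {
    const page = await fetchPage(limit, offset);
    rows.push(...page);
    if (page.length < limit) break; // assumption: a short page means no more data
  }
  return rows;
}
```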

## Use Cases

### Replace Expensive Models with Fine-Tuned Alternatives

The most common use case is using logs from an expensive model to train cheaper, faster models:

<Steps>
  <Step title="Log high-quality outputs">
    Start logging successful requests from o3, Claude Opus 4.1, Gemini 2.5 Pro, or other premium models, focusing on responses that represent your ideal outputs
  </Step>

  <Step title="Build task-specific datasets">
    Create separate datasets for different tasks (e.g., "customer support", "code generation", "data extraction")
  </Step>

  <Step title="Curate for consistency">
    Review examples to ensure responses follow the same format, style, and quality standards
  </Step>

  <Step title="Fine-tune smaller models">
    Export JSONL and fine-tune o3-mini, GPT-4o-mini, Gemini 2.5 Flash, or other smaller models, which can be 10-50x cheaper
  </Step>

  <Step title="Iterate with production data">
    Continue collecting examples from your fine-tuned model to improve it over time
  </Step>
</Steps>
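
The "Build task-specific datasets" step can be sketched as a grouping pass over logged requests. This example assumes each request carries a custom property naming its task; the `LoggedRequest` shape and the `task` property name are hypothetical and chosen only for illustration.

```typescript
// Hypothetical minimal view of a logged request; field names are illustrative.
interface LoggedRequest {
  requestId: string;
  properties: Record<string, string>; // custom properties attached at request time
}

// Groups request IDs by the value of one custom property (default: "task").
function groupByTask(
  requests: LoggedRequest[],
  propertyName = 'task'
): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const req of requests) {
    const task = req.properties[propertyName];
    if (task === undefined) continue; // untagged requests are skipped
    const ids = groups.get(task) ?? [];
    ids.push(req.requestId);
    groups.set(task, ids);
  }
  return groups;
}
```

Each group can then be added to its own dataset, for example one for "customer support" and one for "code generation".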

### Task-Specific Evaluation Sets

Build evaluation datasets to test model performance:

```typescript theme={null}
// Create eval sets for different capabilities
const datasets = {
  reasoning: 'Complex multi-step problems with verified solutions',
  extraction: 'Structured data extraction with known correct outputs',
  creativity: 'Creative writing with human-rated quality scores',
  edge_cases: 'Unusual inputs that often cause failures'
};
```

Use these to:

* Compare model versions before deploying
* Test prompt changes against consistent examples
* Identify model weaknesses and blind spots
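
As a rough illustration of comparing model versions on a fixed evaluation set, the sketch below scores two models by exact match against known correct outputs. Exact match only suits tasks such as extraction, where a single correct answer exists; creative tasks would need human or rubric-based scores instead. The `EvalCase` shape, the `Model` type, and both function names are assumptions made for this example.

```typescript
// One evaluation case with a known correct output (hypothetical shape).
interface EvalCase {
  input: string;
  expected: string;
}

// Any function that maps a prompt to a completion, e.g. a wrapper around an LLM call.
type Model = (input: string) => Promise<string>;

// Fraction of cases where the trimmed output equals the trimmed expected answer.
async function exactMatchAccuracy(model: Model, cases: EvalCase[]): Promise<number> {
  if (cases.length === 0) return 0;
  let correct = 0;
  for (const evalCase of cases) {
    const output = await model(evalCase.input);
    if (output.trim() === evalCase.expected.trim()) correct += 1;
  }
  return correct / cases.length;
}

// Scores a baseline and a candidate on the same cases and reports the difference.
async function compareModels(baseline: Model, candidate: Model, cases: EvalCase[]) {
  const baselineAccuracy = await exactMatchAccuracy(baseline, cases);
  const candidateAccuracy = await exactMatchAccuracy(candidate, cases);
  return { baselineAccuracy, candidateAccuracy, delta: candidateAccuracy - baselineAccuracy };
}
```

Running both models against the same cases keeps the comparison consistent across prompt or model changes.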

### Continuous Improvement Pipeline

<Frame caption="Use scores and user feedback to identify your best examples">
  <img src="https://mintcdn.com/helicone/tEQUFyBH7IjDxuEd/images/datasets/scores.webp?fit=max&auto=format&n=tEQUFyBH7IjDxuEd&q=85&s=180e89cf2dec6b109f6bc26ed4274b19" alt="Filtering requests by scores to identify best examples for datasets" width="1278" height="770" data-path="images/datasets/scores.webp" />
</Frame>

Build a data flywheel for model improvement:

1. **Tag requests** with custom properties for easy filtering
2. **Score outputs** based on user feedback or automated metrics
3. **Auto-collect winners** into datasets when they meet quality thresholds
4. **Retrain regularly** with newly curated examples
5. **A/B test** new models against production traffic

<Note>
  Start small - even 50-100 high-quality examples can significantly improve performance on specific tasks. Focus on one narrow use case first rather than trying to fine-tune a general-purpose model.
</Note>
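
The "auto-collect winners" step of the flywheel can be approximated with a simple threshold filter. This is a minimal sketch, assuming scores are numeric and higher is better; the `ScoredRequest` shape and `selectWinners` name are hypothetical, and the threshold value is something to choose per dataset.

```typescript
// Hypothetical view of a request with a quality score attached.
interface ScoredRequest {
  requestId: string;
  score: number; // assumption: higher means better
}

// Returns IDs that meet the threshold and are not already in the dataset.
function selectWinners(
  requests: ScoredRequest[],
  minScore: number,
  alreadyInDataset: Set<string> = new Set()
): string[] {
  return requests
    .filter((req) => req.score >= minScore && !alreadyInDataset.has(req.requestId))
    .map((req) => req.requestId);
}
```

The returned IDs can then be added to a dataset through the API and reviewed before the next retraining run.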

## Best Practices

<CardGroup cols={2}>
  <Card title="Quality over Quantity" icon="star">
    Prefer a small set of high-quality examples over a large dataset of mixed quality
  </Card>

  <Card title="Diverse Examples" icon="shuffle">
    Include varied inputs, edge cases, and different user types in your datasets
  </Card>

  <Card title="Regular Updates" icon="arrows-rotate">
    Continuously add new examples as your application evolves and improves
  </Card>

  <Card title="Clear Criteria" icon="list-check">
    Document what makes a "good" example for each dataset's specific purpose
  </Card>
</CardGroup>

## Related Features

<CardGroup cols={2}>
  <Card title="Custom Properties" icon="tag" href="/features/advanced-usage/custom-properties">
    Tag requests to make dataset creation easier with filtering
  </Card>

  <Card title="User Metrics" icon="users" href="/features/advanced-usage/user-metrics">
    Track which users generate the best examples for your datasets
  </Card>

  <Card title="Sessions" icon="link" href="/features/sessions">
    Include full conversation context in your datasets
  </Card>

  <Card title="Feedback" icon="message" href="/features/advanced-usage/feedback">
    Use user ratings to automatically identify dataset candidates
  </Card>
</CardGroup>

***

Datasets turn your production LLM logs into valuable training and evaluation resources. Start small with a focused use case, then expand as you see the benefits of curated, high-quality data.
