Context editing enables automatic management of conversation context by intelligently clearing old tool uses and thinking blocks. This can significantly reduce costs in long-running sessions with minimal impact on response quality.
Context editing is currently supported for Anthropic models only. The configuration is ignored when routing to other providers.

Why Context Editing

  • Prevent Context Overflow: Automatically clear old tool results before hitting context limits
  • Reduce Token Costs: Keep only relevant context, reducing input tokens on subsequent calls
  • Enable Long Sessions: Run AI agents for longer periods without manual context management

Quick Start

Enable context editing with a simple configuration. The AI Gateway handles the translation to Anthropic’s native format.
import OpenAI from "openai";
import { HeliconeChatCreateParams } from "@helicone/helpers";

const client = new OpenAI({
  apiKey: process.env.HELICONE_API_KEY,
  baseURL: "https://ai-gateway.helicone.ai/v1",
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-20250514",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Help me debug this application..." }
    // ... many tool calls and responses
  ],
  tools: [/* your tools */],
  context_editing: {
    enabled: true
  }
} as HeliconeChatCreateParams);

Configuration Options

The context_editing object supports two strategies for managing context:

Clear Tool Uses

Automatically clear old tool use results when context grows too large:
context_editing: {
  enabled: true,
  clear_tool_uses: {
    // Trigger clearing when input tokens exceed this threshold
    trigger: 100000,

    // Keep the most recent N tool uses
    keep: 5,

    // Ensure at least this many tokens are cleared
    clear_at_least: 20000,

    // Never clear results from these tools
    exclude_tools: ["get_user_preferences", "read_config"],

    // Clear tool inputs (arguments) but keep outputs
    clear_tool_inputs: true
  }
}
Parameter           Type       Description
trigger             number     Token threshold to trigger clearing
keep                number     Number of recent tool uses to preserve
clear_at_least      number     Minimum tokens to clear when triggered
exclude_tools       string[]   Tool names that should never be cleared
clear_tool_inputs   boolean    Clear tool inputs while keeping outputs

Clear Thinking

Manage thinking/reasoning blocks in multi-turn conversations:
context_editing: {
  enabled: true,
  clear_thinking: {
    // Keep the N most recent thinking turns, or "all" to keep everything
    keep: 3
  }
}
Parameter   Type             Description
keep        number | "all"   Number of thinking turns to keep, or "all" to keep everything
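
For example, a workflow that depends on earlier reasoning can keep every thinking turn while still clearing old tool results. A minimal sketch combining both strategies (the threshold values below are illustrative, not recommendations):
context_editing: {
  enabled: true,
  clear_tool_uses: {
    trigger: 100000,  // start clearing once input tokens exceed ~100k
    keep: 5           // always preserve the 5 most recent tool uses
  },
  clear_thinking: {
    keep: "all"       // never drop thinking blocks
  }
}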

Complete Example

Here’s a full configuration for a long-running coding agent:
import OpenAI from "openai";
import { HeliconeChatCreateParams } from "@helicone/helpers";

const client = new OpenAI({
  apiKey: process.env.HELICONE_API_KEY,
  baseURL: "https://ai-gateway.helicone.ai/v1",
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-20250514",
  messages: conversationHistory,
  tools: [
    {
      type: "function",
      function: {
        name: "read_file",
        description: "Read a file from the filesystem",
        parameters: {
          type: "object",
          properties: {
            path: { type: "string", description: "File path to read" }
          },
          required: ["path"]
        }
      }
    },
    {
      type: "function",
      function: {
        name: "write_file",
        description: "Write content to a file",
        parameters: {
          type: "object",
          properties: {
            path: { type: "string" },
            content: { type: "string" }
          },
          required: ["path", "content"]
        }
      }
    },
    {
      type: "function",
      function: {
        name: "run_command",
        description: "Execute a shell command",
        parameters: {
          type: "object",
          properties: {
            command: { type: "string" }
          },
          required: ["command"]
        }
      }
    }
  ],
  reasoning_effort: "medium",
  context_editing: {
    enabled: true,
    clear_tool_uses: {
      trigger: 150000,        // Trigger at 150k tokens
      keep: 10,               // Keep last 10 tool uses
      clear_at_least: 50000,  // Clear at least 50k tokens
      exclude_tools: ["read_file"],  // Always keep file reads
      clear_tool_inputs: true  // Clear large file contents from inputs
    },
    clear_thinking: {
      keep: 5  // Keep last 5 thinking blocks
    }
  },
  max_completion_tokens: 16000
} as HeliconeChatCreateParams);
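
In practice, a request like this sits inside an agent loop that executes the model's tool calls and appends the results back onto conversationHistory. The sketch below shows one possible shape for that loop; executeTool is a hypothetical helper that dispatches to your own read_file, write_file, and run_command implementations, and tools refers to the definitions from the example above.
// Minimal agent loop (sketch): context editing keeps the growing
// conversationHistory under control across iterations.
const conversationHistory: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
  { role: "system", content: "You are a coding agent." },
  { role: "user", content: "Fix the failing test suite." }
];

while (true) {
  const turn = await client.chat.completions.create({
    model: "claude-sonnet-4-20250514",
    messages: conversationHistory,
    tools,  // same tool definitions as in the example above
    context_editing: { enabled: true }
  } as HeliconeChatCreateParams);

  const message = turn.choices[0].message;
  conversationHistory.push(message);

  // Stop once the model responds without requesting any tools
  if (!message.tool_calls?.length) break;

  // Execute each requested tool and append its result to the history
  for (const call of message.tool_calls) {
    if (call.type !== "function") continue;
    const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));
    conversationHistory.push({
      role: "tool",
      tool_call_id: call.id,
      content: result
    });
  }
}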

Responses API Support

Context editing works with both the Chat Completions API and the Responses API:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.HELICONE_API_KEY,
  baseURL: "https://ai-gateway.helicone.ai/v1",
});

const response = await client.responses.create({
  model: "claude-sonnet-4-20250514",
  input: conversationInput,
  tools: [/* your tools */],
  context_editing: {
    enabled: true,
    clear_tool_uses: {
      trigger: 100000,
      keep: 5
    }
  }
});

Default Behavior

When context_editing.enabled is true but no specific strategies are provided, the AI Gateway uses sensible defaults:
// Minimal configuration
context_editing: {
  enabled: true
}

// Equivalent to
context_editing: {
  enabled: true,
  clear_tool_uses: {}  // Uses Anthropic defaults
}
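
Because the right trigger depends on how quickly tool results accumulate in your workload, it can help to watch input token usage before tuning away from the defaults. A small sketch, assuming the standard usage fields on the Chat Completions response:
// Inspect token usage on a Chat Completions request to decide whether
// the clearing threshold should be raised or lowered
if (response.usage) {
  console.log(
    `input tokens: ${response.usage.prompt_tokens}, ` +
    `output tokens: ${response.usage.completion_tokens}`
  );
}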

Related

  • Reasoning - Extended thinking that benefits from context editing
  • Prompt Caching - Cache static context for cost savings
  • Sessions - Track and analyze long-running agent sessions

Learn More

Anthropic Context Editing Documentation