Build a Multi-Model AI Assistant with Cost Tracking
This guide shows you how to build a customer support assistant that intelligently routes queries to different AI models based on complexity, using Vercel AI Gateway for model access and Helicone for cost tracking and analytics.
Prerequisites
A Vercel AI Gateway API key
A Helicone API key
Node.js 18 or later with npm
Setup
Install the required packages:
npm install @ai-sdk/gateway ai zod
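Then set both API keys as environment variables; the variable names below are the ones referenced in the code, and the values are placeholders:

VERCEL_AI_GATEWAY_API_KEY=your-vercel-ai-gateway-key
HELICONE_API_KEY=your-helicone-api-key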
Create the AI Client
Set up a client that routes through Helicone for monitoring:
import { createGateway } from '@ai-sdk/gateway';
import { generateText, tool } from 'ai';
import { z } from 'zod';

const gateway = createGateway({
  apiKey: process.env.VERCEL_AI_GATEWAY_API_KEY,
  baseURL: 'https://vercel.helicone.ai/v1/ai',
  headers: {
    'Helicone-Auth': `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});
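To confirm the client is wired up correctly, you can make a quick test call through the gateway (any model available on your gateway works; gpt-4o-mini here is just an example):

const { text } = await generateText({
  model: gateway('openai/gpt-4o-mini'),
  prompt: 'Reply with OK.',
});
console.log(text); // the request should now appear in your Helicone dashboard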
Classify Query Complexity
Use a small, inexpensive model such as gpt-4o-mini with tool calling for precise classification:
import { tool } from 'ai';
import { z } from 'zod';

const classifyTool = tool({
  description: 'Classify a customer support query by complexity',
  parameters: z.object({
    complexity: z.enum(['simple', 'complex', 'technical']).describe(
      'simple: Basic questions about account, passwords, features. ' +
      'complex: Refunds, complaints, escalations, urgent issues. ' +
      'technical: API errors, integration issues, code problems.'
    ),
    reasoning: z.string().describe('Brief explanation for the classification'),
  }),
});
async function classifyQueryComplexity(query: string): Promise<'simple' | 'complex' | 'technical'> {
  const result = await generateText({
    model: gateway('openai/gpt-4o-mini'),
    tools: {
      classify: classifyTool,
    },
    toolChoice: 'required',
    prompt: `Classify this customer query: "${query}"`,
  });

  // Get the classification from the tool call
  // (toolChoice: 'required' guarantees at least one call)
  const toolCall = result.toolCalls[0];
  return toolCall.args.complexity;
}
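For example (the sample queries and expected labels are illustrative, since the classification comes from the model):

const a = await classifyQueryComplexity('How do I change my email address?');
// likely 'simple'
const b = await classifyQueryComplexity('Your API returns a 401 even with a fresh token');
// likely 'technical'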
Route to Appropriate Model
Use different models based on query complexity to optimize costs:
async function handleCustomerQuery(query: string, customerId: string) {
  const complexity = await classifyQueryComplexity(query);

  // Track complexity in Helicone
  const headers = {
    'Helicone-User-Id': customerId,
    'Helicone-Property-Complexity': complexity,
    'Helicone-Property-Department': 'customer-support',
  };

  let model;
  switch (complexity) {
    case 'simple':
      model = gateway('openai/gpt-4o-mini'); // Cheapest, handles basic queries
      break;
    case 'complex':
      model = gateway('openai/gpt-4o'); // Better reasoning for complex issues
      break;
    case 'technical':
      model = gateway('anthropic/claude-3-5-sonnet'); // Excellent for technical support
      break;
  }

  const response = await generateText({
    model,
    messages: [
      {
        role: 'system',
        content: 'You are a helpful customer support assistant. Be concise and professional.',
      },
      {
        role: 'user',
        content: query,
      },
    ],
    headers,
    temperature: 0.3, // Lower temperature for consistent support responses
    maxTokens: 200,
  });

  return {
    answer: response.text,
    complexity, // which tier handled the query
    usage: response.usage,
  };
}
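A hypothetical call (the query and customer ID are made up):

const { answer, complexity, usage } = await handleCustomerQuery(
  'I was charged twice this month and want a refund',
  'CUST-789'
);
console.log(complexity); // likely 'complex', so the query was routed to gpt-4o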
Implement Response Caching
Cache all queries regardless of complexity for maximum cost savings:
async function handleQueryWithCache(query: string, customerId: string) {
  const complexity = await classifyQueryComplexity(query);

  // Enable caching for all complexity levels
  const headers = {
    'Helicone-User-Id': customerId,
    'Helicone-Property-Complexity': complexity,
    'Helicone-Cache-Enabled': 'true',
    'Helicone-Cache-Bucket-Max-Size': '10',
    'Helicone-Cache-Seed': 'support-v1',
  };

  // Select model based on complexity
  let model;
  switch (complexity) {
    case 'simple':
      model = gateway('openai/gpt-4o-mini');
      break;
    case 'complex':
      model = gateway('openai/gpt-4o');
      break;
    case 'technical':
      model = gateway('anthropic/claude-3-5-sonnet');
      break;
  }

  return await generateText({
    model,
    messages: [
      { role: 'system', content: 'You are a helpful support agent.' },
      { role: 'user', content: query },
    ],
    headers,
    temperature: 0, // Zero temperature for consistent cache hits
  });
}
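Repeating an identical request is now served from Helicone's cache instead of hitting the model again. For example, the second call below should register as a cache hit in the dashboard:

await handleQueryWithCache('How do I reset my password?', 'CUST-789'); // cache miss, billed
await handleQueryWithCache('How do I reset my password?', 'CUST-789'); // cache hit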
Complete Support System
Here’s the full implementation:
import { createGateway } from '@ai-sdk/gateway';
import { generateText } from 'ai';

// Initialize AI Gateway with Helicone
const gateway = createGateway({
  apiKey: process.env.VERCEL_AI_GATEWAY_API_KEY,
  baseURL: 'https://vercel.helicone.ai/v1/ai',
  headers: {
    'Helicone-Auth': `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

interface SupportTicket {
  id: string;
  customerId: string;
  query: string;
  priority: 'low' | 'medium' | 'high';
}

// classifyQueryComplexity is defined in the "Classify Query Complexity" section above
async function processSupportTicket(ticket: SupportTicket) {
  const complexity = await classifyQueryComplexity(ticket.query);

  // Model selection based on complexity and priority
  let model;
  if (ticket.priority === 'high' || complexity === 'technical') {
    model = gateway('anthropic/claude-3-5-sonnet');
  } else if (complexity === 'complex') {
    model = gateway('openai/gpt-4o');
  } else {
    model = gateway('openai/gpt-4o-mini');
  }

  try {
    const response = await generateText({
      model,
      messages: [
        {
          role: 'system',
          content: `You are a customer support agent. Priority: ${ticket.priority}. Be helpful and professional.`,
        },
        {
          role: 'user',
          content: ticket.query,
        },
      ],
      headers: {
        'Helicone-User-Id': ticket.customerId,
        'Helicone-Property-TicketId': ticket.id,
        'Helicone-Property-Priority': ticket.priority,
        'Helicone-Property-Complexity': complexity,
        // Enable caching for all queries
        'Helicone-Cache-Enabled': 'true',
        'Helicone-Cache-Bucket-Max-Size': '20',
        'Helicone-Cache-Seed': 'support-v1',
      },
      temperature: 0, // Zero temperature for consistent cache hits
      maxTokens: 250,
    });

    return {
      ticketId: ticket.id,
      response: response.text,
      model: model.modelId,
      usage: response.usage, // token usage; the resulting cost appears in the Helicone dashboard
    };
  } catch (error) {
    console.error('Support ticket processing failed:', error);
    throw error;
  }
}

// Example usage
const ticket: SupportTicket = {
  id: 'TICKET-12345',
  customerId: 'CUST-789',
  query: 'How do I reset my password?',
  priority: 'low',
};

const result = await processSupportTicket(ticket);
console.log(`Response sent to customer: ${result.response}`);
Monitor Performance
View your assistant’s performance in Helicone:
Cost Analysis: Compare costs across different models
Response Times: Monitor latency by model and complexity
Cache Hit Rate: Track savings from cached responses
User Analytics: See which customers need the most support
Optimize Based on Data
Use Helicone’s analytics to:
Identify common queries for caching
Adjust model selection thresholds
Track cost per ticket complexity
Monitor customer satisfaction by model
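You can also pull this data programmatically. A minimal sketch, assuming Helicone's request query endpoint (POST https://api.helicone.ai/v1/request/query); the filter schema is simplified here, so check the Helicone API reference for the exact request shape before relying on it:

// Hypothetical sketch: list recent requests so you can aggregate them by the
// custom properties set above. Endpoint and body shape are assumptions.
const res = await fetch('https://api.helicone.ai/v1/request/query', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.HELICONE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ filter: 'all', limit: 100 }),
});
const requests = await res.json();
console.log(requests);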
Next Steps
Custom Properties: Track additional metadata
Caching: Reduce costs with smart caching
User Metrics: Analyze per-customer usage
Alerts: Set up cost and error alerts